This repository accompanies the research study entitled "Building Generative AI Agents with Deep Reinforcement Learning". The objective of this project is to evaluate the effectiveness of integrating deep reinforcement learning (DRL) with generative large language models (LLMs) for constructing autonomous agents capable of sequential, multimodal reasoning and decision-making in financial environments.
We empirically compare two agent architectures:
- A DRL-enhanced generative agent that combines policy learning with generative modeling.
- A static generative agent that operates without reinforcement-based optimization.
The agents are assessed in the context of stock market trading simulations using historical financial data spanning four decades.
This study investigates the following core research questions:
- RQ1: How can deep reinforcement learning be effectively combined with generative large language models to construct autonomous agents capable of complex, multimodal reasoning and decision-making?
- RQ2: What architectural frameworks and parallelization techniques are most effective in accelerating the training of generative AI agents using DRL, without sacrificing sample efficiency or convergence stability?
- RQ3: How can multi-agent reinforcement learning (MARL) frameworks be enhanced with generative capabilities to support cooperative, competitive, or hierarchical behavior in complex environments?
The methodology consists of the following major components:
- Data Source: Historical stock market data from over 1,000 publicly listed firms across a 40-year period. Derived features include technical indicators, price volatility, and temporal embeddings (a feature-derivation sketch follows this list).
- DRL Agent Architecture: Implements a policy optimized with Proximal Policy Optimization (PPO), operating within a Gym-compatible trading environment. The reward function encodes portfolio returns adjusted for risk and transaction costs (see the environment sketch below).
- Static Agent Baseline: Uses the same input features but selects actions via a supervised learning model trained to predict directional price movement, without reinforcement signals (see the baseline sketch below).
- Evaluation Metrics: Performance is assessed using cumulative return, Sharpe ratio, maximum drawdown, and, where applicable, action entropy and the interpretability of generative outputs (metric helpers are sketched below).
- Implementation Strategy: Both agents are trained and evaluated in controlled simulation environments. Statistical comparisons are conducted over multiple seeds and market segments to ensure robustness.
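As a concrete illustration of the feature-derivation step, the snippet below computes a few common derived features from a price series with pandas. It is a minimal sketch: the function name `derive_features`, the 20-step window, and the specific indicators are illustrative assumptions, not the repository's actual feature set.

```python
import pandas as pd

def derive_features(prices: pd.Series, window: int = 20) -> pd.DataFrame:
    """Illustrative derived features from a price series (hypothetical helper)."""
    ret = prices.pct_change()
    feats = pd.DataFrame({
        "return_1d": ret,                                    # one-step simple return
        "volatility": ret.rolling(window).std(),             # rolling realized volatility
        "momentum": prices / prices.shift(window) - 1.0,     # trailing momentum
        "sma_ratio": prices / prices.rolling(window).mean(), # price vs. moving average
    })
    return feats.dropna()
```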
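The PPO agent interacts with a Gym-compatible market simulator whose reward combines step returns with a transaction-cost penalty. Below is a minimal sketch of such an environment using the Gymnasium API; the class name `TradingEnv`, the three-action set (hold/long/flat), and the proportional `transaction_cost` are simplifying assumptions for illustration, not the paper's exact environment.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class TradingEnv(gym.Env):
    """Minimal single-asset trading environment (illustrative only)."""

    def __init__(self, features: np.ndarray, prices: np.ndarray,
                 transaction_cost: float = 1e-3):
        super().__init__()
        self.features, self.prices = features, prices
        self.transaction_cost = transaction_cost
        self.action_space = spaces.Discrete(3)  # 0 = hold, 1 = long, 2 = flat
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(features.shape[1],), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.position = 0, 0  # start flat at the first timestep
        return self.features[self.t].astype(np.float32), {}

    def step(self, action):
        prev_position = self.position
        if action == 1:
            self.position = 1
        elif action == 2:
            self.position = 0
        # Log-return over the next step, earned only while long.
        step_return = self.position * np.log(self.prices[self.t + 1] / self.prices[self.t])
        # Proportional cost charged whenever the position changes.
        cost = self.transaction_cost * abs(self.position - prev_position)
        reward = step_return - cost
        self.t += 1
        terminated = self.t >= len(self.prices) - 1
        return self.features[self.t].astype(np.float32), reward, terminated, False, {}
```

An environment of this shape can be plugged directly into a standard PPO implementation (e.g., `PPO("MlpPolicy", TradingEnv(features, prices))` in stable-baselines3), which is one common way to realize the training loop described above.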
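The static baseline replaces policy learning with one-shot supervised classification of next-step price direction. A hedged sketch, assuming scikit-learn; `train_static_baseline` and the choice of classifier are hypothetical, not the repository's actual model:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def train_static_baseline(features: np.ndarray, prices: np.ndarray):
    """Fit a classifier to predict next-step price direction (hypothetical helper).

    Labels: 1 if the next step's return is positive, else 0. The fitted model
    maps features to {long, flat} actions with no reinforcement signal.
    """
    future_return = prices[1:] / prices[:-1] - 1.0
    labels = (future_return > 0).astype(int)
    return GradientBoostingClassifier().fit(features[:-1], labels)

# At decision time the predicted class is used directly as the action:
# action = model.predict(current_features.reshape(1, -1))[0]  # 1 = long, 0 = flat
```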
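The headline metrics have standard definitions; the helpers below compute them from an array of per-step simple returns. These are textbook formulas (annualization with 252 trading periods is an assumption), not the repository's evaluation code.

```python
import numpy as np

def cumulative_return(returns: np.ndarray) -> float:
    """Total compounded return over the period."""
    return float(np.prod(1.0 + returns) - 1.0)

def sharpe_ratio(returns: np.ndarray, risk_free: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio of per-step simple returns."""
    excess = returns - risk_free / periods_per_year
    return float(np.sqrt(periods_per_year) * excess.mean() / (excess.std(ddof=1) + 1e-12))

def max_drawdown(returns: np.ndarray) -> float:
    """Largest peak-to-trough decline of the equity curve, as a fraction."""
    equity = np.cumprod(1.0 + returns)
    peaks = np.maximum.accumulate(equity)
    return float(np.max(1.0 - equity / peaks))
```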
This repository enables reproduction of the experiments described in the paper, including:
- Training both DRL-based and static agents under equivalent data and environmental conditions.
- Evaluating agent behavior in high-volatility, non-stationary time-series domains.
- Analyzing the comparative advantages of reinforcement-based adaptation in generative systems.
To run the experiments:
- Clone the repository.
- Create a Python virtual environment and install the dependencies listed in `requirements.txt`.
- Execute the training and evaluation scripts (`train.py` and `evaluate.py`) with the appropriate configuration files.
- Results will be logged to designated output directories for statistical and visual analysis.
If you use this codebase or build upon this work, please cite the following:
```bibtex
@article{saxena2025generative,
  title={Building Generative AI Agents with Deep Reinforcement Learning},
  author={Aditya Saxena},
  year={2025},
  note={Manuscript, Toronto Metropolitan University}
}
```