A deep reinforcement learning project that uses Proximal Policy Optimization (PPO) to automate stock trading decisions. It improves upon a baseline PPO implementation with advanced features and optimizations.
This project implements an automated stock trading agent using PPO, a state-of-the-art reinforcement learning algorithm. The agent learns to make buy/sell/hold decisions by interacting with historical stock market data.
Course: Reinforcement Learning - Phase 2
Institution: FAST School of Computing
Team Members: Maaz Ud Din, Saamer Abbas, Sammar Kaleem, Ali Hassan
This project is based on the research paper:
"Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy"
Paper: https://arxiv.org/pdf/2511.12120
The baseline implementation used standard PPO from Stable-Baselines3 with:
- Basic PPO algorithm with fixed hyperparameters (ε = 0.2)
- Simple reward function based on portfolio value changes
- Daily OHLCV data with basic technical indicators
- Single environment training
- Standard neural network architecture
Phase 2 Focus: We enhanced this baseline with algorithmic improvements, advanced features, and training optimizations to achieve significantly better performance.
- `improved_ppo_trading.py` - Enhanced PPO implementation with all improvements
- `baseline_vs_improved_comparison.py` - Script to compare baseline vs. improved performance
- `gradio_dashboard.py` - Interactive web dashboard for visualizing results
- `comparison_report.txt` - Detailed performance comparison report
- `description_markdowns/` - Detailed documentation, reports, and guides
- `graphs_and_images/` - Performance charts and visualizations
- `submission/` - Final submission files, including the research paper and report
- Adaptive Clipping Range: Dynamic epsilon decay (0.2 → 0.05) for better convergence
- Risk-Adjusted Rewards: Balances returns, volatility, and transaction costs
- Multi-Timeframe Features: Incorporates 5-min, 1-hour, and daily indicators
- Dynamic Entropy Coefficient: Improved exploration-exploitation balance (both this and the adaptive clipping schedule are sketched below)
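Both schedules can be wired into Stable-Baselines3 directly: `clip_range` accepts a function of the remaining training progress, and the entropy coefficient can be decayed from a callback. The sketch below is illustrative rather than the project's exact code; `make_trading_env` is a hypothetical stand-in for the custom environment factory, and the entropy bounds (0.01 → 0.001) are assumed values.

```python
# Illustrative sketch, not the project's exact code.
# make_trading_env is a hypothetical factory for the custom trading env.
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import BaseCallback

def clip_schedule(progress_remaining: float) -> float:
    # SB3 calls this with progress_remaining going 1.0 -> 0.0, so epsilon
    # anneals linearly from 0.2 down to 0.05 over training.
    return 0.05 + (0.2 - 0.05) * progress_remaining

class EntropyDecay(BaseCallback):
    """Linearly decays the entropy coefficient as training progresses."""

    def __init__(self, start: float = 0.01, end: float = 0.001):
        super().__init__()
        self.start, self.end = start, end

    def _on_step(self) -> bool:
        # _current_progress_remaining goes 1.0 -> 0.0 over training.
        progress = self.model._current_progress_remaining
        self.model.ent_coef = self.end + (self.start - self.end) * progress
        return True

model = PPO("MlpPolicy", make_trading_env(), clip_range=clip_schedule)
model.learn(total_timesteps=1_000_000, callback=EntropyDecay())
```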
- Advanced Indicators: RSI, MACD, Bollinger Bands, ATR (see the computation sketch after this list)
- Robust Normalization: Handles outliers with running statistics
- Parallel Training: 8 concurrent environments for faster training
- Enhanced Architecture: Deeper policy network with [256, 256, 128] hidden layers
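For reference, the listed indicators can be computed from OHLCV data with pandas roughly as follows. The periods are the common defaults (14 for RSI/ATR, 12/26/9 for MACD, 20 and 2 standard deviations for Bollinger Bands) and the column names are assumptions, not necessarily what the project code uses.

```python
# Sketch of the listed indicators with pandas (periods and column names
# are common-default assumptions, not taken from the project code).
import pandas as pd

def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    close, high, low = df["Close"], df["High"], df["Low"]

    # RSI (14): ratio of smoothed average gains to average losses.
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / 14, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / 14, adjust=False).mean()
    df["rsi"] = 100 - 100 / (1 + gain / loss)

    # MACD (12/26) with a 9-period signal line.
    ema12 = close.ewm(span=12, adjust=False).mean()
    ema26 = close.ewm(span=26, adjust=False).mean()
    df["macd"] = ema12 - ema26
    df["macd_signal"] = df["macd"].ewm(span=9, adjust=False).mean()

    # Bollinger Bands (20-period SMA, 2 standard deviations).
    sma20 = close.rolling(20).mean()
    std20 = close.rolling(20).std()
    df["bb_upper"] = sma20 + 2 * std20
    df["bb_lower"] = sma20 - 2 * std20

    # ATR (14): rolling mean of the true range.
    tr = pd.concat([high - low,
                    (high - close.shift()).abs(),
                    (low - close.shift()).abs()], axis=1).max(axis=1)
    df["atr"] = tr.rolling(14).mean()
    return df
```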
- Vectorized environments (SubprocVecEnv); a full training-setup sketch follows this list
- Orthogonal weight initialization
- Gradient clipping for stability
- Memory-efficient data pipeline
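Put together, a minimal Stable-Baselines3 training setup along these lines might look like the sketch below. The hyperparameter values are assumptions, and `make_trading_env` again stands in for the custom environment factory.

```python
# Sketch of the training setup listed above; hyperparameters are assumed,
# and make_trading_env is a hypothetical factory for the custom environment.
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv

if __name__ == "__main__":  # guard required for SubprocVecEnv subprocesses
    vec_env = SubprocVecEnv([make_trading_env for _ in range(8)])  # 8 workers
    model = PPO(
        "MlpPolicy",
        vec_env,
        policy_kwargs=dict(
            net_arch=[256, 256, 128],  # deeper shared layers
            ortho_init=True,           # orthogonal weight initialization
        ),
        max_grad_norm=0.5,             # gradient clipping for stability
        verbose=1,
    )
    model.learn(total_timesteps=1_000_000)
```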
| Metric | Baseline | Improved | Change |
|---|---|---|---|
| Cumulative Return | 42.30% | 58.70% | +38.8% |
| Sharpe Ratio | 1.23 | 1.51 | +22.8% |
| Max Drawdown | -18.4% | -12.1% | +34.2% |
| Win Rate | 54.20% | 61.80% | +14.0% |
| Training Time | 14.3h | 2.1h | -85.3% |
| Final Portfolio | $142,300 | $158,700 | +11.5% |
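For context, these metrics are conventionally computed from a series of daily portfolio values roughly as in the sketch below (assuming 252 trading days per year and a zero risk-free rate; not necessarily the exact formulas behind `comparison_report.txt`).

```python
# Sketch of conventional metric definitions over daily portfolio values.
# Assumes 252 trading days/year and a zero risk-free rate; win rate is
# counted over daily returns (the report may count trades instead).
import numpy as np

def performance_metrics(values: np.ndarray) -> dict:
    returns = np.diff(values) / values[:-1]          # daily returns
    cumulative = values[-1] / values[0] - 1          # total return
    sharpe = np.sqrt(252) * returns.mean() / returns.std()
    peak = np.maximum.accumulate(values)             # running maximum
    max_drawdown = ((values - peak) / peak).min()    # most negative dip
    win_rate = (returns > 0).mean()
    return {"cumulative_return": cumulative, "sharpe_ratio": sharpe,
            "max_drawdown": max_drawdown, "win_rate": win_rate}
```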
Install the dependencies: `pip install numpy pandas gym torch stable-baselines3 gradio`

Train the improved agent: `python improved_ppo_trading.py`

Launch the interactive dashboard: `python gradio_dashboard.py`, then open your browser to http://localhost:7860

To compare baseline vs. improved performance, run `python baseline_vs_improved_comparison.py`

- Environment: A custom Gym environment simulates stock trading with realistic constraints
- State Space: Price data, technical indicators, portfolio status, and market features
- Action Space: Buy, Sell, or Hold decisions
- Reward Function: Risk-adjusted returns considering volatility and transaction costs
- Training: The PPO agent learns a trading policy through trial and error (a minimal environment skeleton follows this list)
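To make the design above concrete, here is a minimal skeleton of such an environment in the classic gym API. All constants (transaction cost, volatility penalty, 20-step volatility window) are illustrative assumptions, not the project's actual values.

```python
# Minimal sketch of a trading environment in the classic gym API; the
# constants below are assumptions, not the project's actual values.
import gym
import numpy as np
from gym import spaces

class TradingEnv(gym.Env):
    ACTIONS = {0: "hold", 1: "buy", 2: "sell"}

    def __init__(self, prices, features, cost=0.001, risk_penalty=0.1):
        super().__init__()
        self.prices, self.features = prices, features
        self.cost, self.risk_penalty = cost, risk_penalty
        self.action_space = spaces.Discrete(3)
        # Observation: market features plus [cash, has-position flag].
        self.observation_space = spaces.Box(
            -np.inf, np.inf, shape=(features.shape[1] + 2,), dtype=np.float32)

    def reset(self):
        self.t, self.position, self.cash = 0, 0.0, 1.0
        self.recent_returns = []
        return self._obs()

    def step(self, action):
        price, prev_value = self.prices[self.t], self._value()
        if action == 1 and self.position == 0:    # buy all-in, pay cost
            self.position = self.cash * (1 - self.cost) / price
            self.cash = 0.0
        elif action == 2 and self.position > 0:   # sell everything, pay cost
            self.cash = self.position * price * (1 - self.cost)
            self.position = 0.0
        self.t += 1
        step_return = self._value() / prev_value - 1
        self.recent_returns.append(step_return)
        # Risk-adjusted reward: return minus a penalty on recent volatility.
        vol = np.std(self.recent_returns[-20:]) if len(self.recent_returns) > 1 else 0.0
        reward = step_return - self.risk_penalty * vol
        done = self.t >= len(self.prices) - 1
        return self._obs(), reward, done, {}

    def _value(self):
        return self.cash + self.position * self.prices[self.t]

    def _obs(self):
        return np.concatenate([self.features[self.t],
                               [self.cash, float(self.position > 0)]]).astype(np.float32)
```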
Automated_Stock_using_PPO/
├── improved_ppo_trading.py # Main implementation
├── gradio_dashboard.py # Interactive dashboard
├── baseline_vs_improved_comparison.py
├── comparison_report.txt
├── description_markdowns/ # Documentation
│ ├── Phase2_Report.md
│ ├── START_HERE.md
│ └── ...
├── graphs_and_images/ # Performance charts
└── submission/ # Final deliverables
├── Research paper.pdf
└── Phase2_Report.pdf
For detailed information, check out:
- `description_markdowns/START_HERE.md` - Quick start guide
- `description_markdowns/Phase2_Report.md` - Complete technical report
- `comparison_report.txt` - Performance analysis
The project includes several performance charts:
- Performance comparison (baseline vs improved)
- Risk analysis and drawdown comparison
- Training efficiency metrics
- Improvement percentages across all metrics
This is an academic project for educational purposes.
Original Paper:
"Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy"
arXiv:2511.12120
https://arxiv.org/pdf/2511.12120
This project builds upon the research paper above and uses the Stable-Baselines3 PPO implementation. Phase 2 improvements were developed as part of the FAST School of Computing RL course.