This project uses Reinforcement Learning (PPO) to optimize stock portfolio allocation and combines it with sentiment analysis from real-world financial news.
All environment requirements are listed in `env.yaml`:

```
python >= 3.10
gymnasium
pandas
numpy
yfinance
stable-baselines3
tqdm
pip
transformers
datasets
kaggle
```
Developers used the following devices:
- Lenovo Thinkpad T14s Gen2
- Macbook
No GPUs were used during training. The sentiment-analysis step currently takes around 30 minutes on CPU and would be faster on a GPU. Training with 100,000 timesteps takes around 2 minutes.
The following optional arguments can be passed when running `main.py`.
| Argument | Type | Default | Description |
|---|---|---|---|
| `--num_portfolio_stocks` | int | 20 | Number of stocks to include in the portfolio. |
| `--start_date` | str | "2019-05-01" | Starting date for the portfolio's timeframe (format: YYYY-MM-DD). |
| `--end_date` | str | "2020-03-25" | Ending date for the portfolio's timeframe (format: YYYY-MM-DD). |
| `--stock_index` | str | "nasdaq" | The stock index to fetch tickers from. Options: "nasdaq", "nyse", or "all". |
| `--random_seed` | int | 42 | Random seed for reproducibility. |
| `--cache_dir` | str | "./cache/" | Directory path to store cached content. |
| `--use_sentiment` | int | 0 | Whether to include news sentiment in the optimization strategy. Set to 1 to use sentiment, or 0 to ignore. |
| `--best_model_path` | str | "./cache/best_model" | Directory path to store the best-performing model. |
| `--eval_dir` | str | "./cache/eval" | Directory path to store evaluation callback results. |
```
python main.py --num_portfolio_stocks 25 --start_date 2020-01-01 --end_date 2021-01-01 --stock_index nyse --use_sentiment 1
```

- Fetches the current list of tickers available on NASDAQ, NYSE, or both
- Verifies each ticker's presence via the yfinance API
- All verified tickers are treated as `valid_tickers`; both valid and invalid results are cached to speed up development
- Samples `n` tickers from `valid_tickers` and treats them as the `portfolio_stocks`
- Sampled tickers are re-validated, because some tickers get past the first round of verification
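The validate-and-cache step above could be sketched as follows. This is illustrative, not the project's actual code: the function and cache-file names are made up, and the yfinance check is passed in as `validate_fn` so it can be swapped or stubbed.

```python
import json
import os

def load_or_validate(tickers, cache_path, validate_fn):
    """Return (valid, invalid) ticker lists, caching results as JSON
    so repeated runs skip the slow per-ticker API checks."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cached = json.load(f)
        return cached["valid"], cached["invalid"]
    valid = [t for t in tickers if validate_fn(t)]
    invalid = [t for t in tickers if t not in set(valid)]
    with open(cache_path, "w") as f:
        json.dump({"valid": valid, "invalid": invalid}, f)
    return valid, invalid
```

In the real pipeline, `validate_fn` would query yfinance (e.g. checking that a ticker returns non-empty price history), and sampled tickers are run through it a second time.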
The environment is built on Gymnasium (the maintained fork of OpenAI Gym) and uses stable-baselines3 for logging, evaluation, and model implementation.
The environment models both the stock portfolio and the market, considering asset holdings as well as market conditions.
The agent uses a Multi-Layer Perceptron (MLP) policy with the PPO algorithm to optimize asset allocation.
We use FinBERT (a financial-domain BERT model) to classify the sentiment of stock-related news articles.
The sentiment signal is then used to inform portfolio decisions, reflecting the daily tone of the market based on recent news coverage.
The pipeline:
- Downloads a large dataset of historical stock news headlines
- Filters to the busiest month
- Applies FinBERT sentiment analysis
- Aggregates into a daily sentiment time series
Output:
- `daily_sentiment.csv`: one row per day with an average sentiment score (a float between -1 and 1)
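The core of the sentiment steps might look like this sketch. The `ProsusAI/finbert` checkpoint and the column names are assumptions, and the label-to-score mapping is one plausible choice; the project's actual code may differ.

```python
import pandas as pd
from transformers import pipeline

# Load a FinBERT checkpoint from the Hugging Face Hub (assumed checkpoint).
clf = pipeline("text-classification", model="ProsusAI/finbert")

headlines = pd.DataFrame({
    "date": ["2019-05-01", "2019-05-01", "2019-05-02"],
    "headline": [
        "Shares surge on strong quarterly earnings",
        "Regulator opens probe into lender",
        "Markets flat ahead of Fed decision",
    ],
})

# Map FinBERT's label/confidence pairs to a signed score in [-1, 1],
# then average per day to get the daily sentiment time series.
sign = {"positive": 1.0, "negative": -1.0, "neutral": 0.0}
results = clf(headlines["headline"].tolist())
headlines["score"] = [sign[r["label"]] * r["score"] for r in results]
daily = headlines.groupby("date")["score"].mean()
daily.to_csv("daily_sentiment.csv")
```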
Run these commands from the project root:
- `make`: show available commands
- `make get-data`: download the stock news dataset from Kaggle
- `make filter-data n=10000`: filter to the busiest month and keep 10,000 random rows
- `make get-sentiments`: run the FinBERT sentiment pipeline on the filtered (or full) dataset
- `make sentiment n=10000`: run the full pipeline: download, filter, analyze
See env.yaml to set up the Conda environment:
conda env create -f env.yaml
conda activate RL

Also make sure to configure your Kaggle API credentials:
- Go to https://www.kaggle.com/account
- Create a new API token
- Place the downloaded `kaggle.json` file into `~/.kaggle/kaggle.json`

Or set the following environment variables:
export KAGGLE_USERNAME=your_username
export KAGGLE_KEY=your_key

Note: You must download the data from Kaggle through the API and run the sentiment pipeline separately from the market data pipeline. After the `daily_sentiment.csv` file has been created in the data directory, you can run `main.py`.
| Sentiment | Period | Cum. Return | Avg Return | Volatility | Sharpe (Simple) | Sharpe (Log) |
|---|---|---|---|---|---|---|
| No | 2019-05-01 to 2020-03-25 | 0.8391 | -0.0097 | 0.0617 | -0.6289 | -0.7474 |
| Yes | 2019-05-01 to 2020-03-25 | 0.8574 | -0.0084 | 0.0598 | -0.5619 | -0.6804 |
| No | 2019-05-01 to 2019-12-31 | 1.0140 | 0.0024 | 0.0090 | 0.6957 | 0.6839 |
| Yes | 2019-05-01 to 2019-12-31 | 1.0302 | 0.0050 | 0.0088 | 1.5024 | 1.4964 |
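A sketch of how the table's metrics could be computed from a series of simple daily returns. The risk-free rate is assumed to be 0 and no annualization is applied, since the table does not specify either.

```python
import numpy as np

def evaluate(returns):
    """Compute Cum. Return, Avg Return, Volatility, and both Sharpe
    variants from simple (arithmetic) daily returns."""
    returns = np.asarray(returns, dtype=float)
    cum_return = float(np.prod(1 + returns))   # growth of 1 unit of capital
    avg_return = returns.mean()
    volatility = returns.std(ddof=1)
    sharpe_simple = avg_return / volatility    # risk-free rate assumed 0
    log_returns = np.log1p(returns)            # Sharpe computed on log returns
    sharpe_log = log_returns.mean() / log_returns.std(ddof=1)
    return cum_return, avg_return, volatility, sharpe_simple, sharpe_log
```

Under this reading, a cumulative return below 1.0 (as in the full-period rows, which include the March 2020 crash) means the portfolio lost value over the window.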
- More algorithms beyond PPO
- Our own implementation of PPO (not relying on stable-baselines3)
- A larger range of sentiment data
- Storing sentiment and cached data in a database
- A more robust evaluation pipeline
- Better logging
- Use of technical indicators in the data