.
├─ README.md
├─ data
│ └─ clean_full_bbo_data.parquet
├─ data_analysis.ipynb
├─ get_smaller_dataset.ipynb
├─ load_data.py
├─ main.ipynb
├─ requirements.txt
└─ utils.py
Install the required packages with:
pip install -r requirements.txt
Set up a data folder as follows:
mkdir -p data
Next, download the data from the drive provided by the Professor. The dataset is called sp100_2004-8 and contains SP100 data from 2004 to 2008. After downloading it, store the folder sp100_2004-8 inside data.
Then, download the file raw_full_bbo_data.parquet from this drive and store it in the data folder you created.
This project explores how denoising the covariance matrix can impact the variance of the optimal portfolio, computed using the Markowitz formula.
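To make the idea concrete, here is a minimal sketch on synthetic data. It rests on several assumptions that are not taken from this repository: the "Markowitz formula" is read as the global minimum-variance weights, the denoiser is eigenvalue clipping at the Marchenko-Pastur upper edge, and the function names `min_variance_weights` and `clip_eigenvalues` are hypothetical. The notebooks may use a different method.

```python
import numpy as np


def min_variance_weights(cov):
    """Global minimum-variance weights: w = C^{-1} 1 / (1' C^{-1} 1)."""
    ones = np.ones(cov.shape[0])
    inv_ones = np.linalg.solve(cov, ones)
    return inv_ones / (ones @ inv_ones)


def clip_eigenvalues(cov, n_obs):
    """Eigenvalue clipping (an assumed denoiser, not necessarily the project's)."""
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    eigvals, eigvecs = np.linalg.eigh(corr)
    q = cov.shape[0] / n_obs
    lambda_max = (1 + np.sqrt(q)) ** 2  # Marchenko-Pastur upper edge
    noise = eigvals <= lambda_max
    clipped = eigvals.copy()
    if noise.any():
        # Replace noisy eigenvalues by their average, which preserves the trace.
        clipped[noise] = eigvals[noise].mean()
    corr_clean = (eigvecs * clipped) @ eigvecs.T
    # Rescale so the diagonal is exactly 1 again.
    d = np.sqrt(np.diag(corr_clean))
    corr_clean = corr_clean / np.outer(d, d)
    return corr_clean * np.outer(std, std)


# Synthetic one-factor returns (illustrative only, not the SP100 data).
rng = np.random.default_rng(0)
n_assets, n_obs = 20, 100
market = rng.normal(size=(2 * n_obs, 1))
betas = rng.uniform(0.5, 1.5, size=(1, n_assets))
returns = 0.01 * (market @ betas + rng.normal(size=(2 * n_obs, n_assets)))
train, test = returns[:n_obs], returns[n_obs:]

cov_raw = np.cov(train, rowvar=False)
cov_clean = clip_eigenvalues(cov_raw, n_obs)
w_raw = min_variance_weights(cov_raw)
w_clean = min_variance_weights(cov_clean)

# Compare the variance realized on data not used to estimate the weights.
cov_test = np.cov(test, rowvar=False)
for name, w in [("raw", w_raw), ("denoised", w_clean)]:
    print(f"{name}: out-of-sample variance = {w @ cov_test @ w:.3e}")
```

The comparison is made out of sample because the in-sample variance of the raw portfolio is optimistically biased by construction.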
The project workflow is as follows:
- Main Analysis: The analysis is performed in main.ipynb, using the dataset clean_full_bbo_data.parquet.
- Dataset Preparation:
  - The dataset clean_full_bbo_data.parquet is generated by running get_smaller_dataset.ipynb on the file raw_full_bbo_data.parquet.
  - The file raw_full_bbo_data.parquet is derived from the SP100 dataset and can be obtained in two ways:
    - Download it using the second drive link provided above.
    - Generate it directly by running load_data.py.
- Input: SP100 dataset.
- Intermediate Files:
  - raw_full_bbo_data.parquet: Preprocessed dataset.
  - clean_full_bbo_data.parquet: Final dataset used for analysis.
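The dependency chain described above can be sketched as a small helper that reports which step to run next, based on which files are already present. The function `next_step` is hypothetical and not part of this repository. It assumes the file and folder names listed in this README, all located under data.

```python
from pathlib import Path


def next_step(data_dir="data"):
    """Return the next workflow step given the files found in data_dir."""
    data = Path(data_dir)
    if (data / "clean_full_bbo_data.parquet").is_file():
        return "run main.ipynb"
    if (data / "raw_full_bbo_data.parquet").is_file():
        return "run get_smaller_dataset.ipynb"
    if (data / "sp100_2004-8").is_dir():
        return "run load_data.py"
    return "download sp100_2004-8 or raw_full_bbo_data.parquet into data"


print(next_step())
```

The checks run from the final dataset backwards, so steps whose output already exists are skipped.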
For any question and/or curiosity, feel free to reach