Using the Alpaca API
I created this project using the Anaconda distribution. You can install it here. If you prefer a lightweight version, you can install Miniconda instead.
Create the conda environment with:
$ conda env create -f environment.yml
Activate the conda environment with:
$ conda activate alpaca
If you'd like to play around with generating the data, follow these steps:
- $ touch api_key.py
- Add APCA_API_KEY_ID and APCA_API_SECRET_KEY to your api_key.py file.
  - You have to get your own API keys from Alpaca.
  - api_key.py has already been added to the .gitignore file. This is important because you should not share your API keys with anyone.
  - An example of what this file should look like is given below:
APCA_API_KEY_ID = 'XXXXXXXXXXXXXXXXXXXX'
APCA_API_SECRET_KEY = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
- $ python download_data.py
  - First, this program downloads OHLCV data for AMZN from Alpaca for the past 253 days.
  - While Alpaca will let you retrieve OHLCV data for up to 200 assets at a time, I found the responses to be inconsistent. It seems that 253 days of valid data are requested for each stock. This means that if an asset is missing data on some of the most recent 253 days, the returned pandas DataFrame will have more than 253 rows, with NaN values on the dates where that asset is missing data but another asset has data.
  - To counteract this, I retrieve OHLCV data for each asset individually and make sure its timeframe matches the timeframe for AMZN. Note that this means you'll easily hit the rate limit.
  - I also filter out assets that aren't tradable or have a mean closing price lower than $10. Feel free to edit these filters. For context, I found these filters reduced the number of assets from ~8,000 to ~1,300.
  - Finally, it saves each asset to a .csv file in the data/ directory.
- $ python accumulate_data.py
  - This program reads the .csv files in the data/ directory.
  - It accumulates the close data from all those files into one pandas DataFrame and saves it to a file called my_close_data.csv in the accumulated_data/ directory.
  - And voila! You now have my_close_data.csv, the data you can use to perform stock prediction.
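The api_key.py step above only defines two module-level strings, so any script can import them. The sketch below shows one way a script might load them. It is a hypothetical helper (load_alpaca_keys is not part of this repository), and the fallback to environment variables with the same names is an assumption added for illustration.

```python
import importlib
import os


def load_alpaca_keys(module_name="api_key"):
    """Return (key_id, secret_key) for Alpaca.

    Hypothetical helper: prefers the constants in api_key.py and, as an
    illustrative assumption, falls back to environment variables of the
    same names when the module or a constant is missing.
    """
    try:
        module = importlib.import_module(module_name)
        key_id = getattr(module, "APCA_API_KEY_ID", None)
        secret = getattr(module, "APCA_API_SECRET_KEY", None)
    except ModuleNotFoundError:
        key_id = secret = None

    # Fall back to the environment only for values the module did not provide.
    key_id = key_id or os.environ.get("APCA_API_KEY_ID")
    secret = secret or os.environ.get("APCA_API_SECRET_KEY")

    if not key_id or not secret:
        raise RuntimeError("Alpaca API keys not found; create api_key.py first.")
    return key_id, secret
```

Keeping the keys in a git-ignored module (or the environment) means they never appear in the scripts that are committed.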
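The per-asset checks in the download_data.py step can be sketched with pandas on already-downloaded bars. This is not the code in download_data.py: the helper names, the "close" column name, and the choice to skip (rather than reindex) assets whose dates differ from AMZN's are assumptions made for illustration.

```python
from typing import Dict

import pandas as pd


def matches_reference(bars: pd.DataFrame, reference_index: pd.Index) -> bool:
    """True if the asset has bars for exactly the same dates as the reference (AMZN)."""
    return bars.index.equals(reference_index)


def passes_filters(bars: pd.DataFrame, tradable: bool, min_mean_close: float = 10.0) -> bool:
    """Keep tradable assets whose mean closing price is at least min_mean_close."""
    return bool(tradable) and bool(bars["close"].mean() >= min_mean_close)


def select_assets(
    all_bars: Dict[str, pd.DataFrame],
    tradable_flags: Dict[str, bool],
    reference_symbol: str = "AMZN",
) -> Dict[str, pd.DataFrame]:
    """Return {symbol: bars} for assets aligned with the reference that pass the filters."""
    reference_index = all_bars[reference_symbol].index
    selected = {}
    for symbol, bars in all_bars.items():
        # Assumption: assets with missing or extra dates are skipped entirely.
        if not matches_reference(bars, reference_index):
            continue
        if passes_filters(bars, tradable_flags[symbol]):
            selected[symbol] = bars
    return selected
```

Checking each asset separately avoids the problem described above: combining the close series of an asset with a missing date and a complete one (for example with pd.concat(..., axis=1)) takes the union of their dates and fills the gap with NaN.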
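The accumulate_data.py step can be sketched as below. This is an illustration under assumptions, not the script itself: it assumes each file in data/ is named after its symbol (for example AMZN.csv), stores the date in its first column, and has a column named "close". The function name accumulate_close is hypothetical.

```python
from pathlib import Path

import pandas as pd


def accumulate_close(data_dir="data", out_path="accumulated_data/my_close_data.csv"):
    """Combine the close column of every CSV in data_dir into one DataFrame.

    Hypothetical sketch: assumes files are named <SYMBOL>.csv, the first
    column holds the dates, and each file has a "close" column.
    """
    closes = {}
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        frame = pd.read_csv(csv_path, index_col=0, parse_dates=True)
        # Use the file name without extension as the column (symbol) name.
        closes[csv_path.stem] = frame["close"]

    # One row per date, one column per symbol.
    combined = pd.DataFrame(closes)

    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    combined.to_csv(out)
    return combined
```

Because download_data.py keeps only assets whose dates match AMZN's, the columns should share one date index, so the combined frame should not pick up NaN rows from misaligned dates.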