This project aims to predict the future performance of ETFs (Exchange Traded Funds) based on historical data, using the SARIMA (Seasonal Autoregressive Integrated Moving Average) model. The script processes an ETF portfolio, fetches historical data from Yahoo Finance, predicts future values for up to 15 years, and generates statistics for the portfolio's predicted performance.
This Python script automates the analysis of an ETF portfolio and provides predictions of future values. The project follows these key steps:
- Reading and normalizing the portfolio data.
- Fetching ETF historical data from Yahoo Finance.
- Enriching the portfolio with calculated metrics (e.g., Moving Average, Volatility, etc.).
- Predicting the future performance of the ETFs using the SARIMA model.
- Saving the predictions to a CSV file.
- Printing summarized statistics for the portfolio today and in the future.
Before running the script, make sure you have the following dependencies installed:
- yfinance: To fetch historical stock and ETF data.
- pandas: For data manipulation.
- numpy: For numerical computations.
- statsmodels: To use the SARIMA model for time series prediction.
You can install these libraries via pip:
pip install yfinance pandas numpy statsmodels
Function: read_and_normalize_portfolio(file_path)
The first step is to load the portfolio data from a CSV file. The script expects the portfolio to have a column named Weight, which will be normalized to ensure that the total weights sum to 1.
portfolio = read_and_normalize_portfolio(PORTFOLIO_PATH)
- Input: CSV file containing ETF symbols, weights, and other necessary information.
- Output: Normalized portfolio DataFrame.
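The normalization step can be sketched as follows. This is a minimal illustration, not the script's actual implementation: the helper name `normalize_weights`, the `Symbol` column, and the sample tickers are assumptions made for the example. Only the `Weight` column comes from the description above.

```python
import io

import pandas as pd


def normalize_weights(portfolio: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the portfolio whose Weight column sums to 1."""
    total = portfolio["Weight"].sum()
    if total <= 0:
        raise ValueError("Weights must sum to a positive number")
    normalized = portfolio.copy()
    normalized["Weight"] = normalized["Weight"] / total
    return normalized


# Hypothetical CSV contents; the Symbol column name and tickers are assumptions.
csv_text = "Symbol,Weight\nVTI,50\nVXUS,30\nBND,20\n"
portfolio = normalize_weights(pd.read_csv(io.StringIO(csv_text)))
print(portfolio)
```

Dividing by the column total means the input weights can be given as percentages, fractions, or any other proportional scale.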
Function: fetch_and_enrich_etf_data(portfolio)
This step fetches historical data for each ETF from Yahoo Finance using the yfinance API. For each ETF, the following metrics are calculated and added to the portfolio:
- Last Close Price: The latest closing price of the ETF.
- 50-Day Moving Average (MA50): Average of the last 50 days of prices.
- Volatility: Calculated using the standard deviation of the daily returns.
- Expense Ratio: The ETF's expense ratio, if available.
- Currency Conversion: Prices are converted to USD if necessary using an exchange rate API (if the ETF is priced in another currency).
portfolio, etf_data = fetch_and_enrich_etf_data(portfolio)
- Enriched portfolio DataFrame with additional columns for the calculated metrics.
- A dictionary of historical closing prices for each ETF.
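The price-based metrics listed above can be sketched on a synthetic series, so the example runs without network access. The helper name `summarize_prices` and the dictionary keys are assumptions. Whether the script annualizes volatility is not stated, so this sketch reports the plain standard deviation of daily returns.

```python
import numpy as np
import pandas as pd


def summarize_prices(close: pd.Series) -> dict:
    """Compute last close, 50-day moving average, and daily-return volatility."""
    daily_returns = close.pct_change().dropna()
    return {
        "Last Close": float(close.iloc[-1]),
        "MA50": float(close.tail(50).mean()),
        "Volatility": float(daily_returns.std()),  # not annualized (assumption)
    }


# Synthetic closing prices standing in for data fetched from Yahoo Finance.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2023-01-02", periods=250)
log_returns = rng.normal(0.0003, 0.01, size=250)
close = pd.Series(100 * np.exp(np.cumsum(log_returns)), index=dates)

metrics = summarize_prices(close)
print(metrics)
```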
Function: save_enriched_portfolio(portfolio)
After enriching the portfolio, the data is saved to a CSV file (enriched_portfolio.csv). This provides a reference for the next steps and allows for tracking the portfolio's data.
save_enriched_portfolio(portfolio)
Function: predict_portfolio(portfolio, etf_data, years)
The core of the project is to predict future ETF prices. This is done using the SARIMA model, which is well-suited for time series forecasting. The prediction is made for 5, 10, and 15 years ahead (as defined by YEARS_TO_PREDICT).
SARIMA parameters used:
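- order=(1,1,1): Non-seasonal components (AR, differencing, MA).
- seasonal_order=(1,1,1,12): Seasonal components (AR, differencing, MA, seasonal period).
The model forecasts daily prices over the requested number of years, then extracts the values at each horizon in YEARS_TO_PREDICT. Note that a seasonal period of 12 spans one year only for monthly data; applied to daily prices it covers 12 observations.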
portfolio_predictions = predict_portfolio(portfolio, etf_data, max(YEARS_TO_PREDICT))
- A list of predictions for each ETF in the portfolio.
Function: save_predictions(portfolio, predictions)
The predictions generated in the previous step are saved to a CSV file (predictions.csv). This file contains both the current ETF data and the predicted values for future years.
save_predictions(portfolio, portfolio_predictions)
Function: print_statistics(df_predictions)
Finally, the script prints statistics for the portfolio. The total value of the portfolio is calculated for today and for the future years (5, 10, and 15 years). It also prints a detailed breakdown of each ETF's performance.
df_predictions = pd.read_csv(OUTPUT_PATH, index_col=0)
print_statistics(df_predictions)
- Console output of the portfolio's total value today and the predicted future values.
- Breakdown of each ETF's quantity and total predicted value.
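The portfolio totals can be sketched as quantity times price, summed over the ETFs. The column names (`Quantity`, `Last Close`, `Predicted 5Y`, and so on) and the figures are hypothetical; the actual layout of predictions.csv is not specified here.

```python
import pandas as pd

# Hypothetical predictions table; column names and values are assumptions.
df_predictions = pd.DataFrame(
    {
        "Quantity": [10, 5],
        "Last Close": [200.0, 50.0],
        "Predicted 5Y": [250.0, 60.0],
        "Predicted 10Y": [300.0, 70.0],
        "Predicted 15Y": [350.0, 80.0],
    },
    index=["ETF_A", "ETF_B"],
)

price_columns = ["Last Close", "Predicted 5Y", "Predicted 10Y", "Predicted 15Y"]

# Total portfolio value per horizon: sum of quantity * price across ETFs.
totals = {
    col: float((df_predictions["Quantity"] * df_predictions[col]).sum())
    for col in price_columns
}

for label, value in totals.items():
    print(f"{label}: {value:,.2f}")
```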
The SARIMA model used for prediction is a powerful time series forecasting method that takes into account both seasonal and non-seasonal factors. The model is defined by two sets of parameters:
- Non-seasonal components (p, d, q):
  - p: Number of lag observations included (autoregressive part).
  - d: Degree of differencing (number of times the data is differenced).
  - q: Size of the moving average window.
- Seasonal components (P, D, Q, m):
  - P: Seasonal autoregressive terms.
  - D: Seasonal differencing.
  - Q: Seasonal moving average terms.
  - m: Number of time steps for a single seasonal period (12 for monthly data).
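For reference, the two parameter sets combine in the standard textbook form of the model, written here with the backshift operator $B$ (where $B y_t = y_{t-1}$). The symbols $\phi$, $\Phi$, $\theta$, $\Theta$, and $\varepsilon_t$ are notation introduced for this illustration; the sign convention for the moving average terms follows statsmodels.

```latex
% General SARIMA(p,d,q)(P,D,Q)_m model
\phi_p(B)\,\Phi_P(B^{m})\,(1-B)^{d}\,(1-B^{m})^{D}\,y_t
  = \theta_q(B)\,\Theta_Q(B^{m})\,\varepsilon_t

% With order=(1,1,1) and seasonal_order=(1,1,1,12)
(1-\phi_1 B)\,(1-\Phi_1 B^{12})\,(1-B)\,(1-B^{12})\,y_t
  = (1+\theta_1 B)\,(1+\Theta_1 B^{12})\,\varepsilon_t
```

Here $y_t$ is the price at time $t$ and $\varepsilon_t$ is white noise. The factors $(1-B)$ and $(1-B^{12})$ are the non-seasonal and seasonal differencing steps.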
/Users/caioteixeira/PycharmProjects/etfperformance/portfolio_data/
│
├── path_to_your_yahoo_finance_portfolio.csv # Input: Initial ETF portfolio
├── enriched_portfolio.csv # Output: Portfolio with enriched ETF data
├── predictions.csv # Output: Portfolio with future predictions
└── main.py # Python script (this project)
This project provides a framework for predicting the future performance of an ETF portfolio using historical data and time series forecasting. By leveraging the SARIMA model and Yahoo Finance data, you can gain insights into the potential future value of your portfolio.