This project demonstrates how to use Linear Regression to predict stock market Close Prices from Open Prices using real-time stock data. The model is built with the scikit-learn library, and its performance is evaluated with common metrics: Mean Squared Error (MSE) and R-squared. A visualization of the regression line along with the actual data points is also provided.
The dataset is fetched in real time from a stock data API (such as Yahoo Finance, Alpha Vantage, or IEX Cloud) and contains the following columns:
- Datetime: Timestamp of the stock price
- Open: Opening price of the stock
- High: Highest price during the minute
- Low: Lowest price during the minute
- Close: Closing price of the stock at the end of the minute
- Adj Close: Adjusted closing price
- Volume: Volume of trades during the minute
We focus on predicting the Close price using the Open price.
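Written out, the model is a straight line from the Open Price to the Close Price. As a sketch, assuming ordinary least squares (the method scikit-learn's LinearRegression uses), with x_i and y_i the Open and Close Price of minute i and x̄, ȳ their means over n minutes, the slope and intercept have a closed form:

```math
\hat{y}_i = \beta_1 x_i + \beta_0, \qquad
\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
\beta_0 = \bar{y} - \beta_1 \bar{x}
```

Here β₁ is the model coefficient (slope) and β₀ the intercept.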
- Python (v3.8 or later)
- Pandas (for data manipulation)
- Matplotlib (for data visualization)
- Scikit-learn (for building the Linear Regression model)
- yfinance (for fetching real-time stock data from Yahoo Finance)
git clone https://github.com/SupremeEvilGod/ML3-Stock-Price-Prediction-Using-Linear-Regression.git
cd ML3-Stock-Price-Prediction-Using-Linear-Regression
Make sure you have Python installed, then install the required libraries using pip:
pip install -r requirements.txt
The requirements.txt file includes:
numpy
pandas
scikit-learn
matplotlib
yfinance
You can modify the script to fetch data for your preferred stock ticker using the yfinance library.
import yfinance as yf
# Fetch data for a specific stock (e.g., MSFT)
data = yf.download(tickers="MSFT", period="1d", interval="1m")
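Before fitting, the downloaded frame has to be reduced to a feature matrix (Open) and a target (Close). The sketch below uses a hypothetical helper, `prepare_features`, that is not part of the repository. It assumes that some yfinance versions return (field, ticker) MultiIndex columns even for a single ticker, so it flattens them defensively; a small synthetic frame stands in for the real download so the example runs offline.

```python
from typing import Tuple

import numpy as np
import pandas as pd


def prepare_features(data: pd.DataFrame) -> Tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper: return (X, y) with Open as feature, Close as target."""
    if isinstance(data.columns, pd.MultiIndex):
        # Keep only the price-field level, e.g. ("Close", "MSFT") -> "Close"
        data = data.copy()
        data.columns = data.columns.get_level_values(0)
    # Drop minutes where either price is missing
    clean = data[["Open", "Close"]].dropna()
    X = clean[["Open"]]  # 2-D (n_samples, 1), as scikit-learn expects
    y = clean["Close"]   # 1-D target
    return X, y


# Synthetic stand-in for yf.download(...) output (assumption, not real data)
idx = pd.date_range("2024-01-02 09:30", periods=4, freq="min")
cols = pd.MultiIndex.from_product([["Open", "Close"], ["MSFT"]])
demo = pd.DataFrame(
    [[100.0, 100.2], [100.2, np.nan], [100.1, 100.4], [100.4, 100.3]],
    index=idx,
    columns=cols,
)

X, y = prepare_features(demo)
print(X.shape, y.shape)  # (3, 1) (3,)
```

The row with a missing Close is dropped, leaving three usable minutes.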
- Run the Model: You can execute the Python script or Jupyter notebook to fetch the stock data, build the Linear Regression model, and evaluate its performance.
- Model Outputs:
  - Model Coefficient (Slope): How much the Close Price changes for each one-unit change in the Open Price.
  - Model Intercept: Where the regression line intercepts the y-axis, i.e. the predicted Close Price when the Open Price is zero.
  - Mean Squared Error (MSE): The average squared difference between the actual and predicted Close Prices (lower is better).
  - R-squared Value: The proportion of the variance in the Close Price that the model explains (closer to 1 is better).
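The steps above can be sketched end to end with scikit-learn. This is a minimal sketch, not the repository's script: `LinearRegression`, `train_test_split`, `mean_squared_error`, and `r2_score` are real scikit-learn APIs, but the synthetic random-walk prices (390 one-minute bars, roughly one US trading session) and the 80/20 chronological split are assumptions made so the example runs offline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic minute bars (assumption): Open follows a random walk and
# Close drifts slightly away from Open within each minute.
rng = np.random.default_rng(0)
open_prices = 200 + np.cumsum(rng.normal(0, 0.1, size=390))
close_prices = open_prices + rng.normal(0, 0.05, size=390)

X = open_prices.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
y = close_prices

# shuffle=False keeps the test minutes after the training minutes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Coefficient (slope):", model.coef_[0])
print("Intercept:", model.intercept_)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# The same metrics written out from their definitions
residual_ss = np.sum((y_test - y_pred) ** 2)
total_ss = np.sum((y_test - y_test.mean()) ** 2)
mse_manual = residual_ss / len(y_test)
r2_manual = 1 - residual_ss / total_ss

print("MSE:", mse, "R-squared:", r2)
```

Splitting chronologically avoids training on minutes that come after the ones being predicted; whether the repository's script shuffles its split is not stated.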
- Model Coefficients:
  - Coefficient (Slope): 0.971
  - Intercept: 6.313
- Performance Metrics:
  - Mean Squared Error (MSE): 0.0148
  - R-squared: 0.977
These metrics show that the model performs very well, with 97.7% of the variance in the Close Price explained by the Open Price.
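The reported slope, intercept, and MSE can be turned into concrete numbers. The arithmetic below plugs an illustrative Open Price of 200.0 (an arbitrary value, not from the dataset) into the fitted line and takes the square root of the MSE to get the root-mean-squared error, which is in the same units as the price.

```python
import math

# Values reported in the results
slope = 0.971
intercept = 6.313
mse = 0.0148


def predict_close(open_price: float) -> float:
    """Predicted Close Price from the fitted line: slope * Open + intercept."""
    return slope * open_price + intercept


rmse = math.sqrt(mse)  # back to price units

print(round(predict_close(200.0), 3))  # 200.513
print(round(rmse, 4))                  # 0.1217
```

An RMSE of about 0.12 means a typical prediction misses the actual Close Price by roughly 12 cents for a USD-quoted ticker.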
The scatter plot below shows the actual data points (in blue) and the predicted regression line (in red).
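A plot like this can be produced with Matplotlib. The sketch below is an assumption about how such a figure could be drawn, not the repository's plotting code: it uses synthetic prices, sorts by Open so the red line is drawn left to right, and writes to a hypothetical file name, `regression_plot.png`.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; remove to open a window instead
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic prices (assumption) in place of the downloaded data
rng = np.random.default_rng(1)
open_prices = 200 + np.cumsum(rng.normal(0, 0.1, size=120))
close_prices = open_prices + rng.normal(0, 0.05, size=120)

X = open_prices.reshape(-1, 1)
model = LinearRegression().fit(X, close_prices)

# Sort by Open so the regression line is drawn as one clean segment
order = np.argsort(open_prices)

fig, ax = plt.subplots()
ax.scatter(open_prices, close_prices, color="blue", s=10, label="Actual")
ax.plot(open_prices[order], model.predict(X[order]),
        color="red", label="Regression line")
ax.set_xlabel("Open Price")
ax.set_ylabel("Close Price")
ax.legend()
fig.savefig("regression_plot.png")  # hypothetical output file name
```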