This project analyzes the relationship between advertising spending across different media channels (TV, Radio, Newspaper) and sales performance. It implements multiple linear regression models to predict sales based on advertising expenditure.
├── data_analysis.py # Data loading and preprocessing
├── model_training.py # Model training and evaluation
├── visualisation.py # Visualization functions
├── main.py # Main execution script
├── requirements.txt # Project dependencies
└── SalesPrediction.ipynb # Jupyter notebook with analysis
- Data preprocessing and cleaning
- Multiple linear regression models:
- Basic model (all features)
- Normalized model
- Subset model (Radio + Newspaper)
- Model performance comparison
- Visualization of results
- Feature importance analysis
- TV advertising shows highest correlation with sales
- Full model achieves R² score of 0.902
- Feature importance ranking:
- Radio (0.107)
- TV (0.054)
- Newspaper (0.0003)
- Install dependencies:
pip install -r requirements.txt- Run the analysis:
python main.py- Python 3.7+
- pandas
- numpy
- scikit-learn
- matplotlib
- seaborn
- openpyxl
- jupyterlab