This project is an end-to-end data science pipeline that includes data processing, model training, and evaluation. The
project is structured to facilitate easy understanding and modification.
- config/: Contains configuration files for the project.
  - config.yaml: Main configuration file.
- src/DS/: Main source directory for the project.
  - components/: Contains the core components of the data processing and model pipeline.
  - config/: Configuration-related modules.
  - constants/: Defines constant values used across the project.
  - entity/: Contains entity definitions and configurations.
  - pipeline/: Manages the execution of the data science pipeline.
  - utils/: Utility functions for common tasks such as reading and writing files.
- research/: Contains Jupyter notebooks for exploratory data analysis and research.
- templates/: HTML templates for any web-based components.
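The contents of config.yaml are not shown above; a minimal sketch of what such a file often looks like for a pipeline of this shape (every key and value below is a hypothetical example, not the project's actual configuration):

```yaml
# Hypothetical config.yaml sketch -- actual keys depend on the project.
artifacts_root: artifacts

data_ingestion:
  source_url: https://example.com/data.zip   # placeholder URL
  local_data_file: artifacts/data.zip
  unzip_dir: artifacts/data

model_trainer:
  target_column: label        # assumed column name
  model_path: artifacts/model.joblib
```

Entries like these would typically be loaded at startup and passed to the pipeline components as configuration objects.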
- Data Processing: Includes utilities for reading and writing YAML and JSON files, and managing directories.
- Model Training: Configurable pipeline for training machine learning models.
- Logging: Integrated logging for tracking the execution of the pipeline.
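The utilities in src/DS/utils are not reproduced in this README. Below is a minimal sketch of what file-handling helpers of this kind usually look like, using only the standard library; the function names (create_directories, save_json, load_json) are illustrative assumptions, and the YAML helpers would work the same way with PyYAML's yaml.safe_load:

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

def create_directories(paths: list) -> None:
    """Create each directory (and any missing parents), skipping ones that exist."""
    for path in paths:
        Path(path).mkdir(parents=True, exist_ok=True)
        logger.info("Created directory: %s", path)

def save_json(path: Path, data: dict) -> None:
    """Write a dict to disk as pretty-printed JSON."""
    Path(path).write_text(json.dumps(data, indent=4))
    logger.info("JSON file saved at: %s", path)

def load_json(path: Path) -> dict:
    """Read a JSON file back into a dict."""
    data = json.loads(Path(path).read_text())
    logger.info("JSON file loaded from: %s", path)
    return data
```

Keeping these helpers in one module lets every component log its file I/O consistently instead of re-implementing it per stage.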
1. Install Dependencies: Use requirements.txt to install the necessary Python packages.

   ```shell
   pip install -r requirements.txt
   ```
2. Run the Pipeline: Execute the main script to start the data science pipeline.

   ```shell
   python main.py
   ```
3. Explore Notebooks: Use the Jupyter notebooks in the research/ directory for data exploration and model evaluation.
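The steps above run main.py, whose internals are not shown in this README. A minimal sketch of how such an entry point typically chains pipeline stages with integrated logging (the stage names and functions below are hypothetical stand-ins for the real stages in src/DS/pipeline):

```python
import logging

logging.basicConfig(level=logging.INFO,
                    format="[%(asctime)s] %(levelname)s: %(message)s")
logger = logging.getLogger(__name__)

# Hypothetical stage functions -- placeholders for the project's real stages.
def data_ingestion():
    logger.info("ingesting data")

def model_training():
    logger.info("training model")

def model_evaluation():
    logger.info("evaluating model")

STAGES = [
    ("Data Ingestion", data_ingestion),
    ("Model Training", model_training),
    ("Model Evaluation", model_evaluation),
]

def run_pipeline():
    """Run each stage in order, logging progress and stopping on the first failure."""
    completed = []
    for name, stage in STAGES:
        logger.info(">>> stage %s started", name)
        try:
            stage()
        except Exception:
            logger.exception("stage %s failed", name)
            raise
        logger.info(">>> stage %s completed", name)
        completed.append(name)
    return completed

if __name__ == "__main__":
    run_pipeline()
```

Running each stage inside a try/except and re-raising keeps the logs as the single record of where a run stopped, which is the usual payoff of wiring logging through the pipeline.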