Project/
├── main.py                  # Main execution script
├── pyproject.toml           # Project dependencies
├── uv.lock                  # Dependency lock file
├── README.md                # Project documentation
├── .gitignore               # Git ignore rules
│
├── src/                     # Source code modules
│   ├── __init__.py
│   ├── DataPreprocessor.py  # Data loading and preprocessing
│   ├── ModelTrainer.py      # Machine learning model training
│   └── Cluster.py           # Clustering analysis
│
├── notebook/                # Jupyter notebooks
│   └── COMP20008_A2.ipynb   # Same notebook as the Colab version
│
└── images/                  # Generated visualizations
    ├── waste_time_distribution.png
    ├── model_performance_comparison.png
    ├── confusion_matrices.png
    ├── clustering_features_correlation.png
    ├── cluster_visualization.png
    └── cluster_commuter_profiles.png
- Install dependencies:
    pip install -e .
Or
    uv sync
Or install packages directly:
    pip install matplotlib numpy pandas scikit-learn seaborn xgboost

We highly recommend using Google Colab to review our code and results, as it provides a convenient way to display and interact with the outputs.
Access the notebook here: COMP20008 Assignment 2 Colab
Or
Run the main analysis:
    python main.py > output.log 2>&1
This will:
- Load and preprocess Victoria transport data
- Perform correlation analysis
- Train and evaluate classification models
- Generate clustering analysis
- Save all visualizations to ./images/
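The `> output.log 2>&1` redirection in the command above writes both standard output and standard error to `output.log`. On a shell that does not support that syntax, the same effect can be approximated from Python. This is only a minimal sketch: `run_and_log` is a hypothetical helper and is not part of this project.

```python
import subprocess
import sys


def run_and_log(command=None, log_path="output.log"):
    """Run a command and write its stdout and stderr to one log file.

    Hypothetical helper for illustration; it mirrors `> output.log 2>&1`.
    Returns the exit code of the command.
    """
    if command is None:
        # Assumption: main.py sits in the current working directory.
        command = [sys.executable, "main.py"]
    with open(log_path, "w", encoding="utf-8") as log:
        # stderr=subprocess.STDOUT merges error output into the same stream.
        completed = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return completed.returncode
```

Calling `run_and_log()` from the project root would then behave much like the shell command, with the exit code available to check whether the run succeeded.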
Note: There is one difference between the notebook and the script: the notebook does not save its results to the folder, whereas the script does.
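One way such a difference can arise is a single flag that decides whether a figure is written to disk or left for inline display. The sketch below is an assumption about how this could be done, not a description of the project's actual code; `figure_path` and `finish_figure` are hypothetical names, and `fig` is assumed to be a matplotlib `Figure` or any object with a `savefig(path)` method.

```python
from pathlib import Path


def figure_path(name, out_dir="images"):
    """Return the target path for a figure, creating the folder if needed."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / name


def finish_figure(fig, name, save=True, out_dir="images"):
    """Save `fig` (script mode) or leave it for inline display (notebook mode).

    Hypothetical helper: returns the saved path, or None when nothing is saved.
    """
    if not save:
        # Notebook mode: nothing is written; the notebook renders the figure.
        return None
    path = figure_path(name, out_dir)
    fig.savefig(path)
    return path
```

Under this sketch, a script would call `finish_figure(fig, "cluster_visualization.png")`, while a notebook would pass `save=False` and rely on inline rendering.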