Welcome to the CMT 429 Group Assignment repository! This project aims to explore fundamental concepts in Data Science and apply them to real-world problems.
- Project Overview
- Objectives
- Technologies Used
- Installation Instructions
- Data Analysis
- Visualizations
- Contributors
- License
## Project Overview

This project analyzes datasets to extract meaningful insights and build predictive models. It covers the main stages of the data science workflow: data cleaning, visualization, and machine learning.
## Objectives

- Understand the data science workflow.
- Gain hands-on experience with data analysis tools.
- Develop skills in statistical analysis and machine learning.
## Technologies Used

- Programming language: R
- Libraries:
  - `tidyverse` for data manipulation and visualization
  - `ggplot2` for creating visualizations
  - `randomForest` for machine learning models
  - `corrplot` for correlation analysis
## Installation Instructions

To set up the project locally, follow these steps:

- Clone the repository:

  ```bash
  git clone https://github.com/ismailanyi/Assignment.git
  ```

- Navigate to the project directory:

  ```bash
  cd Assignment
  ```

- Install the required packages from an R console:

  ```r
  install.packages(c("tidyverse", "randomForest", "corrplot", "dplyr"))
  ```
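`install.packages()` downloads every listed package even when it is already present. As an optional alternative, the base R sketch below (not one of the repository's scripts) checks which of the packages named in this README can already be loaded, so that only the missing ones need installing. `dplyr` and `ggplot2` are left out of the check because they are installed as part of `tidyverse`.

```r
# Packages named in this README (dplyr and ggplot2 are installed with tidyverse).
required <- c("tidyverse", "randomForest", "corrplot")

# TRUE for each package whose namespace can be loaded in this R session.
available <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Keep only the packages that could not be loaded.
to_install <- required[!available]

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

If `to_install` is non-empty, `install.packages(to_install)` installs just those packages.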
## Data Analysis

The project's analysis scripts cover the following steps:
- Data Loading: Loading datasets and performing initial checks.
- Data Cleaning: Handling missing values and duplicates.
- Correlation Analysis: Analyzing relationships between variables.
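The three steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the repository's actual script: the data frame `sales` and its columns (`Region`, `Price_USD`, `Mileage_KM`, `Sales_Volume`) are hypothetical and hold made-up toy values, and missing values are handled here by dropping incomplete rows, which is only one possible choice.

```r
# Hypothetical toy data standing in for the project's dataset.
# Row 3 duplicates row 2, and row 4 has a missing price.
sales <- data.frame(
  Region       = c("Asia", "Europe", "Europe", "Asia", "North America", "Europe"),
  Price_USD    = c(45000, 52000, 52000, NA, 61000, 38000),
  Mileage_KM   = c(30000, 12000, 12000, 80000, 5000, 95000),
  Sales_Volume = c(120, 340, 340, 90, 410, 150)
)

# 1. Data loading checks: structure, column types, missing values per column.
str(sales)
print(colSums(is.na(sales)))

# 2. Data cleaning: drop exact duplicate rows, then rows with any missing value.
clean <- sales[!duplicated(sales), ]
clean <- clean[complete.cases(clean), ]

# 3. Correlation analysis on the numeric columns only.
numeric_cols <- vapply(clean, is.numeric, logical(1))
cor_matrix <- cor(clean[, numeric_cols])
print(round(cor_matrix, 2))
```

Dropping incomplete rows is the simplest option; imputation (for example, filling a missing price with the regional median) keeps more data at the cost of extra assumptions. A matrix like `cor_matrix` is the input that `corrplot::corrplot()` draws as a correlation heatmap.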
## Visualizations

Visualizations are created using ggplot2 to illustrate key findings, including:
- Average sales volume by region.
- Trends in used BMW prices across regions.
- Correlation heatmaps.
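The average-sales-volume-by-region chart rests on a simple group-and-average step. A minimal sketch, assuming a hypothetical data frame `sales` with `Region` and `Sales_Volume` columns holding toy values: base R's `aggregate()` computes the per-region means, and a bar chart is built with ggplot2 only if that package is installed. This is not taken from the project's scripts.

```r
# Hypothetical toy data: one row per sales record.
sales <- data.frame(
  Region       = c("Asia", "Europe", "Europe", "North America", "Asia"),
  Sales_Volume = c(120, 340, 150, 410, 90)
)

# Mean sales volume per region (formula interface: response ~ grouping).
avg_by_region <- aggregate(Sales_Volume ~ Region, data = sales, FUN = mean)

# Sort regions from highest to lowest average.
avg_by_region <- avg_by_region[order(-avg_by_region$Sales_Volume), ]
print(avg_by_region)

# Draw the bar chart only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(
    avg_by_region,
    ggplot2::aes(x = reorder(Region, -Sales_Volume), y = Sales_Volume)
  ) +
    ggplot2::geom_col() +
    ggplot2::labs(x = "Region", y = "Average sales volume")
  print(p)
}
```

`reorder()` orders the bars by descending average, so the chart reads in the same order as the printed table.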
## Contributors

- Mailanyi Ismail - 1049453
- Renish Amondi - 1049526
- Victor Ochieng - 1049246
- Maxwell Muguna - 1049417
- Reagan Machuki - 1049420
- Enock Ikhavi - 1049071