GitHub - ilanatr/Project_Portfolio: Coding Tasks

About this Repo

This is a portfolio of my work applying fundamental data analytic and data science techniques. It includes different code tasks, which are outlined in more detail below.

1. Organising and Filtering Datasets

Organising and Filtering Datasets

Objective: Develop a Python script that filters and organizes a dataset based on specific criteria.

Requirements:

Pandas library
Use dictionary, list or set
Implement functions that filter data

Method: The file Data_Filtering_Organising includes three key user-defined functions:

Reads the dataset and outputs the values associated with a specific key in the dataset
Outputs a frequency table for a specified column name
Creates a filtered data set just including name and age based on a specific age range

2. Rock Paper Scissors Game

Rock, Paper, Scissors Game

Objective:

To create a programme that plays the rock, paper, scissors game which seeks input from the user and a random choice from the computer to determine a winner.

Method:

Import Random library
Utilises exception handling to highlight input (value) errors.
Define three functions: user's choice, computer's choice, determine a winner
Create final function: play the game based on previous three functions

3. Data visualisation of car data

Data Visualisation

Objective:

To create and interpret simple data visualisations for car manufacturer data

Requirements:

Pandas library
Matplotlib.pyplot libray
NumPy library
Seaborn library
Include boxplots, histograms, lineplots and barplots.

Results:

The manufacturer with the highest revs per mile is the Geo, which has an average revs per mile of 3755
The distribution of the histogram indicates that fuel consumption is generally higher on the highway as the bar charts are skewed towards the right while city fuel consumption is skewed more towards the left and lower end of the x axis indicating more cars have lower MPG.
There is a linear relationship between a car's turning circle and its wheelbase. As the wheelbase gets larger the turning circle also gets larger, although there are a few outliers. Where the wheelbase is 115mm, the turning circle angle is at 38, this is lower than for a car with a wheelbase of around 112, which has a higher turning circle of 43 degrees
Vans have a smaller average horsepower than a sporty or midsized vehicle but a larger car does indeed have a larger average horsepower than both small and compact cars.

4. Analyse Customer Data with SQL

SQL Customer Data Analysis

Objective:

To analyse the SQL Lite Sakila database which includes data on DVD rental stores globally.

Output: The SQL file implements queries to create and read the database in order to calculate information such as total payments and average payment information.

5. Data Statistics and Data Visualisation of country CO2 emissions

Objective: carry out exploratory data analysis of CO2 emissions for countries

Data Source: Kaggle CO2 emissions: 1960-2019

Notebook: Country CO2 Emissions EDA

Requirements:

Use pandas, numpy, matplotlib and seaborn libraries
Compute basic stats (mean, median, mode, sd, quartiles)
Create visualisations (hists, boxplots, scatterplots)
Apply basic inferential stats (e.g. confidence intervals)

This tool computes descriptive statistics for a dataset on CO2 emissions by countries and visualises these statistics through various charts and graphs. The tool helps in understanding the distribution, central tendency, and variability of the data using timeseries analysis.

Insights:

The average CO2 emissions for 2019 across all countries in this dataset is 190969.22 kilotons.
The country with the highest CO2 emissions in 2019 was China. China had 10707219.73 kilotons of CO2 emissions.
The standard deviation for the CO2 emissions in the 2019 country dataset is 892755.94. This indicates a large variation in data across countries.
China's CO2 emissions have increased dramatically since the year 2000. In the year 2000, China's CO2 emissions were 3,346,530 kilotons, which rose to 10,707,220 in 2019.

6. Linear Regression analysis on Algerian Forest Fires

Algerian Forest Fires

Objective:

To examine the relationship between specific variables and forest fires in Algeria

Requirements:

Use pandas, numpy, matplotlib, sklearn and seaborn libraries

Method:

This analysis involved cleaning a dataset of forest fires in Algeria to handle missing values and inconsistencies.
Exploratory Data Analysis (EDA) was conducted to examine the relationships between meteorological variables (temperature, wind, and relative humidity) and the area burned by fires.
Visualisations, including scatter plots and correlation analyses, are used to illustrate these relationships.
Predictive modeling with Linear Regression and Random Forest Regressor is applied to predict the burned area, revealing the complexity of fire prediction and the need for more sophisticated modeling.

Results: Reveals weak correlations between meteorological factors (temperature, wind, and relative humidity) and the area burned by forest fires. Despite applying Linear Regression models, the predictive accuracy was low, indicating that predicting forest fire areas is complex and may require more advanced models or additional data for better accuracy.

7. Socioeconomic Country Clustering with K-Means

K Means Clustering of Countries by Socio-Economic Features Objective:

To group countries using socio-economic and health factors to determine the development status of the country.

Requirements:

Use pandas, numpy, matplotlib, sklearn and seaborn libraries

Method: The Socioeconomic_Country_Clustering.ipynb notebook applies K-means clustering to group countries based on various socioeconomic indicators. The analysis involves:

Data Preprocessing: Cleaning and preparing the dataset for analysis.
Feature Selection: Selecting relevant socioeconomic indicators for clustering.
K-means Clustering: Applying the K-means algorithm to identify distinct clusters of countries.
Cluster Analysis: Interpreting the clusters and visualizing the results using various plots.

Name		Name	Last commit message	Last commit date
Latest commit History 55 Commits
1. Data_Filtering_Organising.ipynb		1. Data_Filtering_Organising.ipynb
2. Rock, Paper, Scissors Game		2. Rock, Paper, Scissors Game
3. Data Vis_Manufacturer Data.ipynb		3. Data Vis_Manufacturer Data.ipynb
4. Analyse Customer DVD Rentals in SQL		4. Analyse Customer DVD Rentals in SQL
5. Country CO2 Emissions.ipynb		5. Country CO2 Emissions.ipynb
6. Algerian Forest Fires.ipynb		6. Algerian Forest Fires.ipynb
7. Socioeconomic_Country_Clustering.ipynb		7. Socioeconomic_Country_Clustering.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About this Repo

1. Organising and Filtering Datasets

2. Rock Paper Scissors Game

3. Data visualisation of car data

4. Analyse Customer Data with SQL

5. Data Statistics and Data Visualisation of country CO2 emissions

6. Linear Regression analysis on Algerian Forest Fires

7. Socioeconomic Country Clustering with K-Means

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

About this Repo

1. Organising and Filtering Datasets

2. Rock Paper Scissors Game

3. Data visualisation of car data

4. Analyse Customer Data with SQL

5. Data Statistics and Data Visualisation of country CO2 emissions

6. Linear Regression analysis on Algerian Forest Fires

7. Socioeconomic Country Clustering with K-Means

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages