- Initial DataSet from Kaggle is linked in this readme file
- Set up GitHub Repository and connection with VSCode to Remote Push Files and Updates to the Repository for this project and maintain source control.
- SQL Queries and Cleaning Steps are documented and available in the Repo (Queries.py)
- Clean Excel dataset and SQL Temporary Tables are Avaialble in the uploaded Clean_Six_Nations. excel file available in the Repo. This will be used for Tableau and Power BI visualisations.
- Visualisations currently available under the Rugby_visualisations.pbix file, and Tableau_Rugby_Visualisations file in this GitHub repo.
- Construction of a Short Presentation / Story hi-lighting some of the insights gained.
- Microsoft Excel
- Power Query
- Tableau
- Power BI
- SQL
- VSCode
The Intention behind this project is to Analyse some Rugby Union Data. Specifically to clean, query / analyse and visualise data behind Six Nations Campaigns from the Year 2000-2023. Then to draw some conclusions if applicable and construct a data driven story in PowerPoint including findings.
The Data was obtained from a Kaggle Dataset - https://www.kaggle.com/datasets/lylebegbie/international-rugby-union-results-from-18712022
I Downloaded the Dataset from Kaggle and Opened the file in Microsoft Excel.
The first step was to profile the data, and get a bearing of the data and its strucutre.
After Profiling the data and recognising some errors in standardisation and duplicates. I set about cleaning the data in Excel.
Firstly this involved removing the duplicates from the data with the inbuilt funcitons.
- TRIM & PROPER to standardise Text and Spacing
- Find / Replace to remove special characters
- Standardising Date format to YYYY-MM-DD.
See Queries.py for more detail.
This involved standardising the Stadium Names, Competition Names, Creating a New Table and inserting data from the Rugby Union data set WHERE 'Six Nations' was the competition.
This Involved:
-
- Count Home Wins vs. Count Away Wins
-
- Biggest Points Difference - Home and Away Win
-
- Total Score (Trends over Time?)
-
- Home & Away Wins Grouped by Nation
-
- Team Win Rate / %, Per Year
-
- Team Win Rate per Stadium (If Multi stadiums)
See Queries.py for more detail.
- Intended to be Visualised in Power BI and Tableau, and will be linked in this Repo.
Power BI Interactive Report: What I Visualised:
Overview of the Six Nations Total Wins for Each Nation, with Drill Down for Home vs Away Record. Home vs. Away Wins for Each Nation Win Percentage by Year for Each nation Total Home Scores Total Away Scores Average Points Difference by Year (Filtered with Slider) Points Difference for Winner (Filtered with a Slider) Nation Performance by Location / Stadium (Drill down for City)
Techniques used:
- DAX: SUM, COUNT, SUMX, AVERAGE
- Data Moddeling, Bridging new connections, Cardinality and Cross Filter Direction between 7 Different Tables.
- Created New Columns, Dropped Columns
- Changed Data-Types
- Created New Measures and Calculated Columns
Tableau Report: What I Visualised:
- Stadium Points Difference Average
- Stadium Volume (Count of Games Played at Each Stadium)
- Home & Away Points Difference contextualised by the Home Team and Away Team.