KavishkaVenuka/PyData_Assessment

 
 


PyData Project Repository

This project focuses on analyzing ICC Cricket World Cup statistics through data preparation, transformation, analysis, machine learning model deployment, and creating an interactive dashboard.

📁 Case Study

The project is based on the case study: ICC Cricket World Cup Stats

Objective

Description: The ICC Cricket World Cup is one of the most highly regarded events in the cricketing world, where teams from around the world compete for glory. This case study explores past data from ICC Cricket World Cups to gain insight, find trends, and predict outcomes. By preparing, transforming, analyzing, and visualizing cricket statistics, the project tells data-driven stories and explores how machine learning models can be used to gain deeper insights.

Problem Statement: Cricket generates an enormous amount of data: player performances, match results, team strategies, and historical statistics. However, much of this data exists in raw and unstructured forms, making it difficult to extract actionable insights.

The major problems we try to solve are the following:

  • Data Preparation Challenges: Cleaning, merging, and transforming raw datasets into a state ready for analysis.
  • Identifying Key Insights: Analyzing player and team performance metrics to uncover trends and outliers.
  • Model Deployment: Identifying a reliable Hugging Face model that can make predictions about, or classify aspects of, the cricket statistics.
  • Visualization and Storytelling: Developing an interactive dashboard that communicates findings effectively and lets the user explore them.

Objective: The goal is to develop a comprehensive yet user-friendly platform that integrates statistical analysis, machine learning, and visualization to give insight into performances at the ICC Cricket World Cup. This platform will:

  • Allow for the efficient processing and analysis of data.
  • Predict trends or outcomes using AI/ML models.
  • Visualize data interactively, enhancing understanding and engagement for cricket enthusiasts and analysts.


🛠️ Project Structure

|-- data/                  # Folder containing datasets
|-- notebooks/
|   |-- Task_2.ipynb       # Data preparation and analysis notebook
|   |-- Task_3.ipynb       # NLP model notebook
|-- dashboard/
|   |-- app.py             # Plotly Dash dashboard script
|-- README.md              # Project documentation
|-- requirements.txt       # Python dependencies

🧑‍💻 Tasks

Task 1: Git Basics

  • Repository maintained with proper branch management.
  • Meaningful commits and conflict-free main branch.
  • Includes:
    • At least 2 commits per member.
    • One completed pull request per branch.

Task 2: Data Preparation and Analysis

  • Notebook: Task_2.ipynb
  • Key Steps:
    • Data cleaning (removal of duplicates and handling missing values).
    • Data transformation (pivoting and grouping).
    • Insights and explanations documented using Markdown.
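
The key steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the contents of Task_2.ipynb: the sample rows and the column names (`player`, `team`, `year`, `runs`) are hypothetical, and filling missing values with the median is just one possible strategy.

```python
import pandas as pd

# Hypothetical sample data; the real columns in data/ may differ.
raw = pd.DataFrame({
    "player": ["A", "A", "B", "C", "C"],
    "team": ["X", "X", "X", "Y", "Y"],
    "year": [2015, 2015, 2019, 2015, 2019],
    "runs": [50, 50, None, 30, 70],
})

# Data cleaning: drop exact duplicate rows, then fill missing runs
# with the median of the remaining values (an assumed strategy).
clean = raw.drop_duplicates().copy()
clean["runs"] = clean["runs"].fillna(clean["runs"].median())

# Grouping: total runs per team.
runs_per_team = clean.groupby("team")["runs"].sum()

# Pivoting: one row per team, one column per tournament year.
runs_by_year = clean.pivot_table(
    index="team", columns="year", values="runs", aggfunc="sum", fill_value=0
)
```

Whether to fill, drop, or flag missing values depends on the dataset, so the choice is worth documenting in the notebook's Markdown cells.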

Task 3: NLP with Hugging Face

  • Notebook: Task_3.ipynb
  • Objective:
    • Deploy a suitable Hugging Face model for [insert NLP task: e.g., sentiment analysis].
    • Validate the model's reliability and relevance to the dataset.
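
One way to approach the deployment and validation steps is sketched below, assuming sentiment analysis as the task (the README only gives it as an example) and the `transformers` `pipeline` helper with its default checkpoint. The function names `classify`, `accuracy`, and `validate` are hypothetical and are not taken from Task_3.ipynb.

```python
from typing import Callable, List, Sequence


def classify(texts: List[str]) -> List[str]:
    """Return one predicted label per text using a Hugging Face pipeline."""
    # Imported lazily so the helpers below work without transformers installed.
    from transformers import pipeline

    # Assumption: no model is named, so the library picks its default
    # checkpoint for the task and downloads it on first use.
    classifier = pipeline("sentiment-analysis")
    return [result["label"] for result in classifier(texts)]


def accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of predictions that match the hand-labelled answers."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


def validate(
    samples: List[str],
    labels: Sequence[str],
    predict: Callable[[List[str]], List[str]] = classify,
) -> float:
    """Score a predictor against a small hand-labelled sample."""
    return accuracy(predict(samples), labels)
```

Calling `validate(samples, labels)` on a few hand-labelled sentences from the dataset gives a rough reliability check. The label strings must match whatever the chosen model emits, which varies between checkpoints.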

Task 4: Visualization Dashboard

  • Script: dashboard/app.py
  • Features:
    • At least 5 chart types.
    • Interactive filters for dynamic exploration of data.
    • Clear storytelling through visualizations.
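
As an illustration of an interactive filter driving a chart, a minimal Plotly Dash sketch might look like this. It is not the actual dashboard/app.py: the column names (`team`, `year`, `runs`) and the helpers `runs_by_team` and `build_app` are assumptions made for the example, and it shows only one chart type.

```python
import pandas as pd


def runs_by_team(df: pd.DataFrame, year: int) -> pd.DataFrame:
    """Total runs per team for a single tournament year."""
    selected = df[df["year"] == year]
    return selected.groupby("team", as_index=False)["runs"].sum()


def build_app(df: pd.DataFrame):
    """Create a Dash app with a year dropdown that filters a bar chart."""
    # Imported lazily so runs_by_team can be used without Dash installed.
    import plotly.express as px
    from dash import Dash, Input, Output, dcc, html

    app = Dash(__name__)
    years = sorted(int(y) for y in df["year"].unique())
    app.layout = html.Div([
        dcc.Dropdown(
            id="year",
            options=[{"label": str(y), "value": y} for y in years],
            value=years[0],
            clearable=False,
        ),
        dcc.Graph(id="runs-chart"),
    ])

    # Redraw the chart whenever the selected year changes.
    @app.callback(Output("runs-chart", "figure"), Input("year", "value"))
    def update_chart(year):
        return px.bar(runs_by_team(df, year), x="team", y="runs")

    return app
```

In recent Dash versions, `build_app(df).run(debug=True)` starts a local development server. Keeping the aggregation in a plain function makes it easy to check without starting the server.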

🛠️ Setup Instructions

1. Clone the Repository

git clone https://github.com/[your-username]/[repo-name].git
cd [repo-name]

2. Install Dependencies

pip install -r requirements.txt

3. Run the Dashboard

Navigate to the dashboard/ directory and execute:

python app.py

Access the dashboard at http://localhost:8050.


📊 Deliverables

  1. Public GitHub Repository - Link: Repository Link
  2. Screen Recording - A demonstration of the dashboard's functionality.
  3. PowerPoint Presentation - A 5-slide summary of the project.

🗂️ Datasets

  • The datasets for this project are located in the data/ folder.
  • Data was sourced from [insert source, if applicable].

🧑‍🤝‍🧑 Team Members

  • Senidu Ravihara - Dashboard
  • Tharidu Thilakarathna - Dashboard
  • Tharidu Nimsara - NLP model
  • Kavishka Venuka - NLP model
  • Dulan Jeewantha - Data preparation: handling outliers and missing values
  • Amantha Sandun - Removing duplicates
  • Piyumi Madushika - Editing the README file and removing null records
  • W.P Sudasun - Adding new columns to the DataFrame
  • Ravidu Yehan - Concatenating files

📝 License

This project is licensed under the MIT License.

