A structured learning repository for Data Science using Python. Covers data cleaning, EDA, and visualization with Pandas, NumPy, Matplotlib, Seaborn, and more.

ggauravky/Data-Science-Learning

📊 Data Science Learning Journey

Master Data Science from Fundamentals to Machine Learning

Python NumPy Pandas Matplotlib Seaborn

Jupyter MySQL BeautifulSoup Scikit-Learn

🚀 Get Started • 📚 Curriculum • 🎯 Projects • 💡 Skills • 🤝 Connect


🌟 About This Course

This repository is a comprehensive, hands-on data science curriculum designed to take you from absolute beginner to proficient data scientist. With 100+ Jupyter notebooks, real-world projects, and structured learning paths, you'll build a strong foundation in:

  • ✅ Python Programming - Master the language of data science
  • ✅ Data Analysis & Manipulation - Work with NumPy and Pandas
  • ✅ Data Visualization - Create stunning charts and insights
  • ✅ Web Scraping - Collect data from any website
  • ✅ SQL Databases - Query and manage data efficiently
  • ✅ Statistics & Probability - Build ML foundations
  • ✅ Machine Learning - Train your first ML models

📈 Status: 🟢 Active & Growing - New content added regularly!


📚 Curriculum

📖 Complete Course Modules

| No. | Module | Topics Covered | Content | Skills |
|-----|--------|----------------|---------|--------|
| 01 | 🎓 Data Science Intro | Tools, Environment Setup, Career Paths, DS Lifecycle | 1 PDF Guide | Foundation setup |
| 02 | 🐍 Python Fundamentals | Variables, Data Types, Operators, Control Flow, Loops, Data Structures, OOP, Lambda | 18 Notebooks | Complete Python |
| 03 | 🚀 Project: Social Network | Recommendation Algorithms, Graph Theory, JSON Processing | 3 Notebooks | Real-world application |
| 04 | 🔢 NumPy Mastery | Arrays, Indexing, Slicing, Broadcasting, Vectorization | 5 Notebooks | Numerical computing |
| 05 | 🐼 Pandas Deep Dive | DataFrames, Series, Grouping, Merging, Time Series | 2 Notebooks | Data manipulation |
| 06 | 📊 Data Visualization | Line, Bar, Pie, Scatter, Histogram, Heatmaps, Seaborn | 8 Notebooks | Visual storytelling |
| 07 | 🕷️ Web Scraping | HTTP Requests, HTML Parsing, BeautifulSoup, Data Extraction | 2 Notebooks + 49 HTML samples | Web data collection |
| 08 | 🗄️ SQL & Databases | CRUD Operations, Joins, Subqueries, Views, Stored Procedures | 20 Tutorials | Database management |
| 09 | 📈 Probability & Stats | Conditional Probability, Bayes Theorem, Distributions | 3 Tutorials + Practice | Statistical thinking |
| 10 | 🤖 ML Introduction | How Machines Learn, ML History, Traditional vs ML | PPT + Notes | ML fundamentals |
| 11 | 🔧 Sklearn Basics | First ML Models, Training, Prediction, Model Selection | 3 Notebooks | Scikit-learn |
| 12 | 📋 ML Algorithm Types | Supervised vs Unsupervised Learning, Use Cases | 3 Guides | Algorithm selection |
| 13 | 🎯 ML Practice | Iris Classification, Model Evaluation, RMSE, MAE, Test Sets | 5+ Notebooks | End-to-end ML |

🚀 Quick Start

Prerequisites

  • 💻 Basic computer skills
  • 🧠 Curiosity and willingness to learn
  • ⏰ 8-10 hours per week commitment
  • ❌ No prior programming experience needed!

Installation

Step 1: Clone the repository

git clone https://github.com/ggauravky/Data-Science-Learning.git
cd Data-Science-Learning

Step 2: Set up Python environment

# Option A: Using Conda (Recommended)
conda create -n datasci python=3.11 -y
conda activate datasci
conda install numpy pandas matplotlib seaborn jupyter scikit-learn -y
pip install beautifulsoup4 requests

# Option B: Using pip
pip install numpy pandas matplotlib seaborn jupyter beautifulsoup4 requests scikit-learn

Step 3: Launch Jupyter

jupyter notebook

Step 4: Start learning! 🎉

Navigate to 002 Python refresher/01_python_basic.ipynb and begin your journey!
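
Before opening the first notebook, it can help to confirm that the packages from Step 2 are visible to the interpreter Jupyter will use. The following is a minimal sketch using only the standard library; the `REQUIRED` list and the `missing_packages` helper are illustrative names, not part of this repository. Note that `beautifulsoup4` is imported as `bs4` and `scikit-learn` as `sklearn`.

```python
import importlib.util

# Import names of the packages installed in Step 2.
# beautifulsoup4 installs the module "bs4"; scikit-learn installs "sklearn".
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "bs4", "requests", "sklearn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages found. You are ready to start.")
```

If anything is reported missing, re-run the install command from Step 2 inside the activated environment.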


πŸ“– Learning Path

🎯 Recommended 12-Week Roadmap

graph LR
    A[Week 1-2: Python] --> B[Week 3-4: NumPy & Pandas]
    B --> C[Week 5-6: Visualization]
    C --> D[Week 7: Web Scraping]
    D --> E[Week 8: SQL]
    E --> F[Week 9-10: Probability]
    F --> G[Week 11-12: Machine Learning]
📅 Week-by-Week Breakdown

🌱 Phase 1: Foundation (Weeks 1-4)

Week 1-2: Python Programming

  • Complete all 18 Python notebooks
  • Focus: Variables, loops, functions, OOP
  • Practice: Daily coding exercises
  • Milestone: Build a simple calculator app

Week 3: NumPy

  • Master array operations
  • Learn vectorization techniques
  • Practice: Matrix manipulations

Week 4: Pandas & First Project

  • DataFrame operations
  • Data cleaning techniques
  • Project: Coders of Delhi recommendation system

🌿 Phase 2: Intermediate (Weeks 5-8)

Week 5-6: Data Visualization

  • All chart types in Matplotlib
  • Statistical plots with Seaborn
  • Practice: Visualize real datasets

Week 7: Web Scraping

  • HTTP requests and responses
  • HTML parsing with BeautifulSoup
  • Project: Book scraper

Week 8: SQL Databases

  • CRUD operations
  • Complex joins and queries
  • Practice: Build a movie database
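
As a rough illustration of CRUD operations and a join, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. This is a stand-in chosen so the example runs without a server; the course itself uses MySQL, whose syntax differs in places. The table and column names are hypothetical.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create: two hypothetical tables for a small movie database.
cur.execute("CREATE TABLE directors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE movies ("
    "id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER, "
    "director_id INTEGER REFERENCES directors(id))"
)
cur.execute("INSERT INTO directors (id, name) VALUES (1, 'Director A'), (2, 'Director B')")
cur.executemany(
    "INSERT INTO movies (title, year, director_id) VALUES (?, ?, ?)",
    [("Movie One", 2001, 1), ("Movie Two", 2010, 1), ("Movie Three", 2015, 2)],
)

# Update and delete, using placeholders rather than string formatting.
cur.execute("UPDATE movies SET year = ? WHERE title = ?", (2002, "Movie One"))
cur.execute("DELETE FROM movies WHERE title = ?", ("Movie Three",))

# Read: join each remaining movie to its director.
rows = cur.execute(
    "SELECT m.title, m.year, d.name "
    "FROM movies AS m JOIN directors AS d ON d.id = m.director_id "
    "ORDER BY m.year"
).fetchall()
conn.commit()
conn.close()

print(rows)
```

Passing values through `?` placeholders keeps data separate from the SQL text, which is a good habit to carry over to any database driver.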

🌳 Phase 3: Advanced (Weeks 9-12)

Week 9-10: Statistics & Advanced SQL

  • Probability distributions
  • Bayes theorem applications
  • Stored procedures and optimization
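
For the Bayes theorem item, a short worked example may help. The sketch below computes P(H | E) = P(E | H) P(H) / P(E), where P(E) = P(E | H) P(H) + P(E | not H) (1 - P(H)). The function name and the numbers (a test with 1% prevalence, 95% sensitivity, and a 5% false-positive rate) are hypothetical and chosen only for illustration.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) given P(H), P(E | H), and P(E | not H)."""
    # Total probability of the evidence, P(E).
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 3))  # 0.161
```

Even with a fairly accurate test, the posterior is only about 16% because the condition is rare, which is the kind of intuition this part of the course aims to build.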

Week 11-12: Machine Learning

  • ML fundamentals
  • First models with Scikit-learn
  • Project: Iris classification
  • Model evaluation and metrics
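
The curriculum lists MAE and RMSE among the evaluation metrics. As a minimal sketch of what they measure, here they are written out in plain Python on made-up values; in practice scikit-learn provides `mean_absolute_error` and `mean_squared_error` in `sklearn.metrics`.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical targets and predictions.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

print(mae(y_true, y_pred))
print(rmse(y_true, y_pred))
```

RMSE is never smaller than MAE on the same data; a large gap between the two suggests a few predictions are far off.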

🎯 Projects

Featured Real-World Projects

🌐 Coders of Delhi

Social Network Recommendation System

Build algorithms similar to Facebook's "People You May Know" feature.

Tech Stack: Python, JSON, Graph Algorithms
Complexity: Intermediate
Skills: Data structures, algorithms, recommendation engines

Files:

  • data_read.ipynb
  • people_you_may_know.ipynb
  • pages_you_might_like.ipynb
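
One common way to build a "People You May Know" feature is to rank non-friends by how many mutual friends they share with a user. The sketch below shows that idea on a tiny hypothetical graph; the data layout and function name are assumptions, and the project notebooks may structure their JSON data and logic differently.

```python
from collections import Counter

# Hypothetical friendship graph: user id -> set of friend ids.
friends = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2, 4, 5},
    4: {2, 3},
    5: {3},
}


def people_you_may_know(user, graph):
    """Rank non-friends of `user` by the number of mutual friends."""
    direct = graph.get(user, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user themselves and anyone they already know.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most mutual friends first; ties broken by user id for a stable order.
    return sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))


print(people_you_may_know(1, friends))  # [(4, 2), (5, 1)]
```

User 4 ranks first for user 1 because they share two friends (2 and 3), while user 5 shares only one.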

📚 Book Data Scraper

Web Scraping Pipeline

Scrape 49 pages of book data from an online bookstore.

Tech Stack: Requests, BeautifulSoup, Pandas
Complexity: Beginner-Intermediate
Skills: HTTP, HTML parsing, data extraction

Output: Structured CSV with titles, prices, ratings
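
The parse-then-export step of such a pipeline can be sketched as follows. To stay dependency-free, this example swaps in the standard library's `html.parser` and `csv` modules in place of BeautifulSoup and Pandas, and it parses an inline string instead of fetching pages with Requests. The HTML markup, class names, and `BookParser` class are hypothetical; the real pages' structure is not shown in this README.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup standing in for one scraped page.
SAMPLE_HTML = """
<article class="book"><h3 class="title">Book A</h3>
<p class="price">12.50</p><p class="rating">4</p></article>
<article class="book"><h3 class="title">Book B</h3>
<p class="price">8.99</p><p class="rating">5</p></article>
"""

FIELDS = ("title", "price", "rating")


class BookParser(HTMLParser):
    """Collect one dict per <article class="book"> element."""

    def __init__(self):
        super().__init__()
        self.books = []
        self._field = None  # field whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "article" and css_class == "book":
            self.books.append({})
        elif css_class in FIELDS and self.books:
            self._field = css_class

    def handle_data(self, data):
        if self._field is not None:
            self.books[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        self._field = None


parser = BookParser()
parser.feed(SAMPLE_HTML)

# Write the records as CSV (to a string buffer here; a file works the same way).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(parser.books)
print(buffer.getvalue())
```

With BeautifulSoup the parsing class collapses to a few selector calls, but the overall shape (extract records, then write rows) stays the same.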

🌸 Iris Classification

Machine Learning Project

Train and evaluate ML models on the classic Iris dataset.

Tech Stack: Scikit-learn, NumPy, Pandas
Complexity: Intermediate
Skills: Model training, evaluation, accuracy metrics

Notebooks:

  • Quick training
  • Accuracy measurement
  • Data analysis
  • Test set creation
  • Stratified sampling
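
Stratified sampling keeps the class proportions the same in the training and test sets. The hand-rolled sketch below illustrates the idea on label lists shaped like Iris (50 samples for each of 3 species); the `stratified_split` helper is illustrative only. In scikit-learn, the same effect is available through `train_test_split(X, y, stratify=y)`.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices) preserving class proportions."""
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction from every class.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Iris has 50 samples for each of 3 species.
labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
print(len(train_idx), len(test_idx))  # 120 30
```

Each species contributes exactly 10 samples to the test set here, so accuracy measured on it is not skewed by an unlucky random draw.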

📊 Data Analysis Suite

Pandas Practice Projects

Analyze real-world datasets with advanced techniques.

Tech Stack: Pandas, Matplotlib, Seaborn
Complexity: Beginner-Intermediate
Skills: Grouping, merging, aggregation, visualization

Features:

  • Data cleaning pipelines
  • Statistical analysis
  • Trend visualization

💡 Skills You'll Gain

🐍 Programming

  • ✅ Python syntax & semantics
  • ✅ Object-oriented programming
  • ✅ Functional programming
  • ✅ List comprehensions
  • ✅ Lambda expressions
  • ✅ File I/O operations
  • ✅ JSON data handling
  • ✅ Error handling

📊 Data Science

  • ✅ NumPy array operations
  • ✅ Pandas DataFrames
  • ✅ Data cleaning & preprocessing
  • ✅ Statistical analysis
  • ✅ Data visualization
  • ✅ Exploratory data analysis
  • ✅ Feature engineering
  • ✅ Data transformation

🤖 Machine Learning

  • ✅ ML fundamentals
  • ✅ Supervised learning
  • ✅ Unsupervised learning
  • ✅ Model training
  • ✅ Model evaluation
  • ✅ Scikit-learn library
  • ✅ Algorithm selection
  • ✅ Performance metrics

🗄️ Databases

  • ✅ SQL queries (SELECT, JOIN)
  • ✅ Database design
  • ✅ CRUD operations
  • ✅ Aggregations & grouping
  • ✅ Subqueries
  • ✅ Views & indexes
  • ✅ Stored procedures
  • ✅ Query optimization

🕷️ Web Scraping

  • ✅ HTTP protocol
  • ✅ HTML structure
  • ✅ CSS selectors
  • ✅ BeautifulSoup parsing
  • ✅ Requests library
  • ✅ Data extraction
  • ✅ Ethical scraping
  • ✅ Pipeline building

📈 Statistics

  • ✅ Probability theory
  • ✅ Distributions
  • ✅ Conditional probability
  • ✅ Bayes theorem
  • ✅ Hypothesis testing
  • ✅ Statistical inference
  • ✅ Sampling techniques
  • ✅ Error metrics

🛠️ Technology Stack

Core Technologies

| Category | Tools |
|----------|-------|
| 💻 Language | Python 3.11+ |
| 📊 Data Analysis | NumPy, Pandas |
| 📈 Visualization | Matplotlib, Seaborn |
| 🕸️ Web Scraping | Requests, BeautifulSoup4 |
| 🗄️ Database | MySQL |
| 🤖 Machine Learning | Scikit-learn |
| 📓 IDE | Jupyter Notebook, VS Code |

📈 Progress Tracker

Use this checklist to track your learning journey:

Core Modules

  • 🎓 Introduction to Data Science
  • 🐍 Python Fundamentals (18 notebooks)
  • 🔢 NumPy Mastery (5 notebooks)
  • 🐼 Pandas Deep Dive (2 notebooks)
  • 📊 Data Visualization (8 notebooks)
  • 🕷️ Web Scraping (2 notebooks)
  • 🗄️ SQL & Databases (20 tutorials)
  • 📈 Probability & Statistics
  • 🤖 Machine Learning Introduction
  • 🔧 Scikit-learn Basics
  • 📋 ML Algorithm Types
  • 🎯 ML Practice (5+ notebooks)

Projects

  • 🌐 Coders of Delhi - Social Network
  • 📚 Book Data Scraper
  • 🌸 Iris Classification
  • 📊 Data Analysis Projects

Milestones

  • 🎖️ Completed first 50 notebooks
  • 🏆 Built 3 portfolio projects
  • 🚀 Trained first ML model
  • ⭐ Contributed to the repo

🤝 Connect

Let's Learn Together!

LinkedIn GitHub Instagram

Questions? Suggestions? Want to collaborate?
Feel free to open an issue or reach out directly!


🤝 Contributing

We welcome contributions from the community! Here's how you can help:

Ways to Contribute

  • 🐛 Report Bugs: Found an error? Let us know!
  • 💡 Suggest Features: Have ideas for new content?
  • 📝 Improve Documentation: Help make explanations clearer
  • 🎨 Add Examples: Share your own projects and solutions
  • 🌐 Translate: Help make content accessible in other languages

How to Contribute

  1. Fork this repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

TL;DR: You can use, modify, and distribute this content freely, provided the copyright and license notice is kept with any copies. Further attribution is appreciated! 🙏


⭐ Show Your Support

If this repository helped you in your data science journey:

  • ⭐ Star this repository
  • 🍴 Fork it for your own learning
  • 📢 Share with fellow learners
  • 💬 Spread the word on social media

🙏 Acknowledgments

  • 🎓 Inspired by various data science courses and bootcamps
  • 📚 Built with passion for the data science community
  • 🌟 Thanks to all contributors and learners

Made with ❤️ for Data Science Learners Worldwide

Happy Learning! 🚀
