📊 Data Science Learning Journey

Master Data Science from Fundamentals to Machine Learning


🌟 About This Course

This repository is a comprehensive, hands-on data science curriculum designed to take you from absolute beginner to proficient data scientist. With 100+ Jupyter notebooks, real-world projects, and structured learning paths, you'll build a strong foundation in:

✅ Python Programming - Master the language of data science
✅ Data Analysis & Manipulation - Work with NumPy and Pandas
✅ Data Visualization - Turn data into clear charts and insights
✅ Web Scraping - Collect data from web pages
✅ SQL Databases - Query and manage data efficiently
✅ Statistics & Probability - Build ML foundations
✅ Machine Learning - Train your first ML models

📈 Status: 🟢 Active & Growing - New content added regularly!

📚 Curriculum

📖 Complete Course Modules

No.	Module	Topics Covered	Content	Skills
01	🎓 Data Science Intro	Tools, Environment Setup, Career Paths, DS Lifecycle	1 PDF Guide	Foundation setup
02	🐍 Python Fundamentals	Variables, Data Types, Operators, Control Flow, Loops, Data Structures, OOP, Lambda	18 Notebooks	Complete Python
03	🚀 Project: Social Network	Recommendation Algorithms, Graph Theory, JSON Processing	3 Notebooks	Real-world application
04	🔢 NumPy Mastery	Arrays, Indexing, Slicing, Broadcasting, Vectorization	5 Notebooks	Numerical computing
05	🐼 Pandas Deep Dive	DataFrames, Series, Grouping, Merging, Time Series	2 Notebooks	Data manipulation
06	📊 Data Visualization	Line, Bar, Pie, Scatter, Histogram, Heatmaps, Seaborn	8 Notebooks	Visual storytelling
07	🕷️ Web Scraping	HTTP Requests, HTML Parsing, BeautifulSoup, Data Extraction	2 Notebooks + 49 HTML samples	Web data collection
08	🗄️ SQL & Databases	CRUD Operations, Joins, Subqueries, Views, Stored Procedures	20 Tutorials	Database management
09	📈 Probability & Stats	Conditional Probability, Bayes Theorem, Distributions	3 Tutorials + Practice	Statistical thinking
10	🤖 ML Introduction	How Machines Learn, ML History, Traditional Programming vs ML	PPT + Notes	ML fundamentals
11	🔧 Sklearn Basics	First ML Models, Training, Prediction, Model Selection	3 Notebooks	Scikit-learn
12	📋 ML Algorithm Types	Supervised vs Unsupervised Learning, Use Cases	3 Guides	Algorithm selection
13	🎯 ML Practice	Iris Classification, Model Evaluation, RMSE, MAE, Test Sets	5+ Notebooks	End-to-end ML

🚀 Quick Start

Prerequisites

💻 Basic computer skills
🧠 Curiosity and willingness to learn
⏰ 8-10 hours per week commitment
❌ No prior programming experience needed!

Installation

Step 1: Clone the repository

git clone https://github.com/ggauravky/Data-Science-Learning.git
cd Data-Science-Learning

Step 2: Set up Python environment

# Option A: Using Conda (Recommended)
conda create -n datasci python=3.11 -y
conda activate datasci
conda install numpy pandas matplotlib seaborn jupyter scikit-learn -y
pip install beautifulsoup4 requests

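# Option B: Using pip inside a virtual environment
python -m venv datasci
source datasci/bin/activate    # on Windows: datasci\Scripts\activate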
pip install numpy pandas matplotlib seaborn jupyter beautifulsoup4 requests scikit-learn
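Whichever option you choose, a short check script can confirm that every core package imports cleanly. The sketch below is not a file from this repository (any filename you save it under is your choice); it tries each import and reports the installed version through the standard-library `importlib.metadata`. Note that some import names differ from install names: `sklearn` for scikit-learn and `bs4` for beautifulsoup4.

```python
import importlib
from importlib import metadata

# Import name -> distribution (pip/conda) name for the course's core packages.
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "sklearn": "scikit-learn",
    "bs4": "beautifulsoup4",
    "requests": "requests",
}


def check_packages(packages):
    """Return {import_name: version string, or None if the import fails}."""
    report = {}
    for import_name, dist_name in packages.items():
        try:
            module = importlib.import_module(import_name)
        except ImportError:
            report[import_name] = None
            continue
        try:
            report[import_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # No install metadata found: fall back to the module's own attribute.
            report[import_name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        print(f"{name:<12} {version or 'MISSING'}")
```

Run it inside the activated environment; any line that shows MISSING points to a package that still needs installing.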

Step 3: Launch Jupyter

jupyter notebook

Step 4: Start learning! 🎉

Navigate to 002 Python refresher/01_python_basic.ipynb and begin your journey!

📖 Learning Path

🎯 Recommended 12-Week Roadmap

graph LR
    A[Week 1-2: Python] --> B[Week 3-4: NumPy & Pandas]
    B --> C[Week 5-6: Visualization]
    C --> D[Week 7: Web Scraping]
    D --> E[Week 8: SQL]
    E --> F[Week 9-10: Probability]
    F --> G[Week 11-12: Machine Learning]

📅 Week-by-Week Breakdown

🌱 Phase 1: Foundation (Weeks 1-4)

Week 1-2: Python Programming

Complete all 18 Python notebooks
Focus: Variables, loops, functions, OOP
Practice: Daily coding exercises
Milestone: Build a simple calculator app

Week 3: NumPy

Master array operations
Learn vectorization techniques
Practice: Matrix manipulations

Week 4: Pandas & First Project

DataFrame operations
Data cleaning techniques
Project: Coders of Delhi recommendation system

🌿 Phase 2: Intermediate (Weeks 5-8)

Week 5-6: Data Visualization

All chart types in Matplotlib
Statistical plots with Seaborn
Practice: Visualize real datasets

Week 7: Web Scraping

HTTP requests and responses
HTML parsing with BeautifulSoup
Project: Book scraper

Week 8: SQL Databases

CRUD operations
Complex joins and queries
Practice: Build a movie database
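The course's database module targets MySQL. To show CRUD and JOIN in something that runs without a database server, the sketch below uses Python's built-in `sqlite3` module as a stand-in; the `movies`/`ratings` schema and its rows are hypothetical. The SQL is standard enough to carry over to MySQL, although placeholder syntax differs between Python database drivers.

```python
import sqlite3


def movie_db_demo():
    """Build a small in-memory movie database, run CRUD steps, return a JOIN result."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Schema (hypothetical): one row per movie, many ratings per movie.
    cur.executescript(
        """
        CREATE TABLE movies (
            id    INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            year  INTEGER
        );
        CREATE TABLE ratings (
            movie_id INTEGER REFERENCES movies(id),
            stars    INTEGER CHECK (stars BETWEEN 1 AND 5)
        );
        """
    )

    # Create: "?" placeholders pass values separately from the SQL text.
    cur.executemany(
        "INSERT INTO movies (id, title, year) VALUES (?, ?, ?)",
        [(1, "Inception", 2010), (2, "Interstellar", 2014), (3, "Tenet", 2020)],
    )
    cur.executemany(
        "INSERT INTO ratings (movie_id, stars) VALUES (?, ?)",
        [(1, 5), (1, 4), (2, 5), (3, 3)],
    )

    # Update: change an existing rating.
    cur.execute("UPDATE ratings SET stars = ? WHERE movie_id = ?", (4, 2))

    # Delete: remove a movie together with its ratings.
    cur.execute("DELETE FROM ratings WHERE movie_id = ?", (3,))
    cur.execute("DELETE FROM movies WHERE id = ?", (3,))
    conn.commit()

    # Read: JOIN + GROUP BY gives the rating count and average per movie.
    rows = cur.execute(
        """
        SELECT m.title, COUNT(r.stars) AS n_ratings, AVG(r.stars) AS avg_stars
        FROM movies AS m
        LEFT JOIN ratings AS r ON r.movie_id = m.id
        GROUP BY m.id, m.title
        ORDER BY avg_stars DESC, m.title
        """
    ).fetchall()
    conn.close()
    return rows


for title, n_ratings, avg_stars in movie_db_demo():
    print(f"{title:<14} {n_ratings} rating(s), avg {avg_stars:.1f}")
```

The LEFT JOIN keeps movies that have no ratings yet (they would show a count of 0 and a NULL average), which an inner JOIN would silently drop.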

🌳 Phase 3: Advanced (Weeks 9-12)

Week 9-10: Statistics & Advanced SQL

Probability distributions
Bayes theorem applications
Stored procedures and optimization
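Bayes' theorem updates a prior belief with evidence: P(H | E) = P(E | H) · P(H) / P(E), where P(E) comes from the law of total probability. The numbers below describe a hypothetical screening test and are not data from the course.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) via Bayes' theorem for a binary hypothesis H.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # Law of total probability: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical test: 1% prevalence, 99% sensitivity, 5% false-positive rate.
p = posterior(prior=0.01, likelihood=0.99, false_positive_rate=0.05)
print(f"P(condition | positive test) = {p:.3f}")
```

Even with an accurate test, a positive result here means only about a 17% chance of having the condition, because the condition is rare and false positives from the healthy 99% outnumber the true positives.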

Week 11-12: Machine Learning

ML fundamentals
First models with Scikit-learn
Project: Iris classification
Model evaluation and metrics
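Scikit-learn provides these metrics in `sklearn.metrics` (`accuracy_score`, `mean_absolute_error`, `mean_squared_error`). The plain-Python versions below are only meant to make the formulas visible; the sample labels and values are made up.

```python
import math


def _check(y_true, y_pred):
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")


def accuracy(y_true, y_pred):
    """Classification: fraction of predictions that exactly match the labels."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Regression: mean absolute error, the average size of the errors."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Regression: root mean squared error; squaring makes large misses weigh more."""
    _check(y_true, y_pred)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


print(accuracy(["setosa", "virginica", "setosa"], ["setosa", "setosa", "setosa"]))
print(mae([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
print(rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
```

With errors of 1, 0 and 2, MAE is 1.0 while RMSE is about 1.29: RMSE is always at least as large as MAE, and the gap grows when a few predictions are far off.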

🎯 Projects

Featured Real-World Projects

🌐 Coders of Delhi

Social Network Recommendation System

Build algorithms similar to Facebook's "People You May Know" feature.

Tech Stack: Python, JSON, Graph Algorithms
Complexity: Intermediate
Skills: Data structures, algorithms, recommendation engines

Files:

data_read.ipynb
people_you_may_know.ipynb
pages_you_might_like.ipynb
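One common way to build "People You May Know" is to count mutual friends: anyone who is a friend of your friends, but not already your friend, becomes a candidate, ranked by how many friends you share. The sketch below assumes the social graph has already been loaded into a hypothetical dict mapping each user id to a set of friend ids; the project's own JSON structure and scoring may differ.

```python
from collections import Counter


def people_you_may_know(user, friends, top_n=5):
    """Suggest users ranked by how many mutual friends they share with `user`.

    friends: dict mapping user id -> set of friend ids (assumed undirected).
    """
    direct = friends.get(user, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in friends.get(friend, set()):
            # Skip the user themself and people they already know.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Sort by mutual-friend count (descending), then by id for a stable order.
    ranked = sorted(mutual_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical network: 1 knows 2 and 3; both 2 and 3 know 4; only 3 knows 5.
network = {
    1: {2, 3},
    2: {1, 4},
    3: {1, 4, 5},
    4: {2, 3},
    5: {3},
}
print(people_you_may_know(1, network))
```

For user 1, user 4 ranks first with two mutual friends and user 5 follows with one. A "Pages You Might Like" feature could use the same counting loop over the pages that a user's friends have liked.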

📚 Book Data Scraper

Web Scraping Pipeline

Scrape 49 pages of book data from an online bookstore.

Tech Stack: Requests, BeautifulSoup, Pandas
Complexity: Beginner-Intermediate
Skills: HTTP, HTML parsing, data extraction

Output: Structured CSV with titles, prices, ratings
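The project itself uses Requests and BeautifulSoup. So that the sketch below runs with no extra installs and no network access, it swaps in the standard library's `html.parser.HTMLParser` and `csv` module instead. The markup (`div.book`, `h3.title`, `span.price`, `span.rating`) is a hypothetical stand-in for the real pages' structure. The data flow is the same either way: parse HTML, collect one record per book, and write the records to CSV.

```python
import csv
import io
from html.parser import HTMLParser


class BookParser(HTMLParser):
    """Collect title/price/rating from hypothetical <div class="book"> cards."""

    FIELDS = {"title", "price", "rating"}

    def __init__(self):
        super().__init__()
        self.books = []
        self._current = None  # record for the book card currently open
        self._field = None    # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "book" in classes:
            self._current = {}
        elif self._current is not None:
            for name in classes:
                if name in self.FIELDS:
                    self._field = name

    def handle_endtag(self, tag):
        if tag in ("h3", "span") and self._field is not None:
            # Field finished: trim surrounding whitespace from its text.
            self._current[self._field] = self._current.get(self._field, "").strip()
            self._field = None
        elif tag == "div" and self._current is not None:
            self.books.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._current is not None and self._field is not None:
            self._current[self._field] = self._current.get(self._field, "") + data


def books_to_csv(html):
    """Parse book cards from an HTML string and return them as CSV text."""
    parser = BookParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["title", "price", "rating"])
    writer.writeheader()
    writer.writerows(parser.books)
    return out.getvalue()


SAMPLE = """
<div class="book"><h3 class="title">Dune</h3>
  <span class="price">£12.50</span><span class="rating">5</span></div>
<div class="book"><h3 class="title">Emma</h3>
  <span class="price">£7.99</span><span class="rating">4</span></div>
"""
print(books_to_csv(SAMPLE))
```

On real pages, the HTML would first be downloaded (for example with `requests.get(url).text`). With BeautifulSoup, the extraction step usually reduces to a few `select` and `get_text` calls. Scrapers should also respect the site's terms of use and robots.txt.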

🌸 Iris Classification

Machine Learning Project

Train and evaluate ML models on the classic Iris dataset.

Tech Stack: Scikit-learn, NumPy, Pandas
Complexity: Intermediate
Skills: Model training, evaluation, accuracy metrics

Notebooks:

Quick training
Accuracy measurement
Data analysis
Test set creation
Stratified sampling
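Stratified sampling keeps each class's share the same in the training and test sets, which matters for small datasets like Iris (150 samples, 50 per species). In scikit-learn this is `train_test_split(X, y, test_size=0.2, stratify=y)`. The plain-Python sketch below shows the idea on a hypothetical label list; scikit-learn's exact rounding rules for class counts may differ slightly.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) so each class keeps its share in both sets."""
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)  # per-class test count
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels mirroring Iris: 50 samples of each of the three species.
labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
train, test = stratified_split(labels)
print(len(train), len(test))  # → 120 30
```

Each species contributes exactly 10 samples to the test set here. A purely random 30-sample draw could, by chance, under-represent one species and make the measured accuracy misleading.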

📊 Data Analysis Suite

Pandas Practice Projects

Analyze real-world datasets with advanced techniques.

Tech Stack: Pandas, Matplotlib, Seaborn
Complexity: Beginner-Intermediate
Skills: Grouping, merging, aggregation, visualization

Features:

Data cleaning pipelines
Statistical analysis
Trend visualization

💡 Skills You'll Gain

🐍 Programming

✅ Python syntax & semantics
✅ Object-oriented programming
✅ Functional programming
✅ List comprehensions
✅ Lambda expressions
✅ File I/O operations
✅ JSON data handling
✅ Error handling

📊 Data Science

✅ NumPy array operations
✅ Pandas DataFrames
✅ Data cleaning & preprocessing
✅ Statistical analysis
✅ Data visualization
✅ Exploratory data analysis
✅ Feature engineering
✅ Data transformation

🤖 Machine Learning

✅ ML fundamentals
✅ Supervised learning
✅ Unsupervised learning
✅ Model training
✅ Model evaluation
✅ Scikit-learn library
✅ Algorithm selection
✅ Performance metrics

🗄️ Databases

✅ SQL queries (SELECT, JOIN)
✅ Database design
✅ CRUD operations
✅ Aggregations & grouping
✅ Subqueries
✅ Views & indexes
✅ Stored procedures
✅ Query optimization

🕷️ Web Scraping

✅ HTTP protocol
✅ HTML structure
✅ CSS selectors
✅ BeautifulSoup parsing
✅ Requests library
✅ Data extraction
✅ Ethical scraping
✅ Pipeline building

📈 Statistics

✅ Probability theory
✅ Distributions
✅ Conditional probability
✅ Bayes theorem
✅ Hypothesis testing
✅ Statistical inference
✅ Sampling techniques
✅ Error metrics

🛠️ Technology Stack

Core Technologies

Category	Tools
💻 Language	Python 3.11+
📊 Data Analysis	NumPy, Pandas
📈 Visualization	Matplotlib, Seaborn
🕸️ Web Scraping	Requests, BeautifulSoup4
🗄️ Database	MySQL
🤖 Machine Learning	Scikit-learn
📓 IDE	Jupyter Notebook, VS Code

📈 Progress Tracker

Use this checklist to track your learning journey:

Core Modules

Projects

🌐 Coders of Delhi - Social Network
📚 Book Data Scraper
🌸 Iris Classification
📊 Data Analysis Projects

Milestones

🎖️ Completed first 50 notebooks
🏆 Built 3 portfolio projects
🚀 Trained first ML model
⭐ Contributed to the repo

🤝 Connect

Let's Learn Together!

Questions? Suggestions? Want to collaborate?
Feel free to open an issue or reach out directly!

🤝 Contributing

We welcome contributions from the community! Here's how you can help:

Ways to Contribute

🐛 Report Bugs: Found an error? Let us know!
💡 Suggest Features: Have ideas for new content?
📝 Improve Documentation: Help make explanations clearer
🎨 Add Examples: Share your own projects and solutions
🌐 Translate: Help make content accessible in other languages

How to Contribute

Fork this repository
Create a feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

TL;DR: You can use, modify, and distribute this content freely, as long as copies keep the copyright and license notice. Further attribution is appreciated! 🙏

⭐ Show Your Support

If this repository helped you in your data science journey:

⭐ Star this repository
🍴 Fork it for your own learning
📢 Share with fellow learners
💬 Spread the word on social media


🙏 Acknowledgments

🎓 Inspired by various data science courses and bootcamps
📚 Built with passion for the data science community
🌟 Thanks to all contributors and learners

Made with ❤️ for Data Science Learners Worldwide

Happy Learning! 🚀

📁 Repository Structure

001 Data Science Intro
002 Python refresher
003 Project 001 - Coders of Delhi
004 NumPy
005 Pandas
006 Matplotlib and Seaborn
007 Web Scrapping
008 Databases
009 Probability
010 Machine Learning for Data Scientists
011 sklearn demo
012 Types of ML Algorithms
013 Demo Practice ML using Scikit Learn
014 Practical ML using Scikit-learn
.gitignore
LICENSE
README.md
