Get Started • Curriculum • Projects • Skills • Connect
This repository is a comprehensive, hands-on data science curriculum designed to take you from absolute beginner to proficient data scientist. With 100+ Jupyter notebooks, real-world projects, and structured learning paths, you'll build a strong foundation in:
- Python Programming - Master the language of data science
- Data Analysis & Manipulation - Work with NumPy and Pandas
- Data Visualization - Create stunning charts and insights
- Web Scraping - Collect data from any website
- SQL Databases - Query and manage data efficiently
- Statistics & Probability - Build ML foundations
- Machine Learning - Train your first ML models

Status: Active & Growing - New content added regularly!
| No. | Module | Topics Covered | Content | Skills |
|---|---|---|---|---|
| 01 | Data Science Intro | Tools, Environment Setup, Career Paths, DS Lifecycle | 1 PDF Guide | Foundation setup |
| 02 | Python Fundamentals | Variables, Data Types, Operators, Control Flow, Loops, Data Structures, OOP, Lambda | 18 Notebooks | Complete Python |
| 03 | Project: Social Network | Recommendation Algorithms, Graph Theory, JSON Processing | 3 Notebooks | Real-world application |
| 04 | NumPy Mastery | Arrays, Indexing, Slicing, Broadcasting, Vectorization | 5 Notebooks | Numerical computing |
| 05 | Pandas Deep Dive | DataFrames, Series, Grouping, Merging, Time Series | 2 Notebooks | Data manipulation |
| 06 | Data Visualization | Line, Bar, Pie, Scatter, Histogram, Heatmaps, Seaborn | 8 Notebooks | Visual storytelling |
| 07 | Web Scraping | HTTP Requests, HTML Parsing, BeautifulSoup, Data Extraction | 2 Notebooks + 49 HTML samples | Web data collection |
| 08 | SQL & Databases | CRUD Operations, Joins, Subqueries, Views, Stored Procedures | 20 Tutorials | Database management |
| 09 | Probability & Stats | Conditional Probability, Bayes Theorem, Distributions | 3 Tutorials + Practice | Statistical thinking |
| 10 | ML Introduction | How Machines Learn, ML History, Traditional vs ML | PPT + Notes | ML fundamentals |
| 11 | Sklearn Basics | First ML Models, Training, Prediction, Model Selection | 3 Notebooks | Scikit-learn |
| 12 | ML Algorithm Types | Supervised vs Unsupervised Learning, Use Cases | 3 Guides | Algorithm selection |
| 13 | ML Practice | Iris Classification, Model Evaluation, RMSE, MAE, Test Sets | 5+ Notebooks | End-to-end ML |
- Basic computer skills
- Curiosity and willingness to learn
- 8-10 hours per week commitment
- No prior programming experience needed!
Step 1: Clone the repository

    git clone https://github.com/ggauravky/Data-Science-Learning.git
    cd Data-Science-Learning

Step 2: Set up Python environment

    # Option A: Using Conda (Recommended)
    conda create -n datasci python=3.11 -y
    conda activate datasci
    conda install numpy pandas matplotlib seaborn jupyter scikit-learn -y
    pip install beautifulsoup4 requests

    # Option B: Using pip
    pip install numpy pandas matplotlib seaborn jupyter beautifulsoup4 requests scikit-learn

Step 3: Launch Jupyter

    jupyter notebook

Step 4: Start learning!
Navigate to 002 Python refresher/01_python_basic.ipynb and begin your journey!
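
Before opening the first notebook, it can help to confirm that the packages from Step 2 are visible to Python. The sketch below is not part of the repository: the helper name `check_environment` is hypothetical, and it only reports what it finds without installing anything.

```python
from importlib import metadata

# Package names as they appear in the Step 2 install commands.
REQUIRED = [
    "numpy", "pandas", "matplotlib", "seaborn",
    "jupyter", "scikit-learn", "beautifulsoup4", "requests",
]

def check_environment(packages=REQUIRED):
    """Return {package: version string, or None if no metadata is found}."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report

if __name__ == "__main__":
    for name, version in check_environment().items():
        print(f"{name:15} {version or 'MISSING'}")
```

A `MISSING` line is a prompt to double-check rather than proof of a broken setup, since some conda packages may not register metadata under the same name.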
graph LR
A[Week 1-2: Python] --> B[Week 3-4: NumPy & Pandas]
B --> C[Week 5-6: Visualization]
C --> D[Week 7: Web Scraping]
D --> E[Week 8: SQL]
E --> F[Week 9-10: Probability]
F --> G[Week 11-12: Machine Learning]
Week-by-Week Breakdown
Week 1-2: Python Programming
- Complete all 18 Python notebooks
- Focus: Variables, loops, functions, OOP
- Practice: Daily coding exercises
- Milestone: Build a simple calculator app
Week 3: NumPy
- Master array operations
- Learn vectorization techniques
- Practice: Matrix manipulations
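
To make the vectorization idea concrete, here is a small sketch (not taken from the course notebooks) that computes the same total with a Python loop and with a single NumPy expression, then uses broadcasting to center the columns of a matrix. The array values are arbitrary illustration data.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])

# Loop version: multiply element by element in Python.
loop_total = 0.0
for p, q in zip(prices, quantities):
    loop_total += p * q

# Vectorized version: one array expression, no explicit loop.
vector_total = float(np.sum(prices * quantities))

# Broadcasting: subtract each column's mean from a 2x3 matrix.
matrix = np.array([[1.0, 2.0, 3.0],
                   [3.0, 4.0, 5.0]])
centered = matrix - matrix.mean(axis=0)  # shape (2, 3) minus shape (3,)

print(loop_total, vector_total)
print(centered)
```

Both totals come out to 140.0. On arrays this small the speed difference is negligible, but the vectorized form is the one that scales to large datasets.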
Week 4: Pandas & First Project
- DataFrame operations
- Data cleaning techniques
- Project: Coders of Delhi recommendation system
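
A recommendation project like this can be approached in several ways. One common idea behind "People You May Know" style features is to rank non-friends by how many mutual friends they share with the user. The sketch below assumes a hypothetical friendship dictionary and a hypothetical function name `suggest_friends`; the project's actual JSON schema and algorithm may differ.

```python
from collections import Counter

# Hypothetical data: user id -> set of friend ids (undirected graph).
friends = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2, 5},
    4: {2},
    5: {3},
}

def suggest_friends(user, graph):
    """Rank users who are not yet friends by number of mutual friends."""
    mutual_counts = Counter()
    for friend in graph[user]:
        for candidate in graph[friend]:
            if candidate != user and candidate not in graph[user]:
                mutual_counts[candidate] += 1
    # Sort by mutual-friend count (descending), then by id for stable output.
    return sorted(mutual_counts, key=lambda c: (-mutual_counts[c], c))

print(suggest_friends(1, friends))  # [4, 5]
```

Storing friends as sets keeps the "already a friend?" check cheap, which matters once the graph grows beyond a toy example.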
Week 5-6: Data Visualization
- All chart types in Matplotlib
- Statistical plots with Seaborn
- Practice: Visualize real datasets
Week 7: Web Scraping
- HTTP requests and responses
- HTML parsing with BeautifulSoup
- Project: Book scraper
Week 8: SQL Databases
- CRUD operations
- Complex joins and queries
- Practice: Build a movie database
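
The course uses MySQL, but the CRUD and join ideas can be tried without a database server. The sketch below swaps in Python's built-in `sqlite3` module with an in-memory database. The `movies` and `reviews` tables and their columns are made up for illustration, and some syntax differs between SQLite and MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database, gone when closed
cur = conn.cursor()

# Create: two hypothetical tables linked by movie_id.
cur.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
cur.execute("CREATE TABLE reviews (movie_id INTEGER, rating REAL)")
cur.executemany("INSERT INTO movies VALUES (?, ?, ?)",
                [(1, "Movie A", 2001), (2, "Movie B", 2010)])
cur.executemany("INSERT INTO reviews VALUES (?, ?)",
                [(1, 8.0), (1, 9.0), (2, 6.0)])

# Update and delete.
cur.execute("UPDATE movies SET year = 2011 WHERE id = 2")
cur.execute("DELETE FROM reviews WHERE rating < 7")

# Read with a join: average rating per movie (movies without reviews are kept).
rows = cur.execute("""
    SELECT m.title, m.year, AVG(r.rating)
    FROM movies AS m
    LEFT JOIN reviews AS r ON r.movie_id = m.id
    GROUP BY m.id
    ORDER BY m.id
""").fetchall()
conn.close()
print(rows)
```

The `LEFT JOIN` keeps "Movie B" in the result even after its only review is deleted, with `None` as its average rating.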
Week 9-10: Statistics & SQL Advanced
- Probability distributions
- Bayes theorem applications
- Stored procedures and optimization
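
Bayes' theorem says `P(A|B) = P(B|A) * P(A) / P(B)`. As a short worked example, the sketch below uses made-up numbers for a diagnostic test (1% prevalence, 95% sensitivity, 5% false-positive rate). The function name `posterior` is hypothetical.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    # Total probability of a positive test: true positives + false positives.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(disease | positive) = {p:.3f}")
```

With these numbers the result is about 0.161: even after a positive test, the chance of having the condition is only around 16%, because false positives from the large healthy group outnumber true positives.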
Week 11-12: Machine Learning
- ML fundamentals
- First models with Scikit-learn
- Project: Iris classification
- Model evaluation and metrics
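
The ML Practice module lists MAE and RMSE among its evaluation topics. As a rough sketch of what those two metrics compute (scikit-learn ships its own implementations; the function names and sample values here are illustrative only):

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 13.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```

Here MAE is 1.5 while RMSE is about 2.12; the gap comes from the single prediction that is off by 4, which RMSE weights more heavily.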
| Project | Description | Tech Stack |
|---|---|---|
| Social Network Recommendation System | Build algorithms similar to Facebook's "People You May Know" feature. | Python, JSON, Graph Algorithms |
| Web Scraping Pipeline | Scrape 49 pages of book data from an online bookstore. Output: structured CSV with titles, prices, ratings. | Requests, BeautifulSoup, Pandas |
| Machine Learning Project | Train and evaluate ML models on the classic Iris dataset. | Scikit-learn, NumPy, Pandas |
| Pandas Practice Projects | Analyze real-world datasets with advanced techniques. | Pandas, Matplotlib, Seaborn |

| Category | Tools |
|---|---|
| Language | Python 3.11+ |
| Data Analysis | NumPy, Pandas |
| Visualization | Matplotlib, Seaborn |
| Web Scraping | Requests, BeautifulSoup4 |
| Database | MySQL |
| Machine Learning | Scikit-learn |
| IDE | Jupyter Notebook, VS Code |

Use this checklist to track your learning journey:
- Introduction to Data Science
- Python Fundamentals (18 notebooks)
- NumPy Mastery (5 notebooks)
- Pandas Deep Dive (2 notebooks)
- Data Visualization (8 notebooks)
- Web Scraping (2 notebooks)
- SQL & Databases (20 tutorials)
- Probability & Statistics
- Machine Learning Introduction
- Scikit-learn Basics
- ML Algorithm Types
- ML Practice (5+ notebooks)
- Coders of Delhi - Social Network
- Book Data Scraper
- Iris Classification
- Data Analysis Projects
- Completed first 50 notebooks
- Built 3 portfolio projects
- Trained first ML model
- Contributed to the repo

Questions? Suggestions? Want to collaborate?
Feel free to open an issue or reach out directly!
We welcome contributions from the community! Here's how you can help:
- Report Bugs: Found an error? Let us know!
- Suggest Features: Have ideas for new content?
- Improve Documentation: Help make explanations clearer
- Add Examples: Share your own projects and solutions
- Translate: Help make content accessible in other languages

- Fork this repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request

This project is licensed under the MIT License - see the LICENSE file for details.
TL;DR: You can use, modify, and distribute this content freely. Attribution appreciated!
If this repository helped you in your data science journey:
- Star this repository
- Fork it for your own learning
- Share with fellow learners
- Spread the word on social media

- Inspired by various data science courses and bootcamps
- Built with passion for the data science community
- Thanks to all contributors and learners