Name	Name	Last commit message	Last commit date
parent directory ..
01_top_demanded_skills.sql	01_top_demanded_skills.sql
02_top_paying_skills.sql	02_top_paying_skills.sql
03_optimal_skills.sql	03_optimal_skills.sql
README.md	README.md

🔍 Exploratory Data Analysis w/ SQL: Job Market Analytics

A SQL project analyzing the data engineer job market using real world job posting data. It demonstrates my ability to write production-quality analytical SQL, design efficient queries, and turn business questions into data-driven insights.

🧾 Executive Summary (For Hiring Managers)

✅ Project scope: Built 3 analytical queries that answer key questions about the data engineer job market
✅ Data modeling: Used multi-table joins across fact and dimension tables to extract insights
✅ Analytics: Applied aggregations, filtering, and sorting to find top skills by demand, salary, and overall value
✅ Outcomes: Delivered actionable insights on SQL/Python dominance, cloud trends, and salary patterns

If you only have a minute, review these:

01_top_demanded_skills.sql – demand analysis with multi-table joins
02_top_paying_skills.sql – salary analysis with aggregations
03_optimal_skills.sql – combined demand/salary optimization query

🧩 Problem & Context

Job market analysts need to answer questions like:

🎯 Most in-demand: Which skills are most in-demand for data engineers?
💰 Highest paid: Which skills command the highest salaries?
⚖️ Best trade-off: What is the optimal skill set balancing demand and compensation?

This project analyzes a data warehouse built using a star schema design. The warehouse structure consists of:

Fact Table: job_postings_fact - Central table containing job posting details (job titles, locations, salaries, dates, etc.)
Dimension Tables:
- company_dim - Company information linked to job postings
- skills_dim - Skills catalog with skill names and types
Bridge Table: skills_job_dim - Resolves the many-to-many relationship between job postings and skills

By querying across these interconnected tables, I extracted insights about skill demand, salary patterns, and optimal skill combinations for data engineering roles.

🧰 Tech Stack

🐤 Query Engine: DuckDB for fast OLAP-style analytical queries
🧮 Language: SQL (ANSI-style with analytical functions)
📊 Data Model: Star schema with fact + dimension + bridge tables
🛠️ Development: VS Code for SQL editing + Terminal for DuckDB CLI
📦 Version Control: Git/GitHub for versioned SQL scripts

📂 Repository Structure

1_EDA/
├── 01_top_demanded_skills.sql    # Demand analysis query
├── 02_top_paying_skills.sql      # Salary analysis query
├── 03_optimal_skills.sql         # Combined demand/salary optimization
└── README.md                     # You are here

🏗 Analysis Overview

Query Structure

Top Demanded Skills – Identifies the 10 most in-demand skills for remote data engineer positions
Top Paying Skills – Analyzes the 25 highest-paying skills with salary and demand metrics
Optimal Skills – Calculates an optimal score using natural log of demand combined with median salary to identify the most valuable skills to learn

Key Insights

🧠 Core languages: SQL and Python each appear in ~29,000 job postings, making them the most demanded skills
☁️ Cloud platforms: AWS and Azure are critical for modern data engineering roles-
🧱 Infra & tooling: Kubernetes, Docker, and Terraform are associated with premium salaries
🔥 Big data tools: Apache Spark shows strong demand with competitive compensation

💻 SQL Skills Demonstrated

Query Design & Optimization

Complex Joins: Multi-table INNER JOIN operations across job_postings_fact, skills_job_dim, and skills_dim
Aggregations: COUNT(), MEDIAN(), ROUND() for statistical analysis
Filtering: Boolean logic with WHERE clauses and multiple conditions (job_title_short, job_work_from_home, salary_year_avg IS NOT NULL)
Sorting & Limiting: ORDER BY with DESC and LIMIT for top-N analysis

Data Analysis Techniques

Grouping: GROUP BY for categorical analysis by skill
Mathematical Functions: LN() for natural logarithm transformation to normalize demand metrics
Calculated Metrics: Derived optimal score combining log-transformed demand with median salary
HAVING Clause: Filtering aggregated results (skills with >= 100 postings)
NULL Handling: Proper filtering of incomplete records (salary_year_avg IS NOT NULL)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

🔍 Exploratory Data Analysis w/ SQL: Job Market Analytics

🧾 Executive Summary (For Hiring Managers)

🧩 Problem & Context

🧰 Tech Stack

📂 Repository Structure

🏗 Analysis Overview

Query Structure

Key Insights

💻 SQL Skills Demonstrated

Query Design & Optimization

Data Analysis Techniques

FilesExpand file tree

1_EDA

Directory actions

More options

Directory actions

More options

Latest commit

History

1_EDA

Folders and files

parent directory

README.md

🔍 Exploratory Data Analysis w/ SQL: Job Market Analytics

🧾 Executive Summary (For Hiring Managers)

🧩 Problem & Context

🧰 Tech Stack

📂 Repository Structure

🏗 Analysis Overview

Query Structure

Key Insights

💻 SQL Skills Demonstrated

Query Design & Optimization

Data Analysis Techniques