A SQL project analyzing the data engineer job market using real-world job posting data. It demonstrates my ability to write production-quality analytical SQL, design efficient queries, and turn business questions into data-driven insights.
- ✅ Project scope: Built 3 analytical queries that answer key questions about the data engineer job market
- ✅ Data modeling: Used multi-table joins across fact and dimension tables to extract insights
- ✅ Analytics: Applied aggregations, filtering, and sorting to find top skills by demand, salary, and overall value
- ✅ Outcomes: Delivered actionable insights on SQL/Python dominance, cloud trends, and salary patterns
If you only have a minute, review these:
- `01_top_demanded_skills.sql` – demand analysis with multi-table joins
- `02_top_paying_skills.sql` – salary analysis with aggregations
- `03_optimal_skills.sql` – combined demand/salary optimization query
Job market analysts need to answer questions like:
- 🎯 Most in-demand: Which skills are most in-demand for data engineers?
- 💰 Highest paid: Which skills command the highest salaries?
- ⚖️ Best trade-off: What is the optimal skill set balancing demand and compensation?
This project analyzes a data warehouse built using a star schema design. The warehouse structure consists of:
- Fact Table:
  - `job_postings_fact` – central table containing job posting details (job titles, locations, salaries, dates, etc.)
- Dimension Tables:
  - `company_dim` – company information linked to job postings
  - `skills_dim` – skills catalog with skill names and types
- Bridge Table:
  - `skills_job_dim` – resolves the many-to-many relationship between job postings and skills
By querying across these interconnected tables, I extracted insights about skill demand, salary patterns, and optimal skill combinations for data engineering roles.
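The core join pattern runs from the fact table through the bridge to the skills dimension. A minimal sketch of the demand query, assuming the column names described above (`job_id`, `skill_id`, `skills`, `job_title_short`, `job_work_from_home`) — the exact script lives in `01_top_demanded_skills.sql`:

```sql
-- Star-schema join: fact → bridge → dimension.
-- Column names are assumed from the schema description; the real script may differ.
SELECT
    skills_dim.skills,
    COUNT(skills_job_dim.job_id) AS demand_count
FROM job_postings_fact
INNER JOIN skills_job_dim ON job_postings_fact.job_id = skills_job_dim.job_id
INNER JOIN skills_dim     ON skills_job_dim.skill_id = skills_dim.skill_id
WHERE job_postings_fact.job_title_short = 'Data Engineer'
  AND job_postings_fact.job_work_from_home = TRUE
GROUP BY skills_dim.skills
ORDER BY demand_count DESC
LIMIT 10;
```

The bridge table is what makes this work: each job posting maps to many skills and each skill to many postings, so counting bridge rows per skill yields demand.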
- 🐤 Query Engine: DuckDB for fast OLAP-style analytical queries
- 🧮 Language: SQL (ANSI-style with analytical functions)
- 📊 Data Model: Star schema with fact + dimension + bridge tables
- 🛠️ Development: VS Code for SQL editing + Terminal for DuckDB CLI
- 📦 Version Control: Git/GitHub for versioned SQL scripts
1_EDA/
├── 01_top_demanded_skills.sql # Demand analysis query
├── 02_top_paying_skills.sql # Salary analysis query
├── 03_optimal_skills.sql # Combined demand/salary optimization
└── README.md # You are here
- Top Demanded Skills – Identifies the 10 most in-demand skills for remote data engineer positions
- Top Paying Skills – Analyzes the 25 highest-paying skills with salary and demand metrics
- Optimal Skills – Calculates an optimal score using natural log of demand combined with median salary to identify the most valuable skills to learn
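One way the optimal score described above could be computed — a hedged sketch, not the exact formula in `03_optimal_skills.sql`. `LN()` dampens raw demand so a skill's score reflects both popularity and pay rather than posting volume alone:

```sql
-- Sketch of the optimal-score idea: log-transformed demand × median salary.
-- Table and column names are assumed from the schema description.
SELECT
    skills_dim.skills,
    COUNT(skills_job_dim.job_id) AS demand_count,
    ROUND(MEDIAN(job_postings_fact.salary_year_avg)) AS median_salary,
    ROUND(LN(COUNT(skills_job_dim.job_id))
          * MEDIAN(job_postings_fact.salary_year_avg)) AS optimal_score
FROM job_postings_fact
INNER JOIN skills_job_dim ON job_postings_fact.job_id = skills_job_dim.job_id
INNER JOIN skills_dim     ON skills_job_dim.skill_id = skills_dim.skill_id
WHERE job_postings_fact.job_title_short = 'Data Engineer'
  AND job_postings_fact.salary_year_avg IS NOT NULL
GROUP BY skills_dim.skills
HAVING COUNT(skills_job_dim.job_id) >= 100
ORDER BY optimal_score DESC;
```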
- 🧠 Core languages: SQL and Python each appear in ~29,000 job postings, making them the most demanded skills
- ☁️ Cloud platforms: AWS and Azure are critical for modern data engineering roles
- 🧱 Infra & tooling: Kubernetes, Docker, and Terraform are associated with premium salaries
- 🔥 Big data tools: Apache Spark shows strong demand with competitive compensation
- Complex Joins: multi-table `INNER JOIN` operations across `job_postings_fact`, `skills_job_dim`, and `skills_dim`
- Aggregations: `COUNT()`, `MEDIAN()`, `ROUND()` for statistical analysis
- Filtering: Boolean logic with `WHERE` clauses and multiple conditions (`job_title_short`, `job_work_from_home`, `salary_year_avg IS NOT NULL`)
- Sorting & Limiting: `ORDER BY` with `DESC` and `LIMIT` for top-N analysis
- Grouping: `GROUP BY` for categorical analysis by skill
- Mathematical Functions: `LN()` for natural logarithm transformation to normalize demand metrics
- Calculated Metrics: derived optimal score combining log-transformed demand with median salary
- HAVING Clause: filtering aggregated results (skills with >= 100 postings)
- NULL Handling: proper filtering of incomplete records (`salary_year_avg IS NOT NULL`)
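Several of these techniques come together in the salary analysis. A minimal sketch in the spirit of `02_top_paying_skills.sql`, assuming the schema's table and column names; `MEDIAN()` is used instead of `AVG()` because it resists outlier salaries, and the `IS NOT NULL` filter drops postings without salary data:

```sql
-- Salary analysis sketch: median pay per skill, highest first.
-- Names are assumed from the schema description, not copied from the script.
SELECT
    skills_dim.skills,
    COUNT(skills_job_dim.job_id) AS demand_count,
    ROUND(MEDIAN(job_postings_fact.salary_year_avg)) AS median_salary
FROM job_postings_fact
INNER JOIN skills_job_dim ON job_postings_fact.job_id = skills_job_dim.job_id
INNER JOIN skills_dim     ON skills_job_dim.skill_id = skills_dim.skill_id
WHERE job_postings_fact.job_title_short = 'Data Engineer'
  AND job_postings_fact.salary_year_avg IS NOT NULL
GROUP BY skills_dim.skills
ORDER BY median_salary DESC
LIMIT 25;
```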

