A SQL project analyzing the data engineer job market using real-world job posting data. It demonstrates my ability to write production-quality analytical SQL, design efficient queries, and turn business questions into data-driven insights.
- Project scope: Built 3 analytical queries that answer key questions about the data engineer job market
- Data modeling: Used multi-table joins across fact and dimension tables to extract insights
- Analytics: Applied aggregations, filtering, and sorting to find top skills by demand, salary, and overall value
- Outcomes: Delivered actionable insights on SQL/Python dominance, cloud trends, and salary patterns
If you only have a minute, review these:
- `01_top_demanded_skills.sql` → demand analysis with multi-table joins
- `02_top_paying_skills.sql` → salary analysis with aggregations
- `03_optimal_skills.sql` → combined demand/salary optimization query
Job market analysts need to answer questions like:
- Most in-demand: Which skills are most in-demand for data engineers?
- Highest paid: Which skills command the highest salaries?
- Best trade-off: What is the optimal skill set balancing demand and compensation?
This project analyzes a data warehouse built using a star schema design. The warehouse structure consists of:
- Fact table: `job_postings_fact`, the central table containing job posting details (job titles, locations, salaries, dates, etc.)
- Dimension tables: `company_dim` (company information linked to job postings) and `skills_dim` (skills catalog with skill names and types)
- Bridge table: `skills_job_dim`, which resolves the many-to-many relationship between job postings and skills
By querying across these interconnected tables, I extracted insights about skill demand, salary patterns, and optimal skill combinations for data engineering roles.
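A representative query over this schema might look like the following sketch. The join keys (`job_id`, `skill_id`) and the `skills` name column are assumptions inferred from the schema description, not copied from the project files:

```sql
-- Sketch: count postings per skill by joining fact -> bridge -> dimension.
-- Column names job_id, skill_id, and skills are assumed.
SELECT
    sd.skills,
    COUNT(*) AS demand_count
FROM job_postings_fact AS jpf
INNER JOIN skills_job_dim AS sjd ON jpf.job_id = sjd.job_id
INNER JOIN skills_dim AS sd ON sjd.skill_id = sd.skill_id
WHERE jpf.job_title_short = 'Data Engineer'
GROUP BY sd.skills
ORDER BY demand_count DESC
LIMIT 10;
```

The bridge table makes the fact-to-skill relationship many-to-many, so each posting contributes one row per associated skill.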
- Query Engine: DuckDB for fast OLAP-style analytical queries
- Language: SQL (ANSI-style with analytical functions)
- Data Model: Star schema with fact + dimension + bridge tables
- Development: VS Code for SQL editing + Terminal for DuckDB CLI
- Version Control: Git/GitHub for versioned SQL scripts
1_EDA/
├── 01_top_demanded_skills.sql   # Demand analysis query
├── 02_top_paying_skills.sql     # Salary analysis query
├── 03_optimal_skills.sql        # Combined demand/salary optimization
└── README.md                    # You are here
- Top Demanded Skills → Identifies the 10 most in-demand skills for remote data engineer positions
- Top Paying Skills → Analyzes the 25 highest-paying skills with salary and demand metrics
- Optimal Skills → Calculates an optimal score using the natural log of demand combined with median salary to identify the most valuable skills to learn
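The optimal-score idea can be sketched roughly as below. The exact weighting, join keys, and column names are assumptions for illustration, not the project's actual code:

```sql
-- Sketch: combine log-transformed demand with median salary into one score.
-- The multiplication of LN(demand) by median salary is an assumed weighting.
SELECT
    sd.skills,
    COUNT(*) AS demand_count,
    MEDIAN(jpf.salary_year_avg) AS median_salary,
    ROUND(LN(COUNT(*)) * MEDIAN(jpf.salary_year_avg), 0) AS optimal_score
FROM job_postings_fact AS jpf
INNER JOIN skills_job_dim AS sjd ON jpf.job_id = sjd.job_id
INNER JOIN skills_dim AS sd ON sjd.skill_id = sd.skill_id
WHERE jpf.job_title_short = 'Data Engineer'
  AND jpf.salary_year_avg IS NOT NULL
GROUP BY sd.skills
HAVING COUNT(*) >= 100
ORDER BY optimal_score DESC;
```

The log transform compresses the huge demand gap between ubiquitous skills (SQL, Python) and niche ones, so a rare but highly paid skill can still rank well.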
- Core languages: SQL and Python each appear in ~29,000 job postings, making them the most demanded skills
- Cloud platforms: AWS and Azure are critical for modern data engineering roles
- Infra & tooling: Kubernetes, Docker, and Terraform are associated with premium salaries
- Big data tools: Apache Spark shows strong demand with competitive compensation
- Complex Joins: Multi-table `INNER JOIN` operations across `job_postings_fact`, `skills_job_dim`, and `skills_dim`
- Aggregations: `COUNT()`, `MEDIAN()`, `ROUND()` for statistical analysis
- Filtering: Boolean logic with `WHERE` clauses and multiple conditions (`job_title_short`, `job_work_from_home`, `salary_year_avg IS NOT NULL`)
- Sorting & Limiting: `ORDER BY` with `DESC` and `LIMIT` for top-N analysis
- Grouping: `GROUP BY` for categorical analysis by skill
- Mathematical Functions: `LN()` for natural logarithm transformation to normalize demand metrics
- Calculated Metrics: Derived optimal score combining log-transformed demand with median salary
- HAVING Clause: Filtering aggregated results (skills with >= 100 postings)
- NULL Handling: Proper filtering of incomplete records (`salary_year_avg IS NOT NULL`)
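Several of these techniques combine naturally in a salary-oriented query like the sketch below; join keys and the `skills` column name are assumptions, and `MEDIAN()` is a DuckDB aggregate:

```sql
-- Sketch: median salary per skill for remote data engineer postings,
-- filtering out rows with missing salaries before aggregating.
SELECT
    sd.skills,
    ROUND(MEDIAN(jpf.salary_year_avg), 0) AS median_salary
FROM job_postings_fact AS jpf
INNER JOIN skills_job_dim AS sjd ON jpf.job_id = sjd.job_id
INNER JOIN skills_dim AS sd ON sjd.skill_id = sd.skill_id
WHERE jpf.job_title_short = 'Data Engineer'
  AND jpf.job_work_from_home = TRUE
  AND jpf.salary_year_avg IS NOT NULL
GROUP BY sd.skills
ORDER BY median_salary DESC
LIMIT 25;
```

Note the `IS NOT NULL` filter runs before aggregation, so skills whose postings never list a salary drop out entirely rather than skewing the medians.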

