This project demonstrates the implementation of a complete data engineering pipeline that collects weather data, processes it through an ETL process, and stores it in a PostgreSQL database. The project leverages Python, PostgreSQL, and Docker to handle and expose weather data for cities in Brazil. The pipeline fetches weather data, transforms it, and loads it into a data warehouse designed to store and analyze temperature data over time.
The architecture follows the ETL (Extract, Transform, Load) pattern, utilizing the following components:
- Data Extraction: Weather data is fetched via the Visual Crossing Weather API for multiple Brazilian cities.
- Data Transformation: The data is cleaned and formatted in Python using Pandas before being inserted into PostgreSQL.
- Data Loading: Data is loaded into a PostgreSQL database using SQL, where it is organized into multiple tables with a star schema design.
- Docker: The entire environment, including the etl app and PostgreSQL database, is containerized using Docker.
The PostgreSQL database schema is designed to store weather data in a star schema format with the following key tables:
- dim.city: Stores information about cities.
- dim.conditions: Stores weather conditions (e.g., "sunny", "cloudy").
- dim.country: Stores country information (only Brazil in this case).
- dim.date: Stores date-related information (e.g., day, month, weekday, is_weekend).
- fact.temperature: Stores the actual temperature measurements for each city on a given date and condition.
- stage.weathervc: This table holds the raw weather data fetched from the Visual Crossing API before it is transformed and loaded into the main tables.
- Several stored procedures are used to load data into dimension tables (
dim.city,dim.conditions,dim.country,dim.date) and merge the temperature data into the fact table (fact.temperature).
The dimensional model in this project follows a star schema, which is ideal for analytical workloads and reporting. It separates the data into dimension tables (describing the context) and a fact table (storing measurable events). This approach improves query performance and supports easy filtering, aggregation, and slicing of data.
- Each temperature reading in the
fact.temperaturetable is linked to:- a specific city in
dim.city, which is related to adim.country - a specific date in
dim.date, which includes calendar attributes - a weather condition in
dim.conditions, which includes descriptive information
- a specific city in
- The
country_name,city_name,condition_name, andfull_datefields are unique in their respective tables to avoid duplication. - The combination of
date_idandcity_idis unique in the fact table, ensuring that each city has only one temperature record per date.
This model enables the generation of time-series analyses, condition-based aggregations, and city-wise comparisons with high flexibility and performance.
A unified query (also used to power visualizations) joins all tables to produce complete weather facts enriched by date, location, and conditions.
The ETL process is orchestrated in Python, where:
- Data Extraction: Weather data is fetched using the Visual Crossing API for a predefined list of Brazilian capitals.
- Data Transformation: The raw data is cleaned, and necessary transformations are applied (e.g., ensuring data integrity, formatting the data properly).
- Data Loading: The data is inserted into the
stage.weathervctable in PostgreSQL, and then dimension and fact tables are populated using stored procedures that handle the data transformations and merging.
-
main.py: This file is responsible for:- Fetching weather data using the API.
- Inserting the data into the staging table (
stage.weathervc). - Calling stored procedures to load data into dimension tables and merge into the fact table.
-
Stored Procedures: These are PL/pgSQL scripts in the
database_metadata.sqlfile that manage the data transformation and insertion logic for the various tables.
The project is containerized using Docker to ensure easy deployment and environment consistency. The Dockerfile sets up the necessary environment for running the application.
- It uses the official Python image to install required Python dependencies.
- The PostgreSQL instance runs locally within the Docker container, ensuring that the application can interact with the database seamlessly.
To run the project locally using Docker, follow these steps:
-
Build the Docker image:
docker build -t weather-etl . -
Run the Docker container:
You can create a .env file to execute the docker locally in a proper way like this.
VC_API_KEY=your visual crossing key DB_HOST=your pg database DB_PORT=5432 DB_USER=your db user DB_PASSWORD=your db password DB_NAME=database name
For local executions only
docker run -it weather-etl bash
To input the information from a env file
docker run --env-file .env weather-etl
-
The Streamlit Showcase:
- You can visit the Streamlit app for Data Visualization here https://weatherstreamlit.onrender.com/
- Source code for that https://github.com/dbconsultoria/weatherstreamlit
This project is deployed on Render.com for cloud hosting. The deployment process is automated using Docker, making it easy to deploy the application and database on Render's cloud infrastructure. The API app is accessible via a public URL and the PostgreSQL database is managed within the Render environment.
The request app exposes the following endpoints:
- GET
/weather/: Fetches weather data for all cities. - GET
/weather/{city}: Fetches weather data for a specific city.
For detailed API documentation, visit the /docs endpoint after running the app locally or on Render.
- Docker
- Python 3.9+
- PostgreSQL
This project demonstrates how to create a complete ETL pipeline, manage data with a star schema in PostgreSQL, and expose an API to access the processed data. It serves as an excellent example of building scalable data engineering solutions with Python, PostgreSQL and Docker.
This README.md is structured to provide a clear explanation of the project components, data model, and how to run the project both locally and in the cloud. Let me know if you need further adjustments!
