A demonstration project for a reactive data pipeline built with Dagster, showcasing modern data orchestration patterns through ETF data collection and analysis.

This project accompanies the talk "The rise of Dataset and its applications" at the Grill the data event.
This project is configured for development using Dev Containers, providing a consistent development environment across different machines.
Choose one of the following development environments:
- Option 1: Gitpod (Recommended)
- Option 2: VS Code with Dev Containers
- Option 3: Local Development
- Python 3.8+
- uv package manager
1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd reactive_data_pipeline
   ```

2. Open in VS Code:

   ```bash
   code .
   ```

3. When prompted, click "Reopen in Container", or use the Command Palette:
   - Press `Ctrl+Shift+P` (or `Cmd+Shift+P` on Mac)
   - Type "Dev Containers: Reopen in Container"
   - Select the option

4. The container will build automatically and install all dependencies using uv.
1. Install the uv package manager:

   ```bash
   curl -LsSf https://astral.sh/uv/install.sh | sh
   ```

2. Clone and set up the project:

   ```bash
   git clone <repository-url>
   cd reactive_data_pipeline
   ```

3. Install dependencies:

   ```bash
   uv sync
   ```

4. Create the Dagster home directory:

   ```bash
   mkdir dagster_home
   ```
The easiest way to run the application is the `dagster dev` command, which starts both the web UI and the daemon in a single process:

```bash
export DAGSTER_HOME="$(pwd)/dagster_home"
uv run dagster dev
```

The Dagster web interface will be available at http://localhost:3000.
If you prefer to run the components separately:

Start the Dagster web server:

```bash
export DAGSTER_HOME="$(pwd)/dagster_home"
uv run dagster-webserver
```

Start the Dagster daemon (required for schedules and sensors) in a separate terminal:

```bash
export DAGSTER_HOME="$(pwd)/dagster_home"
uv run dagster-daemon run
```

Common development commands:

```bash
# Install dependencies
uv sync

# Install development dependencies
uv sync --group dev

# Run tests
uv run pytest

# Format code
uv run black .

# Lint code
uv run flake8

# Run the standalone script
uv run python main.py

# Add new dependencies
uv add package-name

# Add development dependencies
uv add --group dev package-name
```

Project structure:

```
├── .devcontainer/            # Dev container configuration
├── dagster_repository/       # Main Dagster package
│   ├── assets/               # Asset definitions
│   ├── jobs.py               # Job definitions
│   ├── schedules.py          # Scheduled executions
│   ├── sensors.py            # Event-driven triggers
│   └── ...
├── dagster_repository_tests/ # Test suite
├── main.py                   # Standalone script
├── pyproject.toml            # Project configuration
└── uv.lock                   # Dependency lock file
```
- Reactive Data Pipeline: ETF data collection with event-driven processing
- Daily Partitioning: Time-series data management
- Scheduled Jobs: Automated data collection at 9:00 AM daily
- Sensors: Reactive triggers for downstream analysis
- SQLite Storage: Local database for development
- Modern Tooling: uv for fast dependency management
Container Issues:
- Rebuild the container: `Ctrl+Shift+P` → "Dev Containers: Rebuild Container"
- Check that Docker is running and has sufficient resources

Dependency Issues:
- Regenerate the lock file: `uv lock --upgrade`
- Clear the cache: `uv cache clean`

Dagster Issues:
- Clear the Dagster home: `rm -rf dagster_home && mkdir dagster_home`
- Check the logs in the Dagit interface
- Make changes in your dev container
- Run tests: `uv run pytest`
- Format code: `uv run black .`
- Lint: `uv run flake8`
- Commit and push your changes
The dev container ensures all contributors use the same Python version, dependencies, and development tools.