Reactive Data Pipeline

A demonstration project for a reactive data pipeline built with Dagster, showcasing modern data orchestration patterns through ETF data collection and analysis.

This project accompanies the talk "The rise of Dataset and its applications" at the Grill the Data event.

Development Setup with Dev Containers

This project is configured for development using Dev Containers, providing a consistent development environment across different machines.

Prerequisites

Choose one of the following development environments:

Option 1: Gitpod (Recommended)

  • Open the repository via the "Open in Gitpod" badge
  • Everything is pre-configured and ready to use

Option 2: VS Code with Dev Containers

  • Docker running locally
  • VS Code with the Dev Containers extension
Option 3: Local Development

  • Python 3.8+
  • uv package manager

Getting Started

Using Dev Containers (VS Code)

  1. Clone the repository:

    git clone <repository-url>
    cd reactive_data_pipeline
  2. Open in VS Code:

    code .
  3. When prompted, click "Reopen in Container" or use Command Palette:

    • Press Ctrl+Shift+P (or Cmd+Shift+P on Mac)
    • Type "Dev Containers: Reopen in Container"
    • Select the option
  4. The container will build automatically and install all dependencies using uv

Local Development Setup

  1. Install uv package manager:

    curl -LsSf https://astral.sh/uv/install.sh | sh
  2. Clone and set up the project:

    git clone <repository-url>
    cd reactive_data_pipeline
  3. Install dependencies:

    uv sync
  4. Create Dagster home directory:

    mkdir dagster_home

Running the Application

Start Dagster Development Server (Recommended)

The easiest way to run the application is using the dagster dev command, which starts both the web UI and daemon in a single process:

export DAGSTER_HOME="$(pwd)/dagster_home"
uv run dagster dev

The Dagster web interface will be available at http://localhost:3000
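Dagster expects `DAGSTER_HOME` to be an absolute path to an existing directory; the commands above handle this. For scripted setups, a small pre-flight check can catch a bad value before startup (a sketch using only the Python standard library; `check_dagster_home` is an invented helper, and the exact rules Dagster enforces may differ slightly):

```python
from pathlib import Path

def check_dagster_home(value: str) -> Path:
    """Validate a DAGSTER_HOME candidate: it must be an absolute path,
    and the directory is created if it does not exist yet."""
    path = Path(value)
    if not path.is_absolute():
        raise ValueError(f"DAGSTER_HOME must be absolute, got {value!r}")
    path.mkdir(exist_ok=True)  # mirrors `mkdir dagster_home` from the setup
    return path

home = check_dagster_home(str(Path.cwd() / "dagster_home"))
print(f"DAGSTER_HOME ready: {home}")
```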

Alternative: Start Components Separately

If you prefer to run components separately:

Start Dagster Web Server:

export DAGSTER_HOME="$(pwd)/dagster_home"
uv run dagster-webserver

Start Dagster Daemon (required for schedules and sensors) in a separate terminal:

export DAGSTER_HOME="$(pwd)/dagster_home"
uv run dagster-daemon run

Development Commands

# Install dependencies
uv sync

# Install development dependencies
uv sync --group dev

# Run tests
uv run pytest

# Format code
uv run black .

# Lint code
uv run flake8

# Run the standalone script
uv run python main.py

# Add new dependencies
uv add package-name

# Add development dependencies
uv add --group dev package-name

Project Structure

├── .devcontainer/           # Dev container configuration
├── dagster_repository/      # Main Dagster package
│   ├── assets/             # Asset definitions
│   ├── jobs.py             # Job definitions
│   ├── schedules.py        # Scheduled executions
│   ├── sensors.py          # Event-driven triggers
│   └── ...
├── dagster_repository_tests/ # Test suite
├── main.py                 # Standalone script
├── pyproject.toml          # Project configuration
└── uv.lock                 # Dependency lock file

Key Features

  • Reactive Data Pipeline: ETF data collection with event-driven processing
  • Daily Partitioning: Time-series data management
  • Scheduled Jobs: Automated data collection at 9:00 AM daily
  • Sensors: Reactive triggers for downstream analysis
  • SQLite Storage: Local database for development
  • Modern Tooling: uv for fast dependency management

Troubleshooting

Container Issues:

  • Rebuild container: Ctrl+Shift+P → "Dev Containers: Rebuild Container"
  • Check Docker is running and has sufficient resources

Dependency Issues:

  • Regenerate lock file: uv lock --upgrade
  • Clear cache: uv cache clean

Dagster Issues:

  • Clear Dagster home: rm -rf dagster_home && mkdir dagster_home
  • Check logs in the Dagster web UI

Contributing

  1. Make changes in your dev container
  2. Run tests: uv run pytest
  3. Format code: uv run black .
  4. Lint: uv run flake8
  5. Commit and push changes

The dev container ensures all contributors use the same Python version, dependencies, and development tools.
