WillTheProgrammer/poly_pipeline

Poly Pipeline

Data pipeline that fetches Polymarket data from official APIs and loads it into PostgreSQL.

Data Sources

  • Markets: Polymarket Gamma API (REST) - market metadata, questions, tokens
  • Orders: Goldsky GraphQL API - on-chain order fill events from the CLOB

Database Schema

The pipeline creates:

Core Tables:

  • markets - Market metadata (question, tokens, volume, etc.)
  • raw_orders - Raw order fill events from Goldsky
  • trades - Processed trades (joined with markets, normalized prices)
  • pipeline_state - Tracks ingestion progress for resumability

Materialized Views:

  • mv_trader_pnl_by_market - P&L per trader per market
  • mv_trader_stats - Aggregate trader statistics
  • mv_market_positions - Current positions per trader per market
  • mv_avg_prices - Average entry prices per market side
  • mv_sharp_traders - High-performing traders (20+ markets, 200%+ ROI)
  • mv_sharp_positions - Positions held by sharp traders

Quick Start

1. Set up PostgreSQL

You need a PostgreSQL database. Options:

  • Local: docker run -p 5432:5432 -e POSTGRES_PASSWORD=password postgres:16
  • Cloud: Supabase, Railway, Neon, etc.

2. Configure Environment

cp .env.example .env
# Edit .env with your DATABASE_URL
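
For the local Docker option above, the stock postgres image creates a `postgres` user and a `postgres` database, so the connection string would plausibly be `postgresql://postgres:password@localhost:5432/postgres`. The sketch below is one way to sanity-check a DATABASE_URL before running the pipeline. The helper name `check_database_url` is hypothetical and not part of this project, and which URL schemes the pipeline's driver accepts is not stated in this README.

```python
from urllib.parse import urlsplit


def check_database_url(url: str) -> dict:
    """Split a PostgreSQL URL into its parts and fail early on obvious mistakes."""
    parts = urlsplit(url)
    # Both spellings are common; the pipeline's driver may accept only one of them.
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("missing host")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is PostgreSQL's default port
        "database": parts.path.lstrip("/") or None,
    }


example_url = "postgresql://postgres:password@localhost:5432/postgres"
info = check_database_url(example_url)
print(info)
```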

3. Install Dependencies

uv sync

4. Initialize Database

uv run poly-pipeline init

5. Run the Pipeline

# Run everything (recommended for first run)
uv run poly-pipeline run-all

# Or run individual steps:
uv run poly-pipeline ingest-markets
uv run poly-pipeline ingest-orders
uv run poly-pipeline process-trades
uv run poly-pipeline refresh-views

CLI Commands

Command         Description
init            Initialize database schema and materialized views
ingest-markets  Fetch markets from Gamma API
ingest-orders   Fetch orders from Goldsky GraphQL
process-trades  Transform raw orders into trades
refresh-views   Refresh all materialized views
run-all         Run complete pipeline
status          Show database status and row counts

Options

  • --full-refresh: Start from the beginning instead of resuming from last position

# Resume from where we left off (default)
uv run poly-pipeline ingest-orders

# Start fresh from timestamp 0
uv run poly-pipeline ingest-orders --full-refresh

Resumability

The pipeline tracks progress in the pipeline_state table:

  • markets_last_offset: Last processed offset for markets
  • orders_last_timestamp: Last processed timestamp for orders

This allows the pipeline to resume from where it left off if interrupted.
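
The resume behavior described above can be sketched as a cursor-based loop. This is a minimal illustration under stated assumptions, not the project's implementation: `ingest_orders`, `fake_fetch`, and the in-memory `state` dict are hypothetical stand-ins for the real ingestion code, the Goldsky API, and the pipeline_state table.

```python
from typing import Callable, Dict, List

Order = Dict[str, int]


def ingest_orders(
    state: Dict[str, int],
    fetch_batch: Callable[[int, int], List[Order]],
    batch_size: int = 1000,
    full_refresh: bool = False,
) -> int:
    """Fetch orders newer than the stored cursor; return how many were ingested."""
    if full_refresh:
        state["orders_last_timestamp"] = 0  # start again from timestamp 0
    cursor = state.get("orders_last_timestamp", 0)
    total = 0
    while True:
        batch = fetch_batch(cursor, batch_size)
        if not batch:
            break
        total += len(batch)
        cursor = max(order["timestamp"] for order in batch)
        # Save the cursor after every batch so an interruption loses at most one batch.
        state["orders_last_timestamp"] = cursor
    return total


# In-memory stand-in for the remote API: seven orders with timestamps 1..7.
ALL_ORDERS = [{"timestamp": t} for t in range(1, 8)]


def fake_fetch(after: int, limit: int) -> List[Order]:
    """Return up to `limit` orders with a timestamp strictly greater than `after`."""
    return [o for o in ALL_ORDERS if o["timestamp"] > after][:limit]


state: Dict[str, int] = {}
first = ingest_orders(state, fake_fetch, batch_size=3)   # ingests all 7 orders
second = ingest_orders(state, fake_fetch, batch_size=3)  # nothing new, ingests 0
third = ingest_orders(state, fake_fetch, batch_size=3, full_refresh=True)  # 7 again
print(first, second, third, state)
```

One caveat of a strictly-greater timestamp cursor is that events sharing a timestamp across a batch boundary could be skipped; whether and how the actual pipeline handles that case is not described here.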

Scheduling

For production, schedule the pipeline to run periodically:

# Example cron job (every hour)
0 * * * * cd /path/to/poly_pipeline && uv run poly-pipeline run-all >> /var/log/poly-pipeline.log 2>&1

Architecture

Gamma API (REST)          Goldsky (GraphQL)
      │                         │
      ▼                         ▼
  markets table            raw_orders table
                                │
                                ▼
                           trades table
                                │
                                ▼
                        Materialized Views
                                │
                                ▼
                          PolySite App
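
The ordering implied by the diagram can be written down as a dependency graph over the CLI steps. This is only a sketch: the edge from ingest-markets to process-trades is inferred from the description of trades as "joined with markets", and the README does not say how run-all actually sequences its steps.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps that must finish before it (assumed edges).
DEPENDS_ON = {
    "ingest-markets": set(),
    "ingest-orders": set(),
    "process-trades": {"ingest-markets", "ingest-orders"},
    "refresh-views": {"process-trades"},
}

# static_order() yields every step after all of its prerequisites.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

The two ingest steps have no dependency on each other, so either may come first; process-trades and refresh-views always follow them in that order.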

Environment Variables

Variable                  Required  Default       Description
DATABASE_URL              Yes       -             PostgreSQL connection string
GAMMA_API_URL             No        (Polymarket)  Markets API endpoint
GOLDSKY_GRAPHQL_URL       No        (Goldsky)     Orders GraphQL endpoint
MARKETS_BATCH_SIZE        No        100           Markets per API call
ORDERS_BATCH_SIZE         No        1000          Orders per API call
MAX_RETRIES               No        3             Retry attempts on failure
RETRY_DELAY_SECONDS       No        5             Delay between retries
RATE_LIMIT_DELAY_SECONDS  No        60            Delay on rate limit
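
As a rough illustration of how these variables might be consumed, the sketch below loads them with the defaults from the table and applies MAX_RETRIES and RETRY_DELAY_SECONDS in a simple retry helper. `Settings`, `load_settings`, and `with_retries` are hypothetical names, not the project's API. The two URL variables are left out because their default values are not spelled out here, rate-limit handling is not covered, and treating MAX_RETRIES as the number of retries after the first attempt is an assumption.

```python
import os
import time
from dataclasses import dataclass
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Settings:
    database_url: str
    markets_batch_size: int = 100
    orders_batch_size: int = 1000
    max_retries: int = 3
    retry_delay_seconds: float = 5.0
    rate_limit_delay_seconds: float = 60.0


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, using the table's defaults."""
    if "DATABASE_URL" not in env:
        raise RuntimeError("DATABASE_URL is required")
    return Settings(
        database_url=env["DATABASE_URL"],
        markets_batch_size=int(env.get("MARKETS_BATCH_SIZE", 100)),
        orders_batch_size=int(env.get("ORDERS_BATCH_SIZE", 1000)),
        max_retries=int(env.get("MAX_RETRIES", 3)),
        retry_delay_seconds=float(env.get("RETRY_DELAY_SECONDS", 5)),
        rate_limit_delay_seconds=float(env.get("RATE_LIMIT_DELAY_SECONDS", 60)),
    )


def with_retries(
    call: Callable[[], T],
    settings: Settings,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`; on failure retry up to max_retries more times (assumed semantics)."""
    for attempt in range(settings.max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == settings.max_retries:
                raise  # out of retries: surface the last error
            sleep(settings.retry_delay_seconds)
    raise AssertionError("unreachable")


# Example with an explicit mapping instead of the real process environment.
settings = load_settings({"DATABASE_URL": "postgresql://localhost/postgres"})
print(settings.orders_batch_size, settings.max_retries)
```

Passing the environment and the sleep function as parameters keeps the helpers easy to test without touching real environment variables or waiting on real delays.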
