Data pipeline that fetches Polymarket data from official APIs and loads it into PostgreSQL.
- Markets: Polymarket Gamma API (REST) - market metadata, questions, tokens
- Orders: Goldsky GraphQL API - on-chain order fill events from the CLOB
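The Goldsky side is a timestamp-paginated GraphQL pull. A minimal sketch of the kind of request body involved is below; the query, field, and argument names are illustrative assumptions, not the exact Goldsky schema the pipeline uses.

```python
# Sketch of a timestamp-paginated GraphQL query for order fills.
# NOTE: entity and field names here are hypothetical examples,
# not the exact Goldsky CLOB subgraph schema.
ORDER_FILLS_QUERY = """
query OrderFills($lastTimestamp: BigInt!, $batchSize: Int!) {
  orderFilledEvents(
    first: $batchSize
    orderBy: timestamp
    orderDirection: asc
    where: { timestamp_gt: $lastTimestamp }
  ) {
    id
    timestamp
    maker
    taker
    makerAmountFilled
    takerAmountFilled
  }
}
"""

def build_payload(last_timestamp: int, batch_size: int = 1000) -> dict:
    """Assemble the JSON body for a POST to the GraphQL endpoint."""
    return {
        "query": ORDER_FILLS_QUERY,
        "variables": {"lastTimestamp": str(last_timestamp), "batchSize": batch_size},
    }
```

Paginating on `timestamp_gt` with an ascending sort is what lets ingestion pick up exactly where the previous run stopped.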
The pipeline creates:
Core Tables:
- `markets` - Market metadata (question, tokens, volume, etc.)
- `raw_orders` - Raw order fill events from Goldsky
- `trades` - Processed trades (joined with markets, normalized prices)
- `pipeline_state` - Tracks ingestion progress for resumability
Materialized Views:
- `mv_trader_pnl_by_market` - P&L per trader per market
- `mv_trader_stats` - Aggregate trader statistics
- `mv_market_positions` - Current positions per trader per market
- `mv_avg_prices` - Average entry prices per market side
- `mv_sharp_traders` - High-performing traders (20+ markets, 200%+ ROI)
- `mv_sharp_positions` - Positions held by sharp traders
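The "sharp trader" cutoff is the one concrete rule in the view list: 20+ markets traded and 200%+ ROI. The filter that `mv_sharp_traders` encodes can be sketched in plain Python (the dict shape and field names are illustrative; the real logic lives in SQL over `mv_trader_stats`):

```python
# Illustrative version of the mv_sharp_traders filter:
# a trader qualifies with 20+ markets traded AND 200%+ ROI.
def is_sharp(stats: dict) -> bool:
    return stats["markets_traded"] >= 20 and stats["roi_pct"] >= 200.0

traders = [
    {"address": "0xabc", "markets_traded": 35, "roi_pct": 412.0},  # qualifies
    {"address": "0xdef", "markets_traded": 8,  "roi_pct": 900.0},  # too few markets
    {"address": "0x123", "markets_traded": 50, "roi_pct": 30.0},   # ROI too low
]
sharp = [t["address"] for t in traders if is_sharp(t)]
```

Requiring both conditions filters out lucky one-off wins (high ROI, few markets) as well as high-volume traders with mediocre returns.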
You need a PostgreSQL database. Options:
- Local:

  ```shell
  docker run -p 5432:5432 -e POSTGRES_PASSWORD=password postgres:16
  ```

- Cloud: Supabase, Railway, Neon, etc.
```shell
cp .env.example .env
# Edit .env with your DATABASE_URL
uv sync
uv run poly-pipeline init

# Run everything (recommended for first run)
uv run poly-pipeline run-all
```
```shell
# Or run individual steps:
uv run poly-pipeline ingest-markets
uv run poly-pipeline ingest-orders
uv run poly-pipeline process-trades
uv run poly-pipeline refresh-views
```

| Command | Description |
|---|---|
| `init` | Initialize database schema and materialized views |
| `ingest-markets` | Fetch markets from Gamma API |
| `ingest-orders` | Fetch orders from Goldsky GraphQL |
| `process-trades` | Transform raw orders into trades |
| `refresh-views` | Refresh all materialized views |
| `run-all` | Run complete pipeline |
| `status` | Show database status and row counts |
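`run-all` is simply the four stages executed in dependency order. A sketch of that orchestration, with stub stage functions standing in for the real implementations (the function names mirror the CLI subcommands; the bodies are assumptions):

```python
# Stub stages standing in for the real implementations.
def ingest_markets():  # fetch market metadata from the Gamma API
    pass
def ingest_orders():   # fetch order fill events from Goldsky
    pass
def process_trades():  # join raw orders with markets into trades
    pass
def refresh_views():   # refresh each materialized view
    pass

def run_all() -> list[str]:
    """Run the full pipeline in dependency order, as `run-all` does.

    An exception in any stage stops the run, so later stages never
    see partially ingested inputs.
    """
    steps = [ingest_markets, ingest_orders, process_trades, refresh_views]
    for step in steps:
        step()
    return [s.__name__ for s in steps]
```

The ordering matters: trades can only be processed after both markets and raw orders are in place, and the views aggregate the trades table.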
- `--full-refresh`: Start from the beginning instead of resuming from the last position
```shell
# Resume from where we left off (default)
uv run poly-pipeline ingest-orders

# Start fresh from timestamp 0
uv run poly-pipeline ingest-orders --full-refresh
```

The pipeline tracks progress in the `pipeline_state` table:

- `markets_last_offset`: Last processed offset for markets
- `orders_last_timestamp`: Last processed timestamp for orders

This allows the pipeline to resume from where it left off if interrupted.
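The resume logic can be sketched as follows. A plain dict stands in for the `pipeline_state` table, and `fetch_orders_since` is a hypothetical fetcher; the real pipeline reads and writes the state row in PostgreSQL.

```python
# Sketch of resumable ingestion backed by pipeline_state.
# `state` is a dict standing in for the pipeline_state table;
# `fetch_orders_since` is a hypothetical fetcher yielding orders
# with a timestamp strictly greater than its argument.
def ingest_orders(state: dict, fetch_orders_since, full_refresh: bool = False) -> int:
    start = 0 if full_refresh else state.get("orders_last_timestamp", 0)
    count = 0
    for order in fetch_orders_since(start):
        # ... insert the order into raw_orders here ...
        # Advance the checkpoint as each order lands, so an
        # interrupted run resumes without re-fetching.
        state["orders_last_timestamp"] = order["timestamp"]
        count += 1
    return count
```

Passing `full_refresh=True` resets the checkpoint to timestamp 0, mirroring the `--full-refresh` flag.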
For production, schedule the pipeline to run periodically:
```shell
# Example cron job (every hour)
0 * * * * cd /path/to/poly_pipeline && uv run poly-pipeline run-all >> /var/log/poly-pipeline.log 2>&1
```

```
Gamma API (REST)     Goldsky (GraphQL)
        │                    │
        ▼                    ▼
  markets table       raw_orders table
        │                    │
        └─────────┬──────────┘
                  ▼
            trades table
                  │
                  ▼
         Materialized Views
                  │
                  ▼
           PolySite App
```
| Variable | Required | Default | Description |
|---|---|---|---|
| `DATABASE_URL` | Yes | - | PostgreSQL connection string |
| `GAMMA_API_URL` | No | (Polymarket) | Markets API endpoint |
| `GOLDSKY_GRAPHQL_URL` | No | (Goldsky) | Orders GraphQL endpoint |
| `MARKETS_BATCH_SIZE` | No | 100 | Markets per API call |
| `ORDERS_BATCH_SIZE` | No | 1000 | Orders per API call |
| `MAX_RETRIES` | No | 3 | Retry attempts on failure |
| `RETRY_DELAY_SECONDS` | No | 5 | Delay between retries |
| `RATE_LIMIT_DELAY_SECONDS` | No | 60 | Delay on rate limit |
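One way the table above could map to a typed settings object, using the documented defaults. The variable names match the table; the loading code itself is an illustrative sketch, not the pipeline's actual config module.

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    """Pipeline configuration; defaults match the environment table."""
    database_url: str
    markets_batch_size: int = 100
    orders_batch_size: int = 1000
    max_retries: int = 3
    retry_delay_seconds: int = 5
    rate_limit_delay_seconds: int = 60

def load_settings(env=None) -> Settings:
    env = os.environ if env is None else env
    url = env.get("DATABASE_URL")
    if not url:
        # DATABASE_URL is the only required variable.
        raise RuntimeError("DATABASE_URL is required")
    return Settings(
        database_url=url,
        markets_batch_size=int(env.get("MARKETS_BATCH_SIZE", 100)),
        orders_batch_size=int(env.get("ORDERS_BATCH_SIZE", 1000)),
        max_retries=int(env.get("MAX_RETRIES", 3)),
        retry_delay_seconds=int(env.get("RETRY_DELAY_SECONDS", 5)),
        rate_limit_delay_seconds=int(env.get("RATE_LIMIT_DELAY_SECONDS", 60)),
    )
```

Failing fast on a missing `DATABASE_URL` surfaces misconfiguration at startup rather than mid-ingestion.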