This dataset contains all active events and markets from Polymarket, fetched via their public REST API.
The dataset is stored as a JSON file (polymarket_dataset.json) with the following structure:
```json
{
  "events": [...],
  "markets": [...],
  "total_events": 7811,
  "total_markets": 30797
}
```

An event is a top-level object representing a question (e.g., "Who will win the 2024 Presidential Election?"). Each event object contains:
- `id`: Unique event identifier
- `slug`: URL-friendly identifier (e.g., "event-slug-name")
- `title`: Event question/title
- `description`: Detailed description of the event
- `image`: Image URL for the event
- `active`: Boolean indicating whether the event is currently active
- `closed`: Boolean indicating whether the event is closed
- `startDate`: ISO timestamp for the event start
- `endDate`: ISO timestamp for the event end
- `volume`: Total trading volume
- `volume24hr`: 24-hour trading volume
- `liquidity`: Current liquidity
- `markets`: Array of associated market objects (nested within the event)
- `tags`: Array of tag objects for categorization
- `series`: Series/grouping information (if applicable)
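Because each event carries its markets nested inside it, per-event summaries can be computed without joining against the top-level `markets` array. The sketch below ranks events by 24-hour volume. It is illustrative only: `top_events_by_volume` is a hypothetical helper, and it assumes `volume24hr` may arrive as a number, a numeric string, or be missing, so values are coerced defensively.

```python
from typing import Any


def _to_float(value: Any) -> float:
    """Convert a numeric field that may be a number, a numeric string, or missing."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return 0.0


def top_events_by_volume(events: list[dict[str, Any]], n: int = 10) -> list[tuple[str, float, int]]:
    """Return (title, 24h volume, number of nested markets) for the n busiest events."""
    ranked = sorted(events, key=lambda e: _to_float(e.get("volume24hr")), reverse=True)
    return [
        (e.get("title", ""), _to_float(e.get("volume24hr")), len(e.get("markets") or []))
        for e in ranked[:n]
    ]


# Example usage on a tiny hand-written sample (not real Polymarket data):
sample_events = [
    {"title": "Event A", "volume24hr": 1500.0, "markets": [{}, {}]},
    {"title": "Event B", "volume24hr": "2500.5", "markets": [{}]},
    {"title": "Event C"},  # missing volume and markets
]
for title, vol, n_markets in top_events_by_volume(sample_events, n=2):
    print(f"{title}: {vol:.2f} (24h), {n_markets} market(s)")
```

On the real file you would pass `dataset['events']` instead of the sample list.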
A market is a specific tradable binary outcome within an event. Each market object contains:
- `id`: Unique market identifier
- `slug`: URL-friendly identifier
- `question`: The market question
- `conditionId`: Condition identifier
- `questionId`: Question identifier
- `marketMakerAddress`: Market maker contract address
- `outcomes`: JSON string array of possible outcomes (e.g., `["Yes", "No"]`)
- `outcomePrices`: JSON string array of implied probabilities matching `outcomes` (e.g., `["0.20", "0.80"]`)
- `clobTokenIds`: Object mapping outcomes to CLOB token addresses for trading
- `enableOrderBook`: Boolean indicating whether the market can be traded via the CLOB
- `active`: Boolean indicating whether the market is currently active
- `closed`: Boolean indicating whether the market is closed
- `volume`: Total trading volume
- `volume24hr`: 24-hour trading volume
- `liquidity`: Current liquidity
- `event`: Reference to the parent event (contains the event `id`, `slug`, and `title`)
- `outcomes` and `outcomePrices` are stored as JSON strings and must be parsed before use; the parsed prices are strings such as `"0.20"` and need converting to numbers for arithmetic
- Prices represent implied probabilities and should sum to approximately 1.0 across all outcomes
- The two arrays map 1:1: index 0 of `outcomes` corresponds to index 0 of `outcomePrices`
- Markets can be binary (Yes/No) or multi-outcome (e.g., multiple candidates)
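The parsing rules above can be captured in a small helper. This is a sketch, not part of the dataset tooling: `parse_outcome_prices` and `price_sum_deviation` are hypothetical names, and converting the price strings to `float` is an assumption about how you want to consume them.

```python
import json
from typing import Any


def parse_outcome_prices(market: dict[str, Any]) -> dict[str, float]:
    """Map each outcome label to its implied probability.

    Both fields are JSON-encoded strings, e.g. '["Yes", "No"]' and '["0.20", "0.80"]'.
    """
    outcomes = json.loads(market["outcomes"])
    prices = [float(p) for p in json.loads(market["outcomePrices"])]
    # The arrays are expected to line up index by index.
    if len(outcomes) != len(prices):
        raise ValueError(f"outcomes/prices length mismatch in market {market.get('id')}")
    return dict(zip(outcomes, prices))


def price_sum_deviation(prices: dict[str, float]) -> float:
    """How far the implied probabilities are from summing to 1.0."""
    return sum(prices.values()) - 1.0


# Example with a hand-written record shaped like the fields described above:
market = {
    "id": "example",
    "outcomes": '["Yes", "No"]',
    "outcomePrices": '["0.20", "0.80"]',
}
parsed = parse_outcome_prices(market)
print(parsed)  # {'Yes': 0.2, 'No': 0.8}
print(abs(price_sum_deviation(parsed)) < 0.01)  # True
```

Raising on a length mismatch makes a violation of the 1:1 mapping visible instead of letting `zip` silently truncate.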
- Total Events: 7,811 active events
- Total Markets: 30,797 active markets
- File Size: 200+ MB (varies with current Polymarket activity)
Loading the dataset in Python:

```python
import json

with open('polymarket_dataset.json', 'r', encoding='utf-8') as f:
    dataset = json.load(f)

# Access events
events = dataset['events']
print(f"Total events: {dataset['total_events']}")

# Access markets
markets = dataset['markets']
print(f"Total markets: {dataset['total_markets']}")

# Parse market outcomes and prices (both are stored as JSON strings)
for market in markets[:5]:
    outcomes = json.loads(market['outcomes'])
    prices = json.loads(market['outcomePrices'])
    print(f"{market['question']}: {dict(zip(outcomes, prices))}")
```

Run the ingestor script to fetch fresh data:
```bash
python backend/ingestor/ingestor.py
```

This will:
- Fetch all active events from Polymarket API
- Fetch all active markets from Polymarket API
- Combine them into a single dataset
- Save to `polymarket_dataset.json`
Note: The fetch process may take several minutes depending on the number of active markets.
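The combine-and-save step might look like the following. This is a hypothetical sketch rather than the actual contents of `ingestor.py`: `build_dataset` and `save_dataset` are illustrative names, and it assumes `total_events` and `total_markets` are simply the lengths of the two arrays.

```python
import json
from pathlib import Path
from typing import Any


def build_dataset(events: list[dict[str, Any]], markets: list[dict[str, Any]]) -> dict[str, Any]:
    """Combine fetched events and markets into the documented top-level structure."""
    return {
        "events": events,
        "markets": markets,
        "total_events": len(events),
        "total_markets": len(markets),
    }


def save_dataset(dataset: dict[str, Any], path: str = "polymarket_dataset.json") -> Path:
    """Write the dataset as UTF-8 JSON and return the output path."""
    out = Path(path)
    out.write_text(json.dumps(dataset, ensure_ascii=False), encoding="utf-8")
    return out


# Example usage with placeholder records (a real run would use API responses):
dataset = build_dataset(events=[{"id": "e1"}], markets=[{"id": "m1"}, {"id": "m2"}])
print(dataset["total_events"], dataset["total_markets"])  # 1 2
```

Deriving the totals from the arrays keeps them consistent with the stored data rather than relying on a separately tracked count.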
Data is fetched from Polymarket's public REST API:
- Base URL: `https://gamma-api.polymarket.com`
- Endpoints: `/events` and `/markets`
- Authentication: None required (public API)
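A paginated fetch against these endpoints could be sketched as below. The query parameters (`active`, `closed`, `limit`, `offset`), the page size, the `User-Agent` value, and the premise that each response is a JSON array are all assumptions to verify against the API documentation; `http_get_page` and `fetch_all` are hypothetical names. The page fetcher is injectable so the pagination loop can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable

BASE_URL = "https://gamma-api.polymarket.com"
PAGE_SIZE = 100  # assumed page size; adjust to what the API accepts


def http_get_page(endpoint: str, offset: int, limit: int) -> list[dict[str, Any]]:
    """Fetch one page from an endpoint such as '/markets' (assumed query parameters)."""
    query = urllib.parse.urlencode(
        {"active": "true", "closed": "false", "limit": limit, "offset": offset}
    )
    req = urllib.request.Request(
        f"{BASE_URL}{endpoint}?{query}",
        headers={"User-Agent": "polymarket-dataset-example"},  # hypothetical UA string
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_all(
    endpoint: str,
    get_page: Callable[[str, int, int], list[dict[str, Any]]] = http_get_page,
    limit: int = PAGE_SIZE,
) -> list[dict[str, Any]]:
    """Page through an endpoint until a short or empty page signals the end."""
    results: list[dict[str, Any]] = []
    offset = 0
    while True:
        page = get_page(endpoint, offset, limit)
        results.extend(page)
        if len(page) < limit:
            return results
        offset += limit


# Example usage (performs network requests):
# events = fetch_all("/events")
# markets = fetch_all("/markets")
```

Stopping on a short page avoids an extra request in the common case; when the total is an exact multiple of the page size, the loop ends after one final empty page.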
This dataset is useful for:
- Arbitrage opportunity detection
- Market analysis and research
- Price trend analysis
- Market liquidity analysis
- Event categorization and filtering
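As an example of the arbitrage use case, a first-pass screen can flag markets whose implied probabilities do not sum to roughly 1.0. This is a naive sketch under stated assumptions: `flag_mispriced_markets` is a hypothetical helper, the tolerance and liquidity thresholds are arbitrary, and `outcomePrices` may not reflect executable order-book quotes, so any flagged market would still need checking against the CLOB, including fees and spreads.

```python
import json
from typing import Any


def flag_mispriced_markets(
    markets: list[dict[str, Any]],
    tolerance: float = 0.02,
    min_liquidity: float = 0.0,
) -> list[tuple[str, float]]:
    """Return (question, price sum) for open markets whose prices stray from 1.0.

    A naive screen only: it ignores fees, spreads, and order-book depth.
    """
    flagged = []
    for m in markets:
        try:
            prices = [float(p) for p in json.loads(m["outcomePrices"])]
            liquidity = float(m.get("liquidity") or 0.0)
        except (KeyError, TypeError, ValueError):
            continue  # skip records with missing or malformed fields
        # Only consider open markets with enough liquidity.
        if not m.get("active") or m.get("closed") or liquidity < min_liquidity:
            continue
        total = sum(prices)
        if abs(total - 1.0) > tolerance:
            flagged.append((m.get("question", ""), round(total, 4)))
    return flagged


# Hand-written sample records (not real Polymarket data):
sample = [
    {"question": "Balanced?", "outcomePrices": '["0.40", "0.60"]',
     "active": True, "closed": False, "liquidity": "5000"},
    {"question": "Overpriced?", "outcomePrices": '["0.55", "0.52"]',
     "active": True, "closed": False, "liquidity": "5000"},
    {"question": "Closed", "outcomePrices": '["0.10", "0.70"]',
     "active": True, "closed": True, "liquidity": "5000"},
]
print(flag_mispriced_markets(sample, tolerance=0.02, min_liquidity=1000))
# [('Overpriced?', 1.07)]
```

The same loop doubles as a liquidity filter: raising `min_liquidity` narrows the output to markets with deeper books.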