Turning raw Polymarket data into AI/ML-ready datasets using automated multi-agent pipelines.
This project demonstrates how to:
- Collect raw market data from the Polymarket Gamma API, for which no public dataset previously existed.
- Transform it into a machine-learning-ready dataset with financial features.
- Use AI agents to automatically discover and extract relevant external information about each market.
- Enable downstream ML tasks such as sentiment analysis, market analysis, and decision optimization.
A custom pipeline gathers comprehensive raw market data, including:
- General market information
- Hourly market price data
- Market volume and liquidity
- Daily trades
- Number of holders and their positions
- Daily order books
From this, additional financial scoring features are engineered for each market.
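As a rough illustration of this kind of feature engineering, the sketch below derives a few simple features from an hourly price series. The function `price_features` and the feature names (`hourly_return_mean`, `hourly_volatility`, `max_drawdown`, `last_price`) are assumptions made for this example; they are not necessarily the scoring features used in the dataset.

```python
from statistics import mean, pstdev


def price_features(hourly_prices):
    """Compute illustrative features from a list of hourly prices."""
    if len(hourly_prices) < 2:
        raise ValueError("need at least two price points")

    # Simple hour-over-hour returns; hours with a zero previous price are skipped.
    returns = [
        (curr - prev) / prev
        for prev, curr in zip(hourly_prices, hourly_prices[1:])
        if prev > 0
    ]

    # Maximum drawdown: largest drop from the running peak, relative to that peak.
    peak = hourly_prices[0]
    max_drawdown = 0.0
    for price in hourly_prices:
        peak = max(peak, price)
        if peak > 0:
            max_drawdown = max(max_drawdown, (peak - price) / peak)

    return {
        "hourly_return_mean": mean(returns) if returns else 0.0,
        "hourly_volatility": pstdev(returns) if returns else 0.0,
        "max_drawdown": max_drawdown,
        "last_price": hourly_prices[-1],
    }
```

Population standard deviation (`pstdev`) is used here only to keep the example short; a sample estimate or a rolling window may be more appropriate for real market data.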
➡️ The complete dataset is available on Kaggle.
A Search Strategist Agent (powered by crewAI) takes as input a market_question and market_description, then:
- Generates optimized queries using Tavily Search.
- Scores results by source credibility, direct relevance, timeline, and data quality.
- Ranks and outputs the Top 10 high-quality URLs for each market.
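The scoring and ranking steps above can be sketched as a weighted sum over the four criteria. In the project the agent itself assigns the scores; the weights, the `Candidate` class, and the `top_urls` helper below are hypothetical and only show one plausible way to combine per-criterion scores into a Top 10 list.

```python
from dataclasses import dataclass

# Hypothetical weights: the four criteria are named in the text,
# but how they are combined is an assumption of this sketch.
WEIGHTS = {
    "credibility": 0.35,
    "relevance": 0.35,
    "timeline": 0.15,
    "data_quality": 0.15,
}


@dataclass
class Candidate:
    url: str
    credibility: float   # each criterion is assumed to lie in [0, 1]
    relevance: float
    timeline: float
    data_quality: float


def combined_score(c: Candidate) -> float:
    """Weighted sum of the four criterion scores."""
    return sum(weight * getattr(c, name) for name, weight in WEIGHTS.items())


def top_urls(candidates, k=10):
    """Return the k highest-scoring URLs; duplicate URLs keep their best score."""
    best = {}
    for c in candidates:
        score = combined_score(c)
        if c.url not in best or score > best[c.url]:
            best[c.url] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [url for url, _ in ranked[:k]]
```

Deduplicating by URL before ranking matters because several queries for the same market often return the same article.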
Config details:
- Agent: search_strategist
- Task: url_discovery_task
- Tool: tavily_search.py
🔧 Agent Tool Execution
Agent: Search Strategist and URL Analyst
Thought: I have a comprehensive plan to analyze the market question regarding the 2024 Taiwanese presidential election. I've generated a list of targeted search queries to gather relevant information. Now, I need to execute these queries using the Tavily Search Tool to collect the URLs and their content for analysis.
Using Tool: Tavily Search Tool

Tool Input
"{\"queries\": [{\"query\": \"2024 Taiwan election polls Hou Yu-ih Ko Wen-je\"}, {\"query\": \"Taiwan presidential election Hou Yu-ih vs Ko Wen-je\"}, ... ]}"

Tool Output
{
  "results": [
    { "url": "https://apnews.com/article/taiwan-election-hou-talks...", "title": "Taiwan presidential hopeful Hou promises...", "score": 0.98524 },
    { "url": "https://international.thenewslens.com/feature/2024-taiwan-election...", "title": "Ko Overtakes Lai in Presidential Race...", "score": 0.98205 },
    ...
  ]
}
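The tool input and output in the log above follow a simple JSON shape: a `queries` list going in and a `results` list with `url`, `title`, and `score` coming out. The sketch below builds such an input string and parses such an output using only the standard library. The helper names and the `min_score` cutoff are assumptions for illustration, and no call to the Tavily service is made.

```python
import json


def build_tool_input(queries):
    """Serialize queries into the {"queries": [{"query": ...}, ...]} shape seen in the log."""
    return json.dumps({"queries": [{"query": q} for q in queries]})


def parse_results(raw_output, min_score=0.5):
    """Return (url, title, score) tuples sorted by descending score.

    min_score is a hypothetical cutoff, not a value taken from the project.
    Entries without a "url" key are ignored.
    """
    payload = json.loads(raw_output)
    rows = [
        (r["url"], r.get("title", ""), float(r.get("score", 0.0)))
        for r in payload.get("results", [])
        if "url" in r
    ]
    rows = [row for row in rows if row[2] >= min_score]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```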
A Structured Extractor Agent takes the top URLs from the Search Strategist and:
- Uses the WebsiteSearchTool + Google's Gemini (via RAG) to extract structured insights.
- Returns a structured JSON object per article, including:
  - Headline, date, author, summary, main text
  - Named entities, key events
  - Market mentions, causal statements, Wikipedia edit info
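One way to picture the per-article output is as a typed record whose fields mirror the items listed above. The `ArticleRecord` class, its field names, and the two helpers are illustrative assumptions; the project's actual JSON keys may differ.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class ArticleRecord:
    # Field names are illustrative and mirror the listed items.
    url: str
    headline: Optional[str] = None
    date: Optional[str] = None            # an ISO 8601 string is assumed
    author: Optional[str] = None
    summary: Optional[str] = None
    main_text: Optional[str] = None
    named_entities: List[str] = field(default_factory=list)
    key_events: List[str] = field(default_factory=list)
    market_mentions: List[str] = field(default_factory=list)
    causal_statements: List[str] = field(default_factory=list)
    wikipedia_edit_info: Optional[str] = None


def to_json(record: ArticleRecord) -> str:
    """Serialize a record to a JSON string, keeping non-ASCII text readable."""
    return json.dumps(asdict(record), ensure_ascii=False)


def from_json(text: str) -> ArticleRecord:
    """Build a record from JSON, ignoring keys the schema does not define."""
    data = json.loads(text)
    known = set(ArticleRecord.__dataclass_fields__)
    return ArticleRecord(**{k: v for k, v in data.items() if k in known})
```

Ignoring unknown keys in `from_json` is a deliberate choice for this sketch: LLM-based extractors sometimes return extra fields, and dropping them keeps downstream rows uniform.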
Config details:
- Agent: structured_extractor
- Task: extract_structured_articles_task
- Tool: rag_extract.py
🔧 Agent Tool Execution
Agent: Structured Data Extractor
Thought: I will attempt to extract the information from the second URL, as the first one returned a forbidden error. I will use the tool to search for the relevant information
Using Tool: Search in a specific website

Tool Input
"{\"search_query\": \"2024 Taiwanese Presidential Election Hou Yu-ih vs. Ko Wen-je\", \"website\": \"https://www.csis.org/...\"}"

Tool Output
Relevant Content: Taiwan's 2024 Presidential Election ... A three-way race between Lai (DPP), Hou Yu-ih (KMT), and Ko Wen-je (TPP).
Hou has little diplomatic experience; election framed as "war vs peace."
...
🚀 Crew: crew
├── 📋 Task: url_discovery_task → ✅ Completed using Tavily Search Tool
└── 📋 Task: extract_structured_articles_task → Executing (used Search in a specific website)

✅ Agent Final Answer
Taiwan's 2024 Elections: Results and Implications
- William Lai (DPP) won with ~40% of the vote.
- Legislative Yuan split: DPP 51, KMT 52, TPP 8 → no majority.
- Next four years: challenges in cross-Strait relations, U.S. support needed, domestic divisions persist.
...
- Add a sentiment analysis module for enriched market datasets.
- Build predictive ML models for price forecasting and strategy optimization.
- Extend the pipeline to other market APIs beyond Polymarket.