AgentQL Scraper is a developer-friendly project that demonstrates how to query, extract, and automate data from live websites using natural language. It showcases how AgentQL turns complex web pages into structured, usable data with minimal effort. This project is ideal for anyone exploring modern web automation powered by AI-driven queries.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for agentql-demo you've just found your team — Let’s Chat. 👆👆
This project provides a practical implementation of AgentQL for querying and automating interactions on real-world websites. It removes the friction of brittle selectors and constant script rewrites by relying on natural language queries that adapt as pages change.
It’s built for developers, data engineers, and automation teams who want reliable data extraction without spending hours maintaining scrapers.
- Uses natural language queries instead of fragile CSS or XPath selectors
- Works on dynamic, authenticated, and JavaScript-heavy websites
- Produces structured output directly from query definitions
- Adapts automatically to UI and layout changes over time
| Feature | Description |
|---|---|
| Natural Language Queries | Define what data you want using plain, human-readable queries. |
| Structured Data Output | Query shape directly controls the structure of returned data. |
| Cross-Site Compatibility | Reuse the same queries across similar websites. |
| Playwright Integration | Seamlessly automate browsers for complex workflows. |
| Resilient Automation | Continues working even when page layouts change. |
| Multiple SDK Support | Works with both Python and JavaScript environments. |
| Field Name | Field Description |
|---|---|
| query | The natural language query sent to the AgentQL engine. |
| results | Structured data returned based on the query definition. |
| metadata | Contextual information about the page and execution. |
| timestamps | Execution and data extraction time details. |
| status | Success or failure state of the query execution. |
[
{
"query": "Get all product names and prices",
"results": [
{
"name": "Wireless Headphones",
"price": "$129.99"
},
{
"name": "Bluetooth Speaker",
"price": "$79.00"
}
],
"status": "success",
"timestamp": 1723459200
}
]
AgentQL Demo/
├── src/
│ ├── runner.py
│ ├── queries/
│ │ └── sample_queries.py
│ ├── automation/
│ │ ├── browser.py
│ │ └── session.py
│ ├── outputs/
│ │ └── formatter.py
│ └── config/
│ └── settings.example.json
├── data/
│ └── sample_output.json
├── requirements.txt
└── README.md
- QA engineers use it to automate browser testing with self-healing selectors, reducing flaky tests.
- Data analysts use it to extract structured datasets from dynamic websites without manual cleanup.
- Automation teams use it to build workflows that survive frequent UI updates.
- Developers use it to prototype web integrations faster with minimal selector maintenance.
Does this scraper work on dynamic or JavaScript-heavy websites? Yes. The scraper is designed to work on fully rendered pages, including those built with modern JavaScript frameworks.
Do I need to maintain selectors when the website layout changes? No. Natural language queries adapt to UI changes, reducing or eliminating manual updates.
Which programming languages are supported? The project supports both Python and JavaScript through dedicated SDKs.
Can it handle authenticated or private pages? Yes. It supports authenticated sessions and can operate behind login flows.
Primary Metric: Average query execution completes within 2–4 seconds on complex pages.
Reliability Metric: Maintains over a 95% success rate across repeated runs on changing layouts.
Efficiency Metric: Handles multiple queries per session with minimal browser restarts, reducing resource usage.
Quality Metric: Consistently returns complete, structured datasets aligned with query definitions.
