Companion code for the Dev.to article "From Web Scraping Scripts to Web Data APIs: A Practical Python Guide" by Khalid Abdelaty.
This repository contains a fully working Python pipeline that demonstrates how to replace fragile scraping scripts with clean, API-driven data extraction using Olostep.
| Step | Description |
|---|---|
| Single-page scrape | Extract Markdown and HTML from any URL via `/v1/scrapes` |
| Batch processing | Submit hundreds of URLs in one request via `/v1/batches` |
| Polling & retrieval | Wait for batch completion and fetch content via `/v1/retrieve` |
| Structured extraction | Pull structured JSON using `llm_extract` |
| Web Q&A | Ask natural language questions grounded on live web data via `/v1/answers` |
| Retry logic | Production-ready retry with exponential backoff via `tenacity` |
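
The retry row in the table above relies on `tenacity`. To illustrate the underlying idea without any dependency, here is a minimal standard-library sketch of retry with exponential backoff. It is not `tenacity`'s API and not necessarily how `pipeline.py` is written; `retry_with_backoff` and `flaky_fetch` are hypothetical names, and the delays are assumed values chosen for the demo.

```python
import random
import time
from functools import wraps


def retry_with_backoff(max_attempts=4, base_delay=1.0, max_delay=30.0,
                       exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped call, doubling the wait after each failure."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Waits grow as 1x, 2x, 4x ... base_delay, capped at max_delay.
                    delay = min(base_delay * 2 ** (attempt - 1), max_delay)
                    # Up to 10% random jitter so parallel clients do not retry in lockstep.
                    sleep(delay + random.uniform(0, delay * 0.1))
        return wrapper
    return decorator


if __name__ == "__main__":
    attempts = {"n": 0}

    @retry_with_backoff(max_attempts=4, base_delay=0.1, exceptions=(ConnectionError,))
    def flaky_fetch():
        # Hypothetical stand-in for an HTTP call that fails twice, then succeeds.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise ConnectionError("simulated transient failure")
        return "ok"

    print(flaky_fetch(), "after", attempts["n"], "attempts")
```

Restricting `exceptions` to transient errors (timeouts, connection resets) is usually the safer choice, since retrying on every exception can hide real bugs such as a malformed request.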
- Python 3.9+
- An Olostep account — the free tier gives you 500 requests/month with no credit card required
1. Clone the repository

       git clone https://github.com/KhalidAbdelaty/olostep-python-guide.git
       cd olostep-python-guide

2. Install dependencies

       pip install -r requirements.txt

3. Configure your API key

       cp .env.example .env

   Open `.env` and replace `your_api_key_here` with your actual Olostep API key, which you can find in your dashboard.

       OLOSTEP_API_KEY=your_api_key_here

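
Once the key is in `.env`, the script has to read it at startup. The sketch below shows one way this could work using only the standard library. It is an assumption about the mechanism, not a copy of `pipeline.py`, which may well use a package such as `python-dotenv` instead; `load_env_file` and `get_api_key` are hypothetical helper names.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env-style file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path=".env"):
    """Return the key from the environment, falling back to the .env file."""
    key = os.environ.get("OLOSTEP_API_KEY") or load_env_file(path).get("OLOSTEP_API_KEY")
    # Fail early if the placeholder from .env.example was never replaced.
    if not key or key == "your_api_key_here":
        raise RuntimeError("Set OLOSTEP_API_KEY in .env before running pipeline.py")
    return key
```

Failing early on the untouched placeholder gives a clearer error than an authentication failure returned by the API several calls later.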
Run the pipeline:

    python pipeline.py

This runs all five steps in sequence and saves two output files:

- `scraped_results.csv`: Markdown content from the batch job
- `products.csv`: Structured product data from `llm_extract`
olostep-python-guide/
├── pipeline.py # Main script with all examples
├── requirements.txt # Python dependencies
├── .env.example # API key template
├── .gitignore
└── README.md
- `llm_extract` costs 20 credits per request. The free tier includes 500 credits/month.
- `retrieve_id` values are valid for 7 days from the time of scraping.
- New accounts have a default batch limit of 100 URLs. Contact Olostep support to raise it to 10,000.
- The test URLs used in this example point to `books.toscrape.com`, a sandbox site designed specifically for scraping practice.
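
The limits in these notes translate into two small calculations: how to split a long URL list so each batch stays within the default limit of 100, and how many `llm_extract` requests fit in the free tier. The following is a minimal sketch under those stated numbers; `chunk_urls` and `max_llm_extract_requests` are hypothetical helpers, and the `example.com` URLs are placeholders.

```python
LLM_EXTRACT_CREDITS = 20    # credits per llm_extract request (from the notes)
FREE_TIER_CREDITS = 500     # credits per month on the free tier (from the notes)
DEFAULT_BATCH_LIMIT = 100   # URLs per batch for new accounts (from the notes)


def chunk_urls(urls, size=DEFAULT_BATCH_LIMIT):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def max_llm_extract_requests(credits=FREE_TIER_CREDITS):
    """Number of whole llm_extract requests a credit balance can pay for."""
    return credits // LLM_EXTRACT_CREDITS


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    urls = [f"https://example.com/item/{n}" for n in range(250)]
    print([len(chunk) for chunk in chunk_urls(urls)])  # [100, 100, 50]
    print(max_llm_extract_requests())                  # 25
```

With these numbers, the free tier covers 25 structured extractions per month, so it is worth testing `llm_extract` on a handful of pages before running it across a full batch.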