From Web Scraping Scripts to Web Data APIs — Python Examples

Companion code for the Dev.to article "From Web Scraping Scripts to Web Data APIs: A Practical Python Guide" by Khalid Abdelaty.

This repository contains a fully working Python pipeline that demonstrates how to replace fragile scraping scripts with clean, API-driven data extraction using Olostep.

What This Covers

Step	Description
Single-page scrape	Extract Markdown and HTML from any URL via `/v1/scrapes`
Batch processing	Submit hundreds of URLs in one request via `/v1/batches`
Polling & retrieval	Wait for batch completion and fetch content via `/v1/retrieve`
Structured extraction	Pull structured JSON using `llm_extract`
Web Q&A	Ask natural language questions grounded on live web data via `/v1/answers`
Retry logic	Production-ready retry with exponential backoff via `tenacity`

Prerequisites

Python 3.9+
An Olostep account — the free tier gives you 500 requests/month with no credit card required

Setup

1. Clone the repository

git clone https://github.com/KhalidAbdelaty/olostep-python-guide.git
cd olostep-python-guide

2. Install dependencies

pip install -r requirements.txt

3. Configure your API key

cp .env.example .env

Open .env and replace your_api_key_here with your actual Olostep API key, which you can find in your dashboard.

OLOSTEP_API_KEY=your_api_key_here

Running the Pipeline

python pipeline.py

This runs all five steps in sequence and saves two output files:

scraped_results.csv — Markdown content from the batch job
products.csv — Structured product data from llm_extract

Project Structure

olostep-python-guide/
├── pipeline.py        # Main script with all examples
├── requirements.txt   # Python dependencies
├── .env.example       # API key template
├── .gitignore
└── README.md

Notes

llm_extract costs 20 credits per request. The free tier includes 500 credits/month.
retrieve_id values are valid for 7 days from the time of scraping.
New accounts have a default batch limit of 100 URLs. Contact Olostep support to raise it to 10,000.
The test URLs used in this example (books.toscrape.com) are a sandbox site specifically designed for scraping practice.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

From Web Scraping Scripts to Web Data APIs — Python Examples

What This Covers

Prerequisites

Setup

Running the Pipeline

Project Structure

Notes

Learn More

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
pipeline.py		pipeline.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

From Web Scraping Scripts to Web Data APIs — Python Examples

What This Covers

Prerequisites

Setup

Running the Pipeline

Project Structure

Notes

Learn More

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages