Data pipeline and API that aggregates property market data, validates it deterministically, and serves pre-computed metrics for AI-driven SEO content generation.
Collects property transaction data, floor plan records, and market indicators from public and licensed data sources. Python validation layers enforce plausibility rules before any data is exposed to the Claude API for content generation. Output feeds production websites across three markets.
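As an illustration of what a deterministic plausibility rule might look like, here is a minimal sketch. The `Transaction` shape, the `plausibility_errors` helper, and the price-per-square-foot bounds are all hypothetical; the actual rules live in the repository's configuration and are not reproduced in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record shape; the real schema is not shown in this README.
    price: float
    area_sqft: float


def plausibility_errors(tx, min_ppsf=50.0, max_ppsf=20000.0):
    """Return a list of rule violations; an empty list means the row passes.

    The default bounds are illustrative assumptions, not the project's values.
    """
    errors = []
    if tx.price <= 0:
        errors.append("price must be positive")
    if tx.area_sqft <= 0:
        errors.append("area must be positive")
    if not errors:
        # Only compute the ratio once both inputs are known to be positive.
        ppsf = tx.price / tx.area_sqft
        if not (min_ppsf <= ppsf <= max_ppsf):
            errors.append(
                f"price per sqft {ppsf:.0f} outside [{min_ppsf:.0f}, {max_ppsf:.0f}]"
            )
    return errors
```

Because the rules are pure functions of the row, the same input always yields the same verdict, which is what makes the validation step auditable.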
Data Sources (DLD transactions, property portals, GSC, Ahrefs)
--> Ingestion (Python scrapers + BigQuery load jobs)
--> Validation (DuckDB, deterministic Python rules)
--> API (FastAPI, pre-computed metrics endpoints)
--> Content Generation (Claude API narrates validated data)
--> Re-validation (Python checks generated content against source data)
Anti-hallucination pattern: Python computes all metrics deterministically. Claude narrates pre-validated numbers. Python re-validates the output before any page is published.
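The re-validation step could be sketched roughly as follows: extract every number from the generated text and flag any that does not match a pre-computed metric. The function names, the regular expression, and the exact-match policy are assumptions made for illustration; the project's real checks may be stricter or structured differently.

```python
import re

# Matches integers and decimals, with optional thousands separators.
NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def extract_numbers(text):
    """Pull numeric literals out of text, ignoring thousands separators."""
    return [float(m.replace(",", "")) for m in NUMBER_RE.findall(text)]


def unsupported_numbers(text, metrics, tolerance=1e-9):
    """Return numbers in the text that match no value in the metrics dict.

    `metrics` is a hypothetical mapping of metric name to validated value.
    """
    allowed = list(metrics.values())
    return [
        n
        for n in extract_numbers(text)
        if not any(abs(n - a) <= tolerance for a in allowed)
    ]


if __name__ == "__main__":
    metrics = {"median_price": 1250000.0, "yoy_change_pct": 4.2}
    draft = "The median price reached 1,300,000, up 4.2% year over year."
    # A non-empty result would block publication in this sketch.
    print(unsupported_numbers(draft, metrics))
```

A page would be published only when `unsupported_numbers` returns an empty list, so any figure the model invents or alters is caught before it reaches a live site.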
- Python 3.11 / FastAPI
- DuckDB (local analytics) / BigQuery (production data warehouse)
- Claude API (claude-sonnet-4-6) for content generation
- Google Cloud Storage / BigQuery for data storage
- Ahrefs API / Google Search Console API for SEO signals
- Python 3.11+
- Google Cloud project with BigQuery enabled
- Anthropic API key
- Ahrefs API key (optional, for keyword data)
git clone https://github.com/shahe-dev/real-estate-intelligence-platform
cd real-estate-intelligence-platform
cp .env.example .env
# Edit .env with your credentials
pip install -r requirements.txt
uvicorn src.api.main:app --reload
config/ # Configuration and validation rules
data/
raw/ # Raw CSV files (not in git)
database/ # DuckDB files (not in git)
generated_content/
src/
etl/ # Data loading and validation
metrics/ # Pre-calculated metrics
api/ # FastAPI endpoints
content/ # AI content generation
analytics/ # Keyword and citation intelligence
dashboard/ # Admin interface
utils/ # Shared utilities
scripts/ # One-off analysis and maintenance scripts
tests/ # Unit tests
See .env.example for the full list. Required variables:
- ANTHROPIC_API_KEY - Anthropic API key
- GOOGLE_PROJECT_ID - GCP project ID for BigQuery
- GOOGLE_CLIENT_EMAIL - Service account email
- GOOGLE_PRIVATE_KEY - Service account private key (or use GOOGLE_SERVICE_ACCOUNT_FILE)
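A quick way to confirm the required variables are set before starting the server is a small check like the one below. This is a sketch, not part of the repository: `missing_env_vars` is a hypothetical helper, and it treats GOOGLE_SERVICE_ACCOUNT_FILE as an alternative to GOOGLE_PRIVATE_KEY only, which is one reading of the list above rather than confirmed behaviour.

```python
import os

# Variables that must always be present, per the list above.
REQUIRED = ("ANTHROPIC_API_KEY", "GOOGLE_PROJECT_ID", "GOOGLE_CLIENT_EMAIL")


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    # Assumption: either the inline key or the service account file is enough.
    if not (env.get("GOOGLE_PRIVATE_KEY") or env.get("GOOGLE_SERVICE_ACCOUNT_FILE")):
        missing.append("GOOGLE_PRIVATE_KEY or GOOGLE_SERVICE_ACCOUNT_FILE")
    return missing


if __name__ == "__main__":
    problems = missing_env_vars()
    if problems:
        raise SystemExit("Missing environment variables: " + ", ".join(problems))
    print("Environment looks complete.")
```

Note that this reads the process environment directly; if the values live only in `.env`, they need to be exported or loaded into the environment first.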
Active. Serving data to production content pipelines across multiple markets.
PolyForm Noncommercial 1.0.0 -- free for personal and research use, not for commercial use.