A cloud-native pipeline that scrapes apartment listings, processes housing data on AWS, and delivers daily insights via email and dashboard.
AWS (Lambda, ECS, ECR, S3, RDS, EC2, Secrets Manager, IAM, CloudWatch) · Apache Airflow · Postgres · Docker · Flask
- AWS-native design: orchestration (Airflow) is separated from compute (Lambda, ECS) for scalability.
- Automated ETL: daily, replayable Airflow DAGs for processing and backfilling data.
- Optimized performance: resolved a multiprocessing bottleneck in the scraper, achieving ~27% faster parsing. See detailed analysis
- Structured data flow: the raw S3 layer serves as the immutable source of record and is transformed into transactional and analytical tables in RDS.
- Secure by design: Secrets Manager and least-privilege IAM.
- Email and metrics serving: automated email alerts and a Flask dashboard for housing metrics.
The architecture separates orchestration, transformation, and serving layers for modularity and fault isolation.
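To illustrate how a replayable load from an immutable raw layer might look, here is a minimal sketch. It is not the project's actual code: the key layout `raw/listings/dt=YYYY-MM-DD/listings.json`, the `listings` table, and the functions `raw_key`, `load_day`, and `backfill` are all hypothetical, a plain dict stands in for S3, and in-memory SQLite stands in for Postgres on RDS. The idea shown is that each run is keyed by its date and written with an upsert, so re-running a day or backfilling a range does not duplicate rows.

```python
import json
import sqlite3
from datetime import date, timedelta


def raw_key(run_date: date) -> str:
    """Build the (hypothetical) raw-layer key for one day's scrape."""
    return f"raw/listings/dt={run_date.isoformat()}/listings.json"


def make_db() -> sqlite3.Connection:
    """Create an in-memory table; SQLite is only a stand-in for Postgres."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE listings (
            listing_id    TEXT NOT NULL,
            snapshot_date TEXT NOT NULL,
            rent          INTEGER NOT NULL,
            PRIMARY KEY (listing_id, snapshot_date)
        )
        """
    )
    return conn


def load_day(conn: sqlite3.Connection, raw_store: dict, run_date: date) -> int:
    """Upsert one day's raw records; safe to re-run for the same date."""
    records = json.loads(raw_store[raw_key(run_date)])
    rows = [
        {
            "listing_id": r["id"],
            "snapshot_date": run_date.isoformat(),
            "rent": r["rent"],
        }
        for r in records
    ]
    conn.executemany(
        """
        INSERT INTO listings (listing_id, snapshot_date, rent)
        VALUES (:listing_id, :snapshot_date, :rent)
        ON CONFLICT (listing_id, snapshot_date)
        DO UPDATE SET rent = excluded.rent
        """,
        rows,
    )
    conn.commit()
    return len(rows)


def backfill(conn: sqlite3.Connection, raw_store: dict, start: date, end: date) -> int:
    """Replay every day in [start, end] that has a raw object; skip gaps."""
    processed = 0
    day = start
    while day <= end:
        if raw_key(day) in raw_store:
            processed += load_day(conn, raw_store, day)
        day += timedelta(days=1)
    return processed


if __name__ == "__main__":
    store = {raw_key(date(2024, 1, 1)): json.dumps([{"id": "a", "rent": 1000}])}
    db = make_db()
    backfill(db, store, date(2024, 1, 1), date(2024, 1, 2))
    backfill(db, store, date(2024, 1, 1), date(2024, 1, 2))  # replay: no duplicates
    print(db.execute("SELECT COUNT(*) FROM listings").fetchone()[0])
```

Because the raw objects are never modified, the same date range can be replayed after a transformation bug is fixed, and the composite primary key keeps the target table consistent across replays.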

Demo video: demo_web_compressed.mov
- IaC: provision AWS resources with AWS CDK or Terraform
- Google Maps API: calculate each listing's distance from a target location, backed by a subscription database
- Scrape listings from additional websites
- Reduce cold-start time for the ECS scraper
- Distributed scraping to speed up collection and reduce the risk of triggering anti-scraping measures
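
For the distance item above, a rough offline approximation is possible before any Google Maps integration exists. The sketch below uses the haversine great-circle formula, not the Google Maps API, so it returns straight-line distance rather than driving or transit distance. The function names `haversine_km` and `within_radius` and the `lat`/`lon` listing fields are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against floating-point drift just above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def within_radius(listings: list, target: tuple, radius_km: float) -> list:
    """Return listings whose straight-line distance to target is at most radius_km.

    Each listing is assumed to be a dict with hypothetical 'lat' and 'lon' keys.
    """
    target_lat, target_lon = target
    return [
        item
        for item in listings
        if haversine_km(item["lat"], item["lon"], target_lat, target_lon) <= radius_km
    ]


if __name__ == "__main__":
    sample = [
        {"id": "near", "lat": 0.0, "lon": 0.01},
        {"id": "far", "lat": 1.0, "lon": 1.0},
    ]
    print([item["id"] for item in within_radius(sample, (0.0, 0.0), 5.0)])
```

A filter like this could narrow candidates cheaply, leaving a paid routing API for only the listings that pass.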
Contributions and feedback are welcome! Please submit a pull request (PR) or open an issue to get involved.