Download and consolidate time series from the Banco Central de Reserva del Perú (BCRP) API. This tool scrapes metadata, downloads 16k+ series in async batches, and stores them as Parquet files for fast analytics.

Tooruogata/bcrp-data-hub

BCRP Series Downloader


🚀 Quick Start

  1. Install dependencies

Clone the repository:

git clone https://github.com/Tooruogata/bcrp-data-hub.git
cd bcrp-data-hub

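Build the Docker image and start a container (set the shell variable repopath to the absolute path of the cloned repository first):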

docker build -t bcrp-data-hub:latest -f .devcontainer/Dockerfile .
docker run -dit --name bcrp-data-hub -v "$repopath:/workspace" -w /workspace bcrp-data-hub:latest

  2. Run script

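    import duckdb

    # Intended for a notebook cell (top-level await); control_panel() and
    # run_batch() are the project's own functions.
    control_panel()  # Create control CSV from metadata

    # df_control is not defined in this snippet; load the control table first
    await run_batch(
        df_control=df_control,
        formato="json",       # output format
        inicio="2000-1",      # start period
        fin="2025-12",        # end period
        idioma="esp",         # language (Spanish)
        output_dir="../data/bronze/",
        batch_size=100
    )

    # Merge the per-series bronze files into one silver Parquet file
    duckdb.sql("""
        COPY (
            SELECT * FROM read_parquet('../data/bronze/*.parquet')
        ) TO '../data/silver/series_all.parquet' (FORMAT PARQUET);
    """)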

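The arguments passed to run_batch mirror the parts of a BCRP API request: series code, output format, start period, end period, and language. The sketch below shows one way such request URLs could be assembled and grouped into batches. It is not the notebook's actual code: the base URL, the one-series-per-request layout, the helper names build_url and chunked, and the series codes are all assumptions made for illustration.

```python
from typing import Iterator, List

# Assumed base URL of the BCRP statistics API; check it against the official docs.
BCRP_API_BASE = "https://estadisticas.bcrp.gob.pe/estadisticas/series/api"


def build_url(code: str, formato: str, inicio: str, fin: str, idioma: str) -> str:
    """Build the request URL for a single series (assumed path layout)."""
    return f"{BCRP_API_BASE}/{code}/{formato}/{inicio}/{fin}/{idioma}"


def chunked(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices holding at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical series codes, used only for illustration.
example_codes = ["PN01288PM", "PN01289PM", "PN01290PM"]
batches = list(chunked(example_codes, batch_size=2))

for batch in batches:
    for code in batch:
        print(build_url(code, "json", "2000-1", "2025-12", "esp"))
```

Keeping URL construction separate from batching makes it easy to change the batch size without touching the request format.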
📁 Structure

project/
├── data/
│   ├── bronze/         # Raw per-series files (Parquet)
│   ├── silver/         # Consolidated Parquet dataset
│   └── metadata/       # Control panel CSV
├── pull_data.ipynb     # Notebook to pull and download
└── README.md
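
The README does not say whether the notebook creates the data folders itself. If they have to exist before the first run, a small helper like the following could create them. The function name ensure_data_dirs is hypothetical; only the folder names come from the tree above.

```python
from pathlib import Path
from typing import List


def ensure_data_dirs(root: Path) -> List[Path]:
    """Create data/bronze, data/silver and data/metadata under root if missing."""
    created = []
    for name in ("bronze", "silver", "metadata"):
        path = root / "data" / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

Calling ensure_data_dirs(Path(".")) from the project root would produce the layout shown above; calling it again is harmless.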

🧰 Features

  • ✅ Scrapes official BCRP metadata
  • ⚡ Async download of 16k+ series
  • 📁 Saves individual files to bronze/
  • 🧹 Merges all into silver/series_all.parquet
  • ⏱️ About 5 min to download 16k series and about 50 s to consolidate
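
To illustrate the async batch download listed above, here is a minimal sketch of the pattern using only asyncio from the standard library. It is not the project's run_batch implementation: download_in_batches and fake_fetch are hypothetical names, fake_fetch replaces the real HTTP request so the example runs offline, and reading batch_size as the number of concurrent requests is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def fake_fetch(code: str) -> dict:
    """Hypothetical stand-in for an HTTP request; returns a dummy payload."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"code": code, "values": []}


async def download_in_batches(
    codes: List[str],
    fetch: Callable[[str], Awaitable[dict]],
    batch_size: int = 100,
) -> Dict[str, dict]:
    """Fetch every code, running at most batch_size requests concurrently."""
    results: Dict[str, dict] = {}
    for start in range(0, len(codes), batch_size):
        batch = codes[start:start + batch_size]
        # return_exceptions=True keeps one failed series from aborting the batch
        responses = await asyncio.gather(
            *(fetch(code) for code in batch), return_exceptions=True
        )
        for code, response in zip(batch, responses):
            if isinstance(response, Exception):
                print(f"skipped {code}: {response!r}")
            else:
                results[code] = response
    return results


codes = [f"SERIES{i:03d}" for i in range(250)]  # hypothetical series codes
results = asyncio.run(download_in_batches(codes, fake_fetch, batch_size=100))
print(len(results))
```

Inside a notebook, where an event loop is already running, the last call would be written as `results = await download_in_batches(...)` instead of going through asyncio.run.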

🔍 Query Example

import duckdb

df = duckdb.sql("SELECT * FROM read_parquet('../data/silver/series_all.parquet')").df()
