27 commits
a70f1df
Implement Google Cloud Batch for parallel policy impact calculations
PavelMakarchuk Oct 29, 2025
bb4513d
minor
PavelMakarchuk Oct 30, 2025
9a0250d
Merge branch 'main' of https://github.com/PolicyEngine/crfb-tob-impac…
PavelMakarchuk Oct 30, 2025
27ebaad
datasets
PavelMakarchuk Oct 30, 2025
101c690
Docker change
PavelMakarchuk Oct 30, 2025
cd13b40
bump ram
PavelMakarchuk Oct 30, 2025
cae5169
more ram and not spot
PavelMakarchuk Oct 30, 2025
432ef20
bump down and no cache
PavelMakarchuk Oct 30, 2025
cc1f40b
actually seems to work
PavelMakarchuk Oct 30, 2025
ae863c9
might be able to use even less ram
PavelMakarchuk Oct 30, 2025
af95d4f
proper computation flow
PavelMakarchuk Oct 31, 2025
65beca4
some scripts
PavelMakarchuk Oct 31, 2025
3ecbf23
correct dynamic data
PavelMakarchuk Oct 31, 2025
d6292ef
Add dict-returning functions for dynamic scoring
PavelMakarchuk Oct 31, 2025
bee2567
Fix KeyError: remove references to non-existent 'total_time' key
PavelMakarchuk Oct 31, 2025
8c13580
Fix dict merging for dynamic scoring: use manual iteration instead of…
PavelMakarchuk Oct 31, 2025
d6d4fe5
Fix dynamic scoring: use reform chaining instead of dict merging
PavelMakarchuk Oct 31, 2025
34d69f2
Use pre-merged dynamic reform dictionaries
PavelMakarchuk Oct 31, 2025
76651f2
Add comprehensive diagnostic logging to compute_year.py
PavelMakarchuk Oct 31, 2025
8a43116
reforms.py
PavelMakarchuk Oct 31, 2025
05601ff
dynamics
PavelMakarchuk Oct 31, 2025
be6b38b
a bunch of garbage code but also the main code
PavelMakarchuk Oct 31, 2025
0530bc7
user guide
PavelMakarchuk Oct 31, 2025
42aef1f
remove garbage
PavelMakarchuk Oct 31, 2025
404dd57
automatic larger memory runners for 2026 and 2027 (isolated)
PavelMakarchuk Oct 31, 2025
5b3a963
documentation
PavelMakarchuk Oct 31, 2025
4e34185
2026 and 2027 values
PavelMakarchuk Nov 2, 2025
53 changes: 53 additions & 0 deletions .gcloudignore
@@ -0,0 +1,53 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
*.egg-info/
dist/
build/
.venv/
venv/
.pytest_cache/

# Jupyter
.ipynb_checkpoints/
jupyterbook/_build/
jupyterbook/.jupyter_cache/

# Node
node_modules/
policy-impact-dashboard/node_modules/
policy-impact-dashboard/build/

# Data files
data/*.csv
*.h5
*.hdf5

# Git
.git/
.gitignore

# IDE
.vscode/
.idea/
*.swp
*.swo

# Documentation builds
docs/
_build/
site/

# Logs
*.log

# OS
.DS_Store
Thumbs.db

# Only include what we need
!src/
!batch/
35 changes: 13 additions & 22 deletions .gitignore
@@ -1,28 +1,19 @@
# Jupyter Book build outputs
jupyterbook/_build/
_build/

# Jupyter Notebook checkpoints
.ipynb_checkpoints/

# Python cache
__pycache__/
*.pyc
*.pyo

# Environment files
_build/
.DS_Store
.env
.idea/
.ipynb_checkpoints/
.pytest_cache/
.venv/

# IDE files
.vscode/
.idea/
*.log
*.pyc
*.pyo
*.temp
*.tmp
jupyterbook/_build/
results/
settings.local.json

# OS files
.DS_Store
Thumbs.db

# Temporary files
*.tmp
*.temp
venv/
112 changes: 112 additions & 0 deletions CLOUD_BATCH_GUIDE.md
@@ -0,0 +1,112 @@
# Google Cloud Batch Guide

## How It Works

**Architecture:**
- Runs 75 parallel tasks (one per year, 2026-2100) on Google Cloud VMs
- Each VM executes `batch/compute_year.py` with a PolicyEngine microsimulation (sketched below)
- Results saved to Cloud Storage as individual CSVs
- Combined into final results using `combine_results.sh`
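
As a rough illustration of the per-task flow (not the actual `batch/compute_year.py`, whose internals aren't shown in this diff), each task can map the `BATCH_TASK_INDEX` environment variable that Cloud Batch sets to a year and write one CSV in the schema of the result files in this PR; `revenue_impact` is a hypothetical stand-in for the PolicyEngine call:

```python
import csv
import os

def revenue_impact(year: int, reform: str, scoring: str) -> dict:
    # Hypothetical stand-in for the PolicyEngine microsimulation; returns a
    # row in the same schema as the result CSVs in this PR.
    return {"reform_name": reform, "year": year, "baseline_revenue": 0.0,
            "reform_revenue": 0.0, "revenue_impact": 0.0, "scoring_type": scoring}

YEARS = list(range(2026, 2101))                    # one Cloud Batch task per year
year = YEARS[int(os.environ["BATCH_TASK_INDEX"])]  # 0..74, set by Cloud Batch

row = revenue_impact(year, "option5", "static")
with open(f"{year}.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(row.keys()))
    writer.writeheader()
    writer.writerow(row)
```

Running this locally with `BATCH_TASK_INDEX=0` would produce `2026.csv`.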

**Automatic VM Sizing:**
- **Years 2026-2027:** `e2-highmem-8` (64GB RAM) - larger datasets require more memory
- **Years 2028-2100:** `e2-highmem-4` (32GB RAM) - standard configuration
- The system automatically splits the submission into two jobs when 2026 or 2027 is included
- This cuts the extra cost by ~97% by reserving the expensive VMs for just 2 of the 75 years (see the sketch below)
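
A minimal sketch of the splitting logic, assuming the year set and machine-type names above (the actual logic lives in `batch/submit_years.py` and isn't shown in this diff):

```python
# Group requested years by the machine type they need. HIGH_MEM_YEARS and the
# machine-type strings mirror the guide above; they are assumptions, not the
# verbatim submit_years.py code.
HIGH_MEM_YEARS = {2026, 2027}  # larger datasets -> 64GB e2-highmem-8

def split_jobs(years):
    jobs = {}
    for year in years:
        machine = "e2-highmem-8" if year in HIGH_MEM_YEARS else "e2-highmem-4"
        jobs.setdefault(machine, []).append(year)
    return jobs

print(split_jobs(range(2026, 2101)))
# {'e2-highmem-8': [2026, 2027], 'e2-highmem-4': [2028, ..., 2100]}
```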

**Cost:** ~$2-3 per job (~$0.03 per year analyzed)

## Complete Workflow

### 1. Submit Job (Automatic Splitting)

```bash
# Submit for all 75 years (2026-2100)
PYTHONPATH=src python3 batch/submit_years.py \
--years $(seq -s, 2026 2100) \
--reforms option5 \
--scoring static

# System automatically creates 2 jobs:
# - Job 1: Years 2026-2027 with 64GB VMs
# - Job 2: Years 2028-2100 with 32GB VMs
#
# Output shows both job IDs and monitoring commands
```

### 2. Monitor Progress

```bash
# Use commands from submit output, e.g.:
./monitor_job.sh years-20251101-123456-abc123 option5 static &
./monitor_job.sh years-20251101-123457-def456 option5 static &

# Or check status directly:
gcloud batch jobs describe JOB_ID --location=us-central1
```

### 3. Combine Results

```bash
# After both jobs complete, merge their CSVs into one final file
./combine_results.sh option5 JOB_ID_1 JOB_ID_2

# Output: option5_static_results.csv (all 75 years, sorted)
```
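
For reference, a rough Python equivalent of the merge step, assuming the per-task CSVs have been downloaded to `./temp/` (as in the manual `gsutil` command under Common Commands) and share the schema of the result files in this PR:

```python
# Concatenate the per-year CSVs and sort, mirroring what combine_results.sh does.
import glob

import pandas as pd

frames = [pd.read_csv(path) for path in glob.glob("temp/*.csv")]
combined = pd.concat(frames, ignore_index=True).sort_values(["reform_name", "year"])
combined.to_csv("option5_static_results.csv", index=False)
```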

### 4. Repeat for Dynamic Scoring

```bash
# Same workflow with --scoring dynamic
PYTHONPATH=src python3 batch/submit_years.py \
--years $(seq -s, 2026 2100) \
--reforms option5 \
--scoring dynamic

# Monitor and combine the same way
./combine_results.sh option5 JOB_ID_3 JOB_ID_4
# Output: option5_dynamic_results.csv
```

## Key Files

| File | Purpose |
|------|---------|
| `batch/submit_years.py` | Submits jobs with automatic VM sizing |
| `batch/compute_year.py` | Runs PolicyEngine simulation on each VM |
| `src/reforms.py` | Defines reform parameters |
| `combine_results.sh` | Merges individual CSVs into final output |
| `monitor_job.sh` | Tracks job progress |

## Storage Locations

- **Cloud Storage:** `gs://crfb-ss-analysis-results/results/<JOB_ID>/`
- **Local Results:** `{reform}_{scoring}_results.csv`

## Common Commands

```bash
# Check job status
gcloud batch jobs describe JOB_ID --location=us-central1

# List running jobs
gcloud batch jobs list --location=us-central1 --filter="state:RUNNING"

# Delete completed job (results already saved)
gcloud batch jobs delete JOB_ID --location=us-central1 --quiet

# Download CSVs manually
gsutil -m cp "gs://crfb-ss-analysis-results/results/JOB_ID/*.csv" ./temp/
```

## Troubleshooting

**Job shows FAILED but has results:**
- Check the actual file count: `gsutil ls gs://.../results/JOB_ID/ | wc -l`
- Cloud Batch marks the whole job FAILED if ANY task fails (even 2 of 75)
- It is safe to process the results if 73+ years completed; the sketch below shows one way to list the missing years
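
A hypothetical helper for that check, assuming each task writes its output as `<year>.csv` under the job's results prefix (the filename pattern is an assumption):

```python
# List which of the 75 years are missing from a job's results folder.
import re
import subprocess

out = subprocess.run(
    ["gsutil", "ls", "gs://crfb-ss-analysis-results/results/JOB_ID/"],
    capture_output=True, text=True, check=True,
).stdout
done = {int(m.group(1)) for m in re.finditer(r"(\d{4})\.csv", out)}
missing = sorted(set(range(2026, 2101)) - done)
print(f"{len(done)}/75 years complete; missing: {missing}")
```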

**Tasks stuck in PENDING:**
- Check quota: `gcloud compute project-info describe | grep -A2 "CPUS"`
- Each job uses 300 CPUs, so a 3,000-CPU quota supports ~10 concurrent jobs
- Delete completed jobs to free resources
17 changes: 17 additions & 0 deletions all_reforms_dynamic_2026_2027.csv
@@ -0,0 +1,17 @@
reform_name,year,baseline_revenue,reform_revenue,revenue_impact,scoring_type
option1,2026,2178.0,2088.47,-89.53,dynamic
option1,2027,2293.8,2197.06,-96.74,dynamic
option2,2026,2178.0,2204.2,26.2,dynamic
option2,2027,2293.8,2320.62,26.82,dynamic
option3,2026,2178.0,2204.2,26.2,dynamic
option3,2027,2293.8,2320.62,26.82,dynamic
option4,2026,2178.0,2211.29,33.29,dynamic
option4,2027,2293.8,2327.75,33.95,dynamic
option5,2026,2178.0,2221.31,43.31,dynamic
option5,2027,2293.8,2338.62,44.82,dynamic
option6,2026,2178.0,2195.1,17.1,dynamic
option6,2027,2293.8,2329.7,35.9,dynamic
option7,2026,2178.0,2200.97,22.96,dynamic
option7,2027,2293.8,2317.38,23.58,dynamic
option8,2026,2178.0,2232.59,54.59,dynamic
option8,2027,2293.8,2350.33,56.53,dynamic
17 changes: 17 additions & 0 deletions all_reforms_static_2026_2027.csv
@@ -0,0 +1,17 @@
reform_name,year,baseline_revenue,reform_revenue,revenue_impact,scoring_type
option1,2026,2178.0,2087.55,-90.45,static
option1,2027,2293.8,2196.31,-97.49,static
option2,2026,2178.0,2203.72,25.72,static
option2,2027,2293.8,2320.23,26.43,static
option3,2026,2178.0,2203.72,25.72,static
option3,2027,2293.8,2320.23,26.43,static
option4,2026,2178.0,2210.79,32.79,static
option4,2027,2293.8,2327.46,33.66,static
option5,2026,2178.0,2232.03,54.03,static
option5,2027,2293.8,2349.14,55.34,static
option6,2026,2178.0,2196.68,18.68,static
option6,2027,2293.8,2333.75,39.95,static
option7,2026,2178.0,2201.06,23.06,static
option7,2027,2293.8,2317.57,23.77,static
option8,2026,2178.0,2232.15,54.14,static
option8,2027,2293.8,2350.51,56.71,static
38 changes: 38 additions & 0 deletions batch/Dockerfile
@@ -0,0 +1,38 @@
FROM python:3.13-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
git \
curl \
&& rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /app

# Copy requirements and install Python dependencies
COPY batch/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy source code (will be available when we build from parent directory)
COPY src/ /app/src/

# Copy batch worker scripts
COPY batch/compute_baseline.py /app/
COPY batch/compute_reform.py /app/
COPY batch/compute_year.py /app/batch/

# Make scripts executable
RUN chmod +x /app/compute_baseline.py /app/compute_reform.py /app/batch/compute_year.py

# Set PYTHONPATH for imports
ENV PYTHONPATH=/app/src

# NOTE: Dataset pre-caching caused OOM issues in the cloud (28GB+ RAM usage)
# even though local execution only uses 0.9GB. Datasets are downloaded at
# runtime instead; each download adds ~30-60 seconds but avoids the memory issue.

# Run Python in unbuffered mode so output appears in real time
ENV PYTHONUNBUFFERED=1

# Default command (will be overridden by Cloud Batch)
CMD ["python", "--version"]