A web application to analyze Federal Regulations from the eCFR API, providing insights on word count, historical changes, checksums, and obligation density metrics.
.
├── backend/ # FastAPI backend
├── frontend/ # Streamlit frontend
├── scripts/ # Data ingestion scripts
├── data/ # Raw XML files and SQLite database (gitignored)
└── requirements.txt # Python dependencies
-
Install uv (if not already installed):
curl -LsSf https://astral.sh/uv/install.sh | sh
-
Create virtual environment and install dependencies:
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv pip install -r requirements.txt
Or use the setup script:
./setup.sh
-
Ingest initial data (downloads CFR titles 2, 5, 8, and 15):
python scripts/ingest.py
-
Start the backend API:
python backend/app.py
The API will run on http://localhost:8000
-
Start the frontend (in a new terminal):
streamlit run frontend/app.py
The UI will open in your browser at http://localhost:8501
-
Data Ingestion: Run scripts/ingest.py to download CFR titles and compute metrics. The script will:
- Look up the latest issue date for each title
- Fetch historical data for the past 3 months
- Compute word count, checksum (SHA-256), and obligation density metrics
- Store the results in the SQLite database
-
View Dashboard: Open the Streamlit app to:
- Select an agency from the dropdown
- Select a snapshot date (only dates available for that agency are listed)
- View current metrics (word count, obligation density, checksum)
- Analyze historical trends with interactive charts
-
API Access: Use the FastAPI endpoints for programmatic access (interactive documentation is served at http://localhost:8000/docs):
- GET /api/agencies - List all agencies
- GET /api/agency/{agency_id}/snapshots - Get snapshots for an agency
- GET /api/metrics/agency/{agency_id}/history - Get historical metrics
- GET /api/metrics/agency/{agency_id}/snapshot/{date} - Get metrics for a specific snapshot
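As a rough illustration of programmatic access, the sketch below builds URLs for the four endpoints and fetches JSON using only the standard library. The helper names (`endpoint`, `get_json`), the example agency ID, and the ISO date format are assumptions for illustration; check http://localhost:8000/docs for the actual parameter formats and response shapes.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # backend address from the setup steps


def endpoint(path_template: str, **params) -> str:
    """Fill a path template such as '/api/agency/{agency_id}/snapshots'.

    Every parameter is percent-encoded so that unusual IDs stay URL-safe.
    """
    safe = {key: quote(str(value), safe="") for key, value in params.items()}
    return BASE_URL + path_template.format(**safe)


def get_json(url: str):
    """GET a URL and decode the JSON body (requires the backend to be running)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Example usage with the backend running. The agency ID and the date format
# are hypothetical placeholders:
#   agencies = get_json(endpoint("/api/agencies"))
#   history = get_json(endpoint("/api/metrics/agency/{agency_id}/history",
#                               agency_id="some-agency-id"))
#   snapshot = get_json(endpoint("/api/metrics/agency/{agency_id}/snapshot/{date}",
#                                agency_id="some-agency-id", date="2024-06-01"))
```

The same calls can be made with requests, which is already a project dependency; urllib is used here only to keep the sketch dependency-free.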
- Word Count: Total word count per agency at each snapshot
- Historical Changes Tracking: Compare metrics across different snapshot dates
- Checksum (SHA-256): Detect changes even if word count barely moves
- Obligation Density: Custom metric counting obligation terms ("shall", "must", "required", "prohibited", "may not") per 1,000 words
- Time Series Analysis: Interactive charts showing trends over time
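The three text metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual implementation: it assumes whitespace-delimited word counting and case-insensitive whole-word matching of the obligation terms, and the function names are hypothetical.

```python
import hashlib
import re

# Obligation terms from the feature list; "may not" tolerates any whitespace
# between the two words. Matching is case-insensitive (an assumption).
OBLIGATION_RE = re.compile(
    r"\b(?:shall|must|required|prohibited|may\s+not)\b", re.IGNORECASE
)


def word_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def checksum(text: str) -> str:
    """Hex SHA-256 digest of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def obligation_density(text: str) -> float:
    """Obligation terms per 1,000 words (0.0 for empty text)."""
    words = word_count(text)
    if words == 0:
        return 0.0
    return len(OBLIGATION_RE.findall(text)) / words * 1000
```

The checksum matters because an edit can change meaning without changing length: replacing "shall" with "may" leaves the word count untouched, but it produces a different digest and a lower obligation density.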
- Title 2: Federal Financial Assistance
- Title 5: Administrative Personnel
- Title 8: Aliens and Nationality
- Title 15: Commerce and Foreign Trade
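For these four titles, ingestion of roughly the past three months can be pictured as enumerating (title, date) pairs and building one request per pair. The sketch below is only an assumption about how that might look: the URL pattern follows the public eCFR versioner API as best understood here, and the 30-day sampling interval and function names are illustrative choices, not necessarily what scripts/ingest.py does.

```python
from datetime import date, timedelta

# Assumed eCFR endpoint pattern for the full XML of a title on a given date.
ECFR_FULL_URL = "https://www.ecfr.gov/api/versioner/v1/full/{day}/title-{title}.xml"

TITLES = [2, 5, 8, 15]  # the titles listed above


def snapshot_dates(latest: date, months: int = 3, step_days: int = 30) -> list[date]:
    """Return the latest issue date plus one earlier date per ~month, newest first."""
    return [latest - timedelta(days=step_days * i) for i in range(months + 1)]


def snapshot_urls(title: int, latest: date) -> list[str]:
    """Build one download URL per snapshot date for a single title."""
    return [
        ECFR_FULL_URL.format(day=d.isoformat(), title=title)
        for d in snapshot_dates(latest)
    ]


def all_requests(latest: date) -> list[tuple[int, str]]:
    """Pair every title with each of its snapshot URLs.

    For simplicity this assumes all titles share one latest issue date; in
    practice each title has its own.
    """
    return [(t, url) for t in TITLES for url in snapshot_urls(t, latest)]
```

Each downloaded file would then be saved and passed through the metric computations before being written to the database.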
- Backend: FastAPI, SQLAlchemy, SQLite
- Frontend: Streamlit, Plotly
- Data Processing: lxml, requests
- Package Management: uv
- Raw XML files are stored in data/raw/title-{N}/ directories
- Metrics are stored in a SQLite database at data/app.db
- Both are gitignored and should not be committed to version control
- The ingestion script processes historical data (past 3 months) for each title
- Each agency gets a unique ID per title to avoid conflicts
- The frontend automatically filters snapshots based on the selected agency
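The last two notes can be illustrated with a small, self-contained sketch. It uses the standard-library sqlite3 module rather than SQLAlchemy, and the ID format, table name, and column names are hypothetical; the project's real schema may differ. The point is that prefixing the agency identifier with the title number keeps the same agency distinct across titles, and that snapshot filtering reduces to a lookup on that ID.

```python
import sqlite3


def make_agency_id(title: int, agency_slug: str) -> str:
    """Combine title number and agency slug so IDs never collide across titles."""
    return f"{title}-{agency_slug}"


def snapshots_for_agency(conn: sqlite3.Connection, agency_id: str) -> list[str]:
    """Return the snapshot dates recorded for one agency, oldest first."""
    rows = conn.execute(
        "SELECT snapshot_date FROM metrics WHERE agency_id = ? ORDER BY snapshot_date",
        (agency_id,),
    ).fetchall()
    return [row[0] for row in rows]


def demo() -> list[str]:
    """Build an in-memory table with made-up rows and query one agency."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metrics ("
        "agency_id TEXT, snapshot_date TEXT, word_count INTEGER, "
        "PRIMARY KEY (agency_id, snapshot_date))"
    )
    conn.executemany(
        "INSERT INTO metrics VALUES (?, ?, ?)",
        [
            (make_agency_id(2, "example-agency"), "2024-06-01", 1250),
            (make_agency_id(2, "example-agency"), "2024-05-01", 1200),
            (make_agency_id(5, "example-agency"), "2024-06-01", 900),
        ],
    )
    return snapshots_for_agency(conn, make_agency_id(2, "example-agency"))
```

Calling `demo()` returns only the two dates stored for the Title 2 agency; the Title 5 row with the same slug is excluded because its ID differs.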