# eCFR Regulation Analysis Tool

A web application that analyzes Federal Regulations from the eCFR API, providing word-count, historical-change, checksum, and obligation-density metrics.

## Project Structure

```
.
├── backend/          # FastAPI backend
├── frontend/         # Streamlit frontend
├── scripts/          # Data ingestion scripts
├── data/             # Raw XML files and SQLite database (gitignored)
└── requirements.txt  # Python dependencies
```

## Setup

1. Install uv (if not already installed):

   ```sh
   curl -LsSf https://astral.sh/uv/install.sh | sh
   ```

2. Create a virtual environment and install the dependencies:

   ```sh
   uv venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   uv pip install -r requirements.txt
   ```

   Or use the setup script:

   ```sh
   ./setup.sh
   ```

3. Ingest the initial data (downloads CFR titles 2, 5, 8, and 15):

   ```sh
   python scripts/ingest.py
   ```

4. Start the backend API:

   ```sh
   python backend/app.py
   ```

   The API runs at http://localhost:8000.

5. Start the frontend (in a new terminal):

   ```sh
   streamlit run frontend/app.py
   ```

   The UI opens in your browser at http://localhost:8501.

## Usage

1. **Data ingestion**: Run `scripts/ingest.py` to download CFR titles and compute metrics. The script will:

   - Download the latest issue date for each title
   - Fetch historical data for the past 3 months
   - Compute word-count, checksum (SHA-256), and obligation-density metrics
   - Store the results in the SQLite database

2. **View the dashboard**: Open the Streamlit app to:

   - Select an agency from the dropdown
   - Select a snapshot date (only dates available for that agency are shown)
   - View current metrics (word count, obligation density, checksum)
   - Analyze historical trends with interactive charts

3. **API access**: Use the FastAPI endpoints (interactive docs at http://localhost:8000/docs) for programmatic access:

   - `GET /api/agencies` - list all agencies
   - `GET /api/agency/{agency_id}/snapshots` - get snapshots for an agency
   - `GET /api/metrics/agency/{agency_id}/history` - get historical metrics
   - `GET /api/metrics/agency/{agency_id}/snapshot/{date}` - get metrics for a specific snapshot
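The endpoints above can be exercised from any HTTP client. A minimal sketch using Python's standard library (the project itself lists `requests` in its stack, but `urllib` keeps this self-contained); the paths come from the list above, while the response schema is an assumption to verify against http://localhost:8000/docs:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def history_url(agency_id: int) -> str:
    """Build the historical-metrics URL for one agency."""
    return f"{BASE_URL}/api/metrics/agency/{agency_id}/history"


def fetch_history(agency_id: int):
    """GET historical metrics for an agency (requires the backend to be running)."""
    with urllib.request.urlopen(history_url(agency_id), timeout=10) as resp:
        return json.load(resp)
```

With the backend up, `fetch_history(1)` returns the decoded JSON history for agency 1.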

## Features

- **Word count**: total word count per agency at each snapshot
- **Historical change tracking**: compare metrics across snapshot dates
- **Checksum (SHA-256)**: detect content changes even when the word count barely moves
- **Obligation density**: a custom metric counting obligation terms ("shall", "must", "required", "prohibited", "may not") per 1,000 words
- **Time-series analysis**: interactive charts showing trends over time
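The word-count, checksum, and obligation-density metrics can be sketched as follows. This illustrates the definitions above, not the project's actual implementation; tokenization and term-matching details are assumptions:

```python
import hashlib
import re

# Obligation terms from the feature description above.
OBLIGATION_TERMS = ["shall", "must", "required", "prohibited", "may not"]


def word_count(text: str) -> int:
    """Whitespace-delimited word count (tokenization is an assumption)."""
    return len(text.split())


def sha256_checksum(text: str) -> str:
    """Hex SHA-256 digest of the text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def obligation_density(text: str) -> float:
    """Obligation terms per 1,000 words; whole-word matching, case-insensitive."""
    lowered = text.lower()
    hits = sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        for term in OBLIGATION_TERMS
    )
    words = word_count(text)
    return 1000.0 * hits / words if words else 0.0
```

Hashing the full text is what lets the tool detect edits that leave the word count unchanged: any one-word substitution still produces a different SHA-256 digest.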

## CFR Titles Currently Ingested

- **Title 2**: Federal Financial Assistance
- **Title 5**: Administrative Personnel
- **Title 8**: Aliens and Nationality
- **Title 15**: Commerce and Foreign Trade

## Technology Stack

- **Backend**: FastAPI, SQLAlchemy, SQLite
- **Frontend**: Streamlit, Plotly
- **Data processing**: lxml, requests
- **Package management**: uv

## Data Storage

- Raw XML files are stored in `data/raw/title-{N}/` directories
- Metrics are stored in a SQLite database at `data/app.db`
- Both are gitignored and should not be committed to version control
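Reading the metrics back out of `data/app.db` is a plain `sqlite3` query. A sketch, assuming a hypothetical `metrics` table with a `snapshot_date` column; the real schema lives in the SQLAlchemy models, and can be inspected with `sqlite3 data/app.db .schema`:

```python
import sqlite3


def load_metrics(db_path="data/app.db", table="metrics"):
    """Return all rows from a metrics table, newest snapshot first.

    The table and column names here are illustrative placeholders,
    not the project's actual schema.
    """
    with sqlite3.connect(db_path) as conn:
        conn.row_factory = sqlite3.Row  # access columns by name
        rows = conn.execute(
            f"SELECT * FROM {table} ORDER BY snapshot_date DESC"
        ).fetchall()
    return [dict(r) for r in rows]
```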

## Notes

- The ingestion script processes historical data (the past 3 months) for each title
- Each agency gets a unique ID per title to avoid conflicts
- The frontend automatically filters snapshots based on the selected agency
