# eCFR Regulation Analysis Tool

A web application that analyzes Federal Regulations from the eCFR API, providing word-count, historical-change, checksum, and obligation-density metrics.

## Project Structure

```
.
├── backend/          # FastAPI backend
├── frontend/         # Streamlit frontend
├── scripts/          # Data ingestion scripts
├── data/             # Raw XML files and SQLite database (gitignored)
└── requirements.txt  # Python dependencies
```

## Setup

1. Install uv (if not already installed):

   ```sh
   curl -LsSf https://astral.sh/uv/install.sh | sh
   ```

2. Create a virtual environment and install the dependencies:

   ```sh
   uv venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   uv pip install -r requirements.txt
   ```

   Or use the setup script:

   ```sh
   ./setup.sh
   ```

3. Ingest the initial data (downloads CFR titles 2, 5, 8, and 15):

   ```sh
   python scripts/ingest.py
   ```

4. Start the backend API:

   ```sh
   python backend/app.py
   ```

   The API runs at http://localhost:8000.

5. Start the frontend (in a new terminal):

   ```sh
   streamlit run frontend/app.py
   ```

   The UI opens in your browser at http://localhost:8501.

## Usage

1. **Data ingestion**: Run `scripts/ingest.py` to download CFR titles and compute metrics. The script will:

   - Download the latest issue date for each title
   - Fetch historical data for the past 3 months
   - Compute word-count, checksum (SHA-256), and obligation-density metrics
   - Store the results in the SQLite database

2. **View the dashboard**: Open the Streamlit app to:

   - Select an agency from the dropdown
   - Select a snapshot date (only dates available for that agency are shown)
   - View current metrics (word count, obligation density, checksum)
   - Analyze historical trends with interactive charts

3. **API access**: Use the FastAPI endpoints (interactive docs at http://localhost:8000/docs) for programmatic access:

   - `GET /api/agencies` - list all agencies
   - `GET /api/agency/{agency_id}/snapshots` - get snapshots for an agency
   - `GET /api/metrics/agency/{agency_id}/history` - get historical metrics
   - `GET /api/metrics/agency/{agency_id}/snapshot/{date}` - get metrics for a specific snapshot
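The endpoints above can be exercised from any HTTP client. A minimal sketch using Python's standard library (the project itself lists `requests` in its stack, but `urllib` keeps this self-contained); the paths come from the list above, while the response schema is an assumption to verify against http://localhost:8000/docs:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def history_url(agency_id: int) -> str:
    """Build the historical-metrics URL for one agency."""
    return f"{BASE_URL}/api/metrics/agency/{agency_id}/history"


def fetch_history(agency_id: int):
    """GET historical metrics for an agency (requires the backend to be running)."""
    with urllib.request.urlopen(history_url(agency_id), timeout=10) as resp:
        return json.load(resp)
```

With the backend up, `fetch_history(1)` returns the decoded JSON history for agency 1.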

## Features

- **Word count**: total word count per agency at each snapshot
- **Historical change tracking**: compare metrics across snapshot dates
- **Checksum (SHA-256)**: detect content changes even when the word count barely moves
- **Obligation density**: a custom metric counting obligation terms ("shall", "must", "required", "prohibited", "may not") per 1,000 words
- **Time-series analysis**: interactive charts showing trends over time
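The word-count, checksum, and obligation-density metrics can be sketched as follows. This illustrates the definitions above, not the project's actual implementation; tokenization and term-matching details are assumptions:

```python
import hashlib
import re

# Obligation terms from the feature description above.
OBLIGATION_TERMS = ["shall", "must", "required", "prohibited", "may not"]


def word_count(text: str) -> int:
    """Whitespace-delimited word count (tokenization is an assumption)."""
    return len(text.split())


def sha256_checksum(text: str) -> str:
    """Hex SHA-256 digest of the text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def obligation_density(text: str) -> float:
    """Obligation terms per 1,000 words; whole-word matching, case-insensitive."""
    lowered = text.lower()
    hits = sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        for term in OBLIGATION_TERMS
    )
    words = word_count(text)
    return 1000.0 * hits / words if words else 0.0
```

Hashing the full text is what lets the tool detect edits that leave the word count unchanged: any one-word substitution still produces a different SHA-256 digest.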

## CFR Titles Currently Ingested

- **Title 2**: Federal Financial Assistance
- **Title 5**: Administrative Personnel
- **Title 8**: Aliens and Nationality
- **Title 15**: Commerce and Foreign Trade

## Technology Stack

- **Backend**: FastAPI, SQLAlchemy, SQLite
- **Frontend**: Streamlit, Plotly
- **Data processing**: lxml, requests
- **Package management**: uv

## Data Storage

- Raw XML files are stored in `data/raw/title-{N}/` directories
- Metrics are stored in a SQLite database at `data/app.db`
- Both are gitignored and should not be committed to version control
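Reading the metrics back out of `data/app.db` is a plain `sqlite3` query. A sketch, assuming a hypothetical `metrics` table with a `snapshot_date` column; the real schema lives in the SQLAlchemy models, and can be inspected with `sqlite3 data/app.db .schema`:

```python
import sqlite3


def load_metrics(db_path="data/app.db", table="metrics"):
    """Return all rows from a metrics table, newest snapshot first.

    The table and column names here are illustrative placeholders,
    not the project's actual schema.
    """
    with sqlite3.connect(db_path) as conn:
        conn.row_factory = sqlite3.Row  # access columns by name
        rows = conn.execute(
            f"SELECT * FROM {table} ORDER BY snapshot_date DESC"
        ).fetchall()
    return [dict(r) for r in rows]
```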

## Notes

- The ingestion script processes historical data (the past 3 months) for each title
- Each agency gets a unique ID per title to avoid conflicts
- The frontend automatically filters snapshots based on the selected agency
