This project scrapes legislative bills from the Idaho Legislature and uses the OpenAI API to detect potential constitutional issues.
-
Install Python 3.13+ and create a virtual environment (optional but recommended).
-
Install dependencies:
pip install -r requirements.txt
Run the scraper:
python scrape.py
Upon completion, the script will output a string representing the date of the scrape and the directory where the data is stored. This value is referred to as the DATARUN
, and should be exported as an environment variable for use in subsequent steps. For example:
export DATARUN=04_30_2025
This step converts the downloaded PDF files into HTML while preserving formatting like strikethroughs and underlines, which are essential for interpreting legislative changes.
-
Make sure the
DATARUN
environment variable is set:export DATARUN=04_30_2025
-
Set your Adobe PDF Services credentials:
export PDF_SERVICES_CLIENT_ID="your_client_id_here" export PDF_SERVICES_CLIENT_SECRET="your_client_secret_here"
Start the conversion process:
python pdf_to_html.py
Note: This process may take several hours. It is intentionally throttled to avoid overloading external services.
After converting PDFs, run the ML analysis to detect constitutional conflicts using OpenAI.
-
Ensure
DATARUN
is set:export DATARUN=04_30_2025
-
Set your OpenAI API key (obfuscated):
export OPENAI_API_KEY="sk-***********************"
python ml_analysis.py
Finally, start the Streamlit app for visual exploration:
streamlit run bill_data_explorer.py
You can explore the interactive dashboard online here:
https://danielrmeyer-idaho-legislation-analys-bill-data-explorer-qxzijs.streamlit.app/
All processed data is stored in a subdirectory named after the DATARUN
value (e.g., 04_30_2025
). This enables archival and comparison of different scrape sessions over time.
- Fine-tune an OpenAI or Mistral model on historical Idaho legislation
- Automatically identify constitutional conflicts in proposed bills
- Provide a searchable legislative history for citizens and advocacy groups
This project is open-source. See LICENSE
for more information.
Contributions are welcome! Please open an issue or pull request with ideas or improvements.