aaronwe/comment-analysis

comment-analysis

For analyzing comments submitted to regulations.gov.

Yes, this is a bit messy, and it would be a whole lot cleaner if we used it all the time. It's basically a one-off, so it's not the cleanest bit of scripting you've ever seen. I'd probably build the whole thing in Python/Pandas if I had to do it again, but csvkit and bash get the job done. Searching with csvkit is significantly slower than it would be with Pandas, however.

Prerequisites

  • Jupyter Notebook (Highly recommend using virtual environments. pip install jupyter if you're already using Python)
  • Pandas (pip install pandas)
  • markegge's get-comments-with-api notebook
  • csvkit (pip install csvkit)
  • jot (included in MacOS, must compile from source on other platforms. Alternately, use another random number generator in line 19 of generate-random.sh.)
  • GNU core utilities (included in Linux, must install on MacOS using brew install coreutils)

Step-by-step

  1. Run get-comments-with-api from Jupyter Notebook to download the full comment set. (Alternately, export the notebook to a .py file and run that from the command line.) Note that you need an API key from data.gov to download all the comments.
  2. Copy comments.csv into your working directory.
  3. Run sh match-random.sh to clean comments.csv and pick 1000 random comments from it.
  4. Run sh search-comments.sh utah-residents.txt to find possible comments from Utah residents. (Output is in utah-residents.csv.)
  5. Run sh random-from-search.sh 1000 utah-residents.csv to pick 1000 random comments from those search results.
  6. Import export-1000-random.csv and utah-residents-random.csv into a spreadsheet (we used Google Docs for simultaneous editing) and code each comment by hand.
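
Steps 3 and 5 both come down to drawing a fixed number of random rows from a CSV. As a rough sketch of that idea in Python (this is not what the repository's scripts do; they rely on csvkit, jot, and shuf), the following uses only the standard library. The function name sample_rows and its seed parameter are hypothetical, introduced here for illustration.

```python
import csv
import io
import random


def sample_rows(csv_text, n, seed=None):
    """Return (header, rows): the header plus up to n randomly chosen data rows.

    Hypothetical helper for illustration; not part of this repository.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    rng = random.Random(seed)  # pass a seed to make the draw repeatable
    # random.sample raises ValueError if n exceeds the population, so cap it.
    picked = rng.sample(rows, min(n, len(rows)))
    return header, picked


if __name__ == "__main__":
    demo = "id,comment\n1,first\n2,second\n3,third\n4,fourth\n"
    header, picked = sample_rows(demo, 2, seed=0)
    print(header)
    print(picked)
```

Sampling without replacement means no comment is picked twice, which matters when each sampled comment is coded by hand.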

Notes

  • If you just want to search the comment set for a bunch of terms, first generate a clean.csv file: csvclean -l comments.csv && mv comments_out.csv clean.csv

  • Then put your search terms into a .txt file, one term per line. csvgrep uses regex, so a term like liv(e|ed|ing) in utah will find people who live, lived, or are living in Utah, and , utah (\d*) captures any digits (like a zip code) after comma-space-utah. Note that \d* also matches when no digits follow; use \d+ to require at least one.

  • Run sh search-comments.sh [myfile.txt] to search clean.csv for all the terms in your text file. Output will be in [myfile].csv.

  • This all works for me on MacOS Sierra. It should work fine on Linux, but in line 19 of generate-random.sh, you'll need to change gshuf to shuf.

  • Our analysis of 650,000 comments posted as of 7:00 am MDT Monday, July 10 is available here and here.
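
To illustrate the term-file search described in the notes above, here is a minimal Python sketch that compiles one regex per line and keeps any row whose text matches at least one of them. It is an approximation of the idea, not a port of search-comments.sh. The column name comment_text, the helper names, and the case-insensitive matching are all assumptions made for the example.

```python
import csv
import io
import re


def load_patterns(terms_text):
    """Compile one regex per non-blank line of a terms file.

    Assumption: matching is case-insensitive, so "utah" also finds "Utah".
    """
    return [re.compile(line, re.IGNORECASE)
            for line in terms_text.splitlines() if line.strip()]


def search_rows(csv_text, patterns, column):
    """Return the rows whose `column` matches at least one pattern."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader
            if any(p.search(row[column]) for p in patterns)]


if __name__ == "__main__":
    terms = "liv(e|ed|ing) in utah\n, utah (\\d+)\n"
    data = (
        "id,comment_text\n"
        '1,"I have lived in Utah for 20 years."\n'
        '2,"Greetings from Ohio."\n'
        '3,"Jane Doe, Salt Lake City, Utah 84101"\n'
    )
    for row in search_rows(data, load_patterns(terms), "comment_text"):
        print(row["id"], row["comment_text"])
```

The sample terms use \d+ rather than \d* so that the second pattern only matches when digits actually follow the state name.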
