behindbarstools is an R package with the set of data tools used by the
UCLA Law COVID-19 Behind Bars
Project – a data project that
collects and reports facility-level data on COVID-19 in jails, prisons,
and other carceral facilities. behindbarstools includes a variety of
functions to help pull, clean, wrangle, and visualize our data.
Warning: This package is actively under development.
# Install directly from GitHub
devtools::install_github("uclalawcovid19behindbars/behindbarstools")The read_scrape_data() function can be used to load our data.
behindbarstools also includes functions to more easily load related
data from other organizations including the Vera Institute’s Jail
Population Data
through read_vera_pop() and the Department of Homeland Security’s
Homeland Infrastructure Foundation-Level
Data
through read_hifld_data().
library(behindbarstools)
# Pull latest data
latest_scraped <- read_scrape_data()
# Pull historical scraped data for California
scraped_CA <- read_scrape_data(all_dates = TRUE,
state = "California")The majority of the functions in behindbarstools help standardize our
ETL and data cleaning process. This includes functions to help with the
following:
- Cleaning facility names, e.g.
clean_fac_col_txt(),clean_facility_name() - Coalescing data from various sources,
e.g.
coalesce_with_warnings(),group_by_coalesce() - Enforcing data validation, e.g.
is_valid_state(),is_federal() - Standardizing our data scraping infrastructure,
e.g.
ExtractTable(),get_src_by_attr()
See our package documentation for more information and examples for each function.
behindbarstools also includes functions to create data visualizations.
This includes a custom ggplot2 theme called theme_behindbars() that
incorporates our team’s style guide. All plotting functions return
ggplot objects,
making it easy to customize and add additional layers.
# Plot cumulative COVID-19 cases in the Los Angeles Jails over the past 30 days
plot_fac_trend(fac_name = "Los Angeles Jails",
state = "California",
metric = "Residents.Confirmed",
plot_days = 30,
auto_label = TRUE) +
theme_behindbars(base_size = 14) +
ggplot2::ylim(3500, 4000) +
ggplot2::theme(legend.position = "none")# Plot the 3 facilities with the largest recent spikes in active COVID-19 cases
plot_recent_fac_increases(metric = "Residents.Active",
plot_days = 60,
num_fac = 3,
auto_label = TRUE) +
theme_behindbars(base_size = 14) +
ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1),
plot.tag.position = c(0.80, 0.05))
