
Commit e98f924

Use dplyr::distinct in {latest,earliest}_issue
Profiling revealed that latest_issue was responsible for a large portion of the time taken in building correlation-utils.Rmd (apart from downloading the data). Much of this time was spent in dplyr::filter.

Rather than grouping by geography and time, we can use dplyr::distinct, knowing that each geo_value and time_value should appear only once per issue date. By taking the first or last (after sorting by issue date), we get the desired result.

dplyr does not document algorithmic details, so I can't easily give O(n) notation here. Algorithmic details notwithstanding, the results are extraordinary:

    > nrow(d)
    [1] 203360
    > system.time(latest_issue_old(d))
       user  system elapsed
      6.395   0.037   6.465
    > system.time(latest_issue(d))
       user  system elapsed
      0.025   0.003   0.027
1 parent 4e22d6e commit e98f924

1 file changed: +6 -6 lines changed


R-packages/covidcast/R/utils.R

Lines changed: 6 additions & 6 deletions
@@ -15,9 +15,9 @@ latest_issue <- function(df) {
   attrs <- attrs[!(names(attrs) %in% c("row.names", "names"))]
 
   df <- df %>%
-    dplyr::group_by(.data$geo_value, .data$time_value) %>%
-    dplyr::filter(.data$issue == max(.data$issue)) %>%
-    dplyr::ungroup()
+    dplyr::arrange(dplyr::desc(.data$issue)) %>%
+    dplyr::distinct(.data$geo_value, .data$time_value,
+                    .keep_all = TRUE)
 
   attributes(df) <- c(attributes(df), attrs)
 
@@ -41,9 +41,9 @@ earliest_issue <- function(df) {
   attrs <- attrs[!(names(attrs) %in% c("row.names", "names"))]
 
   df <- df %>%
-    dplyr::group_by(.data$geo_value, .data$time_value) %>%
-    dplyr::filter(.data$issue == min(.data$issue)) %>%
-    dplyr::ungroup()
+    dplyr::arrange(.data$issue) %>%
+    dplyr::distinct(.data$geo_value, .data$time_value,
+                    .keep_all = TRUE)
 
   attributes(df) <- c(attributes(df), attrs)
 
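
As a rough illustration of why the two pipelines in this diff should agree, the sketch below runs both on a tiny made-up data frame. The data values and the names `d`, `latest_old`, `latest_new`, and `same` are hypothetical and not part of the package; the sketch also assumes, as the commit message does, that each geo_value and time_value pair appears at most once per issue date.

```r
library(dplyr)

# Toy data (hypothetical values): two issues for one (geo_value, time_value)
# pair and a single issue for another.
d <- data.frame(
  geo_value  = c("pa", "pa", "tx"),
  time_value = as.Date(c("2020-06-01", "2020-06-01", "2020-06-01")),
  issue      = as.Date(c("2020-06-02", "2020-06-05", "2020-06-03")),
  value      = c(1.0, 1.5, 2.0)
)

# Old approach: group, then keep rows whose issue equals the group maximum.
latest_old <- d %>%
  group_by(geo_value, time_value) %>%
  filter(issue == max(issue)) %>%
  ungroup()

# New approach: sort newest issue first, then keep the first row per pair.
latest_new <- d %>%
  arrange(desc(issue)) %>%
  distinct(geo_value, time_value, .keep_all = TRUE)

# Row order differs between the two results, so sort before comparing.
same <- all.equal(
  as.data.frame(arrange(latest_old, geo_value, time_value)),
  as.data.frame(arrange(latest_new, geo_value, time_value))
)
```

If a pair could appear more than once with the same issue date, the two approaches would diverge: the filter version keeps every tied row, while distinct keeps only the first.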
