Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .Rbuildignore
Original file line number Diff line number Diff line change
Expand Up @@ -44,3 +44,5 @@
^\.cache$
^\.vscode$
^[\.]?air\.toml$
^\.claude$
^CLAUDE\.md$
16 changes: 16 additions & 0 deletions .claude/settings.local.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
{
"$schema": "https://json.schemastore.org/claude-code-settings.json",
"permissions": {
"allow": [
"Bash(find:*)",
"Bash(R:*)",
"Bash(Rscript:*)",
"Bash(air format:*)",
"Edit(R/**)",
"Edit(tests/**)",
"Edit(vignettes/**)",
"Edit(NEWS.md)"
],
"deny": []
}
}
76 changes: 76 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
# CLAUDE.md
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plucked from ellmer, with modifications. I'm just trying it out.


This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Package Overview

dplyr is a tidyverse R package providing a grammar of data manipulation.

## Key development commands

### General

- When running R from the console, always run it with `--quiet --vanilla`
- Always run `air format .` after generating code

### Testing

- Tests for `R/{name}.R` go in `tests/testthat/test-{name}.R`
- Use `devtools::test(reporter = "check")` to run all tests
- Use `devtools::test(filter = "{name}", reporter = "check")` to run tests for `R/{name}.R`
- DO NOT USE `devtools::test_active_file()`
- All testing functions automatically load code; you don't need to
- All new code should have an accompanying test
- If there are existing tests, place new tests next to similar existing tests

### Documentation

- Run `devtools::document()` after changing any roxygen2 docs
- Every user facing function should be exported and have roxygen2 documentation
- Whenever you add a new documentation file, make sure to also add the topic name to `_pkgdown.yml`
- Run `pkgdown::check_pkgdown()` to check that all topics are included in the reference index
- Use sentence case for all headings
- Any user facing changes should be briefly described in a bullet point at the top of `NEWS.md`, following the tidyverse style guide (https://style.tidyverse.org/news.html)
- Any bullets added to `NEWS.md` should always be placed at the top, just under `# {package} (development version)`

### Code style

- Use newspaper style/high-level first function organisation. Main logic at the top and helper functions should come below
- Don't define functions inside of functions unless they are very brief
- Error messages should use `cli::cli_abort()` and follow the tidyverse style guide (https://style.tidyverse.org/errors.html)

## Architecture

### Key Design Patterns

#### Standalone Imports
- Self-contained utility functions
- `import-standalone-*.R` files reduce dependencies
- Imported from other tidyverse packages

## Key Files

### Core Implementation
- `R/` - R implementation files
- `src/` - C/C++ implementation files

### Testing and Quality
- `tests/testthat/` - Test suite
- `vignettes/` - Documentation and examples
- `.github/workflows/` - CI/CD with R CMD check

## Development Notes

### Testing Strategy
- Snapshot testing for output validation
- Separate test files for each major component

### Code Organization
- Utility functions grouped by purpose (`utils-*.R`)
- Standalone imports minimize external dependencies

### Documentation
- Roxygen2 comments for all exported functions
- Vignettes demonstrate key use cases
- pkgdown site provides comprehensive documentation
- Examples use realistic but safe API interactions
12 changes: 8 additions & 4 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,13 @@
# dplyr (development version)

* `if_any()` and `if_all()` are now more consistent in all use cases (#7059, #7077, #7746, @jrwinget). In particular:

* When called with zero inputs, `if_any()` returns `FALSE` and `if_all()` returns `TRUE`.

* When called with one input, both now return logical vectors rather than the original column.

* The result of applying `.fns` now must be a logical vector.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change, but hopefully not a common issue? We really do want the result of .fns to be logical vectors, particularly for #7746 to work right.

The vctrs functions for pany/pall will probably also be pretty strict about requiring logical inputs.


* Empty `rowwise()` list-column elements now resolve to `logical()` rather than a random logical of length 1 (#7710).

* `last_dplyr_warnings()` no longer prevents objects from being garbage collected (#7649).
Expand Down Expand Up @@ -97,10 +105,6 @@
* Fixed an issue where duckplyr's ALTREP data frames were being materialized
early due to internal usage of `ncol()` (#7049).

* `if_any()` and `if_all()` are now fully consistent with `any()` and `all()`.
In particular, when called with empty inputs `if_any()` returns `FALSE` and
`if_all()` returns `TRUE` (#7059, @jrwinget).

# dplyr 1.1.4

* `join_by()` now allows its helper functions to be namespaced with `dplyr::`,
Expand Down
164 changes: 104 additions & 60 deletions R/across.R
Original file line number Diff line number Diff line change
Expand Up @@ -349,31 +349,87 @@ across <- function(.cols, .fns, ..., .names = NULL, .unpack = FALSE) {
if_any <- function(.cols, .fns, ..., .names = NULL) {
context_local("across_if_fn", "if_any")
context_local("across_frame", current_env())
if_across(`|`, across({{ .cols }}, .fns, ..., .names = .names))
df <- across({{ .cols }}, .fns, ..., .names = .names)
x <- dplyr_new_list(df)
size <- vec_size(df)
dplyr_list_pany(x, size = size)
}

#' @rdname across
#' @export
if_all <- function(.cols, .fns, ..., .names = NULL) {
context_local("across_if_fn", "if_all")
context_local("across_frame", current_env())
if_across(`&`, across({{ .cols }}, .fns, ..., .names = .names))
df <- across({{ .cols }}, .fns, ..., .names = .names)
x <- dplyr_new_list(df)
size <- vec_size(df)
dplyr_list_pall(x, size = size)
}

if_across <- function(op, df) {
n <- nrow(df)
dplyr_list_pall <- function(
x,
...,
size = NULL,
error_call = caller_env()
) {
dplyr_list_pany_pall(x, "all", ..., size = size, error_call = error_call)
}

dplyr_list_pany <- function(
x,
...,
size = NULL,
error_call = caller_env()
) {
dplyr_list_pany_pall(x, "any", ..., size = size, error_call = error_call)
}

dplyr_list_pany_pall <- function(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hoping to just remove this in favor of the vctrs versions soon, but I needed to get the semantics of it right enough to be able to add all the tests here

x,
variant,
...,
size = NULL,
error_call = caller_env()
) {
check_dots_empty0(...)

obj_check_list(x, arg = "", call = error_call)

if (!length(df)) {
return(TRUE)
# Doesn't allow `NULL`, doesn't do type casting
for (i in seq_along(x)) {
check_logical(
x[[i]],
arg = (vec_names(x) %||% paste0("..", seq_len(vec_size(x))))[[i]],
call = error_call
)
}

combine <- function(x, y) {
if (is_null(x)) {
y
# Doesn't do recycling, asserts that all inputs are the same size
if (is.null(size)) {
if (length(x) == 0L) {
size <- 0L
} else {
op(x, y)
size <- vec_size(x[[1L]])
}
}
reduce(df, combine, .init = NULL)
list_check_all_size(x, size, arg = "", call = error_call)

switch(
variant,
any = {
op <- `|`
init <- FALSE
},
all = {
op <- `&`
init <- TRUE
},
abort("Unreachable", .internal = TRUE)
)

init <- vec_rep(init, times = size)

reduce(x, op, .init = init)
Comment on lines +430 to +432
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having an initial value is an important reason that the 0 and 1 input cases now "just work" correctly. This was NULL before.

}

#' Combine values from multiple columns
Expand Down Expand Up @@ -608,79 +664,67 @@ dplyr_quosures <- function(...) {
quosures
}

# When mutate() or summarise() have an unnamed call to across() at the top level, e.g.
# summarise(across(<...>)) or mutate(across(<...>))
#
# a call to top_across(<...>) is evaluated instead.
# top_across() returns a flattened list of expressions along with some
# information about the "current column" for each expression
# in the "columns" attribute:
#
# For example with: summarise(across(c(x, y), mean, .names = "mean_{.col}")) top_across() will return
# something like:
#
# structure(
# list(mean_x = expr(mean(x)), mean_y = expr(mean(y)))
# columns = c("x", "y")
# )

# Technically this always returns a single quosure but we wrap it in a
# list to follow the pattern in `expand_across()`
# Always guaranteed to be 1 quosure in, 1 quosure out, unlike `expand_across()`
expand_if_across <- function(quo) {
quo_data <- attr(quo, "dplyr:::data")
if (!quo_is_call(quo, c("if_any", "if_all"), ns = c("", "dplyr"))) {
return(list(quo))
if (quo_is_call(quo, "if_any", ns = c("", "dplyr"))) {
variant <- "any"
} else if (quo_is_call(quo, "if_all", ns = c("", "dplyr"))) {
variant <- "all"
} else {
# Refuse to expand
return(quo)
}

# `definition` is the same between the two for the purposes of `match.call()`
definition <- if_any

call <- match.call(
definition = if_any,
definition = definition,
call = quo_get_expr(quo),
expand.dots = FALSE,
envir = quo_get_env(quo)
)

if (!is_null(call$...)) {
return(list(quo))
# Refuse to expand
return(quo)
}

if (is_call(call, "if_any")) {
op <- "|"
if (variant == "any") {
if_fn <- "if_any"
empty <- FALSE
dplyr_fn <- "dplyr_list_pany"
} else {
op <- "&"
if_fn <- "if_all"
empty <- TRUE
dplyr_fn <- "dplyr_list_pall"
}

# `expand_across()` will always expand at this point given that we bailed on
# `...` usage early on, which is the only case that would stop expansion.
#
# Set frame here for backtrace truncation. But override error call via
# `local_error_call()` so it refers to the function we're expanding, e.g.
# `if_any()` and not `expand_if_across()`.
context_local("across_if_fn", if_fn)

# Set frame here for backtrace truncation. But override error call
# via `local_error_call()` so it refers to the function we're
# expanding, e.g. `if_any()` and not `expand_if_across()`.
context_local("across_frame", current_env())
local_error_call(call(if_fn))

call[[1]] <- quote(across)
quos <- expand_across(quo_set_expr(quo, call))

# Select all rows if there are no inputs for if_all(),
# but select no rows if there are no inputs for if_any().
if (!length(quos)) {
return(list(quo(!!empty)))
}
expr <- expr({
x <- list(!!!quos)
ns <- asNamespace("dplyr")

combine <- function(x, y) {
if (is_null(x)) {
y
} else {
call(op, x, y)
}
}
expr <- reduce(quos, combine, .init = NULL)
# In the evaluation path, `across()` automatically recycles to common size,
# so we must here as well for compatibility. `across()` also returns a 0
# col, 1 row data frame in the case of no inputs so that it will recycle to
# the group size, which we also do here.
size <- ns[["dplyr_list_size_common"]](x, absent = 1L, call = call(!!if_fn))
x <- ns[["dplyr_list_recycle_common"]](x, size = size, call = call(!!if_fn))

ns[[!!dplyr_fn]](x, size = size, error_call = call(!!if_fn))
})

# Use `as_quosure()` instead of `new_quosure()` to avoid rewrapping
# quosure in case of single input
list(as_quosure(expr, env = baseenv()))
new_quosure(expr, env = baseenv())
}

expand_across <- function(quo) {
Expand Down
21 changes: 16 additions & 5 deletions R/filter.R
Original file line number Diff line number Diff line change
Expand Up @@ -147,8 +147,16 @@ filter_rows <- function(
mask <- DataMask$new(data, by, "filter", error_call = error_call)
on.exit(mask$forget(), add = TRUE)

dots <- filter_expand(dots, mask = mask, error_call = error_call)
filter_eval(dots, mask = mask, error_call = error_call, user_env = user_env)
# 1:1 mapping between `dots` and `dots_expanded`
dots_expanded <- filter_expand(dots, mask = mask, error_call = error_call)

filter_eval(
dots = dots,
dots_expanded = dots_expanded,
mask = mask,
error_call = error_call,
user_env = user_env
)
}

check_filter <- function(dots, error_call = caller_env()) {
Expand All @@ -174,6 +182,7 @@ check_filter <- function(dots, error_call = caller_env()) {

filter_expand <- function(dots, mask, error_call = caller_env()) {
env_filter <- env()

filter_expand_one <- function(dot, index) {
env_filter$current_expression <- index
dot <- expand_pick(dot, mask)
Expand All @@ -190,13 +199,15 @@ filter_expand <- function(dots, mask, error_call = caller_env()) {
}
)

dots <- list_flatten(dots)

new_quosures(dots)
}

# We evaluate `dots_expanded` but report errors relative to `dots` so that
# we show "In argument: `if_any(c(x, y), is.na)`" rather than its expanded form.
# This works because `dots` and `dots_expanded` have a 1:1 mapping.
filter_eval <- function(
dots,
dots_expanded,
mask,
error_call = caller_env(),
user_env = caller_env(2)
Expand All @@ -218,7 +229,7 @@ filter_eval <- function(
)

out <- withCallingHandlers(
mask$eval_all_filter(dots, env_filter),
mask$eval_all_filter(dots_expanded, env_filter),
error = dplyr_error_handler(
dots = dots,
mask = mask,
Expand Down
Loading