Migrate some simple ddply() calls to dplyr equivalents#1388
Open
MichaelChirico wants to merge 8 commits intotopepo:masterfrom
Open
Migrate some simple ddply() calls to dplyr equivalents#1388MichaelChirico wants to merge 8 commits intotopepo:masterfrom
MichaelChirico wants to merge 8 commits intotopepo:masterfrom
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #1387.
Similar to #1382, #1383, #1385, #1386 -- the end goal is moving away from {plyr}. Like #1386, this bumps {dplyr} from
SuggeststoImports.There are a variety of stylistic choices around using tidyverse code in a package (whether to use
%>%/|>, whether to use the.datamask to avoid the need forglobalVariables(), etc). Let me know your preferences and I can make further edits.Also like #1386, the key driver of potential differences in these between {plyr} and {dplyr} comes down to the differences between
plyr::rbind.fill()anddplyr::bind_rows(). Since the structure across groups insummarize()expressions is typically consistent, I don't have much concern about that here.The second major difference is that
plyr::ddply()always sorts the output by the grouping key, whereasdplyr::summarize()sorts according to the input row order (this in turn matches the {data.table} behavior). I tried to have a look at nearby code to see if the output order matters -- if it's not clear we can ignore the row ordering, I default to assuming it's important, thus we might be able to drop morearrange()calls.There are still a large number of
ddply()calls -- this reduces the number from roughly 74 to roughly 48. The remaining ones either do more complicated things around column selection, or use a much more complicated.funwhich requires more careful examination. More PR(s) to follow.