Conversation
Codecov Report
@@ Coverage Diff @@
## master #154 +/- ##
==========================================
- Coverage 89.21% 89.08% -0.13%
==========================================
Files 92 92
Lines 5478 5507 +29
==========================================
+ Hits 4887 4906 +19
- Misses 591 601 +10
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #154 +/- ##
==========================================
- Coverage 90.04% 89.91% -0.13%
==========================================
Files 105 104 -1
Lines 6466 6414 -52
==========================================
- Hits 5822 5767 -55
- Misses 644 647 +3
Continue to review full report at Codecov.
|
jonkeane
left a comment
There was a problem hiding this comment.
This is very exciting. I know that we've struggled with making this easy, and it will be great to be able to do this.
R/make-array.R
Outdated
| } | ||
|
|
||
|
|
||
| #' Create Multiple Response Variable from Delimited |
There was a problem hiding this comment.
Super nitpicky: Create a Multiple Response Variable from Delimited Lists or Create Multiple Response Variables from Delimited Lists
R/make-array.R
Outdated
| #' @param unanswered Character string indicating non-response | ||
| #' @param ... Other arguments to be passed on to [makeMR()] | ||
| #' | ||
| #' @return |
R/make-array.R
Outdated
| #' @param delim The delimiter separating the responses | ||
| #' @param name The name of the resulting MR variable | ||
| #' @param selected A character string used to indicate a selection | ||
| #' @param not_selected Character string identifying non-selection |
There was a problem hiding this comment.
Add references to the default values that are specified
There was a problem hiding this comment.
I don't think "selected" and "not_selected" arguments are worth having. And I'm guessing that if you convert to the derived expression (using ~= on the server), you won't have room for defining subvariable category names anyway.
| v <- as.vector(var) | ||
| } else { | ||
| halt(dQuote("var"), " must be a Categorical or Text Crunch Variable.") | ||
| } |
There was a problem hiding this comment.
Here, dQuote("var") could be dQuote(substitute(var)) to get the actual variable string that was given
R/make-array.R
Outdated
| ds <- loadDataset(datasetReference(var)) | ||
| addVariables(ds, vardefs) | ||
| hide(var) | ||
| ds <- refresh(ds) |
There was a problem hiding this comment.
You can get rid of this refresh line, if you change line 122 to ds <- addVariables(ds, vardefs) and possible also change line 123 to var <- hide(var)
There was a problem hiding this comment.
You can get rid of all of this if we accept #162 and you use that approach instead.
tests/testthat/test-make-array.R
Outdated
| c("not_selected", "selected", "No Data")) | ||
| expect_identical(as.vector(ds$mr_5$maple), | ||
| structure(c(2L, 2L, 1L, 1L), .Label = c("not_selected", "selected" | ||
| ), class = "factor")) |
There was a problem hiding this comment.
I think this would be more readable as factor(c("selected", "selected", "not_selected", "not_selected"), levels=c("not_selected", "selected"))
tests/testthat/test-make-array.R
Outdated
| unanswered = v[is.na(v)]) | ||
| expect_equivalent(varDef, expected) | ||
| }) | ||
| c("maple; birch", "oak; maple; birch", "birch; sugar maple", "maple butter; oak") |
| not_selected = "No", | ||
| unanswered = v[is.na(v)]) | ||
| expect_equivalent(varDef, expected) | ||
| }) |
There was a problem hiding this comment.
This expectation doesn't seem quite right: I would have expected "oak" to be matched a few times given those values.
tests/testthat/test-make-array.R
Outdated
| }) | ||
|
|
||
| test_that("createSubvarDef generates the correct variable definition", { | ||
| v <- c("maple; birch", "oak; maple; birch", "birch; sugar maple", "maple butter; oak", NA) |
There was a problem hiding this comment.
it would be good to also test where one of the values is selected in all of the values.
R/make-array.R
Outdated
| vardefs <- lapply(cats, function(x) createSubvarDef(v, x, delim, | ||
| selected, not_selected, unanswered, missing = is.na(v))) | ||
| ds <- loadDataset(datasetReference(var)) | ||
| addVariables(ds, vardefs) |
There was a problem hiding this comment.
I'm not sure this is avoidable, but maybe it would be good to add a check that these variables don't already exist / if they do rename them? It seems a bit problematic that someone could only ever use this command once on a variable (without going through and manually deleting the subvariables that were created).
There was a problem hiding this comment.
How about just changing the subvariable names to varname_selection? In other words if the delimited variable was called trees then the subvariables would be trees_oak trees_maple etc. My preference would be to keep the function consistent so that it always named things in the same way and didn't change behaviour based of the other variables in a dataset.
There was a problem hiding this comment.
varname_selection is nice, but it doesn't ensure uniqueness. A safer way is to create the variable in one request (not upload and then bind, which is not atomic). And in doing it in a single request, you may be able to omit aliases for the subvariables, in which case the backend should generate (guaranteed valid) aliases for them.
|
It looks like there's a zz9 bug which prevents creating a MR variable with derived subvariable definitions. https://www.pivotaltracker.com/story/show/152975005 & The ideal scenario is to create the multiple response variable definition in one step without having to first create the subvariables, but for the time being we probably need to use the two step process until the bug is resolved. |
R/make-array.R
Outdated
| deriv <- zfunc("case", new_cat) | ||
| deriv$args[[2]] <- zfunc("is_missing", var) | ||
| deriv$args[[3]] <- zfunc("~=", var, buildDelimRegex(str, delim)) | ||
| deriv$references <- list(name = str, alias = paste0(alias(var), "_", str)) |
There was a problem hiding this comment.
We should probably check that these aliases are unique in the dataset. Currently the concatenate the source variable's alias with the item name (str).
It should be doable to check that this alias doesn't exist, and if it does do something else.
- Changed map names to accomodate no periods in Mongo - Added escape functionality for regex metacharacters
Create Multiple Response variables from delimited text or categorical variables. The process this function uses to generate the MR is: