-
Notifications
You must be signed in to change notification settings - Fork 1
Incorrect PPV estimate from riskProfile function #6
Description
I noticed that the PPV estimate for a given cutoff in the riskProfile() function output is incorrect while NPV (and separately sensitivity and specificity) all look to be correct. Below is a working toy example to illustrate the problem.
library(stats4phc)
library(dplyr)
OUTCOME = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 1, 1, 0, 0, 0, 0)
SCORE = c(0.1717, 0.7474, 0.1368, 0.24947, 0.18512, 0.23712, 0.3914,
0.095, 0.59691, 0.56331, 0.5564, 0.18824, 0.17069, 0.1634, 0.6308,
0.1938, 0.37125, 0.19701, 0.13728, 0.0931, 0.35145, 0.17576,
0.1026, 0.49504, 0.41325, 0.30098, 0.57096, 0.54846, 0.25443,
0.14725, 0.1584, 0.30492, 0.49203, 0.12771, 0.2525, 0.28179,
0.22725, 0.57855, 0.17523, 0.13464, 0.22523, 0.26361, 0.12672,
0.14976, 0.49005, 0.73528, 0.15444, 0.47672, 0.0572, 0.15352)
CUTOFF = ifelse(SCORE > 0.41325, 1, 0)
comb_df <- data.frame(OUTCOME, SCORE, CUTOFF) %>%
mutate(OUTCOME_f = factor(OUTCOME, levels = c("1", "0"))) %>%
mutate(CUTOFF_f = factor(CUTOFF, levels = c("1", "0")))
myNPV <- comb_df %>%
yardstick::npv(
truth = OUTCOME_f,
estimate = CUTOFF_f,
event_level = "first"
)
print(myNPV$.estimate)
myPPV <- comb_df %>%
yardstick::ppv(
truth = OUTCOME_f,
estimate = CUTOFF_f,
event_level = "first"
)
print(myPPV$.estimate)
Our NPV and PPV for this example are 0.9189189 and 0.4615385 as expected.
With the riskProfile function we see a different estimate for PPV:
# risk profiling with stats4phc
p1cn <- riskProfile(outcome = comb_df$OUTCOME, score = comb_df$SCORE,
include = c("PPV", "1-NPV"), methods = c("cgam"))
s4p_perf <- p1cn$data %>%
filter(method == "non-parametric") %>%
filter(score == 0.41325)
print(s4p_perf$pvValue[s4p_perf$pv=="NPV"])
print(s4p_perf$pvValue[s4p_perf$pv=="PPV"])
Our NPV and PPV for this riskProfile example are 0.9189189 and 0.4285714 i.e. not the correct PPV!
If you take a deeper dive into the data output from the riskProfile function it looks like the PPV estimate at the next cutoff is the correct value for the previous cutoff so could be a merging or off-by-one type error (remember that the NPV is still correct for a given cutoff though).
# if you look at the next entry in the full output for PPV you can see that the correct PPV is assign to the next cutoff value
s4p_perf_all <- p1cn$data %>%
filter(method == "non-parametric")
print(s4p_perf_all$pvValue[which(s4p_perf_all$pv=="PPV" & s4p_perf_all$score == 0.47672)])
Here's my session info as well JIC:
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_1.1.3 stats4phc_0.1
loaded via a namespace (and not attached):
[1] Matrix_1.6-1.1 gtable_0.3.4 coneproj_1.17 compiler_4.3.1 tidyselect_1.2.0 Rcpp_1.0.12 cgam_1.21 tidyr_1.3.0
[9] splines_4.3.1 scales_1.3.0 yaml_2.3.8 fastmap_1.1.1 boot_1.3-28.1 statmod_1.5.0 svGUI_1.0.1 lattice_0.21-9
[17] ggplot2_3.5.0 R6_2.5.1 labeling_0.4.3 generics_0.1.3 knitr_1.44 MASS_7.3-60 backports_1.4.1 checkmate_2.2.0
[25] tibble_3.2.1 nloptr_2.0.3 munsell_0.5.0 minqa_1.2.6 pillar_1.9.0 rlang_1.1.3 utf8_1.2.3 xfun_0.40
[33] cli_3.6.2 withr_3.0.0 magrittr_2.0.3 digest_0.6.35 grid_4.3.1 rstudioapi_0.16.0 svDialogs_1.1.0 lme4_1.1-35.1
[41] lifecycle_1.0.4 nlme_3.1-163 vctrs_0.6.5 evaluate_0.22 pracma_2.4.2 glue_1.7.0 yardstick_1.3.1 fansi_1.0.5
[49] colorspace_2.1-0 rmarkdown_2.25 purrr_1.0.2 htmltools_0.5.8.1 tools_4.3.1 pkgconfig_2.0.3