
Incorrect PPV estimate from riskProfile function #6

@dhibar

I noticed that the PPV estimate for a given cutoff in the riskProfile() output is incorrect, while NPV (and, separately, sensitivity and specificity) all look correct. Below is a working toy example that illustrates the problem.

library(stats4phc)
library(dplyr)

OUTCOME = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 
            1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
            0, 0, 0, 1, 1, 0, 0, 0, 0)

SCORE = c(0.1717, 0.7474, 0.1368, 0.24947, 0.18512, 0.23712, 0.3914, 
          0.095, 0.59691, 0.56331, 0.5564, 0.18824, 0.17069, 0.1634, 0.6308, 
          0.1938, 0.37125, 0.19701, 0.13728, 0.0931, 0.35145, 0.17576, 
          0.1026, 0.49504, 0.41325, 0.30098, 0.57096, 0.54846, 0.25443, 
          0.14725, 0.1584, 0.30492, 0.49203, 0.12771, 0.2525, 0.28179, 
          0.22725, 0.57855, 0.17523, 0.13464, 0.22523, 0.26361, 0.12672, 
          0.14976, 0.49005, 0.73528, 0.15444, 0.47672, 0.0572, 0.15352)


CUTOFF = ifelse(SCORE > 0.41325, 1, 0)

comb_df <- data.frame(OUTCOME, SCORE, CUTOFF) %>%
  mutate(OUTCOME_f = factor(OUTCOME, levels = c("1", "0"))) %>%
  mutate(CUTOFF_f = factor(CUTOFF, levels = c("1", "0")))

myNPV <- comb_df %>%
  yardstick::npv(
    truth = OUTCOME_f,
    estimate = CUTOFF_f,
    event_level = "first"
  )

print(myNPV$.estimate)

myPPV <- comb_df %>%
  yardstick::ppv(
    truth = OUTCOME_f,
    estimate = CUTOFF_f,
    event_level = "first"
  )

print(myPPV$.estimate)

For this example, NPV and PPV are 0.9189189 and 0.4615385, as expected.
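As a sanity check on those two numbers, here is the same arithmetic done by hand. This is only a sketch: the four counts are tallied from the toy data for the rule SCORE > 0.41325, and the variable names are mine.

```r
# Confusion-matrix counts for the rule SCORE > 0.41325 (tallied from the toy data)
tp <- 6   # score above the cutoff, outcome 1
fp <- 7   # score above the cutoff, outcome 0
fn <- 3   # score at or below the cutoff, outcome 1
tn <- 34  # score at or below the cutoff, outcome 0

ppv <- tp / (tp + fp)  # 6 / 13
npv <- tn / (tn + fn)  # 34 / 37

print(c(PPV = ppv, NPV = npv))
# PPV 0.4615385, NPV 0.9189189
```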

With the riskProfile function we see a different estimate for PPV:

# risk profiling with stats4phc
p1cn <- riskProfile(outcome = comb_df$OUTCOME, score = comb_df$SCORE,
                    include = c("PPV", "1-NPV"), methods = c("cgam"))

s4p_perf <- p1cn$data %>%
  filter(method == "non-parametric") %>%
  filter(score == 0.41325)

print(s4p_perf$pvValue[s4p_perf$pv=="NPV"])

print(s4p_perf$pvValue[s4p_perf$pv=="PPV"])

From riskProfile(), NPV and PPV at this cutoff are 0.9189189 and 0.4285714, i.e. the PPV does not match the 0.4615385 obtained with yardstick!

Digging deeper into the data returned by riskProfile(), the PPV reported at the next cutoff is the correct value for the previous cutoff, so this could be a merging or off-by-one type error. (Note that the NPV is still correct at a given cutoff.)

# The next PPV entry in the full output shows that the correct PPV is assigned to the next cutoff value
s4p_perf_all <- p1cn$data %>%
  filter(method == "non-parametric")

print(s4p_perf_all$pvValue[which(s4p_perf_all$pv=="PPV" & s4p_perf_all$score == 0.47672)])
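One possible reading of the shift, which I have not confirmed against the stats4phc source, is that the difference comes down to whether the observation sitting exactly at the cutoff is counted as test-positive. The base-R sketch below computes PPV both ways on the toy data; `ppv_at` is a hypothetical helper of mine, not a package function.

```r
OUTCOME <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
             1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
             0, 0, 0, 1, 1, 0, 0, 0, 0)

SCORE <- c(0.1717, 0.7474, 0.1368, 0.24947, 0.18512, 0.23712, 0.3914,
           0.095, 0.59691, 0.56331, 0.5564, 0.18824, 0.17069, 0.1634, 0.6308,
           0.1938, 0.37125, 0.19701, 0.13728, 0.0931, 0.35145, 0.17576,
           0.1026, 0.49504, 0.41325, 0.30098, 0.57096, 0.54846, 0.25443,
           0.14725, 0.1584, 0.30492, 0.49203, 0.12771, 0.2525, 0.28179,
           0.22725, 0.57855, 0.17523, 0.13464, 0.22523, 0.26361, 0.12672,
           0.14976, 0.49005, 0.73528, 0.15444, 0.47672, 0.0572, 0.15352)

cutoff <- 0.41325

# Hypothetical helper (not part of stats4phc): share of true events
# among the observations flagged as test-positive
ppv_at <- function(flag) mean(OUTCOME[flag] == 1)

ppv_strict    <- ppv_at(SCORE >  cutoff)  # cutoff observation counted as test-negative
ppv_inclusive <- ppv_at(SCORE >= cutoff)  # cutoff observation counted as test-positive

print(c(strict = ppv_strict, inclusive = ppv_inclusive))
# strict 0.4615385, inclusive 0.4285714
```

The strict rule reproduces the yardstick value and the inclusive rule reproduces the riskProfile value. If this reading holds, using SCORE >= 0.47672 (the next observed score) flags the same 13 observations as SCORE > 0.41325, which would explain why the expected PPV shows up one cutoff later while NPV, computed on SCORE <= cutoff, is unaffected.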

Here's my session info as well, just in case:

R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.1.3   stats4phc_0.1

loaded via a namespace (and not attached):
 [1] Matrix_1.6-1.1    gtable_0.3.4      coneproj_1.17     compiler_4.3.1    tidyselect_1.2.0  Rcpp_1.0.12       cgam_1.21         tidyr_1.3.0      
 [9] splines_4.3.1     scales_1.3.0      yaml_2.3.8        fastmap_1.1.1     boot_1.3-28.1     statmod_1.5.0     svGUI_1.0.1       lattice_0.21-9   
[17] ggplot2_3.5.0     R6_2.5.1          labeling_0.4.3    generics_0.1.3    knitr_1.44        MASS_7.3-60       backports_1.4.1   checkmate_2.2.0  
[25] tibble_3.2.1      nloptr_2.0.3      munsell_0.5.0     minqa_1.2.6       pillar_1.9.0      rlang_1.1.3       utf8_1.2.3        xfun_0.40        
[33] cli_3.6.2         withr_3.0.0       magrittr_2.0.3    digest_0.6.35     grid_4.3.1        rstudioapi_0.16.0 svDialogs_1.1.0   lme4_1.1-35.1    
[41] lifecycle_1.0.4   nlme_3.1-163      vctrs_0.6.5       evaluate_0.22     pracma_2.4.2      glue_1.7.0        yardstick_1.3.1   fansi_1.0.5      
[49] colorspace_2.1-0  rmarkdown_2.25    purrr_1.0.2       htmltools_0.5.8.1 tools_4.3.1       pkgconfig_2.0.3  
