Merge pull request #583 from Merck/493-finalize-futility-vignette

LittleBeannie · web-flow · commit c421e60decdc · 2025-09-16T13:24:17.000-04:00
Finalize the NPH futility vignette
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -57,6 +57,7 @@ Imports:
 Suggests:
     covr,
     ggplot2,
+    cowplot,
     kableExtra,
     knitr,
     rmarkdown,
diff --git a/vignettes/articles/story-nph-futility.Rmd b/vignettes/articles/story-nph-futility.Rmd
@@ -1,6 +1,6 @@
 ---
 title: "Futility bounds at design and analysis under non-proportional hazards"
-author: "Keaven M. Anderson"
+author: "Keaven M. Anderson and Yujie Zhao"
 output:
   rmarkdown::html_document:
     toc: true
@@ -25,19 +25,17 @@ library(ggplot2)
 
 # Overview
 
-We set up futility bounds under a non-proportional hazards assumption.
-We consider methods presented by @korn2018interim for setting such bounds and then consider an alternate futility bound based on $\beta-$spending under a delayed or crossing treatment effect to simplify implementation.
-Finally, we show how to update this $\beta-$spending bound based on blinded interim data.
-We will consider an example to reproduce a line of @korn2018interim Table 1 with the alternative futility bounds considered.
+This vignette demonstrates possible ways to set up futility bounds in clinical trial designs under the assumption of non-proportional hazards. 
+We review the methods proposed by @wieand1994stopping and @korn2018interim.
+To be more consistent with common practice, we propose a futility bound based on $\beta$-spending that automatically accounts for
+non-proportional hazards as assumed in the design.
 
-## Initial design set-up for fixed analysis
-
-@korn2018interim considered delayed effect scenarios and proposed a futility bound that is a modification of an earlier method proposed by @wieand1994stopping.
-We begin with the enrollment and failure rate assumptions which @korn2018interim based on an example by @chen2013statistical.
+We start by specifying the enrollment and failure rate assumptions, following the example used by @korn2018interim (based on @chen2013statistical).
 
 ```{r}
 # Enrollment assumed to be 680 patients over 12 months with no ramp-up
 enroll_rate <- define_enroll_rate(duration = 12, rate = 680 / 12)
+
 # Failure rates
 ## Control exponential with median of 12 mos
 ## Delayed effect with HR = 1 for 3 months and HR = .693 thereafter
@@ -48,17 +46,20 @@ fail_rate <- define_fail_rate(
   hr = c(1, .693),
   dropout_rate = 0
 )
+
 ## Study duration was 34.8 in Korn & Freidlin Table 1
-## We change to 34.86 here to obtain 512 expected events more precisely
+## We change to 34.86 here to obtain 512 expected events they presented
 study_duration <- 34.86
-```
 
-We now derive a fixed sample size based on these assumptions.
-Ideally, we would allow a targeted event count and variable follow-up in `fixed_design_ahr()` so that the study duration will be computed automatically.
+# randomization ratio (exp:control)
+ratio <- 1
+```
 
+In this example, with 680 subjects enrolled over 12 months, we expect 512 events to occur within 34.86 months, 
+yielding approximately 90.45% power if no interim analyses are performed.
 ```{r}
 fixedevents <- fixed_design_ahr(
-  alpha = 0.025, power = NULL,
+  alpha = 0.025, power = NULL, ratio = ratio,
   enroll_rate = enroll_rate,
   fail_rate = fail_rate,
   study_duration = study_duration
@@ -72,47 +73,61 @@ fixedevents |>
   fmt_number(columns = 5:6, decimals = 3)
 ```
 
-# Modified Wieand futility bound
-
-The @wieand1994stopping rule recommends stopping after 50% of planned events accrue if the observed HR > 1.
-kornfreidlin2018 modified this by adding a second interim analysis after 75% of planned events and stop if the observed HR > 1
-This is implemented here by requiring a trend in favor of control with a direction $Z$-bound at 0 resulting in the *Nominal p* bound being 0.5 for interim analyses in the table below.
-A fixed bound is specified with the `gs_b()` function for `upper` and `lower` and its corresponding parameters `upar` for the upper (efficacy) bound and `lpar` for the lower (futility) bound.
-The final efficacy bound is for a 1-sided nominal p-value of 0.025; the futility bound lowers this to 0.0247 as noted in the lower-right-hand corner of the table below.
-It is < 0.025 since the probability is computed with the binding assumption.
-This is an arbitrary convention; if the futility bound is ignored,
-this computation yields 0.025.
-In the last row under *Alternate hypothesis* below we see the power is 88.44%.
-@korn2018interim computed 88.4% power for this design with 100,000 simulations which estimate the standard error for the power calculation to be `r paste(100 * round(sqrt(.884 * (1 - .884) / 100000), 4), "%", sep = "")`.
-
-```{r}
-wieand <- gs_power_ahr(
-  enroll_rate = enroll_rate, fail_rate = fail_rate,
-  upper = gs_b, upar = c(rep(Inf, 2), qnorm(.975)),
-  lower = gs_b, lpar = c(0, 0, -Inf),
-  event = 512 * c(.5, .75, 1)
-)
-wieand |>
-  summary() |>
-  as_gt(
-    title = "Group sequential design with futility only at interim analyses",
-    subtitle = "Wieand futility rule stops if HR > 1"
-  )
-```
-
 # Beta-spending futility bound with AHR
 
-Need to summarize here.
+Beta-spending allocates the Type II error rate ($\beta$) across interim analyses in a group sequential design. 
+At each interim analysis, a portion of the total allowed $\beta$ is spent to determine the futility boundary. 
+The cumulative $\beta$ spent up to each analysis is specified by a beta-spending function ($\beta(t)$ with $\beta(0) = 0$ and $\beta(1) = \beta$).
+The AHR model for the NPH alternate hypothesis accounts for the assumed early lack of benefit.
+
+Methodology, the futility bound of IA1 (denoted as $a_1$) is
+$$
+  a_1 = \left\{
+  a_1
+  :
+  \text{Pr}
+  \left(
+     \underbrace{Z_1 \leq a_1}_{\text{fail at IA1}} \; | \; H_1
+  \right)
+  =
+  \beta(t_1) 
+  \right\}.
+$$
+The futility bound at IA2  (denoted as $a_2$) is 
+$$
+  a_2
+  =
+  \left\{
+  a_2
+  :
+  \text{Pr}
+  \left(
+    \underbrace{Z_2 \leq a_2}_{\text{fail at IA2}} \;
+    \text{ and }
+    \underbrace{a_1 < Z_i < b_1}_{\text{continue at IA1}}
+    \; | \;
+    H_1
+  \right)
+  =
+  \beta(t_2) - \beta(t_1)
+  \right\}.
+$$
+The futility bound after IA2 can be derived in the similar logic. 
 
+In this example, the group sequential design with the $\beta$-spending of AHR can be derived as below. 
 ```{r}
 betaspending <- gs_power_ahr(
   enroll_rate = enroll_rate,
   fail_rate = fail_rate,
+  ratio = ratio,
+  # 2 IAs + 1 FA
+  event = 512 * c(.5, .75, 1),
+  # efficacy bound
   upper = gs_b,
   upar = c(rep(Inf, 2), qnorm(.975)),
+  # futility bound
   lower = gs_spending_bound,
   lpar = list(sf = gsDesign::sfLDOF, total_spend = 0.025, param = NULL, timing = NULL),
-  event = 512 * c(.5, .75, 1),
   test_lower = c(TRUE, TRUE, FALSE)
 )
 
@@ -124,47 +139,208 @@ betaspending |>
   )
 ```
 
-# Classical beta-spending futility bound
+# Modified Wieand futility bound
+
+The @wieand1994stopping rule recommends stopping the trial if the observed HR exceeds 1 after 50% of planned events. @korn2018interim extends this approach by adding a second interim analysis at 75% of planned events, also stopping if HR > 1.
+
+Here, we implement these futility rules by setting a Z-bound at 0, corresponding to a nominal p-value bound of approximately 0.5 at interim analyses. Fixed bounds are specified via the `gs_b()` function for both efficacy and futility boundaries.
+
+The final efficacy bound is for a 1-sided nominal p-value of 0.025; the futility bound lowers this to 0.0247 as noted in the lower-right-hand corner of the table below. It is < 0.025 since the probability is computed with the binding assumption. This is an arbitrary convention; if the futility bound is ignored, this computation yields 0.025.
+
+The design has 88.44% power. This closely matches the 88.4% power from @korn2018interim with 100,000 simulations which estimate the standard error for the power calculation to be `r paste(100 * round(sqrt(.884 * (1 - .884) / 100000), 4), "%", sep = "")`.
+
+```{r}
+wieand <- gs_power_ahr(
+  enroll_rate = enroll_rate, fail_rate = fail_rate,
+  ratio = ratio,
+  # 2 IAs + 1 FA
+  event = 512 * c(.5, .75, 1),
+  # efficacy bound
+  upper = gs_b, upar = c(rep(Inf, 2), qnorm(.975)),
+  # futility bound
+  lower = gs_b, lpar = c(0, 0, -Inf),
+  
+)
+
+wieand |>
+  summary() |>
+  as_gt(
+    title = "Group sequential design with futility only at interim analyses",
+    subtitle = "Wieand futility rule stops if HR > 1"
+  )
+```
 
-A classical $\beta-$spending bound would assume a constant treatment effect over time using the  proportional hazards assumption. We use the average hazard ratio at the fixed design analysis for this purpose.
 
 # Korn and Freidlin futility bound
 
-The @korn2018interim futility bound is set *when at least 50% of the expected events have occurred and at least two thirds of the observed events have occurred later than 3 months from randomization*.
-The expected timing for this is demonstrated below.
+@korn2018interim addressed scenarios with delayed treatment effects by modifying the futility rule proposed by @wieand1994stopping. Their approach sets the futility bound when at least 50% of expected events have occurred, and at least two-thirds of these events happened after 3 months from randomization.
+
+To illustrate this, we analyze the accumulation of events over time by `gsDesign2::expected_event()` , distinguishing between
++ events occurring during the initial 3-month no-effect period and 
++ event accumulation through the 34.86 months planned trial duration.
+
+```{r}
+find_ia_time <- function(t) {
+  
+  e_event0 <- expected_event(
+    enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio)), 
+    fail_rate = betaspending$fail_rate |> select(stratum, fail_rate, duration, dropout_rate),
+    total_duration = t, simple = FALSE)
+  
+  e_event1 <- expected_event(
+    enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio) * ratio), 
+    fail_rate = betaspending$fail_rate |> 
+      mutate(fail_rate = fail_rate * hr) |>
+      select(stratum, fail_rate, duration, dropout_rate), 
+    total_duration = t, simple = FALSE)
+  
+  total_event <- sum(e_event0$event) + sum(e_event1$event)
+  first3m_event <- sum(e_event0$event[1]) + sum(e_event1$event[1])
+    
+  return(2 / 3 * total_event - (total_event - first3m_event))
+} 
+
+ia1_time <- uniroot(find_ia_time, interval = c(1, 50))$root
+```
+
+```{r}
+# expected total events
+e_event_overtime <- sapply(1:betaspending$analysis$time[3], function(t){
+  e_event0 <- expected_event(
+    enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio)), 
+    fail_rate = betaspending$fail_rate |> select(stratum, fail_rate, duration, dropout_rate),
+    total_duration = t, simple = TRUE)
+  
+  e_event1 <- expected_event(
+    enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio) * ratio), 
+    fail_rate = betaspending$fail_rate |> 
+      mutate(fail_rate = fail_rate * hr) |>
+      select(stratum, fail_rate, duration, dropout_rate), 
+    total_duration = t, simple = TRUE)
+  
+  sum(e_event0) + sum(e_event1)
+}) |> unlist()
+
 
-## Accumulation of events by time interval
+# visualization of expected total events
+p1 <- ggplot(data = data.frame(time = 1:betaspending$analysis$time[3],
+                               event = e_event_overtime), 
+             aes(x = time, y = event)) +
+  geom_line(linewidth = 1.2) +
+  geom_vline(xintercept = ia1_time, linetype = "dashed", 
+             color = "red", linewidth = 1.2) +
+  scale_x_continuous(breaks = c(0, 6, 12, 18, 24, 30, 36),
+                     labels = c("0", "6", "12", "18", "24", "30", "36")) +
+  labs(x = "Months", y =  "Events") +
+  ggtitle("Expected events since study starts") +
+  theme(axis.title.x = element_text(size = 18),
+        axis.title.y = element_text(size = 18),
+        axis.text.x = element_text(size = 20),
+        axis.text.y = element_text(size = 18),
+        plot.title = element_text(size = 20))
 
-We consider the accumulation of events over time that occur during the no-effect interval for the first 3 months after randomization and events after this time interval.
-This is done for the overall trial without dividing out by treatment group using the `gsDesign2::AHR()` function.
-We consider monthly accumulation of events through the 34.86 months planned trial duration.
-We note in the summary of early expected events below that all events during the first 3 months on-study are expected prior to the first interim analysis.
+# expected events occur first 3 months
+e_event_first3m <- sapply(1:betaspending$analysis$time[3], function(t){
+  expected_event(enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio)), 
+                 fail_rate = betaspending$fail_rate |> select(stratum, fail_rate, duration, dropout_rate), 
+                 total_duration = t, simple = FALSE)$event[1] +
+  expected_event(enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio) * ratio), 
+                 fail_rate = betaspending$fail_rate |> 
+                   mutate(fail_rate = fail_rate * hr) |>
+                   select(stratum, fail_rate, duration, dropout_rate), 
+                 total_duration = t, simple = FALSE)$event[1]
+})
 
+# visualization of expected events occur first 3 months
+p2 <- ggplot(data = data.frame(time = 1:betaspending$analysis$time[3],
+                               prop = (e_event_overtime - e_event_first3m) / e_event_overtime * 100), 
+             aes(x = time, y = prop)) +
+  geom_line(linewidth = 1.2) +
+  scale_x_continuous(breaks = c(6, 12, 18, 24, 30, 36),
+                     labels = c("6", "12", "18", "24", "30", "36")) +
+  scale_y_continuous(breaks = c(10, 30, 50, 70, 90, 100),
+                     labels = c("10%", "30%", "50%", "70%", "90%", "100%")) +
+  geom_hline(yintercept = 2/3*100, linetype = "dashed", 
+             color = "red", linewidth = 1.2) +
+  geom_vline(xintercept = ia1_time, linetype = "dashed", 
+             color = "red", linewidth = 1.2) +
+  labs(x = "Months", 
+       y = "Proportion") +
+  ggtitle("Proportion of expected events occuring 3 months after study start") +
+  theme(axis.title.x = element_text(size = 18),
+        axis.title.y = element_text(size = 18),
+        axis.text.x = element_text(size = 20),
+        axis.text.y = element_text(size = 18),
+        plot.title = element_text(size = 12))
+
+# plot p1 and p2 together
+cowplot::plot_grid(p1, p2, nrow = 2,
+                   rel_heights = c(0.5, 0.5))
+```
+
+As shown by the above plot, the IA1 analysis time is `r ia1_time |> round(2)`. 
+With the IA1 analysis time known, we now derive the group sequential design with the futility bound by @korn2018interim.
 ```{r}
-event_accumulation <- pw_info(
-  enroll_rate = enroll_rate,
+kf <- gs_power_ahr(
+  enroll_rate = enroll_rate, 
   fail_rate = fail_rate,
-  total_duration = c(1:34, 34.86),
-  ratio = 1
-)
-head(event_accumulation, n = 7) |> gt()
+  ratio = ratio,
+  # 2 IAs + 1 FA
+  event = 512 * c(.5, .75, 1),
+  analysis_time = c(ia1_time, 
+                    ia1_time + 0.01, 
+                    ia1_time + 0.02),
+  # efficacy bound
+  upper = gs_b, 
+  upar = c(Inf, Inf, qnorm(.975)),
+  # futility bound
+  lower = gs_b, 
+  lpar = c(0, 0, -Inf))
+
+kf |>
+  summary() |>
+  as_gt(title = "Group sequential design with futility only",
+        subtitle = "Korn and Freidlin futility rule stops if HR > 1") 
 ```
 
-We can look at the proportion of events after the first 3 months as follows:
 
+
+# Classical beta-spending futility bound
+
+A classical $\beta-$spending bound would assume a constant treatment effect over time using the  proportional hazards assumption. We use the average hazard ratio at the fixed design analysis for this purpose.
 ```{r}
-event_accumulation |>
-  group_by(time) |>
-  summarize(`Total events` = sum(event), "Proportion early" = first(event) / `Total events`) |>
-  ggplot(aes(x = time, y = `Proportion early`)) +
-  geom_line()
+betaspending_classic <- gs_power_ahr(
+  enroll_rate = enroll_rate,
+  fail_rate = define_fail_rate(duration = Inf, 
+                               fail_rate = -log(.5) / 12,
+                               hr = fixedevents$analysis$ahr,
+                               dropout_rate = 0),
+  ratio = ratio,
+  # 2 IAs + 1 FA
+  event = 512 * c(.5, .75, 1),
+  # efficacy bound
+  upper = gs_b,
+  upar = c(rep(Inf, 2), qnorm(.975)),
+  # futility bound
+  lower = gs_spending_bound,
+  lpar = list(sf = gsDesign::sfLDOF, total_spend = 0.025, param = NULL, timing = NULL),
+  test_lower = c(TRUE, TRUE, FALSE)
+)
+
+betaspending_classic |>
+  summary() |>
+  as_gt(
+    title = "Group sequential design with futility only",
+    subtitle = "Classical beta-spending futility bound"
+  )
 ```
 
-For the @korn2018interim bound the targeted timing is when both 50% of events have occurred and at least 2/3 are more than 3 months after enrollment with 3 months being the delayed effect period.
-We see above that about 1/3 of events are still within 3 months of enrollment at month 20.
+# Conclusion
 
-## Korn and Freidlin bound
+As an alternative ad hoc methods to account for delayed effects as proposed by @wieand1994stopping and @korn2018interim,
+we propose a method for $\beta$-spending that automatically accounts for delayed effects.
+We have shown that results compare favorably to the ad hoc methods, but control Type II error 
+and adapt to the timing and distribution of event times at the time of interim analysis.
 
-The bound proposed by @korn2018interim
 
 # References