diff --git a/DESCRIPTION b/DESCRIPTION index 360a4ebd..327cc9ea 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -57,6 +57,7 @@ Imports: Suggests: covr, ggplot2, + cowplot, kableExtra, knitr, rmarkdown, diff --git a/vignettes/articles/story-nph-futility.Rmd b/vignettes/articles/story-nph-futility.Rmd index 4126f5aa..3f08890e 100644 --- a/vignettes/articles/story-nph-futility.Rmd +++ b/vignettes/articles/story-nph-futility.Rmd @@ -1,6 +1,6 @@ --- title: "Futility bounds at design and analysis under non-proportional hazards" -author: "Keaven M. Anderson" +author: "Keaven M. Anderson and Yujie Zhao" output: rmarkdown::html_document: toc: true @@ -25,19 +25,17 @@ library(ggplot2) # Overview -We set up futility bounds under a non-proportional hazards assumption. -We consider methods presented by @korn2018interim for setting such bounds and then consider an alternate futility bound based on $\beta-$spending under a delayed or crossing treatment effect to simplify implementation. -Finally, we show how to update this $\beta-$spending bound based on blinded interim data. -We will consider an example to reproduce a line of @korn2018interim Table 1 with the alternative futility bounds considered. +This vignette demonstrates possible ways to set up futility bounds in clinical trial designs under the assumption of non-proportional hazards. +We review the methods proposed by @wieand1994stopping and @korn2018interim. +To be more consistent with common practice, we propose a futility bound based on $\beta$-spending that automatically accounts for +non-proportional hazards as assumed in the design. -## Initial design set-up for fixed analysis - -@korn2018interim considered delayed effect scenarios and proposed a futility bound that is a modification of an earlier method proposed by @wieand1994stopping. -We begin with the enrollment and failure rate assumptions which @korn2018interim based on an example by @chen2013statistical. +We start by specifying the enrollment and failure rate assumptions, following the example used by @korn2018interim (based on @chen2013statistical). ```{r} # Enrollment assumed to be 680 patients over 12 months with no ramp-up enroll_rate <- define_enroll_rate(duration = 12, rate = 680 / 12) + # Failure rates ## Control exponential with median of 12 mos ## Delayed effect with HR = 1 for 3 months and HR = .693 thereafter @@ -48,17 +46,20 @@ fail_rate <- define_fail_rate( hr = c(1, .693), dropout_rate = 0 ) + ## Study duration was 34.8 in Korn & Freidlin Table 1 -## We change to 34.86 here to obtain 512 expected events more precisely +## We change to 34.86 here to obtain 512 expected events they presented study_duration <- 34.86 -``` -We now derive a fixed sample size based on these assumptions. -Ideally, we would allow a targeted event count and variable follow-up in `fixed_design_ahr()` so that the study duration will be computed automatically. +# randomization ratio (exp:control) +ratio <- 1 +``` +In this example, with 680 subjects enrolled over 12 months, we expect 512 events to occur within 34.86 months, +yielding approximately 90.45% power if no interim analyses are performed. ```{r} fixedevents <- fixed_design_ahr( - alpha = 0.025, power = NULL, + alpha = 0.025, power = NULL, ratio = ratio, enroll_rate = enroll_rate, fail_rate = fail_rate, study_duration = study_duration @@ -72,47 +73,61 @@ fixedevents |> fmt_number(columns = 5:6, decimals = 3) ``` -# Modified Wieand futility bound - -The @wieand1994stopping rule recommends stopping after 50% of planned events accrue if the observed HR > 1. -kornfreidlin2018 modified this by adding a second interim analysis after 75% of planned events and stop if the observed HR > 1 -This is implemented here by requiring a trend in favor of control with a direction $Z$-bound at 0 resulting in the *Nominal p* bound being 0.5 for interim analyses in the table below. -A fixed bound is specified with the `gs_b()` function for `upper` and `lower` and its corresponding parameters `upar` for the upper (efficacy) bound and `lpar` for the lower (futility) bound. -The final efficacy bound is for a 1-sided nominal p-value of 0.025; the futility bound lowers this to 0.0247 as noted in the lower-right-hand corner of the table below. -It is < 0.025 since the probability is computed with the binding assumption. -This is an arbitrary convention; if the futility bound is ignored, -this computation yields 0.025. -In the last row under *Alternate hypothesis* below we see the power is 88.44%. -@korn2018interim computed 88.4% power for this design with 100,000 simulations which estimate the standard error for the power calculation to be `r paste(100 * round(sqrt(.884 * (1 - .884) / 100000), 4), "%", sep = "")`. - -```{r} -wieand <- gs_power_ahr( - enroll_rate = enroll_rate, fail_rate = fail_rate, - upper = gs_b, upar = c(rep(Inf, 2), qnorm(.975)), - lower = gs_b, lpar = c(0, 0, -Inf), - event = 512 * c(.5, .75, 1) -) -wieand |> - summary() |> - as_gt( - title = "Group sequential design with futility only at interim analyses", - subtitle = "Wieand futility rule stops if HR > 1" - ) -``` - # Beta-spending futility bound with AHR -Need to summarize here. +Beta-spending allocates the Type II error rate ($\beta$) across interim analyses in a group sequential design. +At each interim analysis, a portion of the total allowed $\beta$ is spent to determine the futility boundary. +The cumulative $\beta$ spent up to each analysis is specified by a beta-spending function ($\beta(t)$ with $\beta(0) = 0$ and $\beta(1) = \beta$). +The AHR model for the NPH alternate hypothesis accounts for the assumed early lack of benefit. + +Methodology, the futility bound of IA1 (denoted as $a_1$) is +$$ + a_1 = \left\{ + a_1 + : + \text{Pr} + \left( + \underbrace{Z_1 \leq a_1}_{\text{fail at IA1}} \; | \; H_1 + \right) + = + \beta(t_1) + \right\}. +$$ +The futility bound at IA2 (denoted as $a_2$) is +$$ + a_2 + = + \left\{ + a_2 + : + \text{Pr} + \left( + \underbrace{Z_2 \leq a_2}_{\text{fail at IA2}} \; + \text{ and } + \underbrace{a_1 < Z_i < b_1}_{\text{continue at IA1}} + \; | \; + H_1 + \right) + = + \beta(t_2) - \beta(t_1) + \right\}. +$$ +The futility bound after IA2 can be derived in the similar logic. +In this example, the group sequential design with the $\beta$-spending of AHR can be derived as below. ```{r} betaspending <- gs_power_ahr( enroll_rate = enroll_rate, fail_rate = fail_rate, + ratio = ratio, + # 2 IAs + 1 FA + event = 512 * c(.5, .75, 1), + # efficacy bound upper = gs_b, upar = c(rep(Inf, 2), qnorm(.975)), + # futility bound lower = gs_spending_bound, lpar = list(sf = gsDesign::sfLDOF, total_spend = 0.025, param = NULL, timing = NULL), - event = 512 * c(.5, .75, 1), test_lower = c(TRUE, TRUE, FALSE) ) @@ -124,47 +139,208 @@ betaspending |> ) ``` -# Classical beta-spending futility bound +# Modified Wieand futility bound + +The @wieand1994stopping rule recommends stopping the trial if the observed HR exceeds 1 after 50% of planned events. @korn2018interim extends this approach by adding a second interim analysis at 75% of planned events, also stopping if HR > 1. + +Here, we implement these futility rules by setting a Z-bound at 0, corresponding to a nominal p-value bound of approximately 0.5 at interim analyses. Fixed bounds are specified via the `gs_b()` function for both efficacy and futility boundaries. + +The final efficacy bound is for a 1-sided nominal p-value of 0.025; the futility bound lowers this to 0.0247 as noted in the lower-right-hand corner of the table below. It is < 0.025 since the probability is computed with the binding assumption. This is an arbitrary convention; if the futility bound is ignored, this computation yields 0.025. + +The design has 88.44% power. This closely matches the 88.4% power from @korn2018interim with 100,000 simulations which estimate the standard error for the power calculation to be `r paste(100 * round(sqrt(.884 * (1 - .884) / 100000), 4), "%", sep = "")`. + +```{r} +wieand <- gs_power_ahr( + enroll_rate = enroll_rate, fail_rate = fail_rate, + ratio = ratio, + # 2 IAs + 1 FA + event = 512 * c(.5, .75, 1), + # efficacy bound + upper = gs_b, upar = c(rep(Inf, 2), qnorm(.975)), + # futility bound + lower = gs_b, lpar = c(0, 0, -Inf), + +) + +wieand |> + summary() |> + as_gt( + title = "Group sequential design with futility only at interim analyses", + subtitle = "Wieand futility rule stops if HR > 1" + ) +``` -A classical $\beta-$spending bound would assume a constant treatment effect over time using the proportional hazards assumption. We use the average hazard ratio at the fixed design analysis for this purpose. # Korn and Freidlin futility bound -The @korn2018interim futility bound is set *when at least 50% of the expected events have occurred and at least two thirds of the observed events have occurred later than 3 months from randomization*. -The expected timing for this is demonstrated below. +@korn2018interim addressed scenarios with delayed treatment effects by modifying the futility rule proposed by @wieand1994stopping. Their approach sets the futility bound when at least 50% of expected events have occurred, and at least two-thirds of these events happened after 3 months from randomization. + +To illustrate this, we analyze the accumulation of events over time by `gsDesign2::expected_event()` , distinguishing between ++ events occurring during the initial 3-month no-effect period and ++ event accumulation through the 34.86 months planned trial duration. + +```{r} +find_ia_time <- function(t) { + + e_event0 <- expected_event( + enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio)), + fail_rate = betaspending$fail_rate |> select(stratum, fail_rate, duration, dropout_rate), + total_duration = t, simple = FALSE) + + e_event1 <- expected_event( + enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio) * ratio), + fail_rate = betaspending$fail_rate |> + mutate(fail_rate = fail_rate * hr) |> + select(stratum, fail_rate, duration, dropout_rate), + total_duration = t, simple = FALSE) + + total_event <- sum(e_event0$event) + sum(e_event1$event) + first3m_event <- sum(e_event0$event[1]) + sum(e_event1$event[1]) + + return(2 / 3 * total_event - (total_event - first3m_event)) +} + +ia1_time <- uniroot(find_ia_time, interval = c(1, 50))$root +``` + +```{r} +# expected total events +e_event_overtime <- sapply(1:betaspending$analysis$time[3], function(t){ + e_event0 <- expected_event( + enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio)), + fail_rate = betaspending$fail_rate |> select(stratum, fail_rate, duration, dropout_rate), + total_duration = t, simple = TRUE) + + e_event1 <- expected_event( + enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio) * ratio), + fail_rate = betaspending$fail_rate |> + mutate(fail_rate = fail_rate * hr) |> + select(stratum, fail_rate, duration, dropout_rate), + total_duration = t, simple = TRUE) + + sum(e_event0) + sum(e_event1) +}) |> unlist() + -## Accumulation of events by time interval +# visualization of expected total events +p1 <- ggplot(data = data.frame(time = 1:betaspending$analysis$time[3], + event = e_event_overtime), + aes(x = time, y = event)) + + geom_line(linewidth = 1.2) + + geom_vline(xintercept = ia1_time, linetype = "dashed", + color = "red", linewidth = 1.2) + + scale_x_continuous(breaks = c(0, 6, 12, 18, 24, 30, 36), + labels = c("0", "6", "12", "18", "24", "30", "36")) + + labs(x = "Months", y = "Events") + + ggtitle("Expected events since study starts") + + theme(axis.title.x = element_text(size = 18), + axis.title.y = element_text(size = 18), + axis.text.x = element_text(size = 20), + axis.text.y = element_text(size = 18), + plot.title = element_text(size = 20)) -We consider the accumulation of events over time that occur during the no-effect interval for the first 3 months after randomization and events after this time interval. -This is done for the overall trial without dividing out by treatment group using the `gsDesign2::AHR()` function. -We consider monthly accumulation of events through the 34.86 months planned trial duration. -We note in the summary of early expected events below that all events during the first 3 months on-study are expected prior to the first interim analysis. +# expected events occur first 3 months +e_event_first3m <- sapply(1:betaspending$analysis$time[3], function(t){ + expected_event(enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio)), + fail_rate = betaspending$fail_rate |> select(stratum, fail_rate, duration, dropout_rate), + total_duration = t, simple = FALSE)$event[1] + + expected_event(enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio) * ratio), + fail_rate = betaspending$fail_rate |> + mutate(fail_rate = fail_rate * hr) |> + select(stratum, fail_rate, duration, dropout_rate), + total_duration = t, simple = FALSE)$event[1] +}) +# visualization of expected events occur first 3 months +p2 <- ggplot(data = data.frame(time = 1:betaspending$analysis$time[3], + prop = (e_event_overtime - e_event_first3m) / e_event_overtime * 100), + aes(x = time, y = prop)) + + geom_line(linewidth = 1.2) + + scale_x_continuous(breaks = c(6, 12, 18, 24, 30, 36), + labels = c("6", "12", "18", "24", "30", "36")) + + scale_y_continuous(breaks = c(10, 30, 50, 70, 90, 100), + labels = c("10%", "30%", "50%", "70%", "90%", "100%")) + + geom_hline(yintercept = 2/3*100, linetype = "dashed", + color = "red", linewidth = 1.2) + + geom_vline(xintercept = ia1_time, linetype = "dashed", + color = "red", linewidth = 1.2) + + labs(x = "Months", + y = "Proportion") + + ggtitle("Proportion of expected events occuring 3 months after study start") + + theme(axis.title.x = element_text(size = 18), + axis.title.y = element_text(size = 18), + axis.text.x = element_text(size = 20), + axis.text.y = element_text(size = 18), + plot.title = element_text(size = 12)) + +# plot p1 and p2 together +cowplot::plot_grid(p1, p2, nrow = 2, + rel_heights = c(0.5, 0.5)) +``` + +As shown by the above plot, the IA1 analysis time is `r ia1_time |> round(2)`. +With the IA1 analysis time known, we now derive the group sequential design with the futility bound by @korn2018interim. ```{r} -event_accumulation <- pw_info( - enroll_rate = enroll_rate, +kf <- gs_power_ahr( + enroll_rate = enroll_rate, fail_rate = fail_rate, - total_duration = c(1:34, 34.86), - ratio = 1 -) -head(event_accumulation, n = 7) |> gt() + ratio = ratio, + # 2 IAs + 1 FA + event = 512 * c(.5, .75, 1), + analysis_time = c(ia1_time, + ia1_time + 0.01, + ia1_time + 0.02), + # efficacy bound + upper = gs_b, + upar = c(Inf, Inf, qnorm(.975)), + # futility bound + lower = gs_b, + lpar = c(0, 0, -Inf)) + +kf |> + summary() |> + as_gt(title = "Group sequential design with futility only", + subtitle = "Korn and Freidlin futility rule stops if HR > 1") ``` -We can look at the proportion of events after the first 3 months as follows: + +# Classical beta-spending futility bound + +A classical $\beta-$spending bound would assume a constant treatment effect over time using the proportional hazards assumption. We use the average hazard ratio at the fixed design analysis for this purpose. ```{r} -event_accumulation |> - group_by(time) |> - summarize(`Total events` = sum(event), "Proportion early" = first(event) / `Total events`) |> - ggplot(aes(x = time, y = `Proportion early`)) + - geom_line() +betaspending_classic <- gs_power_ahr( + enroll_rate = enroll_rate, + fail_rate = define_fail_rate(duration = Inf, + fail_rate = -log(.5) / 12, + hr = fixedevents$analysis$ahr, + dropout_rate = 0), + ratio = ratio, + # 2 IAs + 1 FA + event = 512 * c(.5, .75, 1), + # efficacy bound + upper = gs_b, + upar = c(rep(Inf, 2), qnorm(.975)), + # futility bound + lower = gs_spending_bound, + lpar = list(sf = gsDesign::sfLDOF, total_spend = 0.025, param = NULL, timing = NULL), + test_lower = c(TRUE, TRUE, FALSE) +) + +betaspending_classic |> + summary() |> + as_gt( + title = "Group sequential design with futility only", + subtitle = "Classical beta-spending futility bound" + ) ``` -For the @korn2018interim bound the targeted timing is when both 50% of events have occurred and at least 2/3 are more than 3 months after enrollment with 3 months being the delayed effect period. -We see above that about 1/3 of events are still within 3 months of enrollment at month 20. +# Conclusion -## Korn and Freidlin bound +As an alternative ad hoc methods to account for delayed effects as proposed by @wieand1994stopping and @korn2018interim, +we propose a method for $\beta$-spending that automatically accounts for delayed effects. +We have shown that results compare favorably to the ad hoc methods, but control Type II error +and adapt to the timing and distribution of event times at the time of interim analysis. -The bound proposed by @korn2018interim # References