Skip to content

Commit c421e60

Browse files
Merge pull request #583 from Merck/493-finalize-futility-vignette
Finalize the NPH futility vignette
2 parents 936ec20 + ff3e791 commit c421e60

File tree

2 files changed

+246
-69
lines changed

2 files changed

+246
-69
lines changed

DESCRIPTION

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -57,6 +57,7 @@ Imports:
5757
Suggests:
5858
covr,
5959
ggplot2,
60+
cowplot,
6061
kableExtra,
6162
knitr,
6263
rmarkdown,
Lines changed: 245 additions & 69 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
---
22
title: "Futility bounds at design and analysis under non-proportional hazards"
3-
author: "Keaven M. Anderson"
3+
author: "Keaven M. Anderson and Yujie Zhao"
44
output:
55
rmarkdown::html_document:
66
toc: true
@@ -25,19 +25,17 @@ library(ggplot2)
2525

2626
# Overview
2727

28-
We set up futility bounds under a non-proportional hazards assumption.
29-
We consider methods presented by @korn2018interim for setting such bounds and then consider an alternate futility bound based on $\beta-$spending under a delayed or crossing treatment effect to simplify implementation.
30-
Finally, we show how to update this $\beta-$spending bound based on blinded interim data.
31-
We will consider an example to reproduce a line of @korn2018interim Table 1 with the alternative futility bounds considered.
28+
This vignette demonstrates possible ways to set up futility bounds in clinical trial designs under the assumption of non-proportional hazards.
29+
We review the methods proposed by @wieand1994stopping and @korn2018interim.
30+
To be more consistent with common practice, we propose a futility bound based on $\beta$-spending that automatically accounts for
31+
non-proportional hazards as assumed in the design.
3232

33-
## Initial design set-up for fixed analysis
34-
35-
@korn2018interim considered delayed effect scenarios and proposed a futility bound that is a modification of an earlier method proposed by @wieand1994stopping.
36-
We begin with the enrollment and failure rate assumptions which @korn2018interim based on an example by @chen2013statistical.
33+
We start by specifying the enrollment and failure rate assumptions, following the example used by @korn2018interim (based on @chen2013statistical).
3734

3835
```{r}
3936
# Enrollment assumed to be 680 patients over 12 months with no ramp-up
4037
enroll_rate <- define_enroll_rate(duration = 12, rate = 680 / 12)
38+
4139
# Failure rates
4240
## Control exponential with median of 12 mos
4341
## Delayed effect with HR = 1 for 3 months and HR = .693 thereafter
@@ -48,17 +46,20 @@ fail_rate <- define_fail_rate(
4846
hr = c(1, .693),
4947
dropout_rate = 0
5048
)
49+
5150
## Study duration was 34.8 in Korn & Freidlin Table 1
52-
## We change to 34.86 here to obtain 512 expected events more precisely
51+
## We change to 34.86 here to obtain 512 expected events they presented
5352
study_duration <- 34.86
54-
```
5553
56-
We now derive a fixed sample size based on these assumptions.
57-
Ideally, we would allow a targeted event count and variable follow-up in `fixed_design_ahr()` so that the study duration will be computed automatically.
54+
# randomization ratio (exp:control)
55+
ratio <- 1
56+
```
5857

58+
In this example, with 680 subjects enrolled over 12 months, we expect 512 events to occur within 34.86 months,
59+
yielding approximately 90.45% power if no interim analyses are performed.
5960
```{r}
6061
fixedevents <- fixed_design_ahr(
61-
alpha = 0.025, power = NULL,
62+
alpha = 0.025, power = NULL, ratio = ratio,
6263
enroll_rate = enroll_rate,
6364
fail_rate = fail_rate,
6465
study_duration = study_duration
@@ -72,47 +73,61 @@ fixedevents |>
7273
fmt_number(columns = 5:6, decimals = 3)
7374
```
7475

75-
# Modified Wieand futility bound
76-
77-
The @wieand1994stopping rule recommends stopping after 50% of planned events accrue if the observed HR > 1.
78-
kornfreidlin2018 modified this by adding a second interim analysis after 75% of planned events and stop if the observed HR > 1
79-
This is implemented here by requiring a trend in favor of control with a direction $Z$-bound at 0 resulting in the *Nominal p* bound being 0.5 for interim analyses in the table below.
80-
A fixed bound is specified with the `gs_b()` function for `upper` and `lower` and its corresponding parameters `upar` for the upper (efficacy) bound and `lpar` for the lower (futility) bound.
81-
The final efficacy bound is for a 1-sided nominal p-value of 0.025; the futility bound lowers this to 0.0247 as noted in the lower-right-hand corner of the table below.
82-
It is < 0.025 since the probability is computed with the binding assumption.
83-
This is an arbitrary convention; if the futility bound is ignored,
84-
this computation yields 0.025.
85-
In the last row under *Alternate hypothesis* below we see the power is 88.44%.
86-
@korn2018interim computed 88.4% power for this design with 100,000 simulations which estimate the standard error for the power calculation to be `r paste(100 * round(sqrt(.884 * (1 - .884) / 100000), 4), "%", sep = "")`.
87-
88-
```{r}
89-
wieand <- gs_power_ahr(
90-
enroll_rate = enroll_rate, fail_rate = fail_rate,
91-
upper = gs_b, upar = c(rep(Inf, 2), qnorm(.975)),
92-
lower = gs_b, lpar = c(0, 0, -Inf),
93-
event = 512 * c(.5, .75, 1)
94-
)
95-
wieand |>
96-
summary() |>
97-
as_gt(
98-
title = "Group sequential design with futility only at interim analyses",
99-
subtitle = "Wieand futility rule stops if HR > 1"
100-
)
101-
```
102-
10376
# Beta-spending futility bound with AHR
10477

105-
Need to summarize here.
78+
Beta-spending allocates the Type II error rate ($\beta$) across interim analyses in a group sequential design.
79+
At each interim analysis, a portion of the total allowed $\beta$ is spent to determine the futility boundary.
80+
The cumulative $\beta$ spent up to each analysis is specified by a beta-spending function ($\beta(t)$ with $\beta(0) = 0$ and $\beta(1) = \beta$).
81+
The AHR model for the NPH alternate hypothesis accounts for the assumed early lack of benefit.
82+
83+
Methodology, the futility bound of IA1 (denoted as $a_1$) is
84+
$$
85+
a_1 = \left\{
86+
a_1
87+
:
88+
\text{Pr}
89+
\left(
90+
\underbrace{Z_1 \leq a_1}_{\text{fail at IA1}} \; | \; H_1
91+
\right)
92+
=
93+
\beta(t_1)
94+
\right\}.
95+
$$
96+
The futility bound at IA2 (denoted as $a_2$) is
97+
$$
98+
a_2
99+
=
100+
\left\{
101+
a_2
102+
:
103+
\text{Pr}
104+
\left(
105+
\underbrace{Z_2 \leq a_2}_{\text{fail at IA2}} \;
106+
\text{ and }
107+
\underbrace{a_1 < Z_i < b_1}_{\text{continue at IA1}}
108+
\; | \;
109+
H_1
110+
\right)
111+
=
112+
\beta(t_2) - \beta(t_1)
113+
\right\}.
114+
$$
115+
The futility bound after IA2 can be derived in the similar logic.
106116

117+
In this example, the group sequential design with the $\beta$-spending of AHR can be derived as below.
107118
```{r}
108119
betaspending <- gs_power_ahr(
109120
enroll_rate = enroll_rate,
110121
fail_rate = fail_rate,
122+
ratio = ratio,
123+
# 2 IAs + 1 FA
124+
event = 512 * c(.5, .75, 1),
125+
# efficacy bound
111126
upper = gs_b,
112127
upar = c(rep(Inf, 2), qnorm(.975)),
128+
# futility bound
113129
lower = gs_spending_bound,
114130
lpar = list(sf = gsDesign::sfLDOF, total_spend = 0.025, param = NULL, timing = NULL),
115-
event = 512 * c(.5, .75, 1),
116131
test_lower = c(TRUE, TRUE, FALSE)
117132
)
118133
@@ -124,47 +139,208 @@ betaspending |>
124139
)
125140
```
126141

127-
# Classical beta-spending futility bound
142+
# Modified Wieand futility bound
143+
144+
The @wieand1994stopping rule recommends stopping the trial if the observed HR exceeds 1 after 50% of planned events. @korn2018interim extends this approach by adding a second interim analysis at 75% of planned events, also stopping if HR > 1.
145+
146+
Here, we implement these futility rules by setting a Z-bound at 0, corresponding to a nominal p-value bound of approximately 0.5 at interim analyses. Fixed bounds are specified via the `gs_b()` function for both efficacy and futility boundaries.
147+
148+
The final efficacy bound is for a 1-sided nominal p-value of 0.025; the futility bound lowers this to 0.0247 as noted in the lower-right-hand corner of the table below. It is < 0.025 since the probability is computed with the binding assumption. This is an arbitrary convention; if the futility bound is ignored, this computation yields 0.025.
149+
150+
The design has 88.44% power. This closely matches the 88.4% power from @korn2018interim with 100,000 simulations which estimate the standard error for the power calculation to be `r paste(100 * round(sqrt(.884 * (1 - .884) / 100000), 4), "%", sep = "")`.
151+
152+
```{r}
153+
wieand <- gs_power_ahr(
154+
enroll_rate = enroll_rate, fail_rate = fail_rate,
155+
ratio = ratio,
156+
# 2 IAs + 1 FA
157+
event = 512 * c(.5, .75, 1),
158+
# efficacy bound
159+
upper = gs_b, upar = c(rep(Inf, 2), qnorm(.975)),
160+
# futility bound
161+
lower = gs_b, lpar = c(0, 0, -Inf),
162+
163+
)
164+
165+
wieand |>
166+
summary() |>
167+
as_gt(
168+
title = "Group sequential design with futility only at interim analyses",
169+
subtitle = "Wieand futility rule stops if HR > 1"
170+
)
171+
```
128172

129-
A classical $\beta-$spending bound would assume a constant treatment effect over time using the proportional hazards assumption. We use the average hazard ratio at the fixed design analysis for this purpose.
130173

131174
# Korn and Freidlin futility bound
132175

133-
The @korn2018interim futility bound is set *when at least 50% of the expected events have occurred and at least two thirds of the observed events have occurred later than 3 months from randomization*.
134-
The expected timing for this is demonstrated below.
176+
@korn2018interim addressed scenarios with delayed treatment effects by modifying the futility rule proposed by @wieand1994stopping. Their approach sets the futility bound when at least 50% of expected events have occurred, and at least two-thirds of these events happened after 3 months from randomization.
177+
178+
To illustrate this, we analyze the accumulation of events over time by `gsDesign2::expected_event()` , distinguishing between
179+
+ events occurring during the initial 3-month no-effect period and
180+
+ event accumulation through the 34.86 months planned trial duration.
181+
182+
```{r}
183+
find_ia_time <- function(t) {
184+
185+
e_event0 <- expected_event(
186+
enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio)),
187+
fail_rate = betaspending$fail_rate |> select(stratum, fail_rate, duration, dropout_rate),
188+
total_duration = t, simple = FALSE)
189+
190+
e_event1 <- expected_event(
191+
enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio) * ratio),
192+
fail_rate = betaspending$fail_rate |>
193+
mutate(fail_rate = fail_rate * hr) |>
194+
select(stratum, fail_rate, duration, dropout_rate),
195+
total_duration = t, simple = FALSE)
196+
197+
total_event <- sum(e_event0$event) + sum(e_event1$event)
198+
first3m_event <- sum(e_event0$event[1]) + sum(e_event1$event[1])
199+
200+
return(2 / 3 * total_event - (total_event - first3m_event))
201+
}
202+
203+
ia1_time <- uniroot(find_ia_time, interval = c(1, 50))$root
204+
```
205+
206+
```{r}
207+
# expected total events
208+
e_event_overtime <- sapply(1:betaspending$analysis$time[3], function(t){
209+
e_event0 <- expected_event(
210+
enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio)),
211+
fail_rate = betaspending$fail_rate |> select(stratum, fail_rate, duration, dropout_rate),
212+
total_duration = t, simple = TRUE)
213+
214+
e_event1 <- expected_event(
215+
enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio) * ratio),
216+
fail_rate = betaspending$fail_rate |>
217+
mutate(fail_rate = fail_rate * hr) |>
218+
select(stratum, fail_rate, duration, dropout_rate),
219+
total_duration = t, simple = TRUE)
220+
221+
sum(e_event0) + sum(e_event1)
222+
}) |> unlist()
223+
135224
136-
## Accumulation of events by time interval
225+
# visualization of expected total events
226+
p1 <- ggplot(data = data.frame(time = 1:betaspending$analysis$time[3],
227+
event = e_event_overtime),
228+
aes(x = time, y = event)) +
229+
geom_line(linewidth = 1.2) +
230+
geom_vline(xintercept = ia1_time, linetype = "dashed",
231+
color = "red", linewidth = 1.2) +
232+
scale_x_continuous(breaks = c(0, 6, 12, 18, 24, 30, 36),
233+
labels = c("0", "6", "12", "18", "24", "30", "36")) +
234+
labs(x = "Months", y = "Events") +
235+
ggtitle("Expected events since study starts") +
236+
theme(axis.title.x = element_text(size = 18),
237+
axis.title.y = element_text(size = 18),
238+
axis.text.x = element_text(size = 20),
239+
axis.text.y = element_text(size = 18),
240+
plot.title = element_text(size = 20))
137241
138-
We consider the accumulation of events over time that occur during the no-effect interval for the first 3 months after randomization and events after this time interval.
139-
This is done for the overall trial without dividing out by treatment group using the `gsDesign2::AHR()` function.
140-
We consider monthly accumulation of events through the 34.86 months planned trial duration.
141-
We note in the summary of early expected events below that all events during the first 3 months on-study are expected prior to the first interim analysis.
242+
# expected events occur first 3 months
243+
e_event_first3m <- sapply(1:betaspending$analysis$time[3], function(t){
244+
expected_event(enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio)),
245+
fail_rate = betaspending$fail_rate |> select(stratum, fail_rate, duration, dropout_rate),
246+
total_duration = t, simple = FALSE)$event[1] +
247+
expected_event(enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio) * ratio),
248+
fail_rate = betaspending$fail_rate |>
249+
mutate(fail_rate = fail_rate * hr) |>
250+
select(stratum, fail_rate, duration, dropout_rate),
251+
total_duration = t, simple = FALSE)$event[1]
252+
})
142253
254+
# visualization of expected events occur first 3 months
255+
p2 <- ggplot(data = data.frame(time = 1:betaspending$analysis$time[3],
256+
prop = (e_event_overtime - e_event_first3m) / e_event_overtime * 100),
257+
aes(x = time, y = prop)) +
258+
geom_line(linewidth = 1.2) +
259+
scale_x_continuous(breaks = c(6, 12, 18, 24, 30, 36),
260+
labels = c("6", "12", "18", "24", "30", "36")) +
261+
scale_y_continuous(breaks = c(10, 30, 50, 70, 90, 100),
262+
labels = c("10%", "30%", "50%", "70%", "90%", "100%")) +
263+
geom_hline(yintercept = 2/3*100, linetype = "dashed",
264+
color = "red", linewidth = 1.2) +
265+
geom_vline(xintercept = ia1_time, linetype = "dashed",
266+
color = "red", linewidth = 1.2) +
267+
labs(x = "Months",
268+
y = "Proportion") +
269+
ggtitle("Proportion of expected events occuring 3 months after study start") +
270+
theme(axis.title.x = element_text(size = 18),
271+
axis.title.y = element_text(size = 18),
272+
axis.text.x = element_text(size = 20),
273+
axis.text.y = element_text(size = 18),
274+
plot.title = element_text(size = 12))
275+
276+
# plot p1 and p2 together
277+
cowplot::plot_grid(p1, p2, nrow = 2,
278+
rel_heights = c(0.5, 0.5))
279+
```
280+
281+
As shown by the above plot, the IA1 analysis time is `r ia1_time |> round(2)`.
282+
With the IA1 analysis time known, we now derive the group sequential design with the futility bound by @korn2018interim.
143283
```{r}
144-
event_accumulation <- pw_info(
145-
enroll_rate = enroll_rate,
284+
kf <- gs_power_ahr(
285+
enroll_rate = enroll_rate,
146286
fail_rate = fail_rate,
147-
total_duration = c(1:34, 34.86),
148-
ratio = 1
149-
)
150-
head(event_accumulation, n = 7) |> gt()
287+
ratio = ratio,
288+
# 2 IAs + 1 FA
289+
event = 512 * c(.5, .75, 1),
290+
analysis_time = c(ia1_time,
291+
ia1_time + 0.01,
292+
ia1_time + 0.02),
293+
# efficacy bound
294+
upper = gs_b,
295+
upar = c(Inf, Inf, qnorm(.975)),
296+
# futility bound
297+
lower = gs_b,
298+
lpar = c(0, 0, -Inf))
299+
300+
kf |>
301+
summary() |>
302+
as_gt(title = "Group sequential design with futility only",
303+
subtitle = "Korn and Freidlin futility rule stops if HR > 1")
151304
```
152305

153-
We can look at the proportion of events after the first 3 months as follows:
154306

307+
308+
# Classical beta-spending futility bound
309+
310+
A classical $\beta-$spending bound would assume a constant treatment effect over time using the proportional hazards assumption. We use the average hazard ratio at the fixed design analysis for this purpose.
155311
```{r}
156-
event_accumulation |>
157-
group_by(time) |>
158-
summarize(`Total events` = sum(event), "Proportion early" = first(event) / `Total events`) |>
159-
ggplot(aes(x = time, y = `Proportion early`)) +
160-
geom_line()
312+
betaspending_classic <- gs_power_ahr(
313+
enroll_rate = enroll_rate,
314+
fail_rate = define_fail_rate(duration = Inf,
315+
fail_rate = -log(.5) / 12,
316+
hr = fixedevents$analysis$ahr,
317+
dropout_rate = 0),
318+
ratio = ratio,
319+
# 2 IAs + 1 FA
320+
event = 512 * c(.5, .75, 1),
321+
# efficacy bound
322+
upper = gs_b,
323+
upar = c(rep(Inf, 2), qnorm(.975)),
324+
# futility bound
325+
lower = gs_spending_bound,
326+
lpar = list(sf = gsDesign::sfLDOF, total_spend = 0.025, param = NULL, timing = NULL),
327+
test_lower = c(TRUE, TRUE, FALSE)
328+
)
329+
330+
betaspending_classic |>
331+
summary() |>
332+
as_gt(
333+
title = "Group sequential design with futility only",
334+
subtitle = "Classical beta-spending futility bound"
335+
)
161336
```
162337

163-
For the @korn2018interim bound the targeted timing is when both 50% of events have occurred and at least 2/3 are more than 3 months after enrollment with 3 months being the delayed effect period.
164-
We see above that about 1/3 of events are still within 3 months of enrollment at month 20.
338+
# Conclusion
165339

166-
## Korn and Freidlin bound
340+
As an alternative ad hoc methods to account for delayed effects as proposed by @wieand1994stopping and @korn2018interim,
341+
we propose a method for $\beta$-spending that automatically accounts for delayed effects.
342+
We have shown that results compare favorably to the ad hoc methods, but control Type II error
343+
and adapt to the timing and distribution of event times at the time of interim analysis.
167344

168-
The bound proposed by @korn2018interim
169345

170346
# References

0 commit comments

Comments
 (0)