1 change: 1 addition & 0 deletions DESCRIPTION
@@ -57,6 +57,7 @@ Imports:
Suggests:
    covr,
    ggplot2,
    cowplot,
    kableExtra,
    knitr,
    rmarkdown,
314 changes: 245 additions & 69 deletions vignettes/articles/story-nph-futility.Rmd
@@ -1,6 +1,6 @@
---
title: "Futility bounds at design and analysis under non-proportional hazards"
author: "Keaven M. Anderson"
author: "Keaven M. Anderson and Yujie Zhao"
output:
  rmarkdown::html_document:
    toc: true
@@ -25,19 +25,17 @@ library(ggplot2)

# Overview

This vignette demonstrates possible ways to set up futility bounds in clinical trial designs under the assumption of non-proportional hazards.
We review the methods proposed by @wieand1994stopping and @korn2018interim.
To be more consistent with common practice, we propose a futility bound based on $\beta$-spending that automatically accounts for
non-proportional hazards as assumed in the design.

## Initial design set-up for fixed analysis

We start by specifying the enrollment and failure rate assumptions, following the example used by @korn2018interim (based on @chen2013statistical).

```{r}
# Enrollment assumed to be 680 patients over 12 months with no ramp-up
enroll_rate <- define_enroll_rate(duration = 12, rate = 680 / 12)

# Failure rates
## Control exponential with median of 12 mos
## Delayed effect with HR = 1 for 3 months and HR = .693 thereafter
Expand All @@ -48,17 +46,20 @@ fail_rate <- define_fail_rate(
hr = c(1, .693),
dropout_rate = 0
)

## Study duration was 34.8 in Korn & Freidlin Table 1
## We change to 34.86 here to obtain the 512 expected events they presented
study_duration <- 34.86
# randomization ratio (exp:control)
ratio <- 1
```

In this example, with 680 subjects enrolled over 12 months, we expect 512 events to occur within 34.86 months,
yielding approximately 90.45% power if no interim analyses are performed.
```{r}
fixedevents <- fixed_design_ahr(
  alpha = 0.025, power = NULL, ratio = ratio,
  enroll_rate = enroll_rate,
  fail_rate = fail_rate,
  study_duration = study_duration
)
```

```{r}
fixedevents |>
  summary() |>
  as_gt() |>
  fmt_number(columns = 5:6, decimals = 3)
```
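
As a quick check, the expected event count at the planned study duration can be read directly from the `analysis` component returned by `fixed_design_ahr()`; this is a sketch relying on that return structure rather than part of the original analysis.

```{r}
# Expected events at month 34.86; approximately the targeted 512
fixedevents$analysis$event
```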


# Beta-spending futility bound with AHR

Beta-spending allocates the Type II error rate ($\beta$) across the interim analyses of a group sequential design.
At each interim analysis, a portion of the total allowed $\beta$ is spent to determine the futility bound.
The cumulative $\beta$ spent through each analysis is specified by a $\beta$-spending function $\beta(t)$, where $t$ is the information fraction, with $\beta(0) = 0$ and $\beta(1) = \beta$.
Because the bound is computed under the AHR model for the non-proportional hazards alternative, it automatically accounts for the assumed early lack of benefit.

Methodologically, the futility bound at IA1 (denoted $a_1$) is
$$
a_1 = \left\{
a_1
:
\text{Pr}
\left(
\underbrace{Z_1 \leq a_1}_{\text{fail at IA1}} \; | \; H_1
\right)
=
\beta(t_1)
\right\}.
$$
The futility bound at IA2 (denoted as $a_2$) is
$$
a_2
=
\left\{
a_2
:
\text{Pr}
\left(
\underbrace{Z_2 \leq a_2}_{\text{fail at IA2}} \;
\text{ and }
\underbrace{a_1 < Z_1 < b_1}_{\text{continue at IA1}}
\; | \;
H_1
\right)
=
\beta(t_2) - \beta(t_1)
\right\}.
$$
Futility bounds after IA2 can be derived using similar logic.
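
To make the values $\beta(t_k)$ concrete, the sketch below evaluates the Lan-DeMets O'Brien-Fleming spending function used in the design code that follows, assuming event-based information fractions of 0.5, 0.75, and 1 and the total futility spend of 0.025 specified in `lpar` below.

```{r}
# Cumulative beta spent through each analysis, beta(t_k)
beta_t <- gsDesign::sfLDOF(alpha = 0.025, t = c(.5, .75, 1))$spend
beta_t
# Incremental spend available at each analysis, beta(t_k) - beta(t_{k-1})
diff(c(0, beta_t))
```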

In this example, the group sequential design with the AHR-based $\beta$-spending futility bound can be derived as follows.
```{r}
betaspending <- gs_power_ahr(
  enroll_rate = enroll_rate,
  fail_rate = fail_rate,
  ratio = ratio,
  # 2 IAs + 1 FA
  event = 512 * c(.5, .75, 1),
  # efficacy bound
  upper = gs_b,
  upar = c(rep(Inf, 2), qnorm(.975)),
  # futility bound
  lower = gs_spending_bound,
  lpar = list(sf = gsDesign::sfLDOF, total_spend = 0.025, param = NULL, timing = NULL),
  test_lower = c(TRUE, TRUE, FALSE)
)

betaspending |>
  summary() |>
  as_gt(
    title = "Group sequential design with futility only",
    subtitle = "Beta-spending futility bound under AHR"
  )
```

# Modified Wieand futility bound

The @wieand1994stopping rule recommends stopping the trial if the observed HR exceeds 1 after 50% of planned events have accrued. @korn2018interim extended this approach by adding a second interim analysis at 75% of planned events, also stopping if the observed HR > 1.

Here, we implement these futility rules by setting a fixed $Z$-bound at 0, corresponding to a nominal p-value bound of 0.5 at the interim analyses. Fixed bounds are specified via the `gs_b()` function for both the efficacy and futility boundaries.

The final efficacy bound is for a 1-sided nominal p-value of 0.025; the futility bound lowers this to 0.0247 as noted in the lower-right-hand corner of the table below. It is < 0.025 since the probability is computed with the binding assumption. This is an arbitrary convention; if the futility bound is ignored, this computation yields 0.025.
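
The correspondence between these fixed $Z$ bounds and their nominal p-values is just the standard normal distribution function and can be checked directly:

```{r}
pnorm(0) # interim futility Z-bound of 0 corresponds to nominal p = 0.5
pnorm(qnorm(.975), lower.tail = FALSE) # final efficacy bound: nominal p = 0.025
```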

The design has 88.44% power. This closely matches the 88.4% power reported by @korn2018interim based on 100,000 simulations, which implies a standard error for the simulated power of approximately `r paste(100 * round(sqrt(.884 * (1 - .884) / 100000), 4), "%", sep = "")`.

```{r}
wieand <- gs_power_ahr(
  enroll_rate = enroll_rate, fail_rate = fail_rate,
  ratio = ratio,
  # 2 IAs + 1 FA
  event = 512 * c(.5, .75, 1),
  # efficacy bound
  upper = gs_b, upar = c(rep(Inf, 2), qnorm(.975)),
  # futility bound
  lower = gs_b, lpar = c(0, 0, -Inf)
)

wieand |>
  summary() |>
  as_gt(
    title = "Group sequential design with futility only at interim analyses",
    subtitle = "Wieand futility rule stops if HR > 1"
  )
```
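
If the numeric bounds are needed rather than the formatted table, they can be pulled from the `bound` component of the object returned by `gs_power_ahr()`; this sketch assumes the `analysis`, `z`, and `probability` columns of that component.

```{r}
# Futility (lower) bounds and cumulative crossing probabilities under H1
wieand$bound |>
  filter(bound == "lower") |>
  select(analysis, z, probability)
```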


# Korn and Freidlin futility bound

@korn2018interim addressed scenarios with delayed treatment effects by modifying the futility rule proposed by @wieand1994stopping. Their approach sets the futility bound when at least 50% of the expected events have occurred and at least two-thirds of the observed events occurred more than 3 months after randomization.

To illustrate this, we analyze the accumulation of events over time with `gsDesign2::expected_event()`, distinguishing between

+ events occurring during the initial 3-month no-effect period, and
+ events accumulating through the planned 34.86-month trial duration.

```{r}
find_ia_time <- function(t) {
  # expected control-arm events by time interval at calendar time t
  e_event0 <- expected_event(
    enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio)),
    fail_rate = betaspending$fail_rate |> select(stratum, fail_rate, duration, dropout_rate),
    total_duration = t, simple = FALSE
  )

  # expected experimental-arm events by time interval at calendar time t
  e_event1 <- expected_event(
    enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio) * ratio),
    fail_rate = betaspending$fail_rate |>
      mutate(fail_rate = fail_rate * hr) |>
      select(stratum, fail_rate, duration, dropout_rate),
    total_duration = t, simple = FALSE
  )

  total_event <- sum(e_event0$event) + sum(e_event1$event)
  # events within the first 3 months after randomization (no-effect period)
  first3m_event <- e_event0$event[1] + e_event1$event[1]

  # root: exactly 2/3 of expected events occur later than 3 months
  # after randomization
  return(2 / 3 * total_event - (total_event - first3m_event))
}

ia1_time <- uniroot(find_ia_time, interval = c(1, 50))$root
```
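
As a quick check (a sketch, not part of the Korn and Freidlin rule itself), the objective function should be approximately zero at the root:

```{r}
# At ia1_time, exactly two-thirds of expected events occur more than
# 3 months after randomization
find_ia_time(ia1_time)
```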

```{r}
# expected total events over time
e_event_overtime <- sapply(1:betaspending$analysis$time[3], function(t) {
  e_event0 <- expected_event(
    enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio)),
    fail_rate = betaspending$fail_rate |> select(stratum, fail_rate, duration, dropout_rate),
    total_duration = t, simple = TRUE
  )

  e_event1 <- expected_event(
    enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio) * ratio),
    fail_rate = betaspending$fail_rate |>
      mutate(fail_rate = fail_rate * hr) |>
      select(stratum, fail_rate, duration, dropout_rate),
    total_duration = t, simple = TRUE
  )

  sum(e_event0) + sum(e_event1)
})


# visualization of expected total events
p1 <- ggplot(
  data = data.frame(
    time = 1:betaspending$analysis$time[3],
    event = e_event_overtime
  ),
  aes(x = time, y = event)
) +
  geom_line(linewidth = 1.2) +
  geom_vline(
    xintercept = ia1_time, linetype = "dashed",
    color = "red", linewidth = 1.2
  ) +
  scale_x_continuous(
    breaks = c(0, 6, 12, 18, 24, 30, 36),
    labels = c("0", "6", "12", "18", "24", "30", "36")
  ) +
  labs(x = "Months", y = "Events") +
  ggtitle("Expected events since study start") +
  theme(
    axis.title.x = element_text(size = 18),
    axis.title.y = element_text(size = 18),
    axis.text.x = element_text(size = 20),
    axis.text.y = element_text(size = 18),
    plot.title = element_text(size = 20)
  )

# expected events occurring in the first 3 months after randomization
e_event_first3m <- sapply(1:betaspending$analysis$time[3], function(t) {
  expected_event(
    enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio)),
    fail_rate = betaspending$fail_rate |> select(stratum, fail_rate, duration, dropout_rate),
    total_duration = t, simple = FALSE
  )$event[1] +
    expected_event(
      enroll_rate = betaspending$enroll_rate |> mutate(rate = rate / (1 + ratio) * ratio),
      fail_rate = betaspending$fail_rate |>
        mutate(fail_rate = fail_rate * hr) |>
        select(stratum, fail_rate, duration, dropout_rate),
      total_duration = t, simple = FALSE
    )$event[1]
})

# visualization of the proportion of expected events occurring
# more than 3 months after randomization
p2 <- ggplot(
  data = data.frame(
    time = 1:betaspending$analysis$time[3],
    prop = (e_event_overtime - e_event_first3m) / e_event_overtime * 100
  ),
  aes(x = time, y = prop)
) +
  geom_line(linewidth = 1.2) +
  scale_x_continuous(
    breaks = c(6, 12, 18, 24, 30, 36),
    labels = c("6", "12", "18", "24", "30", "36")
  ) +
  scale_y_continuous(
    breaks = c(10, 30, 50, 70, 90, 100),
    labels = c("10%", "30%", "50%", "70%", "90%", "100%")
  ) +
  geom_hline(
    yintercept = 2 / 3 * 100, linetype = "dashed",
    color = "red", linewidth = 1.2
  ) +
  geom_vline(
    xintercept = ia1_time, linetype = "dashed",
    color = "red", linewidth = 1.2
  ) +
  labs(x = "Months", y = "Proportion") +
  ggtitle("Proportion of expected events occurring more than 3 months after randomization") +
  theme(
    axis.title.x = element_text(size = 18),
    axis.title.y = element_text(size = 18),
    axis.text.x = element_text(size = 20),
    axis.text.y = element_text(size = 18),
    plot.title = element_text(size = 12)
  )

# plot p1 and p2 together
cowplot::plot_grid(p1, p2, nrow = 2,
rel_heights = c(0.5, 0.5))
```

As shown in the plot above, the expected IA1 analysis time is `r ia1_time |> round(2)` months.
With the IA1 analysis time known, we now derive the group sequential design with the @korn2018interim futility bound.
```{r}
kf <- gs_power_ahr(
  enroll_rate = enroll_rate,
  fail_rate = fail_rate,
  ratio = ratio,
  # 2 IAs + 1 FA
  event = 512 * c(.5, .75, 1),
  # IA1 at the Korn and Freidlin timing; small offsets keep the
  # analysis times strictly increasing
  analysis_time = c(ia1_time, ia1_time + 0.01, ia1_time + 0.02),
  # efficacy bound
  upper = gs_b,
  upar = c(Inf, Inf, qnorm(.975)),
  # futility bound
  lower = gs_b,
  lpar = c(0, 0, -Inf)
)

kf |>
  summary() |>
  as_gt(
    title = "Group sequential design with futility only",
    subtitle = "Korn and Freidlin futility rule stops if HR > 1"
  )
```

# Classical beta-spending futility bound

A classical $\beta$-spending bound assumes a constant treatment effect over time under the proportional hazards assumption. For this purpose, we use the average hazard ratio at the fixed design analysis.
```{r}
event_accumulation |>
group_by(time) |>
summarize(`Total events` = sum(event), "Proportion early" = first(event) / `Total events`) |>
ggplot(aes(x = time, y = `Proportion early`)) +
geom_line()
betaspending_classic <- gs_power_ahr(
  enroll_rate = enroll_rate,
  # constant HR set to the average hazard ratio at the fixed design analysis
  fail_rate = define_fail_rate(
    duration = Inf,
    fail_rate = -log(.5) / 12,
    hr = fixedevents$analysis$ahr,
    dropout_rate = 0
  ),
  ratio = ratio,
  # 2 IAs + 1 FA
  event = 512 * c(.5, .75, 1),
  # efficacy bound
  upper = gs_b,
  upar = c(rep(Inf, 2), qnorm(.975)),
  # futility bound
  lower = gs_spending_bound,
  lpar = list(sf = gsDesign::sfLDOF, total_spend = 0.025, param = NULL, timing = NULL),
  test_lower = c(TRUE, TRUE, FALSE)
)

betaspending_classic |>
  summary() |>
  as_gt(
    title = "Group sequential design with futility only",
    subtitle = "Classical beta-spending futility bound"
  )
```

# Conclusion

As an alternative to the ad hoc methods proposed by @wieand1994stopping and @korn2018interim to account for delayed effects,
we have proposed a $\beta$-spending approach that automatically accounts for the delayed effect assumed at design.
Its results compare favorably to the ad hoc methods while controlling Type II error
and adapting to the timing and distribution of event times observed at interim analysis.
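
As a concluding check, power under each futility rule can be compared directly; this sketch assumes the `bound` component and its cumulative `probability` column returned by `gs_power_ahr()`, with power being the cumulative efficacy-crossing probability at the final analysis.

```{r}
# Power under H1 for each futility rule considered above
sapply(
  list(
    wieand = wieand,
    korn_freidlin = kf,
    beta_spending_ahr = betaspending,
    beta_spending_classic = betaspending_classic
  ),
  function(x) max(x$bound$probability[x$bound$bound == "upper"])
)
```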


# References