Merged
20 changes: 2 additions & 18 deletions Comp/r-east_gsd_tte.qmd
Original file line number Diff line number Diff line change
@@ -2,8 +2,6 @@
title: "R vs EAST vs SAS: Group sequential design"
editor_options:
chunk_output_type: console
execute:
eval: false
---

## Introduction
@@ -30,7 +28,6 @@ We assume that a GSD is utilized for progression-free survival (PFS) endpoint. I
Further design assumptions are as follows:

```{r}
#| eval: true
# PFS HR=0.6
hr1_pfs <- 0.6
# median PFS of 9.4 months in the control arm
@@ -62,7 +59,6 @@ Note that, in EAST the number of target events is reported as an integer, howeve
For ease of comparison the results from EAST are summarized below:

```{r}
#| eval: true
#| echo: false
#| warning: false
library(flextable)
@@ -106,7 +102,6 @@ pfs_east |>
- gsDesign code to reproduce the above EAST results:

```{r}
#| eval: true
#| warning: false
library(gsDesign)

@@ -136,7 +131,6 @@ pfs_gsDesign |>
- gsDesign vs EAST comparison using absolute differences:

```{r}
#| eval: true
#| echo: false
digit_comp <- 4
pfs_gsDesign |>
@@ -191,7 +185,6 @@ pfs_gsDesign |>
- Note that `gsDesign2::gs_power_ahr()` is used here, given the number of target events for each analysis based on the EAST results.

```{r}
#| eval: true
#| echo: false
#helper function to align the gsDesign2 summary with gsDesign summary
as_gs <- function(xnph) {
@@ -318,7 +311,6 @@ as_gs <- function(xnph) {
```

```{r}
#| eval: true
#| warning: false
#| message: false
library(gsDesign2)
@@ -365,7 +357,6 @@ pfs_gsDesign2 |>
- gsDesign2 vs EAST comparison using absolute differences:

```{r}
#| eval: true
#| echo: false
pfs_gsDesign2 |>
as_gs() |>
@@ -418,7 +409,6 @@ pfs_gsDesign2 |>
- rpact code to reproduce the above EAST results appears below.

```{r}
#| eval: true
#| warning: false
library(rpact)

@@ -451,7 +441,6 @@ kable(summary(pfs_rpact))

```{r}
#| echo: false
#| eval: true

pcross_h1_eff <- cumsum(pfs_rpact$rejectPerStage)
pcross_h1_fut <- pfs_rpact$futilityPerStage[1]
@@ -548,7 +537,7 @@ pfs_rpact_sum |>

- SAS code to reproduce the above rpact results appears below.

```{sas}
```sas
PROC SEQDESIGN BOUNDARYSCALE=MLE ERRSPEND;
DESIGN NSTAGES=2
INFO=CUM(0.748936170212766 1.0)
@@ -572,7 +561,6 @@ RUN;
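The `INFO=CUM` fraction in the PROC SEQDESIGN call is simply the cumulative event count at the interim analysis divided by the final event count. A minimal sketch, with event counts that are purely illustrative (chosen to reproduce the stated fraction; the actual design counts come from EAST):

```python
# For a time-to-event GSD, information fraction = d_ia / d_fa.
# These event counts are hypothetical, not taken from the design output.
d_ia, d_fa = 176, 235
info_frac = d_ia / d_fa
print(info_frac)  # close to the INFO=CUM(0.748936170212766 1.0) value above
```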
The following shows the events (D) and required sample sizes (N) for IA and FA.

```{r}
#| eval: true
#| echo: false
#| fig-align: center
#| out-width: 100%
@@ -585,7 +573,7 @@ Please note that the `BOUNDARYSCALE=MLE | SCORE | STDZ | PVALUE` options display

SAS doesn't provide boundary information on the HR scale, so the HR boundaries are obtained from the MLE boundaries (as MLE $=\hat{\theta}=-\log(\text{HR})$, see [SAS User's Guide: Test for Two Survival Distributions with a Log-Rank Test](https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_seqdesign_details52.htm#statug.seqdesign.cseqdlogrank)) via the following code.
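As a quick numeric illustration of that identity (the boundary values below are hypothetical, not taken from the design output):

```python
import math

# SAS reports the boundary on the MLE scale, theta-hat = -log(HR),
# so the HR-scale boundary is recovered with HR = exp(-MLE).
mle_bounds = [0.35, 0.25]  # hypothetical MLE-scale efficacy boundaries
hr_bounds = [math.exp(-b) for b in mle_bounds]
print(hr_bounds)
```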

```{sas}
```sas
DATA BHR;
SET BMLE;
Bound_UA_HR=exp(-Bound_UA);
@@ -599,7 +587,6 @@ RUN;
The HR boundaries are shown below.

```{r}
#| eval: true
#| echo: false
#| fig-align: center
#| out-width: 100%
@@ -611,7 +598,6 @@ knitr::include_graphics(
The results calculated by SAS are presented in the table below. Please note that SAS doesn't report the probabilities $P(Cross | HR=1)$ and $P(Cross | HR=0.6)$, resulting in empty cells for these results in the table.

```{r}
#| eval: true
#| echo: false
#| warning: false
library(flextable)
@@ -648,7 +634,6 @@ pfs_sas |>
- SAS vs rpact comparison using absolute differences:

```{r}
#| eval: true
#| echo: false
#| warning: false
pfs_rpact_sum |>
@@ -694,7 +679,6 @@ pfs_rpact_sum |>
- SAS vs EAST comparison using absolute differences:

```{r}
#| eval: true
#| echo: false
pfs_sas_sum <- tibble::tibble(
sas_eff = pfs_sas$eff_125,
13 changes: 2 additions & 11 deletions Comp/r-sas-python_survey-stats-summary.qmd
@@ -1,8 +1,6 @@
---
title: "R vs SAS vs Python Survey Summary Statistics"
bibliography: survey-stats-summary.bib
execute:
eval: false
---

This document will compare the survey summary statistics functionality in SAS (available through SAS/STAT), R (available from the [`{survey}`](https://r-survey.r-forge.r-project.org/survey/html/api.html) package), and Python (available from the [`samplics`](https://samplics-org.github.io/samplics/) package), highlighting differences in methods and results. Only the default Taylor series linearisation method for calculating variances is used in all languages. A more detailed comparison between R and SAS for specific methods and use-cases is available in [@2017_YRBS], [@so2020modelling], or [@adamico_2009]. For a general guide to survey statistics, which has companion guides for both R and SAS, see [@Lohr_2022].
@@ -31,7 +29,6 @@ For the full R, SAS, and Python code and results used for this comparison, see b
## R

```{r}
#| eval: true
#| message: false
#| warning: false
library(survey)
@@ -125,7 +122,7 @@ print(list(

## SAS

```{sas}
```sas
* Mean, sum quantile of HI_CHOL;
proc surveymeans data=nhanes mean sum clm quantile=(0.025 0.5 0.975);
cluster SDMVPSU;
@@ -277,7 +274,6 @@ run;
## Python

```{python}
#| eval: true
import pandas as pd
from samplics import TaylorEstimator
from samplics.utils.types import PopParam
@@ -363,7 +359,6 @@ print(
`samplics` in Python does not have a method for calculating quantiles, and in R and SAS the available methods lead to different results. To demonstrate the differences in calculating quantiles, we will use the `apisrs` dataset from the `survey` package in R [@API_2000].

```{r}
#| eval: true
#| message: false
library(survey)

@@ -377,7 +372,7 @@ In SAS, PROC SURVEYMEANS will calculate quantiles of specific probabilities as y

The method and results from SAS are as follows:

```{sas}
```sas
proc surveymeans data=apisrs total=6194 quantile=(0.025 0.5 0.975);
var growth;
run;
@@ -407,7 +402,6 @@ run;
If in R we use the default `qrule="math"` (equivalent to `qrule="hf1"` and matches `type=1` in the `quantile` function for unweighted data) along with the default `interval.type="mean"`, we get the following results:

```{r}
#| eval: true
srs_design <- survey::svydesign(data = apisrs, id = ~1, fpc = ~fpc, )

survey::svyquantile(
@@ -422,7 +416,6 @@ survey::svyquantile(
Here we can see that the quantiles, confidence intervals, and standard errors do not match SAS. From testing, none of the available `qrule` methods match SAS for the quantile values, so it is recommended to use the default values unless you have need of some of the other properties of different quantile definitions - see [`vignette("qrule", package="survey")`](https://cran.r-project.org/web/packages/survey/vignettes/qrule.pdf) for more detail. If an exact match to SAS is required, then the `svyquantile` function allows for passing a custom function to the `qrule` argument to define your own method for calculating quantiles. Below is an example that will match SAS:

```{r}
#| eval: true
sas_qrule <- function(x, w, p) {
# Custom qrule to match SAS, based on survey::oldsvyquantile's internal method
if (any(is.na(x))) {
@@ -459,7 +452,6 @@ sas_quants
Note that although the quantiles and standard errors match, the confidence intervals still do not match SAS. For this another custom calculation is required, based on the formula used in SAS:

```{r}
#| eval: true
sas_quantile_confint <- function(newsvyquantile, level = 0.05, df = Inf) {
q <- coef(newsvyquantile)
se <- survey::SE(newsvyquantile)
@@ -503,7 +495,6 @@ In contrast, the `samplics` package in Python is still early in development, and

::: {.callout-note collapse="true" title="Session Info"}
```{r}
#| eval: true
#| echo: false
si <- sessioninfo::session_info("survey", dependencies = FALSE)
# If reticulate is used, si will include python info. However, this doesn't
7 changes: 3 additions & 4 deletions Comp/r-sas-summary-stats.qmd
@@ -1,6 +1,5 @@
---
title: "Deriving Quantiles or Percentiles in R vs SAS"
eval: false
---

### Data
@@ -15,7 +14,7 @@ c(10, 20, 30, 40, 150, 160, 170, 180, 190, 200)

Assuming the data above is stored in the variable `aval` within the dataset `adlb`, the 25th and 40th percentiles could be calculated using the following code.

```{sas}
```sas
proc univariate data=adlb;
var aval;
output out=stats pctlpts=25 40 pctlpre=p;
@@ -38,14 +37,14 @@ The procedure has the option `PCTLDEF` which allows for five different percentil
The 25th and 40th percentiles of `aval` can be calculated using the `quantile` function.

```{r}
#| eval: false
quantile(adlb$aval, probs = c(0.25, 0.4))
```

This gives the following output.

```{r}
#| echo: false
#| eval: true
adlb <- data.frame(aval = c(10, 20, 30, 40, 150, 160, 170, 180, 190, 200))
quantile(adlb$aval, probs = c(0.25, 0.4))
```
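For reference, NumPy's default interpolation (`method="linear"`) corresponds to R's default `type = 7`, so the same calculation can be sketched in Python:

```python
import numpy as np

aval = np.array([10, 20, 30, 40, 150, 160, 170, 180, 190, 200])
# NumPy's default method="linear" matches R's default quantile type=7
print(np.percentile(aval, [25, 40]))
```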
@@ -59,14 +58,14 @@ The default percentile definition used by the UNIVARIATE procedure in SAS finds
It is possible to get the quantile function in R to use the same definition as the default used in SAS, by specifying `type=2`.

```{r}
#| eval: false
quantile(adlb$aval, probs = c(0.25, 0.4), type = 2)
```

This gives the following output.

```{r}
#| echo: false
#| eval: true
quantile(adlb$aval, probs = c(0.25, 0.4), type = 2)
```
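The same definition is also available in NumPy (1.22 or later) as `method="averaged_inverted_cdf"`, should a Python cross-check be useful. A sketch using the same data:

```python
import numpy as np

aval = np.array([10, 20, 30, 40, 150, 160, 170, 180, 190, 200])
# "averaged_inverted_cdf" corresponds to R's quantile type=2,
# i.e. the definition discussed above for matching SAS UNIVARIATE
print(np.percentile(aval, [25, 40], method="averaged_inverted_cdf"))
```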

9 changes: 2 additions & 7 deletions Comp/r-sas_anova.qmd
@@ -1,7 +1,5 @@
---
title: "R vs SAS Linear Models"
execute:
eval: false
---

# R vs. SAS ANOVA
@@ -31,7 +29,6 @@ The following table provides an overview of the support and results comparabilit
In order to get the ANOVA model fit and sum of squares you can use the `anova` function in the `stats` package.

```{r}
#| eval: true
library(emmeans)
drug_trial <- read.csv("../data/drug_trial.csv")

@@ -43,7 +40,6 @@ lm_model |>
It is recommended to use the `emmeans` package to get the contrasts in R.

```{r}
#| eval: true
lm_model |>
emmeans("drug") |>
contrast(
@@ -56,7 +52,7 @@ lm_model |>

In SAS, all contrasts must be manually defined, but the syntax is largely similar in both.

```{sas}
```sas
proc glm data=work.mycsv;
class drug;
model post = pre drug / solution;
Expand All @@ -66,7 +62,6 @@ run;
```

```{r}
#| eval: true
#| echo: false
#| fig-align: center
#| out-width: 75%
@@ -119,7 +114,7 @@ Provided below is a detailed comparison of the results obtained from both SAS an

Note, however, that there are some cases where the parameter estimates differ in scale between SAS and R, though the test statistics and p-values are identical. In these cases, we can adjust the SAS code to include a divisor. As far as we can tell, this difference only occurs when using the predefined Base R contrast methods like `contr.helmert`.
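To see where the scale difference comes from: R's `contr.helmert` columns are orthogonal and sum to zero, but they are not normalized, so each column carries a different scale. A small sketch reproducing that matrix (the helper function here is ours, not an existing API):

```python
import numpy as np

def contr_helmert(n):
    # Reproduces R's contr.helmert(n): column j compares level j+2 with
    # the mean of the preceding levels. Columns sum to zero and are
    # mutually orthogonal, but their scales differ -- hence the divisor
    # needed in the SAS ESTIMATE statements.
    m = np.zeros((n, n - 1))
    for j in range(n - 1):
        m[: j + 1, j] = -1
        m[j + 1, j] = j + 1
    return m

print(contr_helmert(3))
```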

```{sas}
```sas
proc glm data=work.mycsv;
class drug;
model post = pre drug / solution;
7 changes: 1 addition & 6 deletions Comp/r-sas_chi-sq.qmd
@@ -1,7 +1,5 @@
---
title: "R/SAS Chi-Squared and Fisher's Exact Comparision"
execute:
eval: false
---

# Chi-Squared Test
@@ -26,7 +24,6 @@ For an r x c table (where r is the number of rows and c the number of columns),
For this example we will use data about cough symptoms and history of bronchitis.

```{r}
#| eval: true
bronch <- matrix(c(26, 247, 44, 1002), ncol = 2)
row.names(bronch) <- c("cough", "no cough")
colnames(bronch) <- c("bronchitis", "no bronchitis")
@@ -36,13 +33,12 @@ bronch
To run a chi-squared test in R, use the following code.

```{r}
#| eval: true
stats::chisq.test(bronch)
```

To run a chi-squared test in SAS, use the following code.

```{sas}
```sas
proc freq data=proj1.bronchitis;
tables Cough*Bronchitis / chisq;
run;
@@ -51,7 +47,6 @@ run;
The result in the "Chi-Square" section of the results table in SAS will not match R; in this case it is 12.1804 with a p-value of 0.0005. This is because, by default, R applies a Yates continuity correction for 2x2 tables. To disable it, set `correct = FALSE`.

```{r}
#| eval: true
stats::chisq.test(bronch, correct = FALSE)
```
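The equivalent uncorrected test in Python uses `scipy.stats.chi2_contingency` with `correction=False`. A sketch using the same counts:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Same 2x2 table as above: rows = cough / no cough,
# columns = bronchitis / no bronchitis
bronch = np.array([[26, 44], [247, 1002]])
stat, p, dof, expected = chi2_contingency(bronch, correction=False)
print(round(stat, 4), round(p, 4))
```

This reproduces the uncorrected SAS statistic quoted above (12.1804).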
