TPHP

This repository contains the core code for the TPHP project. It performs per-cancer, per-protein tumor–paired non-tumor comparisons using linear mixed-effects regression and exports the results as RDS objects for downstream analysis.

Repository layout

tumor_dysregulation_analysis.R — main analysis script
data/tumor_compare_data.parquet — input table (paired tumor/non-tumor)
output/compare_report_output.rds — main result object (generated)
output/package_versions.txt — R + package versions used (generated)

1. System requirements

1.1 Operating systems

The workflow is expected to run on every common desktop/server platforms that support R. The test environment is on Windows 11 x64 system.

1.2 Software dependencies

Required

R (recommended R 4.0+)
CRAN packages
- tidyverse
- arrow
- lme4
- lmerTest
- plyr

Exact tested version recorded in output/package_versions.txt

1.3 Hardware

CPU-only execution (no GPU required)
Recommended RAM: ≥8 GB (larger datasets may require more)
No required non-standard hardware

2. Installation guide

2.1 Get the code

git clone <REPO_URL>
cd <REPO_DIR>

2.2 Install R

Install R for the operating system from CRAN.

Optionally, install packages manually:

install.packages(c("tidyverse","arrow","lmerTest","lme4"))

This project is script-based, and no software installation step is required beyond installing dependencies.

3. Demo

Quick start

From the repository root:

Rscript tumor_dysregulation_analysis.R

3.1 Input files

Default input path:

data/tumor_compare_data.parquet

3.2 Expected output files

After a successful run, the script writes:

output/compare_report_output.rds

An R list DEA with:
- DEA$Diff.report: full per-(cancer, protein) model results
- DEA$Diff.report.filter: filtered results using Hedges'g >= 0.5 and p_adj_BH < 0.05
output/package_versions.txt R version and package versions used.

3.3 Expected runtime

Runtime scales primarily with the number of model fits, i.e., cancer types × proteins × samples.

As a rough guide, on a standard desktop CPU (8–16 threads, 16–32 GB RAM), throughput is ~25 model fits per second, corresponding to < 10 minutes per cancer type under typical settings.

4. Instructions

4.1 Input data format

The script expects a Parquet table with metadata columns:

patient_ID (string)
sample_type (must include NT for non-tumor and T for tumor)
cancer_abbr (string)
cancer_subtype (string; may be constant within a subset)
Gender (categorical)
Age (integer-like)
Dataset (categorical)

All remaining columns are treated as numeric features (proteins).

4.2 What the script does

For each cancer_abbr and each protein, the script fits a mixed-effects model:

value ~ 1 + sample_type + (optional covariates) + (1 | patient_ID)

Optional covariates among {cancer_subtype, Gender, Age_c, Dataset} are included only when estimable within the cancer/protein subset.

The reported tumor effect is the fixed-effect coefficient for sample_typeT.

4.3 Output columns

Each row in DEA$Diff.report corresponds to one (cancer, protein) fit and includes:

effect, se, t
df(degree of freedom)
p, p_adj_BH (p value)
sigma (residual SD)
es_adj = effect / sigma (standardized effect)
g_adj = J(df) * es_adj (small-sample adjusted standardized effect)
formula (the exact model formula used)
is_singular, re_var_patient (fit diagnostics)

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
data		data
output		output
.gitattributes		.gitattributes
LICENSE		LICENSE
README.md		README.md
tumor_dysregulation_analysis.R		tumor_dysregulation_analysis.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TPHP

Repository layout

1. System requirements

1.1 Operating systems

1.2 Software dependencies

Required

1.3 Hardware

2. Installation guide

2.1 Get the code

2.2 Install R

3. Demo

Quick start

3.1 Input files

3.2 Expected output files

3.3 Expected runtime

4. Instructions

4.1 Input data format

4.2 What the script does

4.3 Output columns

About

Uh oh!

Releases

Packages

Languages

License

Wenhhao/TPHP-code

Folders and files

Latest commit

History

Repository files navigation

TPHP

Repository layout

1. System requirements

1.1 Operating systems

1.2 Software dependencies

Required

1.3 Hardware

2. Installation guide

2.1 Get the code

2.2 Install R

3. Demo

Quick start

3.1 Input files

3.2 Expected output files

3.3 Expected runtime

4. Instructions

4.1 Input data format

4.2 What the script does

4.3 Output columns

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages