Integrated transcriptomic analysis reveals universal mitochondrial targeting across the COVID-19 disease spectrum
This study presents the first comprehensive molecular characterization of the complete COVID-19 pathophysiological spectrum, from health through acute severe infection to post-acute sequelae and mortality. Through integrative analysis of three independent RNA-seq datasets (n=152 samples across eight disease states) processed with a standardized containerized pipeline, we reveal mitochondrial dysfunction as the universal molecular signature of COVID-19.
Oxidative Phosphorylation (OXPHOS) emerged as the most consistently enriched pathway across disease states, ranking first in five of six comparisons with Enrichr Combined Scores ranging from 6,742 to 10,227. A three-pathway molecular triad—OXPHOS, MYC Targets V1, and mTORC1—demonstrated distinct patterns across disease progression, with fatal cases uniquely exhibiting MYC Targets V1 dominance (Combined Score 173,380,125), representing a metabolic-to-proliferative switch characteristic of irreversible disease progression.
Four critical mitochondrial components (NDUFA1, COX5A, ATP5F1, TOMM20) emerged as shared targets across all three pathways, indicating coordinated viral disruption of cellular bioenergetics rather than collateral inflammatory damage. These findings fundamentally reframe COVID-19 from a respiratory illness with systemic complications to a primary mitochondrial disease with respiratory manifestations, opening new therapeutic avenues targeting cellular bioenergetics for both acute COVID-19 and Long COVID management.
This study integrates three independent RNA-seq datasets capturing the complete COVID-19 pathophysiological spectrum:
| Dataset | Disease States | Notes |
|---|---|---|
| Ryan et al. (GSE169687) | Healthy controls (n=14); PASC mild (n=98); PASC moderate (n=11); PASC severe (n=14); PASC critical (n=15) | Australia; n=152 total; paired-end; PBMC |
| Yin et al. (GSE224615) | Recovered without sequelae (n=13); Long COVID (n=23) | United States; n=36 total; paired-end; PBMC |
| Vlasov et al. (GSE185863) | Acute severe survivors (n=5); Fatal cases (n=3); Technical replicates: 16 samples from 8 patients | Russia; n=16 total; paired-end; PBMC |
Total samples analyzed: 152 across eight disease groups
We applied the following analytical pipeline to uniformly process RNA-seq datasets from different studies/labs.
Pipeline Overview: Containerized, standardized workflow ensuring reproducible cross-dataset integration
- Quality Control: Trim Galore (v0.6.10) for adapter trimming
- Alignment: STAR (v2.7.11a) to GRCh38 reference genome (Ensembl release 110)
- Quantification: Salmon (v1.10.1) with transcript-level quantification
- Differential Expression: edgeR with FDR <0.05, |log2FC| >1.25
- Pathway Analysis: Gene Set Enrichment Analysis (GSEA) using the Enrichr web portal at https://maayanlab.cloud/Enrichr/# and MSigDB Hallmark gene sets
Reference: GRCh38 primary assembly, GENCODE annotation release 44
Statistical Thresholds: FDR <0.05, |log2FC| >1.25
Pathway Filtering: Gene sets with 15-500 genes per pathway
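
As a rough illustration of how the thresholds above could be applied downstream, the following Python sketch filters a differential expression table and a gene set collection. It is not the code used in this study. The column names `logFC` and `FDR` follow edgeR's usual `topTags` output, but the row format, the function names, the toy gene names, and the choice to treat the 15-500 bounds as inclusive are all assumptions made for the example.

```python
from typing import Dict, List, Set

FDR_MAX = 0.05          # FDR threshold stated above
ABS_LOG2FC_MIN = 1.25   # |log2FC| threshold stated above
MIN_SET_SIZE = 15       # lower gene set size bound (assumed inclusive)
MAX_SET_SIZE = 500      # upper gene set size bound (assumed inclusive)


def significant_genes(rows: List[dict]) -> List[str]:
    """Return gene names passing both the FDR and |log2FC| cutoffs."""
    return [
        r["gene"]
        for r in rows
        if r["FDR"] < FDR_MAX and abs(r["logFC"]) > ABS_LOG2FC_MIN
    ]


def filter_gene_sets(gene_sets: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep only gene sets whose size lies within the 15-500 gene window."""
    return {
        name: genes
        for name, genes in gene_sets.items()
        if MIN_SET_SIZE <= len(genes) <= MAX_SET_SIZE
    }


# Hypothetical toy rows, not results from this study.
toy_rows = [
    {"gene": "GENE_A", "logFC": 2.0, "FDR": 0.01},   # passes both cutoffs
    {"gene": "GENE_B", "logFC": -1.5, "FDR": 0.04},  # passes (down-regulated)
    {"gene": "GENE_C", "logFC": 1.0, "FDR": 0.001},  # fails the |log2FC| cutoff
    {"gene": "GENE_D", "logFC": 3.0, "FDR": 0.20},   # fails the FDR cutoff
]

print(significant_genes(toy_rows))
```

Using the absolute value of `logFC` keeps both up- and down-regulated genes, which matches the |log2FC| form of the threshold.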
Our containerized workflows (with source code) for uniform RNA-seq data processing are available in workflows. Our uniformly processed data are available from Zenodo with DOI 10.5281/zenodo.16751356.
The following Jupyter notebooks take the uniformly processed counts table as input and perform differential expression analysis, one notebook per comparison.
- A vs B - Group A vs B: Healthy controls vs. recovered patients with no PASC signs and symptoms (to identify the main post-recovery molecular signatures)
- A vs CD - Group A vs C+D: Healthy controls vs. mild/moderate PASC (to help us investigate the dominant gene expression pathways as patients enter the PASC phase)
- A vs EF - Group A vs E+F: Healthy controls vs. severe/critical PASC (to characterize the differentially expressed genes associated with severe/critical PASC)
- A vs H - Group A vs H: Healthy controls vs. acute severe COVID-19 survivors (examining enriched gene pathways around resolution of severe active COVID-19)
- A vs I - Group A vs I: Healthy controls vs. acute severe COVID-19 fatalities (identifying maximal infection-related inflammatory pathway activation secondary to COVID-19)
- H vs I - Group H vs I: Acute severe survivors vs. fatalities (contrasting the dominant gene expression profiles of pathways related to fatality versus near-fatal cases)
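
The six comparisons listed above can be written down as a small lookup table, which may help when selecting sample columns from the counts table. This is a minimal sketch rather than the notebooks' actual code: the group letters come from the list above, while the dictionary keys, the `split_samples` helper, and the sample IDs are hypothetical.

```python
from typing import Dict, List, Tuple

# Comparison name -> (reference groups, test groups), using the group letters above.
COMPARISONS: Dict[str, Tuple[List[str], List[str]]] = {
    "A_vs_B": (["A"], ["B"]),
    "A_vs_CD": (["A"], ["C", "D"]),
    "A_vs_EF": (["A"], ["E", "F"]),
    "A_vs_H": (["A"], ["H"]),
    "A_vs_I": (["A"], ["I"]),
    "H_vs_I": (["H"], ["I"]),
}


def split_samples(
    sample_groups: Dict[str, str], comparison: str
) -> Tuple[List[str], List[str]]:
    """Return (reference sample IDs, test sample IDs) for one comparison.

    sample_groups maps a sample ID to its group letter. Samples belonging to
    groups outside the comparison are ignored.
    """
    reference_groups, test_groups = COMPARISONS[comparison]
    reference = [s for s, g in sample_groups.items() if g in reference_groups]
    test = [s for s, g in sample_groups.items() if g in test_groups]
    return reference, test


# Hypothetical sample IDs, for illustration only.
toy_samples = {"s1": "A", "s2": "C", "s3": "D", "s4": "H", "s5": "A"}

reference, test = split_samples(toy_samples, "A_vs_CD")
print(reference, test)
```

Keeping the pooled groups (C+D, E+F) as lists makes it explicit which groups are merged before the contrast is fitted.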