GeneExt takes as input scRNA-seq mapped reads and a gene annotation file (GTF or GFF, any version) and outputs an extended gene annotation file for improved scRNA-seq transcript quantification.
Note: If you do not have Conda installed, we recommend installing Miniforge.
Tool dependencies can be installed with conda or mamba:
# create environment
mamba env create -n geneext -f environment.yaml
mamba activate geneext
Install macs2 separately with pip:
pip install macs2
Once dependencies are installed, try running GeneExt with sample data:
python geneext.py -g test_data/annotation.gtf -b test_data/alignments.bam -o result.gtf --force --orphan
This should generate a result.gtf file and an interactive HTML report, result.gtf.Report.html.
The resulting gtf file will contain:
- input features - untouched
- input transcripts extended - the 2nd column (source) changed to GeneExt
- inferred orphan peaks - exon, transcript, gene triplets per orphan cluster; the "source" field is GeneExt_orphan
The updated features can be easily tracked by their source column (2nd):
cat result.gtf | awk '$3=="gene"' | cut -f 2 | sort | uniq -c
# 35 Genbank
# 14 GeneExt_orphan
cat result.gtf | awk '$3=="transcript"' | cut -f 2 | sort | uniq -c
# 14 Genbank
# 21 GeneExt
# 14 GeneExt_orphan
The output above suggests there are 14 orphan peaks (GeneExt_orphan), and 21 genes extended (the source of the transcript has changed to GeneExt); 14 input genes have been left unchanged.
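The same tally can be sketched in Python, which may be convenient when the counts feed into a downstream script. This is a minimal illustration and not part of GeneExt: the function name `count_sources` and the toy records are made up for the example, and it assumes tab-separated columns as in standard GTF.

```python
from collections import Counter

def count_sources(lines, feature):
    """Count source values (2nd column) for rows whose 3rd column equals `feature`."""
    counts = Counter()
    for line in lines:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[2] == feature:
            counts[fields[1]] += 1
    return counts

# Toy records (invented for illustration): one extended gene,
# one unchanged gene, and one orphan peak.
toy_gtf = [
    'chr1\tGenbank\tgene\t100\t500\t.\t+\t.\tgene_id "g1";',
    'chr1\tGeneExt\ttranscript\t100\t900\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tGenbank\tgene\t2000\t2500\t.\t-\t.\tgene_id "g2";',
    'chr1\tGenbank\ttranscript\t2000\t2500\t.\t-\t.\tgene_id "g2"; transcript_id "t2";',
    'chr1\tGeneExt_orphan\tgene\t5000\t5200\t.\t+\t.\tgene_id "o1";',
    'chr1\tGeneExt_orphan\ttranscript\t5000\t5200\t.\t+\t.\tgene_id "o1"; transcript_id "o1.t";',
]

gene_counts = count_sources(toy_gtf, "gene")
transcript_counts = count_sources(toy_gtf, "transcript")
print(dict(gene_counts))
print(dict(transcript_counts))
```

To apply it to a real output file, pass an open file handle instead of the toy list, for example `with open("result.gtf") as fh: count_sources(fh, "transcript")`.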
Most errors with GeneExt come from improperly formatted files. If you encounter errors, please try standardizing your annotation file with AGAT.
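Before reaching for a full standardization, a quick pre-check can help locate obviously malformed rows. The sketch below is not part of GeneExt or AGAT and does not replace either. The helper name `check_gtf_lines` is hypothetical, and it only tests the basic GTF/GFF layout: nine tab-separated columns, integer 1-based coordinates with start not exceeding end, and a recognized strand symbol.

```python
def check_gtf_lines(lines):
    """Return (line_number, message) pairs for rows that break basic GTF/GFF layout."""
    problems = []
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Comment lines and blank lines are allowed and ignored.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            problems.append((n, f"expected 9 tab-separated columns, found {len(fields)}"))
            continue
        try:
            start, end = int(fields[3]), int(fields[4])
        except ValueError:
            problems.append((n, "start/end are not integers"))
            continue
        if start < 1 or start > end:
            problems.append((n, "coordinates must satisfy 1 <= start <= end"))
        # "?" is permitted by GFF3 for features of unknown strand.
        if fields[6] not in {"+", "-", ".", "?"}:
            problems.append((n, f"unexpected strand value {fields[6]!r}"))
    return problems

# Toy input (invented for illustration): a valid row, a space-separated row,
# and a row whose start is larger than its end.
sample = [
    'chr1\tGenbank\tgene\t100\t500\t.\t+\t.\tgene_id "g1";',
    'chr1 Genbank gene 100 500 . + . gene_id "g1";',
    'chr1\tGenbank\texon\t500\t100\t.\t+\t.\tgene_id "g1";',
]

problems = check_gtf_lines(sample)
for line_number, message in problems:
    print(f"line {line_number}: {message}")
```

A file that passes this check can still fail for other reasons (for example, inconsistent attribute fields), so it is only a first filter.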
For details on how to obtain the alignment file, please refer to the manual.
If problems persist, don't hesitate to contact the authors.
If you use this tool, please cite:
Grygoriy Zolotarov, Xavier Grau-Bové, Arnau Sebé-Pedrós, GeneExt: a gene model extension tool for enhanced single-cell RNA-seq analysis, Bioinformatics, Volume 42, Issue 3, March 2026, btag094, https://doi.org/10.1093/bioinformatics/btag094
