A centromere mapping and annotation pipeline for T2T human and non-human primate genome assemblies implemented in Snakemake.
Example output plots:
- Chr1 α-satellite higher-order repeat structure, centromere dip regions, and self-identity plot
- Chr12 α-satellite HOR arrays
- Ideogram
- Cumulative α-satellite HOR array lengths
conda install bioconda::cenmap

For a single assembly:
# Find centromeres in human samples.
cenmap -i asm*.fa.gz -s HG002
# Find centromeres in non-human primate samples.
cenmap -i asm*.fa.gz -s mPanTro3 --mode nhp
# Find centromeres and validate with nucflag.
cenmap -i asm*.fa.gz -s HG002 --hifi hifi*.fq.gz
# Find centromeres and determine centromere dip regions.
cenmap -i asm*.fa.gz -s HG002 --ont ont*.bam
# Find centromeres, validate with nucflag, and determine centromere dip regions.
cenmap -i asm*.fa.gz -s HG002 --hifi hifi*.fq.gz --ont ont*.bam

For multiple assemblies:
# Create new config.
cenmap --generate-config > example.yaml
# Modify config parameters and include multiple samples.
# ...
# Then run.
cenmap --config example.yaml

Input:
- Verkko or hifiasm human or non-human primate genome assemblies.
- PacBio HiFi reads used in the assemblies.
- (Optional) Unaligned BAM files with 5mC modifications at CpG sites.
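
The single-assembly invocations above differ only in which optional inputs are supplied. The sketch below is a hypothetical dry-run helper, not part of CenMAP: it prints the `cenmap` command for one sample without running it. The helper name `build_cenmap_cmd` and its `human`/`nhp` mode argument are assumptions made for illustration. Only the flags already shown above (`-i`, `-s`, `--mode nhp`, `--hifi`, `--ont`) are used.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not part of CenMAP): print the cenmap command for one
# sample instead of running it.
# Arguments: sample name, mode ("human" or "nhp"), assembly glob, then
# optional HiFi and ONT globs (pass "" to skip either one).
# The body runs in a subshell so its variables do not leak to the caller.
build_cenmap_cmd() (
    sample="$1"
    mode="$2"
    asm="$3"
    hifi="${4:-}"
    ont="${5:-}"

    cmd="cenmap -i ${asm} -s ${sample}"
    # Non-human primate samples add --mode nhp.
    if [ "$mode" = "nhp" ]; then cmd="$cmd --mode nhp"; fi
    # HiFi reads add validation with NucFlag.
    if [ -n "$hifi" ]; then cmd="$cmd --hifi ${hifi}"; fi
    # ONT BAMs add centromere dip region detection.
    if [ -n "$ont" ]; then cmd="$cmd --ont ${ont}"; fi

    # Quoted so the globs are printed literally, not expanded.
    printf '%s\n' "$cmd"
)

build_cenmap_cmd HG002 human 'asm*.fa.gz' 'hifi*.fq.gz' 'ont*.bam'
build_cenmap_cmd mPanTro3 nhp 'asm*.fa.gz'
```

Printing the command first makes it easy to check which analyses a given set of inputs will trigger before starting a long run. For many samples, the generated config shown above is likely the better route.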

Output:
- Complete and correctly assembled centromere sequences and their regions validated by NucFlag.
- Centromere α-satellite higher-order repeat (HOR) array lengths via censtats.
- RepeatMasker and HumAS-SD α-satellite HOR monomer of SF annotations and plots.
- ModDotPlot sequence identity plots.
- Combined sequence identity and HOR array structure plots via cenplot.
- (Optional) Centromere dip regions (CDRs) with CDR-Finder.

Read the docs on the CenMAP wiki.
To run tests, refer to the wiki page.
Gao S, Oshima KK, Chuang SC, Loftus M, Montanari A, Gordon DS, Human Genome Structural Variation Consortium, Human Pangenome Reference Consortium, Hsieh P, Konkel MK, Ventura M, Logsdon GA. A global view of human centromere variation and evolution. bioRxiv. 2025. p. 2025.12.09.693231. doi:10.64898/2025.12.09.693231



