logsdon-lab/CenMAP

CenMAP

A centromere mapping and annotation pipeline for T2T human and non-human primate genome assemblies implemented in Snakemake.


Chr1 α-satellite higher-order repeat structure, centromere dip regions, and self-identity plot

Chr12 α-satellite HOR arrays

Ideogram

Cumulative α-satellite HOR array lengths

Install from Bioconda:

conda install bioconda::cenmap

For a single assembly:

# Find centromeres in human samples.
cenmap -i asm*.fa.gz -s HG002
# Find centromeres in non-human primate samples.
cenmap -i asm*.fa.gz -s mPanTro3 --mode nhp
# Find centromeres and validate with nucflag.
cenmap -i asm*.fa.gz -s HG002 --hifi hifi*.fq.gz
# Find centromeres and determine centromere dip regions.
cenmap -i asm*.fa.gz -s HG002 --ont ont*.bam
# Find centromeres, validate with nucflag, and determine centromere dip regions.
cenmap -i asm*.fa.gz -s HG002 --hifi hifi*.fq.gz --ont ont*.bam
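
The optional flags in the examples above compose, so a thin wrapper can add them only when the matching reads are available. The following is a minimal sketch, not part of CenMAP: `build_cenmap_cmd` is a hypothetical helper that only prints a command line and never runs the pipeline. It relies solely on the flags shown above (`-i`, `-s`, `--mode nhp`, `--hifi`, `--ont`); any mode value other than `nhp` is treated here as "use the default", which is an assumption of this sketch.

```shell
# Hypothetical helper (not part of CenMAP): print a cenmap command line
# for one sample, adding optional flags only when a value is supplied.
# Usage: build_cenmap_cmd SAMPLE ASSEMBLY [MODE] [HIFI] [ONT]
# The function body runs in a subshell so its variables stay local.
build_cenmap_cmd() (
    sample="$1"; asm="$2"; mode="${3:-}"; hifi="${4:-}"; ont="${5:-}"
    cmd="cenmap -i ${asm} -s ${sample}"
    # Non-human primate samples take --mode nhp, as in the examples above.
    if [ "$mode" = "nhp" ]; then cmd="${cmd} --mode nhp"; fi
    # HiFi reads enable validation; ONT BAMs enable CDR detection.
    if [ -n "$hifi" ]; then cmd="${cmd} --hifi ${hifi}"; fi
    if [ -n "$ont" ]; then cmd="${cmd} --ont ${ont}"; fi
    printf '%s\n' "$cmd"
)

# Dry run: print the commands instead of executing them.
build_cenmap_cmd HG002 'asm*.fa.gz' default 'hifi*.fq.gz' 'ont*.bam'
build_cenmap_cmd mPanTro3 'asm*.fa.gz' nhp
```

Printing the command first makes it easy to review what would run for each sample before committing compute time.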

For multiple assemblies:

# Create new config.
cenmap --generate-config > example.yaml
# Modify config parameters and include multiple samples.
# ...
# Then run.
cenmap --config example.yaml
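
Because the generate step above redirects into `example.yaml`, rerunning it would overwrite any edits made by hand. One way to guard against that is sketched below. `write_config_once` is a hypothetical helper, not a CenMAP feature: it takes a target path followed by the command that produces the config, and it skips the write if the target already exists.

```shell
# Hypothetical helper (not part of CenMAP): write a config file only if it
# does not exist yet, so a rerun cannot clobber hand-edited parameters.
# Usage: write_config_once TARGET COMMAND [ARGS...]
write_config_once() (
    target="$1"; shift
    if [ -e "$target" ]; then
        printf 'keeping existing %s\n' "$target" >&2
    else
        # Write to a temporary file first so a failed command does not
        # leave a partial config behind under the final name.
        "$@" > "${target}.tmp" && mv "${target}.tmp" "$target"
    fi
)

# Example (commented out so that sourcing this file has no side effects):
# write_config_once example.yaml cenmap --generate-config
```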

Input:
  • Verkko or hifiasm human or non-human primate genome assemblies.
  • PacBio HiFi reads used in the assemblies.
  • (Optional) Unaligned BAM files with 5mC modifications at CpG sites.

Output:
  • Complete and correctly assembled centromere sequences and their regions validated by NucFlag.
  • Centromere α-satellite higher-order repeat (HOR) array lengths via censtats.
  • RepeatMasker and HumAS-SD α-satellite HOR monomer of SF annotations and plots.
  • ModDotPlot sequence identity plots.
  • Combined sequence identity and HOR array structure plots via cenplot.
  • (Optional) Centromere dip regions (CDRs) via CDR-Finder.
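
Since the pipeline depends on assembly and read files being present, a quick pre-flight check can catch a mistyped path before a long run starts. This is a minimal sketch under stated assumptions: `check_inputs` is a hypothetical helper, not part of CenMAP, and it only tests that each path exists and is non-empty. It does not validate file formats or base modification tags.

```shell
# Hypothetical pre-flight check (not part of CenMAP): report input files
# that are missing or empty. Succeeds only if every path passes.
# Usage: check_inputs FILE [FILE...]
check_inputs() (
    missing=0
    for f in "$@"; do
        # -s is true only for an existing file with size greater than zero.
        if [ ! -s "$f" ]; then
            printf 'missing or empty: %s\n' "$f" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
)

# Example with placeholder file names (commented out; adjust to your data):
# check_inputs asm.fa.gz hifi.fq.gz && cenmap -i asm.fa.gz -s HG002 --hifi hifi.fq.gz
```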

Read the docs on the CenMAP wiki.

To run tests, refer to the wiki page.

Cite

Gao S, Oshima KK, Chuang SC, Loftus M, Montanari A, Gordon DS, Human Genome Structural Variation Consortium, Human Pangenome Reference Consortium, Hsieh P, Konkel MK, Ventura M, Logsdon GA. A global view of human centromere variation and evolution. bioRxiv. 2025. p. 2025.12.09.693231. doi:10.64898/2025.12.09.693231
