Issues with NEA Alignment to hg38 Reference After hg19→hg38 Conversion #208

@wangyiyisheng

Hi Yunye,
I lifted my summary statistics over from hg19 to hg38 and then tried to align them with the reference sequence using hg38.fa, but not a single variant matched. What could be the reason?

The code I ran is as follows:
mysumstats.liftover(from_build="19", to_build="38")
2026/02/20 16:30:15 Start to perform liftover ...(v4.0.4)
2026/02/20 16:30:15 -Using built-in chain file: /home/drwang/miniconda3/envs/gwaslab/lib/python3.12/site-packages/gwaslab/data/chains/hg19ToHg38.over.chain.gz
2026/02/20 16:30:15 -Converting variants with status code xxx0xxx: 34,371
2026/02/20 16:30:15 -Target build: 38
2026/02/20 16:30:15 -Input positions are 1-based
2026/02/20 16:30:15 -Output positions will be 1-based
2026/02/20 16:30:12 -Chromosome mismatches detected: 1 variants (treated as unmapped)
2026/02/20 16:30:12 -Examples of chromosome mismatches:
2026/02/20 16:30:12 SNPID=17:34932498:C:G | CHR=17 | POS=34932498 | CHR_LIFT=17_KI270857v1_alt | POS_LIFT=811563 | STATUS=1960099
2026/02/20 16:30:12 -Mapped: 34364 variants
2026/02/20 16:30:12 -Unmapped: 7 variants
2026/02/20 16:30:12 -Examples of unmapped variants:
2026/02/20 16:30:12 SNPID=1:2692477:G:A | CHR=1 | POS=2692477 | STATUS=1960099
2026/02/20 16:30:12 SNPID=1:2692487:G:T | CHR=1 | POS=2692487 | STATUS=1960099
2026/02/20 16:30:12 SNPID=17:34869155:G:A | CHR=17 | POS=34869155 | STATUS=1960099
2026/02/20 16:30:12 SNPID=17:34932498:C:G | CHR=17 | POS=34932498 | STATUS=1960099
2026/02/20 16:30:12 SNPID=17:66078395:G:A | CHR=17 | POS=66078395 | STATUS=1960099
2026/02/20 16:30:12 -Removed 7 unmapped variants
2026/02/20 16:30:12 Start to fix chromosome notation (CHR) ...(v4.0.4)
2026/02/20 16:30:12 -Current Dataframe shape : 34364 x 13 ; Memory usage: 3.18 MB
2026/02/20 16:30:12 -Checking CHR data type...
2026/02/20 16:30:12 -Variants with standardized chromosome notation: 34364
2026/02/20 16:30:12 -All CHR are already fixed...
2026/02/20 16:30:12 Finished fixing chromosome notation (CHR).
2026/02/20 16:30:12 Start to fix basepair positions (POS) ...(v4.0.4)
2026/02/20 16:30:12 -Trying to convert datatype for POS: Int64 -> Int64...
2026/02/20 16:30:12 -Position bound:(0 , 250,000,000)
2026/02/20 16:30:12 -No outlier variants were removed.
2026/02/20 16:30:12 -Removed variants with bad positions: 0
2026/02/20 16:30:12 Finished fixing basepair positions (POS).
2026/02/20 16:30:12 Finished liftover.

mysumstats.check_ref(ref_seq="hg38.fa")
2026/02/20 16:30:21 Start to check if NEA is aligned with reference sequence ...(v4.0.4)
2026/02/20 16:30:21 -Reference genome FASTA file: hg38.fa
2026/02/20 16:30:21 -Loading and building numpy fasta records:
2026/02/20 16:43:40 -Variants allele on given reference sequence : 0
2026/02/20 16:43:40 -Variants flipped : 0
2026/02/20 16:43:40 -Raw Matching rate : 0.00%
2026/02/20 16:43:40 #WARNING! Matching rate is low, please check if the right reference genome is used.
2026/02/20 16:43:40 -Variants inferred reverse_complement : 0
2026/02/20 16:43:40 -Variants inferred reverse_complement_flipped : 0
2026/02/20 16:43:40 -Both allele on genome + unable to distinguish : 0
2026/02/20 16:43:40 -Variants not on given reference sequence : 0
2026/02/20 16:43:40 Finished checking if NEA is aligned with reference sequence.
<gwaslab.g_Sumstats.Sumstats object at 0x76a003187320>
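
Every counter in the check_ref output above is 0, including "not on given reference sequence", which may mean that no chromosome in the sumstats was paired with a FASTA record at all. One possible cause (an assumption, not something the log confirms) is a naming mismatch: the liftover example shows CHR=17, while many hg38 FASTA files name their records chr17. The following is a minimal diagnostic sketch in plain Python; `fasta_contig_names` and `compare_chr_labels` are hypothetical helper names, not part of gwaslab.

```python
import gzip


def fasta_contig_names(path):
    """Return the record names (first token after '>') found in a FASTA file."""
    # Assumption: the file is plain text, or gzip-compressed if it ends in .gz.
    opener = gzip.open if str(path).endswith(".gz") else open
    names = []
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:  # skip malformed, empty header lines
                    names.append(tokens[0])
    return names


def compare_chr_labels(sumstats_chr, contig_names):
    """Split sumstats chromosome labels into (found, missing) against FASTA names."""
    contigs = set(contig_names)
    labels = {str(c) for c in sumstats_chr}  # compare as strings, e.g. 17 -> "17"
    return sorted(labels & contigs), sorted(labels - contigs)
```

Assuming the sumstats table is exposed as a pandas DataFrame at `mysumstats.data`, a call such as `compare_chr_labels(mysumstats.data["CHR"].unique(), fasta_contig_names("hg38.fa"))` would show which labels have no FASTA record under the same name. If every label ends up in the missing list, the naming convention is a likely suspect, though other causes are not ruled out.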
