LDSC fails after harmonization #213

@ampregnall

Hello! Thank you again for creating and maintaining this package. I am running into a potential bug: if I call harmonize() and then run LDSC in the same script to account for test statistic inflation, I get an error. My code is as follows:

sumstats.harmonize(
    ref_seq=args.fasta,
    ref_rsid_vcf=args.dbsnp,
    ref_infer=args.popvcf,  # Ancestry-specific; logic handled by Nextflow
    ref_alt_freq="AF",
    threads=args.threads,  # Threads passed from the Nextflow process
    sweep_mode=True,
)

# Perform LDSC correction
sumstats_hapmap3 = sumstats.filter_hapmap3(inplace=False)
sumstats_hapmap3.estimate_h2_by_ldsc(ref_ld = args.ldsc, w_ld = args.ldsc)

if np.float64(sumstats_hapmap3.ldsc_h2['Intercept'][0]) > 1:
    # Perform correction

The error traceback is:

Traceback (most recent call last):
  File "ampregnall/tools/nf-meta-gwas/bin/munge_sumstats.py", line 43, in <module>
    sumstats_hapmap3.estimate_h2_by_ldsc(ref_ld = args.ldsc, w_ld = args.ldsc)
  File "/opt/conda/lib/python3.12/site-packages/gwaslab/g_Sumstats.py", line 1544, in estimate_h2_by_ldsc
    self.ldsc_h2, self.ldsc_h2_results = _estimate_h2_by_ldsc(insumstats=insumstats,
  File "/opt/conda/lib/python3.12/site-packages/gwaslab/qc/qc_decorator.py", line 241, in wrapper
    result = func(*args, **kwargs)
  File "/opt/conda/lib/python3.12/site-packages/gwaslab/util/util_ex_ldsc.py", line 404, in _estimate_h2_by_ldsc
    summary = estimate_h2(sumstats, args = default_kwargs, log = log)
  File "/opt/conda/lib/python3.12/site-packages/gwaslab/extension/ldsc/ldsc_sumstats.py", line 331, in estimate_h2
    M_annot, w_ld_cname, ref_ld_cnames, sumstats, novar_cols = _read_ld_sumstats(
  File "/opt/conda/lib/python3.12/site-packages/gwaslab/extension/ldsc/ldsc_sumstats.py", line 254, in _read_ld_sumstats
    sumstats = _merge_and_log(ref_ld, sumstats, 'reference panel LD', log)
  File "/opt/conda/lib/python3.12/site-packages/gwaslab/extension/ldsc/ldsc_sumstats.py", line 239, in _merge_and_log
    raise ValueError(msg.format(N=len(sumstats), F=noun))
ValueError: -After merging with reference panel LD, 0 SNPs remain.

However, if I save the summary statistics after the harmonize step and call a separate script that loads that file and performs LDSC, everything works. From a pipeline design perspective, I think separating the two processes is actually better, but something odd seems to be happening when they run together. I have attached my log file. Thanks!
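In case it helps with debugging: the ValueError says that no SNPs were left after the summary statistics were merged with the reference panel LD scores, which would be consistent with the two sides sharing no variant identifiers at that point. I have not confirmed that this is the cause. Below is a minimal sketch of the overlap check I would run right after harmonize(). The helper `id_overlap` and the toy identifiers are hypothetical and not part of gwaslab; in the real script the two inputs would presumably be the rsID column of the filtered sumstats and the SNP column of the LD score files.

```python
def id_overlap(sumstats_ids, ref_ids, n_examples=5):
    """Summarize how many variant identifiers two collections share.

    Hypothetical helper for debugging; not part of gwaslab or LDSC.
    """
    sumstats_set = set(sumstats_ids)
    ref_set = set(ref_ids)
    shared = sumstats_set & ref_set
    return {
        "n_sumstats": len(sumstats_set),
        "n_ref": len(ref_set),
        "n_shared": len(shared),
        # A few unmatched IDs from each side, to eyeball format differences
        "sumstats_only": sorted(sumstats_set - ref_set, key=str)[:n_examples],
        "ref_only": sorted(ref_set - sumstats_set, key=str)[:n_examples],
        # Python types seen on each side, since a type mismatch also breaks matching
        "sumstats_types": sorted({type(i).__name__ for i in sumstats_set}),
        "ref_types": sorted({type(i).__name__ for i in ref_set}),
    }


# Toy data only: positional IDs on one side, rsIDs on the other (made-up values)
report = id_overlap(["1:1000:A:G", "1:2000:C:T"], ["rs123", "rs456"])
print(report["n_shared"], report["sumstats_only"], report["ref_only"])
```

If `n_shared` is zero in the combined script but non-zero after saving and reloading the file, the unmatched examples should show whether the identifiers differ in format or type between the two runs.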

gwaslab-logs.txt

Labels: bug
