
Rename stochastic_ld_sample to sketch_samples in RSS interfaces #470

Merged
gaow merged 4 commits into StatFunGen:main from hsun3163:main on Apr 18, 2026


@hsun3163 hsun3163 commented Apr 17, 2026

Summary

This PR updates the RSS sketch-LD interface to use sketch_samples instead of stochastic_ld_sample, aligning pecotmr with
the current susieR::susie_rss() API.

It also fixes one sketch-LD loading bug: U_MIN / U_MAX from PLINK2 .afreq files were being read but dropped before
genotype rescaling, so the exact sketch inversion path could not run.

Motivation

There were two related issues:

  1. Interface mismatch:
  • pecotmr used stochastic_ld_sample
  • current susieR uses sketch_samples
  2. Sketch genotype restoration bug:
  • read_afreq() correctly reads U_MIN / U_MAX
  • but load_plink2_data() only merged id, alt_freq, and obs_ct
  • so u_min / u_max were discarded before stochastic genotype inversion
  • as a result, sketch genotype loading fell back to the warning path instead of restoring the original scale

Changes

API rename

  • rename stochastic_ld_sample to sketch_samples in:
    • susie_rss_wrapper()
    • susie_rss_pipeline()
    • rss_analysis_pipeline()
  • update internal forwarding to pass sketch_samples
  • regenerate the related .Rd files
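
To illustrate why the rename has to be applied at every forwarding layer, here is a minimal sketch using stand-in functions. `fit_stub()` and `wrapper_sketch()` are hypothetical names invented for this example; they are not the pecotmr wrappers or `susieR::susie_rss()`, and the matrix passed as `sketch_samples` is only a placeholder value.

```r
# Stand-in for the downstream fitting function; NOT susieR::susie_rss().
fit_stub <- function(z, R, sketch_samples = NULL) {
  list(n_z = length(z), used_sketch = !is.null(sketch_samples))
}

# Hypothetical wrapper: the renamed argument is forwarded by name.
wrapper_sketch <- function(z, R, sketch_samples = NULL) {
  fit_stub(z = z, R = R, sketch_samples = sketch_samples)
}

z <- c(1.2, -0.4, 2.1)
R <- diag(3)
X_placeholder <- matrix(0, nrow = 5, ncol = 3)  # placeholder input only

res <- wrapper_sketch(z, R, sketch_samples = X_placeholder)

# Calling with the old name fails, because it no longer matches a formal argument.
old_call <- tryCatch(
  wrapper_sketch(z, R, stochastic_ld_sample = X_placeholder),
  error = function(e) "error"
)
```

In this stub, a caller still using the old argument name gets an "unused argument" error rather than a silently ignored value, which is why call sites need to be updated along with the wrappers.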

Sketch LD loading fix

In load_plink2_data():

  • preserve u_min and u_max from .afreq when merging allele-frequency metadata into variant_info

Before:

variant_info <- merge(variant_info, afreq[, c("id", "alt_freq", "obs_ct")],
by = "id", all.x = TRUE, sort = FALSE)

After:

afreq_cols <- intersect(c("id", "alt_freq", "obs_ct", "u_min", "u_max"), colnames(afreq))
variant_info <- merge(variant_info, afreq[, afreq_cols, drop = FALSE],
by = "id", all.x = TRUE, sort = FALSE)

Why this matters:

  • downstream code already checks for u_min / u_max
  • without carrying those columns forward, the exact inversion branch is unreachable
  • with this fix, stochastic sketch genotypes can be restored from min-max scaled PLINK2 storage as intended
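
The effect of the column selection can be reproduced on toy data. The sketch below uses made-up `variant_info` and `afreq` tables (the values and the `pos` column are illustrative assumptions, not real pecotmr data) to show that the fixed column list drops `u_min` / `u_max`, while the `intersect()` version keeps them when present and still works when they are absent.

```r
# Toy stand-ins for the two tables; all values are made up for illustration.
variant_info <- data.frame(id = c("v1", "v2", "v3"), pos = c(100L, 200L, 300L))
afreq <- data.frame(
  id = c("v1", "v2", "v3"),
  alt_freq = c(0.10, 0.25, 0.40),
  obs_ct = c(200L, 200L, 198L),
  u_min = c(-1.2, -0.8, -1.5),
  u_max = c(1.1, 0.9, 1.4)
)

# Old behaviour: a fixed column list silently drops u_min / u_max.
before <- merge(variant_info, afreq[, c("id", "alt_freq", "obs_ct")],
  by = "id", all.x = TRUE, sort = FALSE)

# New behaviour: keep whichever of the wanted columns are present.
wanted <- c("id", "alt_freq", "obs_ct", "u_min", "u_max")
afreq_cols <- intersect(wanted, colnames(afreq))
after <- merge(variant_info, afreq[, afreq_cols, drop = FALSE],
  by = "id", all.x = TRUE, sort = FALSE)

# An .afreq table without the sketch columns still merges cleanly.
plain <- afreq[, c("id", "alt_freq", "obs_ct")]
plain_cols <- intersect(wanted, colnames(plain))
after_plain <- merge(variant_info, plain[, plain_cols, drop = FALSE],
  by = "id", all.x = TRUE, sort = FALSE)
```

Using `intersect()` rather than a hard-coded five-column list means ordinary `.afreq` files, which lack the sketch columns, do not trigger a subscript error.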

Impact

  • no intended methodological change
  • interface naming is now consistent with susieR
  • sketch-LD inversion now uses U_MIN / U_MAX when available, instead of unnecessarily falling back
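
For readers unfamiliar with min-max inversion, the following is a rough sketch of the idea, not pecotmr's actual implementation. `restore_sketch_scale()` is a hypothetical helper, and the assumption that stored dosages lie in [0, 2] and map linearly onto [u_min, u_max] is ours; the real storage convention may differ.

```r
# Hypothetical helper, not pecotmr's implementation.
# ASSUMPTION: stored dosages lie in [0, 2] and map linearly onto [u_min, u_max].
restore_sketch_scale <- function(dosage, u_min = NULL, u_max = NULL) {
  if (is.null(u_min) || is.null(u_max)) {
    # Fallback path: without the bounds, the original scale cannot be recovered.
    warning("u_min / u_max not available; returning stored dosages unchanged")
    return(dosage)
  }
  # Per variant (column) j: x = u_min[j] + (d / 2) * (u_max[j] - u_min[j])
  scaled <- sweep(dosage / 2, 2, u_max - u_min, `*`)
  sweep(scaled, 2, u_min, `+`)
}

dosage <- matrix(c(0, 1, 2, 2, 0, 1), nrow = 3)  # 3 samples x 2 variants
restored <- restore_sketch_scale(dosage, u_min = c(-1, 0), u_max = c(1, 4))

# Without the bounds, the helper warns and returns the input unchanged.
fallback <- suppressWarnings(restore_sketch_scale(dosage))
```

The helper only reaches the rescaling branch when both bounds are supplied, which mirrors why dropping the columns during the merge forced the warning path.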

Notes

This PR combines:

  • an API-alignment change (stochastic_ld_sample -> sketch_samples)
  • a small bug fix in sketch genotype restoration

@gaow gaow merged commit 47035cb into StatFunGen:main Apr 18, 2026
4 of 5 checks passed