
Fix OTTERS lassosum selector parity #482

Open
hsun3163 wants to merge 5 commits into StatFunGen:main from hsun3163:fix/otters-lassosum-selector


hsun3163 commented Apr 23, 2026

Summary

This PR fixes the OTTERS lassosum regression by replacing the default OTTERS selector with the LD-quadratic pseudovalidation score:

score(beta) = (c^T beta) / sqrt(beta^T R beta)

where:

  • c is the aligned summary-statistics correlation vector
  • R is the supplied LD correlation matrix
  • beta is one candidate on the lassosum (s, lambda) path
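A minimal Python sketch of this score, for illustration only (the package itself is written in R). The function name `ld_quadratic_score`, the list-of-rows matrix layout, and returning `-inf` for an all-zero candidate are assumptions, not the package API. Because numerator and denominator both scale linearly in beta, the score is unchanged by positive rescaling of a candidate: it ranks candidates by direction, not magnitude.

```python
import math

def ld_quadratic_score(c, R, beta):
    """Score (c^T beta) / sqrt(beta^T R beta) for one candidate beta.

    c    : per-variant summary-statistic correlations, aligned to R
    R    : LD correlation matrix as a list of rows
    beta : one candidate weight vector from the (s, lambda) path
    """
    p = len(beta)
    num = sum(ci * bi for ci, bi in zip(c, beta))
    # Quadratic form beta^T R beta: variance of the predicted score under LD R
    quad = sum(beta[i] * R[i][j] * beta[j] for i in range(p) for j in range(p))
    if quad <= 0:
        # Assumption: an all-zero candidate has no defined score, so never prefer it
        return float("-inf")
    return num / math.sqrt(quad)

c = [0.3, 0.1]
R = [[1.0, 0.5], [0.5, 1.0]]
print(ld_quadratic_score(c, R, [1.0, 0.0]))  # 0.3
print(ld_quadratic_score(c, R, [2.0, 0.0]))  # 0.3 (rescaling does not change the score)
```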

This also removes the earlier genotype-format-specific selector patch. The min(fbeta) selector is kept only as an explicit debug option.

Root Cause

Old OTTERS did not select lassosum models by min(fbeta). It fit the beta path and then used lassosum pseudovalidation to choose the final (s, lambda).

The refactor changed that selector to min(fbeta), and the OTTERS wrapper also double-scaled the lassosum input before it reached the low-level solver.

Fixture 206 isolates the selector bug cleanly:

  • old saved weights vs. old direct published-lassosum weights: Pearson 1.0, 0 opposite-sign variants
  • corrected scaling + min(fbeta): Pearson ≈ 0.360, 1309 opposite-sign variants
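The two agreement metrics in this list can be computed with a small helper such as the following Python sketch. The helper names are hypothetical, and the PR does not say how its fixture comparison counts signs, so treating a zero weight on either side as "not opposite" is an assumption.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length weight vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def opposite_sign_count(w1, w2):
    """Count variants whose weights are both non-zero and have opposite signs.

    Assumption: a zero weight on either side is not counted as opposite.
    """
    return sum(1 for a, b in zip(w1, w2) if a * b < 0)

print(opposite_sign_count([1.0, -2.0, 0.0, 3.0], [1.0, 2.0, -1.0, -3.0]))  # 2
```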

Published lassosum selected s = 0.2, lambda = 1e-4, while min(fbeta) selected s = 1, lambda = 1e-4 on the same grid. This is not a grid-definition problem; it is a selector regression.

Mathematical Rationale

Old pseudovalidation can be written as:

scaled_beta = beta / sd
pred = X * scaled_beta
score = (c^T beta) / sqrt(Var(pred))

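If the reference matrix columns are centered and divided by the same per-variant scale sd, then Var(pred) = beta^T R beta, where R is the correlation matrix of those standardized columns, and the score becomes: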

score(beta) = (c^T beta) / sqrt(beta^T R beta)

So the selector can be evaluated directly from the summary-statistics correlations and the LD matrix, without explicit access to genotypes.
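The step from Var(pred) to the quadratic form can be written out as follows. This assumes X is the n × p reference genotype matrix, \bar{x} its vector of column means, sd_j the per-variant sample standard deviations, and that Var and R use the same n − 1 denominator. With mismatched denominators the two scores would differ only by a constant factor, which leaves the selected candidate unchanged.

```latex
\begin{aligned}
Z &= (X - \mathbf{1}\bar{x}^{\top})\,D^{-1}, \qquad D = \operatorname{diag}(sd_1,\dots,sd_p), \qquad R = \tfrac{1}{n-1}\,Z^{\top} Z \\
\mathrm{pred} &= X D^{-1}\beta = Z\beta + \mathbf{1}\,\bar{x}^{\top} D^{-1}\beta \\
\operatorname{Var}(\mathrm{pred}) &= \operatorname{Var}(Z\beta) = \tfrac{1}{n-1}\,\beta^{\top} Z^{\top} Z\,\beta = \beta^{\top} R\,\beta \\
\mathrm{score}(\beta) &= \frac{c^{\top}\beta}{\sqrt{\operatorname{Var}(\mathrm{pred})}} = \frac{c^{\top}\beta}{\sqrt{\beta^{\top} R\,\beta}}
\end{aligned}
```

The second line uses only that the constant shift \mathbf{1}\,\bar{x}^{\top} D^{-1}\beta does not change a variance, and the third uses that Z\beta has mean zero because every column of Z is centered.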

Validation

PLINK1 source: genotype matrix vs LD-quadratic

The LD-quadratic score matches PLINK1 genotype pseudovalidation essentially exactly.

  • Fixture 161:
    • PLINK1 genotype best: soft_lambda=0.041050213
    • PLINK1 LD-quadratic best: soft_lambda=0.041050213
    • Pearson 0.9999999
    • same best candidate TRUE
  • Fixture 206:
    • PLINK1 genotype best: soft_lambda=0.029906976
    • PLINK1 LD-quadratic best: soft_lambda=0.029906976
    • Pearson 1.0000000
    • same best candidate TRUE

This validates the selector formula itself.

Sketch source: sample matrix vs LD-quadratic

For the sketch source, the sample-matrix pseudovalidation and the LD-quadratic score are the same numeric object once both are built from the same restored sketch matrix and the same column standardization.

  • Fixture 161:
    • sketch sample-matrix best: soft_lambda=0.021788613
    • sketch LD-quadratic best: soft_lambda=0.021788613
    • Pearson 1.0
    • max absolute difference < 1e-15
    • same best candidate TRUE
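Comparing two selectors over the same ordered candidate path reduces to a few numbers, such as the largest per-candidate score difference and whether both pick the same best candidate. The Python sketch below uses a hypothetical function name and relies on Python's `max()` returning the first maximal element, which mirrors first-max tie behavior.

```python
def compare_selector_scores(scores_a, scores_b):
    """Compare two selector score vectors evaluated on the same ordered candidates."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both score vectors must cover the same candidates")
    idx = range(len(scores_a))
    # max() returns the first maximal index, so ties resolve to the earliest candidate
    best_a = max(idx, key=scores_a.__getitem__)
    best_b = max(idx, key=scores_b.__getitem__)
    return {
        "max_abs_diff": max(abs(a - b) for a, b in zip(scores_a, scores_b)),
        "best_a": best_a,
        "best_b": best_b,
        "same_best": best_a == best_b,
    }

print(compare_selector_scores([0.1, 0.5, 0.5], [0.1, 0.5, 0.4])["same_best"])  # True
```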

So the remaining mismatch is not between sample-matrix pseudovalidation and quadratic LD scoring. It is between the current sketch-derived standardized LD path and the PLINK1/genotype-backed standardized LD path.
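The identity can be checked numerically on a toy example. If R is built from the same standardized matrix that the sample-matrix score uses, the two scores agree to floating-point precision. The Python sketch below uses a random 0/1/2 genotype matrix and made-up values for c and the candidate path. All helper names are hypothetical and none of this is the package's code. Swapping in an R from a different source (for example, a different sample or a sketch-derived matrix) breaks the equality, which is the kind of residual mismatch described above.

```python
import math
import random

def standardize_columns(X):
    """Center each column and divide by its sample SD (n - 1 denominator)."""
    n, p = len(X), len(X[0])
    sds = []
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        sds.append(sd)
        for i in range(n):
            Z[i][j] = (col[i] - mean) / sd
    return Z, sds

def ld_from_standardized(Z):
    """R = Z^T Z / (n - 1): LD correlation matrix of the standardized columns."""
    n, p = len(Z), len(Z[0])
    return [[sum(Z[k][i] * Z[k][j] for k in range(n)) / (n - 1) for j in range(p)]
            for i in range(p)]

def sample_matrix_score(X, sds, c, beta):
    """Old-style score: predict with beta / sd, divide c^T beta by the SD of pred."""
    n = len(X)
    scaled = [b / s for b, s in zip(beta, sds)]
    pred = [sum(x * w for x, w in zip(row, scaled)) for row in X]
    mean = sum(pred) / n
    var = sum((v - mean) ** 2 for v in pred) / (n - 1)
    return sum(ci * bi for ci, bi in zip(c, beta)) / math.sqrt(var)

def ld_quadratic_score(c, R, beta):
    """LD-quadratic score (c^T beta) / sqrt(beta^T R beta)."""
    p = len(beta)
    quad = sum(beta[i] * R[i][j] * beta[j] for i in range(p) for j in range(p))
    return sum(ci * bi for ci, bi in zip(c, beta)) / math.sqrt(quad)

# Toy inputs, made up for illustration only
random.seed(1)
X = [[float(random.randint(0, 2)) for _ in range(4)] for _ in range(50)]
c = [0.20, -0.05, 0.10, 0.00]
path = [[0.5, 0.0, 0.0, 0.0], [0.4, -0.1, 0.2, 0.0], [0.3, -0.2, 0.3, 0.1]]

Z, sds = standardize_columns(X)
R = ld_from_standardized(Z)  # R comes from the SAME standardized matrix
sample_scores = [sample_matrix_score(X, sds, c, b) for b in path]
quad_scores = [ld_quadratic_score(c, R, b) for b in path]
max_diff = max(abs(a - q) for a, q in zip(sample_scores, quad_scores))
print(max_diff)  # on the order of floating-point rounding error
```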

What This PR Changes

R/regularized_regression.R

  • fixes the OTTERS lassosum scaling contract so correlation input is only converted once before the low-level solver
  • makes lassosum_rss_weights() use ld_quadratic by default
  • keeps min(fbeta) only as an explicit debug option
  • preserves first-max tie-breaking: when several candidates share the maximum selector score, the first one in path order is chosen
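A Python sketch of the selector contract described in this list: LD-quadratic scoring by default, min(fbeta) only when explicitly requested, and first-max tie-breaking. The function name, argument names, and the treatment of an all-zero candidate are assumptions for illustration; this is not the signature of lassosum_rss_weights().

```python
import math

def select_candidate(candidates, c, R, method="ld_quadratic", fbeta=None):
    """Return the index of the selected (s, lambda) candidate (hypothetical helper).

    candidates : list of beta vectors, in path order
    method     : "ld_quadratic" (default) or "min_fbeta" (debug only)
    fbeta      : objective value per candidate, required for "min_fbeta"
    """
    if method == "min_fbeta":
        if fbeta is None or len(fbeta) != len(candidates):
            raise ValueError("min_fbeta needs one fbeta value per candidate")
        # min() keeps the first minimum on ties
        return min(range(len(fbeta)), key=fbeta.__getitem__)
    if method != "ld_quadratic":
        raise ValueError(f"unknown selector: {method}")
    scores = []
    for beta in candidates:
        p = len(beta)
        num = sum(ci * bi for ci, bi in zip(c, beta))
        quad = sum(beta[i] * R[i][j] * beta[j] for i in range(p) for j in range(p))
        # Assumption: an all-zero candidate has no defined score and is never preferred
        scores.append(num / math.sqrt(quad) if quad > 0 else float("-inf"))
    # max() returns the first index among equal scores (first-max tie behavior)
    return max(range(len(scores)), key=scores.__getitem__)

c = [0.3, 0.1]
R = [[1.0, 0.5], [0.5, 1.0]]
cands = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [1.0, 1.0]]
print(select_candidate(cands, c, R))  # 1: candidates 1 and 2 tie, the first wins
```

Candidates 1 and 2 point in the same direction, so they tie exactly under the LD-quadratic score, and the tie resolves to the earlier one.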

R/otters.R

  • passes correlation-scale statistics into lassosum explicitly via stat$cor and stat$z
  • removes the temporary genotype-source and variant-metadata plumbing that was only needed for the earlier compatibility patch

gaow closed this Apr 24, 2026
gaow reopened this Apr 24, 2026
danielnachun force-pushed the fix/otters-lassosum-selector branch from 9f378b1 to e01db04 on April 24, 2026 20:53