Releases: Fu-Yilei/methphaser

MethPhaser v0.4.0

17 Feb 06:48

MethPhaser v0.4.0 Pre-release

Summary

This release introduces evidence-based parental-origin assignment for phaseblocks using directional imprinting regions and haplotype-specific methylation evidence. It also packages a built-in, genome-wide directional imprinting database.


What Changed

Added

  • Built-in directional imprinting BED database:
    • imprinting_regions.hg38.bed
  • Provenance/source notes for the database:
    • imprinting_regions.hg38.SOURCES.md
  • Reproducible database builder:
    • build_imprinting_database.py
  • Packaged BED resources included in the Python distribution.
  • New parent-assignment engine:
    • imprinting.py

Updated

  • pipeline.py now passes the merged BAM to the parent-assignment stage.
  • Documentation and architecture notes revised.
  • Tests added/updated for:
    • Imprinting logic
    • Integration-level output validation

Parent Assignment Logic (New)

Previous Behavior

Phaseblocks overlapping imprinting regions were assigned by a fixed rule:

  • H1 = father
  • H2 = mother

No methylation evidence was used to determine directionality.


New Evidence-Based Behavior

Parent assignment now proceeds as follows:

  1. Identify directional imprinting regions
    Determine maternal- or paternal-methylated regions overlapping each phaseblock.

  2. Compute haplotype methylation means
    For each overlapping directional region:

    • Extract haplotype-specific methylation from BAM MM/ML tags
    • Use HP-tagged reads
    • Compute mean methylation for H1 and H2

  3. Convert region to paternal-haplotype vote
    Based on expected imprinting direction:

    • Maternal-methylated region
      → Lower-methylation haplotype is paternal

    • Paternal-methylated region
      → Higher-methylation haplotype is paternal

  4. Aggregate weighted votes
    Combine votes across all directional regions overlapping the phaseblock.

  5. Make assignment decision

    • Assign parent only if votes are decisive.
    • Otherwise, leave the assignment as random (no evidence-based call is made).
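
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code in imprinting.py: the names RegionEvidence, haplotype_means, region_vote, and assign_paternal_haplotype, the per-region weight, and the min_delta and min_margin cutoffs are all assumptions made for the example. The only external fact used is the SAM convention that an ML value N in 0-255 covers the probability bin N/256 to (N+1)/256.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


def ml_to_probability(ml_value: int) -> float:
    """Map an ML tag byte (0-255) to the midpoint of its probability bin."""
    return (ml_value + 0.5) / 256.0


def haplotype_means(calls):
    """calls: iterable of (hp, ml_value) pairs taken from HP-tagged reads.

    Returns (h1_mean, h2_mean); a haplotype with no calls yields None.
    """
    by_hap = {1: [], 2: []}
    for hp, ml_value in calls:
        if hp in by_hap:
            by_hap[hp].append(ml_to_probability(ml_value))
    return tuple(mean(v) if v else None for v in (by_hap[1], by_hap[2]))


@dataclass
class RegionEvidence:
    direction: str       # "maternal" or "paternal": which parental allele is methylated
    h1_mean: float
    h2_mean: float
    weight: float = 1.0  # hypothetical weight, e.g. derived from read support


def region_vote(region: RegionEvidence, min_delta: float = 0.2) -> Optional[int]:
    """Return the haplotype (1 or 2) this region votes for as paternal."""
    delta = region.h1_mean - region.h2_mean
    if abs(delta) < min_delta:  # assumed cutoff: difference too small to be decisive
        return None
    higher = 1 if delta > 0 else 2
    lower = 3 - higher
    if region.direction == "paternal":  # paternal allele methylated
        return higher
    if region.direction == "maternal":  # maternal allele methylated
        return lower
    return None  # "unknown" regions cast no vote


def assign_paternal_haplotype(regions, min_margin: float = 1.0):
    """Sum weighted votes; return (paternal_haplotype or None, weights, margin)."""
    weights = {1: 0.0, 2: 0.0}
    for region in regions:
        vote = region_vote(region)
        if vote is not None:
            weights[vote] += region.weight
    margin = abs(weights[1] - weights[2])
    if margin < min_margin:  # assumed definition of "decisive"
        return None, weights, margin
    return (1 if weights[1] > weights[2] else 2), weights, margin
```

A result of None corresponds to the non-decisive case, where no evidence-based call is made. In a real run the (hp, ml_value) pairs would come from reads in the merged BAM; reading them is left out here to keep the sketch dependency-free.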

Output Changes

phaseblock_parent_assignment.tsv now includes evidence-oriented fields:

Core Assignment Fields

  • paternal_haplotype
  • h1_parent
  • h2_parent
  • assignment_basis

Evidence Metrics

  • directional_region_count
  • decisive_region_count
  • h1_vote_weight
  • h2_vote_weight
  • vote_margin
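
As a rough illustration of how the fields listed so far might be consumed downstream, the sketch below parses a tab-separated table with the standard csv module. The column names come from this release; the sample rows, the values shown for paternal_haplotype and assignment_basis, and the helper name load_assignments are hypothetical, and the real file likely carries additional columns (for example phaseblock coordinates).

```python
import csv
import io

INT_FIELDS = ("directional_region_count", "decisive_region_count")
FLOAT_FIELDS = ("h1_vote_weight", "h2_vote_weight", "vote_margin")

# Hypothetical sample content; real values and extra columns may differ.
SAMPLE_TSV = (
    "paternal_haplotype\th1_parent\th2_parent\tassignment_basis\t"
    "directional_region_count\tdecisive_region_count\t"
    "h1_vote_weight\th2_vote_weight\tvote_margin\n"
    "H2\tmother\tfather\tevidence\t3\t2\t0.0\t2.0\t2.0\n"
    "NA\tfather\tmother\trandom\t1\t0\t0.0\t0.0\t0.0\n"
)


def load_assignments(handle):
    """Read assignment rows and convert the numeric evidence fields."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for name in INT_FIELDS:
            row[name] = int(row[name])
        for name in FLOAT_FIELDS:
            row[name] = float(row[name])
        rows.append(row)
    return rows


rows = load_assignments(io.StringIO(SAMPLE_TSV))
# Keep only phaseblocks backed by at least one decisive region.
supported = [r for r in rows if r["decisive_region_count"] > 0]
```

For a file on disk, the same function can be called with open("phaseblock_parent_assignment.tsv", newline="") in place of the in-memory buffer.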

Overlap Metadata

  • overlap_bp
  • Source tracks/databases
  • Region IDs
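
The overlap_bp value can be pictured with a small interval calculation. The sketch below assumes BED-style half-open coordinates (0-based start, exclusive end); the Region tuple and the function names are hypothetical and are not taken from the MethPhaser source.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive
    region_id: str


def overlap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bases shared by two half-open intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def regions_overlapping(
    chrom: str, block_start: int, block_end: int, regions: List[Region]
) -> List[Tuple[str, int]]:
    """Return (region_id, overlap_bp) for regions overlapping a phaseblock."""
    hits = []
    for region in regions:
        if region.chrom != chrom:
            continue
        bp = overlap_bp(block_start, block_end, region.start, region.end)
        if bp > 0:
            hits.append((region.region_id, bp))
    return hits
```

A linear scan is enough for a database of a few thousand regions; an interval index would only matter at much larger scale.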

Database Directionality Notes

Directional labels in the bundled database are inferred from:

  • Gamete methylation means (oocyte vs sperm)
  • Configured methylation thresholds

Regions without strong directional signal are labeled:

  • unknown

These regions do not drive parent-assignment decisions.
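
One way to picture the labeling rule is a two-threshold comparison of gamete methylation means. The sketch below is an assumption-laden illustration: the function name classify_direction and the 0.75 / 0.25 cutoffs are hypothetical and are not the values configured in build_imprinting_database.py.

```python
def classify_direction(
    oocyte_mean: float,
    sperm_mean: float,
    high: float = 0.75,  # hypothetical "methylated" cutoff
    low: float = 0.25,   # hypothetical "unmethylated" cutoff
) -> str:
    """Label a region from gamete methylation means (fractions in 0..1)."""
    if oocyte_mean >= high and sperm_mean <= low:
        return "maternal"  # methylated in oocyte, unmethylated in sperm
    if sperm_mean >= high and oocyte_mean <= low:
        return "paternal"  # methylated in sperm, unmethylated in oocyte
    return "unknown"       # no strong directional signal
```

Under this scheme, a region that is intermediate in either gamete falls through to unknown and therefore casts no vote.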


Current Dataset Counts

  • paternal: 1093
  • maternal: 481
  • unknown: 3927

Version

Package version bumped to 0.4.0 in:

  • pyproject.toml
  • __init__.py

Validation

  • Test suite: 15 passed
  • One multiprocessing warning was observed during the integration test run.