Releases: Fu-Yilei/methphaser
MethPhaser v0.4.0
Summary
This release introduces evidence-based parental-origin assignment for phaseblocks using directional imprinting regions and haplotype-specific methylation evidence. It also packages a built-in, genome-wide directional imprinting database.
What Changed
Added
- Built-in directional imprinting BED database: imprinting_regions.hg38.bed
- Provenance/source notes for the database: imprinting_regions.hg38.SOURCES.md
- Reproducible database builder: build_imprinting_database.py
- Packaged BED resources included in the Python distribution.
- New parent-assignment engine: imprinting.py
Updated
- pipeline.py updated to pass merged BAM into parent assignment stage.
- Documentation and architecture notes revised.
- Tests added/updated for:
  - Imprinting logic
  - Integration-level output validation
Parent Assignment Logic (New)
Previous Behavior
Imprinting-overlapping phaseblocks were effectively assigned with a fixed rule:
- H1 = father
- H2 = mother
No methylation evidence was used to determine directionality.
New Evidence-Based Behavior
Parent assignment now proceeds as follows:
- Identify directional imprinting regions
  Determine the maternal- or paternal-methylated regions overlapping each phaseblock.
- Compute haplotype methylation means
  For each overlapping directional region:
  - Extract haplotype-specific methylation from the BAM MM/ML tags
  - Use HP-tagged reads
  - Compute mean methylation for H1 and H2
- Convert each region to a paternal-haplotype vote
  Based on the expected imprinting direction:
  - Maternal-methylated region → the lower-methylation haplotype is paternal
  - Paternal-methylated region → the higher-methylation haplotype is paternal
- Aggregate weighted votes
  Combine votes across all directional regions overlapping the phaseblock.
- Make the assignment decision
  - Assign a parent only if the votes are decisive.
  - Otherwise, leave the assignment random.
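The "Compute haplotype methylation means" step relies on the SAM MM/ML base-modification tags. The sketch below is a minimal pure-Python illustration, not MethPhaser's implementation, of how per-call 5mC probabilities could be decoded from those tags and pooled into H1/H2 means. It assumes the read sequence is already in its original sequencing orientation (reverse-mapped reads would need SEQ reverse-complemented first), decodes only "+"-strand entries with single-letter modification codes, and treats the reads as already restricted to one region. The names decode_mm_ml and haplotype_means are hypothetical; in a real pipeline the tags and HP values would come from a BAM reader such as pysam.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, Optional


def decode_mm_ml(seq: str, mm: str, ml: list[int],
                 base: str = "C", code: str = "m") -> list[tuple[int, float]]:
    """Hypothetical helper: (read_offset, probability) pairs for one modification.

    Sketch assumptions: seq is in original sequencing orientation, only
    '+'-strand MM entries are decoded, and modification codes are single letters.
    """
    calls: list[tuple[int, float]] = []
    ml_idx = 0  # ML values are consumed in MM order across all entries
    for entry in filter(None, mm.split(";")):
        header, *deltas = entry.split(",")
        header = header.rstrip("?.")  # drop the optional mode flag
        mod_base, strand, codes = header[0], header[1], header[2:]
        # Offsets of every occurrence of the unmodified base in the read
        base_offsets = [i for i, b in enumerate(seq) if b == mod_base]
        k = -1
        for d in deltas:
            k += int(d) + 1  # skip d occurrences of the base, then take the next
            probs = ml[ml_idx:ml_idx + len(codes)]  # one ML value per code
            ml_idx += len(codes)
            if strand == "+" and mod_base == base and code in codes:
                raw = probs[codes.index(code)]
                # ML stores 0-255; use the midpoint of the probability bin
                calls.append((base_offsets[k], (raw + 0.5) / 256))
    return calls


def haplotype_means(
    reads: Iterable[tuple[Optional[int], str, str, list[int]]]
) -> dict[int, float]:
    """Hypothetical helper: pool 5mC call probabilities by HP tag (1 = H1, 2 = H2).

    Each read is (hp, seq, mm, ml); reads without HP 1 or 2 are skipped.
    """
    pooled: dict[int, list[float]] = defaultdict(list)
    for hp, seq, mm, ml in reads:
        if hp not in (1, 2):
            continue  # untagged read or unexpected HP value
        pooled[hp].extend(p for _, p in decode_mm_ml(seq, mm, ml))
    return {hp: mean(ps) for hp, ps in pooled.items() if ps}


if __name__ == "__main__":
    seq = "ACGTCCGC"  # C at read offsets 1, 4, 5, 7
    print(decode_mm_ml(seq, "C+m?,0,1;", [255, 0]))
    # [(1, 0.998046875), (5, 0.001953125)]
    reads = [(1, seq, "C+m?,0,1;", [255, 0]),
             (2, seq, "C+m?,0;", [0]),
             (None, seq, "C+m?,0;", [255])]  # no HP tag: ignored
    print(haplotype_means(reads))
    # {1: 0.5, 2: 0.001953125}
```

Pooling all calls per haplotype (rather than averaging per-read means) is one of several reasonable choices; the release notes do not say which one MethPhaser uses.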
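The vote conversion, aggregation, and decision steps can be sketched as follows. This is an illustrative sketch, not MethPhaser's code: the notes do not define how regions are weighted or what counts as decisive, so the per-region weight (absolute H1/H2 methylation difference), the min_region_diff and min_margin thresholds, the vote_margin formula, and the assignment_basis values are all hypothetical. The returned keys mirror the column names listed for phaseblock_parent_assignment.tsv in the Output Changes section.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RegionEvidence:
    """One directional imprinting region overlapping a phaseblock."""
    direction: str  # "maternal", "paternal" (methylated parental allele) or "unknown"
    h1_mean: float  # mean methylation on H1 reads in the region
    h2_mean: float  # mean methylation on H2 reads in the region


def assign_parent(regions: Iterable[RegionEvidence],
                  min_region_diff: float = 0.2,  # hypothetical threshold
                  min_margin: float = 0.5) -> dict:  # hypothetical threshold
    h1_weight = h2_weight = 0.0  # accumulated evidence that H1 / H2 is paternal
    directional = decisive = 0
    for r in regions:
        if r.direction not in ("maternal", "paternal"):
            continue  # unknown regions never drive the decision
        directional += 1
        diff = r.h1_mean - r.h2_mean
        if abs(diff) < min_region_diff:
            continue  # haplotypes too similar to cast a vote
        decisive += 1
        h1_higher = diff > 0
        # Paternal-methylated: higher-methylation haplotype is paternal.
        # Maternal-methylated: lower-methylation haplotype is paternal.
        h1_paternal = h1_higher if r.direction == "paternal" else not h1_higher
        if h1_paternal:
            h1_weight += abs(diff)
        else:
            h2_weight += abs(diff)

    total = h1_weight + h2_weight
    margin = abs(h1_weight - h2_weight) / total if total else 0.0
    if decisive and margin >= min_margin:
        paternal = "H1" if h1_weight > h2_weight else "H2"
        basis = "imprinting_votes"  # hypothetical label
        h1_parent = "father" if paternal == "H1" else "mother"
        h2_parent = "mother" if paternal == "H1" else "father"
    else:
        # Not decisive: the existing random assignment is kept by the caller
        paternal, basis, h1_parent, h2_parent = None, "random", None, None
    return {
        "paternal_haplotype": paternal,
        "h1_parent": h1_parent,
        "h2_parent": h2_parent,
        "assignment_basis": basis,
        "directional_region_count": directional,
        "decisive_region_count": decisive,
        "h1_vote_weight": h1_weight,
        "h2_vote_weight": h2_weight,
        "vote_margin": margin,
    }


if __name__ == "__main__":
    block = [
        RegionEvidence("maternal", 0.10, 0.80),  # H1 lower -> H1 paternal
        RegionEvidence("paternal", 0.90, 0.20),  # H1 higher -> H1 paternal
        RegionEvidence("unknown", 0.00, 1.00),   # ignored
    ]
    result = assign_parent(block)
    print(result["paternal_haplotype"], result["decisive_region_count"])
    # H1 2
```

Requiring both a per-region difference and an overall margin means that a phaseblock whose regions disagree (for example, two maternal-methylated regions pointing at opposite haplotypes) falls back to random rather than being assigned on a coin flip of weights.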
Output Changes
phaseblock_parent_assignment.tsv now includes evidence-oriented fields:
Core Assignment Fields
- paternal_haplotype
- h1_parent
- h2_parent
- assignment_basis
Evidence Metrics
- directional_region_count
- decisive_region_count
- h1_vote_weight
- h2_vote_weight
- vote_margin
Overlap Metadata
- overlap_bp
- Source tracks/databases
- Region IDs
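The overlap_bp field, like the first parent-assignment step, depends on interval overlap between a phaseblock and BED records. A minimal sketch, assuming a five-column layout (chrom, start, end, region ID, direction) for the directional database; the actual column layout of imprinting_regions.hg38.bed is not described in these notes, and the helper names are hypothetical. BED coordinates are 0-based and half-open, so the overlap of [s1, e1) and [s2, e2) is max(0, min(e1, e2) - max(s1, s2)).

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int      # 0-based, inclusive (BED convention)
    end: int        # 0-based, exclusive
    region_id: str
    direction: str  # "maternal", "paternal" or "unknown"


def parse_bed_line(line: str) -> Region:
    # Hypothetical column layout: chrom, start, end, region ID, direction
    chrom, start, end, region_id, direction = line.rstrip("\n").split("\t")[:5]
    return Region(chrom, int(start), int(end), region_id, direction)


def overlap_bp(s1: int, e1: int, s2: int, e2: int) -> int:
    """Length of the intersection of two half-open intervals."""
    return max(0, min(e1, e2) - max(s1, s2))


def overlapping_regions(chrom: str, start: int, end: int,
                        regions: list[Region]) -> list[tuple[Region, int]]:
    """Return (region, overlap_bp) for every region overlapping a phaseblock."""
    hits = []
    for r in regions:
        if r.chrom != chrom:
            continue
        bp = overlap_bp(start, end, r.start, r.end)
        if bp > 0:
            hits.append((r, bp))
    return hits


if __name__ == "__main__":
    bed = [
        "chr11\t2000\t3000\tregion_A\tpaternal",
        "chr11\t2500\t4000\tregion_B\tmaternal",
        "chr15\t0\t100\tregion_C\tunknown",
    ]
    regions = [parse_bed_line(line) for line in bed]
    hits = overlapping_regions("chr11", 2800, 3500, regions)
    print([(r.region_id, bp) for r, bp in hits])
    # [('region_A', 200), ('region_B', 700)]
```

The linear scan keeps the sketch short; for thousands of phaseblocks against a genome-wide database, sorting regions per chromosome and using binary search or an interval tree would be the usual choice.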
Database Directionality Notes
Directional labels in the bundled database are inferred from:
- Gamete methylation means (oocyte vs sperm)
- Configured methylation thresholds
Regions without a strong directional signal are labeled unknown.
These regions do not drive parent-assignment decisions.
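A labeling rule of this kind could look roughly like the sketch below. The notes only state that labels come from oocyte vs sperm methylation means and configured thresholds; the threshold values (0.75 and 0.25) and the function name label_direction are hypothetical, and build_imprinting_database.py may apply a different rule.

```python
def label_direction(oocyte_mean: float, sperm_mean: float,
                    high: float = 0.75, low: float = 0.25) -> str:
    """Hypothetical rule: label a region by which germline methylates it.

    "maternal": methylated in oocyte, unmethylated in sperm.
    "paternal": methylated in sperm, unmethylated in oocyte.
    Anything else lacks a strong directional signal and is "unknown".
    The default thresholds are placeholders, not MethPhaser's configuration.
    """
    if oocyte_mean >= high and sperm_mean <= low:
        return "maternal"
    if sperm_mean >= high and oocyte_mean <= low:
        return "paternal"
    return "unknown"


if __name__ == "__main__":
    for oocyte, sperm in [(0.92, 0.08), (0.05, 0.88), (0.60, 0.40)]:
        print(oocyte, sperm, label_direction(oocyte, sperm))
```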
Current Dataset Counts
- paternal: 1093
- maternal: 481
- unknown: 3927
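Taken together, these counts describe 5,501 regions, of which 1,574 (about 28.6%) carry a directional label. Counts like these can be recomputed by tallying the label column of the BED file; the small sketch below assumes, hypothetically, that the direction label sits in the fifth column (index 4).

```python
from collections import Counter
from typing import Iterable


def count_directions(bed_lines: Iterable[str], column: int = 4) -> Counter:
    """Tally direction labels, skipping blank, comment, track, and browser lines.

    column=4 (fifth field) is an assumed layout, not a documented one.
    """
    counts: Counter = Counter()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        counts[line.rstrip("\n").split("\t")[column]] += 1
    return counts


if __name__ == "__main__":
    example = ["chr1\t0\t10\tr1\tpaternal\n", "chr1\t20\t30\tr2\tunknown", "# comment"]
    print(count_directions(example))
```

Passing an open file handle instead of a list works the same way, since files iterate line by line.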
Version
Package version bumped to 0.4.0 in:
- pyproject.toml
- __init__.py
Validation
- Test suite: 15 passed
- One multiprocessing warning was observed in the integration-test execution context.