
Export TSV: Incorporate alignment #173

Merged: singjc merged 33 commits into PyProphet:master from singjc:master on Nov 27, 2025


@singjc (Contributor) commented Nov 26, 2025

This pull request adds support for exporting aligned features from OpenSWATH results when alignment data is available, allowing recovery of peaks with good alignment scores. It introduces new configuration options and CLI flags to control alignment-based feature recovery, and implements logic to merge aligned features into export results, ensuring only high-quality alignments and reference features are included.

Alignment feature export support:

  • Added use_alignment and max_alignment_pep options to ExportIOConfig and to the export_tsv and export_matrix CLI commands to enable and configure alignment-based feature recovery.
  • Implemented logic in parquet.py that detects alignment files and, when the option is enabled, merges into the export results those aligned features that pass the alignment PEP threshold and whose reference features pass the MS2 QVALUE threshold.
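
The two quality gates described above can be illustrated with a minimal pandas sketch. It is not the code from parquet.py: the function name, the column names (feature_id, reference_feature_id, alignment_pep, ms2_qvalue), the inclusive comparisons, and the threshold values are all assumptions made for illustration.

```python
import pandas as pd


def select_aligned_features(alignment, reference, max_alignment_pep, max_ms2_qvalue):
    """Return alignment rows that pass both quality gates.

    All column names are hypothetical stand-ins, not PyProphet's schema.
    """
    # Reference features that pass the MS2 q-value cutoff.
    good_refs = reference.loc[reference["ms2_qvalue"] <= max_ms2_qvalue, "feature_id"]
    # Keep rows with a low enough alignment PEP and a trusted reference feature.
    low_pep = alignment["alignment_pep"] <= max_alignment_pep
    trusted_ref = alignment["reference_feature_id"].isin(good_refs)
    return alignment.loc[low_pep & trusted_ref].reset_index(drop=True)


alignment = pd.DataFrame(
    {
        "feature_id": [10, 11, 12],
        "reference_feature_id": [1, 2, 1],
        "alignment_pep": [0.1, 0.2, 0.9],
    }
)
reference = pd.DataFrame({"feature_id": [1, 2], "ms2_qvalue": [0.001, 0.5]})

# Illustrative thresholds only; the real defaults are not stated in this PR text.
selected = select_aligned_features(
    alignment, reference, max_alignment_pep=0.5, max_ms2_qvalue=0.01
)
```

In this toy data only feature 10 is kept: feature 11 points at a reference that fails the q-value cutoff, and feature 12 exceeds the alignment PEP threshold.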

Data merging and integrity:

  • Ensured that aligned features not already present in base results are added, and merged alignment scores and reference info into both recovered and existing features. Also assigned alignment group IDs to reference features for correct grouping.
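
As a rough illustration of the merge behaviour described above, the following sketch adds aligned features that are missing from the base results, attaches alignment information to recovered and existing rows alike, and lets reference features inherit the group ID of the features aligned to them. The function, the column names, and the recovered flag are hypothetical; the actual implementation may be organised quite differently.

```python
import pandas as pd


def merge_aligned_features(base, aligned):
    """Merge aligned features into base results (hypothetical column names)."""
    # 1. Recover aligned features that the base export does not already contain.
    is_new = ~aligned["feature_id"].isin(base["feature_id"])
    recovered = aligned.loc[is_new, ["feature_id", "intensity"]]
    merged = pd.concat(
        [base.assign(recovered=False), recovered.assign(recovered=True)],
        ignore_index=True,
    )

    # 2. Lookup tables keyed by feature id, built from plain Python values.
    feature_ids = aligned["feature_id"].tolist()
    ref_ids = aligned["reference_feature_id"].tolist()
    group_ids = aligned["alignment_group_id"].tolist()
    pep_of = dict(zip(feature_ids, aligned["alignment_pep"].tolist()))
    ref_of = dict(zip(feature_ids, ref_ids))
    group_of = dict(zip(feature_ids, group_ids))
    # Reference features inherit the group id of the features aligned to them.
    group_of_ref = dict(zip(ref_ids, group_ids))

    # 3. Attach alignment info to recovered and existing rows alike.
    ids = merged["feature_id"].tolist()
    merged["alignment_pep"] = merged["feature_id"].map(pep_of)
    # Nullable Int64 keeps ids as integers even where a value is missing.
    merged["reference_feature_id"] = pd.array(
        [ref_of.get(i) for i in ids], dtype="Int64"
    )
    merged["alignment_group_id"] = pd.array(
        [group_of.get(i, group_of_ref.get(i)) for i in ids], dtype="Int64"
    )
    return merged


base = pd.DataFrame({"feature_id": [1, 2], "intensity": [100.0, 200.0]})
aligned = pd.DataFrame(
    {
        "feature_id": [2, 3],
        "intensity": [200.0, 50.0],
        "alignment_pep": [0.05, 0.1],
        "reference_feature_id": [1, 1],
        "alignment_group_id": [7, 7],
    }
)
merged = merge_aligned_features(base, aligned)
```

Here feature 3 is recovered as a new row, feature 2 keeps its base row and gains alignment columns, and feature 1 (the reference) receives group ID 7 so that all three rows group together.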

Codebase enhancements:

  • Added a new method _fetch_alignment_features in parquet.py to load and filter alignment data, including robust error handling and column mapping for integration.
  • Minor import adjustment to support new functionality.

Copilot AI and others added 30 commits October 24, 2025 20:53
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
…parquet handling

Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
… data

Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
…overy

Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
…by converting to Int64

Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
…olumn

- Added 'pep' column to the output of test_pyprophet_export.test_osw_analysis with split_parquet set to False.
- Updated output of test_pyprophet_export.test_osw_analysis with split_parquet set to True to reflect the addition of the 'pep' column.
…xport

Add SCORE_ALIGNMENT integration and MS2 PEP to export TSV/matrix methods with auto-detection and quality control
…up_id to reference features

- Cast REFERENCE_FEATURE_ID to BIGINT/INTEGER in SQL queries to prevent precision loss
- Add logic to assign alignment_group_id to reference features
- Applied fixes to split_parquet.py, parquet.py, and osw.py export modules

Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
The CAST operations in JOIN conditions prevented database indexes from being used,
causing ~50 minute performance regression. Solution: let database use native integer
types for fast joins, then cast result columns in pandas to preserve precision.

Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
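
One common way large integer IDs lose precision in pandas is a silent conversion to float64, which happens when an integer column gains missing values. Whether that is exactly what occurred in this PR is an assumption; the sketch below only demonstrates the general effect and how the nullable Int64 dtype avoids it.

```python
import pandas as pd

# Smallest positive integer that float64 cannot represent exactly.
big_id = 2**53 + 1

# Converting an integer column to float64 (as happens when it gains NaN)
# rounds the id to the nearest representable value.
as_float = pd.Series([big_id], dtype="int64").astype("float64")
lossy = int(as_float.iloc[0])

# The nullable Int64 dtype stores missing values without leaving integer space.
as_nullable = pd.Series([big_id, None], dtype="Int64")
exact = int(as_nullable.iloc[0])

print(lossy == big_id)  # False: the float round trip changed the id
print(exact == big_id)  # True: Int64 kept the id intact
```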
Added CAST(FEATURE.ID AS INTEGER) in SELECT clauses to ensure pandas reads
large feature IDs correctly. CAST in SELECT preserves precision without the
performance penalty of CAST in JOIN conditions.

Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
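
The query shape this commit describes can be sketched with the standard-library sqlite3 module: the join compares native integer columns, and the CAST appears only in the SELECT list. The FEATURE_ALIGNMENT table and the columns other than FEATURE.ID are hypothetical stand-ins, not the real OSW schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE FEATURE (ID INTEGER PRIMARY KEY, EXP_RT REAL);
    -- Hypothetical alignment table for illustration only.
    CREATE TABLE FEATURE_ALIGNMENT (
        ALIGNED_FEATURE_ID INTEGER,
        REFERENCE_FEATURE_ID INTEGER
    );
    """
)

big_id = 2**62 + 1  # far beyond the range float64 represents exactly
con.execute("INSERT INTO FEATURE VALUES (?, ?)", (big_id, 1234.5))
con.execute("INSERT INTO FEATURE_ALIGNMENT VALUES (?, ?)", (42, big_id))

# The JOIN condition uses the columns as stored, so the primary-key index on
# FEATURE.ID stays usable; the CAST is applied only to the selected value.
query = """
    SELECT CAST(FEATURE.ID AS INTEGER) AS feature_id,
           FEATURE.EXP_RT AS exp_rt
    FROM FEATURE_ALIGNMENT
    JOIN FEATURE ON FEATURE.ID = FEATURE_ALIGNMENT.REFERENCE_FEATURE_ID
"""
rows = con.execute(query).fetchall()
```

The returned feature_id is an exact Python integer equal to big_id, which shows that the join needs no CAST to match rows and that the CAST in the SELECT list does not alter the value.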
…up ID assignment

Added comprehensive documentation on:
- Precision preservation for large feature IDs using CAST in SELECT
- alignment_group_id assignment to both aligned and reference features
- Performance comparison table showing CAST placement impact
- Updated workflow diagrams to show new columns and processing steps

Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
…issue

Fix alignment reference feature ID precision loss and missing group assignment
…ment handling; improve precision for ID columns and streamline SQL queries.
@singjc singjc merged commit 3173ac3 into PyProphet:master Nov 27, 2025
4 of 5 checks passed

2 participants