Export TSV: Incorporate alignment #173
Merged
singjc merged 33 commits into PyProphet:master on Nov 27, 2025
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
…parquet handling Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
… data Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
…overy Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
…by converting to Int64 Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
…olumn
- Added 'pep' column to the output of test_pyprophet_export.test_osw_analysis with split_parquet set to False.
- Updated output of test_pyprophet_export.test_osw_analysis with split_parquet set to True to reflect the addition of the 'pep' column.
…xport Add SCORE_ALIGNMENT integration and MS2 PEP to export TSV/matrix methods with auto-detection and quality control
…up_id to reference features
- Cast REFERENCE_FEATURE_ID to BIGINT/INTEGER in SQL queries to prevent precision loss
- Add logic to assign alignment_group_id to reference features
- Applied fixes to split_parquet.py, parquet.py, and osw.py export modules
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
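The group assignment named in this commit can be sketched roughly as follows. This is an illustration only, not the PR's code: the function `assign_alignment_groups`, the `(aligned_feature_id, reference_feature_id)` tuple layout, and the rule "one group per distinct reference feature" are all assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple


def assign_alignment_groups(pairs: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Map every feature ID (aligned or reference) to a group ID.

    `pairs` holds (aligned_feature_id, reference_feature_id) tuples.
    Assumption for this sketch: one group per distinct reference feature.
    """
    group_of_reference: Dict[int, int] = {}
    group_of_feature: Dict[int, int] = {}
    for aligned_id, reference_id in pairs:
        # The first time a reference feature appears, open a new group for it.
        if reference_id not in group_of_reference:
            group_of_reference[reference_id] = len(group_of_reference)
        group_id = group_of_reference[reference_id]
        # The reference feature joins the group too, not only the aligned ones.
        group_of_feature[reference_id] = group_id
        group_of_feature[aligned_id] = group_id
    return group_of_feature


# Hypothetical IDs: features 101 and 102 align to 100, feature 201 aligns to 200.
pairs = [(101, 100), (102, 100), (201, 200)]
groups = assign_alignment_groups(pairs)
```

With this rule, a reference feature and every feature aligned to it share one group ID, which is the property the commit message describes.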
The CAST operations in JOIN conditions prevented database indexes from being used, causing a ~50-minute performance regression. Solution: let the database use native integer types for fast joins, then cast result columns in pandas to preserve precision. Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
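The effect of CAST placement on index use can be inspected with SQLite's `EXPLAIN QUERY PLAN`. The sketch below uses a toy in-memory schema: the `ALIGNMENT` table, its columns, and the sample rows are assumptions for illustration and are not taken from the PR. The exact plan text varies between SQLite versions, so the code only collects the plans for comparison.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- Toy schema (hypothetical); FEATURE.ID is indexed as the primary key.
    CREATE TABLE FEATURE (ID INTEGER PRIMARY KEY, RUN_ID INTEGER);
    CREATE TABLE ALIGNMENT (ALIGNED_FEATURE_ID INTEGER, REFERENCE_FEATURE_ID INTEGER);
    INSERT INTO FEATURE VALUES (1, 10), (2, 10), (3, 11);
    INSERT INTO ALIGNMENT VALUES (3, 1);
    """
)

# Join on the native integer columns: the planner can use the primary key.
NATIVE_JOIN = """
    SELECT A.ALIGNED_FEATURE_ID, F.RUN_ID
    FROM ALIGNMENT AS A
    JOIN FEATURE AS F ON F.ID = A.REFERENCE_FEATURE_ID
"""

# Same join with CAST on both sides: the condition is now an expression,
# so the index on FEATURE.ID no longer matches it directly.
CAST_JOIN = """
    SELECT A.ALIGNED_FEATURE_ID, F.RUN_ID
    FROM ALIGNMENT AS A
    JOIN FEATURE AS F
      ON CAST(F.ID AS INTEGER) = CAST(A.REFERENCE_FEATURE_ID AS INTEGER)
"""


def query_plan(sql: str) -> list:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]


native_plan = query_plan(NATIVE_JOIN)
cast_plan = query_plan(CAST_JOIN)
native_rows = con.execute(NATIVE_JOIN).fetchall()
cast_rows = con.execute(CAST_JOIN).fetchall()
```

Both queries return the same rows. Comparing `native_plan` with `cast_plan` shows whether the join is resolved with an indexed `SEARCH` step or falls back to table scans, which is the kind of difference the commit message attributes the regression to.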
Added CAST(FEATURE.ID AS INTEGER) in SELECT clauses to ensure pandas reads large feature IDs correctly. CAST in SELECT preserves precision without the performance penalty of CAST in JOIN conditions. Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
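A small sketch of why integer typing matters for large IDs. It uses only `sqlite3` and a made-up ID. The `CAST(... AS REAL)` query stands in for any path where an ID column ends up as a 64-bit float, for example a dataframe column promoted to float64. Whether that is the exact mechanism behind the PR's precision loss is an assumption.

```python
import sqlite3

# 2**53 + 1 is the smallest positive integer a 64-bit float cannot represent.
BIG_ID = 2**53 + 1  # hypothetical feature ID

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE FEATURE (ID INTEGER PRIMARY KEY)")
con.execute("INSERT INTO FEATURE VALUES (?)", (BIG_ID,))

# CAST in the SELECT clause keeps the value as an exact 64-bit integer.
(as_integer,) = con.execute("SELECT CAST(ID AS INTEGER) FROM FEATURE").fetchone()

# The same value read as a float is rounded to the nearest representable double.
(as_real,) = con.execute("SELECT CAST(ID AS REAL) FROM FEATURE").fetchone()

lost_precision = int(as_real) != BIG_ID
```

Here `as_integer` equals the stored ID exactly, while `as_real` comes back one lower, so two distinct feature IDs could collapse to the same value after a float round trip.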
…up ID assignment
Added comprehensive documentation on:
- Precision preservation for large feature IDs using CAST in SELECT
- alignment_group_id assignment to both aligned and reference features
- Performance comparison table showing CAST placement impact
- Updated workflow diagrams to show new columns and processing steps
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
…issue Fix alignment reference feature ID precision loss and missing group assignment
…ment handling; improve precision for ID columns and streamline SQL queries.
…and EXP_IM_RIGHTWIDTH columns
…ing and IM boundaries
…core column references
… from split Parquet files
…ibutions function
This pull request adds support for exporting aligned features from OpenSWATH results when alignment data is available, allowing recovery of peaks with good alignment scores. It introduces new configuration options and CLI flags to control alignment-based feature recovery, and implements logic to merge aligned features into export results, ensuring only high-quality alignments and reference features are included.
Alignment feature export support:
- Added `use_alignment` and `max_alignment_pep` options to `ExportIOConfig` and to the CLI commands (`export_tsv`, `export_matrix`) to enable and configure alignment-based feature recovery.
- Updated `parquet.py` to detect the presence of alignment files and, if enabled, merge into the export results the aligned features that pass the alignment PEP threshold and whose reference features pass the MS2 QVALUE threshold.
Data merging and integrity:
Codebase enhancements:
- Added `_fetch_alignment_features` in `parquet.py` to load and filter alignment data, including robust error handling and column mapping for integration.
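The recovery rule described above (keep an aligned feature only if its alignment PEP is low enough and its reference feature passes the MS2 q-value cut) can be sketched as a plain filter. Only `use_alignment` and `max_alignment_pep` are names from the PR. The `AlignedFeature` class, the `recover_aligned_features` function, the `max_reference_qvalue` parameter, the use of `<=` for both cuts, and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlignedFeature:
    """Hypothetical record for one aligned feature and its reference."""
    feature_id: int
    alignment_pep: float      # posterior error probability of the alignment
    reference_qvalue: float   # MS2 q-value of the reference feature


def recover_aligned_features(
    candidates: List[AlignedFeature],
    use_alignment: bool,
    max_alignment_pep: float,
    max_reference_qvalue: float,
) -> List[AlignedFeature]:
    """Return the aligned features that pass both quality cuts."""
    if not use_alignment:
        # Alignment-based recovery is disabled: nothing is recovered.
        return []
    return [
        c
        for c in candidates
        if c.alignment_pep <= max_alignment_pep
        and c.reference_qvalue <= max_reference_qvalue
    ]


candidates = [
    AlignedFeature(feature_id=1, alignment_pep=0.10, reference_qvalue=0.005),
    AlignedFeature(feature_id=2, alignment_pep=0.90, reference_qvalue=0.005),  # poor alignment
    AlignedFeature(feature_id=3, alignment_pep=0.10, reference_qvalue=0.200),  # poor reference
]
recovered = recover_aligned_features(
    candidates, use_alignment=True, max_alignment_pep=0.7, max_reference_qvalue=0.01
)
disabled = recover_aligned_features(
    candidates, use_alignment=False, max_alignment_pep=0.7, max_reference_qvalue=0.01
)
```

In this sketch only feature 1 is recovered, and turning `use_alignment` off yields no recovered features regardless of the thresholds.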