@singjc singjc commented Oct 29, 2025

This pull request adds an option that controls whether transition-level data is included in exported parquet files, and makes score column handling in the export process more flexible and complete. It also makes SQL query generation and table creation more robust across export scenarios.

The most important changes are:

New feature: Control inclusion of transition data in exports

  • Added an include_transition_data option to both the configuration (ExportIOConfig) and the CLI (export_parquet), so users can choose whether transition-level data is written to the exported parquet files. The export logic respects this option and skips the transition data export when it is disabled.
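
A minimal sketch of how such a flag could gate the export, for illustration only. ExportConfigSketch and plan_export_steps are hypothetical names, not the PyProphet classes or functions, and the default value of True is an assumption that the PR description does not confirm.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ExportConfigSketch:
    """Hypothetical stand-in for ExportIOConfig; only the new flag is modeled."""

    infile: str
    outfile: str
    # Assumed default: the PR description does not state the actual default.
    include_transition_data: bool = True


def plan_export_steps(config: ExportConfigSketch) -> list[str]:
    """Return the names of the export steps that would run for this config."""
    steps = ["precursor"]
    # Transition-level rows are exported only when the flag is enabled.
    if config.include_transition_data:
        steps.append("transition")
    return steps
```

With the flag disabled, plan_export_steps returns only the precursor step, which mirrors the "skip transition data export if disabled" behavior described above.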

Score column and SQL query improvements

  • Added dynamic detection and inclusion of transition-level score columns in export queries through the new _build_transition_score_columns_and_join helper. Transition score columns are included only if the relevant table exists, and NULL placeholder columns are added where needed so that precursor and transition exports keep a consistent schema.

  • Enhanced temporary table creation to dynamically include all relevant score columns (MS1, MS2, IPF, peptide, protein, and transition scores) with appropriate types, ensuring the exported schema matches the data.
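
The table-existence check and NULL placeholder idea from the bullets above can be sketched with the standard sqlite3 module. This is not the body of _build_transition_score_columns_and_join: the function names, the column subset, the SCORE_TRANSITION_ alias prefix, the T and ST table aliases, and the TRANSITION_ID join key are all assumptions made for illustration.

```python
import sqlite3


def table_exists(conn, name):
    """Check sqlite_master for a table with the given name."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def transition_score_select(conn, for_transition_rows):
    """Return (select_fragment, join_fragment) for transition score columns.

    Hypothetical helper: column names, aliases, and join keys are assumed.
    """
    columns = ["SCORE", "PVALUE", "QVALUE"]  # illustrative subset only
    aliases = [f"SCORE_TRANSITION_{c}" for c in columns]
    if not table_exists(conn, "SCORE_TRANSITION"):
        # No score table in this file: contribute nothing to the query.
        return "", ""
    if for_transition_rows:
        select = ", ".join(f"ST.{c} AS {a}" for c, a in zip(columns, aliases))
        join = "LEFT JOIN SCORE_TRANSITION AS ST ON ST.TRANSITION_ID = T.ID"
    else:
        # Precursor rows get NULL placeholders so both exports share a schema.
        select = ", ".join(f"NULL AS {a}" for a in aliases)
        join = ""
    return select, join
```

Returning empty fragments when the table is missing keeps the caller's query valid for files that were never scored at the transition level.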

Robustness and correctness fixes

  • Improved filtering of columns when preparing export column info to ensure only columns with valid types are included, preventing potential SQL errors.

  • Fixed SQL join logic in peptide/protein score table generation to include RUN_ID in the join condition, ensuring correct mapping of global and non-global scores.
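
To illustrate why the RUN_ID condition matters, here is a small self-contained sketch using an in-memory SQLite database. The FEATURES and PEPTIDE_SCORES tables and their columns are simplified, hypothetical stand-ins, not the actual OSW schema or the queries changed in this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE FEATURES (FEATURE_ID INTEGER, RUN_ID INTEGER, PEPTIDE_ID INTEGER);
    CREATE TABLE PEPTIDE_SCORES (RUN_ID INTEGER, PEPTIDE_ID INTEGER, QVALUE REAL);
    -- The same peptide is observed in two runs, with a run-specific score in each.
    INSERT INTO FEATURES VALUES (1, 10, 100), (2, 20, 100);
    INSERT INTO PEPTIDE_SCORES VALUES (10, 100, 0.01), (20, 100, 0.05);
    """
)


def joined_rows(conn, match_run_id):
    """Join features to peptide scores, optionally matching on RUN_ID as well."""
    condition = "S.PEPTIDE_ID = F.PEPTIDE_ID"
    if match_run_id:
        condition += " AND S.RUN_ID = F.RUN_ID"
    query = (
        "SELECT F.FEATURE_ID, S.QVALUE "
        "FROM FEATURES AS F JOIN PEPTIDE_SCORES AS S ON " + condition + " "
        "ORDER BY F.FEATURE_ID, S.QVALUE"
    )
    return conn.execute(query).fetchall()
```

In this toy data, joining on PEPTIDE_ID alone pairs every feature with the scores from both runs (four rows), while adding the RUN_ID condition yields exactly one score per feature.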

Copilot AI and others added 18 commits October 29, 2025 02:49
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
…quet export

…t matching RUN_ID

…n_data flag

Add SCORE_ table export and transition data control flag to parquet export
Added SCORE_IPF table handling in _build_score_column_selection_and_joins() method and related export functions to include IPF scores when exporting OSW files to parquet format.

Added test_parquet_export_with_ipf to verify that SCORE_IPF columns are correctly included when exporting OSW files with IPF scoring to parquet format.

Include SCORE_IPF table in OSW to parquet export
@singjc singjc merged commit ca64baf into PyProphet:master Oct 29, 2025
4 checks passed