[UPDATE] export parquet, include score tables into export if present #170
Merged
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
…quet export Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
…t matching RUN_ID Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
…n_data flag Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Add SCORE_ table export and transition data control flag to parquet export
Added SCORE_IPF table handling in _build_score_column_selection_and_joins() method and related export functions to include IPF scores when exporting OSW files to parquet format. Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Added test_parquet_export_with_ipf to verify that SCORE_IPF columns are correctly included when exporting OSW files with IPF scoring to parquet format. Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Include SCORE_IPF table in OSW to parquet export
This pull request introduces a new option to control whether transition-level data is included in the exported parquet files, and significantly improves the flexibility and completeness of score column handling in the export process. It also enhances the robustness of SQL query generation and table creation for various export scenarios.
The most important changes are:
New feature: Control inclusion of transition data in exports
Added the include_transition_data option to both the configuration (ExportIOConfig) and the CLI (export_parquet), allowing users to choose whether transition-level data is included in the exported parquet files. The export logic respects this option and skips the transition data export when it is disabled.
Score column and SQL query improvements
Added dynamic detection and inclusion of transition-level score columns in export queries, using the new _build_transition_score_columns_and_join helper. Transition score columns are included only if the relevant table exists, and appropriate NULL columns are added to keep the schema consistent between precursor and transition exports.
Enhanced temporary table creation to dynamically include all relevant score columns (MS1, MS2, IPF, peptide, protein, and transition scores) with appropriate types, so that the exported schema matches the data.
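For reviewers who want to see the idea in isolation, here is a minimal sketch of conditional score-column selection. It is not the code from this PR: the function name transition_score_sql, the column mapping, the table aliases (ST, T, F) and the join keys are all assumptions chosen for illustration. It only shows the general pattern of checking sqlite_master for the table, selecting real columns for transition rows, and emitting matching NULL placeholders for precursor rows so both exports share one schema.

```python
import sqlite3

# Hypothetical mapping from SCORE_TRANSITION columns to exported column names.
TRANSITION_SCORE_COLUMNS = {
    "SCORE": "SCORE_TRANSITION_SCORE",
    "PEP": "SCORE_TRANSITION_PEP",
}


def table_exists(con, name):
    """Return True if a table with this name exists in the SQLite file."""
    row = con.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def transition_score_sql(con, for_transition_rows):
    """Return (column_sql, join_sql) fragments for an export query.

    If the score table is absent, both fragments are empty. For precursor
    rows, NULL placeholders are returned so the schema matches the
    transition rows. Aliases T (transition) and F (feature) are assumed.
    """
    if not table_exists(con, "SCORE_TRANSITION"):
        return "", ""
    if for_transition_rows:
        cols = ", ".join(
            f"ST.{src} AS {dst}" for src, dst in TRANSITION_SCORE_COLUMNS.items()
        )
        join = (
            "LEFT JOIN SCORE_TRANSITION AS ST "
            "ON ST.TRANSITION_ID = T.ID AND ST.FEATURE_ID = F.ID"
        )
        return cols, join
    cols = ", ".join(f"NULL AS {dst}" for dst in TRANSITION_SCORE_COLUMNS.values())
    return cols, ""
```

Returning empty fragments when the table is missing lets the caller splice the result into a query unconditionally, which keeps the query-building code free of repeated existence checks.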
Robustness and correctness fixes
Improved filtering of columns when preparing export column info to ensure only columns with valid types are included, preventing potential SQL errors.
Fixed SQL join logic in peptide/protein score table generation to include RUN_ID in the join condition, ensuring correct mapping of global and non-global scores.
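To illustrate why the join condition matters, the sketch below uses a toy in-memory SQLite database. The table names FEATURE_TOY and SCORE_PEPTIDE_TOY and their columns are invented for this example and do not reproduce the real OSW schema or the query in this PR. It covers only run-specific scores: the same peptide is scored once per run, so joining on PEPTIDE_ID alone pairs every feature with the scores from every run.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE FEATURE_TOY (FEATURE_ID INTEGER, RUN_ID INTEGER, PEPTIDE_ID INTEGER);
    CREATE TABLE SCORE_PEPTIDE_TOY (CONTEXT TEXT, RUN_ID INTEGER, PEPTIDE_ID INTEGER, QVALUE REAL);
    -- One peptide observed in two runs, with a different run-specific score in each.
    INSERT INTO FEATURE_TOY VALUES (10, 1, 100), (20, 2, 100);
    INSERT INTO SCORE_PEPTIDE_TOY VALUES
        ('run-specific', 1, 100, 0.01),
        ('run-specific', 2, 100, 0.20);
    """
)

# Join on PEPTIDE_ID only: each feature also picks up the other run's score.
without_run_id = con.execute(
    """
    SELECT F.FEATURE_ID, F.RUN_ID, S.QVALUE
    FROM FEATURE_TOY AS F
    JOIN SCORE_PEPTIDE_TOY AS S ON S.PEPTIDE_ID = F.PEPTIDE_ID
    ORDER BY F.FEATURE_ID, S.RUN_ID
    """
).fetchall()

# Join on PEPTIDE_ID and RUN_ID: each feature gets only its own run's score.
with_run_id = con.execute(
    """
    SELECT F.FEATURE_ID, F.RUN_ID, S.QVALUE
    FROM FEATURE_TOY AS F
    JOIN SCORE_PEPTIDE_TOY AS S
      ON S.PEPTIDE_ID = F.PEPTIDE_ID AND S.RUN_ID = F.RUN_ID
    ORDER BY F.FEATURE_ID
    """
).fetchall()

print(len(without_run_id))  # 4 rows: scores duplicated across runs
print(with_run_id)          # [(10, 1, 0.01), (20, 2, 0.2)]
```

How global-context scores are keyed is not shown here, since that depends on how the real tables store RUN_ID for global rows.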