fix(maxQuant): Remove unmodified peptides from PTM dataset#116
fix(maxQuant): Remove unmodified peptides from PTM dataset#116tonywu1999 merged 4 commits intodevelfrom
Conversation
|
""" WalkthroughThe internal logic of the Changes
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Pull Request Overview
This PR refines the MaxQtoMSstatsPTMFormat converter to remove unmodified peptides from the PTM dataset when requested and to return early when protein evidence is supplied.
- Moves the default
MSstatsPTMformatassignment into branch-specific code and adds an early return for theevidence_protcase - Adds filtering of
msstatsptm_inputto drop peptides lacking the specifiedmod_idwhenuse_unmod_peptides = FALSE - Adjusts list construction for PTM/PROTEIN outputs based on the new branching logic
Comments suppressed due to low confidence (3)
R/converters.R:663
- The new conditional branch for filtering unmodified peptides lacks unit tests. Please add tests covering both
use_unmod_peptides = TRUEandFALSEscenarios to verify expected outputs.
if (use_unmod_peptides){
R/converters.R:659
- [nitpick] Quotation style for list names is inconsistent across branches (sometimes quoted, sometimes not). Standardize to unquoted identifiers (e.g., PTM = …) for clarity and consistency.
"PROTEIN" = msstats.abun)
R/converters.R:667
- Construction of
MSstatsPTMformatis duplicated in multiple branches. Consider extracting a helper or building the list dynamically to reduce repetition.
MSstatsPTMformat = list(PTM = msstatsptm_input,
| MSstatsPTMformat = list(PTM = msstatsptm_input, | ||
| PROTEIN = msstats.abun) | ||
| } else { | ||
| msstatsptm_input = msstatsptm_input[grepl(mod_id, msstatsptm_input$PeptideSequence),] |
There was a problem hiding this comment.
Using grepl without fixed = TRUE treats mod_id as a regex. If mod_id is literal, add fixed = TRUE to avoid unintended regex interpretation.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
inst/tinytest/test_converters.R (1)
8-15: Update function documentation to include the new parameter.The function signature was updated to include a
global_profilingparameter, but the documentation comment above doesn't mention this new parameter.#' Validate positive number of rows from converter output #' #' @author Anthony Wu #' #' @param msstats_ptm_input A list containing PTM and PROTEIN data tables +#' @param global_profiling Boolean indicating whether to validate PROTEIN data presence (default TRUE) .validatePositiveNumberOfRows = function(msstats_ptm_input, global_profiling = TRUE) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
inst/tinytest/test_converters.R(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (4)
inst/tinytest/test_converters.R (4)
8-15: Validation logic correctly implements the expected behavior.The conditional logic appropriately handles both scenarios:
- When
global_profiling=TRUE: validates both PTM and PROTEIN data have rows- When
global_profiling=FALSE: validates PTM has rows but PROTEIN is NULLThis aligns with the PR objective to optionally exclude unmodified peptides (global profiling data).
96-108: Test case correctly validates use_unmod_peptides=FALSE behavior.The test properly:
- Calls
MaxQtoMSstatsPTMFormatwithuse_unmod_peptides=FALSE- Uses
global_profiling=FALSEin validation to expect only PTM data- Validates expected protein IDs and PTM modifications are present
- Confirms all peptides contain the phosphorylation modification
This test case effectively verifies the PR's core functionality.
176-190: Test case for MaxQ LF properly mirrors the TMT test structure.The LF test case follows the same pattern as the TMT test:
- Sets
use_unmod_peptides=FALSE- Validates with
global_profiling=FALSE- Checks for expected protein IDs and PTM presence
The consistency between test cases is good for maintainability.
90-94: Minor formatting improvement enhances readability.The line breaks in the
.validatePtmSubstringcalls improve code readability without changing functionality.
Motivation and Context
Unmodified peptides were showing up in the PTM dataset after conversion for MaxQuant. We should filter out these peptides since they add additional hypotheses to test that are not necessary to test.
Changes
Testing
Checklist Before Requesting a Review
Summary by CodeRabbit
Summary by CodeRabbit