Conversation
…ength, and sync schema
fixed example and doc
📝 WalkthroughWalkthroughThis PR updates the ProForma parsing API ( Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 0 |
| Duplication | 2 |
TIP This summary will be updated as you push new changes. Give us feedback
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
qpx/converters/maxquant/psm_adapter.py (1)
165-173:⚠️ Potential issue | 🔴 CriticalCritical:
else Nonebranch will raiseTypeErroron tuple unpacking.When
peptidoformis falsy, the conditional expression evaluates toNone, andpeptidoform, modifications = NoneraisesTypeError: cannot unpack non-iterable NoneType object. The outer_transform_batchtry/except then silently skips the row, which means any PSM whereto_proforma(...)returned an empty string is dropped without a meaningful reason. The fallback tuple must match the unpacking arity.🛠️ Proposed fix
- peptidoform, modifications = ( - from_proforma( - peptidoform, - sequence, - site_scores=site_scores, - ) - if peptidoform - else None - ) + if peptidoform: + peptidoform, modifications = from_proforma( + peptidoform, + sequence, + site_scores=site_scores, + ) + else: + modifications = None + peptidoform = None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@qpx/converters/maxquant/psm_adapter.py` around lines 165 - 173, The conditional expression unpacks into two variables (peptidoform, modifications) but returns None when peptidoform is falsy, causing a TypeError; change the fallback to a 2-tuple that matches the unpacking (e.g., replace "else None" with "else (peptidoform, None)" or "else (None, None)") so that peptidoform and modifications are always assigned; update the expression in _transform_batch where from_proforma(...) is called to use the two-element fallback.
🧹 Nitpick comments (5)
qpx/converters/fragpipe/feature_adapter.py (1)
373-387: Optional: consolidateto_proforma+to_modificationsinto one call.
to_modifications(assigned_mods_str, sequence)now returns(peptidoform, modifications)and internally performs the sameto_proformacomputation already done on Line 373. You can avoid the redundant parse by unpacking both values from a single call:♻️ Proposed refactor
- peptidoform = to_proforma(assigned_mods_str, sequence) + # peptidoform + modifications come from a single parse below @@ - # Modifications (reuse assigned_mods_str already extracted for peptidoform) - modifications = None - if assigned_mods_str: - _, modifications = to_modifications(assigned_mods_str, sequence) + # Peptidoform (ProForma) + structured modifications + if assigned_mods_str: + peptidoform, modifications = to_modifications(assigned_mods_str, sequence) + else: + peptidoform = to_proforma(assigned_mods_str, sequence) + modifications = None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@qpx/converters/fragpipe/feature_adapter.py` around lines 373 - 387, The code redundantly calls to_proforma(...) and then to_modifications(...), but to_modifications(assigned_mods_str, sequence) already returns (peptidoform, modifications); replace the separate to_proforma call with a single call to to_modifications and unpack both peptidoform and modifications from it (assign peptidoform, modifications = to_modifications(assigned_mods_str, sequence)), ensuring you preserve the existing behavior when assigned_mods_str is falsy (i.e., only call/unpack when assigned_mods_str is present and keep modifications = None otherwise); update references to peptidoform and modifications accordingly and remove the old to_proforma(...) invocation.qpx/converters/diann/feature_adapter.py (1)
233-234: Optional: consolidate into a singleto_modificationscall.
to_modifications(modified_seq, sequence)now returns(peptidoform, modifications)and callsto_proformainternally, making the separateto_proforma(modified_seq)on Line 233 redundant for each unique precursor.- peptidoform = to_proforma(modified_seq) - _, modifications = to_modifications(modified_seq, sequence) + peptidoform, modifications = to_modifications(modified_seq, sequence)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@qpx/converters/diann/feature_adapter.py` around lines 233 - 234, The code currently calls to_proforma(modified_seq) and then to_modifications(modified_seq, sequence); since to_modifications now returns (peptidoform, modifications) and internally calls to_proforma, replace the two calls with a single call that unpacks both values: peptidoform, modifications = to_modifications(modified_seq, sequence); remove the redundant to_proforma invocation and ensure any later use of peptidoform relies on this single assignment (look for occurrences of peptidoform, modifications, to_proforma, and to_modifications in the surrounding scope to update accordingly).qpx/converters/fragpipe/psm_adapter.py (1)
194-247: Optional: avoid parsing Assigned Modifications twice.Lines 194–196 build
peptidoformviato_proforma, and Lines 245–247 re-parse the sameAssigned Modificationsviato_modifications(which internally callsto_proformaagain). Sinceto_modificationsnow returns(peptidoform, modifications), a single call can populate both and drop the discarded first element:♻️ Proposed refactor
- # Peptidoform -- build ProForma from sequence + Assigned Modifications - assigned_mods_raw = row.get("Assigned Modifications") - assigned_mods_str = str(assigned_mods_raw) if pd.notna(assigned_mods_raw) and assigned_mods_raw else "" - peptidoform = to_proforma(assigned_mods_str, sequence) + # Peptidoform + modifications -- one parse of Assigned Modifications + assigned_mods_raw = row.get("Assigned Modifications") + assigned_mods_str = str(assigned_mods_raw) if pd.notna(assigned_mods_raw) and assigned_mods_raw else "" + if assigned_mods_str: + peptidoform, modifications = to_modifications(assigned_mods_str, sequence) + else: + peptidoform = to_proforma(assigned_mods_str, sequence) + modifications = None @@ - # Modifications -- parse Assigned Modifications if present - modifications = None - assigned_mods = row.get("Assigned Modifications") - if pd.notna(assigned_mods) and assigned_mods: - _, modifications = to_modifications(str(assigned_mods), sequence)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@qpx/converters/fragpipe/psm_adapter.py` around lines 194 - 247, The code parses "Assigned Modifications" twice: first calling to_proforma(...) to build peptidoform and later calling to_modifications(...) which also returns (peptidoform, modifications); update the logic to call to_modifications(str(assigned_mods), sequence) once, capture both peptidoform and modifications from its return value, remove the earlier standalone to_proforma(...) call (the variable peptidoform should be set from to_modifications), and ensure any downstream uses (peptidoform, modifications) reference these single-source variables (look for to_proforma, to_modifications, assigned_mods_raw/assigned_mods, and peptidoform in this block).qpx/converters/ptm.py (1)
155-192: Leading-dot stripping only applies when parens are present.
_normalize_peptidoformreturns early at line 155-156 when"(" not in peptidoform, so a bare".PEPTIDEK"(no mods) would not have its leading"."removed. In practice the dot only appears with mzTab paren mods so this is probably fine, but if future callers pass bare dot-prefixed forms they would leak through unparsed. Worth considering movingremoveprefix(".")above the early return, or documenting the assumption.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@qpx/converters/ptm.py` around lines 155 - 192, The function _normalize_peptidoform currently only strips a leading "." after checking for "(" so inputs like ".PEPTIDEK" bypass normalization; move the call peptidoform = peptidoform.removeprefix(".") to the top of the function (before the if "(" not in peptidoform early return) so leading dots are always removed, keeping the rest of the existing parsing logic (the while loop, bracket conversion, and N-term dash insertion) unchanged.qpx/converters/quantms/feature_adapter.py (1)
982-1006: Stale_proforma_cachetype annotation.The cache now holds
(peptidoform, modifications)tuples (see lines 1006 and 1140) but the annotation still saysdict[tuple[str, str], list | None]. Update todict[tuple[str, str], tuple[str, list | None]](ortuple[str, Optional[list[dict]]]) for accuracy.🛠️ Proposed fix
- _proforma_cache: dict[tuple[str, str], list | None] = {} + _proforma_cache: dict[tuple[str, str], tuple[str, list[dict] | None]] = {}Apply in both
_transform_batch_lfq(line 982) and_transform_batch_isobaric(line 1117).Also applies to: 1117-1140
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@qpx/converters/quantms/feature_adapter.py` around lines 982 - 1006, The _proforma_cache annotation is stale: it currently reads dict[tuple[str, str], list | None] but you store (peptidoform, modifications) tuples returned by _from_proforma; update the type to reflect tuple[str, list | None] (or tuple[str, Optional[list[dict]]]) wherever declared (notably the _proforma_cache in _transform_batch_lfq and the corresponding cache in _transform_batch_isobaric) so the key stays tuple[str,str] and the value is tuple[str, list | None]; keep references to _from_proforma and the variables peptidoform/modifications to locate the usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@qpx/converters/maxquant/feature_adapter.py`:
- Around line 286-289: The current code calls peptidoform = to_proforma(...)
then discards the normalized peptidoform returned by from_proforma by doing "_,
modifications = from_proforma(...)" which leaves the emitted peptidoform
unnormalized and diverges from the PSM adapter; change the assignment to capture
both the normalized peptidoform and modifications (peptidoform, modifications =
from_proforma(peptidoform, sequence)) and if from_proforma returns None/None
fall back to the original stringified to_proforma value so peptidoform stays a
string; update the usage sites that expect a string (e.g., later logic that
assumes peptidoform is a string) accordingly.
---
Outside diff comments:
In `@qpx/converters/maxquant/psm_adapter.py`:
- Around line 165-173: The conditional expression unpacks into two variables
(peptidoform, modifications) but returns None when peptidoform is falsy, causing
a TypeError; change the fallback to a 2-tuple that matches the unpacking (e.g.,
replace "else None" with "else (peptidoform, None)" or "else (None, None)") so
that peptidoform and modifications are always assigned; update the expression in
_transform_batch where from_proforma(...) is called to use the two-element
fallback.
---
Nitpick comments:
In `@qpx/converters/diann/feature_adapter.py`:
- Around line 233-234: The code currently calls to_proforma(modified_seq) and
then to_modifications(modified_seq, sequence); since to_modifications now
returns (peptidoform, modifications) and internally calls to_proforma, replace
the two calls with a single call that unpacks both values: peptidoform,
modifications = to_modifications(modified_seq, sequence); remove the redundant
to_proforma invocation and ensure any later use of peptidoform relies on this
single assignment (look for occurrences of peptidoform, modifications,
to_proforma, and to_modifications in the surrounding scope to update
accordingly).
In `@qpx/converters/fragpipe/feature_adapter.py`:
- Around line 373-387: The code redundantly calls to_proforma(...) and then
to_modifications(...), but to_modifications(assigned_mods_str, sequence) already
returns (peptidoform, modifications); replace the separate to_proforma call with
a single call to to_modifications and unpack both peptidoform and modifications
from it (assign peptidoform, modifications = to_modifications(assigned_mods_str,
sequence)), ensuring you preserve the existing behavior when assigned_mods_str
is falsy (i.e., only call/unpack when assigned_mods_str is present and keep
modifications = None otherwise); update references to peptidoform and
modifications accordingly and remove the old to_proforma(...) invocation.
In `@qpx/converters/fragpipe/psm_adapter.py`:
- Around line 194-247: The code parses "Assigned Modifications" twice: first
calling to_proforma(...) to build peptidoform and later calling
to_modifications(...) which also returns (peptidoform, modifications); update
the logic to call to_modifications(str(assigned_mods), sequence) once, capture
both peptidoform and modifications from its return value, remove the earlier
standalone to_proforma(...) call (the variable peptidoform should be set from
to_modifications), and ensure any downstream uses (peptidoform, modifications)
reference these single-source variables (look for to_proforma, to_modifications,
assigned_mods_raw/assigned_mods, and peptidoform in this block).
In `@qpx/converters/ptm.py`:
- Around line 155-192: The function _normalize_peptidoform currently only strips
a leading "." after checking for "(" so inputs like ".PEPTIDEK" bypass
normalization; move the call peptidoform = peptidoform.removeprefix(".") to the
top of the function (before the if "(" not in peptidoform early return) so
leading dots are always removed, keeping the rest of the existing parsing logic
(the while loop, bracket conversion, and N-term dash insertion) unchanged.
In `@qpx/converters/quantms/feature_adapter.py`:
- Around line 982-1006: The _proforma_cache annotation is stale: it currently
reads dict[tuple[str, str], list | None] but you store (peptidoform,
modifications) tuples returned by _from_proforma; update the type to reflect
tuple[str, list | None] (or tuple[str, Optional[list[dict]]]) wherever declared
(notably the _proforma_cache in _transform_batch_lfq and the corresponding cache
in _transform_batch_isobaric) so the key stays tuple[str,str] and the value is
tuple[str, list | None]; keep references to _from_proforma and the variables
peptidoform/modifications to locate the usage.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b050e0cb-9464-42ea-9925-b9603468e36d
⛔ Files ignored due to path filters (2)
docs/include/example.feature.parquetis excluded by!**/*.parquetdocs/include/example.psm.parquetis excluded by!**/*.parquet
📒 Files selected for processing (14)
docs/spec/psm.mdqpx/converters/diann/constants.pyqpx/converters/diann/feature_adapter.pyqpx/converters/fragpipe/constants.pyqpx/converters/fragpipe/feature_adapter.pyqpx/converters/fragpipe/psm_adapter.pyqpx/converters/maxquant/feature_adapter.pyqpx/converters/maxquant/psm_adapter.pyqpx/converters/ptm.pyqpx/converters/quantms/feature_adapter.pyqpx/converters/quantms/psm_adapter.pyqpx/core/data/schemas/psm.yamltests/converters/test_converters.pytests/converters/test_ptm.py
| peptidoform = to_proforma( | ||
| str(row.get(r.get("modified_sequence", "Modified sequence"), "")), | ||
| ) | ||
| modifications = from_proforma(peptidoform, sequence) if peptidoform else None | ||
| _, modifications = from_proforma(peptidoform, sequence) if peptidoform else (None, None) |
There was a problem hiding this comment.
Inconsistent peptidoform handling vs. maxquant/psm_adapter.py.
Here the normalized peptidoform returned by from_proforma is discarded (_, modifications = ...), so the emitted peptidoform field keeps the raw to_proforma(...) output. In the sibling PSM adapter (and in quantms/psm_adapter.py) the pattern peptidoform, modifications = from_proforma(...) is used, meaning features and PSMs for the same peptide may now carry slightly different peptidoform strings. Consider aligning the feature adapter:
- _, modifications = from_proforma(peptidoform, sequence) if peptidoform else (None, None)
+ peptidoform, modifications = (
+ from_proforma(peptidoform, sequence) if peptidoform else (peptidoform, None)
+ )Note the fallback preserves the empty/original peptidoform instead of nulling it, since downstream fields (Line 382) assume it is a string.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@qpx/converters/maxquant/feature_adapter.py` around lines 286 - 289, The
current code calls peptidoform = to_proforma(...) then discards the normalized
peptidoform returned by from_proforma by doing "_, modifications =
from_proforma(...)" which leaves the emitted peptidoform unnormalized and
diverges from the PSM adapter; change the assignment to capture both the
normalized peptidoform and modifications (peptidoform, modifications =
from_proforma(peptidoform, sequence)) and if from_proforma returns None/None
fall back to the original stringified to_proforma value so peptidoform stays a
string; update the usage sites that expect a string (e.g., later logic that
assumes peptidoform is a string) accordingly.
Summary by CodeRabbit
Documentation
Bug Fixes