Skip to content

Minor fix related with MSnet developments #200

Merged
ypriverol merged 19 commits intomainfrom
dev
Apr 18, 2026
Merged

Minor fix related with MSnet developments #200
ypriverol merged 19 commits intomainfrom
dev

Conversation

@ypriverol
Copy link
Copy Markdown
Member

@ypriverol ypriverol commented Apr 18, 2026

Summary by CodeRabbit

  • Documentation

    • Updated PSM schema documentation with improved formatting and examples for peptidoform notation and ion type annotations.
  • Bug Fixes

    • Enhanced ProForma parsing with better normalization of N-terminal modifications and improved consistency in modification handling across data converters.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 18, 2026

📝 Walkthrough

Walkthrough

This PR updates the ProForma parsing API (from_proforma()) to return a tuple containing both the normalized peptidoform and modifications list instead of just the modifications. Corresponding updates cascade through all converter modules (DIANN, FragPipe, MaxQuant, QuantMS), peptidoform normalization logic, and test suites to handle the new return signature.

Changes

Cohort / File(s) Summary
Documentation
docs/spec/psm.md, qpx/core/data/schemas/psm.yaml
Updated table formatting and examples for optional PSM fields; changed peptidoform examples to ProForma-like bracketed format (e.g., [Acetyl]-...) and ion_type_array examples to include charge numbers (e.g., b1, y2, a2).
Core PTM Parsing
qpx/converters/ptm.py
Updated from_proforma() to return tuple[str, list[dict] | None] instead of list[dict] | None; added normalization logic for N-term bracket notation (inserting - separator), leading dot removal, and explicit position assignment (N-term mods get position=0, others use last_aa tracking).
DIANN Converter
qpx/converters/diann/constants.py, qpx/converters/diann/feature_adapter.py
Updated to_modifications() return type to tuple[str, list[dict] | None] and adjusted caller in feature adapter to unpack result as _, modifications = ....
FragPipe Converter
qpx/converters/fragpipe/constants.py, qpx/converters/fragpipe/feature_adapter.py, qpx/converters/fragpipe/psm_adapter.py
Updated to_modifications() return type signature and docstring; adjusted both feature and PSM adapters to destructure result as _, modifications.
MaxQuant Converter
qpx/converters/maxquant/feature_adapter.py, qpx/converters/maxquant/psm_adapter.py
Updated callers of from_proforma() to destructure as peptidoform, modifications and (_, modifications) respectively, with fallback handling for empty peptidoform cases.
QuantMS Converter
qpx/converters/quantms/feature_adapter.py, qpx/converters/quantms/psm_adapter.py
Extended ProForma lookup to store raw and normalized peptidoforms; updated LFQ SQL query to build sequence from normalized peptideform via regex; modified Python cache structure and from_proforma() calls to unpack both returned values.
Tests
tests/converters/test_converters.py, tests/converters/test_ptm.py
Updated ProForma parsing tests to unpack return values as _, result or peptidoform, result; added assertion validating normalized peptidoform output; adjusted expected normalization examples to include N-term - separator.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • timosachsenberg
  • jpfeuffer
  • Shen-YuFei

Poem

🐰 A peptide's form now holds two truths,
The normalized sequence and modifications' roots,
Through converters they flow with unpacking delight,
Each adapter adjusts to the new API's might,
Tuples unwrapped, the schema shines bright! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title 'Minor fix related with MSnet developments' is vague and does not accurately describe the substantial changes made throughout the codebase. Replace with a descriptive title that captures the main change, such as: 'Update ProForma parsing API to return normalized peptidoform alongside modifications' or 'Refactor to_modifications and from_proforma to return peptidoform-modifications tuples'.
Docstring Coverage ⚠️ Warning Docstring coverage is 57.58% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dev

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codacy-production
Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 0 complexity · 2 duplication

Metric Results
Complexity 0
Duplication 2

View in Codacy

TIP This summary will be updated as you push new changes. Give us feedback

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
qpx/converters/maxquant/psm_adapter.py (1)

165-173: ⚠️ Potential issue | 🔴 Critical

Critical: else None branch will raise TypeError on tuple unpacking.

When peptidoform is falsy, the conditional expression evaluates to None, and peptidoform, modifications = None raises TypeError: cannot unpack non-iterable NoneType object. The outer _transform_batch try/except then silently skips the row, which means any PSM where to_proforma(...) returned an empty string is dropped without a meaningful reason. The fallback tuple must match the unpacking arity.

🛠️ Proposed fix
-        peptidoform, modifications = (
-            from_proforma(
-                peptidoform,
-                sequence,
-                site_scores=site_scores,
-            )
-            if peptidoform
-            else None
-        )
+        if peptidoform:
+            peptidoform, modifications = from_proforma(
+                peptidoform,
+                sequence,
+                site_scores=site_scores,
+            )
+        else:
+            modifications = None
+            peptidoform = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@qpx/converters/maxquant/psm_adapter.py` around lines 165 - 173, The
conditional expression unpacks into two variables (peptidoform, modifications)
but returns None when peptidoform is falsy, causing a TypeError; change the
fallback to a 2-tuple that matches the unpacking (e.g., replace "else None" with
"else (peptidoform, None)" or "else (None, None)") so that peptidoform and
modifications are always assigned; update the expression in _transform_batch
where from_proforma(...) is called to use the two-element fallback.
🧹 Nitpick comments (5)
qpx/converters/fragpipe/feature_adapter.py (1)

373-387: Optional: consolidate to_proforma + to_modifications into one call.

to_modifications(assigned_mods_str, sequence) now returns (peptidoform, modifications) and internally performs the same to_proforma computation already done on Line 373. You can avoid the redundant parse by unpacking both values from a single call:

♻️ Proposed refactor
-        peptidoform = to_proforma(assigned_mods_str, sequence)
+        # peptidoform + modifications come from a single parse below
@@
-        # Modifications (reuse assigned_mods_str already extracted for peptidoform)
-        modifications = None
-        if assigned_mods_str:
-            _, modifications = to_modifications(assigned_mods_str, sequence)
+        # Peptidoform (ProForma) + structured modifications
+        if assigned_mods_str:
+            peptidoform, modifications = to_modifications(assigned_mods_str, sequence)
+        else:
+            peptidoform = to_proforma(assigned_mods_str, sequence)
+            modifications = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@qpx/converters/fragpipe/feature_adapter.py` around lines 373 - 387, The code
redundantly calls to_proforma(...) and then to_modifications(...), but
to_modifications(assigned_mods_str, sequence) already returns (peptidoform,
modifications); replace the separate to_proforma call with a single call to
to_modifications and unpack both peptidoform and modifications from it (assign
peptidoform, modifications = to_modifications(assigned_mods_str, sequence)),
ensuring you preserve the existing behavior when assigned_mods_str is falsy
(i.e., only call/unpack when assigned_mods_str is present and keep modifications
= None otherwise); update references to peptidoform and modifications
accordingly and remove the old to_proforma(...) invocation.
qpx/converters/diann/feature_adapter.py (1)

233-234: Optional: consolidate into a single to_modifications call.

to_modifications(modified_seq, sequence) now returns (peptidoform, modifications) and calls to_proforma internally, making the separate to_proforma(modified_seq) on Line 233 redundant for each unique precursor.

-            peptidoform = to_proforma(modified_seq)
-            _, modifications = to_modifications(modified_seq, sequence)
+            peptidoform, modifications = to_modifications(modified_seq, sequence)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@qpx/converters/diann/feature_adapter.py` around lines 233 - 234, The code
currently calls to_proforma(modified_seq) and then
to_modifications(modified_seq, sequence); since to_modifications now returns
(peptidoform, modifications) and internally calls to_proforma, replace the two
calls with a single call that unpacks both values: peptidoform, modifications =
to_modifications(modified_seq, sequence); remove the redundant to_proforma
invocation and ensure any later use of peptidoform relies on this single
assignment (look for occurrences of peptidoform, modifications, to_proforma, and
to_modifications in the surrounding scope to update accordingly).
qpx/converters/fragpipe/psm_adapter.py (1)

194-247: Optional: avoid parsing Assigned Modifications twice.

Lines 194–196 build peptidoform via to_proforma, and Lines 245–247 re-parse the same Assigned Modifications via to_modifications (which internally calls to_proforma again). Since to_modifications now returns (peptidoform, modifications), a single call can populate both and drop the discarded first element:

♻️ Proposed refactor
-        # Peptidoform -- build ProForma from sequence + Assigned Modifications
-        assigned_mods_raw = row.get("Assigned Modifications")
-        assigned_mods_str = str(assigned_mods_raw) if pd.notna(assigned_mods_raw) and assigned_mods_raw else ""
-        peptidoform = to_proforma(assigned_mods_str, sequence)
+        # Peptidoform + modifications -- one parse of Assigned Modifications
+        assigned_mods_raw = row.get("Assigned Modifications")
+        assigned_mods_str = str(assigned_mods_raw) if pd.notna(assigned_mods_raw) and assigned_mods_raw else ""
+        if assigned_mods_str:
+            peptidoform, modifications = to_modifications(assigned_mods_str, sequence)
+        else:
+            peptidoform = to_proforma(assigned_mods_str, sequence)
+            modifications = None
@@
-        # Modifications -- parse Assigned Modifications if present
-        modifications = None
-        assigned_mods = row.get("Assigned Modifications")
-        if pd.notna(assigned_mods) and assigned_mods:
-            _, modifications = to_modifications(str(assigned_mods), sequence)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@qpx/converters/fragpipe/psm_adapter.py` around lines 194 - 247, The code
parses "Assigned Modifications" twice: first calling to_proforma(...) to build
peptidoform and later calling to_modifications(...) which also returns
(peptidoform, modifications); update the logic to call
to_modifications(str(assigned_mods), sequence) once, capture both peptidoform
and modifications from its return value, remove the earlier standalone
to_proforma(...) call (the variable peptidoform should be set from
to_modifications), and ensure any downstream uses (peptidoform, modifications)
reference these single-source variables (look for to_proforma, to_modifications,
assigned_mods_raw/assigned_mods, and peptidoform in this block).
qpx/converters/ptm.py (1)

155-192: Leading-dot stripping only applies when parens are present.

_normalize_peptidoform returns early at line 155-156 when "(" not in peptidoform, so a bare ".PEPTIDEK" (no mods) would not have its leading "." removed. In practice the dot only appears with mzTab paren mods so this is probably fine, but if future callers pass bare dot-prefixed forms they would leak through unparsed. Worth considering moving removeprefix(".") above the early return, or documenting the assumption.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@qpx/converters/ptm.py` around lines 155 - 192, The function
_normalize_peptidoform currently only strips a leading "." after checking for
"(" so inputs like ".PEPTIDEK" bypass normalization; move the call peptidoform =
peptidoform.removeprefix(".") to the top of the function (before the if "(" not
in peptidoform early return) so leading dots are always removed, keeping the
rest of the existing parsing logic (the while loop, bracket conversion, and
N-term dash insertion) unchanged.
qpx/converters/quantms/feature_adapter.py (1)

982-1006: Stale _proforma_cache type annotation.

The cache now holds (peptidoform, modifications) tuples (see lines 1006 and 1140) but the annotation still says dict[tuple[str, str], list | None]. Update to dict[tuple[str, str], tuple[str, list | None]] (or tuple[str, Optional[list[dict]]]) for accuracy.

🛠️ Proposed fix
-        _proforma_cache: dict[tuple[str, str], list | None] = {}
+        _proforma_cache: dict[tuple[str, str], tuple[str, list[dict] | None]] = {}

Apply in both _transform_batch_lfq (line 982) and _transform_batch_isobaric (line 1117).

Also applies to: 1117-1140

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@qpx/converters/quantms/feature_adapter.py` around lines 982 - 1006, The
_proforma_cache annotation is stale: it currently reads dict[tuple[str, str],
list | None] but you store (peptidoform, modifications) tuples returned by
_from_proforma; update the type to reflect tuple[str, list | None] (or
tuple[str, Optional[list[dict]]]) wherever declared (notably the _proforma_cache
in _transform_batch_lfq and the corresponding cache in
_transform_batch_isobaric) so the key stays tuple[str,str] and the value is
tuple[str, list | None]; keep references to _from_proforma and the variables
peptidoform/modifications to locate the usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@qpx/converters/maxquant/feature_adapter.py`:
- Around line 286-289: The current code calls peptidoform = to_proforma(...)
then discards the normalized peptidoform returned by from_proforma by doing "_,
modifications = from_proforma(...)" which leaves the emitted peptidoform
unnormalized and diverges from the PSM adapter; change the assignment to capture
both the normalized peptidoform and modifications (peptidoform, modifications =
from_proforma(peptidoform, sequence)) and if from_proforma returns None/None
fall back to the original stringified to_proforma value so peptidoform stays a
string; update the usage sites that expect a string (e.g., later logic that
assumes peptidoform is a string) accordingly.

---

Outside diff comments:
In `@qpx/converters/maxquant/psm_adapter.py`:
- Around line 165-173: The conditional expression unpacks into two variables
(peptidoform, modifications) but returns None when peptidoform is falsy, causing
a TypeError; change the fallback to a 2-tuple that matches the unpacking (e.g.,
replace "else None" with "else (peptidoform, None)" or "else (None, None)") so
that peptidoform and modifications are always assigned; update the expression in
_transform_batch where from_proforma(...) is called to use the two-element
fallback.

---

Nitpick comments:
In `@qpx/converters/diann/feature_adapter.py`:
- Around line 233-234: The code currently calls to_proforma(modified_seq) and
then to_modifications(modified_seq, sequence); since to_modifications now
returns (peptidoform, modifications) and internally calls to_proforma, replace
the two calls with a single call that unpacks both values: peptidoform,
modifications = to_modifications(modified_seq, sequence); remove the redundant
to_proforma invocation and ensure any later use of peptidoform relies on this
single assignment (look for occurrences of peptidoform, modifications,
to_proforma, and to_modifications in the surrounding scope to update
accordingly).

In `@qpx/converters/fragpipe/feature_adapter.py`:
- Around line 373-387: The code redundantly calls to_proforma(...) and then
to_modifications(...), but to_modifications(assigned_mods_str, sequence) already
returns (peptidoform, modifications); replace the separate to_proforma call with
a single call to to_modifications and unpack both peptidoform and modifications
from it (assign peptidoform, modifications = to_modifications(assigned_mods_str,
sequence)), ensuring you preserve the existing behavior when assigned_mods_str
is falsy (i.e., only call/unpack when assigned_mods_str is present and keep
modifications = None otherwise); update references to peptidoform and
modifications accordingly and remove the old to_proforma(...) invocation.

In `@qpx/converters/fragpipe/psm_adapter.py`:
- Around line 194-247: The code parses "Assigned Modifications" twice: first
calling to_proforma(...) to build peptidoform and later calling
to_modifications(...) which also returns (peptidoform, modifications); update
the logic to call to_modifications(str(assigned_mods), sequence) once, capture
both peptidoform and modifications from its return value, remove the earlier
standalone to_proforma(...) call (the variable peptidoform should be set from
to_modifications), and ensure any downstream uses (peptidoform, modifications)
reference these single-source variables (look for to_proforma, to_modifications,
assigned_mods_raw/assigned_mods, and peptidoform in this block).

In `@qpx/converters/ptm.py`:
- Around line 155-192: The function _normalize_peptidoform currently only strips
a leading "." after checking for "(" so inputs like ".PEPTIDEK" bypass
normalization; move the call peptidoform = peptidoform.removeprefix(".") to the
top of the function (before the if "(" not in peptidoform early return) so
leading dots are always removed, keeping the rest of the existing parsing logic
(the while loop, bracket conversion, and N-term dash insertion) unchanged.

In `@qpx/converters/quantms/feature_adapter.py`:
- Around line 982-1006: The _proforma_cache annotation is stale: it currently
reads dict[tuple[str, str], list | None] but you store (peptidoform,
modifications) tuples returned by _from_proforma; update the type to reflect
tuple[str, list | None] (or tuple[str, Optional[list[dict]]]) wherever declared
(notably the _proforma_cache in _transform_batch_lfq and the corresponding cache
in _transform_batch_isobaric) so the key stays tuple[str,str] and the value is
tuple[str, list | None]; keep references to _from_proforma and the variables
peptidoform/modifications to locate the usage.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b050e0cb-9464-42ea-9925-b9603468e36d

📥 Commits

Reviewing files that changed from the base of the PR and between 6429ad8 and e0f7f41.

⛔ Files ignored due to path filters (2)
  • docs/include/example.feature.parquet is excluded by !**/*.parquet
  • docs/include/example.psm.parquet is excluded by !**/*.parquet
📒 Files selected for processing (14)
  • docs/spec/psm.md
  • qpx/converters/diann/constants.py
  • qpx/converters/diann/feature_adapter.py
  • qpx/converters/fragpipe/constants.py
  • qpx/converters/fragpipe/feature_adapter.py
  • qpx/converters/fragpipe/psm_adapter.py
  • qpx/converters/maxquant/feature_adapter.py
  • qpx/converters/maxquant/psm_adapter.py
  • qpx/converters/ptm.py
  • qpx/converters/quantms/feature_adapter.py
  • qpx/converters/quantms/psm_adapter.py
  • qpx/core/data/schemas/psm.yaml
  • tests/converters/test_converters.py
  • tests/converters/test_ptm.py

Comment on lines 286 to +289
peptidoform = to_proforma(
str(row.get(r.get("modified_sequence", "Modified sequence"), "")),
)
modifications = from_proforma(peptidoform, sequence) if peptidoform else None
_, modifications = from_proforma(peptidoform, sequence) if peptidoform else (None, None)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Inconsistent peptidoform handling vs. maxquant/psm_adapter.py.

Here the normalized peptidoform returned by from_proforma is discarded (_, modifications = ...), so the emitted peptidoform field keeps the raw to_proforma(...) output. In the sibling PSM adapter (and in quantms/psm_adapter.py) the pattern peptidoform, modifications = from_proforma(...) is used, meaning features and PSMs for the same peptide may now carry slightly different peptidoform strings. Consider aligning the feature adapter:

-        _, modifications = from_proforma(peptidoform, sequence) if peptidoform else (None, None)
+        peptidoform, modifications = (
+            from_proforma(peptidoform, sequence) if peptidoform else (peptidoform, None)
+        )

Note the fallback preserves the empty/original peptidoform instead of nulling it, since downstream fields (Line 382) assume it is a string.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@qpx/converters/maxquant/feature_adapter.py` around lines 286 - 289, The
current code calls peptidoform = to_proforma(...) then discards the normalized
peptidoform returned by from_proforma by doing "_, modifications =
from_proforma(...)" which leaves the emitted peptidoform unnormalized and
diverges from the PSM adapter; change the assignment to capture both the
normalized peptidoform and modifications (peptidoform, modifications =
from_proforma(peptidoform, sequence)) and if from_proforma returns None/None
fall back to the original stringified to_proforma value so peptidoform stays a
string; update the usage sites that expect a string (e.g., later logic that
assumes peptidoform is a string) accordingly.

@ypriverol ypriverol merged commit 3f9982d into main Apr 18, 2026
15 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request Apr 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants