Minor fix related with MSnet developments by ypriverol · Pull Request #200 · bigbio/qpx

ypriverol · 2026-04-18T15:56:06Z

Summary by CodeRabbit

Documentation
- Updated PSM schema documentation with improved formatting and examples for peptidoform notation and ion type annotations.
Bug Fixes
- Enhanced ProForma parsing with better normalization of N-terminal modifications and improved consistency in modification handling across data converters.

u

Dev

…ength, and sync schema

fixed example and doc

coderabbitai · 2026-04-18T15:56:33Z

📝 Walkthrough

This PR updates the ProForma parsing API (from_proforma()) to return a tuple containing both the normalized peptidoform and modifications list instead of just the modifications. Corresponding updates cascade through all converter modules (DIANN, FragPipe, MaxQuant, QuantMS), peptidoform normalization logic, and test suites to handle the new return signature.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation `docs/spec/psm.md`, `qpx/core/data/schemas/psm.yaml` | Updated table formatting and examples for optional PSM fields; changed `peptidoform` examples to ProForma-like bracketed format (e.g., `[Acetyl]-...`) and `ion_type_array` examples to include charge numbers (e.g., `b1`, `y2`, `a2`). |
| Core PTM Parsing `qpx/converters/ptm.py` | Updated `from_proforma()` to return `tuple[str, list[dict] \| None]` instead of `list[dict] \| None`; added normalization logic for N-term bracket notation (inserting `-` separator), leading dot removal, and explicit position assignment (N-term mods get `position=0`, others use `last_aa` tracking). |
| DIANN Converter `qpx/converters/diann/constants.py`, `qpx/converters/diann/feature_adapter.py` | Updated `to_modifications()` return type to `tuple[str, list[dict] \| None]` and adjusted caller in feature adapter to unpack result as `_, modifications = ...`. |
| FragPipe Converter `qpx/converters/fragpipe/constants.py`, `qpx/converters/fragpipe/feature_adapter.py`, `qpx/converters/fragpipe/psm_adapter.py` | Updated `to_modifications()` return type signature and docstring; adjusted both feature and PSM adapters to destructure result as `_, modifications`. |
| MaxQuant Converter `qpx/converters/maxquant/feature_adapter.py`, `qpx/converters/maxquant/psm_adapter.py` | Updated callers of `from_proforma()` to destructure as `peptidoform, modifications` and `(_, modifications)` respectively, with fallback handling for empty peptidoform cases. |
| QuantMS Converter `qpx/converters/quantms/feature_adapter.py`, `qpx/converters/quantms/psm_adapter.py` | Extended ProForma lookup to store raw and normalized peptidoforms; updated LFQ SQL query to build sequence from normalized peptidoform via regex; modified Python cache structure and `from_proforma()` calls to unpack both returned values. |
| Tests `tests/converters/test_converters.py`, `tests/converters/test_ptm.py` | Updated ProForma parsing tests to unpack return values as `_, result` or `peptidoform, result`; added assertion validating normalized peptidoform output; adjusted expected normalization examples to include N-term `-` separator. |
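
The tuple return and the position rules summarized in the table can be illustrated with a minimal sketch. This is not the code in `qpx/converters/ptm.py`: the function name `from_proforma_sketch` and the dict keys `name` and `position` are assumptions made for illustration, and C-terminal modifications are not handled.

```python
import re
from typing import Optional


def from_proforma_sketch(peptidoform: str) -> tuple[str, Optional[list[dict]]]:
    """Hypothetical stand-in: return (normalized_peptidoform, modifications)."""
    # Leading dot removal.
    normalized = peptidoform.removeprefix(".")

    # Insert the '-' separator after a leading N-terminal bracket if it is missing.
    nterm = re.match(r"(\[[^\]]+\])(?!-)", normalized)
    if nterm:
        normalized = nterm.group(1) + "-" + normalized[nterm.end():]

    mods: list[dict] = []
    rest = normalized

    # N-terminal modifications get position 0.
    head = re.match(r"\[([^\]]+)\]-", rest)
    if head:
        mods.append({"name": head.group(1), "position": 0})
        rest = rest[head.end():]

    # Residue modifications take the 1-based index of the last residue seen.
    last_aa = 0
    i = 0
    while i < len(rest):
        if rest[i] == "[":
            close = rest.index("]", i)
            mods.append({"name": rest[i + 1:close], "position": last_aa})
            i = close + 1
        else:
            last_aa += 1
            i += 1

    return normalized, (mods or None)


# Callers unpack both values, or discard the first with "_".
peptidoform, modifications = from_proforma_sketch("[Acetyl]PEPM[Oxidation]K")
```

Returning the normalized string alongside the list lets a converter emit the same peptidoform it parsed, instead of re-deriving it separately.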

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

First refactoring of the documentation. #161: Updates to docs/spec/psm.md for PSM specification documentation that this PR further modifies with formatting and example changes.

Suggested reviewers

timosachsenberg
jpfeuffer
Shen-YuFei

Poem

🐰 A peptide's form now holds two truths,
The normalized sequence and modifications' roots,
Through converters they flow with unpacking delight,
Each adapter adjusts to the new API's might,
Tuples unwrapped, the schema shines bright! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The title 'Minor fix related with MSnet developments' is vague and does not accurately describe the substantial changes made throughout the codebase. | Replace with a descriptive title that captures the main change, such as: 'Update ProForma parsing API to return normalized peptidoform alongside modifications' or 'Refactor to_modifications and from_proforma to return peptidoform-modifications tuples'. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 57.58% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


codacy-production · 2026-04-18T15:57:28Z

Up to standards ✅

🟢 Issues: 0 new issues

🟢 Metrics: 0 complexity · 2 duplication

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

qpx/converters/maxquant/psm_adapter.py (1)

165-173: ⚠️ Potential issue | 🔴 Critical

Critical: else None branch will raise TypeError on tuple unpacking.

When peptidoform is falsy, the conditional expression evaluates to None, and peptidoform, modifications = None raises TypeError: cannot unpack non-iterable NoneType object. The outer _transform_batch try/except then silently skips the row, which means any PSM where to_proforma(...) returned an empty string is dropped without a meaningful reason. The fallback tuple must match the unpacking arity.

🛠️ Proposed fix

-        peptidoform, modifications = (
-            from_proforma(
-                peptidoform,
-                sequence,
-                site_scores=site_scores,
-            )
-            if peptidoform
-            else None
-        )
+        if peptidoform:
+            peptidoform, modifications = from_proforma(
+                peptidoform,
+                sequence,
+                site_scores=site_scores,
+            )
+        else:
+            modifications = None
+            peptidoform = None

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@qpx/converters/maxquant/psm_adapter.py` around lines 165 - 173, The
conditional expression unpacks into two variables (peptidoform, modifications)
but returns None when peptidoform is falsy, causing a TypeError; change the
fallback to a 2-tuple that matches the unpacking (e.g., replace "else None" with
"else (peptidoform, None)" or "else (None, None)") so that peptidoform and
modifications are always assigned; update the expression in _transform_batch
where from_proforma(...) is called to use the two-element fallback.
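
The failure mode described above can be reproduced in isolation. The sketch below uses a hypothetical `from_proforma_stub` in place of the real parser; only the unpacking behavior is the point.

```python
def from_proforma_stub(peptidoform: str, sequence: str):
    """Hypothetical stand-in that returns (peptidoform, modifications)."""
    return peptidoform, []


def broken(peptidoform, sequence):
    # Mirrors the flagged pattern: the fallback is a bare None, which
    # cannot be unpacked into two names.
    peptidoform, modifications = (
        from_proforma_stub(peptidoform, sequence) if peptidoform else None
    )
    return peptidoform, modifications


def fixed(peptidoform, sequence):
    # The fallback is a 2-tuple, so unpacking succeeds on both branches.
    peptidoform, modifications = (
        from_proforma_stub(peptidoform, sequence) if peptidoform else (None, None)
    )
    return peptidoform, modifications


try:
    broken("", "PEPTIDEK")
except TypeError as exc:
    print(f"broken() raised: {exc}")

print(fixed("", "PEPTIDEK"))
```

Whether the fallback should be `(None, None)` or `(peptidoform, None)` depends on whether downstream code expects `peptidoform` to remain a string.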

🧹 Nitpick comments (5)

qpx/converters/fragpipe/feature_adapter.py (1)

373-387: Optional: consolidate to_proforma + to_modifications into one call.

to_modifications(assigned_mods_str, sequence) now returns (peptidoform, modifications) and internally performs the same to_proforma computation already done on Line 373. You can avoid the redundant parse by unpacking both values from a single call:

♻️ Proposed refactor

-        peptidoform = to_proforma(assigned_mods_str, sequence)
+        # peptidoform + modifications come from a single parse below
@@
-        # Modifications (reuse assigned_mods_str already extracted for peptidoform)
-        modifications = None
-        if assigned_mods_str:
-            _, modifications = to_modifications(assigned_mods_str, sequence)
+        # Peptidoform (ProForma) + structured modifications
+        if assigned_mods_str:
+            peptidoform, modifications = to_modifications(assigned_mods_str, sequence)
+        else:
+            peptidoform = to_proforma(assigned_mods_str, sequence)
+            modifications = None

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@qpx/converters/fragpipe/feature_adapter.py` around lines 373 - 387, The code
redundantly calls to_proforma(...) and then to_modifications(...), but
to_modifications(assigned_mods_str, sequence) already returns (peptidoform,
modifications); replace the separate to_proforma call with a single call to
to_modifications and unpack both peptidoform and modifications from it (assign
peptidoform, modifications = to_modifications(assigned_mods_str, sequence)),
ensuring you preserve the existing behavior when assigned_mods_str is falsy
(i.e., only call/unpack when assigned_mods_str is present and keep modifications
= None otherwise); update references to peptidoform and modifications
accordingly and remove the old to_proforma(...) invocation.

qpx/converters/diann/feature_adapter.py (1)

233-234: Optional: consolidate into a single to_modifications call.

to_modifications(modified_seq, sequence) now returns (peptidoform, modifications) and calls to_proforma internally, making the separate to_proforma(modified_seq) on Line 233 redundant for each unique precursor.

-            peptidoform = to_proforma(modified_seq)
-            _, modifications = to_modifications(modified_seq, sequence)
+            peptidoform, modifications = to_modifications(modified_seq, sequence)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@qpx/converters/diann/feature_adapter.py` around lines 233 - 234, The code
currently calls to_proforma(modified_seq) and then
to_modifications(modified_seq, sequence); since to_modifications now returns
(peptidoform, modifications) and internally calls to_proforma, replace the two
calls with a single call that unpacks both values: peptidoform, modifications =
to_modifications(modified_seq, sequence); remove the redundant to_proforma
invocation and ensure any later use of peptidoform relies on this single
assignment (look for occurrences of peptidoform, modifications, to_proforma, and
to_modifications in the surrounding scope to update accordingly).

qpx/converters/fragpipe/psm_adapter.py (1)

194-247: Optional: avoid parsing Assigned Modifications twice.

Lines 194–196 build peptidoform via to_proforma, and Lines 245–247 re-parse the same Assigned Modifications via to_modifications (which internally calls to_proforma again). Since to_modifications now returns (peptidoform, modifications), a single call can populate both and drop the discarded first element:

♻️ Proposed refactor

-        # Peptidoform -- build ProForma from sequence + Assigned Modifications
-        assigned_mods_raw = row.get("Assigned Modifications")
-        assigned_mods_str = str(assigned_mods_raw) if pd.notna(assigned_mods_raw) and assigned_mods_raw else ""
-        peptidoform = to_proforma(assigned_mods_str, sequence)
+        # Peptidoform + modifications -- one parse of Assigned Modifications
+        assigned_mods_raw = row.get("Assigned Modifications")
+        assigned_mods_str = str(assigned_mods_raw) if pd.notna(assigned_mods_raw) and assigned_mods_raw else ""
+        if assigned_mods_str:
+            peptidoform, modifications = to_modifications(assigned_mods_str, sequence)
+        else:
+            peptidoform = to_proforma(assigned_mods_str, sequence)
+            modifications = None
@@
-        # Modifications -- parse Assigned Modifications if present
-        modifications = None
-        assigned_mods = row.get("Assigned Modifications")
-        if pd.notna(assigned_mods) and assigned_mods:
-            _, modifications = to_modifications(str(assigned_mods), sequence)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@qpx/converters/fragpipe/psm_adapter.py` around lines 194 - 247, The code
parses "Assigned Modifications" twice: first calling to_proforma(...) to build
peptidoform and later calling to_modifications(...) which also returns
(peptidoform, modifications); update the logic to call
to_modifications(str(assigned_mods), sequence) once, capture both peptidoform
and modifications from its return value, remove the earlier standalone
to_proforma(...) call (the variable peptidoform should be set from
to_modifications), and ensure any downstream uses (peptidoform, modifications)
reference these single-source variables (look for to_proforma, to_modifications,
assigned_mods_raw/assigned_mods, and peptidoform in this block).
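
The single-parse pattern suggested for the DIANN and FragPipe adapters can be sketched as follows. `to_proforma_stub`, `to_modifications_stub`, and the toy `"4:Oxidation"` input format are all invented for this illustration and do not reflect the real FragPipe `Assigned Modifications` syntax or the qpx helpers.

```python
from typing import Optional

PARSE_CALLS = {"n": 0}  # counts how often the toy parser runs


def to_proforma_stub(mods: str, sequence: str) -> str:
    """Toy parser: '4:Oxidation' places [Oxidation] after residue 4."""
    PARSE_CALLS["n"] += 1
    if not mods:
        return sequence
    pos_text, name = mods.split(":")
    pos = int(pos_text)
    return f"{sequence[:pos]}[{name}]{sequence[pos:]}"


def to_modifications_stub(mods: str, sequence: str) -> tuple[str, Optional[list[dict]]]:
    """Toy counterpart that parses once and returns both values."""
    peptidoform = to_proforma_stub(mods, sequence)
    pos_text, name = mods.split(":")
    return peptidoform, [{"name": name, "position": int(pos_text)}]


def build_row(mods: str, sequence: str) -> tuple[str, Optional[list[dict]]]:
    # One call populates both fields when modifications are present.
    if mods:
        peptidoform, modifications = to_modifications_stub(mods, sequence)
    else:
        peptidoform = to_proforma_stub(mods, sequence)
        modifications = None
    return peptidoform, modifications


row = build_row("4:Oxidation", "PEPMK")
```

With this shape, each row triggers exactly one parse regardless of which branch is taken.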

qpx/converters/ptm.py (1)

155-192: Leading-dot stripping only applies when parens are present.

_normalize_peptidoform returns early at line 155-156 when "(" not in peptidoform, so a bare ".PEPTIDEK" (no mods) would not have its leading "." removed. In practice the dot only appears with mzTab paren mods so this is probably fine, but if future callers pass bare dot-prefixed forms they would leak through unparsed. Worth considering moving removeprefix(".") above the early return, or documenting the assumption.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@qpx/converters/ptm.py` around lines 155 - 192, The function
_normalize_peptidoform currently only strips a leading "." after checking for
"(" so inputs like ".PEPTIDEK" bypass normalization; move the call peptidoform =
peptidoform.removeprefix(".") to the top of the function (before the if "(" not
in peptidoform early return) so leading dots are always removed, keeping the
rest of the existing parsing logic (the while loop, bracket conversion, and
N-term dash insertion) unchanged.
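
A minimal sketch of the reordering, assuming a much simpler conversion than the real `_normalize_peptidoform` (plain character replacement instead of the while loop, so nested parentheses are not handled). The function name is hypothetical.

```python
def normalize_peptidoform_sketch(peptidoform: str) -> str:
    # Strip the leading dot first so bare forms such as ".PEPTIDEK"
    # are normalized even when no parentheses are present.
    peptidoform = peptidoform.removeprefix(".")
    if "(" not in peptidoform:
        return peptidoform

    # Simplified conversion: parentheses become square brackets.
    converted = peptidoform.replace("(", "[").replace(")", "]")

    # Insert '-' after an N-terminal bracket if it is missing.
    if converted.startswith("["):
        close = converted.index("]")
        if converted[close + 1:close + 2] != "-":
            converted = converted[:close + 1] + "-" + converted[close + 1:]
    return converted


print(normalize_peptidoform_sketch(".PEPTIDEK"))
print(normalize_peptidoform_sketch(".(Acetyl)PEPK"))
```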

qpx/converters/quantms/feature_adapter.py (1)

982-1006: Stale _proforma_cache type annotation.

The cache now holds (peptidoform, modifications) tuples (see lines 1006 and 1140) but the annotation still says dict[tuple[str, str], list | None]. Update to dict[tuple[str, str], tuple[str, list | None]] (or tuple[str, Optional[list[dict]]]) for accuracy.

🛠️ Proposed fix

-        _proforma_cache: dict[tuple[str, str], list | None] = {}
+        _proforma_cache: dict[tuple[str, str], tuple[str, list[dict] | None]] = {}

Apply in both _transform_batch_lfq (line 982) and _transform_batch_isobaric (line 1117).

Also applies to: 1117-1140

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@qpx/converters/quantms/feature_adapter.py` around lines 982 - 1006, The
_proforma_cache annotation is stale: it currently reads dict[tuple[str, str],
list | None] but you store (peptidoform, modifications) tuples returned by
_from_proforma; update the type to reflect tuple[str, list | None] (or
tuple[str, Optional[list[dict]]]) wherever declared (notably the _proforma_cache
in _transform_batch_lfq and the corresponding cache in
_transform_batch_isobaric) so the key stays tuple[str,str] and the value is
tuple[str, list | None]; keep references to _from_proforma and the variables
peptidoform/modifications to locate the usage.
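
For illustration, a cache keyed by `(peptidoform, sequence)` whose values are the full parser result might be annotated as below. `parse_stub`, `cached_parse`, and the `ProformaResult` alias are hypothetical names, not identifiers from the qpx codebase.

```python
from typing import Optional

# Value type stored in the cache: (normalized peptidoform, modifications).
ProformaResult = tuple[str, Optional[list[dict]]]

CALLS = {"n": 0}  # counts parser invocations for the demonstration


def parse_stub(peptidoform: str, sequence: str) -> ProformaResult:
    """Hypothetical parser that returns the input unchanged with no mods."""
    CALLS["n"] += 1
    return peptidoform, None


def cached_parse(
    cache: dict[tuple[str, str], ProformaResult],
    peptidoform: str,
    sequence: str,
) -> ProformaResult:
    key = (peptidoform, sequence)
    if key not in cache:
        cache[key] = parse_stub(peptidoform, sequence)
    return cache[key]


proforma_cache: dict[tuple[str, str], ProformaResult] = {}
first = cached_parse(proforma_cache, "PEPK", "PEPK")
second = cached_parse(proforma_cache, "PEPK", "PEPK")
```

A named alias keeps the annotation identical in every function that declares such a cache.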



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b050e0cb-9464-42ea-9925-b9603468e36d

📥 Commits

Reviewing files that changed from the base of the PR and between 6429ad8 and e0f7f41.

⛔ Files ignored due to path filters (2)

docs/include/example.feature.parquet is excluded by !**/*.parquet
docs/include/example.psm.parquet is excluded by !**/*.parquet

📒 Files selected for processing (14)

docs/spec/psm.md
qpx/converters/diann/constants.py
qpx/converters/diann/feature_adapter.py
qpx/converters/fragpipe/constants.py
qpx/converters/fragpipe/feature_adapter.py
qpx/converters/fragpipe/psm_adapter.py
qpx/converters/maxquant/feature_adapter.py
qpx/converters/maxquant/psm_adapter.py
qpx/converters/ptm.py
qpx/converters/quantms/feature_adapter.py
qpx/converters/quantms/psm_adapter.py
qpx/core/data/schemas/psm.yaml
tests/converters/test_converters.py
tests/converters/test_ptm.py

coderabbitai · 2026-04-18T16:01:12Z

        peptidoform = to_proforma(
            str(row.get(r.get("modified_sequence", "Modified sequence"), "")),
        )
-        modifications = from_proforma(peptidoform, sequence) if peptidoform else None
+        _, modifications = from_proforma(peptidoform, sequence) if peptidoform else (None, None)


⚠️ Potential issue | 🟡 Minor

Inconsistent peptidoform handling vs. maxquant/psm_adapter.py.

Here the normalized peptidoform returned by from_proforma is discarded (_, modifications = ...), so the emitted peptidoform field keeps the raw to_proforma(...) output. In the sibling PSM adapter (and in quantms/psm_adapter.py) the pattern peptidoform, modifications = from_proforma(...) is used, meaning features and PSMs for the same peptide may now carry slightly different peptidoform strings. Consider aligning the feature adapter:

-        _, modifications = from_proforma(peptidoform, sequence) if peptidoform else (None, None)
+        peptidoform, modifications = (
+            from_proforma(peptidoform, sequence) if peptidoform else (peptidoform, None)
+        )

Note the fallback preserves the empty/original peptidoform instead of nulling it, since downstream fields (Line 382) assume it is a string.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@qpx/converters/maxquant/feature_adapter.py` around lines 286 - 289, The current code calls peptidoform = to_proforma(...) then discards the normalized peptidoform returned by from_proforma by doing "_, modifications = from_proforma(...)" which leaves the emitted peptidoform unnormalized and diverges from the PSM adapter; change the assignment to capture both the normalized peptidoform and modifications (peptidoform, modifications = from_proforma(peptidoform, sequence)) and if from_proforma returns None/None fall back to the original stringified to_proforma value so peptidoform stays a string; update the usage sites that expect a string (e.g., later logic that assumes peptidoform is a string) accordingly.
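
The divergence between the two adapters can be shown with a small sketch. `from_proforma_stub` is a hypothetical parser whose only normalization is inserting the N-terminal `-` separator; the two `feature_peptidoform_*` functions are illustrative, not the adapter code.

```python
def from_proforma_stub(peptidoform: str, sequence: str):
    """Hypothetical parser: normalizes the N-term separator, returns no mods."""
    normalized = peptidoform
    if normalized.startswith("["):
        close = normalized.index("]")
        if normalized[close + 1:close + 2] != "-":
            normalized = normalized[:close + 1] + "-" + normalized[close + 1:]
    return normalized, None


def feature_peptidoform_before(raw: str, sequence: str) -> str:
    # The normalized value is discarded, so the raw string is emitted.
    _, modifications = from_proforma_stub(raw, sequence) if raw else (None, None)
    return raw


def feature_peptidoform_after(raw: str, sequence: str) -> str:
    # The normalized value is kept; the fallback preserves the original
    # string so the result is never None.
    peptidoform, modifications = (
        from_proforma_stub(raw, sequence) if raw else (raw, None)
    )
    return peptidoform


print(feature_peptidoform_before("[Acetyl]PEPK", "PEPK"))
print(feature_peptidoform_after("[Acetyl]PEPK", "PEPK"))
```

With the second form, a feature row and a PSM row for the same peptide would carry the same peptidoform string, provided both adapters use the same parser.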

daichengxin and others added 19 commits November 29, 2023 08:41

Merge pull request #1 from bigbio/dev

3bcca2b

u

Merge pull request #2 from bigbio/dev

b234cb2

Dev

Merge pull request #3 from bigbio/dev

ef3a95e

Dev

Merge pull request #5 from bigbio/dev

348cab5

Dev

fixed example and doc

56bf30e

Update example.psm.parquet

297eaa6

fixed proforma

c37c82f

fixed

9d6528f

updated

c5a10f0

update

59b0d8c

Update test_ptm.py

4997959

fixed

abce64f

lint

e3a2d98

Update feature_adapter.py

30a4704

Update feature_adapter.py

f07f5dd

fix format

62244a9

format

84af3c7

fix(converters): fix None unpacking crashes, type annotations, line length, and sync schema

8c9cb91

Merge pull request #199 from daichengxin/dev

e0f7f41

fixed example and doc

coderabbitai Bot reviewed Apr 18, 2026


ypriverol merged commit 3f9982d into main Apr 18, 2026
15 checks passed

coderabbitai Bot mentioned this pull request Apr 20, 2026

Minor changes #202

Merged

Minor fix related with MSnet developments #200
ypriverol merged 19 commits into main from dev

3 participants
