Add PEP section parsing and peptide-protein group resolution by Shen-YuFei · Pull Request #201 · bigbio/qpx

Shen-YuFei · 2026-04-20T02:36:08Z

This pull request refactors and extends the mzTab loader and quantms feature adapter to support peptide-level (PEP) sections, and improves how peptide-to-protein mappings are handled for downstream processing. The most significant changes are the generalization of the mzTab parser to handle multiple sections (including peptides), the addition of a peptide-to-protein mapping step, and the use of this mapping to resolve ambiguous protein groups in feature extraction.

mzTab parsing and section handling improvements:

Generalized the mzTab loader in qpx/converters/mztab.py to support parsing of the peptide (PEP) section, in addition to proteins and PSMs, by introducing a configurable section dispatch mechanism. This includes new constants for peptide section prefixes and a unified approach for handling headers and data lines across all sections. [1] [2]
Refactored both the classic and fast mzTab loading paths to use the new dispatch logic, so proteins, peptides, and PSMs are handled consistently, and unified the cleanup of temporary files in the fast loader.
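As a rough illustration of what prefix-based section dispatch means, here is a minimal sketch. It is not the qpx implementation: it assumes the section constants hold the standard mzTab 1.0 prefixes (`PRH`/`PRT` for proteins, `PSH`/`PSM` for PSMs, `PEH`/`PEP` for peptides), and the names `SECTIONS` and `parse_mztab_sections` are hypothetical. Each header line records column names for its table, and each data line is zipped against the most recent header of the same section.

```python
from typing import Dict, List

# Hypothetical section table: (header_prefix, data_prefix, table_name),
# assuming the standard mzTab 1.0 line prefixes.
SECTIONS = [
    ("PRH", "PRT", "proteins"),
    ("PSH", "PSM", "psms"),
    ("PEH", "PEP", "peptides"),
]
HEADER_MAP = {hdr: table for hdr, _, table in SECTIONS}
DATA_MAP = {data: table for _, data, table in SECTIONS}


def parse_mztab_sections(text: str) -> Dict[str, List[Dict[str, str]]]:
    """Split mzTab text into {table_name: [row_dict, ...]} using one dispatch table."""
    headers: Dict[str, List[str]] = {}
    tables: Dict[str, List[Dict[str, str]]] = {table: [] for _, _, table in SECTIONS}
    for line in text.splitlines():
        if not line:
            continue
        parts = line.split("\t")
        prefix = parts[0]
        if prefix in HEADER_MAP:
            # Header row: remember the column names (without the prefix column).
            headers[HEADER_MAP[prefix]] = parts[1:]
        elif prefix in DATA_MAP:
            table = DATA_MAP[prefix]
            columns = headers.get(table)
            if columns is None:
                raise ValueError(f"{prefix} line seen before its header")
            tables[table].append(dict(zip(columns, parts[1:])))
        # MTD, COM and any other prefixes are ignored in this sketch.
    return tables
```

Adding a new section then only requires one more entry in the table rather than another branch in the parser loop.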

Feature adapter and peptide-protein mapping:

Added construction of a peptide-to-protein mapping (_pep_protein_map) in qpx/converters/quantms/feature_adapter.py, using the newly loaded peptide section to map unambiguously resolved peptides to their corresponding protein accession. This mapping is now built as part of the LFQ conversion process. [1] [2] [3]
Updated feature record extraction to use the peptide-to-protein map to resolve protein groups: if a peptide maps to multiple proteins but is unambiguously resolved in the PEP section, the correct single accession is used. [1] [2]
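The PR does not spell out the criterion for "unambiguously resolved", so the sketch below assumes one: a peptide sequence is kept only if all of its PEP rows name the same single accession, with multi-protein values assumed to be `;`-separated (as the feature adapter's `protein_name` is). The function name `build_pep_protein_map` is hypothetical; in the PR the result is stored as `_pep_protein_map`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set


def build_pep_protein_map(peptide_rows: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Map peptide sequence -> accession, keeping only unambiguous peptides.

    Assumption: a peptide is unambiguous when every PEP row for that sequence
    yields exactly one distinct accession after splitting on ';'.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for row in peptide_rows:
        sequence = row.get("sequence")
        if not sequence:
            continue
        for accession in (row.get("accession") or "").split(";"):
            accession = accession.strip()
            if accession:
                seen[sequence].add(accession)
    # Drop every sequence that points at more than one protein.
    return {seq: next(iter(accs)) for seq, accs in seen.items() if len(accs) == 1}
```

Sequences with conflicting or multi-protein assignments are left out of the map, so feature extraction falls back to the original protein group for them.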

These changes make the mzTab loader more robust and extensible, and improve the biological accuracy of quantms feature extraction by leveraging peptide-level protein inference.


coderabbitai · 2026-04-20T02:36:15Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 12309324-b215-4506-8cc3-d53a3856037c


codacy-production · 2026-04-20T02:38:03Z

Not up to standards ⛔

🔴 Issues 2 minor

Alerts:
⚠ 2 issues (threshold: ≤ 0 issues of at least minor severity)

Results:
2 new issues

Category	Results
Documentation	2 minor


🟢 Metrics 14 complexity · 2 duplication

Metric	Results
Complexity	14
Duplication	2



Copilot

Pull request overview

This PR extends the mzTab loading and QuantMS feature extraction pipeline to incorporate peptide (PEP) sections, and uses peptide-level protein inference to resolve ambiguous protein groups during feature record construction.

Changes:

Generalizes load_mztab_sections() to parse multiple mzTab sections (proteins, peptides, PSMs) via a dispatch mechanism, supporting both classic and fast load paths.
Adds a peptide→protein lookup built from the mzTab PEP section and uses it to resolve multi-accession protein groups in QuantMS LFQ feature extraction.
Refactors fast loader temp-file handling and cleanup into shared helpers.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
`qpx/converters/mztab.py`	Adds peptide section support via section dispatch; refactors fast loader to stream sections into temp TSVs for DuckDB ingestion.
`qpx/converters/quantms/feature_adapter.py`	Builds a peptide→protein mapping from the `peptides` table and uses it to resolve protein groups when forming feature records.
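The fast-loader change (streaming sections into temp TSVs, with shared cleanup helpers) can be pictured roughly as below. This is a sketch under assumptions, not the PR's code: `PREFIX_TO_TABLE`, `stream_sections_to_tsv`, and `cleanup_temp_files` are hypothetical names, the standard mzTab 1.0 prefixes are assumed, and the DuckDB ingestion step (for example via DuckDB's `read_csv` table function) is deliberately left out so the example stays standard-library only.

```python
import os
import tempfile
from typing import Dict, Iterable, TextIO

# Hypothetical prefix -> table lookup, assuming standard mzTab 1.0 prefixes.
PREFIX_TO_TABLE = {
    "PRH": "proteins", "PRT": "proteins",
    "PSH": "psms", "PSM": "psms",
    "PEH": "peptides", "PEP": "peptides",
}


def cleanup_temp_files(paths: Iterable[str]) -> None:
    """Shared cleanup helper: delete temp files, ignoring ones already removed."""
    for path in paths:
        try:
            os.remove(path)
        except FileNotFoundError:
            pass


def stream_sections_to_tsv(lines: Iterable[str], tmp_dir: str) -> Dict[str, str]:
    """Stream each section's header and data rows (prefix column dropped) into one TSV per table.

    Returns {table_name: tsv_path}. If anything fails mid-stream, every file
    created so far is closed and deleted before the exception propagates.
    """
    paths: Dict[str, str] = {}
    handles: Dict[str, TextIO] = {}
    completed = False
    try:
        for raw in lines:
            line = raw.rstrip("\r\n")
            if not line:
                continue
            prefix, _, rest = line.partition("\t")
            table = PREFIX_TO_TABLE.get(prefix)
            if table is None:
                continue  # MTD, COM, SMH/SML, ... are not streamed here
            if table not in handles:
                fd, path = tempfile.mkstemp(prefix=f"{table}_", suffix=".tsv", dir=tmp_dir)
                paths[table] = path
                handles[table] = os.fdopen(fd, "w", encoding="utf-8")
            handles[table].write(rest + "\n")
        completed = True
    finally:
        # Always close handles first, then delete partial output on failure.
        for handle in handles.values():
            handle.close()
        if not completed:
            cleanup_temp_files(paths.values())
    return paths
```

Using a completion flag in `finally` instead of a broad `except Exception` keeps the cleanup path in one place and avoids the kind of broad-exception warning (W0718) that Codacy flagged on this PR.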


```diff
+# Section definitions: (header_prefix, data_prefix, table_name, dedup_col)
+_MZTAB_SECTIONS = [
+    (_PROTEIN_HEADER_PREFIX, _PROTEIN_LINE_PREFIX, "proteins", "accession"),
+    (_PSM_HEADER_PREFIX, _PSM_LINE_PREFIX, "psms", "sequence"),
+    (_PEPTIDE_HEADER_PREFIX, _PEPTIDE_LINE_PREFIX, "peptides", "sequence"),
+]
```


```diff
                acc_list = protein_name.split(";") if protein_name else []
+                if len(acc_list) > 1 and sequence in _pep_map:
+                    resolved = _pep_map[sequence]
+                    acc_list = [resolved]
```
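A possible defensive variant of the hunk above, offered only as a sketch: it collapses a multi-accession group to the PEP-resolved accession only when that accession is already a member of the group, so a stale or mismatched map entry cannot introduce a protein that was never in the group. The membership check is an assumption of this sketch, not something the PR states; `resolve_protein_group` is a hypothetical name.

```python
from typing import Dict, List


def resolve_protein_group(protein_name: str, sequence: str, pep_map: Dict[str, str]) -> List[str]:
    """Collapse a ';'-separated protein group to one accession when the PEP map resolves it.

    Assumption: only collapse when the resolved accession is a group member.
    """
    acc_list = protein_name.split(";") if protein_name else []
    if len(acc_list) > 1 and sequence in pep_map:
        resolved = pep_map[sequence]
        if resolved in acc_list:
            acc_list = [resolved]
    return acc_list
```

Single-accession groups and peptides absent from the map pass through unchanged, matching the behaviour of the original condition.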


```diff
+        if not self._table_exists("peptides"):
+            self.logger.info("No mzTab peptides table — skipping peptide protein map")
+            return {}
+
```
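The guard above means mzTab files without a PEP section simply produce an empty map instead of an error. The sketch below reproduces that behaviour with the standard-library `sqlite3` module as a stand-in for the DuckDB connection qpx actually uses; `table_exists` and `load_pep_map` are hypothetical helpers, and the "one distinct, non-grouped accession" SQL criterion is an assumption, not the PR's query.

```python
import logging
import sqlite3
from typing import Dict

logger = logging.getLogger("pep_map_sketch")


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Return True if a table called `name` exists (sqlite3 stand-in for DuckDB)."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def load_pep_map(conn: sqlite3.Connection) -> Dict[str, str]:
    """Return {} when there is no peptides table, mirroring the guard in the hunk."""
    if not table_exists(conn, "peptides"):
        logger.info("No mzTab peptides table; skipping peptide protein map")
        return {}
    # Keep sequences with exactly one distinct accession that is not itself a ';' group.
    rows = conn.execute(
        "SELECT sequence, MIN(accession) FROM peptides "
        "GROUP BY sequence "
        "HAVING COUNT(DISTINCT accession) = 1 AND MIN(accession) NOT LIKE '%;%'"
    ).fetchall()
    return dict(rows)
```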


```diff
+            if not line:
+                continue
+            parts = line.split("\t")
+            prefix = parts[0][:3] if parts else ""
+            _dispatch_line(prefix, parts, header_map, data_map, on_metadata, on_header, on_data)
```
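The call above routes each line through a single dispatcher with three callbacks, which lets the classic and fast paths share the routing while doing different work per line. A minimal sketch of such a dispatcher follows; the callback signatures, the `MTD` handling, and the map contents (standard mzTab 1.0 prefixes) are assumptions, and `dispatch_line` is a hypothetical stand-in for the PR's `_dispatch_line`.

```python
from typing import Callable, Dict, List

# Hypothetical lookup tables: header/data prefix -> table name.
HEADER_MAP = {"PRH": "proteins", "PSH": "psms", "PEH": "peptides"}
DATA_MAP = {"PRT": "proteins", "PSM": "psms", "PEP": "peptides"}


def dispatch_line(
    prefix: str,
    parts: List[str],
    header_map: Dict[str, str],
    data_map: Dict[str, str],
    on_metadata: Callable[[List[str]], None],
    on_header: Callable[[str, List[str]], None],
    on_data: Callable[[str, List[str]], None],
) -> None:
    """Route one split mzTab line to the matching callback (illustrative only)."""
    if prefix == "MTD":
        on_metadata(parts)
    elif prefix in header_map:
        on_header(header_map[prefix], parts[1:])
    elif prefix in data_map:
        on_data(data_map[prefix], parts[1:])
    # Other prefixes (COM, SMH/SML, ...) fall through silently.
```

In the classic path the callbacks might accumulate rows in memory, while in the fast path they might write to per-section temp files; the dispatcher itself stays the same.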



Shen-YuFei added 2 commits April 18, 2026 23:27

feat(mztab): add PEP section parsing and peptide-protein group resolution

cf54ec2

Merge branch 'bigbio:dev' into dev

f296530

Copilot AI review requested due to automatic review settings April 20, 2026 02:36

Copilot started reviewing on behalf of Shen-YuFei April 20, 2026 02:36

Copilot AI reviewed Apr 20, 2026


Shen-YuFei added 5 commits April 20, 2026 10:56

fix(quantms): address Codacy + Copilot PR bigbio#201 review issues

bc3631f

Merge branch 'dev' of https://github.com/Shen-YuFei/qpx into dev

861370b

fix(mztab): resolve Codacy PR bigbio#201 issues (R0913, C0301, W0718, R1732, D407/D413)

f006959

fix(codacy): fix C0301, D413 and remove invalid prospector config from .codacy.yml

d859bd8

fix(codacy): disable pylint C0301 (ruff enforces line-length=130)

90d171e

ypriverol merged commit 62a665b into bigbio:dev Apr 20, 2026 (7 commits from Shen-YuFei:dev)
7 of 8 checks passed
