Add PEP section parsing and peptide-protein group resolution #201

Merged
ypriverol merged 7 commits into bigbio:dev from Shen-YuFei:dev
Apr 20, 2026
@Shen-YuFei
Collaborator

This pull request refactors and extends the mzTab loader and quantms feature adapter to support peptide-level (PEP) sections, and improves how peptide-to-protein mappings are handled for downstream processing. The most significant changes are the generalization of the mzTab parser to handle multiple sections (including peptides), the addition of a peptide-to-protein mapping step, and the use of this mapping to resolve ambiguous protein groups in feature extraction.

mzTab parsing and section handling improvements:

  • Generalized the mzTab loader in qpx/converters/mztab.py to support parsing of the peptide (PEP) section, in addition to proteins and PSMs, by introducing a configurable section dispatch mechanism. This includes new constants for peptide section prefixes and a unified approach for handling headers and data lines across all sections.
  • Refactored both the classic and fast mzTab loading paths to use the new dispatch logic, ensuring consistent handling of proteins, peptides, and PSMs, and unified cleanup of temporary files in the fast loader.
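
A minimal sketch of what a prefix-based section dispatch can look like. This is not the code in qpx/converters/mztab.py: `SECTIONS` and `parse_mztab_lines` are hypothetical names, and the only facts taken as given are the three-letter line prefixes of the mzTab format (MTD for metadata, PRH/PRT, PEH/PEP and PSH/PSM for the header and data lines of each section).

```python
from collections import defaultdict

# (header_prefix, data_prefix, table_name); the prefixes come from the mzTab format.
# The table names follow the PR description; everything else is illustrative.
SECTIONS = [
    ("PRH", "PRT", "proteins"),
    ("PEH", "PEP", "peptides"),
    ("PSH", "PSM", "psms"),
]


def parse_mztab_lines(lines):
    """Split mzTab lines into metadata plus one list of row dicts per section."""
    header_map = {header: name for header, _, name in SECTIONS}
    data_map = {data: name for _, data, name in SECTIONS}
    headers = {}
    tables = defaultdict(list)
    metadata = {}

    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        parts = line.split("\t")
        prefix = parts[0]
        if prefix == "MTD" and len(parts) >= 3:
            metadata[parts[1]] = parts[2]
        elif prefix in header_map:
            # Remember the column names for this section.
            headers[header_map[prefix]] = parts[1:]
        elif prefix in data_map:
            name = data_map[prefix]
            columns = headers.get(name)
            if columns is None:
                continue  # data line seen before its header: skipped in this sketch
            tables[name].append(dict(zip(columns, parts[1:])))
    return metadata, dict(tables)


example = [
    "MTD\tmzTab-version\t1.0.0",
    "PRH\taccession\tdescription",
    "PRT\tP12345\tExample protein",
    "PEH\tsequence\taccession",
    "PEP\tPEPTIDEK\tP12345",
    "PSH\tsequence\tPSM_ID",
    "PSM\tPEPTIDEK\t1",
]
metadata, tables = parse_mztab_lines(example)
```

Adding a further section in this layout means adding one tuple to `SECTIONS`; the loop itself does not change, which is the extensibility the description refers to.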

Feature adapter and peptide-protein mapping:

  • Added construction of a peptide-to-protein mapping (_pep_protein_map) in qpx/converters/quantms/feature_adapter.py, using the newly loaded peptide section to map unambiguously resolved peptides to their corresponding protein accession. This mapping is now built as part of the LFQ conversion process.
  • Updated feature record extraction to use the peptide-to-protein map to resolve protein groups: if a peptide maps to multiple proteins but is unambiguously resolved in the PEP section, the correct single accession is used.
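
The two steps above can be sketched as follows. This is an illustration under stated assumptions, not the adapter's implementation: `build_pep_protein_map` and `resolve_accessions` are hypothetical helpers, peptide rows are assumed to be dicts with `sequence` and `accession` keys, and "unambiguous" is assumed to mean that every PEP row for a sequence names the same single accession. Only the resolution rule itself (split on `;`, replace a multi-accession group when the sequence is in the map) mirrors the snippet under review in this PR.

```python
def build_pep_protein_map(peptide_rows):
    """Map each peptide sequence to one accession when its PEP rows agree on it."""
    candidates = {}
    for row in peptide_rows:
        seq = row.get("sequence")
        acc = row.get("accession")
        if not seq or not acc or acc == "null":
            continue
        candidates.setdefault(seq, set()).add(acc)

    pep_map = {}
    for seq, accs in candidates.items():
        if len(accs) != 1:
            continue  # conflicting rows: leave the peptide unresolved
        (acc,) = accs
        if ";" in acc or "," in acc:
            continue  # assumption: a delimited value is still a group, not a resolution
        pep_map[seq] = acc
    return pep_map


def resolve_accessions(protein_name, sequence, pep_map):
    """Return the accession list for a feature, narrowed by the PEP map if possible."""
    acc_list = protein_name.split(";") if protein_name else []
    if len(acc_list) > 1 and sequence in pep_map:
        acc_list = [pep_map[sequence]]
    return acc_list


rows = [
    {"sequence": "AAAK", "accession": "P1"},
    {"sequence": "CCCK", "accession": "P2"},
    {"sequence": "CCCK", "accession": "P3"},
    {"sequence": "DDDK", "accession": "P4;P5"},
]
pep_map = build_pep_protein_map(rows)
resolved = resolve_accessions("P1;P9", "AAAK", pep_map)
unresolved = resolve_accessions("P2;P3", "CCCK", pep_map)
```

Single-accession features are left untouched by design, so the map only ever narrows a group and never overrides an assignment that was already unique.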

These changes make the mzTab loader more robust and extensible, and improve the biological accuracy of quantms feature extraction by leveraging peptide-level protein inference.

Copilot AI review requested due to automatic review settings April 20, 2026 02:36
@coderabbitai
Contributor

coderabbitai Bot commented Apr 20, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 12309324-b215-4506-8cc3-d53a3856037c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@codacy-production

codacy-production Bot commented Apr 20, 2026

Not up to standards ⛔

🔴 Issues 2 minor

Alerts:
⚠ 2 issues (≤ 0 issues of at least minor severity)

Results:
2 new issues

Category Results
Documentation 2 minor

🟢 Metrics 14 complexity · 2 duplication

Metric Results
Complexity 14
Duplication 2

Contributor

Copilot AI left a comment

Pull request overview

This PR extends the mzTab loading and QuantMS feature extraction pipeline to incorporate peptide (PEP) sections, and uses peptide-level protein inference to resolve ambiguous protein groups during feature record construction.

Changes:

  • Generalizes load_mztab_sections() to parse multiple mzTab sections (proteins, peptides, PSMs) via a dispatch mechanism, supporting both classic and fast load paths.
  • Adds a peptide→protein lookup built from the mzTab PEP section and uses it to resolve multi-accession protein groups in QuantMS LFQ feature extraction.
  • Refactors fast loader temp-file handling and cleanup into shared helpers.
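
One way to picture shared temp-file helpers of this kind is a context manager that creates one TSV per section and always removes them afterwards. The sketch below uses only the Python standard library; `section_temp_files` and `write_section` are hypothetical names, and the DuckDB ingestion step mentioned in the file summary is deliberately left out.

```python
import csv
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def section_temp_files(table_names):
    """Yield {table_name: path} for fresh temp TSVs and delete them on exit."""
    paths = {}
    try:
        for name in table_names:
            fd, path = tempfile.mkstemp(prefix=f"mztab_{name}_", suffix=".tsv")
            os.close(fd)
            paths[name] = path
        yield paths
    finally:
        # Runs on success and on error, so no temp file outlives the load.
        for path in paths.values():
            try:
                os.remove(path)
            except FileNotFoundError:
                pass


def write_section(path, columns, rows):
    """Write one section as a tab-separated file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(columns)
        writer.writerows(rows)


with section_temp_files(["proteins", "peptides", "psms"]) as paths:
    write_section(paths["peptides"], ["sequence", "accession"], [["PEPTIDEK", "P12345"]])
    # A real loader would hand each TSV to the database at this point.
    with open(paths["peptides"], encoding="utf-8", newline="") as handle:
        loaded = list(csv.reader(handle, delimiter="\t"))
    created = list(paths.values())

leftover = [path for path in created if os.path.exists(path)]
```

Keeping creation and removal in one place means every section gets the same cleanup guarantee, including when parsing fails part-way through.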

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File | Description
qpx/converters/mztab.py | Adds peptide section support via section dispatch; refactors fast loader to stream sections into temp TSVs for DuckDB ingestion.
qpx/converters/quantms/feature_adapter.py | Builds a peptide→protein mapping from the peptides table and uses it to resolve protein groups when forming feature records.

Comment thread qpx/converters/mztab.py
Comment on lines +80 to +85
# Section definitions: (header_prefix, data_prefix, table_name, dedup_col)
_MZTAB_SECTIONS = [
    (_PROTEIN_HEADER_PREFIX, _PROTEIN_LINE_PREFIX, "proteins", "accession"),
    (_PSM_HEADER_PREFIX, _PSM_LINE_PREFIX, "psms", "sequence"),
    (_PEPTIDE_HEADER_PREFIX, _PEPTIDE_LINE_PREFIX, "peptides", "sequence"),
]
Comment on lines 606 to +609
acc_list = protein_name.split(";") if protein_name else []
if len(acc_list) > 1 and sequence in _pep_map:
    resolved = _pep_map[sequence]
    acc_list = [resolved]

if not self._table_exists("peptides"):
    self.logger.info("No mzTab peptides table — skipping peptide protein map")
    return {}

Comment thread qpx/converters/mztab.py Outdated
Comment on lines +199 to +203
if not line:
    continue
parts = line.split("\t")
prefix = parts[0][:3] if parts else ""
_dispatch_line(prefix, parts, header_map, data_map, on_metadata, on_header, on_data)
@ypriverol merged commit 62a665b into bigbio:dev on Apr 20, 2026
7 of 8 checks passed

3 participants