feat(converters): add OpenMS QPX enrichment and MuData export support #204

Merged
ypriverol merged 8 commits into bigbio:dev from Shen-YuFei:dev
Apr 28, 2026

@Shen-YuFei
Collaborator

This pull request introduces comprehensive support for OpenMS QPX enrichment and improves DIA-NN conversion accuracy in the QPX toolkit. The most significant changes are the addition of a new Docker build and publishing workflow, a Dockerfile for containerization, and enhancements to the DIA-NN and OpenMS conversion pipelines. These updates make it easier to deploy QPX as a container, expand supported workflows, and improve provenance tracking.

Dockerization and CI/CD:

  • Added a .dockerignore file to exclude non-essential files from Docker build context, ensuring smaller and more secure images.
  • Introduced a Dockerfile for building a minimal QPX image with the necessary dependencies and CLI tool, supporting the [mudata] extra.
  • Added a GitHub Actions workflow .github/workflows/docker-publish.yml to automate building and publishing the QPX Docker image to GHCR, triggered on pushes, tags, PRs, and manual runs.

OpenMS QPX Enrichment Support:

  • Added a new convert openms command to the CLI (qpx/cli/convert.py), which enriches OpenMS ProteomicsLFQ -out_qpx Parquet files into a full QPX dataset using SDRF metadata.
  • Implemented and exposed OpenMSConverter in the codebase, including module exports and new converter logic.

DIA-NN Conversion Improvements:

  • Enhanced DIA-NN conversion to accept a --diann-log option, auto-detecting and recording the DIA-NN version from the provided log file for improved provenance.
  • Improved the DIA-NN protein group adapter to robustly strip common file extensions from run column names, supporting more file types and ensuring consistency.
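For readers who want a concrete picture of the run-column normalization, here is a minimal sketch. It is not the code in qpx/converters/diann/pg_adapter.py: the function name `strip_run_extension`, the `RAW_EXTENSIONS` list, and the directory-stripping step are all assumptions made for illustration.

```python
import re

# Hypothetical extension list; the adapter in this PR may recognize a different set.
RAW_EXTENSIONS = (".raw", ".mzML", ".mzML.gz", ".d", ".wiff", ".dia")


def strip_run_extension(column: str) -> str:
    """Return a run name without its directory or a known raw-file extension."""
    # Assumption: run columns may carry a full Windows or POSIX path.
    name = re.split(r"[\\/]", column)[-1]
    lowered = name.lower()
    # Check longer extensions first so ".mzML.gz" is removed as a whole.
    for ext in sorted(RAW_EXTENSIONS, key=len, reverse=True):
        if lowered.endswith(ext.lower()):
            return name[: -len(ext)]
    # Columns without a known extension (e.g. annotation columns) pass through.
    return name


if __name__ == "__main__":
    for col in ("/data/run01.raw", "run02.d", "Protein.Group"):
        print(col, "->", strip_run_extension(col))
```

Matching case-insensitively and leaving unknown suffixes untouched keeps annotation columns such as Protein.Group intact.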

Dependency Management:

  • Added a new mudata extra to pyproject.toml and included it in the all and dev extras to support MuData export in Docker and development environments.

These changes collectively improve QPX's usability, reproducibility, and compatibility with modern proteomics workflows.

Copilot AI review requested due to automatic review settings April 25, 2026 09:03
@coderabbitai
Contributor

coderabbitai Bot commented Apr 25, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4f06ee36-dc2c-4601-b8bb-539646339874

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@Shen-YuFei Shen-YuFei requested review from ypriverol and removed request for Copilot April 25, 2026 09:03
@codacy-production

codacy-production Bot commented Apr 25, 2026

Not up to standards ⛔

🔴 Issues 1 medium · 7 minor

Alerts:
⚠ 8 issues (≤ 0 issues of at least minor severity)

Results:
8 new issues

Category Results
BestPractice 1 medium
Documentation 7 minor


🟢 Metrics 46 complexity · 6 duplication

Metric Results
Complexity 46
Duplication 6


@ypriverol
Member

@Shen-YuFei Why do we need a DockerFile?

Contributor

Copilot AI left a comment


Pull request overview

Adds OpenMS -out_qpx enrichment support and improves dataset provenance/export ergonomics across QPX, including containerization for easier deployment.

Changes:

  • Introduce OpenMSConverter + qpxc convert openms to enrich OpenMS -out_qpx core tables into a full QPX dataset (metadata + provenance + ontology + dataset tables).
  • Improve MuData export robustness (NA filtering in uns, per-table intensity label detection, and h5py-safe string sanitization).
  • Enhance DIA-NN conversion provenance (auto-detect DIA-NN version from log) and broaden run column extension stripping; add Docker image + GHCR publish workflow.
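As an illustration of the DIA-NN version auto-detection mentioned above, the following sketch extracts a version string from log text. It assumes the log contains a banner line of the form "DIA-NN 1.8.1 (...)". The function name `detect_diann_version` and the regular expression are hypothetical and are not taken from qpx/converters/diann/converter.py.

```python
import re
from typing import Optional

# Assumption: the log includes a banner such as
# "DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)".
_VERSION_RE = re.compile(r"DIA-NN\s+v?(\d+(?:\.\d+)+)")


def detect_diann_version(log_text: str) -> Optional[str]:
    """Return the first DIA-NN version found in the log text, or None."""
    match = _VERSION_RE.search(log_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    banner = "DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)\n"
    print(detect_diann_version(banner))
```

Returning None when no banner is found lets the caller fall back to an explicitly supplied version instead of recording a wrong one in provenance.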

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| qpx/converters/openms/converter.py | New OpenMS enrichment orchestrator: discover/validate/copy core parquet files and generate metadata/provenance/ontology/dataset outputs. |
| qpx/converters/openms/__init__.py | Export OpenMSConverter from the OpenMS converters module. |
| qpx/cli/convert.py | Add qpxc convert openms command; add --diann-log option for DIA-NN provenance enrichment. |
| qpx/converters/diann/converter.py | Parse DIA-NN version from a summary log and record it in provenance + dataset metadata. |
| qpx/converters/diann/pg_adapter.py | Make run column detection/normalization more robust by stripping multiple common raw-file extensions. |
| qpx/mudata.py | Fix MuData export edge cases: handle pd.NA/pd.NaT, detect intensity labels per table, and sanitize string-like columns for h5mu serialization. |
| qpx/converters/__init__.py | Expose OpenMSConverter at the top-level converters package. |
| tests/converters/test_openms_converter.py | Add tests for OpenMS enrichment: file discovery, full bundle output, schema validation, provenance, and SDRF/no-SDRF behavior. |
| tests/unit/test_mudata.py | Add regression tests for MuData export metadata attachment and per-table intensity label detection. |
| pyproject.toml | Add mudata optional extra and include it in all/dev dependency sets. |
| Dockerfile | Provide a slim container build that installs QPX with [mudata] extra and sets version fallback for SCM-based versioning. |
| .github/workflows/docker-publish.yml | Add GHCR build/publish workflow for Docker image (push/tag/PR/manual triggers). |
| .dockerignore | Reduce Docker build context by excluding tests, docs, and other non-essential directories/files. |


Comment on lines +65 to +66
    if score and "name" in score and score["name"]:
        names.add(score["name"])
Copilot AI Apr 25, 2026

In _collect_score_names, the code looks for a name key inside each additional_scores entry, but the QPX score struct uses score_name (see qpx/core/data/schemas/types.yaml). As written, this will fail to collect any score names and will silently omit score-derived ontology entries. Update the extraction to read score_name (and consider guarding for both keys if you need backward compatibility).

Suggested change
-    if score and "name" in score and score["name"]:
-        names.add(score["name"])
+    if score:
+        score_name = score.get("score_name") or score.get("name")
+        if score_name:
+            names.add(score_name)

Comment on lines +242 to +246
        "tool_uri": None,
        "parameters": None,
        "config": None,
        "output_views": [SAMPLE, RUN, ONTOLOGY],
    },
Copilot AI Apr 25, 2026

_build_provenance() always lists SAMPLE and RUN in output_views, but convert() can skip SDRF conversion (and thus never create sample/run files). This makes provenance inaccurate for runs without sdrf_path. Consider building output_views dynamically based on whether SDRF conversion actually ran (and whether ontology was written).
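One way to build the list dynamically is sketched below. This is only an illustration of the suggestion, not the converter's code: the helper name `build_output_views`, its two boolean parameters, and the string values assigned to SAMPLE, RUN and ONTOLOGY are assumptions.

```python
from typing import List

# Placeholder values; in QPX these names refer to existing view constants.
SAMPLE, RUN, ONTOLOGY = "sample", "run", "ontology"


def build_output_views(sdrf_converted: bool, ontology_written: bool) -> List[str]:
    """List only the views that were actually produced by this conversion."""
    views: List[str] = []
    if sdrf_converted:
        # Sample and run tables exist only when SDRF conversion ran.
        views.extend([SAMPLE, RUN])
    if ontology_written:
        views.append(ONTOLOGY)
    return views


if __name__ == "__main__":
    print(build_output_views(sdrf_converted=False, ontology_written=True))
```

With this shape, a run without sdrf_path records only the ontology view, so provenance matches the files on disk.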

Comment thread qpx/cli/convert.py
Comment on lines +804 to +805
    help="SDRF metadata file path (for sample/run generation)",
    required=True,
Copilot AI Apr 25, 2026

The openms CLI command requires --sdrf-file, but OpenMSConverter supports running without SDRF (and there’s a test asserting sample/run are skipped when SDRF is absent). Either make --sdrf-file optional (and update help/examples accordingly), or remove the no-SDRF codepath/tests so the CLI and converter behavior stay consistent.

Suggested change
-    help="SDRF metadata file path (for sample/run generation)",
-    required=True,
+    help="Optional SDRF metadata file path; if omitted, sample/run generation is skipped",
+    required=False,
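
To show the behavior an optional flag implies, here is a small standalone sketch. It uses argparse from the standard library as a stand-in, since this thread does not show which CLI framework qpx/cli/convert.py uses. The program name `openms-sketch` and the helper `should_generate_sample_run` are hypothetical.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser where --sdrf-file is optional (stand-in for the real CLI)."""
    parser = argparse.ArgumentParser(prog="openms-sketch")
    parser.add_argument(
        "--sdrf-file",
        required=False,
        default=None,
        help="Optional SDRF metadata file path; if omitted, sample/run generation is skipped",
    )
    return parser


def should_generate_sample_run(argv: List[str]) -> bool:
    """Sample/run generation happens only when an SDRF path was supplied."""
    args = build_parser().parse_args(argv)
    return args.sdrf_file is not None


if __name__ == "__main__":
    print(should_generate_sample_run([]))
    print(should_generate_sample_run(["--sdrf-file", "experiment.sdrf.tsv"]))
```

Keeping the flag optional matches the converter's existing no-SDRF code path and the test that asserts sample/run outputs are skipped.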

@Shen-YuFei
Collaborator Author

> @Shen-YuFei Why do we need a DockerFile?

@ypriverol QPX is not on BioConda yet, so there's no BioContainers image available. The Dockerfile is currently the only way to build the ghcr.io/bigbio/qpx image that quantms depends on. If you plan to publish QPX to BioConda, we can remove it afterwards.

@ypriverol ypriverol merged commit dc2a252 into bigbio:dev Apr 28, 2026
7 of 8 checks passed

3 participants