feat(converters): add OpenMS QPX enrichment and MuData export support #204
ypriverol merged 8 commits into bigbio:dev
Not up to standards ⛔

🔴 Issues

| Category | Results |
|---|---|
| BestPractice | 1 medium |
| Documentation | 7 minor |

🟢 Metrics: 46 complexity · 6 duplication
@Shen-YuFei Why do we need a Dockerfile?
Pull request overview
Adds OpenMS -out_qpx enrichment support and improves dataset provenance/export ergonomics across QPX, including containerization for easier deployment.
Changes:
- Introduce `OpenMSConverter` + `qpxc convert openms` to enrich OpenMS `-out_qpx` core tables into a full QPX dataset (metadata + provenance + ontology + dataset tables).
- Improve MuData export robustness (NA filtering in `uns`, per-table intensity label detection, and h5py-safe string sanitization).
- Enhance DIA-NN conversion provenance (auto-detect DIA-NN version from log) and broaden run column extension stripping; add Docker image + GHCR publish workflow.
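The version auto-detection mentioned in the last item could look roughly like the sketch below. It assumes the log contains a banner of the form `DIA-NN <version> (...)`; the function name `detect_diann_version` and the regular expression are illustrative and are not taken from the QPX code.

```python
import re
from typing import Optional

# Assumption: the log carries a banner such as
# "DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)".
_VERSION_RE = re.compile(r"DIA-NN\s+(\d+(?:\.\d+)+)")


def detect_diann_version(log_text: str) -> Optional[str]:
    """Return the first DIA-NN version found in the log text, or None."""
    match = _VERSION_RE.search(log_text)
    return match.group(1) if match else None
```

Returning `None` rather than raising lets the caller record provenance without a version when the log is missing or has an unexpected format.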
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `qpx/converters/openms/converter.py` | New OpenMS enrichment orchestrator: discover/validate/copy core parquet files and generate metadata/provenance/ontology/dataset outputs. |
| `qpx/converters/openms/__init__.py` | Export `OpenMSConverter` from the OpenMS converters module. |
| `qpx/cli/convert.py` | Add `qpxc convert openms` command; add `--diann-log` option for DIA-NN provenance enrichment. |
| `qpx/converters/diann/converter.py` | Parse DIA-NN version from a summary log and record it in provenance + dataset metadata. |
| `qpx/converters/diann/pg_adapter.py` | Make run column detection/normalization more robust by stripping multiple common raw-file extensions. |
| `qpx/mudata.py` | Fix MuData export edge cases: handle `pd.NA`/`pd.NaT`, detect intensity labels per table, and sanitize string-like columns for h5mu serialization. |
| `qpx/converters/__init__.py` | Expose `OpenMSConverter` at the top-level converters package. |
| `tests/converters/test_openms_converter.py` | Add tests for OpenMS enrichment: file discovery, full bundle output, schema validation, provenance, and SDRF/no-SDRF behavior. |
| `tests/unit/test_mudata.py` | Add regression tests for MuData export metadata attachment and per-table intensity label detection. |
| `pyproject.toml` | Add `mudata` optional extra and include it in `all`/`dev` dependency sets. |
| `Dockerfile` | Provide a slim container build that installs QPX with `[mudata]` extra and sets version fallback for SCM-based versioning. |
| `.github/workflows/docker-publish.yml` | Add GHCR build/publish workflow for Docker image (push/tag/PR/manual triggers). |
| `.dockerignore` | Reduce Docker build context by excluding tests, docs, and other non-essential directories/files. |
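To illustrate the run-column normalization described for `qpx/converters/diann/pg_adapter.py`, here is a minimal sketch of stripping several raw-file extensions from a column name. The extension list and the helper name `strip_run_extension` are assumptions made for the example; the set QPX actually handles may differ.

```python
import re

# Hypothetical extension list, chosen only for illustration.
_RAW_EXTENSIONS = (".raw", ".mzml", ".d", ".wiff", ".dia")
_EXT_RE = re.compile(
    "(?:" + "|".join(re.escape(ext) for ext in _RAW_EXTENSIONS) + ")+$",
    re.IGNORECASE,
)


def strip_run_extension(column: str) -> str:
    """Drop any directory prefix and trailing raw-file extensions."""
    # Treat Windows and POSIX separators alike, then keep the file name only.
    name = column.replace("\\", "/").rsplit("/", 1)[-1]
    # Remove one or more known extensions anchored at the end of the name.
    return _EXT_RE.sub("", name)
```

Anchoring the pattern at the end of the string leaves ordinary columns such as `Protein.Group` unchanged.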
```python
if score and "name" in score and score["name"]:
    names.add(score["name"])
```

In `_collect_score_names`, the code looks for a `name` key inside each `additional_scores` entry, but the QPX score struct uses `score_name` (see `qpx/core/data/schemas/types.yaml`). As written, this will fail to collect any score names and will silently omit score-derived ontology entries. Update the extraction to read `score_name` (and consider guarding for both keys if you need backward compatibility).

Suggested change:

```diff
- if score and "name" in score and score["name"]:
-     names.add(score["name"])
+ if score:
+     score_name = score.get("score_name") or score.get("name")
+     if score_name:
+         names.add(score_name)
```
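For context, the suggestion can be exercised in isolation with a small standalone sketch. It assumes each row holds a list of score dicts (or `None`); the function name `collect_score_names` and the row shape are hypothetical stand-ins, not the converter's actual signature.

```python
from typing import Any, Iterable, Mapping, Optional, Set

ScoreList = Optional[Iterable[Optional[Mapping[str, Any]]]]


def collect_score_names(rows: Iterable[ScoreList]) -> Set[str]:
    """Collect distinct score names, preferring `score_name` over legacy `name`."""
    names: Set[str] = set()
    for scores in rows:
        for score in scores or ():
            if not score:
                continue  # skip None or empty entries
            score_name = score.get("score_name") or score.get("name")
            if score_name:
                names.add(score_name)
    return names
```

Reading `score_name` first and falling back to `name` keeps older data readable while matching the key the reviewer cites.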
```python
    "tool_uri": None,
    "parameters": None,
    "config": None,
    "output_views": [SAMPLE, RUN, ONTOLOGY],
},
```

`_build_provenance()` always lists `SAMPLE` and `RUN` in `output_views`, but `convert()` can skip SDRF conversion (and thus never create sample/run files). This makes provenance inaccurate for runs without `sdrf_path`. Consider building `output_views` dynamically based on whether SDRF conversion actually ran (and whether ontology was written).
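One way to build the list dynamically is sketched below. The constants are placeholder strings and the helper `build_output_views` with its two boolean flags is hypothetical; the real view names and the way `convert()` tracks what it wrote live in the QPX codebase.

```python
from typing import List

# Placeholder view names; the real constants are defined in QPX.
SAMPLE, RUN, ONTOLOGY = "sample", "run", "ontology"


def build_output_views(sdrf_converted: bool, ontology_written: bool) -> List[str]:
    """List only the views that were actually produced in this conversion."""
    views: List[str] = []
    if sdrf_converted:
        # Sample and run tables exist only when SDRF conversion ran.
        views.extend([SAMPLE, RUN])
    if ontology_written:
        views.append(ONTOLOGY)
    return views
```

The provenance record would then use the returned list in place of the hard-coded `[SAMPLE, RUN, ONTOLOGY]`.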
```python
help="SDRF metadata file path (for sample/run generation)",
required=True,
```

The `openms` CLI command requires `--sdrf-file`, but `OpenMSConverter` supports running without SDRF (and there's a test asserting sample/run are skipped when SDRF is absent). Either make `--sdrf-file` optional (and update help/examples accordingly), or remove the no-SDRF codepath/tests so the CLI and converter behavior stay consistent.

Suggested change:

```diff
- help="SDRF metadata file path (for sample/run generation)",
- required=True,
+ help="Optional SDRF metadata file path; if omitted, sample/run generation is skipped",
+ required=False,
```
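The behavior of the optional flag can be sketched with `argparse` from the standard library. This is used only to keep the example self-contained; the project's CLI framework may differ, and `build_parser` and `plan_outputs` are hypothetical names.

```python
import argparse
from typing import Any, Dict, List


def build_parser() -> argparse.ArgumentParser:
    """Parser with an optional --sdrf-file flag, mirroring the suggested change."""
    parser = argparse.ArgumentParser(prog="qpxc-openms-sketch")
    parser.add_argument(
        "--sdrf-file",
        required=False,
        default=None,
        help="Optional SDRF metadata file path; if omitted, sample/run generation is skipped",
    )
    return parser


def plan_outputs(argv: List[str]) -> Dict[str, Any]:
    """Decide whether sample/run generation should run for the given arguments."""
    args = build_parser().parse_args(argv)
    return {
        "sdrf_path": args.sdrf_file,
        "generate_sample_run": args.sdrf_file is not None,
    }
```

With no arguments the plan skips sample/run generation, which matches the no-SDRF path the converter tests already cover.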
@ypriverol QPX is not on BioConda yet, so there's no BioContainers image available. The Dockerfile is currently the only way to build the `ghcr.io/bigbio/qpx` image that quantms depends on. If you plan to publish QPX to BioConda, we can remove it afterwards.
This pull request introduces comprehensive support for OpenMS QPX enrichment and improves DIA-NN conversion accuracy in the QPX toolkit. The most significant changes are the addition of a new Docker build and publishing workflow, a Dockerfile for containerization, and enhancements to the DIA-NN and OpenMS conversion pipelines. These updates make it easier to deploy QPX as a container, expand supported workflows, and improve provenance tracking.
Dockerization and CI/CD:
- Added `.dockerignore` file to exclude non-essential files from Docker build context, ensuring smaller and more secure images.
- Added `Dockerfile` for building a minimal QPX image with the necessary dependencies and CLI tool, supporting the `[mudata]` extra.
- Added `.github/workflows/docker-publish.yml` to automate building and publishing the QPX Docker image to GHCR, triggered on pushes, tags, PRs, and manual runs.

OpenMS QPX Enrichment Support:
- Added `convert openms` command to the CLI (`qpx/cli/convert.py`), which enriches OpenMS ProteomicsLFQ `-out_qpx` Parquet files into a full QPX dataset using SDRF metadata.
- Added `OpenMSConverter` to the codebase, including module exports and new converter logic.

DIA-NN Conversion Improvements:
- Added a `--diann-log` option that auto-detects and records the DIA-NN version from the provided log file for improved provenance.

Dependency Management:
- Added `mudata` extra to `pyproject.toml` and included it in the `all` and `dev` extras to support MuData export in Docker and development environments.

These changes collectively improve QPX's usability, reproducibility, and compatibility with modern proteomics workflows.