From 0ba27e92f9f3b1c0d14ba259341a6ba5822ce943 Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Tue, 2 Dec 2025 16:04:38 -0600 Subject: [PATCH 01/12] Add design docs for image/profile refactor MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Proposal to separate image storage from execution configuration: - Clean registry URL scheme (docker://, quay://, ghcr://) - Native OCI format storage with git-annex tracking - Execution profiles (YAML) for reusable run configurations - Profile inheritance with clobber semantics - CLI overrides (--image, --exec) for flexibility Builds on the skopeo branch which implements OCI storage. Includes: - Main proposal: docs/design/image-container-refactor.md - ReproNim integration: docs/design/image-container-refactor-repronim.md - Tutorial: docs/design/tutorial-mriqc-workflow.md - ReproNim example: docs/design/repronim-containers-mriqc-example.md 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- .../image-container-refactor-repronim.md | 387 ++++++++++++++++ docs/design/image-container-refactor.md | 437 ++++++++++++++++++ .../repronim-containers-mriqc-example.md | 332 +++++++++++++ docs/design/tutorial-mriqc-workflow.md | 237 ++++++++++ 4 files changed, 1393 insertions(+) create mode 100644 docs/design/image-container-refactor-repronim.md create mode 100644 docs/design/image-container-refactor.md create mode 100644 docs/design/repronim-containers-mriqc-example.md create mode 100644 docs/design/tutorial-mriqc-workflow.md diff --git a/docs/design/image-container-refactor-repronim.md b/docs/design/image-container-refactor-repronim.md new file mode 100644 index 0000000..302c3bf --- /dev/null +++ b/docs/design/image-container-refactor-repronim.md @@ -0,0 +1,387 @@ +# ReproNim/containers Integration with Refactored datalad-container + +This document describes how ReproNim/containers can leverage the refactored datalad-container architecture. 
+ +--- + +## New Capabilities + +### 1. Multiple Image Formats and Versions + +With native format storage and versioned image directories, ReproNim can provide: + +**Multiple versions per image:** +``` +.datalad/containers/images/ +├── mriqc/ +│ ├── 23.1.0/ +│ │ ├── image/ # OCI directory +│ │ └── image.sif # Optional SIF +│ └── 24.0.0/ +│ └── image/ +└── fmriprep/ + ├── 23.2.0/ + │ └── image/ + └── 24.1.0/ + └── image/ +``` + +### Benefits of OCI format for ReproNim: + +- **Layer deduplication** - Many neuroimaging containers share base layers +- **Incremental updates** - Only changed layers need to be fetched +- **Registry retrieval** - git-annex can fetch layers directly from Docker Hub +- **Multi-runtime** - Works with apptainer, podman, docker without conversion + +--- + +## 2. Execution Profiles + +ReproNim ships curated base profiles alongside images. Users extend these for their specific needs. + +### ReproNim Base Profiles + +```yaml +# .datalad/containers/profiles/mriqc.yaml +# Base MRIQC profile - sane defaults for most users + +image: mriqc/23.1.0 +exec: apptainer exec --cleanenv {img} {cmd} +``` + +```yaml +# .datalad/containers/profiles/fmriprep.yaml +# Base fMRIPrep profile + +image: fmriprep/23.2.0 +exec: apptainer exec --cleanenv {img} {cmd} +``` + +### User Extensions + +Users create their own profiles that extend ReproNim's base: + +```yaml +# my-analysis/.datalad/containers/profiles/mriqc-mylab.yaml + +extends: inputs/containers/.datalad/containers/profiles/mriqc.yaml +exec: apptainer exec --cleanenv --nv --bind /scratch:/scratch --bind /data/mylab:/input {img} {cmd} +``` + +```yaml +# my-analysis/.datalad/containers/profiles/mriqc-gpu.yaml + +extends: inputs/containers/.datalad/containers/profiles/mriqc.yaml +exec: apptainer exec --cleanenv --nv {img} {cmd} +``` + +**Key point:** ReproNim provides the base. Users clobber `exec` with their environment-specific settings. No runtime-specific profiles (apptainer-gpu, podman-default, etc.) 
- users know what they need. + +--- + +## 3. Dataset Structure + +### Proposed ReproNim/containers layout: + +``` +ReproNim/containers/ +├── .datalad/containers/ +│ ├── images/ +│ │ ├── mriqc/ +│ │ │ ├── 23.1.0/ +│ │ │ │ ├── image/ +│ │ │ │ └── image.sif +│ │ │ └── 24.0.0/ +│ │ │ └── image/ +│ │ └── fmriprep/ +│ │ ├── 23.2.0/ +│ │ │ └── image/ +│ │ └── 24.1.0/ +│ │ └── image/ +│ └── profiles/ +│ ├── mriqc.yaml +│ ├── mriqc-24.yaml +│ ├── fmriprep.yaml +│ └── fmriprep-24.yaml +│ +├── binds/ +│ ├── HOME/ # Fake home with .bashrc, .gitconfig +│ └── zoneinfo/UTC # Timezone file +│ +├── scripts/ +│ ├── setup-env.sh # Pre-run hook for profiles +│ ├── cleanup.sh # Post-run hook for profiles +│ └── freeze_versions # Version pinning tool +│ +└── README.md +``` + +Provenance (source URL, digest, fetch time) is stored in git commits, not separate files. + +--- + +## 4. Workflow Examples + +### Basic Usage + +```bash +# Get ReproNim containers +datalad clone https://github.com/ReproNim/containers inputs/containers + +# List available images and profiles +datalad containers-images -d inputs/containers +datalad containers-profiles -d inputs/containers + +# Run with base profile +datalad containers-run -d inputs/containers --profile mriqc \ + mriqc /bids /outputs participant +``` + +### Creating a Lab-Specific Profile + +```bash +# Create your own profile extending ReproNim's base +mkdir -p .datalad/containers/profiles +cat > .datalad/containers/profiles/mriqc-mylab.yaml << 'EOF' +extends: inputs/containers/.datalad/containers/profiles/mriqc.yaml +exec: apptainer exec --cleanenv --nv --bind /scratch:/scratch --bind /gpfs/mylab:/data {img} {cmd} +EOF + +# Use it +datalad containers-run --profile mriqc-mylab \ + mriqc /data/bids /data/outputs participant +``` + +### One-Off Override + +```bash +# Use base profile but override exec for this run +datalad containers-run -d inputs/containers --profile mriqc \ + --exec "apptainer exec --cleanenv --nv {img} {cmd}" \ + mriqc /bids 
/outputs participant +``` + +### Using a Specific Version + +```bash +# Create profile for newer version +cat > .datalad/containers/profiles/mriqc-24.yaml << 'EOF' +image: mriqc/24.0.0 +exec: apptainer exec --cleanenv {img} {cmd} +EOF + +datalad containers-run --profile mriqc-24 \ + mriqc /bids /outputs participant +``` + +--- + +## 5. Migration Path + +### Phase 1: Add new structure (non-breaking) + +- Add `.datalad/containers/images/` with versioned directories +- Add `.datalad/containers/profiles/` YAML files +- Add OCI images alongside existing SIF files +- Keep existing `.datalad/config` entries + +### Phase 2: Recommend new approach + +- Default documentation uses profiles +- Legacy config still works +- Add migration guide + +### Phase 3: Simplify storage + +- Consider generating SIF on-demand with `containers-convert` +- Or keep both for user convenience + +--- + +## 6. Benefits Summary + +| Aspect | Current | With Refactor | +|--------|---------|---------------| +| Image formats | SIF only | SIF + OCI | +| Image versions | Flat naming | Structured versioning | +| Layer sharing | None | Deduplication across containers | +| Execution config | Hardcoded in wrapper | Base profiles, user-extendable | +| Runtime flexibility | Singularity only | Any runtime via profile | +| Provenance | Wrapper invocation | Actual command recorded | +| Customization | Fork or edit wrapper | Extend profile, clobber exec | +| HPC adaptation | Manual | User creates their own profile | + +--- + +## 7. Replacing singularity_cmd with Profiles + +The current `singularity_cmd` shim provides many features. With profile extensions, most can be handled declaratively. 
+ +### Current shim features + +| Feature | Description | +|---------|-------------| +| Fake HOME | Custom home with minimal .bashrc/.gitconfig | +| Git config passthrough | Copies user.name, user.email, annex.pidlock, safe.directory | +| Isolated /tmp | Creates temp dir, binds as /tmp and /var/tmp, cleans up | +| Environment sanitization | `--cleanenv`, `--contain` flags | +| DATALAD_CONTAINER_NAME | Exports via SINGULARITYENV_*/APPTAINERENV_* | +| Timezone handling | Binds zoneinfo/UTC to /etc/localtime | +| Matplotlib fix | Sets MPLCONFIGDIR=/tmp/mpl-config | +| Docker fallback | Runs singularity inside Docker on non-Linux | +| Duct integration | Optional resource monitoring wrapper | + +### What profiles can handle today + +Static flags work directly in `exec`: + +```yaml +exec: >- + apptainer exec + --cleanenv + --contain + -H code/containers/binds/HOME + -B code/containers/binds/zoneinfo/UTC:/etc/localtime + -B {pwd} + --pwd {pwd} + {img} + {cmd} +``` + +### Upstream RFE: Advanced Profile Features + +To fully replace the shim, datalad-container would need these profile extensions: + +#### 1. `env` section - static environment variables + +```yaml +env: + SINGULARITYENV_MPLCONFIGDIR: /tmp/mpl-config + APPTAINERENV_MPLCONFIGDIR: /tmp/mpl-config +``` + +#### 2. `pre-run` hook - script executed before container + +```yaml +pre-run: code/containers/scripts/setup-env.sh +``` + +The script would: +- Create temp directory, export as `TMPDIR` +- Generate .gitconfig from current git config +- Export `SINGULARITYENV_DATALAD_CONTAINER_NAME` + +#### 3. `post-run` hook - script executed after container + +```yaml +post-run: code/containers/scripts/cleanup.sh +``` + +The script would: +- Remove temp directory + +#### 4. 
`{env.VARNAME}` placeholder - reference env vars in exec + +```yaml +exec: >- + apptainer exec + -B {env.TMPDIR}:/tmp + -H {env.BHOME} + {img} + {cmd} +``` + +### Full profile replacing singularity_cmd + +With these extensions, a profile could fully replace the shim: + +```yaml +# .datalad/containers/profiles/mriqc-repronim.yaml + +image: mriqc/23.1.0 + +pre-run: code/containers/scripts/setup-env.sh +post-run: code/containers/scripts/cleanup.sh + +env: + SINGULARITYENV_MPLCONFIGDIR: /tmp/mpl-config + APPTAINERENV_MPLCONFIGDIR: /tmp/mpl-config + +exec: >- + apptainer exec + --cleanenv + --contain + -H {env.BHOME} + -B {env.TMPDIR}:/tmp + -B {env.TMPDIR}/var:/var/tmp + -B code/containers/binds/zoneinfo/UTC:/etc/localtime + -B {pwd} + --pwd {pwd} + -W {env.TMPDIR} + {img} + {cmd} +``` + +### Docker fallback as separate profile + +Instead of runtime detection, use a separate profile: + +```yaml +# .datalad/containers/profiles/mriqc-repronim-docker.yaml +# For non-Linux systems (macOS, Windows with Docker) + +image: mriqc/23.1.0 + +pre-run: code/containers/scripts/setup-env-docker.sh + +env: + SINGULARITYENV_MPLCONFIGDIR: /tmp/mpl-config + +exec: >- + docker run + --privileged + --rm + -e UID={env.UID} + -e GID={env.GID} + -v {env.TMPDIR}:{env.TMPDIR} + -v {pwd}:{pwd} + -v code/containers/binds/HOME:{env.BHOME} + -w {pwd} + repronim/containers:latest + exec + --cleanenv + -H {env.BHOME} + -B {env.TMPDIR}:/tmp + --pwd {pwd} + {img} + {cmd} +``` + +Users on macOS would use `--profile mriqc-repronim-docker` explicitly. + +### Summary: Upstream RFE for datalad-container + +To enable ReproNim to replace `singularity_cmd` with profiles: + +1. **`env:` section** - Static environment variable definitions +2. **`pre-run:` hook** - Script to run before exec (can export env vars) +3. **`post-run:` hook** - Script to run after exec (cleanup) +4. 
**`{env.VARNAME}` placeholder** - Reference environment variables in exec template + +These are optional extensions - basic profiles work without them. + +--- + +## 8. Open Questions for ReproNim + +1. **Dual format storage** - Ship both OCI and SIF, or OCI-only with on-demand conversion? + +2. **Base profile scope** - Just image + minimal exec, or include ReproNim sanitization defaults? + +3. **Profile per version** - One profile per version (mriqc-23.yaml, mriqc-24.yaml) or update profile to point to latest? + +4. **Backward compatibility** - How long to maintain legacy `.datalad/config` entries and `singularity_cmd`? + +5. **Docker fallback** - Separate profile (explicit) or some detection mechanism? + +6. **Duct integration** - Wrapper in pre-run, or dedicated profile field? diff --git a/docs/design/image-container-refactor.md b/docs/design/image-container-refactor.md new file mode 100644 index 0000000..3c9d62a --- /dev/null +++ b/docs/design/image-container-refactor.md @@ -0,0 +1,437 @@ +# Proposal: Image and Execution Profile Refactor + +> **Related work:** The [`skopeo` branch](https://github.com/datalad/datalad-container/tree/skopeo) implements OCI image storage via Skopeo. This proposal builds on that foundation. + +## 1. Overview + +This proposal refactors datalad-container to cleanly separate **image storage** from **execution configuration**. The result is a simpler, more flexible system where: + +- Images are versioned artifacts with full provenance +- Execution profiles define how to run images, and can be shared/extended +- Provenance records the actual command that ran, not a shim + +--- + +## 2. Current Problems + +### 2.1 URL Scheme Conflates Multiple Concerns + +The current URL scheme mixes source registry, storage format, and execution method: + +```bash +docker://... # Singularity pulls directly, stores as SIF +dhub://... # Python adapter saves as tar, runs via shim +oci:docker://... # Skopeo saves as OCI directory, runs via shim +shub://... 
# Singularity Hub +``` + +Users must understand implementation details to choose the "right" URL scheme. + +### 2.2 Execution Configuration is Inflexible + +Execution is either: +- **Hardcoded in Python** - `docker_run()`, `podman_run()` with fixed flags +- **Baked in at add-time** - `cmdexec` set once, hard to change +- **Hidden from provenance** - shim invocation recorded, not actual command + +Scientists need to run the same image differently: +- GPU vs CPU +- Different bind mounts +- Different runtimes (apptainer on HPC, podman on laptop) + +### 2.3 Provenance Records Shims, Not Commands + +Current run records show: +``` +cmd: {python} -m datalad_container.adapters.oci run {img} {cmd} +``` + +This doesn't capture: +- Which runtime actually executed +- What flags were used +- The actual command that ran + +Someone replaying the record doesn't know what really happened. + +### 2.4 Adding Runtimes Requires Code Changes + +Supporting a new runtime (e.g., podman) requires: +- New Python functions (`podman_run()`) +- Runtime detection logic +- Testing across platforms + +This doesn't scale and adds maintenance burden. + +--- + +## 3. Proposed Design + +### Summary + +**Two concepts only:** + +1. **Image** - versioned artifact with provenance, no execution semantics +2. 
**Execution Profile** (`profile`) - reusable execution recipe that references an image + +**Key principles:** + +- Protocol indicates registry source (`docker://`, `quay://`, `ghcr://`) +- Images stored in native OCI format, tracked by git-annex +- Profiles are YAML files that define `image` + `exec` template +- Profiles can extend other profiles (clobber semantics, no merging) +- CLI args (`--image`, `--exec`) override profile fields +- Provenance records the resolved command, not shim invocation + +**File layout:** + +``` +.datalad/containers/ +├── images/ +│ └── mriqc/ +│ ├── 23.1.0/ +│ │ └── image/ # OCI directory +│ └── 24.0.0/ +│ └── image/ +└── profiles/ + ├── mriqc.yaml # Base profile + └── mriqc-gpu.yaml # Extended profile +``` + +--- + +### 3.1 Image Storage Layer + +#### Registry URL Scheme + +Protocol indicates registry source only: + +```bash +docker://org/repo:tag # Docker Hub (docker.io) +quay://org/repo:tag # Quay.io +ghcr://org/repo:tag # GitHub Container Registry +shub://org/repo:tag # Singularity Hub (legacy) +``` + +**Usage:** +```bash +datalad containers-add mriqc/23.1.0 --url docker://nipreps/mriqc:23.1.0 +datalad containers-add samtools/1.9 --url quay://biocontainers/samtools:1.9 +``` + +#### Storage Format + +Images stored in their native format: + +| Registry Type | Storage Format | +|---------------|----------------| +| Docker/OCI registries | OCI directory structure | +| Singularity Hub | SIF file | +| Local SIF file | Copy as-is | + +OCI directory structure: +``` +.datalad/containers/images/mriqc/23.1.0/ +└── image/ + ├── blobs/sha256/... 
# Layers (git-annex tracked) + ├── index.json + └── oci-layout +``` + +#### Git-annex Integration + +Individual layers get registry URLs for efficient retrieval: +```bash +git annex whereis .datalad/containers/images/mriqc/23.1.0/image/blobs/sha256/abc123 +# → docker.io/nipreps/mriqc@sha256:abc123 +``` + +#### Provenance + +Git commits capture provenance: +- Source URL +- Fetch timestamp +- Content checksums (via git-annex) + +No separate provenance file needed - git is the provenance store. + +#### Format Conversion (Optional) + +Convert OCI to SIF for HPC performance: +```bash +datalad containers-convert mriqc/23.1.0 --to sif +# Creates: .datalad/containers/images/mriqc/23.1.0/image.sif +``` + +Both formats can coexist; profiles reference the appropriate one. + +--- + +### 3.2 Execution Layer + +#### Execution Profiles + +Profiles are YAML files in `.datalad/containers/profiles/`: + +```yaml +# .datalad/containers/profiles/mriqc.yaml +image: mriqc/23.1.0 +exec: apptainer exec --cleanenv {img} {cmd} +``` + +```yaml +# .datalad/containers/profiles/mriqc-gpu.yaml +extends: mriqc +exec: apptainer exec --cleanenv --nv {img} {cmd} +``` + +#### Clobber Semantics + +Child profiles **completely replace** parent values. No magic merging. + +If you want parent's flags plus yours, copy them explicitly. This is intentional - you see exactly what will run. + +#### Cross-Dataset Extension + +Profiles can extend profiles from subdatasets: + +```yaml +# my-analysis/.datalad/containers/profiles/mriqc-local.yaml +extends: code/containers/.datalad/containers/profiles/mriqc.yaml +exec: apptainer exec --cleanenv --bind /scratch:/scratch {img} {cmd} +``` + +The path is explicit and unambiguous. + +#### Placeholder Expansion + +- `{img}` - resolves to image path (e.g., `oci:.datalad/containers/images/mriqc/23.1.0/image`) +- `{cmd}` - the command arguments + +#### No Automatic Runtime Detection + +Earlier designs considered inspecting `exec` to auto-detect runtime. 
**This is explicitly rejected.** + +Each profile is explicit about its runtime: +- `mriqc-apptainer.yaml` - user writes `apptainer exec {img} {cmd}` +- `mriqc-podman.yaml` - user writes `podman run {img} {cmd}` +- `mriqc-docker.yaml` - user writes `docker run {img} {cmd}` + +**Benefits:** +- No Python code per runtime +- Users control every flag +- Provenance records exactly what the user specified +- New runtimes require zero code changes + +#### Provenance + +Run records capture the **resolved command**: + +```json +{ + "cmd": "apptainer exec --cleanenv --nv oci:.datalad/containers/images/mriqc/23.1.0/image mriqc /input /output participant", + "profile": "mriqc-gpu", + "profile-source": ".datalad/containers/profiles/mriqc-gpu.yaml" +} +``` + +The `cmd` is what actually ran. The profile reference is informational. + +--- + +## 4. Interface Changes + +### containers-add + +```bash +# Add image from registry +datalad containers-add mriqc/23.1.0 --url docker://nipreps/mriqc:23.1.0 + +# Add another version +datalad containers-add mriqc/24.0.0 --url docker://nipreps/mriqc:24.0.0 +``` + +### containers-run + +CLI args override profile fields. Precedence (highest to lowest): +1. CLI args (`--image`, `--exec`) +2. Profile fields +3. Extended profile fields (clobbered, not merged) + +```bash +# Use profile as-is +datalad containers-run --profile mriqc + +# Override just exec (keep profile's image) +datalad containers-run --profile mriqc --exec "apptainer exec --nv {img} {cmd}" + +# Override just image (keep profile's exec) +datalad containers-run --profile mriqc --image mriqc/24.0.0 + +# Override both +datalad containers-run --profile mriqc --image mriqc/24.0.0 --exec "..." 
+ +# No profile (must specify both) +datalad containers-run --image mriqc/23.1.0 --exec "apptainer exec {img} {cmd}" +``` + +### New Commands + +```bash +# List images +datalad containers-images + +# List profiles +datalad containers-profiles + +# Convert image format +datalad containers-convert --to sif +``` + +--- + +## 5. Summary + +| Aspect | Current | Proposed | +|--------|---------|----------| +| URL scheme | Mixed semantics | Protocol = registry | +| Storage format | Determined by URL | Native format from source | +| Provenance | Shim invocation | Actual command in git | +| Execution config | Baked in at add-time | Execution profiles | +| Profile reuse | Not possible | Profiles extend profiles | +| Runtime support | Requires code changes | Zero code changes | +| Configuration | .datalad/config (INI) | YAML files | + +--- + +## 6. Implementation Plan + +This work breaks into three phases that can be implemented and tested independently. + +### Phase 1: Image Storage (builds on `skopeo` branch) + +The `skopeo` branch already provides: +- OCI directory storage via Skopeo +- Git-annex tracking of layers +- Registry URL linking for efficient retrieval + +**What's NOT done yet:** +- Clean URL scheme (`docker://` instead of `oci:docker://`) +- Support for `quay://`, `ghcr://` protocols +- Versioned image directories (`.datalad/containers/images///`) +- Remove `cmdexec` requirement from `containers-add` + +**Milestone:** Images can be added and used with `datalad run` directly: + +```bash +# Add image +datalad containers-add mriqc/23.1.0 --url docker://nipreps/mriqc:23.1.0 + +# Use with datalad run (no containers-run needed) +datalad run \ + --input .datalad/containers/images/mriqc/23.1.0 \ + --output outputs/ \ + "apptainer exec oci:.datalad/containers/images/mriqc/23.1.0/image mriqc ..." +``` + +This alone is valuable - full provenance, no shims, user controls execution. 
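
The two naming rules that Phase 1 introduces (protocol names the registry, image name maps to a versioned directory) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the `skopeo` branch: the function names `parse_registry_url` and `image_dir` and the `REGISTRY_HOSTS` table are hypothetical, and the legacy `shub://` scheme is left out because it stores SIF files rather than OCI directories.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Hypothetical lookup table: URL scheme -> registry host.
# docker.io and quay.io are named in the proposal; ghcr.io is the
# public host of the GitHub Container Registry.
REGISTRY_HOSTS = {
    "docker": "docker.io",
    "quay": "quay.io",
    "ghcr": "ghcr.io",
}

# Storage root taken from the proposed file layout.
IMAGES_ROOT = PurePosixPath(".datalad/containers/images")


def parse_registry_url(url: str) -> tuple[str, str]:
    """Split 'docker://nipreps/mriqc:23.1.0' into (host, reference)."""
    scheme, sep, reference = url.partition("://")
    if not sep or not reference:
        raise ValueError(f"not a registry URL: {url!r}")
    if scheme not in REGISTRY_HOSTS:
        # Old mixed schemes such as 'oci:docker://...' end up here.
        raise ValueError(f"unsupported registry scheme: {scheme!r}")
    return REGISTRY_HOSTS[scheme], reference


def image_dir(name: str) -> PurePosixPath:
    """Map an image name like 'mriqc/23.1.0' to its OCI directory."""
    parts = name.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected NAME/VERSION, got {name!r}")
    return IMAGES_ROOT / parts[0] / parts[1] / "image"


if __name__ == "__main__":
    host, reference = parse_registry_url("docker://nipreps/mriqc:23.1.0")
    print(f"{host}/{reference}")
    print(image_dir("mriqc/23.1.0"))
```

Keeping the scheme table as plain data means a new public registry would be a one-line addition. How private or custom registries should be addressed is still an open question in this proposal, so the sketch does not attempt it.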
+ +--- + +### Phase 2: Recreate containers-run (no profiles) + +Rebuild `containers-run` with explicit `--image` and `--exec` flags: + +```bash +datalad containers-run \ + --image mriqc/23.1.0 \ + --exec "apptainer exec {img} {cmd}" \ + mriqc /data /output participant +``` + +**What this provides:** +- `--image` resolves to `.datalad/containers/images///` +- `--exec` is the execution template (required at this phase) +- `{img}` and `{cmd}` placeholder expansion +- Provenance records the resolved command + +**What's NOT done yet:** +- No profiles +- Must specify both `--image` and `--exec` every time + +**Milestone:** `containers-run` works without profiles, giving explicit control. + +--- + +### Phase 3: Execution Profiles + +Add YAML execution profile system on top of Phase 2: + +```bash +# With profile +datalad containers-run --profile mriqc mriqc /data /output participant + +# Override profile's exec +datalad containers-run --profile mriqc --exec "apptainer exec --nv {img} {cmd}" ... + +# Override profile's image +datalad containers-run --profile mriqc --image mriqc/24.0.0 ... +``` + +**What this provides:** +- YAML profile files in `.datalad/containers/profiles/` +- Profile inheritance with clobber semantics +- CLI overrides (`--image`, `--exec`) take precedence +- `containers-profiles` command to list available profiles + +**Value for ReproNim/containers:** +- Ship base profiles alongside images +- Users extend with their environment-specific settings +- Execution profiles are the "curated execution recipes" - ReproNim's value-add + +**Milestone:** Full execution profile system, backward compatible with Phase 2 (can still use `--image` + `--exec` directly). + +--- + +## 7. Open Questions + +1. **Custom registries** - How to specify private/custom OCI registries? + +2. **Profile discovery** - How to list available profiles from subdatasets? + +3. **Profile validation** - Warn if profile references unavailable image or runtime? + +4. 
**Placeholder expansion** - What placeholders beyond `{img}` and `{cmd}`? (`{pwd}`, `{uid}`?) + +5. **Image format conversion** - On-demand or explicit `containers-convert` command? + +6. **Backward compatibility** - How to handle existing datasets using old format? + + Current format stores in `.datalad/config`: + ```ini + datalad.containers.mriqc.image = .datalad/environments/mriqc/image + datalad.containers.mriqc.cmdexec = {python} -m datalad_container.adapters.oci run {img} {cmd} + ``` + + Issues to resolve: + - **`-n` flag**: Keep as alias for `--profile`? Or separate lookup for old vs new? + - **Image paths**: Old `.datalad/environments//image` vs new `.datalad/containers/images///image` + - **No version in old naming**: Old `mriqc` vs new `mriqc/23.1.0` - use `latest` as default version? + - **Shim cmdexec**: Old shims still work but defeat provenance goals + + Possible approaches: + + **A. `containers-migrate` command:** + - Move images: `.datalad/environments//` → `.datalad/containers/images//latest/` + - Generate profile from old cmdexec + - Remove old config entries + - Optionally prompt user to simplify shim exec to direct invocation + + **B. Dual lookup (no migration required):** + - `-n` checks new profiles first, falls back to old config + - Old datasets keep working without changes + - New features only available after migration + + **C. Deprecation without migration:** + - Old format works but emits warnings + - Document manual migration steps + - Eventually remove old code paths diff --git a/docs/design/repronim-containers-mriqc-example.md b/docs/design/repronim-containers-mriqc-example.md new file mode 100644 index 0000000..a32cf50 --- /dev/null +++ b/docs/design/repronim-containers-mriqc-example.md @@ -0,0 +1,332 @@ +# Example: MRIQC Workflow with ReproNim/containers + +This example demonstrates using ReproNim/containers as a subdataset to run MRIQC for MRI quality control. 
We extend the provided base profile with our own environment-specific settings. + +--- + +## Prerequisites + +```bash +# Install datalad and extensions +pip install datalad datalad-container + +# Install a container runtime (at least one) +# Apptainer (recommended for HPC) +sudo apt-get install apptainer + +# Or Podman (rootless Docker alternative) +sudo apt-get install podman + +# Install Skopeo for OCI image fetching +sudo apt-get install skopeo +``` + +--- + +## Step 1: Create Your Analysis Dataset + +Following YODA principles, create a dataset that will contain everything needed for the analysis. + +```bash +# Create the analysis dataset +datalad create -c text2git mriqc-analysis +cd mriqc-analysis +``` + +--- + +## Step 2: Install ReproNim Containers + +```bash +# Install the containers collection as a subdataset +datalad install -d . -s https://github.com/ReproNim/containers code/containers + +# List available images +datalad containers-images -d code/containers +# mriqc/23.1.0 +# mriqc/24.0.0 +# fmriprep/23.2.0 +# ... + +# List available profiles +datalad containers-profiles -d code/containers +# mriqc +# fmriprep +# ... +``` + +--- + +## Step 3: Install Input Data + +```bash +# Install demo BIDS dataset +datalad install -d . -s https://github.com/ReproNim/ds000003-demo sourcedata/raw +``` + +--- + +## Step 4: Create Your Execution Profile + +ReproNim provides a base profile, but we need to customize it for our environment. 
+ +```bash +# Create profiles directory +mkdir -p .datalad/containers/profiles + +# Create our lab-specific profile +cat > .datalad/containers/profiles/mriqc-dartmouth.yaml << 'EOF' +# MRIQC profile for Dartmouth HPC (Discovery cluster) + +extends: code/containers/.datalad/containers/profiles/mriqc.yaml + +# Clobber the exec with our environment-specific settings +exec: >- + apptainer exec + --cleanenv + --bind /scratch/$USER:/scratch + --bind /dartfs-hpc/rc/lab/mylab:/data + {img} + {cmd} +EOF + +# Commit the profile +datalad save -m "Add Dartmouth-specific MRIQC profile" +``` + +**What's happening here:** +- We extend ReproNim's base `mriqc` profile +- We clobber `exec` with our HPC-specific bind mounts +- The `{img}` placeholder will resolve to the OCI image path +- The `{cmd}` placeholder will be replaced with the actual command + +--- + +## Step 5: Ignore Working Directory + +MRIQC needs a working directory for intermediate files: + +```bash +echo "workdir/" > .gitignore +datalad save -m "Ignore workdir" .gitignore +``` + +--- + +## Step 6: Run the Analysis + +```bash +# Run MRIQC using our custom profile +datalad containers-run \ + --profile mriqc-dartmouth \ + --input sourcedata/raw \ + --output . \ + mriqc /data/bids /data/outputs participant group -w /scratch/workdir +``` + +**What happens:** +1. Profile is resolved: `mriqc-dartmouth` → extends `mriqc` → references `mriqc/23.1.0` +2. Image is fetched if needed (OCI layers from Docker Hub) +3. `{img}` expands to `oci:.datalad/containers/images/mriqc/23.1.0/image` +4. `{cmd}` expands to `mriqc /data/bids /data/outputs participant group -w /scratch/workdir` +5. Full command is executed and recorded in provenance + +--- + +## Step 7: Verify Provenance + +```bash +git log --oneline -1 +# abc123 [DATALAD RUNCMD] apptainer exec --cleanenv --bind ... 
+ +git show --quiet +``` + +**Provenance shows the actual command:** +```json +{ + "cmd": "apptainer exec --cleanenv --bind /scratch/$USER:/scratch --bind /dartfs-hpc/rc/lab/mylab:/data oci:.datalad/containers/images/mriqc/23.1.0/image mriqc /data/bids /data/outputs participant group -w /scratch/workdir", + "profile": "mriqc-dartmouth", + "profile-source": ".datalad/containers/profiles/mriqc-dartmouth.yaml", + "inputs": ["sourcedata/raw"], + "outputs": ["."] +} +``` + +No shims. No hidden behavior. The exact command that ran. + +--- + +## Alternative: One-Off Execution + +If you don't want to create a profile, you can override exec directly: + +```bash +datalad containers-run \ + -d code/containers \ + --profile mriqc \ + --exec "apptainer exec --cleanenv --nv {img} {cmd}" \ + --input sourcedata/raw \ + --output . \ + mriqc sourcedata/raw . participant --participant-label 01 +``` + +--- + +## Alternative: GPU-Enabled Run + +Create a GPU profile for compute-intensive jobs: + +```bash +cat > .datalad/containers/profiles/mriqc-gpu.yaml << 'EOF' +extends: code/containers/.datalad/containers/profiles/mriqc.yaml + +exec: >- + apptainer exec + --cleanenv + --nv + --bind /scratch/$USER:/scratch + {img} + {cmd} +EOF + +datalad save -m "Add GPU-enabled MRIQC profile" + +# Use it +datalad containers-run \ + --profile mriqc-gpu \ + --input sourcedata/raw \ + --output . \ + mriqc sourcedata/raw . participant +``` + +--- + +## Alternative: Using Podman Instead + +```bash +cat > .datalad/containers/profiles/mriqc-podman.yaml << 'EOF' +extends: code/containers/.datalad/containers/profiles/mriqc.yaml + +exec: >- + podman run + --rm + --userns=keep-id + -v {pwd}:/work:Z + -w /work + {img} + {cmd} +EOF + +datalad save -m "Add Podman MRIQC profile" + +datalad containers-run \ + --profile mriqc-podman \ + --input sourcedata/raw \ + --output . 
\ + mriqc /work/sourcedata/raw /work/outputs participant +``` + +--- + +## Step 8: Update to Newer Version + +To use a newer MRIQC version: + +```bash +# Create profile pointing to newer version +cat > .datalad/containers/profiles/mriqc-24.yaml << 'EOF' +extends: code/containers/.datalad/containers/profiles/mriqc.yaml + +# Override image to use version 24 +image: mriqc/24.0.0 + +exec: >- + apptainer exec + --cleanenv + --bind /scratch/$USER:/scratch + {img} + {cmd} +EOF + +datalad save -m "Add MRIQC 24.0.0 profile" + +# Run with new version +datalad containers-run \ + --profile mriqc-24 \ + --input sourcedata/raw \ + --output outputs-v24 \ + mriqc sourcedata/raw outputs-v24 participant +``` + +--- + +## Final Dataset Structure + +``` +mriqc-analysis/ +├── .datalad/ +│ └── containers/ +│ └── profiles/ +│ ├── mriqc-dartmouth.yaml +│ ├── mriqc-gpu.yaml +│ └── mriqc-podman.yaml +├── .gitignore +├── code/ +│ └── containers/ # ReproNim/containers subdataset +│ └── .datalad/containers/ +│ ├── images/ +│ │ └── mriqc/ +│ │ ├── 23.1.0/ +│ │ │ └── image/ +│ │ └── 24.0.0/ +│ │ └── image/ +│ └── profiles/ +│ └── mriqc.yaml # Base profile we extend +├── sourcedata/ +│ └── raw/ # ds000003-demo subdataset +├── outputs/ # MRIQC outputs +└── workdir/ # Ignored working directory +``` + +--- + +## Key Takeaways + +1. **Profiles are explicit** - You see exactly what will run +2. **Profiles are composable** - Extend base profiles with your settings +3. **Clobber semantics** - Your `exec` replaces the parent's entirely +4. **Provenance is complete** - The actual command is recorded, not a shim +5. **Environment-specific** - Each lab/HPC can have its own profile +6. 
**Version-controlled** - Profiles are committed with your analysis + +--- + +## Troubleshooting + +**"Image not found"** +```bash +# Check if image exists +ls code/containers/.datalad/containers/images/mriqc/ + +# Fetch if needed +datalad get code/containers/.datalad/containers/images/mriqc/23.1.0/ +``` + +**"Permission denied" with Apptainer** +```bash +# Check bind mount permissions +ls -la /scratch/$USER + +# Ensure OCI directory is readable +ls -la code/containers/.datalad/containers/images/mriqc/23.1.0/image/ +``` + +**Profile not found** +```bash +# List available profiles +datalad containers-profiles + +# Check profile syntax +cat .datalad/containers/profiles/mriqc-dartmouth.yaml +``` diff --git a/docs/design/tutorial-mriqc-workflow.md b/docs/design/tutorial-mriqc-workflow.md new file mode 100644 index 0000000..4fdab1e --- /dev/null +++ b/docs/design/tutorial-mriqc-workflow.md @@ -0,0 +1,237 @@ +# Tutorial: MRIQC Quality Control with datalad-container + +This tutorial demonstrates using datalad-container to download a container image directly, create an execution profile from scratch, and run a containerized analysis. 
+ +--- + +## Prerequisites + +```bash +# Install datalad and extensions +pip install datalad datalad-container + +# Install container tools +sudo apt-get install apptainer skopeo +``` + +--- + +## Step 1: Create Your Analysis Dataset + +```bash +datalad create -c text2git mriqc-analysis +cd mriqc-analysis +``` + +--- + +## Step 2: Add the MRIQC Container Image + +Download the container image directly from Docker Hub: + +```bash +# Add the MRIQC image (OCI format) +datalad containers-add mriqc/23.1.0 --url docker://nipreps/mriqc:23.1.0 + +# Verify the image was added +ls .datalad/containers/images/mriqc/23.1.0/ +# image/ (OCI directory structure) + +# Provenance is in the git commit +git log --oneline -1 +# abc123 Add container image mriqc/23.1.0 from docker://nipreps/mriqc:23.1.0 +``` + +--- + +## Step 3: Create an Execution Profile + +Profiles define HOW to run the image. Create one for your environment: + +```bash +mkdir -p .datalad/containers/profiles + +cat > .datalad/containers/profiles/mriqc.yaml << 'EOF' +# Basic MRIQC execution profile +image: mriqc/23.1.0 + +exec: >- + apptainer exec + --cleanenv + {img} + {cmd} +EOF + +datalad save -m "Add MRIQC image and profile" +``` + +**What the placeholders mean:** +- `{img}` - Resolves to `oci:.datalad/containers/images/mriqc/23.1.0/image` +- `{cmd}` - Replaced with your command arguments + +--- + +## Step 4: Install Input Data + +```bash +# Install a demo BIDS dataset +datalad install -d . -s https://github.com/ReproNim/ds000003-demo sourcedata/raw + +# Ignore working directory +echo "workdir/" >> .gitignore +datalad save -m "Add input data, ignore workdir" +``` + +--- + +## Step 5: Run MRIQC + +```bash +datalad containers-run \ + --profile mriqc \ + --input sourcedata/raw \ + --output . \ + mriqc sourcedata/raw outputs participant --participant-label 01 +``` + +**What happens:** +1. Profile `mriqc` is loaded from `.datalad/containers/profiles/mriqc.yaml` +2. 
`{img}` expands to `oci:.datalad/containers/images/mriqc/23.1.0/image` +3. `{cmd}` expands to `mriqc sourcedata/raw outputs participant --participant-label 01` +4. Full command is executed and recorded in git history + +--- + +## Step 6: Verify Provenance + +```bash +git log --oneline -1 +# abc123 [DATALAD RUNCMD] apptainer exec --cleanenv ... +``` + +The commit message contains the exact command that ran - no hidden shims. + +--- + +## Creating Environment-Specific Profiles + +### HPC Profile with Bind Mounts + +```bash +cat > .datalad/containers/profiles/mriqc-hpc.yaml << 'EOF' +extends: mriqc + +exec: >- + apptainer exec + --cleanenv + --bind /scratch/$USER:/scratch + --bind /data/datasets:/data + {img} + {cmd} +EOF +``` + +### GPU-Enabled Profile + +```bash +cat > .datalad/containers/profiles/mriqc-gpu.yaml << 'EOF' +extends: mriqc + +exec: >- + apptainer exec + --cleanenv + --nv + {img} + {cmd} +EOF +``` + +### Using Podman Instead + +```bash +cat > .datalad/containers/profiles/mriqc-podman.yaml << 'EOF' +image: mriqc/23.1.0 + +exec: >- + podman run + --rm + --userns=keep-id + -v {pwd}:/work:Z + -w /work + {img} + {cmd} +EOF +``` + +Use any profile: +```bash +datalad containers-run --profile mriqc-gpu ... +``` + +--- + +## Adding a New Version + +```bash +# Add version 24.0.0 +datalad containers-add mriqc/24.0.0 --url docker://nipreps/mriqc:24.0.0 + +# Create profile for new version +cat > .datalad/containers/profiles/mriqc-24.yaml << 'EOF' +extends: mriqc +image: mriqc/24.0.0 +EOF + +# Run with new version +datalad containers-run --profile mriqc-24 ... +``` + +--- + +## One-Off Execution Override + +Don't want to create a profile? Override exec directly: + +```bash +datalad containers-run \ + --image mriqc/23.1.0 \ + --exec "apptainer exec --cleanenv --nv {img} {cmd}" \ + --input sourcedata/raw \ + --output . 
\ + mriqc sourcedata/raw outputs participant +``` + +--- + +## Final Dataset Structure + +``` +mriqc-analysis/ +├── .datalad/ +│ └── containers/ +│ ├── images/ +│ │ └── mriqc/ +│ │ ├── 23.1.0/ +│ │ │ └── image/ # OCI directory +│ │ └── 24.0.0/ +│ │ └── image/ +│ └── profiles/ +│ ├── mriqc.yaml +│ ├── mriqc-hpc.yaml +│ ├── mriqc-gpu.yaml +│ └── mriqc-podman.yaml +├── sourcedata/ +│ └── raw/ # BIDS input +├── outputs/ # MRIQC results +└── workdir/ # Ignored +``` + +--- + +## Key Points + +1. **Images are just artifacts** - Downloaded once, stored in OCI format +2. **Profiles define execution** - How to invoke the container for your environment +3. **Profiles extend profiles** - Inherit and override (clobber semantics) +4. **Provenance is transparent** - Actual command recorded, not a shim +5. **Runtime override** - Use `--exec` for one-off customization From 90d07882a36bbc8c0db582dd92c1f43a6e7b2276 Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Wed, 3 Dec 2025 09:18:18 -0600 Subject: [PATCH 02/12] Add Breaking Changes section for Phase 1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Documents: - URL scheme changes (docker:// now OCI, removed dhub://, oci:, shub://) - Storage path changes (.datalad/containers/images///) - Execution config removal (cmdexec not set) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- docs/design/image-container-refactor.md | 47 ++++++++++++++++++++++++- 1 file changed, 46 insertions(+), 1 deletion(-) diff --git a/docs/design/image-container-refactor.md b/docs/design/image-container-refactor.md index 3c9d62a..a235648 100644 --- a/docs/design/image-container-refactor.md +++ b/docs/design/image-container-refactor.md @@ -392,7 +392,52 @@ datalad containers-run --profile mriqc --image mriqc/24.0.0 ... --- -## 7. Open Questions +## 7. 
Breaking Changes (Phase 1) + +Phase 1 introduces breaking changes to simplify the URL scheme and remove execution semantics from `containers-add`. + +### URL Scheme Changes + +| Old Scheme | New Behavior | +|------------|--------------| +| `docker://org/repo:tag` | **BREAKING**: Now stores as OCI directory via Skopeo (was: Singularity build to SIF) | +| `quay://org/repo:tag` | **NEW**: Stores as OCI directory via Skopeo | +| `ghcr://org/repo:tag` | **NEW**: Stores as OCI directory via Skopeo | +| `oci:docker://...` | **REMOVED**: Use `docker://` instead | +| `dhub://...` | **REMOVED**: Use `docker://` instead | +| `shub://...` | **REMOVED**: Singularity Hub deprecated | + +**Migration path for `docker://` users who want SIF:** +```bash +# Old way (created SIF directly) +datalad containers-add myimg --url docker://org/repo:tag + +# New way (OCI storage, convert to SIF separately) +datalad containers-add myimg --url docker://org/repo:tag +datalad containers-convert myimg --to sif # Phase 1+ TODO +``` + +### Storage Path Changes + +| Aspect | Old | New | +|--------|-----|-----| +| Default location | `.datalad/environments//image` | `.datalad/containers/images///image/` | +| Version in path | No | Yes (extracted from URL tag, defaults to `latest`) | +| Config storage | `.datalad/config` entries | Same (image path updated) | + +### Execution Configuration Changes + +| Aspect | Old | New (Phase 1) | +|--------|-----|---------------| +| `--call-fmt` parameter | Required or auto-guessed | **REMOVED** (commented out, TODO Phase 4) | +| `cmdexec` config | Set automatically | **NOT SET** | +| Auto-detection | Based on URL scheme | None - images have no execution semantics | + +**Why this matters:** Phase 1 images are "just storage" - they don't know how to run. Users must specify execution explicitly via `datalad run` or wait for Phase 2's `--exec` flag. + +--- + +## 8. Open Questions 1. **Custom registries** - How to specify private/custom OCI registries? 
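
To make the storage path change concrete, the following is a minimal Python sketch of how an image name and registry URL could map to the new location. The helper names `tag_from_url` and `image_dir` and the tag-parsing rule are assumptions made for illustration, not datalad-container's actual code. The directory layout follows the tutorial's `.datalad/containers/images/mriqc/23.1.0/image` example.

```python
from pathlib import PurePosixPath

# Root of the new storage layout described in the table above.
IMAGES_ROOT = PurePosixPath(".datalad/containers/images")


def tag_from_url(url: str) -> str:
    """Return the tag of a registry URL such as docker://org/repo:tag.

    Falls back to "latest" when the URL carries no tag.
    Digest references (repo@sha256:...) are not handled in this sketch.
    """
    # Drop the scheme (docker://, quay://, ghcr://) and keep the reference.
    reference = url.split("://", 1)[-1]
    # Only the last path component may carry a tag; this avoids mistaking
    # a registry port (host:5000/repo) for one.
    last = reference.rsplit("/", 1)[-1]
    _, sep, tag = last.partition(":")
    return tag if sep and tag else "latest"


def image_dir(name: str, url: str) -> PurePosixPath:
    """Hypothetical helper: storage directory for image `name` fetched from `url`."""
    return IMAGES_ROOT / name / tag_from_url(url) / "image"
```

Under these assumptions, `image_dir("mriqc", "docker://nipreps/mriqc:23.1.0")` yields the path used throughout the tutorial, and an untagged URL lands under `latest`.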
From 6067e8a240fb43fd9ee703892e563e8c20390590 Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Wed, 3 Dec 2025 09:20:20 -0600 Subject: [PATCH 03/12] Update containers-add interface to use name:version format MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Use colon separator for versioning (like Docker tags) - Document Docker daemon loading and tagging behavior - Show version defaulting from URL tag - Update usage examples throughout 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- docs/design/image-container-refactor.md | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/docs/design/image-container-refactor.md b/docs/design/image-container-refactor.md index a235648..707e7a0 100644 --- a/docs/design/image-container-refactor.md +++ b/docs/design/image-container-refactor.md @@ -114,8 +114,8 @@ shub://org/repo:tag # Singularity Hub (legacy) **Usage:** ```bash -datalad containers-add mriqc/23.1.0 --url docker://nipreps/mriqc:23.1.0 -datalad containers-add samtools/1.9 --url quay://biocontainers/samtools:1.9 +datalad containers-add mriqc:23.1.0 --url docker://nipreps/mriqc:23.1.0 +datalad containers-add samtools:1.9 --url quay://biocontainers/samtools:1.9 ``` #### Storage Format @@ -243,13 +243,26 @@ The `cmd` is what actually ran. The profile reference is informational. 
### containers-add ```bash -# Add image from registry -datalad containers-add mriqc/23.1.0 --url docker://nipreps/mriqc:23.1.0 +# Add image from registry (name:version format, like Docker tags) +datalad containers-add mriqc:23.1.0 --url docker://nipreps/mriqc:23.1.0 # Add another version -datalad containers-add mriqc/24.0.0 --url docker://nipreps/mriqc:24.0.0 +datalad containers-add mriqc:24.0.0 --url docker://nipreps/mriqc:24.0.0 + +# Version defaults to URL tag if not specified +datalad containers-add alpine --url docker://alpine:3.18 +# Creates alpine:3.18 + +# Override URL tag with explicit version +datalad containers-add alpine:prod --url docker://alpine:3.18 +# Creates alpine:prod ``` +After `containers-add`, the image is: +- Stored at `.datalad/containers/images///image/` +- Loaded into Docker daemon as `datalad-container/:` +- Ready to use: `docker run --rm datalad-container/mriqc:23.1.0 ...` + ### containers-run CLI args override profile fields. Precedence (highest to lowest): From 9d52e9b516c50e16ad9b75dea48caa31effcc143 Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Wed, 3 Dec 2025 09:26:38 -0600 Subject: [PATCH 04/12] Update Phase 1 and Phase 2 examples to match implementation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Use name:version format (colon separator) - {img} expands to Docker image name (datalad-container/name:version) - Show docker run examples instead of apptainer/oci paths 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- docs/design/image-container-refactor.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/docs/design/image-container-refactor.md b/docs/design/image-container-refactor.md index 707e7a0..095e27e 100644 --- a/docs/design/image-container-refactor.md +++ b/docs/design/image-container-refactor.md @@ -337,13 +337,13 @@ The `skopeo` branch already provides: ```bash # Add image -datalad containers-add 
mriqc/23.1.0 --url docker://nipreps/mriqc:23.1.0 +datalad containers-add mriqc:23.1.0 --url docker://nipreps/mriqc:23.1.0 # Use with datalad run (no containers-run needed) datalad run \ --input .datalad/containers/images/mriqc/23.1.0 \ --output outputs/ \ - "apptainer exec oci:.datalad/containers/images/mriqc/23.1.0/image mriqc ..." + "docker run --rm datalad-container/mriqc:23.1.0 mriqc ..." ``` This alone is valuable - full provenance, no shims, user controls execution. @@ -356,16 +356,17 @@ Rebuild `containers-run` with explicit `--image` and `--exec` flags: ```bash datalad containers-run \ - --image mriqc/23.1.0 \ - --exec "apptainer exec {img} {cmd}" \ + --image mriqc:23.1.0 \ + --exec "docker run --rm {img} {cmd}" \ mriqc /data /output participant ``` **What this provides:** -- `--image` resolves to `.datalad/containers/images///` +- `--image` specifies container name:version +- `{img}` expands to Docker image name (`datalad-container/mriqc:23.1.0`) +- `{cmd}` expands to the command arguments - `--exec` is the execution template (required at this phase) -- `{img}` and `{cmd}` placeholder expansion -- Provenance records the resolved command +- Provenance records the resolved command (not a shim) **What's NOT done yet:** - No profiles From 20cebbc5627347bbc1e4e9dbe7a6c2cb3b7bca4d Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Wed, 3 Dec 2025 09:40:34 -0600 Subject: [PATCH 05/12] Add {img_path} placeholder documentation for Phase 2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Document both Docker and Apptainer usage patterns - List all placeholders: {img}, {img_path}, {cmd} - Note support for multiple runtimes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- docs/design/image-container-refactor.md | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/docs/design/image-container-refactor.md b/docs/design/image-container-refactor.md index 
095e27e..5d3c110 100644 --- a/docs/design/image-container-refactor.md +++ b/docs/design/image-container-refactor.md @@ -355,17 +355,28 @@ This alone is valuable - full provenance, no shims, user controls execution. Rebuild `containers-run` with explicit `--image` and `--exec` flags: ```bash +# With Docker datalad containers-run \ --image mriqc:23.1.0 \ --exec "docker run --rm {img} {cmd}" \ mriqc /data /output participant + +# With Apptainer/Singularity +datalad containers-run \ + --image mriqc:23.1.0 \ + --exec "apptainer exec oci:{img_path} {cmd}" \ + mriqc /data /output participant ``` +**Placeholders:** +- `{img}` - Docker image name (`datalad-container/mriqc:23.1.0`) +- `{img_path}` - OCI directory path (`.datalad/containers/images/mriqc/23.1.0/image`) +- `{cmd}` - command arguments + **What this provides:** - `--image` specifies container name:version -- `{img}` expands to Docker image name (`datalad-container/mriqc:23.1.0`) -- `{cmd}` expands to the command arguments - `--exec` is the execution template (required at this phase) +- Works with Docker, Apptainer, Singularity, Podman, etc. 
- Provenance records the resolved command (not a shim) **What's NOT done yet:** From d4f2c498908debf34906c175c7cd839c42fdb94d Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Wed, 3 Dec 2025 09:43:04 -0600 Subject: [PATCH 06/12] Fix placeholder documentation to match implementation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - {img} = Docker image name (datalad-container/name:version) - {img_path} = OCI directory path - Update profile examples to use correct placeholders 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- docs/design/image-container-refactor.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/design/image-container-refactor.md b/docs/design/image-container-refactor.md index 5d3c110..62c75dc 100644 --- a/docs/design/image-container-refactor.md +++ b/docs/design/image-container-refactor.md @@ -204,7 +204,8 @@ The path is explicit and unambiguous. #### Placeholder Expansion -- `{img}` - resolves to image path (e.g., `oci:.datalad/containers/images/mriqc/23.1.0/image`) +- `{img}` - Docker image name (`datalad-container/mriqc:23.1.0`) +- `{img_path}` - OCI directory path (`.datalad/containers/images/mriqc/23.1.0/image`) - `{cmd}` - the command arguments #### No Automatic Runtime Detection @@ -212,9 +213,9 @@ The path is explicit and unambiguous. Earlier designs considered inspecting `exec` to auto-detect runtime. 
**This is explicitly rejected.** Each profile is explicit about its runtime: -- `mriqc-apptainer.yaml` - user writes `apptainer exec oci:{img} {cmd}` +- `mriqc-apptainer.yaml` - user writes `apptainer exec oci:{img_path} {cmd}` - `mriqc-podman.yaml` - user writes `podman run {img} {cmd}` -- `mriqc-docker.yaml` - user writes `docker run {img} {cmd}` +- `mriqc-docker.yaml` - user writes `docker run --rm {img} {cmd}` **Benefits:** - No Python code per runtime From 88489d6347c686e52b9e79c4548c84713c6d9f76 Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Wed, 3 Dec 2025 10:15:27 -0600 Subject: [PATCH 07/12] Update Phase 3 status to COMPLETE MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Mark Phase 3 (Execution Profiles) as complete - Update examples to use tested docker-default/apptainer-default profiles - Document implemented features and test commands 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- docs/design/image-container-refactor.md | 32 +++++++++++++++++-------- 1 file changed, 22 insertions(+), 10 deletions(-) diff --git a/docs/design/image-container-refactor.md b/docs/design/image-container-refactor.md index 62c75dc..5ab2331 100644 --- a/docs/design/image-container-refactor.md +++ b/docs/design/image-container-refactor.md @@ -380,34 +380,46 @@ datalad containers-run \ - Works with Docker, Apptainer, Singularity, Podman, etc. - Provenance records the resolved command (not a shim) -**What's NOT done yet:** -- No profiles -- Must specify both `--image` and `--exec` every time - **Milestone:** `containers-run` works without profiles, giving explicit control. 
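
The placeholder expansion that Phase 2 relies on can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the extension's implementation: the function name `expand_exec` is hypothetical, and quoting the command arguments with `shlex.join` is a design choice of this sketch rather than documented behavior.

```python
import shlex


def expand_exec(template: str, name: str, version: str, cmd: list[str]) -> str:
    """Fill {img}, {img_path} and {cmd} in an --exec template (illustrative only)."""
    values = {
        # Docker image name, as documented for {img}.
        "img": f"datalad-container/{name}:{version}",
        # OCI directory path, as documented for {img_path}.
        "img_path": f".datalad/containers/images/{name}/{version}/image",
        # Assumption: arguments are shell-quoted before substitution.
        "cmd": shlex.join(cmd),
    }
    # Replace only the three documented placeholders, so shell syntax that
    # contains braces (for example ${HOME}) is left untouched. {cmd} is
    # substituted last, so its content is never re-expanded.
    for key, value in values.items():
        template = template.replace("{" + key + "}", value)
    return template
```

Plain string replacement is used instead of `str.format` so that an exec template containing shell braces does not raise an error.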
--- ### Phase 3: Execution Profiles +**Status:** COMPLETE (2025-12-03) Add YAML execution profile system on top of Phase 2: ```bash # With profile -datalad containers-run --profile mriqc mriqc /data /output participant +datalad containers-run --profile docker-default -- sh -c "echo hello" # Override profile's exec -datalad containers-run --profile mriqc --exec "apptainer exec --nv {img} {cmd}" ... +datalad containers-run --profile docker-default --exec "docker run --rm {img} {cmd}" ... # Override profile's image -datalad containers-run --profile mriqc --image mriqc/24.0.0 ... +datalad containers-run --profile docker-default --image alpine:3.18 ... ``` -**What this provides:** +**Implemented:** - YAML profile files in `.datalad/containers/profiles/` -- Profile inheritance with clobber semantics -- CLI overrides (`--image`, `--exec`) take precedence +- Profile inheritance with clobber semantics via `extends:` key +- CLI overrides (`--image`, `--exec`) take precedence over profile - `containers-profiles` command to list available profiles +- Early validation: error if profile references missing image +- pyyaml dependency added + +**Tested:** +```bash +# docker-default.yaml: image: alpine:latest, exec: docker run --rm ... {img} {cmd} +datalad containers-run --profile docker-default -- sh -c "echo hello" + +# apptainer-default.yaml: image: alpine:latest, exec: apptainer exec oci:{img_path} {cmd} +datalad containers-run --profile apptainer-default -- sh -c "echo hello" + +# docker-myoverrides.yaml: extends: docker-default, exec: ... 
-e MY_VAR=hello {img} {cmd} +datalad containers-run --profile docker-myoverrides -- sh -c 'echo $MY_VAR' +# outputs: hello +``` **Value for ReproNim/containers:** - Ship base profiles alongside images From e6f96899630407cfc0879797b868c3b5595ebdec Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Thu, 4 Dec 2025 08:38:10 -0600 Subject: [PATCH 08/12] Update ReproNim integration doc to match implementation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add "very rough draft" status note - Document available placeholders: {img}, {img_path}, {cmd} - Add runtime-only profiles option (no image, require --image) - Update examples to use name:version format (mriqc:23.1.0) - Fix placeholder usage: oci:{img_path} for apptainer, {img} for docker - Add datalad save commands to workflow examples - Simplify Docker profile example 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- .../image-container-refactor-repronim.md | 123 +++++++++++------- 1 file changed, 77 insertions(+), 46 deletions(-) diff --git a/docs/design/image-container-refactor-repronim.md b/docs/design/image-container-refactor-repronim.md index 302c3bf..d8255b5 100644 --- a/docs/design/image-container-refactor-repronim.md +++ b/docs/design/image-container-refactor-repronim.md @@ -2,6 +2,8 @@ This document describes how ReproNim/containers can leverage the refactored datalad-container architecture. +STATUS: Very rough draft! 
+ --- ## New Capabilities @@ -26,6 +28,10 @@ With native format storage and versioned image directories, ReproNim can provide └── image/ ``` +**Image naming:** Use `name:version` format (like Docker tags): +- `mriqc:23.1.0` → `.datalad/containers/images/mriqc/23.1.0/image/` +- `fmriprep:24.1.0` → `.datalad/containers/images/fmriprep/24.1.0/image/` + ### Benefits of OCI format for ReproNim: - **Layer deduplication** - Many neuroimaging containers share base layers @@ -37,26 +43,51 @@ With native format storage and versioned image directories, ReproNim can provide ## 2. Execution Profiles -ReproNim ships curated base profiles alongside images. Users extend these for their specific needs. +ReproNim can ship curated execution profiles alongside images. Users extend these for their specific needs. + +### Available Placeholders + +- `{img}` - Docker image name (`datalad-container/mriqc:23.1.0`) +- `{img_path}` - OCI directory path (`.datalad/containers/images/mriqc/23.1.0/image`) +- `{cmd}` - command arguments ### ReproNim Base Profiles +**Option A: Profiles with image (ready to use)** + ```yaml # .datalad/containers/profiles/mriqc.yaml # Base MRIQC profile - sane defaults for most users -image: mriqc/23.1.0 -exec: apptainer exec --cleanenv {img} {cmd} +image: mriqc:23.1.0 +exec: apptainer exec --cleanenv oci:{img_path} {cmd} ``` ```yaml # .datalad/containers/profiles/fmriprep.yaml # Base fMRIPrep profile -image: fmriprep/23.2.0 -exec: apptainer exec --cleanenv {img} {cmd} +image: fmriprep:23.2.0 +exec: apptainer exec --cleanenv oci:{img_path} {cmd} ``` +**Option B: Runtime-only profiles (user specifies image)** + +```yaml +# .datalad/containers/profiles/apptainer-default.yaml +# No image - user must provide --image + +exec: apptainer exec oci:{img_path} {cmd} +``` + +```yaml +# .datalad/containers/profiles/docker-default.yaml + +exec: docker run --rm --user $(id -u):$(id -g) -v $(pwd):/work -w /work {img} {cmd} +``` + +Usage: `datalad containers-run --profile 
apptainer-default --image mriqc:23.1.0 ...` + ### User Extensions Users create their own profiles that extend ReproNim's base: @@ -65,17 +96,17 @@ Users create their own profiles that extend ReproNim's base: # my-analysis/.datalad/containers/profiles/mriqc-mylab.yaml extends: inputs/containers/.datalad/containers/profiles/mriqc.yaml -exec: apptainer exec --cleanenv --nv --bind /scratch:/scratch --bind /data/mylab:/input {img} {cmd} +exec: apptainer exec --cleanenv --nv --bind /scratch:/scratch --bind /data/mylab:/input oci:{img_path} {cmd} ``` ```yaml # my-analysis/.datalad/containers/profiles/mriqc-gpu.yaml extends: inputs/containers/.datalad/containers/profiles/mriqc.yaml -exec: apptainer exec --cleanenv --nv {img} {cmd} +exec: apptainer exec --cleanenv --nv oci:{img_path} {cmd} ``` -**Key point:** ReproNim provides the base. Users clobber `exec` with their environment-specific settings. No runtime-specific profiles (apptainer-gpu, podman-default, etc.) - users know what they need. +**Key point:** ReproNim provides the base. Users clobber `exec` with their environment-specific settings (clobber semantics - child completely replaces parent's exec, no merging). 
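
The clobber semantics described above can be sketched as a small resolver. This is a hedged illustration: `resolve_profile` is a hypothetical name, profiles are keyed by name in an in-memory dict standing in for parsed YAML files, and cross-dataset paths in `extends` are not modeled.

```python
def resolve_profile(name: str, profiles: dict[str, dict]) -> dict:
    """Resolve `extends` chains with clobber semantics (illustrative sketch)."""
    chain = []
    seen = set()
    current = name
    # Walk from the requested profile up to its root ancestor.
    while current is not None:
        if current in seen:
            raise ValueError(f"circular extends involving {current!r}")
        if current not in profiles:
            raise KeyError(f"profile not found: {current!r}")
        seen.add(current)
        chain.append(profiles[current])
        current = profiles[current].get("extends")
    resolved = {}
    # Apply the root first so each child replaces, and never merges,
    # the values it inherits from its parent.
    for profile in reversed(chain):
        resolved.update({k: v for k, v in profile.items() if k != "extends"})
    return resolved
```

A child that sets only `exec` keeps the parent's `image` and replaces the parent's `exec` as a whole string; no flags are carried over.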
--- @@ -129,12 +160,14 @@ Provenance (source URL, digest, fetch time) is stored in git commits, not separa datalad clone https://github.com/ReproNim/containers inputs/containers # List available images and profiles -datalad containers-images -d inputs/containers +datalad containers-list -d inputs/containers datalad containers-profiles -d inputs/containers -# Run with base profile -datalad containers-run -d inputs/containers --profile mriqc \ - mriqc /bids /outputs participant +# Run with base profile (profile specifies image) +datalad containers-run --profile mriqc -- mriqc /bids /outputs participant + +# Or use runtime-only profile with explicit image +datalad containers-run --profile apptainer-default --image mriqc:23.1.0 -- mriqc /bids /outputs participant ``` ### Creating a Lab-Specific Profile @@ -144,21 +177,27 @@ datalad containers-run -d inputs/containers --profile mriqc \ mkdir -p .datalad/containers/profiles cat > .datalad/containers/profiles/mriqc-mylab.yaml << 'EOF' extends: inputs/containers/.datalad/containers/profiles/mriqc.yaml -exec: apptainer exec --cleanenv --nv --bind /scratch:/scratch --bind /gpfs/mylab:/data {img} {cmd} +exec: apptainer exec --cleanenv --nv --bind /scratch:/scratch --bind /gpfs/mylab:/data oci:{img_path} {cmd} EOF +# Save to dataset +datalad save -m "Add mriqc-mylab profile" .datalad/containers/profiles/ + # Use it -datalad containers-run --profile mriqc-mylab \ - mriqc /data/bids /data/outputs participant +datalad containers-run --profile mriqc-mylab -- mriqc /data/bids /data/outputs participant ``` ### One-Off Override ```bash -# Use base profile but override exec for this run -datalad containers-run -d inputs/containers --profile mriqc \ - --exec "apptainer exec --cleanenv --nv {img} {cmd}" \ - mriqc /bids /outputs participant +# Use base profile but override exec for this run (adds GPU support) +datalad containers-run --profile mriqc \ + --exec "apptainer exec --cleanenv --nv oci:{img_path} {cmd}" \ + -- mriqc /bids 
/outputs participant + +# Use base profile but override image (use newer version) +datalad containers-run --profile mriqc --image mriqc:24.0.0 \ + -- mriqc /bids /outputs participant ``` ### Using a Specific Version @@ -166,12 +205,12 @@ datalad containers-run -d inputs/containers --profile mriqc \ ```bash # Create profile for newer version cat > .datalad/containers/profiles/mriqc-24.yaml << 'EOF' -image: mriqc/24.0.0 -exec: apptainer exec --cleanenv {img} {cmd} +image: mriqc:24.0.0 +exec: apptainer exec --cleanenv oci:{img_path} {cmd} EOF -datalad containers-run --profile mriqc-24 \ - mriqc /bids /outputs participant +datalad save -m "Add mriqc-24 profile" .datalad/containers/profiles/ +datalad containers-run --profile mriqc-24 -- mriqc /bids /outputs participant ``` --- @@ -242,12 +281,15 @@ exec: >- --contain -H code/containers/binds/HOME -B code/containers/binds/zoneinfo/UTC:/etc/localtime - -B {pwd} - --pwd {pwd} - {img} + oci:{img_path} {cmd} ``` +**Available placeholders:** +- `{img}` - Docker image name (for docker/podman) +- `{img_path}` - OCI directory path (for apptainer with `oci:` prefix) +- `{cmd}` - command arguments + ### Upstream RFE: Advanced Profile Features To fully replace the shim, datalad-container would need these profile extensions: @@ -298,7 +340,7 @@ With these extensions, a profile could fully replace the shim: ```yaml # .datalad/containers/profiles/mriqc-repronim.yaml -image: mriqc/23.1.0 +image: mriqc:23.1.0 pre-run: code/containers/scripts/setup-env.sh post-run: code/containers/scripts/cleanup.sh @@ -315,10 +357,7 @@ exec: >- -B {env.TMPDIR}:/tmp -B {env.TMPDIR}/var:/var/tmp -B code/containers/binds/zoneinfo/UTC:/etc/localtime - -B {pwd} - --pwd {pwd} - -W {env.TMPDIR} - {img} + oci:{img_path} {cmd} ``` @@ -330,35 +369,27 @@ Instead of runtime detection, use a separate profile: # .datalad/containers/profiles/mriqc-repronim-docker.yaml # For non-Linux systems (macOS, Windows with Docker) -image: mriqc/23.1.0 +image: mriqc:23.1.0 
pre-run: code/containers/scripts/setup-env-docker.sh env: - SINGULARITYENV_MPLCONFIGDIR: /tmp/mpl-config + MPLCONFIGDIR: /tmp/mpl-config exec: >- docker run - --privileged --rm - -e UID={env.UID} - -e GID={env.GID} - -v {env.TMPDIR}:{env.TMPDIR} - -v {pwd}:{pwd} - -v code/containers/binds/HOME:{env.BHOME} - -w {pwd} - repronim/containers:latest - exec - --cleanenv - -H {env.BHOME} - -B {env.TMPDIR}:/tmp - --pwd {pwd} + --user $(id -u):$(id -g) + -v $(pwd):/work + -w /work {img} {cmd} ``` Users on macOS would use `--profile mriqc-repronim-docker` explicitly. +**Note:** The Docker profile uses `{img}` (the Docker image name like `datalad-container/mriqc:23.1.0`) while apptainer profiles use `{img_path}` (the OCI directory path). + ### Summary: Upstream RFE for datalad-container To enable ReproNim to replace `singularity_cmd` with profiles: From 95d8f27f9316d60bb5f5647f18a65d20a1bccc55 Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Thu, 4 Dec 2025 08:43:48 -0600 Subject: [PATCH 09/12] the commits contain expanded cmdexec so cmd is there --- docs/design/image-container-refactor.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/design/image-container-refactor.md b/docs/design/image-container-refactor.md index 5ab2331..0f9c16d 100644 --- a/docs/design/image-container-refactor.md +++ b/docs/design/image-container-refactor.md @@ -49,9 +49,8 @@ cmd: {python} -m datalad_container.adapters.oci run {img} {cmd} This doesn't capture: - Which runtime actually executed - What flags were used -- The actual command that ran -Someone replaying the record doesn't know what really happened. +So we have reproducibility if everything "just works", but if anything goes wrong it is not possible to compare container runtime invocations. 
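### 2.4 Adding Runtimes Requires Code Changes From cf01a0db1d4013d8441015f61552e5d88a4e4b84 Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Thu, 4 Dec 2025 09:01:29 -0600 Subject: [PATCH 10/12] handrevisions --- docs/design/image-container-refactor.md | 27 +++++++++++++++---------- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/docs/design/image-container-refactor.md b/docs/design/image-container-refactor.md index 0f9c16d..39519fc 100644 --- a/docs/design/image-container-refactor.md +++ b/docs/design/image-container-refactor.md @@ -70,7 +70,7 @@ This doesn't scale and adds maintenance burden. **Two concepts only:** 1. **Image** - versioned artifact with provenance, no execution semantics -2. **Execution Profile** (`profile`) - reusable execution recipe that references an image +2. **Execution Profile** (`profile`) - execution recipes that can be reused **Key principles:** @@ -172,23 +172,21 @@ Both formats can coexist; profiles reference the appropriate one. Profiles are YAML files in `.datalad/containers/profiles/`: ```yaml -# .datalad/containers/profiles/mriqc.yaml -image: mriqc/23.1.0 -exec: apptainer exec --cleanenv {img} {cmd} +# .datalad/containers/profiles/alpine-docker.yaml +image: alpine:latest +exec: docker run --rm --user $(id -u):$(id -g) -v $(pwd):/work -w /work {img} {cmd} ``` ```yaml -# .datalad/containers/profiles/mriqc-gpu.yaml -extends: mriqc -exec: apptainer exec --cleanenv --nv {img} {cmd} +# .datalad/containers/profiles/alpine-apptainer.yaml +image: alpine:latest +exec: apptainer exec --cleanenv {img} {cmd} ``` #### Clobber Semantics Child profiles **completely replace** parent values. No magic merging. -If you want parent's flags plus yours, copy them explicitly. This is intentional - you see exactly what will run. - #### Cross-Dataset Extension Profiles can extend profiles from subdatasets: @@ -199,7 +197,14 @@ extends: code/containers/.datalad/containers/profiles/mriqc.yaml exec: apptainer exec --cleanenv --bind /scratch:/scratch {img} {cmd} ``` -The path is explicit and unambiguous. +While this provides only minimal value as written, further development to make profiles more +composable could make this very flexible and reusable. + +```yaml +# my-analysis/.datalad/containers/profiles/mriqc-local.yaml +extends: code/containers/.datalad/containers/profiles/mriqc.yaml +binds: --bind /my/weird/scratch:/scratch +``` #### Placeholder Expansion @@ -291,7 +296,7 @@ datalad containers-run --image mriqc/23.1.0 --exec "apptainer exec {img} {cmd}" ```bash # List images -datalad containers-images +datalad containers-list # List profiles datalad containers-profiles From 874fbaa04ab8a6467c860b3ed783cac87869b975 Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Fri, 5 Dec 2025 10:10:17 -0600 Subject: [PATCH 11/12] reject execution profiles-- too big, too different and backwards incompatible --- docs/design/oci-runtime-refactor.md | 144 ++++++++++++++++++ .../image-container-refactor-repronim.md | 0 .../image-container-refactor.md | 6 +- .../repronim-containers-mriqc-example.md | 0 .../tutorial-mriqc-workflow.md | 0 5 files changed, 147 insertions(+), 3 deletions(-) create mode 100644 docs/design/oci-runtime-refactor.md rename docs/design/{ => rejected-profiles}/image-container-refactor-repronim.md (100%) rename docs/design/{ => rejected-profiles}/image-container-refactor.md (98%) rename docs/design/{ => rejected-profiles}/repronim-containers-mriqc-example.md (100%) rename docs/design/{ => rejected-profiles}/tutorial-mriqc-workflow.md (100%) diff --git a/docs/design/oci-runtime-refactor.md b/docs/design/oci-runtime-refactor.md new file mode 100644 index 0000000..b79bd4d --- /dev/null +++ b/docs/design/oci-runtime-refactor.md @@ -0,0 +1,144 @@ +# OCI Storage Refactor + +This document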
- #### Cross-Dataset Extension Profiles can extend profiles from subdatasets: @@ -199,7 +197,14 @@ extends: code/containers/.datalad/containers/profiles/mriqc.yaml exec: apptainer exec --cleanenv --bind /scratch:/scratch {img} {cmd} ``` -The path is explicit and unambiguous. +While this provides only minimal value like this, further development to make profiles more +composable could make this very flexible and reusable. + +```yaml +# my-analysis/.datalad/containers/profiles/mriqc-local.yaml +extends: code/containers/.datalad/containers/profiles/mriqc.yaml +binds: --bind /my/weird/scratch:/scratch +``` #### Placeholder Expansion @@ -291,7 +296,7 @@ datalad containers-run --image mriqc/23.1.0 --exec "apptainer exec {img} {cmd}" ```bash # List images -datalad containers-images +datalad containers-list # List profiles datalad containers-profiles From 874fbaa04ab8a6467c860b3ed783cac87869b975 Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Fri, 5 Dec 2025 10:10:17 -0600 Subject: [PATCH 11/12] reject execution profiles-- too big, too different and backwards incompatible --- docs/design/oci-runtime-refactor.md | 144 ++++++++++++++++++ .../image-container-refactor-repronim.md | 0 .../image-container-refactor.md | 6 +- .../repronim-containers-mriqc-example.md | 0 .../tutorial-mriqc-workflow.md | 0 5 files changed, 147 insertions(+), 3 deletions(-) create mode 100644 docs/design/oci-runtime-refactor.md rename docs/design/{ => rejected-profiles}/image-container-refactor-repronim.md (100%) rename docs/design/{ => rejected-profiles}/image-container-refactor.md (98%) rename docs/design/{ => rejected-profiles}/repronim-containers-mriqc-example.md (100%) rename docs/design/{ => rejected-profiles}/tutorial-mriqc-workflow.md (100%) diff --git a/docs/design/oci-runtime-refactor.md b/docs/design/oci-runtime-refactor.md new file mode 100644 index 0000000..b79bd4d --- /dev/null +++ b/docs/design/oci-runtime-refactor.md @@ -0,0 +1,144 @@ +# OCI Storage Refactor + +This document
describes changes to datalad-container's image storage and URL handling. + +Builds on the skopeo OCI image storage PR: https://github.com/datalad/datalad-container/pull/277/ + +## Goals + +1. **Support multiple runtimes** - Docker, Podman, Apptainer, Singularity should all work +2. **Don't reinvent the wheel** - Use existing patterns (cmdexec, shims) rather than new abstractions +3. **More intuitive configuration** - Current configuration path requires too much knowledge of internals; should feel natural to users familiar with containers +4. **Consistent with container conventions** - Follow OCI/Docker naming/tagging conventions users already know +5. **Preserve backwards compatibility** - Existing datasets and workflows continue to work +6. **Remain HPC-friendly** - Continue to work well on HPC systems (Apptainer/Singularity, no root) + +### Non-Goals + +- Runtime auto-detection (users specify what they want) +- Replacing cmdexec with a new execution system +- Breaking existing container configurations +- Improving reproducibility (already good, don't regress) + +### Currently missing: Multiple container runtimes, user choice + + - Currently: the runtime is automatically determined by the --url with confusing "protocols" ie `oci:docker://` + - To change, users have to write their own cmdexec string, understand placeholder syntax + - Should be: Simple flag or obvious configuration + +## Summary + +- `docker://` URLs now store images as OCI directories via Skopeo (not singularity build) +- Image paths include version: `.datalad/environments///image/` + - (allows names to match upstream repository) +- New `--runtime` flag on `containers-add` to select docker/podman/apptainer +- New `runtime` configuration option in `.datalad/config` + +## Completed Changes + +### 1. docker:// Uses OCI Storage + +**Before:** `docker://` triggered `singularity build`, creating a SIF file. + +**After:** `docker://` uses Skopeo to save as OCI directory structure. 
+ +```bash +datalad containers-add alpine --url docker://alpine:3.18 +# Creates: .datalad/environments/alpine/3.18/image/ (OCI directory) +``` + +Benefits: +- Layer deduplication via git-annex +- Registry URLs linked to layers for efficient retrieval +- Works with docker/podman natively, apptainer via `oci:` prefix + +### 2. Versioned Storage Paths + +Versions can be tags or shas. + +**Before:** `.datalad/environments//image` + +**After:** `.datalad/environments///image/` + +Version is extracted from URL: +- `docker://alpine:3.18` → version `3.18` +- `docker://nipreps/mriqc:23.1.0` → version `23.1.0` +- `docker://alpine` → version `latest` +- `docker://alpine@sha256:abc123` → version `sha256_abc123` + +### 3. Execution via OCI Shim + +Images are executed via the OCI adapter shim: + +```ini +# .datalad/config +[datalad "containers.alpine"] + image = .datalad/environments/alpine/3.18/image + cmdexec = {python} -m datalad_container.adapters.oci run {img} {cmd} +``` + +The shim: +1. Loads OCI directory into container runtime (docker/podman) +2. Runs container with appropriate flags +3. Works transparently with `datalad containers-run` + +## Planned Changes + +### 4. 
Runtime Selection via `--runtime` Flag + +Add `--runtime` flag to `containers-add` for choosing execution runtime: + +```bash +datalad containers-add alpine --url docker://alpine:3.18 --runtime docker # default +datalad containers-add alpine --url docker://alpine:3.18 --runtime podman +datalad containers-add alpine --url docker://alpine:3.18 --runtime apptainer +``` + +**Behavior:** +- Stores runtime preference in config: `datalad.containers..runtime` +- OCI shim reads this config and executes with the appropriate runtime +- If `--runtime apptainer`: also converts OCI → SIF at add-time + +**Config examples:** + +Docker/Podman (uses OCI shim): +```ini +[datalad "containers.alpine"] + image = .datalad/environments/alpine/3.18/image + cmdexec = {python} -m datalad_container.adapters.oci run {img} {cmd} + runtime = docker +``` + +Apptainer (converts to SIF, uses singularity exec directly): +```ini +[datalad "containers.alpine"] + image = .datalad/environments/alpine/3.18/image.sif + cmdexec = singularity exec {img} {cmd} +``` + +### 5. Deprecate dhub:// + +`dhub://` uses docker pull + docker save (tar format). +`docker://` now uses skopeo + OCI directory (better for git-annex). 
+ +Options: +- Deprecate with warning +- Remove entirely +- Make alias to docker:// + +## URL Scheme Summary + +| Scheme | Storage Format | Execution | +|--------|---------------|-----------| +| `docker://` | OCI directory | OCI shim (docker/podman) or apptainer | +| `shub://` | SIF file | singularity exec | + +**Removed/Deprecated:** +- `oci:docker://` - redundant, `docker://` now uses OCI storage +- `dhub://` - deprecated, use `docker://` instead + +## Backward Compatibility + +- Old containers (without version in path) continue to work +- `cmdexec` shim pattern preserved +- No changes to `containers-run` interface diff --git a/docs/design/image-container-refactor-repronim.md b/docs/design/rejected-profiles/image-container-refactor-repronim.md similarity index 100% rename from docs/design/image-container-refactor-repronim.md rename to docs/design/rejected-profiles/image-container-refactor-repronim.md diff --git a/docs/design/image-container-refactor.md b/docs/design/rejected-profiles/image-container-refactor.md similarity index 98% rename from docs/design/image-container-refactor.md rename to docs/design/rejected-profiles/image-container-refactor.md index 39519fc..b5a959b 100644 --- a/docs/design/image-container-refactor.md +++ b/docs/design/rejected-profiles/image-container-refactor.md @@ -283,13 +283,13 @@ datalad containers-run --profile mriqc datalad containers-run --profile mriqc --exec "apptainer exec --nv {img} {cmd}" # Override just image (keep profile's exec) -datalad containers-run --profile mriqc --image mriqc/24.0.0 +datalad containers-run --profile mriqc --image mriqc:24.0.0 # Override both -datalad containers-run --profile mriqc --image mriqc/24.0.0 --exec "..." +datalad containers-run --profile mriqc --image mriqc:24.0.0 --exec "..." 
# No profile (must specify both) -datalad containers-run --image mriqc/23.1.0 --exec "apptainer exec {img} {cmd}" +datalad containers-run --image mriqc:23.1.0 --exec "apptainer exec {img} {cmd}" ``` ### New Commands diff --git a/docs/design/repronim-containers-mriqc-example.md b/docs/design/rejected-profiles/repronim-containers-mriqc-example.md similarity index 100% rename from docs/design/repronim-containers-mriqc-example.md rename to docs/design/rejected-profiles/repronim-containers-mriqc-example.md diff --git a/docs/design/tutorial-mriqc-workflow.md b/docs/design/rejected-profiles/tutorial-mriqc-workflow.md similarity index 100% rename from docs/design/tutorial-mriqc-workflow.md rename to docs/design/rejected-profiles/tutorial-mriqc-workflow.md From 1661295bd48fa4755cd6181a34be841de370ee10 Mon Sep 17 00:00:00 2001 From: Austin Macdonald Date: Fri, 5 Dec 2025 10:49:30 -0600 Subject: [PATCH 12/12] proposed changes not complete --- docs/design/oci-runtime-refactor.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/docs/design/oci-runtime-refactor.md b/docs/design/oci-runtime-refactor.md index b79bd4d..5625bdd 100644 --- a/docs/design/oci-runtime-refactor.md +++ b/docs/design/oci-runtime-refactor.md @@ -28,13 +28,13 @@ Builds on the skopeo OCI image storage PR: https://github.com/datalad/datalad-co ## Summary -- `docker://` URLs now store images as OCI directories via Skopeo (not singularity build) +- `docker://` URLs store images as OCI directories via Skopeo (not singularity build) - Image paths include version: `.datalad/environments///image/` - (allows names to match upstream repository) - New `--runtime` flag on `containers-add` to select docker/podman/apptainer - New `runtime` configuration option in `.datalad/config` -## Completed Changes +## Proposed Changes ### 1. docker:// Uses OCI Storage @@ -82,8 +82,6 @@ The shim: 2. Runs container with appropriate flags 3. 
Works transparently with `datalad containers-run` -## Planned Changes - ### 4. Runtime Selection via `--runtime` Flag Add `--runtime` flag to `containers-add` for choosing execution runtime:
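The version-extraction rule listed under "Versioned Storage Paths" in `docs/design/oci-runtime-refactor.md` can be sketched as follows. This is a minimal illustration, not code from datalad-container: the function name `version_from_url` is hypothetical, and letting a digest take precedence over a tag is an assumption that the design doc does not state.

```python
def version_from_url(url: str) -> str:
    """Derive the version path component from a docker:// URL.

    Hypothetical helper; mirrors the four examples in the design doc.
    """
    # Drop the scheme, keeping only the image reference.
    ref = url.split("://", 1)[-1]
    if "@" in ref:
        # Digest reference: swap ':' for '_' so it is usable as a directory name.
        # Assumption: a digest wins over a tag when both are present.
        return ref.split("@", 1)[1].replace(":", "_")
    # Only inspect the last path component, so a registry port
    # (e.g. localhost:5000/alpine) is not mistaken for a tag.
    last = ref.rsplit("/", 1)[-1]
    if ":" in last:
        return last.split(":", 1)[1]
    # No tag and no digest: fall back to the conventional default tag.
    return "latest"


for example in (
    "docker://alpine:3.18",
    "docker://nipreps/mriqc:23.1.0",
    "docker://alpine",
    "docker://alpine@sha256:abc123",
):
    print(example, "->", version_from_url(example))
```

Looking only at the last path component keeps a registry host with a port from being read as a tag, which a naive split on `:` would get wrong.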