Config and Arbitrary Development Environments #47

@asmacdo

This isn't a firm proposal — just consolidating the discussions we've had across several issues and PRs, along with some of my thinking on where this could go. Opening this to get everyone's input in one place and encourage more!

Next steps:

  1. Discuss here — poke holes, raise concerns, add ideas
  2. Generally agree on the shape of the approach
  3. Write a design document (PR) for sharper, per-line discussion

Context

yolo needs to create arbitrary, persistent development environments for each project.
Today, every time someone needs a tool in the image, we hit the same debate: add it to the Dockerfile? Make it an --extras flag? A separate image?

This has come up repeatedly in earlier issues and PRs.

The --extras pattern was a good stopgap, but we can't encode install instructions for every tool every user might want. Meanwhile, yolo is fully capable of constructing environments ephemerally, but ephemeral environments aren't ideal for development — they need to be reconstructed every time.

Target audience

Our primary users are scientists, not software engineers.
Most will never write a Dockerfile and shouldn't have to.
Whatever we design, the common case needs to be as simple as adding a package name to a config file.

Discussion: How should environment customization work?

Some directions that have come up in prior discussions, consolidated here.

Pre-built base images

Publish a base image to a registry so yolo works out of the box with no build step.
What goes in the base? Just the minimum, or opinionated with group tools like datalad?

Config-driven packages

Let users list packages in config files (apt, pip, etc.) without writing a Dockerfile:

# in .git/yolo/config or ~/.config/yolo/config
YOLO_APT_PACKAGES=(ffmpeg imagemagick)
YOLO_PIP_PACKAGES=(datalad)

This could be the primary customization path for most users — a scientist who needs ffmpeg just adds it to their project config.
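To make the idea concrete, here is a minimal sketch of how yolo might turn those package lists into a generated Dockerfile. Everything beyond the two array names is an assumption for discussion: `render_dockerfile`, the `YOLO_BASE_IMAGE` variable, and the `yolo-base:latest` default are hypothetical, and the sketch assumes a Debian-style base image with `apt-get` and a working `pip`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not existing yolo behavior.
# In practice these values would come from sourcing the project or user config.
YOLO_BASE_IMAGE="${YOLO_BASE_IMAGE:-yolo-base:latest}"   # assumed name, no such image is published yet
YOLO_APT_PACKAGES=(ffmpeg imagemagick)
YOLO_PIP_PACKAGES=(datalad)

# Print a Dockerfile that layers the configured packages on top of the base image.
render_dockerfile() {
    printf 'FROM %s\n' "$YOLO_BASE_IMAGE"
    # Only emit an apt layer when the user listed apt packages.
    if (( ${#YOLO_APT_PACKAGES[@]} > 0 )); then
        printf 'RUN apt-get update \\\n'
        printf ' && apt-get install -y --no-install-recommends %s \\\n' "${YOLO_APT_PACKAGES[*]}"
        printf ' && rm -rf /var/lib/apt/lists/*\n'
    fi
    # Likewise for pip packages.
    if (( ${#YOLO_PIP_PACKAGES[@]} > 0 )); then
        printf 'RUN pip install --no-cache-dir %s\n' "${YOLO_PIP_PACKAGES[*]}"
    fi
}

render_dockerfile
```

The generated text could be piped to `docker build -f - <context>`, so the user never sees or edits a Dockerfile. Whether package names should be validated before being interpolated is an open detail this sketch ignores.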

Custom Dockerfiles for power users

For anything that needs custom install steps, users could provide their own Dockerfile (using our base as FROM or not).
This would live outside our repo.

yolo as the single entrypoint

Currently setup-yolo.sh handles building and yolo handles running.
Should yolo handle both — pulling/building images as needed? With a base image in a registry, this would mean yolo works immediately after install.

Config precedence

Build-time config (image name, packages, Dockerfile path, registry) could follow the same precedence as existing runtime config:

CLI args > project config > user-wide config > defaults
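As a rough illustration of that ordering, the sketch below resolves a single setting from four levels, treating an empty string as "not set at this level". The helper name `resolve_setting` and the example image names are hypothetical; how the existing runtime config is actually loaded is not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: later (higher-precedence) levels override earlier ones.
# usage: resolve_setting DEFAULT USER_VALUE PROJECT_VALUE CLI_VALUE
resolve_setting() {
    local result="$1" candidate
    shift
    # Walk from lowest to highest precedence; keep the last non-empty value.
    for candidate in "$@"; do
        if [ -n "$candidate" ]; then
            result="$candidate"
        fi
    done
    printf '%s\n' "$result"
}

# Project config sets the image, nothing on the CLI: project wins over the default.
resolve_setting "yolo-base:latest" "" "lab/neuro:dev" ""
# CLI argument present: it wins over everything else.
resolve_setting "yolo-base:latest" "me/custom:1" "lab/neuro:dev" "scratch:test"
```

One design question this raises is whether list-valued settings such as package lists should override or merge across levels; the sketch only covers scalar values.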

Build behavior

Build on first run if image doesn't exist.
Pass --rebuild to force a rebuild.
Auto-detection of config changes could come later.
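The decision itself is small. A minimal sketch, assuming Docker as the runtime and using hypothetical function names (`image_exists`, `needs_build`) and a placeholder image name:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the build decision, not existing yolo code.

# Succeeds when the image is available locally.
# Also fails (so a build is attempted) if docker itself is missing.
image_exists() {
    docker image inspect "$1" >/dev/null 2>&1
}

# usage: needs_build IMAGE REBUILD_FLAG   (REBUILD_FLAG is 1 when --rebuild was passed)
# Returns 0 when a build should happen, 1 when the existing image can be reused.
needs_build() {
    local image="$1" rebuild="${2:-0}"
    if [ "$rebuild" = "1" ]; then
        return 0            # --rebuild always forces a build
    fi
    if image_exists "$image"; then
        return 1            # image already present, nothing to do
    fi
    return 0                # first run: image missing, build it
}

if needs_build "yolo-base:latest" 0; then
    echo "would build yolo-base:latest"
else
    echo "yolo-base:latest already present; skipping build"
fi
```

Keeping the existence check in its own function would leave room to swap in another runtime later without touching the decision logic.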

Alternative approaches

Two layers only: base image + custom Dockerfile

This is what Gitpod and Codespaces do — provide a base image, let users write a Dockerfile for customization. Simpler to implement and reason about. However, the gap between "use the base" and "write a Dockerfile" is too wide for our audience. A scientist who just needs ffmpeg shouldn't have to learn Docker to get it.
We're leaning away from this toward a config-driven middle path because that's where most of our potential users would actually be comfortable.

Other prior art

  • devcontainer features — composable install scripts with metadata. Well-specified but heavyweight; requires authoring feature scripts with a specific structure.
  • Nix / devenv — declarative, reproducible. Elegant but steep learning curve.
  • Docker official image variants — tag-based (python:3.12-slim). No composition, just pick one.

Open questions

  • CLI rewrite? Bash is hitting its limits for config parsing, registry logic, and the complexity ahead. Python? How much rewrite vs. incremental?
  • Registry? GHCR, Docker Hub, Quay, multiple?
  • Base image contents? Minimal vs. opinionated?
  • Alternative runtimes? Singularity/Apptainer support (#33, "add singularity/apptainer as a possible containerization tech") is a related concern; good architecture now would make it easier to add later.
