feat: add registry role for disconnected deployment #866

Open
fabiendupont wants to merge 1 commit into seapath:main from fabiendupont:feat/add-registry-role

@fabiendupont

In the current implementation, every node installs a local registry and pulls and pushes the cephadm image. This is neither truly disconnected, since the pull requires internet access, nor resource-efficient, since a single registry is enough.

This commit introduces a registry role that deploys docker.io/registry:v2 and allows importing images either from the internet (pull) or from an exported tarball (load). The seapath_setup_disconnected.yaml playbook installs the registry on the Ansible control node as a singleton.
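As a rough illustration of the two import modes, the sketch below builds the CLI invocations for each path. It is not the role's actual task list: the function names, the `registry.example.local` host, and the `quay.io/ceph/ceph:v18` image are hypothetical, and it assumes the tarball holds an image stored under the same name as `image`.

```python
from typing import List, Optional


def strip_registry(image: str) -> str:
    """Drop a leading registry host (e.g. quay.io/) from an image name."""
    first, sep, rest = image.partition("/")
    # The first component is treated as a host if it looks like one.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return rest
    return image


def import_commands(
    image: str,
    registry_host: str,
    tarball: Optional[str] = None,
    runtime: str = "podman",
) -> List[List[str]]:
    """Return the commands that import `image` into the registry.

    Without `tarball` the image is pulled (needs internet access);
    with `tarball` it is loaded from an exported archive instead.
    """
    target = f"{registry_host}/{strip_registry(image)}"
    if tarball:
        first_step = [runtime, "load", "-i", tarball]
    else:
        first_step = [runtime, "pull", image]
    return [
        first_step,
        [runtime, "tag", image, target],
        [runtime, "push", target],
    ]


if __name__ == "__main__":
    for cmd in import_commands("quay.io/ceph/ceph:v18", "registry.example.local"):
        print(" ".join(cmd))
```

Only the first step differs between the two modes, which is why the load path is the one that works on a site without internet access.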

TLS is enabled by default: the registry auto-generates a self-signed CA and server certificate when no user-provided certs are given. The CA is distributed to all cluster nodes so they trust the registry over HTTPS. The registry listens on port 443 to avoid specifying the port in image names.
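To show why port 443 keeps image names short, here is a minimal sketch of how a reference to the registry would be composed. The helper name and the example host are hypothetical; the only assumption about the runtimes is that they contact a registry over HTTPS on port 443 when the reference carries no port.

```python
def registry_image_ref(host: str, repository: str, tag: str, port: int = 443) -> str:
    """Build an image reference, omitting the port when it is the HTTPS default."""
    authority = host if port == 443 else f"{host}:{port}"
    return f"{authority}/{repository}:{tag}"


if __name__ == "__main__":
    print(registry_image_ref("registry.example.local", "ceph/ceph", "v18"))
    print(registry_image_ref("registry.example.local", "ceph/ceph", "v18", port=5000))
```

With any other port, every image reference and the matching certs.d directory name would have to carry the `host:port` form.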

The *_physical_machine roles are updated to use that registry as a mirror for both Docker and Podman, so image names do not need to change. They install the registry CA certificate in certs.d and set insecure = false when TLS is enabled.
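For the Podman side, the mirror setup could look roughly like the output of the sketch below, which renders a registries.conf (v2 format) fragment and the CA certificate location. This is an assumption about what the roles produce, not a copy of their templates: the function names, the `registry.example.local` host, and the `ca.crt` file name are illustrative.

```python
def render_podman_mirror(upstream: str, mirror: str, tls: bool = True) -> str:
    """Render a registries.conf fragment that mirrors `upstream` via `mirror`."""
    insecure = "false" if tls else "true"
    return (
        "[[registry]]\n"
        f'prefix = "{upstream}"\n'
        f'location = "{upstream}"\n'
        "\n"
        "[[registry.mirror]]\n"
        f'location = "{mirror}"\n'
        f"insecure = {insecure}\n"
    )


def ca_cert_path(mirror: str, runtime: str = "podman") -> str:
    """Return where the registry CA certificate goes for the given runtime."""
    base = {
        "podman": "/etc/containers/certs.d",
        "docker": "/etc/docker/certs.d",
    }[runtime]
    # The directory is named after the registry host (host:port if not 443).
    return f"{base}/{mirror}/ca.crt"


if __name__ == "__main__":
    print(render_podman_mirror("docker.io", "registry.example.local"))
    print(ca_cert_path("registry.example.local"))
```

Because the image name still starts with the upstream prefix, existing references keep working and the runtime tries the mirror first.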

The cephadm role is updated to remove image management, which is now handled by the registry role, so cephadm is focused on Ceph cluster management.

Contributes to #442

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
@fabiendupont force-pushed the feat/add-registry-role branch from 6f951dc to 5799123 on February 19, 2026 14:48
@insatomcat
Member

insatomcat commented Feb 21, 2026

Thanks for the PR, this is an interesting and well-structured proposal 👍

A few points I’d like to clarify and discuss.


1️⃣ Fully disconnected is already possible in the current setup

In the current implementation, it is possible to be fully disconnected, provided that images are made available at OS installation time.

For example, with build_debian_iso on Debian:

  • When the ISO is built (with internet access), required container images are loaded into the ISO.
  • During installation (without internet), those images are deployed locally.
  • No external pull is required afterward at the OS level.

I assume a similar approach is feasible for:

  • Red Hat Enterprise Linux–like distributions (at ISO/image build stage),
  • or Yocto-based images (embedding container images at image generation time).

So strictly speaking, the setup is not inherently “internet-dependent” if the images are preloaded properly.


2️⃣ The real issue: cephadm’s pull behavior

The actual difficulty is not the base OS installation, but the behavior of cephadm.

Even if images are already present locally:

  • The bootstrap command allows skipping certain pulls.
  • However, later lifecycle events (starting osd, mon, mgr, etc.) still trigger a podman pull check.

See https://marc.info/?l=ceph-users&m=164399318917018

To be truly disconnected, we therefore need:

  • Either a local registry on each node (current setup),
  • Or a central registry (as proposed in the PR).

Before deciding on registry topology, I would really like to confirm something:

Is there absolutely no way to completely skip the podman pull check that cephadm performs when launching components?

If such an option exists (or could exist), we could:

  • Preload all images at OS installation time (as done with build_debian_iso),
  • Avoid any registry entirely,
  • And remain fully disconnected without additional infrastructure.

Right now, the registry requirement seems to stem from cephadm enforcing the pull validation step.

If you have more information on whether this behavior is configurable or patchable, that would be very helpful.


3️⃣ Registry location: node-local vs controller-based

Regarding the architectural choice:

  • Current approach: registry on each node.
  • PR proposal: single registry on the Ansible controller.

Both are technically valid trade-offs:

  • Node-local registry → more autonomous nodes, no central dependency.
  • Controller-based registry → simpler, more resource-efficient, centralized management.

From my perspective, either:

  • The PR supports both models and lets the user choose,
  • Or we align on a community-level decision about the preferred architecture.

But I think we should make that decision explicitly rather than implicitly switching models.


Summary

  • Fully disconnected installs are already achievable if images are embedded at OS build time.
  • The real blocker is cephadm’s pull behavior.
  • If we could completely disable pull checks, we might not need a registry at all.
  • Otherwise, we need to consciously decide between distributed vs centralized registry architecture (or support both).

Looking forward to your feedback, especially regarding cephadm’s pull enforcement.

2 participants