Skip to content

Conversation

@troglobit
Copy link
Contributor

@troglobit troglobit commented Sep 9, 2025

Description

This PR introduces significant improvements to container lifecycle management, focusing on optimization, safety, and ease of upgrades. A long-missing container upgrade test has also been added to verify both the functionality and that old images are removed. Not just any old downloaded OCI archives but also old/unused images loaded into the container store.

Key Features

Container Lifecycle Optimization

Container Removal and Cleanup

Container Upgrade Support

  • New upgrade RPC action for upgrading containers with mutable tags (:latest, :edge)
  • Works with both registry images and local/remote OCI archives
  • Upgrade command fixes: admin@example:/> container upgrade NAME

Documentation

  • New section explaining mutable vs immutable image tags
  • Best practices for production use (prefer immutable tags)
  • Complete rewrite of container upgrade section
  • Accurate description of container lifecycle behavior
  • Volume management documentation updates

Test

  • Removed redundant container_ prefix from test directories
  • Added V=1 support for make test-spec to enable debug mode
  • New container upgrade test case, issue
    test: verify container upgrade #624

Technical Details

Checklist

Tick relevant boxes, this PR is-a or has-a:

  • Bugfix
    • Regression tests
    • ChangeLog updates (for next release)
  • Feature
    • YANG model change => revision updated?
    • Regression tests added?
    • ChangeLog updates (for next release)
    • Documentation added?
  • Test changes
    • Checked in changed Readme.adoc (make test-spec)
    • Added new test to group Readme.adoc and yaml file
  • Code style update (formatting, renaming)
  • Refactoring (please detail in commit messages)
  • Build related changes
  • Documentation content changes
    • ChangeLog updated (for major changes)
  • Other (please describe):

@troglobit troglobit added the ci:main Build default defconfig, not minimal label Sep 9, 2025
@troglobit troglobit force-pushed the recontain branch 2 times, most recently from e6be413 to a2cf0bc Compare October 13, 2025 03:12
@troglobit troglobit force-pushed the recontain branch 2 times, most recently from 3467ea9 to 9a6b5c1 Compare October 14, 2025 10:00
@troglobit troglobit changed the title Refactor container cleanup Container Lifecycle and Upgrade Improvements Oct 14, 2025
@troglobit troglobit marked this pull request as ready for review October 14, 2025 10:16
@troglobit troglobit requested review from mattiaswal and wkz October 14, 2025 11:42
Copy link
Contributor

@rical rical left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

I left some comments. Mostly cosmetic code stuff and some theoretical security issues. Nothing that you really need to address if you don't want to.

Also. I don't understand why the indentation is totally broken in the container script. Most lines contains a mix of tabs and spaces. It's hard to read on GH and my editor is contemplating kill -9 $$ ;)

@troglobit
Copy link
Contributor Author

Nice work!

Thanks, awesome review!

I left some comments. Mostly cosmetic code stuff and some theoretical security issues. Nothing that you really need to address if you don't want to.

I've addressed most of them and pushed a separate commit for easy review.

The branch also includes a few new commits to fix issues in the 'copy' command helper for the shell and CLI. Stuff we forgot about for a year almost. Would be great if you could take a look at that too.

Also. I don't understand why the indentation is totally broken in the container script. Most lines contains a mix of tabs and spaces. It's hard to read on GH and my editor is contemplating kill -9 $$ ;)

Well, Emacs default indentation for shell scripts, as most of us use, have a four space base indent, but replace leading 8-char spaces with tabs. I think one of the main reasons for your review issues is that GitHub has changed (!) the tab width since 2021 from 8 to 4 bytes, but this is possible to change in your https://github.com/settings/appearance settings

indent-size

@rical
Copy link
Contributor

rical commented Oct 21, 2025

I've addressed most of them and pushed a separate commit for easy review.

Great.

The branch also includes a few new commits to fix issues in the 'copy' command helper for the shell and CLI. Stuff we forgot about for a year almost. Would be great if you could take a look at that too.

Sure thing!

Also. I don't understand why the indentation is totally broken in the container script. Most lines contains a mix of tabs and spaces. It's hard to read on GH and my editor is contemplating kill -9 $$ ;)

Well, Emacs default indentation for shell scripts, as most of us use, have a four space base indent, but replace leading 8-char spaces with tabs. I think one of the main reasons for your review issues is that GitHub has changed (!) the tab width since 2021 from 8 to 4 bytes, but this is possible to change in your https://github.com/settings/appearance settings

I strongly disagree towards mixing tabs and spaces for indentation. Personally I prefer spaces, which most documentation suggest (google et al). But growing up in #bash on IRC getting yelled at by GreyCat, I can live with tabs too.

If you look at the file in the current state you will see that there's a mix of everything. Some lines are pure space (8+), some are pure tabs and some are a mix of tabs and spaces.

@troglobit
Copy link
Contributor Author

Also. I don't understand why the indentation is totally broken in the container script. Most lines contains a mix of tabs and spaces. It's hard to read on GH and my editor is contemplating kill -9 $$ ;)
Well, Emacs default indentation for shell scripts, as most of us use, have a four space base indent, but replace leading 8-char spaces with tabs. I think one of the main reasons for your review issues is that GitHub has changed (!) the tab width since 2021 from 8 to 4 bytes, but this is possible to change in your https://github.com/settings/appearance settings
I strongly disagree towards mixing tabs and spaces for indentation. Personally I prefer spaces, which most documentation suggest (google et al). But growing up in #bash on IRC getting yelled at by GreyCat, I can live with tabs too.

I'm sorry this turned into a tabs-vs-spaces discussion, that was not my intention. I was just stating a fact that most of our scripts have this mix of tabs vs spaces because of us Emacs users.

If you look at the file in the current state you will see that there's a mix of everything. Some lines are pure space (8+), some are pure tabs and some are a mix of tabs and spaces.

Yes, and I'd rather not reformat the entire file at this point because that would make git blame a lot more difficult.

@mattiaswal
Copy link
Contributor

Great work! Just have a small question.

@troglobit troglobit requested a review from mattiaswal October 23, 2025 12:22
Fixes #1146

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
This rectifies an omission from the initial yang model.  Not all
charachters are supported in container and volume names.  E.g.,
simply attempting to create a volume or container with a space
in the name causes this error message from podman:

 podman: Error: running volume create option: names must match [a-zA-Z0-9][a-zA-Z0-9_.-]*: invalid argument

In addition to the regexp, the new 'ident' type also enforces a
minimum and maximum length.  Sure, technically a single char is
allowed, but let's be reasonable, and who in their right mind
wants an identifier > 64 chars?  We have description for that.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Not only great for debugging, but also allows users to start their
containers manually in another way.  But yeah, mostly for debug.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
The default timeout for 'podman stop foo' is 10 seconds, which for
heavily loaded systems with intricate shutdown process is *waaaay*
too short.  Increase it to the container script default 30s, which
coincidentally is also the container@.conf template's kill delay.

Fixes #1149

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Fixes #1148

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
This commit adds metadata to track loaded OCI archives to allow skipping
'delete + load' of OCI images when restarting either the container or the
system as a whole.  The sha256 of all loaded OCI archives is stored in a
sidecar file in our downloads directory.  Then we verify the checksum of
the OCI archives against their same-named sidecar to determine if the OCI
archive is already loaded or not.

Additionally, the instance using the image is labled with metadata to detect
changes in the container configuration.  This in turn allow skipping the
delete + create phase also of the instance.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
This commit reverts 477f7ae and bb19d06, which intended to fix an issue
with lingering old images, see #1098.  However, as detailed in #1147,
this caused severe side effects while working with multiple larger
containers.  Basically, the prune operation of one container removed
images of other containers that are just being created in parallel.

Instead of using the podman prune command we can use the meta datain the
start script to pinpoint exactly which image(s) to remove, including any
downloaded OCI archives when the container instance is removed.

Fixes #1147

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
When loading big OCI images at boot the `podman load` process completely
monopolizes all cores of an Arm Cortex-A72.  It blocks on I/O, sure, but
with 'nice' we can get some attention at least to more critical services
at boot.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Usually, when your system is up and running properly, you want to clean
up anything unused from your previous experiments.  This change alllows
that by calling the interactive 'podman image prune -a -f' command from
the CLI command 'container remove all'

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Container instances that run with mutable images, e.g., tagged with `:latest`
or similar non-versioned tags, can be upgraded without changing the config.

This commit fixes two issues found with this support:

- force container image re-fetch on upgrade, even if the file exists locally
- surgically remove old image from container store after upgrade

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
One unique feature of Infix OS is that you can embed your OCI archive in
the rootfs.squashfs.  This way your container instance does not need to
download anything from the network.  To upgrade you drop in a new image
and rebuild Infix, when the new Infix boots the container script 'setup'
command will recognize that the OCI archive has changed and will reload
it into the container store.

This patch is an improvement of the way too generic 'podman image prune'
command used previously.  Instead of looking for any dangling image, we
now surgically remove the old image.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Shell script:

 - Factor out big portions of code into more logical helper functions
 - Simplify calling setup script by checking for remote image first
 - Simplify meta/sha up-to-date handling and clarify terminology
 - Consistent use of -f instead of -e in file-exists checks
 - Fix unsafe use of 'mktemp -u'

C code:
 - Clarify meta/sha terminology: rename meta-sha256 -> meta-image-sha256
 - Refactor weird archive_offset() function to local_path() helper
 - Factor out helper function calc_sha()
 - Check len of sha256 >= 64

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Already supported in the CLI.  This makes it official, and quite handy
for users that run mutable containers.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
This commit adds four (small) container images to the Infamy test container
which are used in the new container upgrade test.  The test verifies that a
mutable container can be upgraded and that old images are properly cleaned
up from the container store.

Fixes #624

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Rename test directories in infix_containers/ to remove the redundant
'container_' prefix since they already live under infix_containers/:

  container_basic          -> basic
  container_bridge         -> bridge
  container_enabled        -> enabled
  container_environment    -> environment
  container_firewall_basic -> firewall_basic
  container_host_commands  -> host_commands
  container_phys           -> phys
  container_veth           -> veth
  container_volume         -> volume

Also update references in all.yaml and Readme.adoc files.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
When copying to the running datastore we cannot use sr_copy_config(),
instead we must use sr_replace_config().  This fix covers both the case
of 'copy startup-config running-config' and 'copy FILE running-config'.

Fixes #1203

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
This commit adds config file validation to the copy command, discussed
in #373.  Allowing users to test their config files before restoring a
backup.  The feature could also be used for the automatic rollback when
downgrading to an earlier version of the OS.

Fixes #373

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Fixes #981

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
This is a follow-up to PR #717 where path traversal protection was
discussed.  A year later and it's clear that having a user-friendly
copy tool in the shell is a good thing, but that we proably want to
restrict what it can do when called from the CLI.

A sanitize flag (-s) is added to control the behavior, when used in the
shell without -s, both commands act like traditional UNIX tools and do
assume . for relative paths, and allow ../, whereas when running from
the CLI only /media/ is allowed and otherwise files are assumed to be
in $HOME or /cfg

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
The regular file-to-file copy, was missing calls to cfg_adjust(), this
commit fixes that and adds some helpful comments for each use-case.

Also, drop insecure mktemp() in favor of our own version which uses the
basename of the remote source file.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Add bash completion for the common datastores, like we already do in the
CLI, and update the usage text accordingly.

Also, make sure to install to /usr/bin, not /bin since we've now merged
the hierarchies since a while back.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Since there is no safe way (unique hash/id) to identify unused named
volumes, we cannot automate removal of them.  Since volume data, when
compared to a container image, is quite small, it was decided that we
document this instead.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Much of the content was made obsolete by recent changes, we now optimize
the experience for users, avoiding recreate at boot unless checksums for
configuration or base image have changed.

This was also a good time to explain the difference between mutable and
immutable tags, which sometimes is a cause for great confusion.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
@troglobit troglobit merged commit 18d1773 into main Oct 23, 2025
7 checks passed
@troglobit troglobit deleted the recontain branch October 23, 2025 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci:main Build default defconfig, not minimal

Projects

None yet

4 participants