Conversation

allisonkarlitskaya (Collaborator) commented Apr 21, 2025

For images with a large number of layers (as is the case for most bootc images) this is a massive win. I used this as a test:

skopeo copy docker://quay.io/fedora/fedora-silverblue:42 oci:$HOME/oci/silverblue-42

and then

cfsctl oci pull oci:$HOME/oci/silverblue-42

and noticed an improvement from ~90s to ~9s.

Unfortunately the situation isn't much improved for images with a small number of layers (as is the case in our own examples/ directory).

allisonkarlitskaya (Author) commented:

Run cargo publish --dry-run
    Updating crates.io index
warning: crate composefs@0.2.0 already exists on crates.io index
error: all dependencies must have a version specified when publishing.
dependency `containers-image-proxy` does not specify a version
Note: The published dependency will use the version from crates.io,
the `git` specification will be removed from the dependency declaration.
Error: Process completed with exit code 101.

Hm, tricky. Looks like we might need to get bootc-dev/containers-image-proxy-rs#78 released, or find another way?
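
For reference, the shape cargo's error message points at is to keep the `git` source but add a `version` alongside it; that only works once a matching release exists on crates.io, hence wanting #78 released first. A sketch with a placeholder version:

    [dependencies]
    # `cargo publish` strips the `git` part and keeps only `version`
    # (the version shown here is a placeholder, not a real release)
    containers-image-proxy = { git = "https://github.com/bootc-dev/containers-image-proxy-rs", version = "0.99" }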

allisonkarlitskaya (Author) commented Apr 22, 2025

An earlier version of this PR had an `objects: OnceCell<Arc<OwnedFd>>` field and a separate internal `ensure_objects_in_dir()` method that took an `Arc<OwnedFd>` for the objects directory and did its work inside of that... I wonder if this approach is nicer in general (although it doesn't help solve the weird lifetime issues associated with the stream writer or the oci imageop).

This is somewhere where my lack of experience is definitely showing...
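
A rough sketch of what that earlier shape might have looked like (everything here except the `ensure_objects_in_dir()` name is a guess, not the actual code):

    use std::os::fd::OwnedFd;
    use std::sync::Arc;

    use once_cell::sync::OnceCell;

    struct Repository {
        repository: OwnedFd,
        // the objects directory, opened lazily and shared by refcount
        objects: OnceCell<Arc<OwnedFd>>,
    }

    impl Repository {
        // internal helper: all work happens relative to the passed-in fd,
        // so a clone of the Arc can move into a worker without borrowing self
        fn ensure_objects_in_dir(objects: &Arc<OwnedFd>, data: &[u8]) -> anyhow::Result<()> {
            todo!()
        }
    }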

When we already have a layer, we print a message saying as much.  In
fb1de9f ("src/fsverity: up our FsVerityHashValue trait game") we
accidentally changed this to use the `Debug` trait.  This would have
been correct if we were printing a layer ID, but this is an array, so
doing that prints the raw bytes.  Revert to the previous behaviour.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Rename our `Repository::object_dir()` accessor to `.objects_dir()` (the
directory is called `"objects"`) and change its implementation to use a
OnceCell.  ostree uses this "initialize on first use" pattern for some
of its directories as well, and I like it.

We use the external `once_cell` crate (which we already had as a
dev-dependency) because the fallible (`try_`) version of the API is not
yet stable in the standard library.

Clean up our ensure_object() implementation to use `.objects_dir()` and
generally make things a bit more robust:
 - we avoid unnecessary mkdir() calls for directories which almost
   certainly already exist
 - instead of checking merely for the existence of an object file with
   the correct name, we actually measure it now
 - we play around less with joining pathnames (and we can drop a
   now-unused trait helper method on FsVerityHashValue)

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
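
A minimal sketch of the initialize-on-first-use accessor, assuming `once_cell::sync::OnceCell` and its fallible `get_or_try_init()` (the actual open/create logic is elided):

    use std::os::fd::OwnedFd;

    use anyhow::Result;
    use once_cell::sync::OnceCell;

    struct Repository {
        repository: OwnedFd,
        objects: OnceCell<OwnedFd>,
    }

    impl Repository {
        fn objects_dir(&self) -> Result<&OwnedFd> {
            self.objects.get_or_try_init(|| {
                // the first caller opens (or creates) "objects" under the
                // repository fd; later callers get the cached fd back
                todo!()
            })
        }
    }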
Let's just have an `async fn main()` instead of doing this ourselves.
This also gets us access to the multithreaded executor, which we'll
start using soon.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
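
Presumably this is tokio's attribute macro, whose default flavor is the multi-threaded runtime; a sketch:

    #[tokio::main]
    async fn main() -> anyhow::Result<()> {
        // the macro builds the (multi-threaded) runtime and calls
        // block_on() for us; no hand-rolled Runtime setup needed
        todo!()
    }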
Our implementations trivially satisfy all of these constraints.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Our `SplitStreamWriter` and `oci::ImageOp` structs contain simple
references to `Repository`, which results in some awkward lifetime rules
on those structs.  We can simplify things substantially if we lean into
ref-counting a bit more.

I'm not yet ready to declare that Repository is always refcounted, but
for operations involving splitstreams (including oci downloads) it is
now required.

The ergonomics of this change surprised me.  The Deref trait on `Arc<>`
and the ability to define `self: &Arc<Self>` methods makes this all
quite nice to use.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
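
A sketch of the pattern (assumed shape, not the exact code): `Arc`'s `Deref` lets an `Arc<Repository>` call ordinary `&self` methods, while methods that need to keep the repository alive can take `self: &Arc<Self>`:

    use std::sync::Arc;

    struct Repository;

    struct SplitStreamWriter {
        // owns a refcount: no lifetime parameter tying the writer to a borrow
        repo: Arc<Repository>,
    }

    impl Repository {
        // callable only through an Arc; bumps the refcount for the writer
        fn create_stream(self: &Arc<Self>) -> SplitStreamWriter {
            SplitStreamWriter { repo: Arc::clone(self) }
        }
    }

A caller holding `repo: Arc<Repository>` can still write `repo.create_stream()` directly, which is what makes the ergonomics pleasant.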
layers.sort_by_key(|(mld, ..)| Reverse(mld.size()));

// Bound the number of tasks to the available parallelism.
let threads = available_parallelism()?;
A collaborator commented:

This case is tricky because we're intermixing CPU and I/O work. One problem I've seen in the past is running on servers with large CPU counts (e.g. 64+) but comparatively limited I/O bandwidth; in those cases we end up with 64 threads competing pointlessly for the limited I/O.

Unfortunately there's no convenient way that I know of to get an estimate of the available I/O parallelism, but in some equivalent places I've capped it at an arbitrary number like 4.

allisonkarlitskaya replied:

It's a bit tricky because we do a fair amount of computation in these workers as well: we compute the full Merkle tree twice (once in userspace, once in the kernel). And some of those threads will be sleeping some of the time, because they're doing fdatasync() or blocked on the download or whatever. I considered doing something like 2 * available_parallelism() in fact.

Different hardware combinations could end up being either CPU- or I/O-bound, but if we use at least available_parallelism() threads then we stand a decent chance of keeping the CPUs busy. If I/O throttles, it'll end up slowing down the CPU work anyway...

let mut entries = vec![];
for (mld, diff_id) in layers {
let self_ = Arc::clone(self);
let permit = Arc::clone(&sem).acquire_owned().await?;
A collaborator commented:

Ah yeah, using a semaphore for this makes sense.

allisonkarlitskaya replied:

Yeah, I think I'll stick with it here...
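
Pieced together from the snippets above, the bounding pattern looks roughly like this (a sketch with the details elided; `Layer` and `process()` are stand-ins for the real types and per-layer work):

    use std::sync::Arc;
    use std::thread::available_parallelism;

    use tokio::sync::Semaphore;

    struct Layer;
    async fn process(_layer: Layer) -> anyhow::Result<()> { todo!() }

    async fn pull_layers(layers: Vec<Layer>) -> anyhow::Result<()> {
        // one permit per hardware thread bounds the number of in-flight tasks
        let sem = Arc::new(Semaphore::new(available_parallelism()?.get()));

        let mut handles = vec![];
        for layer in layers {
            // acquire_owned() moves the permit into the spawned task; it's
            // released when dropped, letting the next task start
            let permit = Arc::clone(&sem).acquire_owned().await?;
            handles.push(tokio::spawn(async move {
                let result = process(layer).await;
                drop(permit);
                result
            }));
        }
        for handle in handles {
            handle.await??; // JoinError first, then the task's own error
        }
        Ok(())
    }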

let permit = Arc::clone(&sem).acquire_owned().await?;
let layer_sha256 = sha256_from_digest(diff_id)?;
let descriptor = mld.clone();
let future = tokio::spawn(async move {
A collaborator commented:

See https://github.com/bootc-dev/bootc/blob/2de8e0d23fb89bed76722ff1466614afacec64b3/lib/src/fsck.rs#L195, which uses a JoinSet; it's designed for this, and in particular it enforces structured concurrency.

allisonkarlitskaya replied:

Thanks for the pointer. I used this for the HTTP downloader.
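
For comparison, a minimal JoinSet sketch (assumed shape; `fetch()` is a placeholder):

    use tokio::task::JoinSet;

    async fn fetch(url: String) -> anyhow::Result<Vec<u8>> { todo!() }

    async fn download_all(urls: Vec<String>) -> anyhow::Result<()> {
        let mut set = JoinSet::new();
        for url in urls {
            set.spawn(fetch(url));
        }
        // join_next() yields results as tasks finish; dropping the set
        // aborts anything still running, which is the structured part
        while let Some(result) = set.join_next().await {
            let _bytes = result??;
        }
        Ok(())
    }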

env_logger = "0.11.0"
hex = "0.4.0"
indicatif = { version = "0.17.0", features = ["tokio"] }
log = "0.4.8"
oci-spec = "0.7.0"
once_cell = { version = "1.21.3", default-features = false }
A collaborator commented:

This change looks spurious? Also there's no reason to use the external crate since the functionality got merged into std.

allisonkarlitskaya replied:

I mentioned that in the commit message: f286208

We use the external `once_cell` crate (which we already had as a dev-dependency) because the fallible (`try_`) version of the API is not yet stable in the standard library.

allisonkarlitskaya (Author) commented:

The only thing blocking this, in my opinion, is that it would be nice to have a new containers-image-proxy-rs release. Also: I might drop the `await` on the driver, as we discussed in bootc-dev/containers-image-proxy-rs#80 (comment).

Add an async version of `Repository::ensure_object()` and wire it
through `SplitStreamWriter::write_external()`.  Call that when we're
splitting OCI layer tarballs to offload the writing of external objects
(and the `fdatasync()` that goes with it) to a separate thread.  This is
something like prep work for what we've been trying to accomplish for a
while in containers#62, but it doesn't come close to the complete
picture (since it still writes the objects sequentially).

Modify the (already) async code in oci::ImageOp to download layers in
parallel.  This is a big deal for images with many layers (as is often
the case for bootc images, due to the splitting heuristics).  This takes
a pull of the Fedora Silverblue 42 container image (from a local
`oci-dir`) from ~90s to ~8.5s on my laptop.

Unfortunately, container images made from large single layers are not
substantially improved.

In order to make this change we need to depend on a new version of
containers-image-proxy-rs, which makes `ImageProxy: Send + Sync`, so bump
our required version to the one released today.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
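
A sketch of what the offloading might look like, assuming tokio's `spawn_blocking()` and a synchronous `ensure_object()` (names follow the commit message; the signatures are guesses):

    use std::sync::Arc;

    struct ObjectId;
    struct Repository;

    impl Repository {
        // the existing synchronous path: write the object, fdatasync() it
        fn ensure_object(&self, _data: &[u8]) -> anyhow::Result<ObjectId> {
            todo!()
        }

        // async wrapper: push the blocking write onto tokio's blocking
        // thread pool so the splitstream task isn't stalled by fdatasync()
        async fn ensure_object_async(self: &Arc<Self>, data: Vec<u8>) -> anyhow::Result<ObjectId> {
            let repo = Arc::clone(self);
            tokio::task::spawn_blocking(move || repo.ensure_object(&data)).await?
        }
    }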
This can cause the proxy's control channel to experience lockups in
heavily-threaded situations, and we don't gain any benefit from doing it.
Just drop it.

See bootc-dev/containers-image-proxy-rs#80

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
allisonkarlitskaya merged commit e4e8c28 into containers:main on Apr 29, 2025 (11 checks passed).