Add multithread cfsctl oci pull #111
Conversation
hm. tricky. Looks like we might need to get bootc-dev/containers-image-proxy-rs#78 released or find another way?
An earlier version of this PR had a […]. This is somewhere where my lack of experience is definitely showing...
Force-pushed from ad564a2 to 23ea634
When we already have a layer, we print a message saying as much. In fb1de9f ("src/fsverity: up our FsVerityHashValue trait game") we accidentally changed this to use the `Debug` trait. This would have been correct if we were printing a layer ID, but this is an array, so we see the bytes if we do that. Revert to the previous behaviour.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
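A minimal illustration of the regression described above (not the actual composefs-rs code): formatting a raw byte array with the `Debug` trait prints a decimal byte list, while the intended output was the hex string of the layer ID. The `hex` crate used here is already in the project's dependencies.

```rust
fn main() {
    let layer_id = [0xab_u8, 0xcd, 0xef];
    println!("{layer_id:?}");              // [171, 205, 239]  (the accidental Debug output)
    println!("{}", hex::encode(layer_id)); // "abcdef"         (the intended layer-ID style)
}
```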
Force-pushed from 23ea634 to 48b962b
Rename our `Repository::object_dir()` accessor to `.objects_dir()` (the directory is called `"objects"`) and change its implementation to use a OnceCell. ostree uses this "initialize on first use" pattern for some of its directories as well, and I like it. We use the external `once_cell::` crate (which we already had as a -devel dependency) because the .try() version of the API is not yet stable in the standard library.

Clean up our ensure_object() implementation to use `.objects_dir()` and generally make things a bit more robust:

- we avoid unnecessary mkdir() calls for directories which almost certainly already exist
- instead of checking merely for the existence of an object file with the correct name, we actually measure it now
- we play around less with joining pathnames (and we can drop a now-unused trait helper method on FsVerityHashValue)

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
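A minimal sketch of the "initialize on first use" pattern described above, with hypothetical field names and path-based plumbing (the real accessor works differently):

```rust
use std::io;
use std::path::{Path, PathBuf};

use once_cell::sync::OnceCell;

struct Repository {
    root: PathBuf,              // hypothetical: repository root path
    objects: OnceCell<PathBuf>, // the lazily-initialized "objects" directory
}

impl Repository {
    fn objects_dir(&self) -> io::Result<&Path> {
        // get_or_try_init() is the fallible initializer that is not yet
        // stable in std, hence the external once_cell crate.
        self.objects
            .get_or_try_init(|| {
                let dir = self.root.join("objects");
                std::fs::create_dir_all(&dir)?; // the mkdir happens at most once
                Ok(dir)
            })
            .map(PathBuf::as_path)
    }
}
```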
Let's just have an `async fn main()` instead of doing this ourselves. This also gets us access to the multithreaded executor, which we'll start using soon.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
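For illustration, the shape this change implies (not the actual cfsctl source): the default `#[tokio::main]` attribute builds the multi-threaded tokio runtime.

```rust
#[tokio::main] // defaults to the multi-threaded runtime
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // ... parse arguments and dispatch to the async subcommands ...
    Ok(())
}
```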
Our implementations trivially satisfy all of these constraints.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Our `SplitStreamWriter` and `oci::ImageOp` structs contain simple references to `Repository` which results in some awkward lifetime rules on those structs. We can simplify things substantially if we lean into ref-counting a bit more.

I'm not yet ready to declare that Repository is always refcounted, but for operations involving splitstreams (including oci downloads) it is now required.

The ergonomics of this change surprised me. The Deref trait on `Arc<>` and the ability to define `self: &Arc<Self>` methods makes this all quite nice to use.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
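A sketch of that ergonomic point with invented types (not the composefs-rs API): `Arc`'s `Deref` lets a ref-counted value be used like a plain reference, and a `self: &Arc<Self>` receiver lets a method hand an owned clone to a spawned task without borrowing issues.

```rust
use std::sync::Arc;

struct Repository;

impl Repository {
    // Callable only on an Arc<Repository>; cloning the Arc gives the
    // spawned task owned access without tying lifetimes to `self`.
    fn pull(self: &Arc<Self>) {
        let this = Arc::clone(self);
        std::thread::spawn(move || {
            this.do_work(); // Deref<Target = Repository> makes this call work
        });
    }

    fn do_work(&self) {}
}
```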
Force-pushed from 48b962b to 09999c1
layers.sort_by_key(|(mld, ..)| Reverse(mld.size()));

// Bound the number of tasks to the available parallelism.
let threads = available_parallelism()?;
This case is tricky because we're intermixing CPU and I/O work; one problem I've seen in the past is running on large CPU count servers (e.g. 64+) but with more limited I/O bandwidth. In those cases we end up with 64 threads competing pointlessly for more limited I/O.
Unfortunately there's no convenient way that I know of to get an estimate of the available I/O parallelism, but in some equivalent places I've capped the thread count at an arbitrary number like 4.
It's a bit tricky because we do a fair amount of computation in these workers as well: we compute the full Merkle tree twice (once in userspace, once in the kernel). And some of those threads will be sleeping some of the time, because they're doing fdatasync() or blocked on the download or whatever. I considered doing something like 2 * available_parallelism(), in fact.

Different hardware combinations could end up being either CPU- or IO-bound, but if we use at least available_parallelism() then we stand a decent chance of keeping at least the CPUs busy. If IO throttles, then it'll end up slowing down the CPUs...
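A sketch of the trade-off under discussion, as a hypothetical helper (not code from this PR): size the pool from available_parallelism(), optionally capping it when I/O rather than CPU is expected to be the bottleneck.

```rust
use std::io;
use std::thread::available_parallelism;

// Hypothetical helper illustrating the two positions in this thread.
fn worker_count(io_bound_cap: Option<usize>) -> io::Result<usize> {
    let cpus = available_parallelism()?.get();
    Ok(match io_bound_cap {
        Some(cap) => cpus.min(cap), // e.g. cap at 4 on big servers with limited I/O
        None => cpus,               // keep at least the CPUs busy (this PR's choice)
    })
}
```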
let mut entries = vec![];
for (mld, diff_id) in layers {
    let self_ = Arc::clone(self);
    let permit = Arc::clone(&sem).acquire_owned().await?;
Ah yeah using a semaphore for this makes sense.
Ya, I think I'll stick with it here...
    let permit = Arc::clone(&sem).acquire_owned().await?;
    let layer_sha256 = sha256_from_digest(diff_id)?;
    let descriptor = mld.clone();
    let future = tokio::spawn(async move {
See https://github.com/bootc-dev/bootc/blob/2de8e0d23fb89bed76722ff1466614afacec64b3/lib/src/fsck.rs#L195 which uses a JoinSet, which is designed for this; in particular, it enforces structured concurrency.
Thanks for the pointer. I used this for the HTTP downloader.
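A sketch of the JoinSet pattern being pointed at, with invented types and anyhow assumed for error handling (not the code that landed in this PR):

```rust
use std::sync::Arc;

use tokio::sync::Semaphore;
use tokio::task::JoinSet;

async fn fetch_layers(layers: Vec<String>, threads: usize) -> anyhow::Result<()> {
    let sem = Arc::new(Semaphore::new(threads));
    let mut set = JoinSet::new();

    for layer in layers {
        // Bound the fan-out: each task holds a permit for its lifetime.
        let permit = Arc::clone(&sem).acquire_owned().await?;
        set.spawn(async move {
            let _permit = permit;
            fetch_one_layer(&layer).await
        });
    }

    // Structured concurrency: results (and panics) are surfaced here, and
    // dropping the set would abort any tasks still in flight.
    while let Some(result) = set.join_next().await {
        result??;
    }
    Ok(())
}

async fn fetch_one_layer(_layer: &str) -> anyhow::Result<()> {
    Ok(()) // hypothetical stand-in for the actual download-and-split work
}
```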
env_logger = "0.11.0"
hex = "0.4.0"
indicatif = { version = "0.17.0", features = ["tokio"] }
log = "0.4.8"
oci-spec = "0.7.0"
once_cell = { version = "1.21.3", default-features = false }
This change looks spurious? Also there's no reason to use the external crate since the functionality got merged into std.
I mentioned that in the commit message: f286208

> We use the external `once_cell::` crate (which we already had as a -devel dependency) because the .try() version of the API is not yet stable in the standard library.
The only thing blocking this in my opinion is that it would be nice to have a new containers-image-proxy-rs release. Also: I might drop the await on the driver as we discussed in bootc-dev/containers-image-proxy-rs#80 (comment)
Add an async version of `Repository::ensure_object()` and wire it through `SplitStreamWriter::write_external()`. Call that when we're splitting OCI layer tarballs to offload the writing of external objects (and the `fdatasync()` that goes with it) to a separate thread. This is something like prep work for what we've been trying to accomplish for a while in containers#62, but it doesn't come close to the complete picture (since it still writes the objects sequentially).

Modify the (already) async code in oci::ImageOp to download layers in parallel. This is a big deal for images with many layers (as is often the case for bootc images, due to the splitting heuristics). It takes a pull of the Fedora Silverblue 42 container image (when pulled from a local `oci-dir`) from ~90s to ~8.5s on my laptop. Unfortunately, container images made from large single layers are not substantially improved.

In order to make this change we need to depend on a new version of containers-image-proxy-rs which makes ImageProxy: Send + Sync, so bump our required version to the one released today.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
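A minimal sketch of the offloading idea, assuming hypothetical path-based plumbing (the real `ensure_object()` is more involved): `tokio::task::spawn_blocking` moves the blocking write and the `fdatasync()` off the async executor threads.

```rust
use std::fs::File;
use std::io::Write;
use std::path::PathBuf;

// Hypothetical async wrapper; not the actual ensure_object() signature.
async fn ensure_object(path: PathBuf, data: Vec<u8>) -> std::io::Result<()> {
    tokio::task::spawn_blocking(move || {
        let mut file = File::create(&path)?;
        file.write_all(&data)?;
        file.sync_data() // fdatasync(): the expensive part we move off-executor
    })
    .await
    .expect("blocking task panicked")
}
```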
This can cause the proxy's control channel to experience lockups in heavily-threaded situations and we don't gain any benefit from doing it. Just drop it. See bootc-dev/containers-image-proxy-rs#80

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Force-pushed from 09999c1 to 4311a3d
For images with a large number of layers (as is the case for most bootc images) this is a massive win. I used this as a test:
skopeo copy docker://quay.io/fedora/fedora-silverblue:42 oci:$HOME/oci/silverblue-42
and then
cfsctl oci pull oci:$HOME/oci/silverblue-42
And noticed an improvement from ~90s to ~9s.
Unfortunately the situation isn't much improved for images with a small number of layers (as is the case in our own examples/ directory).