Using a new Docker version (27) to run an old-ish docker image saved in the dataset causes DataLad error (image ID mismatch) #269
Description
With Docker 27, trying to run a docker container that was saved using an older version of Docker results in an error:
```
>python -m datalad_container.adapters.docker run container/image sh -c "echo 123"
(...)
RuntimeError: docker image sha256:f881bd4db45ac9775f5a5377485a7c939fea4685d7482eed4809cb83fc3b51a3 was not successfully loaded
```
Docker loads an image, but its ID does not match what DataLad expects based on the image that was stored:
```
>docker image ls
REPOSITORY   TAG       IMAGE ID       CREATED         SIZE
remodnav     latest    81aaa31870f5   16 months ago   3.8GB
```
This was observed when trying to reproduce paper-remodnav (versioned link), and snippets in this issue are based on that dataset.
Which software versions are affected?
Unclear. The problem was observed and later confirmed on Windows with Docker version 27.5.1. For me, the problem does not reproduce on Debian 12 (bookworm) with Docker version 20.10.4 (docker.io package). @mih reports that it still works on his laptop, with v26.1.5.
I don't know which Docker version was used to save the image; however, I suspect it was < 25, for reasons explained below.
Where in the code does the problem happen?
The error message comes from the load function in datalad_container.adapters.docker:
datalad-container/datalad_container/adapters/docker.py
Lines 110 to 150 in 55309f8
```python
def load(path, repo_tag, config):
    """Load the Docker image from `path`.

    Parameters
    ----------
    path : str
        A directory with an extracted tar archive.
    repo_tag : str or None
        `image:tag` of image to load
    config : str or None
        "Config" value or prefix of image to load

    Returns
    -------
    The image ID (str)
    """
    # FIXME: If we load a dataset, it may overwrite the current tag. Say that
    # (1) a dataset has a saved neurodebian:latest from a month ago, (2) a
    # newer neurodebian:latest has been pulled, and (3) the old image have been
    # deleted (e.g., with 'docker image prune --all'). Given all three of these
    # things, loading the image from the dataset will tag the old neurodebian
    # image as the latest.
    image_id = "sha256:" + get_image(path, repo_tag, config)
    if image_id not in _list_images():
        lgr.debug("Loading %s", image_id)
        cmd = ["docker", "load"]
        p = sp.Popen(cmd, stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE)
        with tarfile.open(fileobj=p.stdin, mode="w|", dereference=True) as tar:
            tar.add(path, arcname="")
        out, err = p.communicate()
        return_code = p.poll()
        if return_code:
            lgr.warning("Running %r failed: %s", cmd, err.decode())
            raise sp.CalledProcessError(return_code, cmd, output=out)
    else:
        lgr.debug("Image %s is already present", image_id)

    if image_id not in _list_images():
        raise RuntimeError(
            "docker image {} was not successfully loaded".format(image_id))
    return image_id
```
The function performs a relatively simple operation. It first checks whether the expected image ID is already among the images Docker knows about. If not, it creates a tar file object from the contents of the requested directory and pipes it directly into docker load (all done with streams, without saving intermediate files). Finally, it checks again whether the expected ID, inferred from the image stored in the dataset, is among the image IDs Docker lists; this is where the error is raised.
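The streaming step can be reproduced outside DataLad with a small helper. This is a minimal sketch, not DataLad's code: `pipe_dir_as_tar` is a hypothetical name, and any command that reads a tar archive from stdin can stand in for `docker load`.

```python
import subprocess as sp
import tarfile


def pipe_dir_as_tar(path, cmd):
    """Stream directory `path` as an uncompressed tar archive into `cmd`'s stdin.

    Hypothetical helper mirroring the pattern used in `load`: the archive is
    never written to disk, and arcname="" puts the directory's contents at the
    archive root (member names are relative to `path`).
    Returns (return_code, stdout_bytes, stderr_bytes).
    """
    p = sp.Popen(cmd, stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE)
    # mode="w|" writes a non-seekable stream; dereference=True stores the
    # targets of symlinks (such as annexed files) as regular files.
    with tarfile.open(fileobj=p.stdin, mode="w|", dereference=True) as tar:
        tar.add(path, arcname="")
    # communicate() flushes and closes stdin, then collects the output.
    # Caveat: if `cmd` writes a lot to stdout/stderr before it has consumed
    # the whole archive, the pipes can fill up and block both processes.
    out, err = p.communicate()
    return p.returncode, out, err
```

For example, `pipe_dir_as_tar("container/image", ["docker", "load"])` would mirror what `load` does, minus the two ID checks.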
The expected ID is returned by get_image:
datalad-container/datalad_container/adapters/docker.py
Lines 88 to 107 in 55309f8
```python
def get_image(path, repo_tag=None, config=None):
    """Return the image ID of the image extracted at `path`.
    """
    manifest_path = op.join(path, "manifest.json")
    with open(manifest_path) as fp:
        manifest = json.load(fp)
    if repo_tag is not None:
        manifest = [img for img in manifest if repo_tag in (img.get("RepoTags") or [])]
    if config is not None:
        manifest = [img for img in manifest if img["Config"].startswith(config)]
    if len(manifest) == 0:
        raise ValueError(f"No matching images found in {manifest_path}")
    elif len(manifest) > 1:
        raise ValueError(
            f"Multiple images found in {manifest_path}; disambiguate with"
            " --repo-tag or --config"
        )

    with open(op.join(path, manifest[0]["Config"]), "rb") as stream:
        return hashlib.sha256(stream.read()).hexdigest()
```
Again, the operation is relatively simple: the function opens the image manifest stored in the dataset (filtering it by repo tag or config prefix, if given), opens the config file that the manifest points to, and returns the SHA-256 hash of its content.
Investigating the docker save layout and speculation about IDs
Using that dataset, I was able to mimic DataLad's approach to creating the tar file and save it to a file, both for further inspection and for loading with docker load -i:

```python
>>> import tarfile
>>> with tarfile.open("img.tar", mode="w|", dereference=True) as tar:
...     tar.add("container\\image", arcname="")
```

Note: I tried writing the tar file on both GNU/Linux and Windows. The files had different checksums (line endings? tar headers?), but both produced the same image ID when loaded on Windows.
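To narrow down where a checksum difference between two such tar files comes from, one could compare the content of their members rather than the raw archives. This is a minimal sketch; `tar_member_digests` is a hypothetical helper, not part of DataLad. If the results are equal while the archive checksums differ, the difference lies in header metadata (mtime, owner, and so on); line-ending conversion would instead show up as differing content digests.

```python
import hashlib
import tarfile


def tar_member_digests(tar_path, chunk_size=1 << 20):
    """Map each member name to the SHA-256 of its content.

    Directories, links and other non-regular members map to None. Header
    fields such as mtime, uid/gid or user names are ignored, so two archives
    that differ only in their headers give equal results.
    """
    result = {}
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not member.isfile():
                result[member.name] = None
                continue
            h = hashlib.sha256()
            stream = tar.extractfile(member)
            # Read in chunks: layer.tar members can be several GB.
            for chunk in iter(lambda: stream.read(chunk_size), b""):
                h.update(chunk)
            result[member.name] = h.hexdigest()
    return result
```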
With that tar file, I also tried a docker load / docker save round trip. Docker 27 has no problem loading an image generated from the dataset content in the manner above. When saving, however, it produces a different layout, one that is in fact OCI-compatible. See the OCI Image Format Specification, in particular the section on the Image Layout.
The change in save layout was most likely introduced in Docker 25 - the release notes for Docker Engine 25.0.0 include "The docker image save tarball output is now OCI compliant".
This is the layout of a tar file created from the dataset:
```
img_dataset
├── 360338cd2a802f4812f06fbc50237a42bc0303390efa7fa321c381e6ec36d1ae
│   ├── json
│   ├── layer.tar
│   └── VERSION
├── 705094a41713537ec5205e79423114633a7225bae388e7ba823d92126c6b36c0
│   ├── json
│   ├── layer.tar
│   └── VERSION
├── f881bd4db45ac9775f5a5377485a7c939fea4685d7482eed4809cb83fc3b51a3.json
├── manifest.json
└── repositories
```
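In this layout the top-level `<digest>.json` file is the image config, and its file name is the SHA-256 digest of its own content (f881b here, matching the ID in the error message). That digest is what get_image recomputes, and hence the ID DataLad expects. A small sketch to check this for a legacy layout, assuming manifest.json describes a single image; `check_legacy_config` is a hypothetical helper:

```python
import hashlib
import json
import os.path as op


def check_legacy_config(path):
    """Return (expected_id, name_matches) for a legacy `docker save` directory.

    Hypothetical helper: hashes the config file referenced by manifest.json
    the same way get_image does, and checks whether the file is named after
    that digest. Assumes manifest.json lists exactly one image.
    """
    with open(op.join(path, "manifest.json")) as fp:
        (entry,) = json.load(fp)  # ValueError unless exactly one image
    config_name = entry["Config"]
    with open(op.join(path, config_name), "rb") as stream:
        digest = hashlib.sha256(stream.read()).hexdigest()
    name_matches = op.splitext(op.basename(config_name))[0] == digest
    return "sha256:" + digest, name_matches
```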
And this is the one created after running docker load and docker save:
```
img_load_save
├── blobs
│   └── sha256
│       ├── 81aaa31870f52a6265bef39d0be0df7f82bab3839344ec8da54cc6c18e3fd7a0
│       ├── d310e774110ab038b30c6a5f7b7f7dd527dbe527854496bd30194b9ee6ea496e
│       ├── e2728fc6d2c404f7b41e0fa4f889117090f4476eefab2bca48d7164dcbf7a0cb
│       └── f881bd4db45ac9775f5a5377485a7c939fea4685d7482eed4809cb83fc3b51a3
├── index.json
├── manifest.json
└── oci-layout
```
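Telling the two layouts apart is straightforward: the OCI Image Layout Specification requires an `oci-layout` file and an `index.json` at the root, whereas the older format only has manifest.json (plus `repositories`). Because the newer output keeps manifest.json for backwards compatibility, the OCI markers have to be checked first. A minimal sketch; the function name and return values are hypothetical:

```python
import os.path as op


def detect_save_layout(path):
    """Classify an extracted `docker save` directory (hypothetical helper).

    Returns "oci" if the OCI image layout marker files are present,
    "legacy" if only the older manifest.json is present, else "unknown".
    """
    # Check OCI first: Docker 25+ output also contains manifest.json.
    if op.isfile(op.join(path, "oci-layout")) and op.isfile(op.join(path, "index.json")):
        return "oci"
    if op.isfile(op.join(path, "manifest.json")):
        return "legacy"
    return "unknown"
```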
Note that the blobs include both 81aaa (which matches the image ID reported by Docker 27) and f881b (which matches the ID that DataLad expected to see, and more than likely also the ID that Docker 20 would report).
Let's explore the new layout then (note: all JSON contents below are pretty-printed with jq for readability). First, there is manifest.json:
```json
[
  {
    "Config": "blobs/sha256/f881bd4db45ac9775f5a5377485a7c939fea4685d7482eed4809cb83fc3b51a3",
    "RepoTags": [
      "remodnav:latest"
    ],
    "Layers": [
      "blobs/sha256/d310e774110ab038b30c6a5f7b7f7dd527dbe527854496bd30194b9ee6ea496e",
      "blobs/sha256/e2728fc6d2c404f7b41e0fa4f889117090f4476eefab2bca48d7164dcbf7a0cb"
    ]
  }
]
```

This manifest references the config with the f881b checksum: this is the "old" config, and the one DataLad looks at when determining the expected image ID! However, according to the OCI Image Layout Specification, this manifest is a "file associated with a backwards compatible docker save format" and is not part of the spec.
A file that the OCI spec does mandate is index.json, and here are its contents:
```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "digest": "sha256:81aaa31870f52a6265bef39d0be0df7f82bab3839344ec8da54cc6c18e3fd7a0",
      "size": 586,
      "annotations": {
        "io.containerd.image.name": "docker.io/library/remodnav:latest",
        "org.opencontainers.image.ref.name": "latest"
      }
    }
  ]
}
```

This index file points to a manifest whose digest (81aaa) matches the image ID reported by Docker 27.
Here is the content of that manifest, i.e. blobs/sha256/81aaa...:
```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "digest": "sha256:f881bd4db45ac9775f5a5377485a7c939fea4685d7482eed4809cb83fc3b51a3",
    "size": 3157
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar",
      "digest": "sha256:d310e774110ab038b30c6a5f7b7f7dd527dbe527854496bd30194b9ee6ea496e",
      "size": 77814784
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar",
      "digest": "sha256:e2728fc6d2c404f7b41e0fa4f889117090f4476eefab2bca48d7164dcbf7a0cb",
      "size": 1750877184
    }
  ]
}
```

This manifest points to a config file with the f881b digest, i.e. exactly the one from the dataset!
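Because every reference in this layout is a content digest, the chain from index.json to the manifest to the config can be followed and verified mechanically. The sketch below (hypothetical helper names, assuming the `blobs/<algorithm>/<hex>` structure shown above) maps each manifest digest, i.e. the ID Docker 27 reports, to the config digest it references, i.e. the ID DataLad expects:

```python
import hashlib
import json
import os.path as op


def read_blob(path, digest):
    """Read blobs/<algorithm>/<hex> from an OCI layout and verify its digest."""
    algorithm, hexdigest = digest.split(":", 1)
    with open(op.join(path, "blobs", algorithm, hexdigest), "rb") as stream:
        data = stream.read()
    if hashlib.new(algorithm, data).hexdigest() != hexdigest:
        raise ValueError(f"blob {digest} does not match its content")
    return data


def resolve_ids(path):
    """Map each manifest digest in index.json to the config digest it references.

    Hypothetical helper; only the (small) manifest and config blobs are read,
    the layer blobs are not touched.
    """
    with open(op.join(path, "index.json")) as fp:
        index = json.load(fp)
    result = {}
    for entry in index["manifests"]:
        manifest = json.loads(read_blob(path, entry["digest"]))
        config_digest = manifest["config"]["digest"]
        read_blob(path, config_digest)  # verify the config blob as well
        result[entry["digest"]] = config_digest
    return result
```

On the saved remodnav image, this would be expected to map the 81aaa digest to the f881b digest.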
It would seem that it is this manifest, rather than the config file, that Docker 27 uses as the basis for the image ID. Since it is checksums (of the config and the layers) all the way down, the two IDs should identify the same image; Docker now simply hashes a "higher-level" metadata file. I wasn't able to find any mention of this ID change in Docker's release notes or documentation, though, so this is speculation based on comparing the save layouts and reading the OCI spec.
How can we fix this?
This is unclear at the moment.
If I am right that Docker 27's ID is based on a metadata representation that is equivalent to, but different from, the file saved in the dataset, then with the old layout we can't know the ID upfront (unless we try to create the manifest ourselves, which seems doable but finicky).
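To illustrate why this is finicky, here is a rough sketch of building a Docker v2 image manifest from the legacy layout, using the media types seen in the manifest above. It assumes that each `layer.tar` in the dataset is byte-identical to the uncompressed layer blob Docker would hash, and it picks a compact JSON serialization arbitrarily. The digest of the result can only match Docker's ID if every byte (key order, whitespace, indentation) matches what Docker writes, which this sketch does not guarantee. `build_v2_manifest` is a hypothetical name.

```python
import hashlib
import json
import os.path as op


def _sha256_file(filename, chunk_size=1 << 20):
    """Hash a (possibly multi-GB) file in chunks; return (hexdigest, size)."""
    h = hashlib.sha256()
    size = 0
    with open(filename, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            h.update(chunk)
            size += len(chunk)
    return h.hexdigest(), size


def build_v2_manifest(path):
    """Sketch: build a Docker v2 image manifest for a legacy `docker save` dir.

    ASSUMPTIONS: manifest.json lists exactly one image; each layer.tar is the
    uncompressed layer blob Docker would hash; the serialization below is an
    arbitrary choice that may not be byte-identical to Docker's, in which
    case the digest will not match the ID Docker reports.
    Returns (manifest_bytes, "sha256:<hex digest of manifest_bytes>").
    """
    with open(op.join(path, "manifest.json")) as fp:
        (entry,) = json.load(fp)
    config_hex, config_size = _sha256_file(op.join(path, entry["Config"]))
    layers = []
    for layer in entry["Layers"]:
        layer_hex, layer_size = _sha256_file(op.join(path, layer))
        layers.append({
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar",
            "digest": "sha256:" + layer_hex,
            "size": layer_size,
        })
    manifest = {
        "schemaVersion": 2,
        "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
        "config": {
            "mediaType": "application/vnd.docker.container.image.v1+json",
            "digest": "sha256:" + config_hex,
            "size": config_size,
        },
        "layers": layers,
    }
    data = json.dumps(manifest, separators=(",", ":")).encode()
    return data, "sha256:" + hashlib.sha256(data).hexdigest()
```

The digests and sizes in the returned manifest can be checked against the Docker 27 manifest above (e.g. a config size of 3157) before worrying about serialization; a digest mismatch alone would not disprove the hypothesis.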
One possible workaround would be to simply drop the ID check that produced the error. We would still rely on the exit code of docker load for some assurance that loading succeeded, so this does not sound entirely wrong.
However, the expected ID is checked against the list of images present in Docker twice. The first check decides whether the image needs to be loaded at all. Leaving that part unchanged would mean that, with Docker 27, the expected ID is never found, so the image would be loaded every time the function is called, which sounds bad.
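One way to keep the "already present" short-circuit without depending on a single expected ID would be to accept any of several candidate IDs, e.g. the config digest that get_image computes today plus a manifest digest, if one can be obtained. A minimal sketch, assuming the image listing contains one full `sha256:<hex>` ID per line (our assumption about what `_list_images` works with, as with `docker images --quiet --no-trunc`); `find_present_id` is a hypothetical name:

```python
def find_present_id(candidate_ids, listed_output):
    """Return the first candidate image ID present in `listed_output`, or None.

    `candidate_ids`: full IDs such as "sha256:<hex>", in order of preference.
    `listed_output`: text with one full image ID per line (assumed format).
    """
    present = {line.strip() for line in listed_output.splitlines() if line.strip()}
    for image_id in candidate_ids:
        if image_id in present:
            return image_id
    return None
```

With Docker 27, passing only the config digest would always yield None, which is exactly the reload-every-time problem described above; the sketch only helps if a second candidate, such as the manifest digest, is available.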