Partial fix for predicting Docker 27 image IDs from older images#270
Draft
mslw wants to merge 2 commits intodatalad:masterfrom
Draft
Partial fix for predicting Docker 27 image IDs from older images#270mslw wants to merge 2 commits intodatalad:masterfrom
mslw wants to merge 2 commits intodatalad:masterfrom
Conversation
It appears that while Docker 27 has no problem loading images saved with older versions, it generates the ID based on the "new style" (OCI-compliant) manifest that it would save starting with v25, and not the config file stored in the dataset. This causes DataLad to error out due to ID mismatch, although the ID is most likely equivalent; see datalad#269 This commit is the first attempt to solve this issue. Since the manifest is a structured file, an attempt is made to generate a "new" style manifest based on the contents of the saved image, and derive the ID from that. The manifest needs file types, sizes, and checksums. While we could copy checksums from the previous manifest / config, we do not seem to have the sizes. To solve that problem, we get both through ls_file_collection from datalad-next. This is convenient and quick, but introduces a new dependency. The generated structure and content are a guesswork based on reading the OCI spec and seeing docker save output from a single container - it sure works from that container and tries to be applicable more broadly, but most likely won't cover more complicated cases, or those where I'm not even sure what behavior to expect (e.g. multi-arch manifest?). Layers are assumed to always be rootfs_diff (I currently don't know if there are other types possible). This commit focuses on reading older images with new Docker, and does not address reading new images (reading images saved with Docker 26 would still fail, because it already uses the new save format which our adapter does not expect). So the combinatorics around that will need to be addressed later. The new code would only trigger for Docker 27. It introduces one small regression, where get_image_id raises a NotImplementedError for two arguments which can be given to the old get_image.
|
Code Climate has analyzed commit 3534fbf and detected 6 issues on this pull request. Here's the issue category breakdown:
View more on Code Climate. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is an attempt to address #269 - more of an exploration really.
It appears that while Docker 27 has no problem loading images saved with older versions, it generates the ID based on the "new style" (OCI-compliant) manifest that it would save starting with v25, and not the config file stored in the dataset. This causes DataLad to error out due to expected / observed ID mismatch.
Here, an attempt is made to generate a "new style" manifest from the old image, and use its hash as the expected ID. It is not complete (in particular, there can be different combination of save vs load versions), it is fragile (if my assumptions about the format are wrong or the format changes, the expected ID will not be correct either), but it works for at least one dataset (paper-remodnav).
With that in mind, I am also trying to think of an alternative approach. Is there an existing tool for deriving new IDs from old manifests? Can we expect older containers to be annotated with new IDs as their maintenance?
I am leaving the PR as a draft, since I do not expect to follow up immediately. However, it may be useful as at least one way of dealing with the problem.