cleanup-acr-images-official pipeline is broken #2056
Description
The cleanup-acr-images-official pipeline has been broken for some time. This issue should be closed only once the pipeline runs successfully.
Current failure modes:
1. Ghost manifest crashes GenerateEolAnnotationDataForAllImagesCommand - #2055.
GetAllManifestPropertiesAsync lists a manifest in build-staging/2881074/dotnet/nightly/aspnet that no longer exists, and GetManifestAsync returns 404 for it every time. The resulting exception is unhandled, so it crashes the entire Annotations job. This has kept the cleanup pipeline from completing at all.
2. oras CLI is not authenticated in the Clean job - #2052.
The pruneEol action calls IsDigestAnnotatedForEol, which shells out to oras discover. The oras CLI has no credentials in the Clean job, so every call returns 401 Unauthorized.
3. Duplicate annotations keep staging repos perpetually fresh.
The bug tracked in #2045 caused ImageBuilder to conclude that no image digests had EOL lifecycle annotations, so it attached many duplicate annotations (see dotnet/dotnet-docker#7121). Staging image repos are pruned based on their LastUpdatedOn date, and months of duplicate annotation writes have kept updating LastUpdatedOn on the staging repos, so the 15-day age check in the Delete build-staging/* step never passes. As a result, 1,840 staging repos have accumulated. The backlog also inflates the publish pipeline's generateEolAnnotationDataForPublish step to 78 minutes and 5M log lines, because it enumerates and checks every accumulated digest along with its bloated referrer list.
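A minimal sketch of the per-manifest 404 handling that failure mode 1 calls for, written in Python purely for illustration; it does not reflect ImageBuilder's actual code. It assumes the registry client raises a distinct exception when a manifest fetch returns 404. The names collect_digests, get_manifest, and ManifestNotFoundError are hypothetical. A ghost manifest is recorded and skipped instead of aborting the whole job.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


class ManifestNotFoundError(Exception):
    """Hypothetical stand-in for a registry 404 on a manifest fetch."""


@dataclass
class ManifestResult:
    digests: list[str]  # manifests that were fetched successfully
    skipped: list[str]  # listed manifests that returned 404 ("ghost" manifests)


def collect_digests(
    manifest_refs: Iterable[str],
    get_manifest: Callable[[str], str],
) -> ManifestResult:
    """Fetch each listed manifest, skipping ones that 404 instead of failing the job."""
    digests: list[str] = []
    skipped: list[str] = []
    for ref in manifest_refs:
        try:
            digests.append(get_manifest(ref))
        except ManifestNotFoundError:
            # The listing still reports this manifest, but fetching it 404s.
            # Record it for follow-up and continue with the rest.
            skipped.append(ref)
    return ManifestResult(digests, skipped)
```

Returning the skipped references, rather than silently dropping them, keeps ghost manifests visible so they can be logged and investigated.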
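To make the interaction in failure mode 3 concrete, here is a minimal Python sketch of an age-based prune check. It assumes a staging repo is deleted only when its LastUpdatedOn is more than 15 days old (the strict comparison is an assumption) and, as described above, that every write to the repo, including an annotation, bumps LastUpdatedOn. The helpers is_stale and last_updated_after are hypothetical, not ImageBuilder's implementation.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold, matching the 15-day age check described for the
# "Delete build-staging/*" step. Using strict ">" is also an assumption.
MAX_AGE = timedelta(days=15)


def last_updated_after(created: datetime, write_times: list[datetime]) -> datetime:
    """Hypothetical model: any write (image push or annotation) bumps LastUpdatedOn."""
    return max([created, *write_times])


def is_stale(last_updated_on: datetime, now: datetime, max_age: timedelta = MAX_AGE) -> bool:
    """A staging repo is eligible for deletion only once it has gone untouched for max_age."""
    return now - last_updated_on > max_age


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
created = now - timedelta(days=120)

# No writes since creation: the repo ages out and can be pruned.
print(is_stale(last_updated_after(created, []), now))  # True

# A duplicate annotation written 3 days ago resets the clock, so the check fails.
print(is_stale(last_updated_after(created, [now - timedelta(days=3)]), now))  # False
```

Under this model, any annotation written at least once every 15 days keeps a repo from ever aging out, regardless of how old its images are.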
List of related issues/PRs:
- Handle 404 in GetAllImageDigestsFromRegistryAsync #2055
- Cleanup pruneEol fails: "unauthorized: authentication required" #2051
- Revert "Remove unnecessary Docker login from CleanAcrImagesCommand" #2052
- pruneEol action deletes 0 images due to oras discover output format change #2045
- Remove unnecessary Docker login from CleanAcrImagesCommand #2044
- In steps/clean-acr-images, use dryRunArg to fit pipeline usage #2035
- The steps/clean-acr-images.yml template always runs in dry run mode when called by stages/cleanup-acr-images.yml #2034
- Parallelize GetAllImageDigestsFromRegistryAsync #1909
- GetAllImageDigestsFromRegistryAsync is suddenly extremely slow #1905
- Fix cleanup-acr-images pipeline timeout by increasing EOL Annotations job timeout to 2 hours #1830