Improve Kernel Build Artifact Reuse with CAS-Style Caching #59

@davidgantman

Problem

Today, our tool runs each static analysis tool in its own container. Many of these tools (e.g., sparse, dt_binding_check) benefit from having an already-compiled kernel tree.

  • The kernel build system (Kbuild + make) does not produce multi-tenant outputs: one output tree (O=...) can hold only one coherent set of objects at a time.
  • Switching toolchains (e.g., Clang-14 vs Clang-19) or configs invalidates and overwrites previous artifacts.
  • No concept of "union" reuse: artifacts not used in the current build are discarded, even if future builds could reuse them.

This leads to cache efficiency regressions when running tool A→B→C vs A→C→B, even if A and C are nearly identical.

Proposed Solution: CAS + Action Cache Layer

Introduce a Content-Addressable Storage (CAS) with an Action Cache on top of Kbuild:

  • Action Key = hash of:
    • Compiler/linker fingerprint (binary + version),
    • Normalized command line flags that affect codegen,
    • Hash of preprocessed source + included headers (contents, not mtimes),
    • Relevant environment that affects outputs.
  • Action Value = manifest of outputs (.o, .d, .cmd, .dtb, etc.) mapped to blob hashes.
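
A minimal sketch of how such an action key might be computed, assuming the shim already has the preprocessed source bytes in hand (for example from running the compiler with -E, not shown here). The function names, the JSON payload layout, and the list of ignored flags are all hypothetical and only illustrate the idea; a real normalizer would need a carefully reviewed flag list.

```python
import hashlib
import json

# Hypothetical: flag prefixes assumed not to affect generated code
# (dependency-file options that only name an output path).
IGNORED_FLAG_PREFIXES = ("-Wp,-MMD,", "-Wp,-MD,")


def normalize_flags(argv):
    """Drop flags assumed not to affect codegen.

    Order is preserved because it can matter (e.g. -I search order).
    """
    out = []
    skip_next = False
    for arg in argv:
        if skip_next:
            skip_next = False
            continue
        if arg == "-o":
            # The output path changes where the object lands, not its content.
            skip_next = True
            continue
        if arg.startswith(IGNORED_FLAG_PREFIXES):
            continue
        out.append(arg)
    return out


def action_key(compiler_fingerprint, argv, preprocessed, env):
    """Return a hex digest identifying one compile action.

    compiler_fingerprint: string describing the compiler binary and version.
    argv: the tool's command-line arguments (without the tool name).
    preprocessed: bytes of the preprocessed source (contents, not mtimes).
    env: dict of the environment variables considered relevant.
    """
    payload = {
        "compiler": compiler_fingerprint,
        "flags": normalize_flags(argv),
        "source": hashlib.sha256(preprocessed).hexdigest(),
        "env": sorted(env.items()),
    }
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()
```

With this sketch, two invocations that differ only in the -o path produce the same key, while a different compiler fingerprint produces a different one.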

Workflow:

  1. Wrap CC, LD, DTC, etc. with shims that compute the action key.
  2. On cache hit: hydrate (copy/link) outputs from CAS into the O= tree so Kbuild thinks the step is already up-to-date.
  3. On cache miss: run the real tool, store outputs in CAS, and record the manifest.
  4. Mount a shared tmpfs volume across containers for /cas, so every patch review container can reuse artifacts in-RAM.
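
The hit/miss flow in steps 2 and 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed implementation: the ActionCache class, its on-disk layout (blobs/ and actions/ under one root), and the run_action signature are all hypothetical. It hydrates by copying, and it does not address whether Kbuild's own .cmd and timestamp checks would accept the hydrated files as up-to-date.

```python
import hashlib
import json
import os
import shutil
from pathlib import Path


class ActionCache:
    """Hypothetical CAS plus action cache rooted at one directory (e.g. /cas)."""

    def __init__(self, root):
        self.blobs = Path(root) / "blobs"      # content hash -> file bytes
        self.actions = Path(root) / "actions"  # action key -> JSON manifest
        self.blobs.mkdir(parents=True, exist_ok=True)
        self.actions.mkdir(parents=True, exist_ok=True)

    def _put_blob(self, path):
        """Store one file in the CAS and return its content hash."""
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        dest = self.blobs / digest
        if not dest.exists():
            # Write to a temp name, then rename, so readers never see
            # a partially written blob.
            tmp = self.blobs / f"{digest}.{os.getpid()}.tmp"
            shutil.copyfile(path, tmp)
            os.replace(tmp, dest)
        return digest

    def run_action(self, key, out_dir, rel_outputs, run_tool):
        """Hydrate outputs on a hit; run the real tool and record on a miss.

        rel_outputs are paths relative to out_dir, so a manifest recorded
        from one output tree can be hydrated into another.
        Returns True on a cache hit, False on a miss.
        """
        manifest_path = self.actions / key
        if manifest_path.exists():
            manifest = json.loads(manifest_path.read_text())
            for rel, digest in manifest.items():
                target = Path(out_dir) / rel
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copyfile(self.blobs / digest, target)
            return True

        run_tool()  # cache miss: invoke the real compiler/linker/dtc
        manifest = {
            rel: self._put_blob(Path(out_dir) / rel) for rel in rel_outputs
        }
        tmp = self.actions / f"{key}.{os.getpid()}.tmp"
        tmp.write_text(json.dumps(manifest))
        os.replace(tmp, manifest_path)
        return False
```

Storing manifests with paths relative to the output tree is a design choice made here so that artifacts recorded in one container's O= directory can be hydrated into another's; hard links instead of copies would save space but only work when /cas and the output tree share a filesystem.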

Benefits:

  • Identical inputs → identical key → instant reuse, even across different commits or container runs.
  • Multiple compilers/configs coexist (Clang-14 objects don't overwrite Clang-19 objects).
  • Order-invariant: A→B→C costs the same as A→C→B.
  • At least as efficient as native Kbuild; often better.
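
The order-invariance claim can be illustrated with a toy model. It assumes each build is just a set of action keys, that a single output tree keeps only the most recent build's artifacts, and that the CAS keeps the union of everything ever built. The build contents below are made up for illustration, and the model counts cache misses only, ignoring hashing and hydration overhead.

```python
def misses_single_tree(builds):
    """Count actions that must run when one O= tree is shared.

    Each build overwrites the tree, so only the previous build's
    artifacts are available for reuse.
    """
    tree = set()
    total = 0
    for actions in builds:
        total += len(actions - tree)
        tree = set(actions)  # earlier artifacts are discarded
    return total


def misses_cas(builds):
    """Count actions that must run when artifacts accumulate in a CAS."""
    store = set()
    total = 0
    for actions in builds:
        total += len(actions - store)
        store |= actions  # union reuse: nothing is discarded
    return total


# Hypothetical builds: A and C share two of three actions, B shares none.
A = {"a1", "a2", "a3"}
B = {"b1", "b2", "b3"}
C = {"a1", "a2", "c3"}

print(misses_single_tree([A, B, C]))  # 9
print(misses_single_tree([A, C, B]))  # 7
print(misses_cas([A, B, C]))          # 7
print(misses_cas([A, C, B]))          # 7
```

In this model the shared tree pays for C in full whenever B runs between A and C, while the CAS cost depends only on the set of distinct actions, not on the order of the builds.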

Alternatives Considered

  1. Persistent Shared O= Directory (current)
  • Let all containers share one build tree.
  • Breaks due to Kbuild overwrites; only last compiler/config survives.
  2. Naïve Plain Build Container
  • Read through Dockerfiles manually, determine "most common" compiler.
  • Create a plain build once with the "most common" compiler.
  • Inefficient if the user only runs checks that don't use the hard-coded "most common" compiler.
  • Inefficient if the "most common" compiler is actually only used in a few checks.
  3. Bazel/Buck/Pants
  • These implement CAS + action caches natively.
  • Re-expressing the Linux kernel's Kbuild rules in Bazel is a huge lift.
  • Would only be practical if fully Bazelizing the kernel were within the scope of the project; Bazelizing the entire kernel is not within PatchWise's scope (hopefully 😋).
