202 changes: 202 additions & 0 deletions specs/proposals/dm-verity.md
@@ -0,0 +1,202 @@
# Proposal: Code Integrity for OCI Containers – Per-Layer DM-Verity Signing with Notation

## Overview

This proposal extends the Notation CLI to support signing and verifying OCI container image layers with dm-verity Merkle tree root hashes. While Notation currently supports signing container images at the manifest level, this proposal adds per-layer integrity protection that can be continuously enforced at runtime using dm-verity and [Integrity Policy Enforcement](https://docs.kernel.org/next/admin-guide/LSM/ipe.html) (IPE). Per-layer dm-verity signing is critical for extending trust from kernel code-integrity guarantees to workloads within trusted containers, while blocking the execution of untrusted containers and of binaries created in mutable container state.

## Problem Statement & Motivation

Modern Linux container hosts may achieve a high level of security by running an immutable host OS, preventing tampering with system binaries. However, OCI container images themselves have traditionally not been held to the same standard – integrity is only verified at image pull time, with no continuous enforcement at runtime. This leaves a gap where if an attacker injects or executes a malicious binary inside a container, the host has no built-in mechanism to prevent it from running.

In developing [Azure Linux with OS Guard](https://techcommunity.microsoft.com/blog/linuxandopensourceblog/azure-linux-with-os-guard-immutable-container-host-with-code-integrity-and-open-/4437473) we aim to extend code integrity protections into OCI containers using dm-verity and IPE. Each container image layer is backed by a read-only dm-verity block device whose integrity is ensured by a Merkle tree root hash. The root hash is signed by a key that the Linux kernel trusts. At container start, the kernel verifies each root hash signature before allowing the layer to mount. IPE policies then allow execution only from layers with verified hashes.

Signing container images at the manifest level alone is not sufficient to ensure continuous runtime integrity. While manifest signatures protect against image tampering during distribution (image pull), they do not enable enforcement at runtime. Container layers must also have kernel-verifiable signatures to ensure their integrity. With the increasing adoption of immutable infrastructure and zero-trust security models, securing these artifacts with continuous kernel enforcement is critical. By adding per-layer container image signing, Notation can extend its capabilities to enable kernel-enforced container integrity.

The challenge is that OCI registries and container image tools currently support distributing signatures but there is no good tooling for creating per-layer signatures. Existing Notation signatures use JWS or COSE formats, which cannot be verified by the Linux kernel. Kernel-level dm-verity enforcement requires each layer's dm-verity root hash to be accompanied by a PKCS#7 signature that the kernel can check at mount time.

It's recommended to define the PKCS#7 signature envelope format in https://github.com/notaryproject/specifications/tree/main/specs

Owner Author

Got it, I'll create a new PR under that repo and refer to it in this document


## Scenarios

The following describes how per-layer dm-verity signing can enhance container security across different attack scenarios:

### Scenario 1: Runtime Layer Tampering (Current Implementation)

Owner Author

Will create a new PR under that repo and refer to it in this document


Sarah, a DevOps engineer, deploys a containerized application to a production Kubernetes cluster. The container images are signed using standard Notation manifest signatures. An attacker gains access to a worker node and modifies one of the container layers on the host filesystem by injecting a malicious binary. When the container restarts, the modified layer is mounted without detection because:
- Notation's manifest signature only verifies the image at pull time
- The digest in the manifest still matches the original layer blob in the registry
- The layer is mounted without verification of root hashes after being tampered with offline (offline attack allowed)
- The malicious binary cannot be detected at runtime since IPE is not present to prevent binaries executing from unsigned dm-verity volumes (runtime attack allowed)


The malicious binary executes successfully, compromising the application. Current Notation signatures cannot prevent this attack because they don't provide continuous runtime enforcement at the image layer level.

### Scenario 2: Kernel-Enforced Layer Integrity (Proposed Solution)

With the proposed per-layer dm-verity signing:

David, a platform engineer, uses Notation with the proposed dm-verity signing changes. The following happens:
1. Each container image layer is processed to generate an EROFS filesystem with dm-verity metadata
2. The dm-verity Merkle tree root hash for each layer is computed deterministically
3. Each root hash is signed using PKCS#7 format with the company's signing key
4. Image layer signatures and metadata are injected into a referrer artifact attached to the image manifest in the registry

When containers are deployed:
1. The EROFS containerd snapshotter fetches the OCI image and its attached referrer artifact that contains the layer signatures from the registry
2. For each layer, the snapshotter creates a dm-verity block device, passing the root hash and PKCS#7 signature to the kernel
3. The Linux kernel verifies the PKCS#7 signed root hash against trusted keys before mounting the layer (offline attack blocked)
4. IPE policies enforce that only correctly signed dm-verity volumes can execute code at runtime (runtime attack blocked)

If an attacker attempts to modify a layer like in the first scenario, the root hash verification fails immediately and the kernel refuses to mount the tampered layer. If an attacker drops an unsigned binary into a running container and tries to execute it, IPE blocks execution because it was not loaded from a signed dm-verity volume.
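
To illustrate why tampering is detected, the following is a minimal Python sketch of a Merkle tree root computation. It is a simplified model, not the exact dm-verity on-disk format (no salt or superblock, and the 4096-byte block size is an assumption), so its output is not expected to match `veritysetup`. The function name `merkle_root` is hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed data/hash block size for this sketch


def merkle_root(data: bytes, block_size: int = BLOCK_SIZE) -> str:
    """Return the hex root of a simplified SHA-256 Merkle tree over data."""
    digest_size = hashlib.sha256().digest_size
    fanout = block_size // digest_size  # digests that fit in one hash block
    # Leaf level: one digest per zero-padded data block.
    blocks = [data[i:i + block_size].ljust(block_size, b"\0")
              for i in range(0, len(data), block_size)] or [b"\0" * block_size]
    level = [hashlib.sha256(block).digest() for block in blocks]
    # Pack digests into hash blocks and hash again until one digest remains.
    while len(level) > 1:
        packed = [b"".join(level[i:i + fanout]).ljust(block_size, b"\0")
                  for i in range(0, len(level), fanout)]
        level = [hashlib.sha256(block).digest() for block in packed]
    return level[0].hex()


layer = bytes(range(256)) * 64   # 16 KiB stand-in for an EROFS image
tampered = bytearray(layer)
tampered[5000] ^= 0x01           # flip one bit in the second block

print(merkle_root(layer) == merkle_root(bytes(tampered)))  # prints False
```

Because every data block feeds into the root, a single flipped bit changes the root hash, and a signature over the original root no longer applies to the modified layer.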

## Proposal

How about multi-arch image signing, for example when users sign the index manifest of a multi-arch image? Currently Notation doesn't support recursively signing multi-arch images, which means users sign the index first and then need to sign each individual arch manifest one by one.

Owner Author

Added a blurb in the Extended Notation CLI section. The goal would be to deviate from the default manifest signing as little as possible. So, for dm-verity signing, the same would happen meaning each individual arch manifest would have to be signed by the user. The index and manifest referrers don't contain root hashes so no need to sign those with the dm-verity arg.



Another thing to consider is that Notation supports signing an OCI image layout, which is a directory structure on the filesystem. This allows Notation to sign images on the local filesystem without registries. After signing, users can use oras to publish the images, including all signatures, to remote registries. This approach is more secure because images and signatures are within the same trust boundary during signing.

Owner Author

Added a blurb in the Extended Notation CLI section to explain this is also a requirement. However, I haven't implemented this in the minimal prototype.

This section outlines the proposed solution for signing and verifying OCI container image layers with dm-verity root hashes using Notation CLI commands. The following topics are outside the scope of this document:

- Detailed command usage, which will be covered in individual command specifications
- [EROFS](https://erofs.docs.kernel.org/en/latest/) filesystem implementation details
- dm-verity kernel subsystem internals
- IPE policy configuration

**Requirements:**
1. Support for PKCS#7 signature format (in addition to existing JWS/COSE)
2. Per-layer signing capability
3. Deterministic EROFS image and Merkle tree generation
4. OCI registry distribution for a new artifact containing the signed layer root hashes via ORAS Attached Artifacts

### Extended Notation CLI

Extend `notation sign` with a new `--dm-verity` flag to enable automated per-layer signing. While the command below assumes the container image exists in a remote registry, this argument should also work when signing [OCI image layouts](https://github.com/notaryproject/notation/blob/main/specs/cmd/sign.md#experimental-sign-container-images-stored-in-oci-layout-directory) with argument `--oci-layout` for local signing.

By default, the manifest is still signed in the existing JWS/COSE formats, which can be verified in userspace, while the layer root hashes are signed in PKCS#7 format until other formats are supported.

This command will not recursively sign multi-arch container images. In this case, the command should be run for each individual image for the requested architecture.

**Sample command:**
```bash
notation sign --dm-verity \
--id myKeyId \
myregistry.azurecr.io/myapp@sha256:def456...
```
**Sample output:**
```
Successfully signed myregistry.azurecr.io/myapp@sha256:def456...
Pushed the dm-verity signatures to myregistry.azurecr.io/myapp@sha256:439dd2...
```
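
Since the command does not recurse into multi-arch images, a wrapper has to run it once per platform manifest. The following Python sketch shows one way to derive those invocations from an OCI image index. The helper name `per_arch_sign_commands`, the repository, and the digests are hypothetical, and `--dm-verity` is the flag proposed in this document rather than an existing Notation option.

```python
def per_arch_sign_commands(index: dict, repository: str, key_id: str) -> list:
    """Build one proposed `notation sign --dm-verity` argv per platform manifest."""
    commands = []
    for entry in index.get("manifests", []):
        # Each index entry is a descriptor pointing at one per-arch manifest.
        reference = f"{repository}@{entry['digest']}"
        commands.append(
            ["notation", "sign", "--dm-verity", "--id", key_id, reference]
        )
    return commands


# Hypothetical two-platform index with placeholder digests.
example_index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:aaa", "platform": {"os": "linux", "architecture": "amd64"}},
        {"digest": "sha256:bbb", "platform": {"os": "linux", "architecture": "arm64"}},
    ],
}

for argv in per_arch_sign_commands(example_index, "registry.example/myapp", "myKeyId"):
    print(" ".join(argv))
```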

**Steps:**

1. **Pull the image manifest** from the registry.
2. **Iterate through all layers** in the manifest. For each layer:
   - Pull the layer blob
   - Generate an EROFS image by decompressing the tar layer and running `mkfs.erofs` (deterministic, read-only filesystem image)

Is there a limitation for layer size?

Owner Author

There is no hardcoded size limit so it would likely depend on the size of the directory the operations take place in, which would be the /tmp directory. There is a timeout set to 5 minutes.

   - Compute the dm-verity Merkle tree root hash from the EROFS image

Do we have an estimated time for signing, say, a 1 MB layer?

Owner Author

It's around 5s regardless of layer size. Added section Performance Metrics to capture this.

This consistent 5s is likely because the fixed-cost steps (decompressing the tar, writing files, starting mkfs.erofs, etc.) take longer than the data processing itself, that is, generating the Merkle tree and calculating the root hash of the actual layer.

   - Sign the root hash using PKCS#7
3. **Create a signature envelope** for each layer containing:
   - The signed root hash of the EROFS image
   - The signer certificate, embedded inside the PKCS#7 signature blob
   - The digest of the original layer blob
   - The digest and size of the PKCS#7 signature file
4. **Create a signature manifest** containing:
   - All per-layer signature envelopes
5. **Attach the signature manifest** to the image manifest as an attached artifact (referrer) in the registry.

What will happen if the dm-verity signature manifest is removed or modified by malicious users?

Owner Author

Good question - if the dm-verity manifest is completely removed, the containerd snapshotter (pending upstream work) will look for root hashes from the dm-verity referrer artifact. When the snapshotter doesn't find any, container layer mounting will fail and prevent the container workload from running.

If the root hashes in the dm-verity referrer artifact are modified, the snapshotter will calculate the root hashes at runtime and compare them with the values in the referrer artifact. If they match, the layer can be mounted. If they don't match, the layer will not be mounted, as in the first scenario.

If the signatures were tampered with but the root hashes remain the same, the snapshotter will see the root hash calculated at runtime matches the value in the manifest and mount each container image layer. In this case the root hash signatures are tampered with, so when a binary from the container image tries to execute, the kernel detects that the layer's signature is invalid and IPE prevents the binary from executing even though the layers mounted successfully.


**Example signature manifest structure:**

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.cncf.notary.signature.dm-verity",
  "config": {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2
  },
  "layers": [
    {
      "mediaType": "application/pkcs7-signature",
      "digest": "sha256:abc123...",
      "size": 1479,
      "annotations": {
        "io.cncf.notary.layer.digest": "sha256:layer0digest...",
        "io.cncf.notary.dm-verity.root-hash": "0dcd29977f675344645e8c907b5a86b490335e7a2657a2ba45d00e7944701eed",
        "org.opencontainers.image.title": "layer-0.pkcs7.sig"
      }
    }
  ],
  "subject": {
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "digest": "sha256:imagemanifestdigest...",
    "size": 1234
  },
  "annotations": {
    "io.cncf.notary.dm-verity.signature": "true"
  }
}
```
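
A manifest of this shape could be assembled roughly as follows. This is a Python sketch under stated assumptions: the function name `build_signature_manifest` and its input tuples are hypothetical, the signature bytes are placeholders rather than real PKCS#7 blobs, and the media types and annotation keys are copied from the example in this proposal.

```python
import hashlib
import json

# OCI empty descriptor: the config blob is the two bytes "{}".
EMPTY_CONFIG = {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:" + hashlib.sha256(b"{}").hexdigest(),
    "size": 2,
}


def build_signature_manifest(subject: dict, layers: list) -> dict:
    """layers: list of (layer_digest, root_hash_hex, pkcs7_signature_bytes)."""
    descriptors = []
    for index, (layer_digest, root_hash, signature) in enumerate(layers):
        descriptors.append({
            "mediaType": "application/pkcs7-signature",
            # Digest and size describe the signature blob, not the image layer.
            "digest": "sha256:" + hashlib.sha256(signature).hexdigest(),
            "size": len(signature),
            "annotations": {
                "io.cncf.notary.layer.digest": layer_digest,
                "io.cncf.notary.dm-verity.root-hash": root_hash,
                "org.opencontainers.image.title": f"layer-{index}.pkcs7.sig",
            },
        })
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "artifactType": "application/vnd.cncf.notary.signature.dm-verity",
        "config": EMPTY_CONFIG,
        "layers": descriptors,
        "subject": subject,
        "annotations": {"io.cncf.notary.dm-verity.signature": "true"},
    }


# Placeholder inputs for illustration only.
subject = {
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "digest": "sha256:" + "0" * 64,
    "size": 1234,
}
manifest = build_signature_manifest(
    subject, [("sha256:" + "1" * 64, "ab" * 32, b"placeholder-signature")]
)
print(json.dumps(manifest, indent=2))
```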

The layers section now contains metadata for each image layer signature. Each entry describes:
- mediaType: The format of the signature (PKCS#7).
- digest: The SHA-256 hash of the signature file.
- size: The signature file size in bytes.
- annotations: Extra metadata, including:
  - The digest of the signed layer.
  - The dm-verity root hash for integrity verification.
  - A human-readable title for the signature file.

The new entries are described below:
- io.cncf.notary.layer.digest: The digest of the original image layer
- io.cncf.notary.dm-verity.root-hash: The root hash value of the dm-verity block device
- io.cncf.notary.dm-verity.signature=true: This is a flag that notifies Notation that dm-verity signatures and root hashes exist in the artifact

**Performance Metrics:**
- Registry overhead: ~4 KB per layer
  - PKCS#7 signature blob: ~2 KB
  - Manifest entry: ~2 KB
- Signing time: ~4-5 seconds per layer
- Timeout: 5 minutes for EROFS conversion with no hardcoded maximum layer size


**Verification command:**

```bash
notation verify myregistry.azurecr.io/myapp@sha256:def456...
```

Since dm-verity is enforced at runtime, the notation verify command doesn't verify layers. The behavior is the same as the existing one; it just outputs additional information for dm-verity.

Owner Author

@dallasd1 dallasd1 Dec 9, 2025

That's correct. This can be done by having the verify API inspect either the artifact type of the referrer or the annotation within the referrer (io.cncf.notary.dm-verity.signature=true).

Expanding verify to work with the dm-verity layer hashes may also not be strictly required, since the kernel enforces that the signatures exist.

The command interface will not change. The command will not verify the layer signatures against any keys; it will only output additional information about dm-verity. If the image has dm-verity signatures attached, Notation should:
1. Detect the `io.cncf.notary.dm-verity.signature=true` annotation
2. Inform the user that kernel-level verification is required

**Sample output:**

```
Successfully verified signature for myregistry.azurecr.io/myapp@sha256:def456...
Note: This image includes dm-verity layer signatures for kernel-enforced integrity.
```
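
The detection step could look roughly like the Python sketch below. It assumes the referrer manifests have already been fetched and parsed into dictionaries; the function name `has_dm_verity_signatures` is hypothetical, and the artifact type and annotation key are the ones used in this proposal. It checks either marker, since both were suggested as ways to recognize the referrer.

```python
DM_VERITY_ARTIFACT_TYPE = "application/vnd.cncf.notary.signature.dm-verity"
DM_VERITY_ANNOTATION = "io.cncf.notary.dm-verity.signature"


def has_dm_verity_signatures(referrers: list) -> bool:
    """Return True if any referrer manifest is marked as a dm-verity signature artifact."""
    for manifest in referrers:
        if manifest.get("artifactType") == DM_VERITY_ARTIFACT_TYPE:
            return True
        # Tolerate a missing or null annotations object.
        annotations = manifest.get("annotations") or {}
        if annotations.get(DM_VERITY_ANNOTATION) == "true":
            return True
    return False


referrers = [
    {"artifactType": "application/vnd.cncf.notary.signature"},
    {"artifactType": "application/vnd.cncf.notary.signature.dm-verity",
     "annotations": {"io.cncf.notary.dm-verity.signature": "true"}},
]
if has_dm_verity_signatures(referrers):
    print("Note: This image includes dm-verity layer signatures for kernel-enforced integrity.")
```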

### PKCS#7 Signature Format Support

To enable kernel verification, Notation must support PKCS#7 signature envelopes:

**Requirements:**
- PKCS#7 envelope generation for signing
- X.509 certificate chain embedding
- Compatibility with Linux kernel key rings
- User-mode verification support (for build-time validation)


### Runtime Verification Workflow (implemented in containerd)

This work is ongoing in the containerd project and is described in milestone 1 of the [RFC for Code Integrity](https://github.com/containerd/containerd/issues/12081). The EROFS containerd snapshotter implements the following:

1. **Container Start**: When a container is scheduled, fetch the image manifest
2. **Signature Discovery**: Fetch the signature referrer artifact if one is attached to the image manifest

I’d like to understand the performance impact of runtime verification.

Owner Author

This is a little trickier to calculate, but I'm finding ~10ms end-to-end, from image+referrer pull to pod exit, for a single-layer "hello world" image.

The problem is that the erofs snapshotter PR is under review but has not been merged into containerd yet (milestone1.2 in this RFC). To test these Notation changes, I am using a prototype snapshotter (tardev) that already has the required changes. This prototype tardev snapshotter will inherently have a different execution time than the erofs snapshotter.

Additionally, the containerd changes that utilize the signing portion of these changes (milestone1.3 in this RFC) have not been created yet. The performance impact we may want to capture is the erofs snapshotter with vs. without signing validation.

3. **Layer Processing**: For each layer:
   - Fetch the PKCS#7 signature and metadata from the signature manifest
   - Generate the EROFS image from the decompressed tar file
   - Compute the dm-verity Merkle tree root hash from the layer content
   - Create a dm-verity block device, passing the root hash and PKCS#7 signature to the kernel
4. **Kernel Verification**: The kernel verifies the PKCS#7 signature against the trusted keyring
5. **Mount**: If verification succeeds, the kernel mounts the dm-verity protected layer
6. **IPE Enforcement**: IPE policies allow code execution only from verified dm-verity volumes

This provides continuous integrity protection. Any tampering with layer content causes dm-verity verification to fail, preventing mount and execution.
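
The userspace part of this workflow, before anything is handed to the kernel, can be sketched in Python as below. This is an illustrative model only: the snapshotter changes are still pending upstream, the function name `check_layers` and the decision strings are hypothetical, and PKCS#7 signature verification is deliberately left out because it is performed by the kernel.

```python
LAYER_DIGEST_KEY = "io.cncf.notary.layer.digest"
ROOT_HASH_KEY = "io.cncf.notary.dm-verity.root-hash"


def check_layers(image_layer_digests, signature_manifest, computed_root_hashes):
    """Decide per layer whether it can proceed to dm-verity device creation."""
    # Index the signed root hashes by the layer digest they describe.
    expected = {}
    for descriptor in (signature_manifest or {}).get("layers", []):
        annotations = descriptor.get("annotations") or {}
        expected[annotations.get(LAYER_DIGEST_KEY)] = annotations.get(ROOT_HASH_KEY)

    decisions = {}
    for digest in image_layer_digests:
        if digest not in expected:
            # Referrer missing or stripped: refuse to mount.
            decisions[digest] = "reject: no signed root hash"
        elif computed_root_hashes.get(digest) != expected[digest]:
            # Layer content or referrer metadata was altered.
            decisions[digest] = "reject: root hash mismatch"
        else:
            decisions[digest] = "pass root hash and signature to the kernel"
    return decisions


signature_manifest = {"layers": [
    {"annotations": {LAYER_DIGEST_KEY: "sha256:layer0", ROOT_HASH_KEY: "aa11"}},
    {"annotations": {LAYER_DIGEST_KEY: "sha256:layer1", ROOT_HASH_KEY: "bb22"}},
]}
computed = {"sha256:layer0": "aa11", "sha256:layer1": "ffff", "sha256:layer2": "cc33"}
for digest, decision in check_layers(
        ["sha256:layer0", "sha256:layer1", "sha256:layer2"],
        signature_manifest, computed).items():
    print(digest, "->", decision)
```

A layer that passes this check can still be refused by the kernel if its PKCS#7 signature does not chain to a trusted key.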