
Conversation

@mythi (Contributor) commented Aug 27, 2025

This setup gives an automated "online, multi-platform, PCCS-based Indirect Registration" and TDX QGS deployment for Kubernetes-based clusters.

Building blocks:

  1. in-cluster PCCS caching service deployment
  2. PCKIDRetrievalTool sidecar and TDX QGS in a single daemonset

Pre-conditions:

Read the basics of Intel TDX remote attestation infrastructure setup and get an Intel PCS API key. The node(s) must have TDX and SGX enabled. The following also assumes that you have checked out this PR and have a bare-metal cluster available.
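A quick sanity check on the node before installing (a sketch; the exact dmesg strings vary by kernel version):

# SGX device nodes should be present when SGX is enabled:
ls -l /dev/sgx_enclave /dev/sgx_provision
# TDX host support initializing shows up in the kernel log:
sudo dmesg | grep -i tdx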

Installation:

  1. Deploy SGX device plugin with the "DCAP infrastructure resources" enabled:

kubectl apply -k deployments/sgx_plugin/overlays/dcap-infra-resources
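Once the plugin is up, the new infrastructure resource (sgx.intel.com/registration, referenced later in this thread) should show up on the node; a quick check (the node name is a placeholder):

kubectl get node <node-name> -o jsonpath='{.status.allocatable}' | tr ',' '\n' | grep sgx.intel.com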

  2. Make the new (unpublished) images available to your cluster:
make sgx-pccs sgx-dcap-infra
docker save intel/sgx-pccs:devel | sudo ctr -n k8s.io i import -
docker save intel/sgx-dcap-infra:devel | sudo ctr -n k8s.io i import -
  3. Deploy PCCS:
pushd deployments/sgx_dcap/pccs
<check notes in kustomization.yaml to populate .env.pccs-tokens>
kubectl apply -k .
popd

NB: if a proxy setting is needed, edit pccs.yaml
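As an alternative to editing the file, proxy variables can be injected into the deployment directly; a sketch, assuming the PCCS image honors standard proxy environment variables (the proxy URL is a placeholder):

kubectl set env deployment/intel-dcap-pccs \
  HTTPS_PROXY=http://proxy.example.com:3128 \
  NO_PROXY=localhost,127.0.0.1,.svc,.cluster.local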

  4. Deploy platform-registration and TDX QGS:
pushd deployments/sgx_dcap/base
<set your USER_TOKEN in .env.pccs-credentials>
kubectl apply -k .
popd

NB: add a nodeSelector to filter for SGX/TDX-enabled nodes if running in a multi-node cluster
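A minimal sketch of such a patch (the NFD label used here is an assumption and depends on how node feature discovery is configured in the cluster):

kubectl patch daemonset intel-dcap-node-infra --type merge -p \
  '{"spec":{"template":{"spec":{"nodeSelector":{"intel.feature.node.kubernetes.io/sgx":"true"}}}}}'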

  5. Check that things are up:
$ kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
intel-dcap-node-infra-q9zms        2/2     Running   0          17h
intel-dcap-pccs-647568f67d-ftjb2   1/1     Running   0          17h
intel-sgx-plugin-bgfgv             1/1     Running   0          17h
$ kubectl logs -c platform-registration intel-dcap-node-infra-q9zms 
Waiting for the PCCS to be ready ...
PCCS is online, proceeding ...
Calling PCKIDRetrievalTool ...

Intel(R) Software Guard Extensions PCK Cert ID Retrieval Tool Version 1.23.100.0

Registration status has been set to completed status.
the data has been sent to cache server successfully!

The node should have /var/run/tdx-qgs/qgs.socket available for QEMU to connect.
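For reference, a sketch of how QEMU might be pointed at that socket for TDX quote generation; the exact option syntax depends on the QEMU version and should be verified against its documentation:

qemu-system-x86_64 \
  -accel kvm \
  -machine q35,confidential-guest-support=tdx0 \
  -object '{"qom-type":"tdx-guest","id":"tdx0","quote-generation-socket":{"type":"unix","path":"/var/run/tdx-qgs/qgs.socket"}}'
  # ... plus the usual memory, CPU, and disk options for the guest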

Notes:

The PCCS database is stored in a RAM-based EmptyDir volume and is currently not backed up (a backup mechanism will be added later). Keep the PCCS deployment up. If quoting errors occur, a full re-install after an SGX Factory Reset might be required.
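Until that backup mechanism lands, the database can be copied out of the running pod by hand; a sketch, where the pod name is a placeholder and the in-container database path is an assumption to verify against the PCCS image:

# copy the PCCS database out of the pod (path is an assumption):
kubectl cp <pccs-pod>:/opt/intel/pccs/pckcache.db ./pckcache.db.bak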

@mythi force-pushed the PR-2025-018 branch 2 times, most recently from 691dfe1 to 65ce152 on August 29, 2025 08:08
Comment on lines +9 to +16
WORKDIR /opt/intel

ARG SGX_SDK_URL=https://download.01.org/intel-sgx/sgx-linux/2.26/distro/ubuntu24.04-server/sgx_linux_x64_sdk_2.26.100.0.bin

RUN curl -sSLfO ${SGX_SDK_URL} \
&& export SGX_SDK_INSTALLER=$(basename $SGX_SDK_URL) \
&& chmod +x $SGX_SDK_INSTALLER \
&& echo "yes" | ./$SGX_SDK_INSTALLER \

All props to @ScottR-Intel for: sudo ./sgx_linux_x64_sdk_2.26.100.0.bin --prefix /opt/intel


# self-signed TLS certs for pccs-tls:
# openssl req -x509 -sha256 -nodes -days 365 -newkey rsa:2048 -keyout private.pem -out file.crt -subj "/C=US/ST=Denial/L=Springfield/O=Dis/CN=www.example.com"
# token hashesh follow (with 'hellworld' changed to the desired secret tokens):


Suggested change
# token hashesh follow (with 'hellworld' changed to the desired secret tokens):
# token hashesh follow (with 'helloworld' changed to the desired secret tokens):
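For reference, the token hashes here are plausibly SHA-512 digests of the plaintext tokens; a sketch of generating one, assuming that is what this kustomization expects:

echo -n 'helloworld' | sha512sum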

name: pccs-credentials
securityContext:
readOnlyRootFilesystem: true
allowPrivilegeEscalation: false


I had to change this to make it work:

-          allowPrivilegeEscalation: false
+          privileged: true
+          allowPrivilegeEscalation: true

I think that for socket device plugins, the container that exposes the unix socket must be privileged (see the pr-helper example).

@MatiasVara commented Sep 18, 2025

I tried it, and the unix socket is not visible from the virt-launcher. This means that we still need something like a socket device plugin in the virt-handler to mount it. I do not think this PR is the place for that, though.

@MatiasVara commented:

I observe that when the qgs pod is removed, the new instance fails because the unix socket still exists. I think the unix socket should be removed when the pod is removed; otherwise the socket has to be removed manually.

@mythi (Contributor, Author) commented Sep 25, 2025

I think the unix socket should be removed when the pod is removed; otherwise the socket has to be removed manually.

I saw this too and reported a bug to QGS about it. I need to see if I can work around that in the meantime.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
@Aseeef commented Oct 16, 2025

Hi, I've been experimenting with this PR and had many of the same issues as @MatiasVara on OpenShift. Most of these issues seem to be directly related to SELinux.

I wanted to share a case of an SELinux denial that had me scratching my head for a while, involving mount failures of the EFI variables.

When the platform-registration container requests the "sgx.intel.com/registration" resource, that triggers a mount (as per the specifications of the device plugin in cmd/sgx_plugin/sgx_plugin.go). Consequently, the sgx plugin pod tries to write to /var/run/cdi/ to create the CDI spec file that tells the container runtime to mount efivars into the container resulting in the following:

time->Thu Oct 16 20:01:53 2025
type=PROCTITLE msg=audit(1760644913.951:8108): proctitle=2F7573722F6C6F63616C2F62696E2F696E74656C5F7367785F6465766963655F706C7567696E002D646361702D696E6672612D7265736F7572636573
type=SYSCALL msg=audit(1760644913.951:8108): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=c0002e0040 a2=80000 a3=0 items=0 ppid=1637961 pid=1637963 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="intel_sgx_devic" exe="/usr/local/bin/intel_sgx_device_plugin" subj=system_u:system_r:container_device_plugin_t:s0:c170,c921 key=(null)
type=AVC msg=audit(1760644913.951:8108): avc:  denied  { read } for  pid=1637963 comm="intel_sgx_devic" name="cdi" dev="tmpfs" ino=4248 scontext=system_u:system_r:container_device_plugin_t:s0:c170,c921 tcontext=system_u:object_r:container_var_run_t:s0 tclass=dir permissive=0

But then there appears to be a second, more fundamental conflict with SELinux here. Even if you turn on permissive mode (setenforce 0), mounting still fails because:
SELinux: mount invalid. Same superblock, different security settings for (dev efivarfs, type efivarfs)
My interpretation of this error is that SELinux still tries to label the efivarfs filesystem with the context of the container. That conflicts with the context the host mounted efivarfs with, resulting in another mount failure (despite SELinux enforcement being disabled).

The fix to this was to modify cmd/sgx_plugin/sgx_plugin.go to create a bind mount from the host to the container instead of trying to mount efivars a second time. This can be achieved by modifying...

Devices: []cdispec.Device{
			{
				Name: "efivarfs",
				ContainerEdits: cdispec.ContainerEdits{
					Mounts: []*cdispec.Mount{
						{HostPath: "efivarfs", ContainerPath: "/run/efivars", Type: "efivarfs", Options: []string{"rw", "nosuid", "nodev", "noexec", "relatime"}},
					},
				},
			},
		},

to

Devices: []cdispec.Device{
			{
				Name: "efivarfs",
				ContainerEdits: cdispec.ContainerEdits{
					Mounts: []*cdispec.Mount{
						{HostPath: "/sys/firmware/efi/efivars", ContainerPath: "/run/efivars", Type: "none", Options: []string{"bind", "rw", "nosuid", "nodev", "noexec", "relatime"}},
					},
				},
			},
		},

@mythi (Contributor, Author) commented Oct 18, 2025

The fix to this was to modify cmd/sgx_plugin/sgx_plugin.go to create a bind mount from the host to the container instead of trying to mount efivars a second time. This can be achieved by modifying...

@Aseeef thanks for the detailed investigation. This approach does not work with containerd because it sets /sys/firmware as a maskedPath, so even if the bind mount is done, the container does not get to see the path unless it is run with privileged: true, which I wanted to avoid. I need to take a closer look at whether there's anything I can do with SELinux.

@mythi (Contributor, Author) commented Oct 18, 2025

Consequently, the sgx plugin pod tries to write to /var/run/cdi/ to create the CDI spec file that tells the container runtime to mount efivars into the container resulting in the following:

time->Thu Oct 16 20:01:53 2025
type=PROCTITLE msg=audit(1760644913.951:8108): proctitle=2F7573722F6C6F63616C2F62696E2F696E74656C5F7367785F6465766963655F706C7567696E002D646361702D696E6672612D7265736F7572636573
type=SYSCALL msg=audit(1760644913.951:8108): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=c0002e0040 a2=80000 a3=0 items=0 ppid=1637961 pid=1637963 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="intel_sgx_devic" exe="/usr/local/bin/intel_sgx_device_plugin" subj=system_u:system_r:container_device_plugin_t:s0:c170,c921 key=(null)
type=AVC msg=audit(1760644913.951:8108): avc:  denied  { read } for  pid=1637963 comm="intel_sgx_devic" name="cdi" dev="tmpfs" ino=4248 scontext=system_u:system_r:container_device_plugin_t:s0:c170,c921 tcontext=system_u:object_r:container_var_run_t:s0 tclass=dir permissive=0

@Aseeef Does it mean the container runtime cannot read the file the plugin wrote?

@Aseeef commented Oct 18, 2025

Based on this part, pid=1637963 comm="intel_sgx_devic", I think the issue happens earlier: the pod fails to write to the CDI directory. I notice that the log I pasted shows a { read } denial, but the next SELinux log also shows a { write } denial.

time->Thu Oct 16 20:01:53 2025
type=PROCTITLE msg=audit(1760644913.951:8109): proctitle=2F7573722F6C6F63616C2F62696E2F696E74656C5F7367785F6465766963655F706C7567696E002D646361702D696E6672612D7265736F7572636573
type=SYSCALL msg=audit(1760644913.951:8109): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=c0000c6180 a2=800c2 a3=180 items=0 ppid=1637961 pid=1637963 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="intel_sgx_devic" exe="/usr/local/bin/intel_sgx_device_plugin" subj=system_u:system_r:container_device_plugin_t:s0:c170,c921 key=(null)
type=AVC msg=audit(1760644913.951:8109): avc:  denied  { write } for  pid=1637963 comm="intel_sgx_devic" name="cdi" dev="tmpfs" ino=4248 scontext=system_u:system_r:container_device_plugin_t:s0:c170,c921 tcontext=system_u:object_r:container_var_run_t:s0 tclass=dir permissive=0
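For local debugging, these denials can be turned into a draft SELinux policy module with the standard audit tooling; a sketch (review the generated rules before loading anything):

# collect AVC denials for the plugin binary and draft a policy module:
sudo ausearch -m avc -c intel_sgx_devic | sudo audit2allow -M intel_sgx_plugin_local
# inspect intel_sgx_plugin_local.te, then load it if acceptable:
sudo semodule -i intel_sgx_plugin_local.pp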

@mythi (Contributor, Author) commented Oct 20, 2025

The pod fails to write to CDI.

We have other plugins doing the same. So far we have not heard that this would be a problem on OCP, but it is something we need to double-check. @tkatila

@mythi (Contributor, Author) commented Oct 21, 2025

SELinux: mount invalid. Same superblock, different security settings for (dev efivarfs, type efivarfs)
My interpretation of this error is that this happens because SELinux still tries to label the efivarfs file system with the context of the container. But then that conflicts with the context the host had the efivarfs file mounted with, resulting in another mount failure (despite SELinux enforcement being disabled).

@Aseeef it looks like OCI (runc) knows about mount labels and that they can also be set via mount options. Looks like something worth trying at some point.

@Aseeef commented Oct 21, 2025

@Aseeef it looks like OCI (runc) knows about mount labels and that they can also be set via mount options. Looks like something worth trying at some point.

Thanks, I'll look into it!

@tkatila (Contributor) commented Oct 22, 2025

We have other plugins doing the same. So far we have not heard that this would be a problem on OCP, but it is something we need to double-check. @tkatila

There are two plugins using CDI somehow: FPGA and GPU. FPGA is not supported in OCP. GPU is supported in OCP, but its functionality doesn't depend on CDI; it's just using it along with the generic device plugin API. It could be that CDI fails there also, but as it doesn't affect execution, no one has noticed.

@mythi (Contributor, Author) commented Oct 22, 2025

It could be that CDI fails there also, but as it doesn't affect execution, no one has noticed.

Getting a bit off-topic, but I wonder if DRA drivers are impacted by the same?

@tkatila (Contributor) commented Oct 22, 2025

It could be that CDI fails there also, but as it doesn't affect execution, no one has noticed.

Getting a bit off-topic, but I wonder if DRA drivers are impacted by the same?

I believe DRA is being enabled in the upcoming OCP 4.20. Hopefully they are detecting these SELinux restrictions. FYI @byako

@Aseeef commented Oct 22, 2025

@mythi

@Aseeef it looks like OCI (runc) knows about mount labels and that they can also be set via mount options. Looks like something worth trying at some point.

Were you implying that we should try to mount efivarfs using the same SELinux labels as the host? Could you clarify what you mean here?

@mythi (Contributor, Author) commented Oct 23, 2025

@mythi

@Aseeef it looks like OCI (runc) knows about mount labels and that they can also be set via mount options. Looks like something worth trying at some point.

Were you implying that we should try to mount efivarfs using the same SELinux labels as the host? Could you clarify what you mean here?

SELinux is not my area but, yes, that was my thinking:

$ cat /var/run/cdi/intel.cdi.k8s.io-sgx-efivarfs.yaml 
---
cdiVersion: 0.5.0
kind: intel.cdi.k8s.io/sgx
devices:
    - name: efivarfs
      containerEdits:
        mounts:
            - hostPath: efivarfs
              containerPath: /run/efivars
              options:
                - rw
                - nosuid
                - nodev
                - noexec
                - relatime
                - context="<labels>"
              type: efivarfs

but before testing anything, we'd need to check whether the container is going to have the permissions to access the host efivarfs (and/or whether some added labeling would be needed for that platform-registration container).
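To find the label the host has on efivarfs for that context= option, one could check directly on the node; a sketch:

# show the SELinux context of the host efivarfs mount point:
ls -dZ /sys/firmware/efi/efivars
# the active mount line may also show a context option:
mount | grep efivarfs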

@Aseeef commented Oct 23, 2025

@mythi I tried that and it seems to result in a new error:
SELinux: duplicate or incompatible mount options

We probably are going to need someone who is an expert with SELinux for this one.

@Aseeef commented Oct 23, 2025

One more thing: the QGS socket is created owned by root, with permissions set to:
srwxr-xr-x. 1 root root 0 Oct 23 15:59 qgs.socket

Therefore, by default, the qemu user is not going to be able to access it. Not sure if this is something that should be addressed here in this PR though...
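As a stop-gap, the socket could be opened up to a shared group by hand; a sketch, where the qemu group name and the mode are assumptions, and the change does not survive socket re-creation:

# connecting to a unix socket requires write permission on it:
sudo chgrp qemu /var/run/tdx-qgs/qgs.socket
sudo chmod g+rw /var/run/tdx-qgs/qgs.socket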

@Aseeef mentioned this pull request on Oct 27, 2025.
@mythi (Contributor, Author) commented Oct 28, 2025

@mythi I tried that and it seems to result in a new error: SELinux: duplicate or incompatible mount options

We probably are going to need someone who is an expert with SELinux for this one.

I'll try to play with this a bit since I now have an OCP cluster.

One more thing: the QGS socket is created owned by root, with permissions set to: srwxr-xr-x. 1 root root 0 Oct 23 15:59 qgs.socket

Therefore, by default, the qemu user is not going to be able to access it. Not sure if this is something that should be addressed here in this PR though...

Thanks! Red Hat's build of QGS adds mode controls. I believe with virt-launcher we need a "shared" group ID for QEMU to be able to use the socket.
