
Conversation

@mythi (Contributor) commented Aug 27, 2025

This setup gives an automated "online, multi-platform, PCCS-based Indirect Registration" and TDX QGS deployment for Kubernetes-based clusters.

Building blocks:

  1. in-cluster PCCS caching service deployment
  2. PCKIDRetrievalTool sidecar and TDX QGS in a single daemonset

Pre-conditions:

Read the basics of Intel TDX remote attestation infrastructure setup and get an Intel PCS API key. The node(s) must have TDX and SGX enabled. The following also assumes that you have checked out this PR and have a bare-metal cluster available.
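A quick sanity check on the node before installing (a sketch; the exact dmesg strings vary by kernel version):

# SGX device nodes should be present when SGX is enabled:
ls -l /dev/sgx_enclave /dev/sgx_provision
# TDX host support initializing shows up in the kernel log:
sudo dmesg | grep -i tdx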

Installation:

  1. Deploy SGX device plugin with the "DCAP infrastructure resources" enabled:

kubectl apply -k deployments/sgx_plugin/overlays/dcap-infra-resources
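Once the plugin is up, the new infrastructure resource (sgx.intel.com/registration, referenced later in this thread) should show up on the node; a quick check (the node name is a placeholder):

kubectl get node <node-name> -o jsonpath='{.status.allocatable}' | tr ',' '\n' | grep sgx.intel.com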

  2. Make the new (unpublished) images available to your cluster:
make sgx-pccs sgx-dcap-infra
docker save intel/sgx-pccs:devel | sudo ctr -n k8s.io i import -
docker save intel/sgx-dcap-infra:devel | sudo ctr -n k8s.io i import -
  3. Deploy PCCS:
pushd deployments/sgx_dcap/pccs
<check notes in kustomization.yaml to populate .env.pccs-tokens>
kubectl apply -k .
popd

NB: if a proxy setting is needed, edit pccs.yaml
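As an alternative to editing the file, proxy variables can be injected into the deployment directly; a sketch, assuming the PCCS image honors standard proxy environment variables (the proxy URL is a placeholder):

kubectl set env deployment/intel-dcap-pccs \
  HTTPS_PROXY=http://proxy.example.com:3128 \
  NO_PROXY=localhost,127.0.0.1,.svc,.cluster.local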

  4. Deploy platform-registration and TDX QGS:
pushd deployments/sgx_dcap/base
<set your USER_TOKEN in .env.pccs-credentials>
kubectl apply -k .
popd

NB: add a nodeSelector to filter for SGX/TDX-enabled nodes if running in a multi-node cluster
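A minimal sketch of such a patch (the NFD label used here is an assumption and depends on how node feature discovery is configured in the cluster):

kubectl patch daemonset intel-dcap-node-infra --type merge -p \
  '{"spec":{"template":{"spec":{"nodeSelector":{"intel.feature.node.kubernetes.io/sgx":"true"}}}}}'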

  5. Check that things are up:
$ kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
intel-dcap-node-infra-q9zms        2/2     Running   0          17h
intel-dcap-pccs-647568f67d-ftjb2   1/1     Running   0          17h
intel-sgx-plugin-bgfgv             1/1     Running   0          17h
$ kubectl logs -c platform-registration intel-dcap-node-infra-q9zms 
Waiting for the PCCS to be ready ...
PCCS is online, proceeding ...
Calling PCKIDRetrievalTool ...

Intel(R) Software Guard Extensions PCK Cert ID Retrieval Tool Version 1.23.100.0

Registration status has been set to completed status.
the data has been sent to cache server successfully!

The node should have /var/run/tdx-qgs/qgs.socket available for QEMU to connect.
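For reference, a sketch of how QEMU might be pointed at that socket for TDX quote generation; the exact option syntax depends on the QEMU version and should be verified against its documentation:

qemu-system-x86_64 \
  -accel kvm \
  -machine q35,confidential-guest-support=tdx0 \
  -object '{"qom-type":"tdx-guest","id":"tdx0","quote-generation-socket":{"type":"unix","path":"/var/run/tdx-qgs/qgs.socket"}}'
  # ... plus the usual memory, CPU, and disk options for the guest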

Notes:

The PCCS database is stored in a RAM-based EmptyDir volume and is currently not backed up (a backup mechanism will be added later). Keep the PCCS deployment up. If quoting errors occur, a full re-install after an SGX Factory Reset might be required.
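Until that backup mechanism lands, the database can be copied out of the running pod by hand; a sketch, where the pod name is a placeholder and the in-container database path is an assumption to verify against the PCCS image:

# copy the PCCS database out of the pod (path is an assumption):
kubectl cp <pccs-pod>:/opt/intel/pccs/pckcache.db ./pckcache.db.bak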

@mythi force-pushed the PR-2025-018 branch 2 times, most recently from 691dfe1 to 65ce152 on August 29, 2025 08:08
Comment on lines +9 to +16
WORKDIR /opt/intel

ARG SGX_SDK_URL=https://download.01.org/intel-sgx/sgx-linux/2.26/distro/ubuntu24.04-server/sgx_linux_x64_sdk_2.26.100.0.bin

RUN curl -sSLfO ${SGX_SDK_URL} \
&& export SGX_SDK_INSTALLER=$(basename $SGX_SDK_URL) \
&& chmod +x $SGX_SDK_INSTALLER \
&& echo "yes" | ./$SGX_SDK_INSTALLER \

All props to @ScottR-Intel for: sudo ./sgx_linux_x64_sdk_2.26.100.0.bin --prefix /opt/intel


# self-signed TLS certs for pccs-tls:
# openssl req -x509 -sha256 -nodes -days 365 -newkey rsa:2048 -keyout private.pem -out file.crt -subj "/C=US/ST=Denial/L=Springfield/O=Dis/CN=www.example.com"
# token hashesh follow (with 'hellworld' changed to the desired secret tokens):


Suggested change
# token hashesh follow (with 'hellworld' changed to the desired secret tokens):
# token hashesh follow (with 'helloworld' changed to the desired secret tokens):
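For reference, the token hashes here are plausibly SHA-512 digests of the plaintext tokens; a sketch of generating one, assuming that is what this kustomization expects:

echo -n 'helloworld' | sha512sum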

name: pccs-credentials
securityContext:
readOnlyRootFilesystem: true
allowPrivilegeEscalation: false


I had to change this to make it work:

-          allowPrivilegeEscalation: false
+          privileged: true
+          allowPrivilegeEscalation: true

I think that for socket device plugins, the container that exposes the unix socket must be privileged (see the pr-helper example).

@MatiasVara commented Sep 18, 2025

I tried it, and the unix socket is not visible from the virt-launcher. This means that we still need something like a socket device plugin in the virt-handler to mount it. I do not think this PR is the place for that, though.

@MatiasVara commented:

I observe that when the qgs pod is removed, the new instance fails because the unix socket still exists. I think the unix socket should be removed when the pod is removed; otherwise the socket has to be removed manually.

@mythi (Contributor, Author) commented Sep 25, 2025

I think the unix socket should be removed when the pod is removed; otherwise the socket has to be removed manually.

I saw this too and reported a bug to QGS about it. I need to see if I can work around that in the meantime.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
@Aseeef commented Oct 16, 2025

Hi, I've been experimenting with this PR and had many of the same issues as @MatiasVara on OpenShift. Most of these issues seem to be directly related to SELinux.

I wanted to share a case of an SELinux denial that had me scratching my head for a while, involving mount failures of the EFI variables.

When the platform-registration container requests the "sgx.intel.com/registration" resource, that triggers a mount (as per the specifications of the device plugin in cmd/sgx_plugin/sgx_plugin.go). Consequently, the sgx plugin pod tries to write to /var/run/cdi/ to create the CDI spec file that tells the container runtime to mount efivars into the container resulting in the following:

time->Thu Oct 16 20:01:53 2025
type=PROCTITLE msg=audit(1760644913.951:8108): proctitle=2F7573722F6C6F63616C2F62696E2F696E74656C5F7367785F6465766963655F706C7567696E002D646361702D696E6672612D7265736F7572636573
type=SYSCALL msg=audit(1760644913.951:8108): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=c0002e0040 a2=80000 a3=0 items=0 ppid=1637961 pid=1637963 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="intel_sgx_devic" exe="/usr/local/bin/intel_sgx_device_plugin" subj=system_u:system_r:container_device_plugin_t:s0:c170,c921 key=(null)
type=AVC msg=audit(1760644913.951:8108): avc:  denied  { read } for  pid=1637963 comm="intel_sgx_devic" name="cdi" dev="tmpfs" ino=4248 scontext=system_u:system_r:container_device_plugin_t:s0:c170,c921 tcontext=system_u:object_r:container_var_run_t:s0 tclass=dir permissive=0

But then there appears to be a second, more fundamental conflict with SELinux here. Even if you turn on permissive mode (setenforce 0), mounting still fails because:
SELinux: mount invalid. Same superblock, different security settings for (dev efivarfs, type efivarfs)
My interpretation of this error is that SELinux still tries to label the efivarfs filesystem with the context of the container. That conflicts with the context the host mounted efivarfs with, resulting in another mount failure (despite SELinux enforcement being disabled).

The fix to this was to modify cmd/sgx_plugin/sgx_plugin.go to create a bind mount from the host to the container instead of trying to mount efivars a second time. This can be achieved by modifying...

Devices: []cdispec.Device{
			{
				Name: "efivarfs",
				ContainerEdits: cdispec.ContainerEdits{
					Mounts: []*cdispec.Mount{
						{HostPath: "efivarfs", ContainerPath: "/run/efivars", Type: "efivarfs", Options: []string{"rw", "nosuid", "nodev", "noexec", "relatime"}},
					},
				},
			},
		},

to

Devices: []cdispec.Device{
			{
				Name: "efivarfs",
				ContainerEdits: cdispec.ContainerEdits{
					Mounts: []*cdispec.Mount{
						{HostPath: "/sys/firmware/efi/efivars", ContainerPath: "/run/efivars", Type: "none", Options: []string{"bind", "rw", "nosuid", "nodev", "noexec", "relatime"}},
					},
				},
			},
		},

@mythi (Contributor, Author) commented Oct 18, 2025

The fix to this was to modify cmd/sgx_plugin/sgx_plugin.go to create a bind mount from the host to the container instead of trying to mount efivars a second time. This can be achieved by modifying...

@Aseeef thanks for the detailed investigation. This approach does not work with containerd because it sets /sys/firmware as a maskedPath, so even if the bind mount is done, the container does not get to see the path unless it is run with privileged: true, which I wanted to avoid. I need to take a closer look at whether there's anything I can do with SELinux.

@mythi (Contributor, Author) commented Oct 18, 2025

Consequently, the sgx plugin pod tries to write to /var/run/cdi/ to create the CDI spec file that tells the container runtime to mount efivars into the container resulting in the following:

time->Thu Oct 16 20:01:53 2025
type=PROCTITLE msg=audit(1760644913.951:8108): proctitle=2F7573722F6C6F63616C2F62696E2F696E74656C5F7367785F6465766963655F706C7567696E002D646361702D696E6672612D7265736F7572636573
type=SYSCALL msg=audit(1760644913.951:8108): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=c0002e0040 a2=80000 a3=0 items=0 ppid=1637961 pid=1637963 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="intel_sgx_devic" exe="/usr/local/bin/intel_sgx_device_plugin" subj=system_u:system_r:container_device_plugin_t:s0:c170,c921 key=(null)
type=AVC msg=audit(1760644913.951:8108): avc:  denied  { read } for  pid=1637963 comm="intel_sgx_devic" name="cdi" dev="tmpfs" ino=4248 scontext=system_u:system_r:container_device_plugin_t:s0:c170,c921 tcontext=system_u:object_r:container_var_run_t:s0 tclass=dir permissive=0

@Aseeef Does it mean the container runtime cannot read the file the plugin wrote?

@Aseeef commented Oct 18, 2025

Based on this part, pid=1637963 comm="intel_sgx_devic", I think the issue happens earlier: the pod fails to write to the CDI directory. I notice that the log I pasted shows a { read } denial, but the next SELinux log also shows a { write } denial.

time->Thu Oct 16 20:01:53 2025
type=PROCTITLE msg=audit(1760644913.951:8109): proctitle=2F7573722F6C6F63616C2F62696E2F696E74656C5F7367785F6465766963655F706C7567696E002D646361702D696E6672612D7265736F7572636573
type=SYSCALL msg=audit(1760644913.951:8109): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=c0000c6180 a2=800c2 a3=180 items=0 ppid=1637961 pid=1637963 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="intel_sgx_devic" exe="/usr/local/bin/intel_sgx_device_plugin" subj=system_u:system_r:container_device_plugin_t:s0:c170,c921 key=(null)
type=AVC msg=audit(1760644913.951:8109): avc:  denied  { write } for  pid=1637963 comm="intel_sgx_devic" name="cdi" dev="tmpfs" ino=4248 scontext=system_u:system_r:container_device_plugin_t:s0:c170,c921 tcontext=system_u:object_r:container_var_run_t:s0 tclass=dir permissive=0
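For local debugging, these denials can be turned into a draft SELinux policy module with the standard audit tooling; a sketch (review the generated rules before loading anything):

# collect AVC denials for the plugin binary and draft a policy module:
sudo ausearch -m avc -c intel_sgx_devic | sudo audit2allow -M intel_sgx_plugin_local
# inspect intel_sgx_plugin_local.te, then load it if acceptable:
sudo semodule -i intel_sgx_plugin_local.pp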

@mythi (Contributor, Author) commented Oct 20, 2025

The pod fails to write to CDI.

We have other plugins doing the same. So far we have not heard that this would be a problem on OCP, but it is something we need to double-check. @tkatila

@mythi (Contributor, Author) commented Oct 21, 2025

SELinux: mount invalid. Same superblock, different security settings for (dev efivarfs, type efivarfs)
My interpretation of this error is that this happens because SELinux still tries to label the efivarfs file system with the context of the container. But then that conflicts with the context the host had the efivarfs file mounted with, resulting in another mount failure (despite SELinux enforcement being disabled).

@Aseeef it looks like OCI (runc) knows about mount labels and that they can also be set via mount options. Looks like something worth trying at some point.

@Aseeef commented Oct 21, 2025

@Aseeef it looks like OCI (runc) knows about mount labels and that they can also be set via mount options. Looks like something worth trying at some point.

Thanks, I'll look into it!

@tkatila (Contributor) commented Oct 22, 2025

We have other plugins doing the same. So far we have not heard that this would be a problem on OCP, but it is something we need to double-check. @tkatila

There are two plugins using CDI somehow: FPGA and GPU. FPGA is not supported in OCP. GPU is supported in OCP, but its functionality doesn't depend on CDI; it's just using it along with the generic device plugin API. It could be that CDI fails there also, but as it doesn't affect execution, no one has noticed.

@mythi (Contributor, Author) commented Oct 22, 2025

It could be that CDI fails there also, but as it doesn't affect execution, no one has noticed.

Getting a bit off-topic, but I wonder if DRA drivers are impacted by the same?

@tkatila (Contributor) commented Oct 22, 2025

It could be that CDI fails there also, but as it doesn't affect execution, no one has noticed.

Getting a bit off-topic, but I wonder if DRA drivers are impacted by the same?

I believe DRA is being enabled in the upcoming OCP 4.20. Hopefully they are detecting these SELinux restrictions. FYI @byako

@Aseeef commented Oct 22, 2025

@mythi

@Aseeef it looks like OCI (runc) knows about mount labels and that they can also be set via mount options. Looks like something worth trying at some point.

Were you implying that we should try to mount efivarfs using the same SELinux labels as the host? Could you clarify what you mean here?

@mythi (Contributor, Author) commented Oct 23, 2025

@mythi

@Aseeef it looks like OCI (runc) knows about mount labels and that they can also be set via mount options. Looks like something worth trying at some point.

Were you implying that we should try to mount efivarfs using the same SELinux labels as the host? Could you clarify what you mean here?

SELinux is not my area but, yes, that was my thinking:

$ cat /var/run/cdi/intel.cdi.k8s.io-sgx-efivarfs.yaml 
---
cdiVersion: 0.5.0
kind: intel.cdi.k8s.io/sgx
devices:
    - name: efivarfs
      containerEdits:
        mounts:
            - hostPath: efivarfs
              containerPath: /run/efivars
              options:
                - rw
                - nosuid
                - nodev
                - noexec
                - relatime
                - context="<labels>"
              type: efivarfs

but before testing anything, we'd need to check whether the container is going to have the permissions to access the host efivarfs (and/or whether some added labeling would be needed for that platform-registration container).
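To find the label the host has on efivarfs for that context= option, one could check directly on the node; a sketch:

# show the SELinux context of the host efivarfs mount point:
ls -dZ /sys/firmware/efi/efivars
# the active mount line may also show a context option:
mount | grep efivarfs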

@Aseeef commented Oct 23, 2025

@mythi I tried that and it seems to result in a new error:
SELinux: duplicate or incompatible mount options

We probably are going to need someone who is an expert with SELinux for this one.

@Aseeef commented Oct 23, 2025

One more thing: the QGS socket is created owned by root, with permissions set to:
srwxr-xr-x. 1 root root 0 Oct 23 15:59 qgs.socket

Therefore, by default, the qemu user is not going to be able to access it. Not sure if this is something that should be addressed here in this PR though...
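As a stop-gap, the socket could be opened up to a shared group by hand; a sketch, where the qemu group name and the mode are assumptions, and the change does not survive socket re-creation:

# connecting to a unix socket requires write permission on it:
sudo chgrp qemu /var/run/tdx-qgs/qgs.socket
sudo chmod g+rw /var/run/tdx-qgs/qgs.socket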

@Aseeef mentioned this pull request on Oct 27, 2025.
@mythi (Contributor, Author) commented Oct 28, 2025

@mythi I tried that and it seems to result in a new error: SELinux: duplicate or incompatible mount options

We probably are going to need someone who is an expert with SELinux for this one.

I'll try to play with this a bit since I now have an OCP cluster.

One more thing: the QGS socket is created owned by root, with permissions set to: srwxr-xr-x. 1 root root 0 Oct 23 15:59 qgs.socket

Therefore, by default, the qemu user is not going to be able to access it. Not sure if this is something that should be addressed here in this PR though...

Thanks! Red Hat's build of QGS adds mode controls. I believe with virt-launcher we need a "shared" group ID for QEMU to be able to use the socket.
