
Integrate go-p11-kit into Pelican and use it to sign TLS connection #2706

Merged
patrickbrophy merged 14 commits into PelicanPlatform:main from h2zh:pkcs11 on Dec 19, 2025


@h2zh
Contributor

@h2zh h2zh commented Sep 26, 2025

Update: Rewrote this PR description so it now walks through how Pelican and XRootD work together to deliver full PKCS#11-based TLS signing end to end.

This PR wires XRootD’s OpenSSL TLS layer into Pelican’s PKCS#11 helper so the TLS private key never has to be read by the xrootd process. Instead, every TLS signature goes through a PKCS#11 URI backed by go-p11-kit.

Highlights

  • The private key never leaves the Pelican process. Only “sign this” requests and the resulting signatures traverse the socket.
  • OpenSSL reads only the certificate chain from -cert; the key is referenced via the pkcs11: URI.
  • By default, Server_EnablePKCS11 is false because this feature is still under development. Once the feature is ready for production, I plan to switch the default to true to enforce this security enhancement.
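The first highlight describes a narrow "sign this" contract. Here is a minimal Go sketch of that contract, assuming the key is held as a Go `crypto.Signer` (as the first commit message describes). `signDigest`, `newInMemoryKey`, `digestOf`, and `verify` are hypothetical helpers and are not go-p11-kit's API. The throwaway ECDSA P-256 key stands in for Pelican's TLS key. The caller hands over a digest and gets back a signature, and the private key itself is never returned.

```go
package main

import (
	"crypto"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
)

// newInMemoryKey generates a throwaway P-256 key for illustration only;
// Pelican would use its loaded TLS key instead (hypothetical helper).
func newInMemoryKey() (*ecdsa.PrivateKey, error) {
	return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
}

// digestOf hashes a message with SHA-256, as a TLS stack would before signing.
func digestOf(msg []byte) []byte {
	sum := sha256.Sum256(msg)
	return sum[:]
}

// signDigest models the C_Sign path: only a digest goes in and only a
// signature comes out. The crypto.Signer never leaves this process.
func signDigest(signer crypto.Signer, digest []byte) ([]byte, error) {
	return signer.Sign(rand.Reader, digest, crypto.SHA256)
}

// verify checks an ASN.1 ECDSA signature using only the public key.
func verify(pub *ecdsa.PublicKey, digest, sig []byte) bool {
	return ecdsa.VerifyASN1(pub, digest, sig)
}
```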

`param.Server_EnablePKCS11.GetBool()` indicates whether the server admin intends to enable PKCS#11.
`pkcs11Info.Enabled` indicates whether PKCS#11 is actually enabled in Pelican (e.g. it can be disabled at runtime when a required dependency is missing).
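The following sketch shows how admin intent and effective state can be kept separate. `Info`, its `Reason` field, and `resolveInfo` are hypothetical. The only parts taken from this PR are an `Enabled` flag that stays false when dependencies are missing and the auto-disable-with-warning behavior described in the first commit.

```go
package main

// Info is a hypothetical stand-in for the helper's state; only the
// Enabled flag mirrors the PR description.
type Info struct {
	Enabled bool
	Reason  string // why the helper is off, if it is (illustrative)
}

// resolveInfo combines the admin's intent (Server.EnablePKCS11) with a
// runtime dependency check. Intent alone is not enough: if the engine or
// p11-kit client module is missing, the helper stays disabled.
func resolveInfo(adminWantsPKCS11, depsPresent bool) Info {
	switch {
	case !adminWantsPKCS11:
		return Info{Enabled: false, Reason: "disabled by configuration"}
	case !depsPresent:
		return Info{Enabled: false, Reason: "required PKCS#11 libraries not found"}
	default:
		return Info{Enabled: true}
	}
}
```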

Technical details

[Figure: pkcs11 drawio diagram]

  1. Client opens TCP to xrootd exactly as before.
  2. XRootD’s OpenSSL context loads only the certificate; the key path is a PKCS#11 URI (pkcs11:token=pelican-tls;object=server-key;type=private).
  3. When OpenSSL needs to sign, the pkcs11 engine (libengine-pkcs11-openssl + libp11) looks up PKCS11_MODULE_PATH (= /usr/lib64/pkcs11/p11-kit-client.so) and connects to the p11-kit RPC socket provided by Pelican via P11_KIT_SERVER_ADDRESS.
  4. Pelican’s p11proxy goroutine accepts that connection, exposes the in-memory key via go-p11-kit, and fulfils the C_Sign request with the real TLS key (which never leaves Pelican’s process memory).
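The URI in step 2 uses the RFC 7512 `pkcs11:` scheme, in which path attributes such as `token`, `object`, and `type` are joined by `;`. Below is a minimal Go sketch of building such a URI, assuming attribute values may need percent-encoding. `attr`, `pctEncode`, and `buildPKCS11URI` are hypothetical names. The encoder is deliberately conservative and escapes every byte outside the RFC 3986 unreserved set.

```go
package main

import (
	"fmt"
	"strings"
)

// attr is one path attribute of a PKCS#11 URI, e.g. token=pelican-tls.
type attr struct{ Name, Value string }

// pctEncode keeps RFC 3986 unreserved characters and percent-encodes every
// other byte, which is always a valid encoding for an attribute value.
func pctEncode(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		c := s[i]
		if ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') || ('0' <= c && c <= '9') ||
			c == '-' || c == '.' || c == '_' || c == '~' {
			b.WriteByte(c)
		} else {
			fmt.Fprintf(&b, "%%%02X", c)
		}
	}
	return b.String()
}

// buildPKCS11URI joins attributes with ';' after the "pkcs11:" scheme.
func buildPKCS11URI(attrs []attr) string {
	parts := make([]string, len(attrs))
	for i, a := range attrs {
		parts[i] = a.Name + "=" + pctEncode(a.Value)
	}
	return "pkcs11:" + strings.Join(parts, ";")
}
```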

How to test

Rebuild XRootD with the counterpart PR in the PelicanPlatform/xrootd repo: PelicanPlatform/xrootd#42

Install the libraries needed to connect XRootD to Pelican's p11proxy:

dnf install -y openssl-pkcs11 p11-kit-server

The dependency chain explanation:
XRootD (C++) 
   ↓ uses
OpenSSL library (from openssl package, which is already bundled in Pelican)
   ↓ loads engine
PKCS#11 engine (from openssl-pkcs11 package: /usr/lib64/engines-3/pkcs11.so)
   ↓ uses
p11-kit client library (from p11-kit-server package: /usr/lib64/pkcs11/p11-kit-client.so)
   ↓ connects via Unix socket
Pelican's p11proxy server (Go code)
   ↓ provides
Signature signed by TLS key in the pelican process

(TODO: Once all reviewers approve this PR, I (@h2zh) need to contact BrianA & Mat to bundle these new libraries into the Pelican image build and other build workflows (KOJI/OSG/...).)

Set the following parameter in pelican.yaml to enable Pelican's built-in p11 proxy:

Server:
  EnablePKCS11: true

Make the TLS key unreadable to the xrootd user. Example:

chmod 600 /etc/pelican/certificates/tls.key
chown root:root /etc/pelican/certificates/tls.key

Spin up the Pelican Registry, Director, and Origin/Cache. Run the Origin/Cache with verbose p11-kit logging (e.g. P11_KIT_DEBUG=rpc pelican origin serve -p 8447).

If this PR works, you should see lines like the following in the Origin logs. They show that PKCS#11 is supplying the key and that xrootd no longer needs read access to the PEM TLS key file.

(p11-kit:PID) rpc_unix_init: initialized rpc socket: ...
(p11-kit:PID) rpc_C_GetSlotList: C_GetSlotList: enter ...

p11proxy: accepted connection ...

@h2zh h2zh requested a review from bbockelm September 26, 2025 22:02
@h2zh h2zh added the cache, origin, and security labels Sep 26, 2025
@h2zh h2zh linked an issue Sep 26, 2025 that may be closed by this pull request
Contributor

@patrickbrophy patrickbrophy left a comment


This is just some of the things I caught looking at the code. On a broader view, was the design always supposed to be that the service admin needs to run the commands in another shell?

@h2zh
Contributor Author

h2zh commented Oct 27, 2025

This is just some of the things I caught looking at the code. On a broader view, was the design always supposed to be that the service admin needs to run the commands in another shell?

Good point. The goal of this PR is "Pelican as a p11 server", so that the subsequent XRootD development can proceed. The testing recipe above provides a way to validate that Pelican works as a p11 server.

Contributor

@patrickbrophy patrickbrophy left a comment


This looks good @h2zh! LGTM

@h2zh h2zh requested a review from patrickbrophy December 11, 2025 23:48
@h2zh h2zh force-pushed the pkcs11 branch 2 times, most recently from d0e2e83 to 28cee02
h2zh added 13 commits December 19, 2025 20:50
- New pelican/p11proxy with Start/Stop, Info, Options.
- Serve a PKCS#11 slot via p11kit.Handler exposing a private key (Go crypto.Signer) and X.509 cert.
- Autodetect engine (pkcs11.so) and module (p11-kit-client.so) and write OPENSSL_CONF.
- Generate pkcs11: URI and export P11_KIT_SERVER_ADDRESS; never write private keys to disk; cleanup socket/temp on stop.
- Start helper early in origin_serve.go and cache_serve.go; auto-disable with WARN if deps missing.
Consolidate logic
Set the default of Server_EnablePKCS11 to false because this is still in the dev process
Initialize PKCS#11 helper after the defaults are set up

Remove the unused cancel context in CacheServe and OriginServe.
Log cleanup errors in p11proxy.Stop().
Log errors from the p11-kit handler goroutine.
Harden the socket path to avoid predictable paths.

Hand tests and unit tests pass after this change
XRootD launchers and tests can now discover helper state without re-instantiating it, and the socket handling is safe against stale files.
- Maintain the helper’s last-known Info under a RWMutex, expose it via CurrentInfo
- Ensure the socket directory exists, pick a unique socket name, and clear stale sockets before binding, avoiding conflicts when multiple modules start, or when an old file is still lying around
- Plumb PKCS#11 helper-derived env vars (P11_KIT_SERVER_ADDRESS, OPENSSL_CONF), and stop injecting a client key file into XRootD when the helper is active
- Right before we render the XRootD template, `ConfigXrootd` now rewrites the `xrdConfig.Server.{TLSCertificateChain, TLSKey}` fields
-- When PKCS#11 is off, both fields are set to the same runtime file (which still contains cert+key), so the template produces the same path it used to.
-- When PKCS#11 is on, Server.TLSCertificateChain stays pointed at the runtime cert file (no key), while Server.TLSKey is swapped out for the pkcs11 URI (pkcs11:...). That URI needs to reach XRootD so it can use the engine to sign with the helper.
-- Note: the template sees whatever is in the `XrootdConfig` struct at render time, not the raw config parameter.
- Unit test: assert the runtime file only contains cert material when pkcs11 signing is active
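The `ConfigXrootd` rewrite described above can be sketched as follows. The `XrootdConfig` and `ServerTLS` types are hypothetical mirrors of the two fields named in the commit message, not Pelican's real structs. `applyTLSPaths` is likewise an illustrative name.

```go
package main

// ServerTLS mirrors only the two fields the template reads (hypothetical).
type ServerTLS struct {
	TLSCertificateChain string
	TLSKey              string
}

// XrootdConfig is a hypothetical, trimmed-down render-time config.
type XrootdConfig struct {
	Server ServerTLS
}

// applyTLSPaths sets the fields the XRootD template will see at render time.
func applyTLSPaths(cfg *XrootdConfig, runtimeFile, pkcs11URI string, pkcs11Enabled bool) {
	cfg.Server.TLSCertificateChain = runtimeFile
	if pkcs11Enabled {
		// runtimeFile holds certificates only; the key is reached via the engine.
		cfg.Server.TLSKey = pkcs11URI
	} else {
		// runtimeFile still holds cert+key, so both fields point at it.
		cfg.Server.TLSKey = runtimeFile
	}
}
```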
- expose the detected p11-kit client module path via p11proxy.Info, simplify
  error capture in Proxy.Stop
- harden socket setup/IDs/logging in the RPC server
- propagate PKCS11_MODULE_PATH plus more detailed trace logging to both the
  unprivileged launcher and the privileged linux launcher
- preserve the parent environment
- Configure Federation.DiscoveryUrl by using the new helper function `test_utils.MockFederationRoot(t, nil, nil)` introduced by PR PelicanPlatform#2747 - Use federation's discovery URL as token issuer for Director/Federation tests
- Since the Proxy handles its own cleanup internally via the context, there's no need for the caller to pass in an errgroup to manage cleanup goroutines
- Made the entire Stop() method idempotent by adding a stopped flag and mutex, so multiple calls to Stop() won't cause issues
- Made socket file removal idempotent by checking whether the error is "file does not exist" before logging a warning
- Added special handling for EOF errors in the p11proxy handler. During shutdown, when connections are closed, EOF errors are expected and should not be logged as warnings
@patrickbrophy patrickbrophy merged commit f56caec into PelicanPlatform:main Dec 19, 2025
28 of 31 checks passed

Labels

cache: Issue relating to the cache component; origin: Issue relating to the origin component; security


Development

Successfully merging this pull request may close these issues.

Remove read access to server key from XRootD

2 participants