Summary
Running SSEBootstrapGPU (and likely other bootstrap benchmarks) with
FIDESLIB_USE_NUM_GPUS > 1 yields ciphertexts whose post-decryption noise
exceeds OpenFHE's CKKS approximation tolerance. The same benchmark on the same
fixture/config passes cleanly on a single GPU. This points to a precision problem
in FIDESlib's multi-GPU bootstrap pipeline (most likely the cross-GPU
reduce/aggregation step), not in the per-GPU compute.
Environment
- 4× NVIDIA RTX PRO 6000 Blackwell, P2P enabled between all pairs.
Reproduction
Single-GPU passes, multi-GPU fails on the same case:
# baseline — passes cleanly with bits=17 post-bootstrap precision
unset FIDESLIB_USE_NUM_GPUS
./build/fideslib-bench --benchmark_filter='GeneralFixture/SSEBootstrapGPU/18/0/100/11/'
# multi-GPU — same case, crashes during decode
FIDESLIB_USE_NUM_GPUS=4 ./build/fideslib-bench --benchmark_filter='GeneralFixture/SSEBootstrapGPU/18/0/100/11/'
Multi-GPU run output:
Adding bootstrap precomputation to GPU for 32768 slots.
Plaintexts loaded: 378 ~ 4347MB
GPU P2P? 1
Rotation keys loaded: 7 ~ 840MB
terminate called after throwing an instance of 'lbcrypto::OpenFHEException'
what(): ...ckkspackedencoding.cpp:l.453:Decode(): The decryption failed because
the approximation error is too high. Check the parameters.
Aborted (core dumped)
The benchmark itself completes its 50 iterations of FIDESlib::CKKS::Bootstrap
without error; the throw happens in the post-iteration verification block when
the bench calls cc->Decrypt(keys.secretKey, result, &result_pt).
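To narrow down which GPU counts trigger the failure, the two commands above can be generalized into a small sweep. The following is a minimal sketch, not part of FIDESlib: the helper names `run_case` and `sweep` are hypothetical, and it assumes that `FIDESLIB_USE_NUM_GPUS` also accepts intermediate values such as 2 and 3, which this report has not verified. The binary path, benchmark filter, and error text are taken from the reproduction above.

```python
import os
import subprocess

# Taken from the reproduction steps in this report.
BENCH = "./build/fideslib-bench"
FILTER = "GeneralFixture/SSEBootstrapGPU/18/0/100/11/"
# Substring of the OpenFHE exception message shown in the multi-GPU output.
DECODE_ERROR = "approximation error is too high"


def run_case(num_gpus, runner=subprocess.run):
    """Run the benchmark once and classify the outcome.

    num_gpus=None leaves FIDESLIB_USE_NUM_GPUS unset (the single-GPU baseline).
    Returns "pass", "decode-error", or "other-failure".
    """
    env = dict(os.environ)
    env.pop("FIDESLIB_USE_NUM_GPUS", None)
    if num_gpus is not None:
        # Assumption: any count up to the number of installed GPUs is accepted.
        env["FIDESLIB_USE_NUM_GPUS"] = str(num_gpus)
    proc = runner(
        [BENCH, f"--benchmark_filter={FILTER}"],
        env=env,
        capture_output=True,
        text=True,
    )
    if proc.returncode == 0:
        return "pass"
    if DECODE_ERROR in (proc.stdout + proc.stderr):
        return "decode-error"
    return "other-failure"


def sweep(gpu_counts=(None, 2, 3, 4), runner=subprocess.run):
    """Run the same benchmark case for each GPU count and collect outcomes."""
    return {n: run_case(n, runner) for n in gpu_counts}


# Example use (requires the built benchmark binary):
#   for n, outcome in sweep().items():
#       print("unset" if n is None else n, outcome)
```

If the failure appears at 2 GPUs already, that would be consistent with the cross-GPU reduce/aggregation hypothesis; if it only appears at higher counts, the partitioning of limbs or keys across devices may be worth a closer look.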