Merged
29 commits
28a85da
Add dependencies for new feature `gpu`
itzmeanjan Mar 19, 2025
11e5b77
Use `u32` for matrix dimensions
itzmeanjan Mar 19, 2025
cfb9124
Add compute shader for matrix-matrix multiplication
itzmeanjan Mar 19, 2025
4cb8d97
Setup a Vulkan device and queue so that commands can be submitted to it
itzmeanjan Mar 19, 2025
4b9bac8
Setup gpu returns a memory allocator and command buffer allocator too
itzmeanjan Mar 19, 2025
9c2ba00
Given a matrix, returns a buffer with transfer-src flag set
itzmeanjan Mar 19, 2025
dce7da2
Add error enum for vulkan buffer creation failure
itzmeanjan Mar 19, 2025
2b8d84c
Simplify return in matrix to transfer source buffer function
itzmeanjan Mar 19, 2025
96552dc
Add function recording Vulkan buffer to buffer data transfer command
itzmeanjan Mar 19, 2025
0e21934
Make error type more explicit
itzmeanjan Mar 19, 2025
1b3c3bc
Add function to create empty Vulkan storage buffer
itzmeanjan Mar 19, 2025
e526074
Add function to submit transfer command buffer to queue and wait till…
itzmeanjan Mar 19, 2025
db5aca1
Rename error enum variant to be more generic
itzmeanjan Mar 19, 2025
8ff3965
Add function for computing number of bytes required to encode matrix
itzmeanjan Mar 19, 2025
9f4e0ea
Matrix-matrix multiplication command submission and execution on GPU …
itzmeanjan Mar 20, 2025
3d5757b
Reformat GLSL compute shader using clang-format
itzmeanjan Mar 20, 2025
1cc4806
Add matrix transpose compute shader
itzmeanjan Mar 20, 2025
98a0746
Submit and wait for matrix transpose job to finish on GPU
itzmeanjan Mar 20, 2025
679bc17
Fix matrix transpose shader
itzmeanjan Mar 20, 2025
3f33f81
Refactor function for transferring host matrix to device
itzmeanjan Mar 20, 2025
9b50f41
Maintain two different functions for host-accessible and device-local…
itzmeanjan Mar 20, 2025
450d7dc
Implement server-setup phase for `gpu` feature
itzmeanjan Mar 20, 2025
3be9c22
Add row-vector transposed matrix multiplication compute shader
itzmeanjan Apr 1, 2025
40ba459
Implement server-respond function, using `gpu` feature
itzmeanjan Apr 1, 2025
ec4a802
Change work-group size for vector-matrix multiplication shader invoca…
itzmeanjan Apr 4, 2025
1d2ed91
Duplicate comment for `gpu` feature-gated version of `server-respond`…
itzmeanjan Apr 4, 2025
fe5ce49
Avoid computing vector-matrix multiplication on GPU during `server-re…
itzmeanjan Apr 5, 2025
1e391ad
Update project documentation mentioning about the `gpu` feature gate
itzmeanjan Apr 6, 2025
42a6736
Prepare for release v0.5.0
itzmeanjan Apr 6, 2025
15 changes: 12 additions & 3 deletions Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[package]
name = "chalamet_pir"
version = "0.4.0"
version = "0.5.0"
edition = "2024"
resolver = "2"
rust-version = "1.85.0"
Expand All @@ -9,14 +9,22 @@ description = "Simple, Stateful, Single-Server Private Information Retrieval for
readme = "README.md"
repository = "https://github.com/itzmeanjan/ChalametPIR.git"
license = "MPL-2.0"
keywords = ["priv-info-retrieval", "lwe-pir", "frodo-pir", "chalamet-pir"]
categories = ["cryptography", "data-structures"]
keywords = [
"priv-info-retrieval",
"lwe-pir",
"frodo-pir",
"chalamet-pir",
"gpu",
]
categories = ["cryptography", "data-structures", "concurrency"]

[dependencies]
turboshake = "=0.4.1"
rayon = "=1.10.0"
rand = "=0.9.0"
rand_chacha = "=0.9.0"
vulkano = { version = "=0.35.1", optional = true }
vulkano-shaders = { version = "=0.35.0", optional = true }

[dev-dependencies]
test-case = "=3.3.1"
@@ -34,6 +42,7 @@ required-features = ["mutate_internal_client_state"]

[features]
mutate_internal_client_state = []
gpu = ["dep:vulkano", "dep:vulkano-shaders"]

[profile.optimized]
inherits = "release"
55 changes: 38 additions & 17 deletions README.md
@@ -9,14 +9,12 @@ built on top of FrodoPIR - a practical, single-server, stateful LWE-based PIR s
- Binary Fuse Filter was proposed in https://arxiv.org/pdf/2201.01174.
- And ChalametPIR was proposed in https://ia.cr/2024/092.

ChalametPIR allows a client to retrieve a specific value from a key-value database on a server without revealing the requested key.
It uses Binary Fuse Filters to encode key-value pairs in form of a matrix. And then it applies FrodoPIR on the encoded database matrix
to actually retrieve values for requested keys.
ChalametPIR allows a client to retrieve a specific value from a key-value database, stored on a server, without revealing the requested key to the server. It uses Binary Fuse Filters to encode key-value pairs in the form of a matrix, and then applies FrodoPIR on the encoded database matrix to actually retrieve values for requested keys.

The protocol has two participants:

**Server:**
* **`setup`:** Initializes the server with a key-value database, generating a public matrix, a hint matrix, and a Binary Fuse Filter (3-wise XOR or 4-wise XOR, compile-time configurable). Returns serialized representations of the hint matrix and filter parameters. This phase can be completed in offline and it's completely client agnostic.
* **`setup`:** Initializes the server with a key-value database, generating a public matrix, a hint matrix, and a Binary Fuse Filter (3-wise XOR or 4-wise XOR, configurable at compile time). It returns serialized representations of the hint matrix and filter parameters. This phase can be completed offline and is completely client-agnostic. But it is very compute-intensive, which is why this library allows you to offload expensive matrix multiplication and transposition to a GPU, gated behind the opt-in `gpu` feature. For large key-value databases (e.g., with >= $2^{18}$ entries), I recommend enabling the `gpu` feature, as it can significantly reduce the cost of the server-setup phase.
* **`respond`:** Processes a client's query and returns an encrypted response vector.

**Client:**
@@ -28,8 +26,8 @@ To paint a more practical picture, imagine, we have a database with $2^{20}$ (~1

Machine Type | Machine | Kernel | Compiler | Memory Read Speed
--- | --- | --- | --- | ---
aarch64 server | AWS EC2 `m8g.8xlarge` | `Linux 6.8.0-1021-aws aarch64` | `rustc 1.84.1 (e71f9a9a9 2025-01-27)` | 28.25 GB/s
x86_64 server | AWS EC2 `m7i.8xlarge` | `Linux 6.8.0-1021-aws x86_64` | `rustc 1.84.1 (e71f9a9a9 2025-01-27)` | 10.33 GB/s
aarch64 server | AWS EC2 `m8g.8xlarge` | `Linux 6.8.0-1021-aws aarch64` | `rustc 1.85.1 (e71f9a9a9 2025-01-27)` | 28.25 GB/s
x86_64 server | AWS EC2 `m7i.8xlarge` | `Linux 6.8.0-1021-aws x86_64` | `rustc 1.85.1 (e71f9a9a9 2025-01-27)` | 10.33 GB/s

and this implementation of ChalametPIR is compiled with specified compiler, in `optimized` profile. See [Cargo.toml](./Cargo.toml).

@@ -44,22 +42,34 @@ Step | `(a)` Time Taken on `aarch64` server | `(b)` Time Taken on `x86_64` serve
`server_respond` | 18.01 milliseconds | 32.16 milliseconds | 0.56
`client_process_response` | 11.73 microseconds | 16.75 microseconds | 0.7

> [!NOTE]
> In above table, I show only the median timing measurements, while the DB is encoded using a 3 -wise XOR Binary Fuse Filter. For more results, with more database configurations, see benchmarking [section](#benchmarking) below.

So, the median bandwidth of the `server_respond` algorithm, which needs to traverse the whole processed database, is
- (a) For `aarch64` server: 53.82 GB/s
- (b) For `x86_64` server: 30.12 GB/s
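These two bandwidth figures can be cross-checked against the `server_respond` latencies in the table above: multiplying each median bandwidth by the corresponding median latency should recover the same processed-database size on both machines. A minimal sketch of that consistency check (the ~0.97 GB figure is derived here, not stated in the original):

```rust
// Cross-check of the bandwidth figures: bandwidth (GB/s) x median
// `server_respond` latency (s) = size of the processed database
// traversed per query. Both servers should agree on that size.
fn processed_db_gb(bandwidth_gb_per_s: f64, latency_s: f64) -> f64 {
    bandwidth_gb_per_s * latency_s
}

fn main() {
    let aarch64 = processed_db_gb(53.82, 0.01801); // 18.01 ms median latency
    let x86_64 = processed_db_gb(30.12, 0.03216); // 32.16 ms median latency
    // Both come out to roughly 0.97 GB of processed database.
    assert!((aarch64 - x86_64).abs() < 0.01);
    println!("aarch64: {aarch64:.3} GB, x86_64: {x86_64:.3} GB");
}
```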

To demonstrate the effectiveness of offloading parts of the server-setup phase to a GPU, I benchmark it on the AWS EC2 instance `g6e.8xlarge`, which features an NVIDIA L40S Tensor Core GPU and $3^{rd}$ generation AMD EPYC CPUs.

Number of entries in DB | Key length | Value length | `(a)` Time taken to setup PIR server on CPU | `(b)` Time taken to setup PIR server, partially offloading to GPU | Ratio `a / b`
:-- | --: | --: | --: | --: | --:
$2^{16}$ | 32B | 1kB | 19.55 seconds | 19.39 seconds | 1.0
$2^{18}$ | 32B | 1kB | 6.0 minutes | 2.23 minutes | 2.69
$2^{20}$ | 32B | 1kB | 25.89 minutes | 25.58 seconds | 60.72

For small key-value databases, it is not worth offloading server-setup to the GPU; but for databases with >= $2^{18}$ entries, it is recommended to enable the `gpu` feature when a GPU is available.

> [!NOTE]
> In both of the above tables, I show only the median timing measurements; the DB is encoded using a 3-wise XOR Binary Fuse Filter. For more results, with more database configurations, see the benchmarking [section](#benchmarking) below.

## Prerequisites
Rust stable toolchain; see https://rustup.rs for installation guide. MSRV for this crate is 1.84.0.
Rust stable toolchain; see https://rustup.rs for the installation guide. MSRV for this crate is 1.85.0.

```bash
# While developing this library, I was using
$ rustc --version
rustc 1.84.1 (e71f9a9a9 2025-01-27)
rustc 1.85.1 (e71f9a9a9 2025-01-27)
```

If you plan to offload server-setup to the GPU, you need to install the Vulkan drivers and libraries for your target setup. I followed https://linux.how2shout.com/how-to-install-vulkan-on-ubuntu-24-04-or-22-04-lts-linux on Ubuntu 24.04 LTS with Nvidia GPUs - it was easy to set up.

## Testing
The `chalamet_pir` library includes comprehensive tests to ensure functional correctness.

@@ -69,8 +79,12 @@ The `chalamet_pir` library includes comprehensive tests to ensure functional cor
To run the tests, go to the project's root directory and issue:

```bash
cargo test --profile test-release # Custom profile to make tests run faster!
# Default debug mode is too slow!
# Custom profile to make tests run faster!
# Default debug mode is too slow!
cargo test --profile test-release

# For testing if offloading to GPU works as expected.
cargo test --features gpu --profile test-release
```


@@ -80,9 +94,12 @@ Performance benchmarks are included to evaluate the efficiency of the PIR scheme
To run the benchmarks, execute the following command from the root of the project:

```bash
cargo bench --all-features --profile optimized # For benchmarking the online phase of the PIR,
# you need to enable feature `mutate_internal_client_state`,
# passing `--all-features` does that.
# For benchmarking the online phase of the PIR,
# you need to enable feature `mutate_internal_client_state`.
cargo bench --features mutate_internal_client_state --profile optimized

# For benchmarking only the server-setup phase, offloaded to the GPU.
cargo bench --features gpu --profile optimized --bench offline_phase -q server_setup
```

> [!WARNING]
@@ -101,7 +118,11 @@ First, add this library crate as a dependency in your Cargo.toml file.

```toml
[dependencies]
chalamet_pir = "=0.4.0"
chalamet_pir = "=0.5.0"
# Or, if you want to offload server-setup to a GPU.
# chalamet_pir = { version = "=0.5.0", features = ["gpu"] }
rand = "=0.9.0"
rand_chacha = "=0.9.0"
```

Then, let's code a very simple keyword PIR scheme:
37 changes: 37 additions & 0 deletions shaders/mat_transpose.glsl
@@ -0,0 +1,37 @@
#version 460
#pragma shader_stage(compute)

layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;

layout(set = 0, binding = 0) buffer readonly MatrixA {
uint rows;
uint cols;
uint[] elems;
}
matrix_a;

layout(set = 0, binding = 1) buffer writeonly MatrixB {
uint rows;
uint cols;
uint[] elems;
}
matrix_b;

void main() {
const uint row_idx = gl_GlobalInvocationID.x;
const uint col_idx = gl_GlobalInvocationID.y;

if (row_idx >= matrix_a.rows || col_idx >= matrix_a.cols) {
return;
}

if ((row_idx == 0) && (col_idx == 0)) {
matrix_b.rows = matrix_a.cols;
matrix_b.cols = matrix_a.rows;
}

const uint src_index = row_idx * matrix_a.cols + col_idx;
const uint dst_index = col_idx * matrix_a.rows + row_idx;

matrix_b.elems[dst_index] = matrix_a.elems[src_index];
}
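The transpose shader above maps each invocation `(row_idx, col_idx)` of the source matrix to a flat destination index. A CPU reference using the same row-major flat indexing can serve as a sketch for validating GPU output; the `Mat` helper type below is illustrative, not part of the crate's public API:

```rust
// Minimal CPU reference for the transpose shader: same flat row-major
// indexing (src = r * cols + c, dst = c * rows + r). `Mat` is a
// hypothetical helper type used only for this sketch.
struct Mat {
    rows: u32,
    cols: u32,
    elems: Vec<u32>,
}

fn transpose(a: &Mat) -> Mat {
    let (rows, cols) = (a.rows as usize, a.cols as usize);
    let mut elems = vec![0u32; rows * cols];
    for r in 0..rows {
        for c in 0..cols {
            // Mirrors the shader: dst_index = col_idx * matrix_a.rows + row_idx.
            elems[c * rows + r] = a.elems[r * cols + c];
        }
    }
    // The shader's (0, 0) invocation swaps the stored dimensions; do the same.
    Mat { rows: a.cols, cols: a.rows, elems }
}

fn main() {
    let a = Mat { rows: 2, cols: 3, elems: vec![1, 2, 3, 4, 5, 6] };
    let b = transpose(&a);
    assert_eq!((b.rows, b.cols), (3, 2));
    assert_eq!(b.elems, vec![1, 4, 2, 5, 3, 6]);
}
```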
47 changes: 47 additions & 0 deletions shaders/mat_x_mat.glsl
@@ -0,0 +1,47 @@
#version 460
#pragma shader_stage(compute)

layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;

layout(set = 0, binding = 0) buffer readonly MatrixA {
uint rows;
uint cols;
uint[] elems;
}
matrix_a;

layout(set = 0, binding = 1) buffer readonly MatrixB {
uint rows;
uint cols;
uint[] elems;
}
matrix_b;

layout(set = 0, binding = 2) buffer writeonly MatrixC {
uint rows;
uint cols;
uint[] elems;
}
matrix_c;

void main() {
const uint row_idx = gl_GlobalInvocationID.x;
const uint col_idx = gl_GlobalInvocationID.y;

if (row_idx >= matrix_a.rows || col_idx >= matrix_b.cols) {
return;
}

if ((row_idx == 0) && (col_idx == 0)) {
matrix_c.rows = matrix_a.rows;
matrix_c.cols = matrix_b.cols;
}

uint sum = 0;
for (uint i = 0; i < matrix_a.cols; i++) {
sum += matrix_a.elems[row_idx * matrix_a.cols + i] *
matrix_b.elems[i * matrix_b.cols + col_idx];
}

matrix_c.elems[row_idx * matrix_b.cols + col_idx] = sum;
}
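The multiplication shader computes one output element per invocation, accumulating `sum` in GLSL `uint` arithmetic, which wraps modulo $2^{32}$. A CPU reference with matching wrapping semantics, plus the ceil-division workgroup count implied by the `local_size_x = 8, local_size_y = 8` layout, can be sketched as follows (free functions on flat row-major slices; these helpers are illustrative, not crate API):

```rust
// CPU reference for the matrix-multiplication shader, for validating
// GPU results. GLSL `uint` arithmetic wraps modulo 2^32, so the
// reference uses wrapping_mul / wrapping_add.
fn mat_mul(a_rows: usize, a_cols: usize, a: &[u32], b_cols: usize, b: &[u32]) -> Vec<u32> {
    let mut c = vec![0u32; a_rows * b_cols];
    for r in 0..a_rows {
        for k in 0..a_cols {
            let a_rk = a[r * a_cols + k];
            for col in 0..b_cols {
                // Mirrors the shader's accumulation, with explicit mod-2^32 wrap.
                c[r * b_cols + col] =
                    c[r * b_cols + col].wrapping_add(a_rk.wrapping_mul(b[k * b_cols + col]));
            }
        }
    }
    c
}

// With local_size 8x8, the host must dispatch ceil(dim / 8) workgroups
// per axis; the shader's bounds check then discards the ragged edge.
fn workgroups(dim: u32) -> u32 {
    dim.div_ceil(8)
}

fn main() {
    // (2x3) x (3x2) example.
    let a = [1, 2, 3, 4, 5, 6];
    let b = [7, 8, 9, 10, 11, 12];
    assert_eq!(mat_mul(2, 3, &a, 2, &b), vec![58, 64, 139, 154]);
    assert_eq!(workgroups(20), 3); // 20 rows -> 3 groups of 8
}
```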
4 changes: 2 additions & 2 deletions src/client.rs
@@ -42,7 +42,7 @@ impl Client {
let filter = BinaryFuseFilter::from_bytes(filter_param_bytes)?;

let pub_mat_a_num_rows = LWE_DIMENSION;
let pub_mat_a_num_cols = filter.num_fingerprints;
let pub_mat_a_num_cols = filter.num_fingerprints as u32;

let pub_mat_a = Matrix::generate_from_seed(pub_mat_a_num_rows, pub_mat_a_num_cols, seed_μ)?;
let hint_mat_m = Matrix::from_bytes(hint_bytes)?;
@@ -225,7 +225,7 @@ impl Client {
let hashed_key = binary_fuse_filter::hash_of_key(key);
let hash = binary_fuse_filter::mix256(&hashed_key, &self.filter.seed);

let recovered_row = (0..response_vector.num_cols())
let recovered_row = (0..response_vector.num_cols() as usize)
.map(|idx| {
let unscaled_res = response_vector[(0, idx)].wrapping_sub(secret_vec_c[(0, idx)]);

5 changes: 4 additions & 1 deletion src/lib.rs
@@ -8,6 +8,7 @@
//! * **Secure Private Information Retrieval:** Allows clients to retrieve value from a PIR server without disclosing corresponding key. Server learns neither the value nor the queried key.
//! * **Error Handling:** Comprehensive error handling to catch and report issues during setup, query generation, and response processing.
//! * **Flexibility:** Supports both 3-wise and 4-wise XOR Binary Fuse Filters, allowing a choice between trade-offs in client/server computation and communication costs.
//! * **Efficient:** It supports offloading parts of the server-setup phase to a GPU, using the Vulkan compute API, which can drastically reduce the time taken to set up the PIR server for large key-value databases.
//!
//! ## Usage
//!
@@ -18,7 +19,9 @@
//!
//! ```toml
//! [dependencies]
//! chalametpir = "=0.4.0"
//! chalamet_pir = "=0.5.0"
//! # Or, if you want to offload server-setup to GPU.
//! # chalamet_pir = { version = "=0.5.0", features = ["gpu"] }
//! rand = "=0.9.0"
//! rand_chacha = "=0.9.0"
//! ```
29 changes: 29 additions & 0 deletions src/pir_internals/error.rs
@@ -6,6 +6,21 @@ use std::{error::Error, fmt::Display};
/// It includes errors related to matrix operations, binary fuse filter operations, and PIR operations.
#[derive(Debug, PartialEq)]
pub enum ChalametPIRError {
// GPU
VulkanLibraryNotFound,
VulkanInstanceCreationFailed,
VulkanPhysicalDeviceNotFound,
VulkanDeviceCreationFailed,
VulkanBufferCreationFailed,
VulkanCommandBufferBuilderCreationFailed,
VulkanCommandBufferRecordingFailed,
VulkanCommandBufferBuildingFailed,
VulkanCommandBufferExecutionFailed,
VulkanReadingFromBufferFailed,
VulkanComputeShaderLoadingFailed,
VulkanComputePipelineCreationFailed,
VulkanDescriptorSetCreationFailed,

// Matrix
InvalidMatrixDimension,
IncompatibleDimensionForMatrixMultiplication,
@@ -36,6 +51,20 @@ pub enum ChalametPIRError {
impl Display for ChalametPIRError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
Self::VulkanLibraryNotFound => write!(f, "Failed to load the default Vulkan library for the system."),
Self::VulkanInstanceCreationFailed => write!(f, "Failed to create a new instance of Vulkan."),
Self::VulkanPhysicalDeviceNotFound => write!(f, "Failed to find a compatible Vulkan physical device."),
Self::VulkanDeviceCreationFailed => write!(f, "Failed to create a Vulkan device and associated queue."),
Self::VulkanBufferCreationFailed => write!(f, "Failed to create a Vulkan transfer source buffer."),
Self::VulkanCommandBufferBuilderCreationFailed => write!(f, "Failed to create a Vulkan command buffer builder."),
Self::VulkanCommandBufferRecordingFailed => write!(f, "Failed to record command in a Vulkan command buffer."),
Self::VulkanCommandBufferBuildingFailed => write!(f, "Failed to build a Vulkan command buffer."),
Self::VulkanCommandBufferExecutionFailed => write!(f, "Failed to execute the Vulkan command buffer."),
Self::VulkanReadingFromBufferFailed => write!(f, "Failed to read from Vulkan buffer."),
Self::VulkanComputeShaderLoadingFailed => write!(f, "Failed to load Vulkan compute shader module."),
Self::VulkanComputePipelineCreationFailed => write!(f, "Failed to create Vulkan compute pipeline."),
Self::VulkanDescriptorSetCreationFailed => write!(f, "Failed to create descriptor set for Vulkan compute pipeline."),

Self::InvalidMatrixDimension => write!(f, "The number of rows and columns in the matrix must be non-zero."),
Self::IncompatibleDimensionForMatrixMultiplication => write!(f, "The matrix dimensions do not allow multiplication."),
Self::IncompatibleDimensionForMatrixAddition => write!(f, "The matrix dimensions do not allow addition."),