This repository contains the code associated with the paper "Distributed HDMM: Scalable, Distributed, Accurate, and Differentially Private Query Workloads without a Trusted Curator" by Ratang Sedimo, Ivoline C. Ngong, Jami Lashua, and Joseph P. Near.
Distributed HDMM is a distributed protocol based on the High-Dimensional Matrix Mechanism (HDMM) for generating differentially private answers to query workloads. Distributed HDMM uses a secure aggregation protocol to achieve accuracy comparable to central-model HDMM without the need for a trusted curator, and scales to data distributed across thousands of clients.
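As a rough illustration of why secure aggregation removes the need for a trusted curator, here is a toy pairwise-masking sum. It is only a sketch of the general idea, not the Bell et al. protocol and not the code in this repository; the names `mask_inputs`, `aggregate`, and `MODULUS`, and the use of a single seeded generator in place of pairwise shared secrets, are assumptions made for the example.

```python
import random

MODULUS = 2**32  # all arithmetic is modulo this value (an assumption of this sketch)


def mask_inputs(client_vectors, seed=0):
    """Return each client's masked vector; the masks cancel when all are summed."""
    rng = random.Random(seed)  # stands in for pairwise shared secrets
    n = len(client_vectors)
    dim = len(client_vectors[0])
    masked = [list(v) for v in client_vectors]
    for i in range(n):
        for j in range(i + 1, n):
            # Clients i and j agree on a random mask; i adds it, j subtracts it.
            mask = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masked[i][k] = (masked[i][k] + mask[k]) % MODULUS
                masked[j][k] = (masked[j][k] - mask[k]) % MODULUS
    return masked


def aggregate(masked_vectors):
    """Server-side step: sum the masked vectors coordinate-wise."""
    dim = len(masked_vectors[0])
    return [sum(v[k] for v in masked_vectors) % MODULUS for k in range(dim)]


# Three clients, each holding a small local histogram.
clients = [[1, 0, 2], [0, 3, 1], [4, 1, 0]]
total = aggregate(mask_inputs(clients))
print(total)  # [5, 4, 3]
```

The server only ever sees masked vectors, yet their sum equals the sum of the true inputs because every mask is added once and subtracted once.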
Our implementation of Distributed HDMM uses the Olympia framework for experiments on secure aggregation protocols, and leverages the secure aggregation protocol due to Bell et al. See the Olympia framework repository for more information on Olympia; briefly, you can install the requirements using `pip install -r requirements.txt` and then install Olympia with `pip install .` (note the trailing dot).
To run an experiment with the Distributed HDMM protocol, issue the following command:
python olympia.py -c bell_new.yaml
To change the experiment settings, edit the config/bell_new.yaml file.
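If you keep several variants of the config file, a small wrapper can launch one run per file. The sketch below simply rebuilds the command shown above; the helper names and the second config name `bell_small.yaml` are hypothetical and do not come from this repository.

```python
import subprocess
import sys


def olympia_command(config_name):
    """Build the command line shown above for a given config file name."""
    return [sys.executable, "olympia.py", "-c", config_name]


def run_experiments(config_names, dry_run=True):
    """Run (or, by default, just list) one experiment per config file."""
    commands = [olympia_command(name) for name in config_names]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing run
    return commands


# bell_small.yaml is a hypothetical second config, not a file from the repository.
for cmd in run_experiments(["bell_new.yaml", "bell_small.yaml"]):
    print(" ".join(cmd[1:]))
```

With `dry_run=True` nothing is executed; pass `dry_run=False` to actually start the runs.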
The scalability results from the paper are produced using the HDMM_Graphs.ipynb notebook in the scalability_results directory. The CSV files needed to reproduce the graphs from the paper, generated using the process described above, are also in that directory.
The utility results from the paper are produced using the scripts in the utility_results directory:
- The `utility_vs_corrupted.py` file generates the graph showing Distributed HDMM's utility as the fraction of corrupted clients changes
- The `utility_vs_baselines.py` file generates the graph comparing Distributed HDMM's utility to local model and shuffle model alternatives
Both scripts require the installation of the hdmm package from the HDMM repository.
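Before running either script, you may want to confirm that the dependency is available. This is a minimal check, assuming the package is importable under the module name `hdmm`; the helper `is_installed` is introduced here for illustration only.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the HDMM package is importable under the name "hdmm".
if not is_installed("hdmm"):
    print("hdmm not found; install it from the HDMM repository first")
```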