DEPA for Training is a techno-legal framework that enables privacy-preserving sharing of bulk, de-identified datasets for large-scale analytics and training. This repository contains a reference implementation of Confidential Clean Rooms (CCR), which, together with the Contract Service, forms the basis of this framework. The reference implementation is provided on an as-is basis. It is a work in progress and should not be used in production.
You can now try out DEPA-Training using our interactive GUI demo. The demo requires a signed electronic contract and an Azure cloud subscription.
Start by setting up this project on GitHub Codespaces or your own development environment and then follow the instructions.
The simplest way to set up a development environment is using GitHub Codespaces. The repository includes a devcontainer.json, which customizes your codespace to install all required dependencies. Please ensure you allocate at least 8 vCPUs and 64GB disk space in your codespace. Also, run the following command in the codespace to update submodules.

    git submodule update --init --recursive

Alternatively, you can build and develop locally in a Linux environment (we have tested with Ubuntu 20.04, 22.04, 24.04), or Windows with WSL 2.
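The sizing guidance above (at least 8 vCPUs and 64 GB of disk) can be sanity-checked from a terminal before building. The following is a minimal sketch using the standard `nproc`, `df`, and `awk` utilities. The `check_resources` function and its arguments are illustrative and not part of this repository, and it measures the filesystem that holds the current directory, which may differ from the disk size allocated to a codespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): compare the CPU count and
# the total size of the filesystem holding the current directory against minimums.
check_resources() {
  local min_cpus="${1:-8}" min_disk_gb="${2:-64}"
  local cpus disk_gb status=0

  cpus="$(nproc)"
  # df -P -k prints POSIX-format output in 1 KiB blocks; column 2 is the total size.
  # Convert to decimal gigabytes (10^9 bytes), truncated to an integer.
  disk_gb="$(df -P -k . | awk 'NR==2 { printf "%d", $2 * 1024 / 1000000000 }')"

  if [ "$cpus" -lt "$min_cpus" ]; then
    echo "WARN: $cpus vCPUs found, $min_cpus recommended"
    status=1
  fi
  if [ "$disk_gb" -lt "$min_disk_gb" ]; then
    echo "WARN: ${disk_gb} GB disk found, ${min_disk_gb} GB recommended"
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "OK: $cpus vCPUs, ${disk_gb} GB disk"
  fi
  return "$status"
}
```

Calling `check_resources` with no arguments applies the 8 vCPU and 64 GB defaults; a non-zero exit status means at least one warning was printed.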
Clone this repo to your local machine / virtual machine as follows.

    git clone --recursive http://github.com/iSPIRT/depa-training
    cd depa-training

Install the required dependencies by running the install-prerequisites.sh script.

    ./install-prerequisites.sh

Note: You may need to restart your machine to ensure that the changes take effect.
To build your own Confidential Clean Room (CCR) container images, use the following command from the root of the repository.

    ./ci/build.sh

This script builds the following containers.

- depa-training: Container with the core CCR logic for joining datasets and running differentially private training.
- depa-training-encfs: Container for loading encrypted data into the CCR.
Alternatively, you can pull and use pre-built container images from the iSPIRT container registry by setting the following environment variable. Docker Hub has started throttling, which may slow uploads and downloads, especially for large images, so we recommend using another container registry. We use Azure Container Registry (ACR), as shown below:

    export CONTAINER_REGISTRY=ispirt.azurecr.io
    ./ci/pull-containers.sh

This repository contains sample demos illustrating a diverse set of scenarios that DEPA for Training can support.
Follow the links to build and deploy these scenarios.
| Scenario name | Scenario type | Task type | Privacy | No. of TDPs* | Data type (format) | Model type (format) | Join type (No. of datasets) |
|---|---|---|---|---|---|---|---|
| COVID-19 | Training - Deep Learning | Binary Classification | Differentially Private | 3 | PII tabular data (CSV) | MLP (ONNX) | Horizontal (3) |
| BraTS | Training - Deep Learning | Image Segmentation | Differentially Private | 4 | MRI scans data (NIfTI/PNG) | UNet (Safetensors) | Vertical (4) |
| Credit Risk | Training - Classical ML | Binary Classification | Differentially Private | 4 | PII tabular data (Parquet) | XGBoost (JSON) | Horizontal (6) |
| CIFAR-10 | Training - Deep Learning | Multi-class Image Classification | NA | 1 | Non-PII image data (Safetensors) | CNN (Safetensors) | NA (1) |
| MNIST | Training - Deep Learning | Multi-class Image Classification | NA | 1 | Non-PII image data (HDF5) | CNN (ONNX) | NA (1) |
NA: Not Applicable
DL: Deep Learning, ML: Classical Machine Learning
*Training Data Providers (TDPs) involved in the scenario.
A guide to build your own scenarios is available here. Follow the steps to build and run your own unique training scenario!
Currently, DEPA for Training supports the following training frameworks, libraries and file formats (more will be included soon):
- Training frameworks: PyTorch, Scikit-learn, XGBoost
- Libraries: Opacus, PySpark, Pandas
- File formats (for models and datasets): ONNX, Safetensors, Parquet, CSV, HDF5, PNG
Note: For security reasons, we do not support pickle-based file formats such as .pkl, .pt/.pth, .npy/.npz, and .joblib.
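Because pickle-based formats are not supported, it can help to scan a model or dataset directory for them before preparing a scenario. The sketch below uses `find` with the extensions listed in the note above. The `find_pickle_files` function is hypothetical and not part of this repository; it matches on file extension only (case-sensitive) and does not inspect file contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): list files whose extension
# matches one of the pickle-based formats named in the note above.
# Returns 1 if any such file is found, 0 otherwise.
find_pickle_files() {
  local dir="${1:-.}" matches
  matches="$(find "$dir" -type f \( \
      -name '*.pkl' -o -name '*.pt' -o -name '*.pth' \
      -o -name '*.npy' -o -name '*.npz' -o -name '*.joblib' \) | sort)"
  if [ -n "$matches" ]; then
    printf '%s\n' "$matches"
    return 1
  fi
  return 0
}
```

Run it against a directory of your choice, for example `find_pickle_files ./data` (the path is a placeholder); any file it prints would need converting to one of the supported formats listed above.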
This project welcomes feedback and contributions. Before you start, please take a moment to review our Contribution Guidelines. These guidelines provide information on how to contribute, set up your development environment, and submit your changes.
We look forward to your contributions and appreciate your efforts in making DEPA for Training better for everyone.