Description
In dpmm/src/dpmm/models/base/mechanisms/mechanism.py, the mechanism's domain is automatically inferred by calculating the maximum value of the raw input dataframe.
In a Differential Privacy (DP) context, any operation performed on the raw data must be accounted for in the privacy budget ($\epsilon, \delta$). Calculating a global max() is a non-private operation that leaks information about the dataset's tail distribution.
Location
File: dpmm/src/dpmm/models/base/mechanisms/mechanism.py
Line: 113
_domain = (df.astype(int).max(axis=0) + 1).to_dict()
By calculating df.max(axis=0) directly on the private dataframe, the specific maximum value of a sensitive attribute is revealed without any noise or privacy cost.
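The leak can be demonstrated with two neighboring dataframes that differ in a single record; this is a minimal standalone sketch of the inference at line 113, not the actual mechanism code:

```python
import pandas as pd

# Two neighboring datasets differing in exactly one record.
df_a = pd.DataFrame({"age": [23, 31, 45]})
df_b = pd.DataFrame({"age": [23, 31, 99]})  # one outlier swapped in

# The current domain inference (simplified from mechanism.py line 113):
domain_a = (df_a.astype(int).max(axis=0) + 1).to_dict()
domain_b = (df_b.astype(int).max(axis=0) + 1).to_dict()

# The inferred domains differ, so anyone who observes the domain alone
# can distinguish the two neighboring datasets -- exactly what DP forbids
# an unaccounted-for operation to allow.
assert domain_a != domain_b
```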
Suggested Patch
The domain should be treated as a hyperparameter or a private statistic. Consider one of the following approaches:
- User-Provided Bounds: Require the user to pass a domain or bounds argument derived from public knowledge or data schemas.
- Private Max Computation: Use a DP-compliant mechanism to find a noisy upper bound, and subtract the cost from the total privacy budget.
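The two approaches could be combined behind a single entry point. The sketch below is hypothetical (the function name `infer_domain` and its signature are not part of dpmm); the noisy-max branch is deliberately simplified, since a bare Laplace-noised max is not a tight DP guarantee on its own and a real implementation would use, e.g., the exponential mechanism over candidate thresholds:

```python
import numpy as np
import pandas as pd

def infer_domain(df, bounds=None, epsilon=None, rng=None):
    """Hypothetical replacement for the automatic max() inference.

    bounds:  {column: public upper bound}, e.g. from a data schema.
    epsilon: privacy budget to spend on a noisy max when bounds is None;
             the caller must subtract this from the total budget.
    """
    if bounds is not None:
        # Approach 1: user-provided public bounds -- zero privacy cost.
        return {col: int(b) + 1 for col, b in bounds.items()}
    if epsilon is None:
        raise ValueError("Provide public bounds or an epsilon for a noisy max.")
    rng = rng or np.random.default_rng()
    # Approach 2 (rough sketch): split the budget across columns and add
    # Laplace noise to each column max. CAVEAT: the max has unbounded
    # sensitivity in general; a production version should instead select
    # an upper bound via the exponential mechanism over a candidate grid.
    per_col_eps = epsilon / df.shape[1]
    noisy = df.astype(int).max(axis=0) + rng.laplace(0.0, 1.0 / per_col_eps, df.shape[1])
    return (np.ceil(noisy) + 1).astype(int).to_dict()
```

With public bounds, `infer_domain(df, bounds={"age": 120})` returns `{"age": 121}` without touching the private data at all; the noisy path should only be used when its `epsilon` is explicitly deducted from the mechanism's overall budget.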