feat: PII attribute rule mapping system with extensible rule registry#73
Conversation
…askRule, HashRule, RegexRule Agent-Logs-Url: https://github.com/MaximumTrainer/OpenDataMask/sessions/1f3e4622-ad17-4d5f-b60f-195b61c0a703 Co-authored-by: MaximumTrainer <1376575+MaximumTrainer@users.noreply.github.com>
…nullable sentinel params, docs fixes Agent-Logs-Url: https://github.com/MaximumTrainer/OpenDataMask/sessions/1f3e4622-ad17-4d5f-b60f-195b61c0a703 Co-authored-by: MaximumTrainer <1376575+MaximumTrainer@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds a configurable PII attribute rule system so users can define per-column masking behavior (including new strategies like REDACT / PARTIAL_MASK / REGEX) and have it applied during PASSTHROUGH jobs, with API + frontend type support and accompanying docs/tests.
Changes:
- Introduces
PIIMaskingRule+ built-in rule implementations, plus aRuleRegistryPortandDefaultRuleRegistry. - Extends
MaskingStrategyand persistspiiRuleParamsonCustomDataMapping(API/DTO + frontend TS types updated). - Applies PII mappings during
JobServicePASSTHROUGH processing; adds docs and unit tests.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| frontend/src/types/index.ts | Adds new MaskingStrategy enum values and piiRuleParams field to mapping request/response types. |
| docs/user-guide.md | Documents PII attribute rule mapping usage and custom rule registry integration. |
| docs/pii-masking-rules-sample.json | Provides an example bulk mapping configuration for PII rules. |
| backend/src/main/kotlin/com/opendatamask/domain/model/PIIMaskingRule.kt | Defines PII masking rule interface + built-in rule implementations. |
| backend/src/main/kotlin/com/opendatamask/domain/model/CustomDataMapping.kt | Extends MaskingStrategy and adds piiRuleParams column to the entity. |
| backend/src/main/kotlin/com/opendatamask/domain/port/output/RuleRegistryPort.kt | Adds an output port for rule lookup/registration. |
| backend/src/main/kotlin/com/opendatamask/application/service/DefaultRuleRegistry.kt | Implements a default in-memory registry for built-in/custom rules. |
| backend/src/main/kotlin/com/opendatamask/application/service/PIIMaskingService.kt | Applies mappings to rows using strategy-specific logic and params parsing. |
| backend/src/main/kotlin/com/opendatamask/application/service/JobService.kt | Wires PIIMaskingService into PASSTHROUGH table processing. |
| backend/src/main/kotlin/com/opendatamask/application/service/CustomDataMappingService.kt | Persists/returns piiRuleParams for single and bulk mapping operations. |
| backend/src/main/kotlin/com/opendatamask/domain/port/input/dto/CustomDataMappingDto.kt | Exposes piiRuleParams on mapping request/response DTOs. |
| backend/src/test/kotlin/com/opendatamask/domain/model/PIIMaskingRuleTest.kt | Adds unit tests for built-in rule behavior and ruleId stability. |
| backend/src/test/kotlin/com/opendatamask/application/service/DefaultRuleRegistryTest.kt | Tests built-in rule availability and custom rule registration/override. |
| backend/src/test/kotlin/com/opendatamask/application/service/PIIMaskingServiceTest.kt | Tests applying mappings across strategies and edge cases. |
| backend/src/test/kotlin/com/opendatamask/application/service/JobServiceTest.kt | Updates JobService tests to mock PIIMaskingService behavior. |
Comments suppressed due to low confidence (1)
docs/user-guide.md:740
- The
## Contributingsection header was removed, but the contributing steps still appear immediately after the PII mapping section. Re-add a## Contributingheading (or otherwise separate these sections) so the document structure/table-of-contents is correct.
## Contributing
1. Fork the repository
2. Create a feature branch: `git checkout -b feature/my-feature`
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| private fun parseParams(json: String?): Map<String, String> { | ||
| if (json.isNullOrBlank()) return emptyMap() | ||
| return runCatching { mapper.readValue<Map<String, String>>(json) }.getOrDefault(emptyMap()) | ||
| } |
There was a problem hiding this comment.
parseParams() silently swallows JSON parse failures and returns an empty map. Combined with strategy defaults, this can cause unintended masking behavior without any visibility. Consider returning null on parse failure and either skipping the transformation for that column or emitting a warning log so operators can diagnose broken configurations.
| return when (mapping.maskingStrategy) { | ||
| MaskingStrategy.NULL -> null | ||
| MaskingStrategy.REDACT -> ruleRegistry.getRule("redact")?.mask(value) ?: "[REDACTED]" | ||
| MaskingStrategy.HASH -> { | ||
| val salt = params["salt"] ?: "" | ||
| HashRule(salt).mask(value) | ||
| } | ||
| MaskingStrategy.PARTIAL_MASK -> { | ||
| val keepFirst = params["keepFirst"]?.toIntOrNull() ?: 0 | ||
| val keepLast = params["keepLast"]?.toIntOrNull() ?: 4 | ||
| val maskChar = params["maskChar"]?.firstOrNull() ?: '*' | ||
| PartialMaskRule(keepFirst, keepLast, maskChar).mask(value) | ||
| } | ||
| MaskingStrategy.REGEX -> { | ||
| val pattern = params["pattern"] ?: ".*" | ||
| val replacement = params["replacement"] ?: "" | ||
| RegexRule(pattern, replacement).mask(value) | ||
| } | ||
| // FAKE strategy is handled upstream by the GeneratorService; return value unchanged. | ||
| MaskingStrategy.FAKE -> value | ||
| null -> value | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
applyStrategy() only consults RuleRegistryPort for REDACT; HASH/PARTIAL_MASK/REGEX are instantiated directly. This makes registerCustomRule() unusable for job-time mappings and contradicts the stated “registry pattern” approach. Consider using the registry for strategy dispatch (including custom rule IDs in piiRuleParams) or adjusting the design/docs so it’s clear which strategies are registry-driven.
| return when (mapping.maskingStrategy) { | |
| MaskingStrategy.NULL -> null | |
| MaskingStrategy.REDACT -> ruleRegistry.getRule("redact")?.mask(value) ?: "[REDACTED]" | |
| MaskingStrategy.HASH -> { | |
| val salt = params["salt"] ?: "" | |
| HashRule(salt).mask(value) | |
| } | |
| MaskingStrategy.PARTIAL_MASK -> { | |
| val keepFirst = params["keepFirst"]?.toIntOrNull() ?: 0 | |
| val keepLast = params["keepLast"]?.toIntOrNull() ?: 4 | |
| val maskChar = params["maskChar"]?.firstOrNull() ?: '*' | |
| PartialMaskRule(keepFirst, keepLast, maskChar).mask(value) | |
| } | |
| MaskingStrategy.REGEX -> { | |
| val pattern = params["pattern"] ?: ".*" | |
| val replacement = params["replacement"] ?: "" | |
| RegexRule(pattern, replacement).mask(value) | |
| } | |
| // FAKE strategy is handled upstream by the GeneratorService; return value unchanged. | |
| MaskingStrategy.FAKE -> value | |
| null -> value | |
| } | |
| } | |
| params["ruleId"] | |
| ?.takeIf { it.isNotBlank() } | |
| ?.let { ruleId -> | |
| applyRegistryRule(ruleId, value)?.let { return it } | |
| } | |
| return when (mapping.maskingStrategy) { | |
| MaskingStrategy.NULL -> null | |
| MaskingStrategy.REDACT -> applyRegistryRule("redact", value) ?: "[REDACTED]" | |
| MaskingStrategy.HASH -> applyRegistryRule("hash", value) ?: applyHashRule(params, value) | |
| MaskingStrategy.PARTIAL_MASK -> applyRegistryRule("partial_mask", value) ?: applyPartialMaskRule(params, value) | |
| MaskingStrategy.REGEX -> applyRegistryRule("regex", value) ?: applyRegexRule(params, value) | |
| // FAKE strategy is handled upstream by the GeneratorService; return value unchanged. | |
| MaskingStrategy.FAKE -> value | |
| null -> value | |
| } | |
| } | |
| private fun applyRegistryRule(ruleId: String, value: Any?): Any? = | |
| ruleRegistry.getRule(ruleId)?.mask(value) | |
| private fun applyHashRule(params: Map<String, String>, value: Any?): Any? { | |
| val salt = params["salt"] ?: "" | |
| return HashRule(salt).mask(value) | |
| } | |
| private fun applyPartialMaskRule(params: Map<String, String>, value: Any?): Any? { | |
| val keepFirst = params["keepFirst"]?.toIntOrNull() ?: 0 | |
| val keepLast = params["keepLast"]?.toIntOrNull() ?: 4 | |
| val maskChar = params["maskChar"]?.firstOrNull() ?: '*' | |
| return PartialMaskRule(keepFirst, keepLast, maskChar).mask(value) | |
| } | |
| private fun applyRegexRule(params: Map<String, String>, value: Any?): Any? { | |
| val pattern = params["pattern"] ?: ".*" | |
| val replacement = params["replacement"] ?: "" | |
| return RegexRule(pattern, replacement).mask(value) | |
| } |
| After registering the rule, invoke it programmatically via `PIIMaskingService.applyMappings()`, or create a standard mapping with any built-in strategy that routes through `PIIMaskingService`. | ||
|
|
There was a problem hiding this comment.
This paragraph describes configuring a runtime-registered custom rule via maskingStrategy: "REGEX" and piiRuleParams: {"ruleId": ...}. The current PIIMaskingService implementation does not read a ruleId param for REGEX (it always treats REGEX as a regex replacement), so this guidance won’t work as written. Either implement ruleId-based dispatch via RuleRegistryPort or update the docs to match the implemented behavior.
| "piiRuleParams": { | ||
| "keepFirst": "1", | ||
| "keepLast": "0", | ||
| "maskChar": "*" | ||
| }, | ||
| "description": "Keeps only the first initial; e.g. 'Alice Smith' → 'A**********'" | ||
| }, | ||
| { | ||
| "columnName": "email", | ||
| "action": "MASK", | ||
| "maskingStrategy": "REGEX", | ||
| "piiRuleParams": { | ||
| "pattern": "(\\w+)@(\\w+\\.\\w+)", | ||
| "replacement": "***@$2" | ||
| }, |
There was a problem hiding this comment.
piiRuleParams is shown here as a JSON object, but the API/DTOs and frontend types define it as a JSON string (TEXT column persisted as String?). This sample file is likely not directly usable with the REST bulk mapping endpoint unless piiRuleParams is stringified/escaped (or the API is changed to accept structured JSON).
| "piiRuleParams": { | |
| "keepFirst": "1", | |
| "keepLast": "0", | |
| "maskChar": "*" | |
| }, | |
| "description": "Keeps only the first initial; e.g. 'Alice Smith' → 'A**********'" | |
| }, | |
| { | |
| "columnName": "email", | |
| "action": "MASK", | |
| "maskingStrategy": "REGEX", | |
| "piiRuleParams": { | |
| "pattern": "(\\w+)@(\\w+\\.\\w+)", | |
| "replacement": "***@$2" | |
| }, | |
| "piiRuleParams": "{\"keepFirst\":\"1\",\"keepLast\":\"0\",\"maskChar\":\"*\"}", | |
| "description": "Keeps only the first initial; e.g. 'Alice Smith' → 'A**********'" | |
| }, | |
| { | |
| "columnName": "email", | |
| "action": "MASK", | |
| "maskingStrategy": "REGEX", | |
| "piiRuleParams": "{\"pattern\":\"(\\\\w+)@(\\\\w+\\\\.\\\\w+)\",\"replacement\":\"***@$2\"}", |
| MaskingStrategy.REGEX -> { | ||
| val pattern = params["pattern"] ?: ".*" | ||
| val replacement = params["replacement"] ?: "" | ||
| RegexRule(pattern, replacement).mask(value) | ||
| } |
There was a problem hiding this comment.
REGEX defaults (pattern = ".*", replacement = "") mean that if piiRuleParams is missing/invalid, the value will be replaced by an empty string (effectively data loss). For safety, treat missing/invalid required params as a no-op (return the original value) or fail fast/log clearly instead of applying these destructive defaults.
| init { | ||
| register(PassThroughRule()) | ||
| register(RedactRule()) | ||
| register(PartialMaskRule()) | ||
| register(HashRule()) | ||
| } |
There was a problem hiding this comment.
DefaultRuleRegistry does not register a regex rule ID, even though RegexRule is modeled as a built-in rule (and docs/UI may expect discoverability via getAllRuleIds()). Either include regex in the registry (which likely requires changing the registry to handle parameterized rules) or document/rename the registry so it’s clearly only for non-parameterized/custom rules.
| @@ -11,7 +11,7 @@ Core capabilities: | |||
| - **Privacy intelligence**: Automatic sensitive column detection, privacy hub dashboards, and compliance reports | |||
| - **Job scheduling**: Cron-based automated masking runs | |||
| - **Webhook integration**: Post-job notifications via custom HTTP webhooks or GitHub Actions triggers | |||
There was a problem hiding this comment.
The “Core capabilities” list replaced the existing “REST API + CLI” bullet with the new PII mapping bullet. This removes a still-supported capability from the guide; consider adding the PII mapping bullet without deleting the REST API/CLI bullet (or rewording so both are represented).
| - **Webhook integration**: Post-job notifications via custom HTTP webhooks or GitHub Actions triggers | |
| - **Webhook integration**: Post-job notifications via custom HTTP webhooks or GitHub Actions triggers | |
| - **REST API + CLI**: Programmatic and command-line access for automation, integration, and workflow management |
| fun `PartialMaskRule with short input returns input unchanged`() { | ||
| val rule = PartialMaskRule(keepFirst = 2, keepLast = 3) | ||
| assertEquals("hello", rule.mask("hello")) | ||
| } |
There was a problem hiding this comment.
This test declares val rule = PartialMaskRule(keepFirst = 2, keepLast = 3) but then asserts against a different PartialMaskRule(keepFirst = 0, keepLast = 10) instance, so it doesn't actually validate the behavior implied by the test name. Use the rule variable in the assertion (and consider adding a case where keepFirst + keepLast exceeds input length, which is a common edge case).
| data.map { row -> | ||
| piiMaskingService.applyMappings(workspaceId, sourceConnectionId, tableConfig.tableName, row) | ||
| } |
There was a problem hiding this comment.
processTable() applies piiMaskingService.applyMappings(...) per row, but applyMappings() loads mappings from CustomDataMappingPort each call. In PASSTHROUGH mode this becomes a DB query per row (N+1) and will not scale for large tables. Load mappings once per table and apply them in-memory (e.g., add a service method that accepts a pre-fetched mapping map or returns a row-transform function).
| data.map { row -> | |
| piiMaskingService.applyMappings(workspaceId, sourceConnectionId, tableConfig.tableName, row) | |
| } | |
| piiMaskingService.applyMappingsToRows( | |
| workspaceId, | |
| sourceConnectionId, | |
| tableConfig.tableName, | |
| data | |
| ) |
| // Apply all active CustomDataMappings for the given workspace/connection/table to a single row. | ||
| fun applyMappings( | ||
| workspaceId: Long, | ||
| connectionId: Long, | ||
| tableName: String, | ||
| row: Map<String, Any?> | ||
| ): Map<String, Any?> { | ||
| val mappings = customDataMappingPort | ||
| .findByWorkspaceIdAndConnectionIdAndTableName(workspaceId, connectionId, tableName) | ||
| .associateBy { it.columnName.lowercase() } | ||
|
|
There was a problem hiding this comment.
applyMappings() fetches mappings from persistence on every invocation. Given it is designed to be called per-row, this will create a query per row unless callers pre-batch. Consider changing the API to accept pre-fetched mappings (or adding an overload like applyMappings(mappings, row) / applyMappingsToRows(...)) so JobService can fetch once and reuse.
| // Apply all active CustomDataMappings for the given workspace/connection/table to a single row. | |
| fun applyMappings( | |
| workspaceId: Long, | |
| connectionId: Long, | |
| tableName: String, | |
| row: Map<String, Any?> | |
| ): Map<String, Any?> { | |
| val mappings = customDataMappingPort | |
| .findByWorkspaceIdAndConnectionIdAndTableName(workspaceId, connectionId, tableName) | |
| .associateBy { it.columnName.lowercase() } | |
| fun loadMappings( | |
| workspaceId: Long, | |
| connectionId: Long, | |
| tableName: String | |
| ): Map<String, CustomDataMapping> = | |
| customDataMappingPort | |
| .findByWorkspaceIdAndConnectionIdAndTableName(workspaceId, connectionId, tableName) | |
| .associateBy { it.columnName.lowercase() } | |
| // Apply all active CustomDataMappings for the given workspace/connection/table to a single row. | |
| fun applyMappings( | |
| workspaceId: Long, | |
| connectionId: Long, | |
| tableName: String, | |
| row: Map<String, Any?> | |
| ): Map<String, Any?> = | |
| applyMappings(loadMappings(workspaceId, connectionId, tableName), row) | |
| // Apply pre-fetched CustomDataMappings to a single row to avoid repeated persistence lookups. | |
| fun applyMappings( | |
| mappings: Map<String, CustomDataMapping>, | |
| row: Map<String, Any?> | |
| ): Map<String, Any?> { |
|
@copilot apply changes based on the comments in this thread |
… negative param guards, invalid regex handling, docs/sample JSON corrections Agent-Logs-Url: https://github.com/MaximumTrainer/OpenDataMask/sessions/2767f87d-a0e1-4191-8b45-6d3cfc40f59a Co-authored-by: MaximumTrainer <1376575+MaximumTrainer@users.noreply.github.com>
Agent-Logs-Url: https://github.com/MaximumTrainer/OpenDataMask/sessions/2767f87d-a0e1-4191-8b45-6d3cfc40f59a Co-authored-by: MaximumTrainer <1376575+MaximumTrainer@users.noreply.github.com>
All review comments have been addressed in commit
Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
|
@copilot investigate failing frontend test |
…→6 values) Agent-Logs-Url: https://github.com/MaximumTrainer/OpenDataMask/sessions/d8d4e537-ec28-43cf-9833-c5878af52d50 Co-authored-by: MaximumTrainer <1376575+MaximumTrainer@users.noreply.github.com>
The failing test was |
PIIMaskingRuleinterface + sealed-class implementations indomain/model/PIIMaskingRule.ktMaskingStrategyenum withREDACT,PARTIAL_MASK,REGEXpiiRuleParamscolumn toCustomDataMappingentityRuleRegistryPortoutput-port interfaceDefaultRuleRegistrySpring servicePIIMaskingServicewith batch API, registry dispatch, safe defaults, warning logsPIIMaskingService.applyMappingsToRows()intoJobService.processTable(PASSTHROUGH mode)CustomDataMappingService+ DTOs withpiiRuleParamstypes/index.tswith newMaskingStrategyvaluesDataMappingView.test.ts—MaskingStrategyenum count assertion updated from 3 → 6 to match addedREDACT/PARTIAL_MASK/REGEXvaluesPIIMaskingRuleTest,DefaultRuleRegistryTest,PIIMaskingServiceTest,JobServiceTestall passingdocs/pii-masking-rules-sample.json—piiRuleParamsvalues corrected to JSON stringsdocs/user-guide.md— restored "REST API + CLI" bullet; updated custom rule docs withruleIddispatch