feat: Data subsetting & attribute-level column filtering#68
feat: Data subsetting & attribute-level column filtering#68MaximumTrainer merged 3 commits intomainfrom
Conversation
- Add selectedAttributes field to TableConfiguration entity (stored as comma-separated TEXT) - Update DatabaseConnector.fetchData port signature with selectedAttributes parameter - Update PostgreSQL, AzureSQL, MySQL, MongoDB, File connectors to push column selection to source - Update DestinationSchemaService.mirrorSchema to filter schema by selectedAttributes - Update JobService to pass selectedAttributes to fetchData and mirrorSchema in all table modes - Update TableConfigurationRequest/Response DTOs with selectedAttributes field - Update TableConfigurationService to serialize/deserialize selectedAttributes list - Update WorkspaceConfigDto and WorkspaceExportService for export/import of selectedAttributes - Update frontend TypeScript types with selectedAttributes - Add column picker UI in TablesView.vue with comma-separated text input - Add unit tests for new selectedAttributes functionality - Fix existing test mocks to use updated 4-parameter fetchData signature - Update user-guide.md with subsetting and attribute filtering documentation Agent-Logs-Url: https://github.com/MaximumTrainer/OpenDataMask/sessions/9a3b22af-7e44-44d4-87a5-f806e0540af3 Co-authored-by: MaximumTrainer <1376575+MaximumTrainer@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds column-level extraction filtering (selectedAttributes) alongside existing row-level subsetting (whereClause), threading both through the API, persistence, job engine, schema mirroring, connectors, and the frontend Tables UI.
Changes:
- Extend table configuration DTOs/entity and export/import to support
selectedAttributesas a list (persisted as comma-separated text). - Push down column selection to source connectors (SQL
SELECT col1,.../ MongoDB projection / file row-map filtering) and mirror destination schema accordingly. - Update frontend types + table config modal to edit/display selected columns, and add/adjust tests for the new behavior/signature.
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| frontend/src/views/TablesView.vue | Adds UI input + payload normalization for selected columns and displays active column filters. |
| frontend/src/types/index.ts | Extends TS types and export DTO shape to include selectedAttributes. |
| docs/user-guide.md | Documents subsetting/WHERE and attribute-level column filtering plus UI instructions. |
| backend/src/main/kotlin/com/opendatamask/domain/port/output/DatabaseConnector.kt | Extends connector port with selectedAttributes parameter. |
| backend/src/main/kotlin/com/opendatamask/domain/port/input/dto/TableConfigurationDto.kt | Adds selectedAttributes to request/response DTOs. |
| backend/src/main/kotlin/com/opendatamask/domain/port/input/dto/WorkspaceConfigDto.kt | Adds selectedAttributes to table export DTO. |
| backend/src/main/kotlin/com/opendatamask/domain/model/TableConfiguration.kt | Persists selectedAttributes as comma-separated TEXT column. |
| backend/src/main/kotlin/com/opendatamask/application/service/TableConfigurationService.kt | Serializes/deserializes selectedAttributes between entity string and DTO list. |
| backend/src/main/kotlin/com/opendatamask/application/service/WorkspaceExportService.kt | Export/import handling for selectedAttributes. |
| backend/src/main/kotlin/com/opendatamask/application/service/JobService.kt | Threads selectedAttributes through schema mirroring and source reads across modes. |
| backend/src/main/kotlin/com/opendatamask/application/service/DestinationSchemaService.kt | Filters mirrored schema columns based on selectedAttributes (case-insensitive). |
| backend/src/main/kotlin/com/opendatamask/adapter/output/connector/PostgreSQLConnector.kt | Generates SELECT col1,col2 (quoted identifiers) when attributes are provided. |
| backend/src/main/kotlin/com/opendatamask/adapter/output/connector/AzureSQLConnector.kt | Adds TOP + bracket-quoted column selection pushdown. |
| backend/src/main/kotlin/com/opendatamask/adapter/output/connector/MySQLConnector.kt | Adds backtick-quoted column selection pushdown. |
| backend/src/main/kotlin/com/opendatamask/adapter/output/connector/MongoDBConnector.kt | Adds projection-based field selection. |
| backend/src/main/kotlin/com/opendatamask/adapter/output/connector/FileConnector.kt | Filters parsed row maps to selected keys (case-insensitive). |
| backend/src/test/kotlin/com/opendatamask/application/service/DestinationSchemaServiceTest.kt | New tests for selectedAttributes filtering, empty list, and case-insensitivity. |
| backend/src/test/kotlin/com/opendatamask/adapter/output/connector/PostgreSQLConnectorTest.kt | Adds tests for selectedAttributes and WHERE+columns combined. |
| backend/src/test/kotlin/com/opendatamask/application/service/TableConfigurationServiceTest.kt | Adds tests around selectedAttributes serialization/deserialization. |
| backend/src/test/kotlin/com/opendatamask/application/service/JobServiceTest.kt | Updates mocks/verifications for new fetchData signature. |
| backend/src/test/kotlin/com/opendatamask/application/service/SubsetExecutionServiceTest.kt | Updates fetchData mock to new signature. |
| backend/src/test/kotlin/com/opendatamask/application/service/SensitivityScanServiceTest.kt | Updates fetchData mock to new signature. |
| backend/src/test/kotlin/com/opendatamask/application/service/SensitivityScanLogServiceTest.kt | Updates fetchData mocks to new signature. |
| backend/src/test/kotlin/com/opendatamask/adapter/input/rest/TableConfigurationControllerTest.kt | Updates response factory for new DTO field. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Normalise: send null when no attributes selected (empty list = select all) | ||
| const payload: TableConfigurationRequest = { | ||
| ...tableForm.value, | ||
| selectedAttributes: | ||
| tableForm.value.selectedAttributes?.length ? tableForm.value.selectedAttributes : undefined | ||
| } |
There was a problem hiding this comment.
The comment says the UI will "send null" when no attributes are selected, but the code sets selectedAttributes to undefined (omits the field). Either update the comment to match the behavior, or explicitly send null if the backend relies on distinguishing null vs missing.
| val filteredColumns = if (selectedAttributes.isEmpty()) { | ||
| sourceColumns | ||
| } else { | ||
| val normalizedSelected = selectedAttributes.map { it.lowercase() }.toSet() | ||
| sourceColumns.filter { it.name.lowercase() in normalizedSelected } | ||
| } |
There was a problem hiding this comment.
mirrorSchema() can end up with filteredColumns empty when selectedAttributes doesn't match any source column names. This will call createTable() with 0 columns (invalid SQL in some connectors, and a silent no-op in MySQL), causing confusing job failures later. Consider validating that at least one column matched and fail fast with a clear error listing the unknown attributes (or fall back to all columns).
| val filteredColumns = if (selectedAttributes.isEmpty()) { | |
| sourceColumns | |
| } else { | |
| val normalizedSelected = selectedAttributes.map { it.lowercase() }.toSet() | |
| sourceColumns.filter { it.name.lowercase() in normalizedSelected } | |
| } | |
| val sourceColumnNames = sourceColumns.map { it.name } | |
| val sourceColumnNamesNormalized = sourceColumnNames.map { it.lowercase() }.toSet() | |
| val normalizedSelected = selectedAttributes.map { it.lowercase() }.toSet() | |
| val filteredColumns = if (selectedAttributes.isEmpty()) { | |
| sourceColumns | |
| } else { | |
| sourceColumns.filter { it.name.lowercase() in normalizedSelected } | |
| } | |
| if (selectedAttributes.isNotEmpty() && filteredColumns.isEmpty()) { | |
| val unknownAttributes = selectedAttributes.filter { it.lowercase() !in sourceColumnNamesNormalized } | |
| throw IllegalArgumentException( | |
| "No selected attributes matched source columns for table '$tableName'. " + | |
| "Unknown attributes: ${if (unknownAttributes.isEmpty()) selectedAttributes else unknownAttributes}. " + | |
| "Available columns: $sourceColumnNames" | |
| ) | |
| } |
| val selectPart = if (!selectedAttributes.isNullOrEmpty()) { | ||
| selectedAttributes.joinToString(", ") { "`$it`" } | ||
| } else { | ||
| "*" | ||
| } | ||
| val sql = buildString { | ||
| append("SELECT * FROM `$tableName`") | ||
| append("SELECT $selectPart FROM `$tableName`") | ||
| if (whereClause != null) append(" WHERE $whereClause") |
There was a problem hiding this comment.
MySQL column identifiers in selectedAttributes are interpolated as `$it` without escaping/sanitizing. This can break queries for columns containing backticks and expands the injection surface compared to other connectors that strip quote characters. Sanitize/escape backticks in column names (e.g., remove or double them) before building selectPart.
| var query = if (limit != null) collection.find(filter).limit(limit) else collection.find(filter) | ||
| if (!selectedAttributes.isNullOrEmpty()) { | ||
| val projection = Document().apply { | ||
| selectedAttributes.forEach { field -> put(field, 1) } |
There was a problem hiding this comment.
MongoDB projections that include specific fields will still return _id by default. With the current projection document, fetchData() will likely include _id even when it wasn't requested, which contradicts the "only requested columns" behavior used elsewhere. Consider explicitly excluding _id (set _id: 0) unless _id is present in selectedAttributes.
| selectedAttributes.forEach { field -> put(field, 1) } | |
| selectedAttributes.forEach { field -> put(field, 1) } | |
| if (!selectedAttributes.contains("_id")) { | |
| put("_id", 0) | |
| } |
| rowLimit = request.rowLimit, | ||
| whereClause = request.whereClause | ||
| ) | ||
| config.selectedAttributes = request.selectedAttributes?.filter { it.isNotBlank() }?.joinToString(",") | ||
| return tableConfigurationRepository.save(config).toResponse() |
There was a problem hiding this comment.
When request.selectedAttributes is an empty list, joinToString(",") produces an empty string, so the entity stores "" rather than null. This makes responses/export potentially return an empty list instead of null and complicates the "empty/omitted = all columns" contract. Consider normalizing to null when the filtered+trimmed list is empty (e.g., takeIf { it.isNotEmpty() }).
| table.mode = tableDto.mode | ||
| table.rowLimit = tableDto.rowLimit | ||
| table.whereClause = tableDto.whereClause | ||
| table.selectedAttributes = tableDto.selectedAttributes?.filter { it.isNotBlank() }?.joinToString(",") |
There was a problem hiding this comment.
Import currently persists selectedAttributes as a comma-separated string even when the list is empty, which results in an empty string instead of null. If the API contract is "null/empty = all columns", consider storing null when the incoming list is empty after trimming/filtering, to keep persistence/export consistent.
| table.selectedAttributes = tableDto.selectedAttributes?.filter { it.isNotBlank() }?.joinToString(",") | |
| val normalizedSelectedAttributes = tableDto.selectedAttributes | |
| ?.map { it.trim() } | |
| ?.filter { it.isNotBlank() } | |
| table.selectedAttributes = normalizedSelectedAttributes | |
| ?.takeIf { it.isNotEmpty() } | |
| ?.joinToString(",") |
| ) | ||
| whenever(tableConfigurationRepository.save(any<TableConfiguration>())).thenReturn(saved) | ||
|
|
||
| val response = service.createTableConfiguration(10L, request) | ||
|
|
There was a problem hiding this comment.
This test name says it verifies persistence, but it only asserts the response mapping from the mocked saved entity. Consider verifying the argument passed to tableConfigurationRepository.save(...) (e.g., capture and assert selectedAttributes == "id,name,email") so the test actually covers the serialization behavior.
| ) | |
| whenever(tableConfigurationRepository.save(any<TableConfiguration>())).thenReturn(saved) | |
| val response = service.createTableConfiguration(10L, request) | |
| ) | |
| val tableConfigurationCaptor = argumentCaptor<TableConfiguration>() | |
| whenever(tableConfigurationRepository.save(any<TableConfiguration>())).thenReturn(saved) | |
| val response = service.createTableConfiguration(10L, request) | |
| verify(tableConfigurationRepository).save(tableConfigurationCaptor.capture()) | |
| assertEquals("id,name,email", tableConfigurationCaptor.firstValue.selectedAttributes) |
|
@copilot apply changes based on the comments in this thread |
- Fix comment in TablesView.vue: omit field (undefined) rather than null when no columns selected
- Normalize empty selectedAttributes list to null in TableConfigurationService and WorkspaceExportService using takeIf { isNotEmpty() }, preventing empty-string storage
- Add fast-fail in DestinationSchemaService.mirrorSchema when selectedAttributes produces zero matching columns
- Escape backticks in MySQL column identifiers (double-backtick escaping) to prevent SQL injection
- Exclude _id from MongoDB projection unless explicitly requested in selectedAttributes
- Improve TableConfigurationServiceTest to capture save() argument and verify comma-separated serialization
- Add TableConfigurationServiceTest for empty-list normalization to null
- Add DestinationSchemaServiceTest for unmatched selectedAttributes fast-fail path
Agent-Logs-Url: https://github.com/MaximumTrainer/OpenDataMask/sessions/d68edfed-91a4-4db2-aa64-007b97334b8c
Co-authored-by: MaximumTrainer <1376575+MaximumTrainer@users.noreply.github.com>
Applied all 7 review comments in commit 0877d56:
Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
selectedAttributesfield toTableConfigurationdomain modelDatabaseConnectorport'sfetchDatasignature withselectedAttributesparameterselectedAttributesin SELECT pushdownit.replace("", "``")`)_idfrom projection unless explicitly requestedDestinationSchemaService.mirrorSchemato filter columns byselectedAttributesIllegalArgumentExceptionwhenselectedAttributesmatches zero source columnsJobService.processTableto passselectedAttributestofetchDataandmirrorSchemaTableConfigurationRequest,TableConfigurationResponseDTOsTableConfigurationServiceto persist and returnselectedAttributesnullusingtakeIf { isNotEmpty() }WorkspaceExportServiceexport/import with same normalizationselectedAttributes