Skip to content

feat: Data subsetting & attribute-level column filtering#68

Merged
MaximumTrainer merged 3 commits intomainfrom
copilot/implement-data-subsetting-attributes
Apr 16, 2026
Merged

feat: Data subsetting & attribute-level column filtering#68
MaximumTrainer merged 3 commits intomainfrom
copilot/implement-data-subsetting-attributes

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 16, 2026

  • Add selectedAttributes field to TableConfiguration domain model
  • Update DatabaseConnector port's fetchData signature with selectedAttributes parameter
  • Update all connector implementations to use selectedAttributes in SELECT pushdown
    • MySQL: escape backticks in column names (it.replace("", "``")`)
    • MongoDB: exclude _id from projection unless explicitly requested
  • Update DestinationSchemaService.mirrorSchema to filter columns by selectedAttributes
    • Fail fast with IllegalArgumentException when selectedAttributes matches zero source columns
  • Update JobService.processTable to pass selectedAttributes to fetchData and mirrorSchema
  • Update TableConfigurationRequest, TableConfigurationResponse DTOs
  • Update TableConfigurationService to persist and return selectedAttributes
    • Normalize empty list → null using takeIf { isNotEmpty() }
  • Update WorkspaceExportService export/import with same normalization
  • Update Frontend TypeScript types with selectedAttributes
  • Fix TablesView.vue comment: "omit field" (undefined) not "send null"
  • Add/improve unit tests for all above changes
  • Update user-guide.md documentation

- Add selectedAttributes field to TableConfiguration entity (stored as comma-separated TEXT)
- Update DatabaseConnector.fetchData port signature with selectedAttributes parameter
- Update PostgreSQL, AzureSQL, MySQL, MongoDB, File connectors to push column selection to source
- Update DestinationSchemaService.mirrorSchema to filter schema by selectedAttributes
- Update JobService to pass selectedAttributes to fetchData and mirrorSchema in all table modes
- Update TableConfigurationRequest/Response DTOs with selectedAttributes field
- Update TableConfigurationService to serialize/deserialize selectedAttributes list
- Update WorkspaceConfigDto and WorkspaceExportService for export/import of selectedAttributes
- Update frontend TypeScript types with selectedAttributes
- Add column picker UI in TablesView.vue with comma-separated text input
- Add unit tests for new selectedAttributes functionality
- Fix existing test mocks to use updated 4-parameter fetchData signature
- Update user-guide.md with subsetting and attribute filtering documentation

Agent-Logs-Url: https://github.com/MaximumTrainer/OpenDataMask/sessions/9a3b22af-7e44-44d4-87a5-f806e0540af3

Co-authored-by: MaximumTrainer <1376575+MaximumTrainer@users.noreply.github.com>
Copilot AI changed the title [WIP] Add data subsetting and attribute filtering capability feat: Data subsetting & attribute-level column filtering Apr 16, 2026
Copilot AI requested a review from MaximumTrainer April 16, 2026 11:53
@MaximumTrainer MaximumTrainer requested review from Copilot and removed request for MaximumTrainer April 16, 2026 12:05
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds column-level extraction filtering (selectedAttributes) alongside existing row-level subsetting (whereClause), threading both through the API, persistence, job engine, schema mirroring, connectors, and the frontend Tables UI.

Changes:

  • Extend table configuration DTOs/entity and export/import to support selectedAttributes as a list (persisted as comma-separated text).
  • Push down column selection to source connectors (SQL SELECT col1,... / MongoDB projection / file row-map filtering) and mirror destination schema accordingly.
  • Update frontend types + table config modal to edit/display selected columns, and add/adjust tests for the new behavior/signature.

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
frontend/src/views/TablesView.vue Adds UI input + payload normalization for selected columns and displays active column filters.
frontend/src/types/index.ts Extends TS types and export DTO shape to include selectedAttributes.
docs/user-guide.md Documents subsetting/WHERE and attribute-level column filtering plus UI instructions.
backend/src/main/kotlin/com/opendatamask/domain/port/output/DatabaseConnector.kt Extends connector port with selectedAttributes parameter.
backend/src/main/kotlin/com/opendatamask/domain/port/input/dto/TableConfigurationDto.kt Adds selectedAttributes to request/response DTOs.
backend/src/main/kotlin/com/opendatamask/domain/port/input/dto/WorkspaceConfigDto.kt Adds selectedAttributes to table export DTO.
backend/src/main/kotlin/com/opendatamask/domain/model/TableConfiguration.kt Persists selectedAttributes as comma-separated TEXT column.
backend/src/main/kotlin/com/opendatamask/application/service/TableConfigurationService.kt Serializes/deserializes selectedAttributes between entity string and DTO list.
backend/src/main/kotlin/com/opendatamask/application/service/WorkspaceExportService.kt Export/import handling for selectedAttributes.
backend/src/main/kotlin/com/opendatamask/application/service/JobService.kt Threads selectedAttributes through schema mirroring and source reads across modes.
backend/src/main/kotlin/com/opendatamask/application/service/DestinationSchemaService.kt Filters mirrored schema columns based on selectedAttributes (case-insensitive).
backend/src/main/kotlin/com/opendatamask/adapter/output/connector/PostgreSQLConnector.kt Generates SELECT col1,col2 (quoted identifiers) when attributes are provided.
backend/src/main/kotlin/com/opendatamask/adapter/output/connector/AzureSQLConnector.kt Adds TOP + bracket-quoted column selection pushdown.
backend/src/main/kotlin/com/opendatamask/adapter/output/connector/MySQLConnector.kt Adds backtick-quoted column selection pushdown.
backend/src/main/kotlin/com/opendatamask/adapter/output/connector/MongoDBConnector.kt Adds projection-based field selection.
backend/src/main/kotlin/com/opendatamask/adapter/output/connector/FileConnector.kt Filters parsed row maps to selected keys (case-insensitive).
backend/src/test/kotlin/com/opendatamask/application/service/DestinationSchemaServiceTest.kt New tests for selectedAttributes filtering, empty list, and case-insensitivity.
backend/src/test/kotlin/com/opendatamask/adapter/output/connector/PostgreSQLConnectorTest.kt Adds tests for selectedAttributes and WHERE+columns combined.
backend/src/test/kotlin/com/opendatamask/application/service/TableConfigurationServiceTest.kt Adds tests around selectedAttributes serialization/deserialization.
backend/src/test/kotlin/com/opendatamask/application/service/JobServiceTest.kt Updates mocks/verifications for new fetchData signature.
backend/src/test/kotlin/com/opendatamask/application/service/SubsetExecutionServiceTest.kt Updates fetchData mock to new signature.
backend/src/test/kotlin/com/opendatamask/application/service/SensitivityScanServiceTest.kt Updates fetchData mock to new signature.
backend/src/test/kotlin/com/opendatamask/application/service/SensitivityScanLogServiceTest.kt Updates fetchData mocks to new signature.
backend/src/test/kotlin/com/opendatamask/adapter/input/rest/TableConfigurationControllerTest.kt Updates response factory for new DTO field.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread frontend/src/views/TablesView.vue Outdated
Comment on lines +110 to +115
// Normalise: send null when no attributes selected (empty list = select all)
const payload: TableConfigurationRequest = {
...tableForm.value,
selectedAttributes:
tableForm.value.selectedAttributes?.length ? tableForm.value.selectedAttributes : undefined
}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says the UI will "send null" when no attributes are selected, but the code sets selectedAttributes to undefined (omits the field). Either update the comment to match the behavior, or explicitly send null if the backend relies on distinguishing null vs missing.

Copilot uses AI. Check for mistakes.
Comment on lines +85 to +90
val filteredColumns = if (selectedAttributes.isEmpty()) {
sourceColumns
} else {
val normalizedSelected = selectedAttributes.map { it.lowercase() }.toSet()
sourceColumns.filter { it.name.lowercase() in normalizedSelected }
}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mirrorSchema() can end up with filteredColumns empty when selectedAttributes doesn't match any source column names. This will call createTable() with 0 columns (invalid SQL in some connectors, and a silent no-op in MySQL), causing confusing job failures later. Consider validating that at least one column matched and fail fast with a clear error listing the unknown attributes (or fall back to all columns).

Suggested change
val filteredColumns = if (selectedAttributes.isEmpty()) {
sourceColumns
} else {
val normalizedSelected = selectedAttributes.map { it.lowercase() }.toSet()
sourceColumns.filter { it.name.lowercase() in normalizedSelected }
}
val sourceColumnNames = sourceColumns.map { it.name }
val sourceColumnNamesNormalized = sourceColumnNames.map { it.lowercase() }.toSet()
val normalizedSelected = selectedAttributes.map { it.lowercase() }.toSet()
val filteredColumns = if (selectedAttributes.isEmpty()) {
sourceColumns
} else {
sourceColumns.filter { it.name.lowercase() in normalizedSelected }
}
if (selectedAttributes.isNotEmpty() && filteredColumns.isEmpty()) {
val unknownAttributes = selectedAttributes.filter { it.lowercase() !in sourceColumnNamesNormalized }
throw IllegalArgumentException(
"No selected attributes matched source columns for table '$tableName'. " +
"Unknown attributes: ${if (unknownAttributes.isEmpty()) selectedAttributes else unknownAttributes}. " +
"Available columns: $sourceColumnNames"
)
}

Copilot uses AI. Check for mistakes.
Comment on lines +53 to 60
val selectPart = if (!selectedAttributes.isNullOrEmpty()) {
selectedAttributes.joinToString(", ") { "`$it`" }
} else {
"*"
}
val sql = buildString {
append("SELECT * FROM `$tableName`")
append("SELECT $selectPart FROM `$tableName`")
if (whereClause != null) append(" WHERE $whereClause")
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MySQL column identifiers in selectedAttributes are interpolated as `$it` without escaping/sanitizing. This can break queries for columns containing backticks and expands the injection surface compared to other connectors that strip quote characters. Sanitize/escape backticks in column names (e.g., remove or double them) before building selectPart.

Copilot uses AI. Check for mistakes.
var query = if (limit != null) collection.find(filter).limit(limit) else collection.find(filter)
if (!selectedAttributes.isNullOrEmpty()) {
val projection = Document().apply {
selectedAttributes.forEach { field -> put(field, 1) }
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MongoDB projections that include specific fields will still return _id by default. With the current projection document, fetchData() will likely include _id even when it wasn't requested, which contradicts the "only requested columns" behavior used elsewhere. Consider explicitly excluding _id (set _id: 0) unless _id is present in selectedAttributes.

Suggested change
selectedAttributes.forEach { field -> put(field, 1) }
selectedAttributes.forEach { field -> put(field, 1) }
if (!selectedAttributes.contains("_id")) {
put("_id", 0)
}

Copilot uses AI. Check for mistakes.
Comment on lines 26 to 30
rowLimit = request.rowLimit,
whereClause = request.whereClause
)
config.selectedAttributes = request.selectedAttributes?.filter { it.isNotBlank() }?.joinToString(",")
return tableConfigurationRepository.save(config).toResponse()
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When request.selectedAttributes is an empty list, joinToString(",") produces an empty string, so the entity stores "" rather than null. This makes responses/export potentially return an empty list instead of null and complicates the "empty/omitted = all columns" contract. Consider normalizing to null when the filtered+trimmed list is empty (e.g., takeIf { it.isNotEmpty() }).

Copilot uses AI. Check for mistakes.
table.mode = tableDto.mode
table.rowLimit = tableDto.rowLimit
table.whereClause = tableDto.whereClause
table.selectedAttributes = tableDto.selectedAttributes?.filter { it.isNotBlank() }?.joinToString(",")
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import currently persists selectedAttributes as a comma-separated string even when the list is empty, which results in an empty string instead of null. If the API contract is "null/empty = all columns", consider storing null when the incoming list is empty after trimming/filtering, to keep persistence/export consistent.

Suggested change
table.selectedAttributes = tableDto.selectedAttributes?.filter { it.isNotBlank() }?.joinToString(",")
val normalizedSelectedAttributes = tableDto.selectedAttributes
?.map { it.trim() }
?.filter { it.isNotBlank() }
table.selectedAttributes = normalizedSelectedAttributes
?.takeIf { it.isNotEmpty() }
?.joinToString(",")

Copilot uses AI. Check for mistakes.
Comment on lines +91 to +95
)
whenever(tableConfigurationRepository.save(any<TableConfiguration>())).thenReturn(saved)

val response = service.createTableConfiguration(10L, request)

Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test name says it verifies persistence, but it only asserts the response mapping from the mocked saved entity. Consider verifying the argument passed to tableConfigurationRepository.save(...) (e.g., capture and assert selectedAttributes == "id,name,email") so the test actually covers the serialization behavior.

Suggested change
)
whenever(tableConfigurationRepository.save(any<TableConfiguration>())).thenReturn(saved)
val response = service.createTableConfiguration(10L, request)
)
val tableConfigurationCaptor = argumentCaptor<TableConfiguration>()
whenever(tableConfigurationRepository.save(any<TableConfiguration>())).thenReturn(saved)
val response = service.createTableConfiguration(10L, request)
verify(tableConfigurationRepository).save(tableConfigurationCaptor.capture())
assertEquals("id,name,email", tableConfigurationCaptor.firstValue.selectedAttributes)

Copilot uses AI. Check for mistakes.
@MaximumTrainer
Copy link
Copy Markdown
Owner

@copilot apply changes based on the comments in this thread

- Fix comment in TablesView.vue: omit field (undefined) rather than null when no columns selected
- Normalize empty selectedAttributes list to null in TableConfigurationService and WorkspaceExportService using takeIf { isNotEmpty() }, preventing empty-string storage
- Add fast-fail in DestinationSchemaService.mirrorSchema when selectedAttributes produces zero matching columns
- Escape backticks in MySQL column identifiers (double-backtick escaping) to prevent SQL injection
- Exclude _id from MongoDB projection unless explicitly requested in selectedAttributes
- Improve TableConfigurationServiceTest to capture save() argument and verify comma-separated serialization
- Add TableConfigurationServiceTest for empty-list normalization to null
- Add DestinationSchemaServiceTest for unmatched selectedAttributes fast-fail path

Agent-Logs-Url: https://github.com/MaximumTrainer/OpenDataMask/sessions/d68edfed-91a4-4db2-aa64-007b97334b8c

Co-authored-by: MaximumTrainer <1376575+MaximumTrainer@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 16, 2026

@copilot apply changes based on the comments in this thread

Applied all 7 review comments in commit 0877d56:

  1. TablesView.vue comment — updated to say "omit field" (undefined) instead of "send null".
  2. DestinationSchemaService fast-failmirrorSchema now throws IllegalArgumentException when selectedAttributes matches zero source columns, reporting the unknown attribute names and listing available columns.
  3. MySQL backtick escaping — column names now use it.replace("", "``")` before wrapping in backticks, closing the injection surface.
  4. MongoDB _id exclusion_id: 0 is added to the projection unless _id is explicitly listed in selectedAttributes.
  5. TableConfigurationService empty-list normalization — both create and update paths use ?.takeIf { it.isNotEmpty() } so an empty list stores null instead of "".
  6. WorkspaceExportService — same takeIf { isNotEmpty() } normalization applied consistently.
  7. TableConfigurationServiceTest serialization — test now uses argumentCaptor<TableConfiguration>() to assert the saved entity's selectedAttributes field equals "id,name,email", plus a new test verifying an empty list normalizes to null.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • example.com
    • Triggering command: /usr/lib/jvm/temurin-17-jdk-amd64/bin/java /usr/lib/jvm/temurin-17-jdk-amd64/bin/java -Dorg.gradle.internal.worker.tmpdir=/home/REDACTED/work/OpenDataMask/OpenDataMask/backend/build/tmp/test/work -Dorg.gradle.native=false -javaagent:/home/REDACTED/work/OpenDataMask/OpenDataMask/backend/build/tmp/.cache/expanded/zip_0bd0851504e6286cb5fc6f37a81e25cc/jacocoagent.jar=destfile=build/jacoco/test.exec,append=true,inclnolocationclasses=false,dumponexit=true,output=file,jmx=false @/home/REDACTED/.gradle/.tmp/gradle-worker-classpath14013826460707004457txt -Xmx512m -Dfile.encoding=UTF-8 -Duser.country -Duser.language=en -Duser.variant -ea worker.org.gradle.process.internal.worker.GradleWorkerMain &#39;Gradle Test Executor 1&#39; (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI requested a review from MaximumTrainer April 16, 2026 12:24
@MaximumTrainer MaximumTrainer marked this pull request as ready for review April 16, 2026 12:34
@MaximumTrainer MaximumTrainer merged commit 786b365 into main Apr 16, 2026
10 checks passed
@MaximumTrainer MaximumTrainer deleted the copilot/implement-data-subsetting-attributes branch April 16, 2026 12:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Data Subsetting & Attribute Filtering

3 participants