feat(optimizer): [2/N] Optimizer REST service layer by mkuchenbecker · Pull Request #531 · linkedin/openhouse

mkuchenbecker · 2026-04-06T18:02:12Z

Optimizer Stack

PR	Content
#527	Data Model
#530	Database Repos
#531 (this)	REST service
#533	Analyzer app
#534	Scheduler app
#tbd	Spark BatchedOFD app
#tbd	Infra, docker-compose, smoke test

Summary

PR 2 of N in the optimizer stack.
Links: Overall Project, Service Design doc.

Service layer and REST controllers for the optimizer service, plus the apps/optimizer shared module providing lightweight entity/repo copies for the analyzer and scheduler apps.

Changes

Service layer: OptimizerDataService interface and OptimizerDataServiceImpl — CRUD operations, complete-operation lifecycle, stats upsert with history double-write, filtered queries.

Controllers: TableOperationsController, TableOperationsHistoryController, TableStatsController — REST endpoints per the design doc API spec.

Shared module (apps/optimizer): Lightweight entity and repository copies used by the analyzer and scheduler apps to read optimizer state directly from MySQL.
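The "stats upsert with history double-write" behavior described under Service layer can be illustrated with a minimal in-memory sketch. Everything here is hypothetical (the class name, the StatsRow shape, the string payload); it is not the code in OptimizerDataServiceImpl, only a model of the semantics the description and tests state: one current row per table UUID, overwritten on each call, plus one appended history row per call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory model of an upsert that also appends to history. All names are hypothetical. */
final class StatsUpsertSketch {
  /** Assumed row shape: the table UUID plus an opaque stats payload. */
  static final class StatsRow {
    final String tableUuid;
    final String statsPayload;

    StatsRow(String tableUuid, String statsPayload) {
      this.tableUuid = tableUuid;
      this.statsPayload = statsPayload;
    }
  }

  // Stand-in for the current-state table: at most one row per table UUID.
  final Map<String, StatsRow> current = new HashMap<>();
  // Stand-in for the history table: append-only, one row per upsert call.
  final List<StatsRow> history = new ArrayList<>();

  /** Overwrites the current row for the UUID and appends the same payload to history. */
  StatsRow upsert(String tableUuid, String statsPayload) {
    StatsRow row = new StatsRow(tableUuid, statsPayload);
    current.put(tableUuid, row);
    history.add(row);
    return row;
  }
}
```

Calling upsert twice for the same UUID leaves a single current row holding the second payload and two history rows holding each raw payload, which mirrors what the upsertTableStats tests assert.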

Testing Done

Manually Tested on local docker setup. Please include commands ran, and their output.
Added new tests for the changes made.
Updated existing tests to reflect the changes made.
No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
Some other form of testing like staging or soak time in production. Please explain.

H2 integration tests in OptimizerDataServiceImplTest (5 tests):

completeOperation_writesHistoryFromOperationRow — saves SCHEDULED row, completes it, asserts history DTO fields
completeOperation_notFound_returnsEmpty — completes nonexistent ID, asserts empty
upsertTableStats_createsNewRow — upserts new table, asserts DTO and repo row
upsertTableStats_updatesExistingRow — upserts twice, asserts overwrite with single row
upsertTableStats_appendsHistoryOnEveryCall — upserts twice, asserts 2 history rows

./gradlew :services:optimizer:test
# BUILD SUCCESSFUL — all 25 tests pass (repo tests from PR 1 + 5 new service tests)
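For readers who want the completeOperation contract without opening the tests, here is a rough sketch under stated assumptions. The class, field names, and the "result" argument are hypothetical, and what happens to the operation row after completion is deliberately not modeled because the description does not say. The 200 status for the success case is also an assumption; the source only states that a missing operation yields 404.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** In-memory model of the complete-operation step. All names are hypothetical. */
final class CompleteOperationSketch {
  static final class Operation {
    final String id;
    final String tableUuid;
    final String status;

    Operation(String id, String tableUuid, String status) {
      this.id = id;
      this.tableUuid = tableUuid;
      this.status = status;
    }
  }

  static final class HistoryRow {
    final String operationId;
    final String tableUuid;
    final String result;

    HistoryRow(String operationId, String tableUuid, String result) {
      this.operationId = operationId;
      this.tableUuid = tableUuid;
      this.result = result;
    }
  }

  final Map<String, Operation> operations = new HashMap<>();
  final List<HistoryRow> history = new ArrayList<>();

  /** Builds a history row from the stored operation row, or returns empty if the ID is unknown. */
  Optional<HistoryRow> completeOperation(String id, String result) {
    Operation op = operations.get(id);
    if (op == null) {
      return Optional.empty();
    }
    HistoryRow row = new HistoryRow(op.id, op.tableUuid, result);
    history.add(row);
    return Optional.of(row);
  }

  /** Status a controller could derive from the service result (200 is an assumption). */
  static int toHttpStatus(Optional<HistoryRow> result) {
    return result.isPresent() ? 200 : 404;
  }
}
```

The two completeOperation tests map onto the two branches: a saved SCHEDULED row produces a history row whose fields come from the operation row, and an unknown ID produces an empty result.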

Additional Information

Breaking Changes
Deprecations
Large PR broken into smaller PRs, and PR plan linked in the description.

Service interface and implementation for all optimizer CRUD operations including complete-operation lifecycle, stats upsert with history double-write, and filtered queries. Three REST controllers expose the endpoints. The apps/optimizer shared module provides lightweight entity/repo copies for the analyzer and scheduler apps. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Align OptimizerDataServiceImpl with renamed repository methods from optimizer-1 review feedback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Resolve repo conflicts by taking optimizer-1's clean find-only versions. Scheduler-specific methods and streamAll removed per review feedback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mkuchenbecker

this needs tests

Propagate CompleteOperationRequest orphan field removal. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

H2 integration tests for OptimizerDataServiceImpl covering completeOperation (write history, not-found) and upsertTableStats (create, update, history append). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Strengthen upsertTableStats test to verify history rows contain the raw delta stats from each call, not just the row count. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

abhisheknath2011 · 2026-04-13T14:53:34Z

+   * with the history row, or 404 if the operation does not exist.
+   */
+  @PostMapping("/{id}/complete")
+  public ResponseEntity<TableOperationsHistoryDto> completeOperation(


We need the table name and database name as input. We can keep the URL format the same as the Tables Service URLs, e.g. v1/databases/DB/tables/TABLE.

Or they can be passed as parameters.

These APIs are intentionally keyed by table UUID because of drop-and-recreate semantics: a recreated table is a brand-new entity for the optimizer (new stats, new storage, new operation history), and a name-based key would conflate two distinct identities. The Spark caller of /{id}/complete already has the operation id. We'll add a name-based variant when a concrete use case lands; today the only such use case is operation-history browsing, which is covered separately.
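To make the drop-and-recreate argument concrete, here is a small illustrative sketch. The class and method names are hypothetical and the "event" strings are placeholders; it only shows how the same events group differently under a name key versus a UUID key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Records the same events under a name key and a UUID key. All names are hypothetical. */
final class KeyingSketch {
  final Map<String, List<String>> historyByName = new HashMap<>();
  final Map<String, List<String>> historyByUuid = new HashMap<>();

  void record(String tableUuid, String databaseName, String tableName, String event) {
    // Name key: a dropped and recreated table lands in the same bucket as its predecessor.
    historyByName
        .computeIfAbsent(databaseName + "." + tableName, k -> new ArrayList<>())
        .add(event);
    // UUID key: each incarnation of the table gets its own bucket.
    historyByUuid.computeIfAbsent(tableUuid, k -> new ArrayList<>()).add(event);
  }
}
```

If db.t is created as uuid-a, dropped, and recreated as uuid-b, the name-keyed map ends up with one bucket mixing events from both tables, while the UUID-keyed map keeps them apart.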

abhisheknath2011 · 2026-04-13T14:54:49Z

+
+  /** Fetch a single operation row by its ID, regardless of status. Returns 404 if not found. */
+  @GetMapping("/{id}")
+  public ResponseEntity<TableOperationsDto> getTableOperation(@PathVariable String id) {


Same comment: database name and table name are needed.

Same answer — fetch-by-id is intentional for the same drop-and-recreate reason. The list endpoint at the controller root already accepts databaseName / tableName as optional query-param filters when a multi-criteria browse is needed.
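As an illustration of optional filters on the list endpoint, the sketch below treats a null argument as "filter not supplied", which is how an optional query parameter typically reaches a handler. The Operation shape and the list method are hypothetical; the actual filter set and query implementation in this PR may differ.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Optional-filter listing over an in-memory list. All names are hypothetical. */
final class OperationFilterSketch {
  static final class Operation {
    final String id;
    final String databaseName;
    final String tableName;

    Operation(String id, String databaseName, String tableName) {
      this.id = id;
      this.databaseName = databaseName;
      this.tableName = tableName;
    }
  }

  /** A null filter means "not supplied" and matches every row. */
  static List<Operation> list(List<Operation> all, String databaseName, String tableName) {
    return all.stream()
        .filter(op -> databaseName == null || databaseName.equals(op.databaseName))
        .filter(op -> tableName == null || tableName.equals(op.tableName))
        .collect(Collectors.toList());
  }
}
```

With no filters the call returns every row; supplying databaseName alone narrows to one database, and supplying both narrows to one table name within it.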

abhisheknath2011 · 2026-04-13T15:03:45Z

+
+/** REST controller for {@code table_operations}. */
+@RestController
+@RequestMapping("/v1/table-operations")


Can we have a common format for all URLs, with a common prefix /v1/optimizer/ and the resource (e.g. operations) as the suffix? The URL could then be something like /v1/optimizer/operations.

Claude: Renamed to /v1/optimizer/operations. Applied the same /v1/optimizer/... namespacing across all three controllers.

abhisheknath2011 · 2026-04-13T15:04:36Z

+
+/** REST controller for {@code table_operations_history}. */
+@RestController
+@RequestMapping("/v1/table-operations-history")


Can we have a common format for all URLs, with a common prefix /v1/optimizer/ and the resource as the suffix? The URL could then be something like /v1/optimizer/history or /v1/optimizer/operations-history.

Claude: Renamed to /v1/optimizer/operations-history (the more descriptive of the two, to disambiguate from stats history).

abhisheknath2011 · 2026-04-13T15:05:41Z

+
+  /** Return the most recent history for a table, newest first, up to {@code limit} rows. */
+  @GetMapping("/{tableUuid}")
+  public ResponseEntity<List<TableOperationsHistoryDto>> getHistory(


Table name and database name?

We probably need both. This API is used by the analyzer to find the history for a particular UUID, but people fetching the history will do so by name.

Claude: Done — added GET /v1/optimizer/databases/{databaseName}/tables/{tableName}/operations-history for human/name-based access. The UUID-keyed path stays for the analyzer. Backed by a new composite index on table_operations_history (database_name, table_name) at the schema layer.
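The two access paths for operation history can be sketched side by side. This is a hypothetical in-memory model (HistoryRow, completedAtMillis, byUuid, and byName are illustrative names, not the PR's); it only demonstrates "newest first, up to limit rows" under either key.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** History lookup by UUID or by name, newest first. All names are hypothetical. */
final class HistoryQuerySketch {
  static final class HistoryRow {
    final String tableUuid;
    final String databaseName;
    final String tableName;
    final long completedAtMillis;

    HistoryRow(String tableUuid, String databaseName, String tableName, long completedAtMillis) {
      this.tableUuid = tableUuid;
      this.databaseName = databaseName;
      this.tableName = tableName;
      this.completedAtMillis = completedAtMillis;
    }
  }

  /** Keeps matching rows, sorts by completion time descending, and truncates to limit. */
  private static List<HistoryRow> newestFirst(
      List<HistoryRow> rows, Predicate<HistoryRow> match, int limit) {
    return rows.stream()
        .filter(match)
        .sorted(Comparator.comparingLong((HistoryRow r) -> r.completedAtMillis).reversed())
        .limit(limit)
        .collect(Collectors.toList());
  }

  /** UUID-keyed lookup, the path described as used by the analyzer. */
  static List<HistoryRow> byUuid(List<HistoryRow> rows, String tableUuid, int limit) {
    return newestFirst(rows, r -> r.tableUuid.equals(tableUuid), limit);
  }

  /** Name-keyed lookup, the path described as intended for human access. */
  static List<HistoryRow> byName(
      List<HistoryRow> rows, String databaseName, String tableName, int limit) {
    return newestFirst(
        rows,
        r -> r.databaseName.equals(databaseName) && r.tableName.equals(tableName),
        limit);
  }
}
```

In this model a name-based lookup matches every row carrying that database and table name, so it can span more than one UUID if a table was dropped and recreated; whether the real endpoint behaves that way is not stated in the thread.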

abhisheknath2011 · 2026-04-13T15:06:35Z

+
+/** REST controller for managing per-table stats in the optimizer DB. */
+@RestController
+@RequestMapping("/v1/table-stats")


Suggested change: replace @RequestMapping("/v1/table-stats") with

@RequestMapping("/v1/optimizer/table-stats")

or

@RequestMapping("/v1/optimizer/stats")

Claude: Renamed to /v1/optimizer/stats (took the shorter of the two; symmetric with /v1/optimizer/operations and /v1/optimizer/operations-history).
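For quick reference, the route layout agreed in this thread can be collected in one place. The path strings are taken from the review replies; the class, constant names, and helper methods are hypothetical and are not part of the PR. The helpers do plain string substitution and do not URL-encode their arguments.

```java
/** Route strings from the review thread, gathered in a hypothetical constants class. */
final class OptimizerPathsSketch {
  static final String OPERATIONS = "/v1/optimizer/operations";
  static final String OPERATIONS_HISTORY = "/v1/optimizer/operations-history";
  static final String STATS = "/v1/optimizer/stats";
  static final String HISTORY_BY_NAME =
      "/v1/optimizer/databases/{databaseName}/tables/{tableName}/operations-history";

  private OptimizerPathsSketch() {}

  /** Expands the name-based history template (no URL encoding is applied). */
  static String historyByName(String databaseName, String tableName) {
    return HISTORY_BY_NAME
        .replace("{databaseName}", databaseName)
        .replace("{tableName}", tableName);
  }

  /** Path of the complete-operation endpoint for a given operation ID. */
  static String completeOperation(String id) {
    return OPERATIONS + "/" + id + "/complete";
  }
}
```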

abhisheknath2011 · 2026-04-13T15:08:26Z

+   * Iceberg commit. Idempotent.
+   */
+  @PutMapping("/{tableUuid}")
+  public ResponseEntity<TableStatsDto> upsertTableStats(


Database name and table name are needed.

The PUT path is intentionally UUID-keyed — the Tables Service caller writes by UUID, and stats for a recreated table need to land under a fresh row, not collide with the dropped table's history. The request body already carries databaseName / tableName as denormalized fields. Same position as the operations endpoints: we'll add name-based access if a concrete use case lands.

This was referenced Apr 6, 2026

feat(optimizer): [1/N] Optimizer Repository #530

Open

feat(optimizer): [0/N] Optimizer Data Model #527

Open

mkuchenbecker marked this pull request as draft April 6, 2026 18:03