Refuses a DROP TABLE while an active Pipeline still references the resource (as either source or sink), so dropping the underlying Kafka topic / Venice store / MySQL table can't silently orphan a downstream pipeline.

Validator framework, made Connection-aware:
- Validated.validate(Issues, Connection) (was: validate(Issues))
- ValidatorProvider.validators(T, Connection) (was: validators(T))
- ValidationService.validate(T, Issues, Connection)
- ValidationService.validateOrThrow(T, Connection)
- ValidationService.validateOrThrow(Collection<T>, Connection)
- ValidationService.validators(T, Connection)

PendingDelete<T> wrapper (hoptimator-api):
- Explicit "this is being deleted" signal so unrelated callers of validateOrThrow(source, connection) don't accidentally trigger pre-delete checks.
- Carries an optional selfOwnerUid so cascade-deleted children can be excluded from the dependent set.

K8s indexed lookup:
- PipelineDependencyLabels stamps `depends-on-<slug>` labels on every Pipeline CRD at create time, naming each source/sink. The slug is a 16-char SHA-256 prefix of `<database>_<dot-joined-path>`; an annotation lists the full identifiers so a slug collision can be detected at check time.
- PipelineDependencyChecker uses a server-indexed label-selector list + annotation cross-check + selfOwnerUid filter.
- K8sPipelineDeployer threads sources/sink through and calls PipelineDependencyLabels.labelsFor / annotationFor at toK8sObject(). K8sPipelineBundle and K8sMaterializedViewDeployer pass the data through.

Dispatch:
- K8sValidatorProvider returns a K8sPipelineDependencyValidator for PendingDelete<Source>; registered via META-INF/services/com.linkedin.hoptimator.ValidatorProvider.
- K8sPipelineDependencyValidator wraps PipelineDependencyChecker as a Validator.

DROP TABLE wiring:
- HoptimatorDdlExecutor calls ValidationService.validateOrThrow(new PendingDelete<>(source), connection) before DeploymentService.delete in the table branch. HoptimatorDdlUtils.removeTableFromSchema() is the symmetric inverse of registerTemporaryTableInSchema() for cleanup.

Implementor side-effects (no behavior change):
- KafkaDeployer / VeniceDeployer / MySqlDeployer no longer need a declarative DependencyGuarded marker — the guard fires from the validator framework before delete() is reached.
- All existing Validated implementors (DefaultValidator, CompatibilityValidatorBase, AvroTableValidator, K8sViewTable) and ValidatorProvider implementors (DefaultValidatorProvider, CompatibilityValidatorProvider, AvroValidatorProvider) updated to the new signatures.

Tests: PipelineDependencyLabelsTest, PipelineDependencyCheckerTest, K8sPipelineDeployerTest assertions for stamping, validator-framework test updates throughout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
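For illustration, a minimal sketch of the slug rule described above: a 16-character SHA-256 hex prefix of `<database>_<dot-joined-path>`, used to build the `depends-on-<slug>` label key. The class and method names here are hypothetical, not the actual PipelineDependencyLabels API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical helper mirroring the slug rule above; the real logic lives in
// PipelineDependencyLabels and may differ in naming and label-key layout.
final class DependencySlugSketch {

  // 16-char SHA-256 hex prefix of "<database>_<dot-joined-path>"
  static String slug(String database, List<String> path) {
    return sha256Hex(database + "_" + String.join(".", path)).substring(0, 16);
  }

  // Label key stamped on the Pipeline CRD for one source/sink.
  static String dependsOnLabel(String database, List<String> path) {
    return "depends-on-" + slug(database, path);
  }

  private static String sha256Hex(String s) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      StringBuilder hex = new StringBuilder();
      for (byte b : digest.digest(s.getBytes(StandardCharsets.UTF_8))) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 unavailable", e);
    }
  }
}
```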
LogicalTableDeployer.delete() previously threw SQLFeatureNotSupported. Now implemented end-to-end as a per-tier sequence that mirrors what running DROP TABLE on each tier independently would do, plus the LogicalTable CRD removal at the top.

Flow:
1. Per-tier pre-flight via the validator framework: ValidationService.validateOrThrow(new PendingDelete<>(tierSource, logicalTableUid), connection) — refuses the drop if any active external pipeline still references a tier resource. The selfOwnerUid is the LogicalTable CRD's UID so the implicit inter-tier pipelines (owned by the CRD, cascade-deleted with it) don't self-block.
2. Delete the LogicalTable CRD. K8s owner-ref cascade removes its owned Pipeline and TableTrigger CRDs.
3. Best-effort physical cleanup of each tier resource (Kafka topic, Venice store, ...). A failed tier delete logs a warning but does not abort — a stranded tier is recoverable; aborting mid-DROP isn't.
4. Per-tier schema cleanup: deregister the TemporaryTable entry in each tier schema only when its physical delete succeeded.

Tests:
- LogicalTableDeployerTest deleteRemovesCrdAndCleansUpTierResources, deletePropagatesCrdDeletionFailure, deleteSwallowsTierCleanupFailures.
- logical-ddl.id integration test: DROP TABLE LOGICAL.testevent now succeeds and cascades the implicit nearline-to-online pipeline.
- logical-offline-ddl.id companion update.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
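A rough outline of that four-step flow, under stated assumptions: tierSources(), deleteLogicalTableCrd(), deletePhysicalResource(), deregisterFromTierSchema(), logicalTableUid and the log field are hypothetical stand-ins for the deployer's actual members; only the validateOrThrow call and the step ordering come from the commit message above.

```java
// Illustrative sketch only; helper names are assumptions, not the real API.
public void delete() throws SQLException {
  // 1. Per-tier pre-flight: refuse the drop if any active external pipeline
  //    still references a tier resource. Passing the LogicalTable CRD's UID
  //    as selfOwnerUid keeps its own implicit inter-tier pipelines (which
  //    cascade-delete with the CRD) from blocking the drop.
  for (Source tierSource : tierSources()) {
    ValidationService.validateOrThrow(new PendingDelete<>(tierSource, logicalTableUid), connection);
  }

  // 2. Delete the LogicalTable CRD; the K8s owner-ref cascade removes the
  //    Pipeline and TableTrigger CRDs it owns.
  deleteLogicalTableCrd();

  // 3 + 4. Best-effort physical cleanup per tier, then schema cleanup only
  //        for tiers whose physical delete actually succeeded.
  for (Source tierSource : tierSources()) {
    try {
      deletePhysicalResource(tierSource);     // Kafka topic, Venice store, ...
      deregisterFromTierSchema(tierSource);   // drop the TemporaryTable entry
    } catch (Exception e) {
      // A stranded tier is recoverable; aborting mid-DROP isn't.
      log.warn("Best-effort cleanup of tier resource {} failed", tierSource, e);
    }
  }
}
```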
kafka-ddl-create-table.id: cross-driver dependency-guard scenarios exercising the new pre-delete check end-to-end through the kafka driver — drop-table-while-pipeline-depends-on-it (source side and partial-view sink side).

The bulk of the file count is mechanical noise reduction across existing test files: dropped unused imports, tightened generics on @SuppressWarnings, etc. — fallout from the warning_cleanup pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Problem
DROP TABLE on a Kafka topic / Venice store / MySQL table currently succeeds even when an active Pipeline still reads from or writes to the resource. The downstream pipeline is silently orphaned: it keeps trying to consume a topic that no longer exists.
LogicalTableDeployer.delete() was unimplemented (threw SQLFeatureNotSupported) for the same reason: there was no safe way to verify a logical table's tier resources weren't still in use.
Approach
The fix is a pre-delete dependency check that runs in the SQL DDL path, before any deployer-level state change.
The mechanism is a thin layer on Hoptimator's existing Validator/ValidatorProvider/ValidationService framework, extended to thread a Connection so validators can do external lookups.
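To make the signature change concrete, here is a hedged sketch of the Connection-aware shapes listed below; Issues and Validator are placeholders, and the real declarations in hoptimator-api may differ in detail.

```java
import java.sql.Connection;
import java.util.Collection;

// Placeholder types: the real Issues and Validator declarations live in
// hoptimator-api and may look different.
interface Issues { void error(String message); }
interface Validator { /* wraps a single check, e.g. K8sPipelineDependencyValidator */ }

// The Connection-aware shapes described in this PR.
interface Validated {
  // was: validate(Issues)
  void validate(Issues issues, Connection connection);
}

interface ValidatorProvider {
  // was: validators(T)
  <T> Collection<Validator> validators(T obj, Connection connection);
}
```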
What's in the diff (3 commits)
feat: pre-delete dependency guard for DROP TABLE
The core feature, end-to-end:
- Validated.validate(Issues, Connection), ValidatorProvider.validators(T, Connection), ValidationService.validateOrThrow(T, Connection) and validateOrThrow(Collection<T>, Connection): made Connection-aware in a single breaking signature change (similar to how deployers take a connection).
- PendingDelete<T> wrapper in hoptimator-api (sketched below). Carries an optional selfOwnerUid so cascade-deleted children can be excluded from the dependent set.
- PipelineDependencyLabels (slug + identifier + label/annotation builders), PipelineDependencyChecker (label-selector query + annotation collision guard + self-owner filter).
- K8sPipelineDeployer.toK8sObject() stamps labels and the annotation; K8sPipelineBundle and K8sMaterializedViewDeployer thread sources/sink through.
- K8sValidatorProvider returns a K8sPipelineDependencyValidator for PendingDelete<Source>; registered via META-INF/services/com.linkedin.hoptimator.ValidatorProvider.
- HoptimatorDdlExecutor.execute(SqlDropObject) calls validateOrThrow(new PendingDelete<>(source), connection) before DeploymentService.delete in the table branch.
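For reference, a minimal sketch of what the PendingDelete<T> wrapper could look like; only the shape (a wrapped value plus an optional selfOwnerUid) comes from the description above, and the field and accessor names are assumptions.

```java
import java.util.Optional;

// Hypothetical sketch of PendingDelete<T>; names other than selfOwnerUid are assumed.
final class PendingDelete<T> {
  private final T source;
  private final String selfOwnerUid;  // may be null

  PendingDelete(T source) {
    this(source, null);
  }

  PendingDelete(T source, String selfOwnerUid) {
    this.source = source;
    this.selfOwnerUid = selfOwnerUid;
  }

  T source() {
    return source;
  }

  // Children owned by this UID are cascade-deleted with it, so the
  // dependency checker excludes them from the dependent set.
  Optional<String> selfOwnerUid() {
    return Optional.ofNullable(selfOwnerUid);
  }
}
```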
feat: support DROP TABLE for logical tables

LogicalTableDeployer.delete() is now implemented as N tier-level DROPs + 1 CRD delete (see the flow in the commit message above).
test: integration scenarios + cleanup test warnings
Test plan
Known caveats
Pipeline CRDs created before this change carry no depends-on labels, so they're invisible to the check and won't block a DROP. To make this a hard correctness invariant, an upgrade-time backfill job needs to re-stamp labels on every existing Pipeline CRD by re-deriving sources/sink from its spec.sql. Out of scope here.
Only active Pipeline CRDs are checked; a View CRD whose SQL still references the dropped table (audience2 defined over audience, say) won't block the drop. Operationally that's fine: audience2's Flink job never references audience at runtime, so dropping audience doesn't orphan anything. The dangling SQL on audience2's View CRD is self-healing on the next CREATE OR REPLACE.