Pre-delete dependency guard for DROP TABLE#219

Open
jogrogan wants to merge 8 commits into main from jogrogan/dependency-guard-validator-v2

@jogrogan jogrogan commented May 1, 2026

Problem

DROP TABLE on a Kafka topic / Venice store / MySQL table currently succeeds even when an active Pipeline still reads from or writes to the resource. The downstream pipeline is silently orphaned: it keeps trying to consume a topic that no longer exists.

LogicalTableDeployer.delete() was unimplemented (threw SQLFeatureNotSupported) for the same reason: there was no safe way to verify a logical table's tier resources weren't still in use.

Approach

A pre-delete dependency check that runs in the SQL DDL path before any deployer-level state change:

  1. At MV creation or logical table creation, every underlying Pipeline CRD is stamped with depends-on-<slug> labels — one per source and one per sink — and a collision-guard annotation listing the same identifiers verbatim.
  2. At DROP time, ValidationService.validateOrThrow(new PendingDelete<>(source), connection) runs an indexed label-selector query against Pipeline CRDs. Any match (after self-uid filtering and annotation cross-check) becomes a validation error and fails the DROP.

The mechanism is a thin layer on Hoptimator's existing Validator/ValidatorProvider/ValidationService framework, extended to thread a Connection so validators can do external lookups.
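
The three filters applied at DROP time (label match, self-owner exclusion, annotation cross-check) can be sketched in memory as below. This is only an illustration under stated assumptions: the `Pipeline` class, the `dependents` method, the sample slug and the identifier strings are hypothetical stand-ins, and the real check queries Pipeline CRDs through the K8s API rather than a Java list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DependencyCheckSketch {

  /** Hypothetical, simplified view of a Pipeline CRD's metadata. */
  static final class Pipeline {
    final String name;
    final String ownerUid;              // UID of the owning object, or null
    final Map<String, String> labels;   // includes the depends-on-<slug> labels
    final Set<String> identifiers;      // full identifiers, as listed in the annotation

    Pipeline(String name, String ownerUid, Map<String, String> labels, Set<String> identifiers) {
      this.name = name;
      this.ownerUid = ownerUid;
      this.labels = labels;
      this.identifiers = identifiers;
    }
  }

  /** Returns the names of pipelines that should block a DROP of the given identifier. */
  static List<String> dependents(List<Pipeline> pipelines, String slug, String identifier,
      String selfOwnerUid) {
    List<String> blocking = new ArrayList<>();
    for (Pipeline p : pipelines) {
      // 1. Label match: stands in for the server-side label-selector query.
      if (!p.labels.containsKey("depends-on-" + slug)) {
        continue;
      }
      // 2. Self-owner filter: children that cascade-delete with the owner never block.
      if (selfOwnerUid != null && selfOwnerUid.equals(p.ownerUid)) {
        continue;
      }
      // 3. Collision guard: the annotation must list the exact identifier.
      if (!p.identifiers.contains(identifier)) {
        continue;
      }
      blocking.add(p.name);
    }
    return blocking;
  }

  public static void main(String[] args) {
    String slug = "0123456789abcdef";          // made-up slug value
    String id = "kafka_KAFKA.page_views";      // made-up identifier
    Map<String, String> label = Map.of("depends-on-" + slug, "true");
    List<Pipeline> pipelines = List.of(
        new Pipeline("ads-audience", "uid-view", label, Set.of(id)),
        new Pipeline("inter-tier", "uid-logical", label, Set.of(id)),
        new Pipeline("collision", "uid-other", label, Set.of("kafka_KAFKA.other")));
    System.out.println(dependents(pipelines, slug, id, "uid-logical"));
  }
}
```

In this sketch only `ads-audience` blocks the drop: `inter-tier` is owned by the object being deleted, and `collision` shares the slug label but not the identifier.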

What's in the diff (3 commits)

feat: pre-delete dependency guard for DROP TABLE

The core feature, end-to-end:

| Layer | Change |
| --- | --- |
| Validator framework | Validated.validate(Issues, Connection), ValidatorProvider.validators(T, Connection), ValidationService.validateOrThrow(T, Connection) and validateOrThrow(Collection<T>, Connection). Connection-aware on a single, breaking signature (similar to how deployers take connection) |
| Intent signal | New PendingDelete<T> wrapper in hoptimator-api. Carries an optional selfOwnerUid so cascade-deleted children can be excluded from the dependent set. |
| K8s indexed lookup | PipelineDependencyLabels (slug + identifier + label/annotation builders), PipelineDependencyChecker (label-selector query + annotation collision guard + self-owner filter). |
| Stamping | K8sPipelineDeployer.toK8sObject() stamps labels and annotation; K8sPipelineBundle and K8sMaterializedViewDeployer thread sources/sink through. |
| Dispatch | K8sValidatorProvider returns a K8sPipelineDependencyValidator for PendingDelete<Source>; registered via META-INF/services/com.linkedin.hoptimator.ValidatorProvider. |
| Wiring | HoptimatorDdlExecutor.execute(SqlDropObject) calls validateOrThrow(new PendingDelete<>(source), connection) before DeploymentService.delete in the table branch. |
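
The role of the intent signal can be illustrated with a stripped-down wrapper and a validator that reacts only to wrapped objects. This is a sketch under assumptions, not the code in the diff: the `PendingDelete` class here is a simplified stand-in for the one in hoptimator-api, the `dependentsOf` function replaces the Connection-backed lookup, and `IllegalStateException` replaces whatever exception the real `validateOrThrow` raises.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

class PendingDeleteSketch {

  /** Simplified stand-in: marks a target as "about to be deleted". */
  static final class PendingDelete<T> {
    private final T target;
    private final String selfOwnerUid; // optional; null when there is no owner to exclude

    PendingDelete(T target) {
      this(target, null);
    }

    PendingDelete(T target, String selfOwnerUid) {
      this.target = target;
      this.selfOwnerUid = selfOwnerUid;
    }

    T target() {
      return target;
    }

    Optional<String> selfOwnerUid() {
      return Optional.ofNullable(selfOwnerUid);
    }
  }

  /** Hypothetical validator: only a PendingDelete triggers the dependency lookup. */
  static List<String> validate(Object subject, Function<String, List<String>> dependentsOf) {
    List<String> issues = new ArrayList<>();
    if (subject instanceof PendingDelete) {
      Object target = ((PendingDelete<?>) subject).target();
      for (String pipeline : dependentsOf.apply(String.valueOf(target))) {
        issues.add("Cannot drop " + target + ": pipeline " + pipeline + " still depends on it");
      }
    }
    return issues;
  }

  /** Fails when validation produced any issue (self-owner filtering is not shown here). */
  static void validateOrThrow(Object subject, Function<String, List<String>> dependentsOf) {
    List<String> issues = validate(subject, dependentsOf);
    if (!issues.isEmpty()) {
      throw new IllegalStateException(String.join("; ", issues));
    }
  }

  public static void main(String[] args) {
    Function<String, List<String>> lookup = id -> List.of("ads-audience");
    validateOrThrow("KAFKA.page_views", lookup); // plain object: no pre-delete check runs
    try {
      validateOrThrow(new PendingDelete<>("KAFKA.page_views"), lookup);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Wrapping the target keeps ordinary validation calls on the same object from running the pre-delete lookup by accident.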

feat: support DROP TABLE for logical tables

LogicalTableDeployer.delete() is now implemented as N-tier-DROPs + 1-CRD-delete:

  1. Per-tier pre-flight: validateOrThrow(new PendingDelete<>(tierSource, logicalTableUid), connection). Active external pipelines block. The selfOwnerUid (the LogicalTable CRD's UID) ensures the implicit inter-tier pipelines don't self-block.
  2. Delete the LogicalTable CRD — owner-ref cascade removes its owned Pipeline and TableTrigger CRDs.
  3. Best-effort per-tier physical cleanup (Kafka topic / Venice store). Failures log and continue: a stranded tier resource is recoverable, whereas a DROP aborted midway is not.
  4. Per-tier schema cleanup: deregister the TemporaryTable only when its physical delete succeeded.
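
The ordering of the four steps above can be sketched as follows. The `drop` method and its callback parameters are hypothetical; they only model the control flow described here (pre-flight for every tier before any state change, then the CRD delete, then best-effort cleanup with deregistration gated on success) and are not the actual LogicalTableDeployer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class LogicalDropSketch {

  /** Runs the drop sequence and returns the tiers whose physical delete failed. */
  static List<String> drop(List<String> tiers,
      Consumer<String> preflight,
      Runnable deleteCrd,
      Consumer<String> physicalDelete,
      Consumer<String> deregister) {
    // 1. Pre-flight every tier first; an exception here aborts before anything changes.
    for (String tier : tiers) {
      preflight.accept(tier);
    }
    // 2. Delete the owning CRD; a failure here propagates to the caller.
    deleteCrd.run();
    List<String> stranded = new ArrayList<>();
    for (String tier : tiers) {
      // 3. Best-effort physical cleanup: log the failure and move on to the next tier.
      try {
        physicalDelete.accept(tier);
      } catch (RuntimeException e) {
        System.err.println("WARN: could not delete tier " + tier + ": " + e.getMessage());
        stranded.add(tier);
        continue;
      }
      // 4. Deregister from the schema only when the physical delete succeeded.
      deregister.accept(tier);
    }
    return stranded;
  }

  public static void main(String[] args) {
    List<String> deregistered = new ArrayList<>();
    List<String> stranded = drop(
        List.of("nearline", "online"),
        tier -> { },
        () -> System.out.println("LogicalTable CRD deleted"),
        tier -> {
          if (tier.equals("online")) {
            throw new IllegalStateException("store busy");
          }
        },
        deregistered::add);
    System.out.println("deregistered=" + deregistered + " stranded=" + stranded);
  }
}
```

Returning the stranded tiers is a choice made for this sketch so a caller could report them; the description above only says that failures are logged.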

test: integration scenarios + cleanup test warnings

  • kafka-ddl-create-table.id — cross-driver scenarios exercising the new check (DROP-while-pipeline-depends-on-it, source-side and partial-view-sink-side).
  • The bulk of the file count is unrelated test-warning cleanup that was queued up against this branch — unused imports, tightened generics. No semantic change.

Test plan

  • ./gradlew test — all unit tests pass.
  • kafka-ddl-create-table.id — cross-driver DROP scenarios exercised end-to-end against a fixture-loaded JDBC topic.
  • logical-ddl.id — DROP TABLE on a logical table now succeeds; cascading nearline-to-online pipeline removal verified.
  • Integration tests against a live K8s environment.
  • Manual: try DROP-ing a table while an MV depends on it — confirm the new error message names the dependent pipeline.

Known caveats

  • No backfill for pre-existing pipelines. The label-selector query only matches Pipelines that have the depends-on-* labels stamped on them, which only happens at create time post-feature. Pipelines deployed before this PR ships are invisible
    to the check and won't block a DROP. To make this a hard correctness invariant, an upgrade-time backfill job needs to re-stamp labels on every existing Pipeline CRD by re-deriving sources/sink from its spec.sql. Out of scope here.
  • MV-on-MV is not specifically guarded. Calcite inlines view sources at plan time, so a Pipeline backing ads.audience2 AS SELECT * FROM ads.audience ends up labeled with the leaf tables (ads.page_views, profile.members), not ads.audience.
    Operationally that's fine — audience2's Flink job never references audience at runtime, so dropping audience doesn't orphan anything. The dangling SQL on audience2's View CRD is self-healing on next CREATE OR REPLACE.

jogrogan and others added 4 commits May 1, 2026 13:14
Refuses a DROP TABLE while an active Pipeline still references the
resource (as either source or sink), so dropping the underlying Kafka
topic / Venice store / MySQL table can't silently orphan a downstream
pipeline.

Validator framework, made Connection-aware:
- Validated.validate(Issues, Connection)              (was: validate(Issues))
- ValidatorProvider.validators(T, Connection)         (was: validators(T))
- ValidationService.validate(T, Issues, Connection)
- ValidationService.validateOrThrow(T, Connection)
- ValidationService.validateOrThrow(Collection<T>, Connection)
- ValidationService.validators(T, Connection)

PendingDelete<T> wrapper (hoptimator-api):
- Explicit "this is being deleted" signal so unrelated callers of
  validateOrThrow(source, connection) don't accidentally trigger
  pre-delete checks.
- Carries an optional selfOwnerUid so cascade-deleted children can be
  excluded from the dependent set.

K8s indexed lookup:
- PipelineDependencyLabels stamps `depends-on-<slug>` labels on every
  Pipeline CRD at create time, naming each source/sink. The slug is a
  16-char SHA-256 prefix of `<database>_<dot-joined-path>`; an
  annotation lists the full identifiers so a slug collision can be
  detected at check time.
- PipelineDependencyChecker uses a server-indexed label-selector list
  + annotation cross-check + selfOwnerUid filter.
- K8sPipelineDeployer threads sources/sink through and calls
  PipelineDependencyLabels.labelsFor / annotationFor at toK8sObject().
  K8sPipelineBundle and K8sMaterializedViewDeployer pass the data
  through.
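
A slug of this shape could be derived as in the sketch below. It assumes the 16 characters are the leading lowercase hex digits of the SHA-256 digest, which the message does not state; the class name, method names and sample inputs are hypothetical and do not reflect the signatures of PipelineDependencyLabels.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

class SlugSketch {

  /** Builds the full identifier: <database>_<dot-joined-path>. */
  static String identifier(String database, List<String> path) {
    return database + "_" + String.join(".", path);
  }

  /** First 16 hex characters of SHA-256(identifier); hex encoding is an assumption. */
  static String slug(String database, List<String> path) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256")
          .digest(identifier(database, path).getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      return hex.substring(0, 16);
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required on every Java platform, so this should not happen.
      throw new IllegalStateException(e);
    }
  }

  /** Label key of the form depends-on-<slug>. */
  static String labelKey(String database, List<String> path) {
    return "depends-on-" + slug(database, path);
  }

  public static void main(String[] args) {
    List<String> path = List.of("KAFKA", "page_views"); // made-up path
    System.out.println(identifier("kafka", path));
    System.out.println(labelKey("kafka", path));
  }
}
```

A fixed-length hex slug keeps the label key short and limited to characters that are valid in Kubernetes label names, while the annotation carries the full identifier for the collision check.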

Dispatch:
- K8sValidatorProvider returns a K8sPipelineDependencyValidator for
  PendingDelete<Source>; registered via
  META-INF/services/com.linkedin.hoptimator.ValidatorProvider.
- K8sPipelineDependencyValidator wraps PipelineDependencyChecker as a
  Validator.

DROP TABLE wiring:
- HoptimatorDdlExecutor calls
  ValidationService.validateOrThrow(new PendingDelete<>(source),
  connection) before DeploymentService.delete in the table branch.
  HoptimatorDdlUtils.removeTableFromSchema() is the symmetric inverse
  of registerTemporaryTableInSchema() for cleanup.

Implementor side-effects (no behavior change):
- KafkaDeployer / VeniceDeployer / MySqlDeployer no longer need a
  declarative DependencyGuarded marker — the guard fires from the
  validator framework before delete() is reached.
- All existing Validated implementors (DefaultValidator,
  CompatibilityValidatorBase, AvroTableValidator, K8sViewTable) and
  ValidatorProvider implementors (DefaultValidatorProvider,
  CompatibilityValidatorProvider, AvroValidatorProvider) updated to
  the new signatures.

Tests: PipelineDependencyLabelsTest, PipelineDependencyCheckerTest,
K8sPipelineDeployerTest assertions for stamping, validator-framework
test updates throughout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

LogicalTableDeployer.delete() previously threw SQLFeatureNotSupported.
Now implemented end-to-end as a per-tier sequence that mirrors what
running DROP TABLE on each tier independently would do, plus the
LogicalTable CRD removal at the top.

Flow:
1. Per-tier pre-flight via the validator framework:
   ValidationService.validateOrThrow(new PendingDelete<>(tierSource,
   logicalTableUid), connection) — refuses the drop if any active
   external pipeline still references a tier resource. The
   selfOwnerUid is the LogicalTable CRD's UID so the implicit
   inter-tier pipelines (owned by the CRD, cascade-deleted with it)
   don't self-block.
2. Delete the LogicalTable CRD. K8s owner-ref cascade removes its
   owned Pipeline and TableTrigger CRDs.
3. Best-effort physical cleanup of each tier resource (Kafka topic,
   Venice store, ...). A failed tier delete logs a warning but does
   not abort — a stranded tier is recoverable; aborting mid-DROP
   isn't.
4. Per-tier schema cleanup: deregister the TemporaryTable entry in
   each tier schema only when its physical delete succeeded.

Tests:
- LogicalTableDeployerTest deleteRemovesCrdAndCleansUpTierResources,
  deletePropagatesCrdDeletionFailure, deleteSwallowsTierCleanupFailures.
- logical-ddl.id integration test: DROP TABLE LOGICAL.testevent now
  succeeds and cascades the implicit nearline-to-online pipeline.
- logical-offline-ddl.id companion update.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kafka-ddl-create-table.id: cross-driver dependency-guard scenarios
exercising the new pre-delete check end-to-end through the kafka
driver — drop-table-while-pipeline-depends-on-it (source side and
partial-view sink side).

The bulk of the file count is mechanical noise reduction across
existing test files: dropped unused imports, tightened generics on
@SuppressWarnings, etc. — fallout from the warning_cleanup pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jogrogan jogrogan force-pushed the jogrogan/dependency-guard-validator-v2 branch from 7c19cc8 to 6b2b705 Compare May 1, 2026 17:15
github-actions Bot commented May 1, 2026

Code Coverage

| Overall Project | 84.51% -0.16% 🟢 |
|:-|:-|
| Files changed | 93.86% 🟢 |

| File | Coverage |
|:-|:-|
| K8sValidatorProvider.java | 100% 🟢 |
| K8sYamlApi.java | 100% 🟢 |
| K8sPipelineDependencyValidator.java | 100% 🟢 |
| AvroValidatorProvider.java | 100% 🟢 |
| Validator.java | 100% 🟢 |
| PendingDelete.java | 100% 🟢 |
| ValidationService.java | 100% 🟢 |
| DefaultValidatorProvider.java | 100% 🟢 |
| CompatibilityValidatorProvider.java | 100% 🟢 |
| K8sApi.java | 99.64% 🟢 |
| K8sPipelineDeployer.java | 97.67% -2.33% 🟢 |
| KafkaDeployer.java | 97.04% 🟢 |
| MySqlDeployer.java | 96.84% 🟢 |
| PipelineDependencyLabels.java | 96.41% -3.59% 🟢 |
| AvroTableValidator.java | 96.35% 🟢 |
| K8sMaterializedViewDeployer.java | 95.85% -4.15% 🟢 |
| K8sViewTable.java | 95.42% 🟢 |
| CompatibilityValidatorBase.java | 92.96% 🟢 |
| HoptimatorDdlExecutor.java | 92.83% 🟢 |
| PipelineDependencyChecker.java | 90.96% -9.04% 🟢 |
| K8sPipelineBundle.java | 87.79% -7.63% |
| HoptimatorDdlUtils.java | 85.63% -0.14% 🟢 |
| LogicalTableDeployer.java | 76.67% -0.93% 🟢 |
| VeniceDeployer.java | 52.92% 🟢 |

@jogrogan jogrogan force-pushed the jogrogan/dependency-guard-validator-v2 branch from 60fb5e0 to e234cec Compare May 1, 2026 19:40
@jogrogan jogrogan marked this pull request as ready for review May 2, 2026 15:51