
feat: Multi-execution smoke test scenarios (schema evolution, sequential syncs) #1001

@devin-ai-integration


Multi-Execution Smoke Test Scenarios

Problem

The current source-smoke-test architecture only supports single-sync scenarios: one catalog, one set of records, one execution. This means we cannot test important destination behaviors that require multiple sequential syncs, such as:

  • Schema evolution (add/drop/change columns between syncs)
  • Truncate refresh (overwrite existing data)
  • Incremental append after initial load

These are among the most common sources of destination bugs and are well-covered by the CDK's BasicFunctionalityIntegrationTest and connector-specific component tests (e.g. SnowflakeTableSchemaEvolutionTest), but have no coverage in the lightweight smoke test framework.

Proposed Solution

Extend the scenario format to support multi-part scenarios — an ordered list of executions within a single scenario. Each execution would specify:

  1. A catalog (stream schema + sync mode)
  2. Records to emit
  3. (Optional) Whether to run a check operation before this execution

Example structure (conceptual):

```yaml
name: schema_evolution_add_column
description: "Adds a column between two syncs"
executions:
  - catalog:
      json_schema:
        properties:
          id: {type: integer}
          name: {type: string}
      primary_key: [["id"]]
    records:
      - {id: 1, name: "Alice"}
  - catalog:
      json_schema:
        properties:
          id: {type: integer}
          name: {type: string}
          email: {type: string}
      primary_key: [["id"]]
    records:
      - {id: 2, name: "Bob", email: "bob@example.com"}
```
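The structure above could be modeled with a small set of dataclasses on the orchestrator side. This is a sketch under stated assumptions — the `Scenario` and `Execution` class names and the `run_check` field name are hypothetical, not existing framework types:

```python
# Hypothetical model of the extended scenario format; class and field
# names are illustrative, not part of the current framework.
from dataclasses import dataclass
from typing import Any


@dataclass
class Execution:
    catalog: dict[str, Any]           # json_schema, primary_key, sync mode, etc.
    records: list[dict[str, Any]]
    run_check: bool = False           # opt-in check before this execution


@dataclass
class Scenario:
    name: str
    description: str
    executions: list[Execution]

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "Scenario":
        # Build a Scenario from the parsed YAML mapping.
        return cls(
            name=raw["name"],
            description=raw.get("description", ""),
            executions=[
                Execution(
                    catalog=e["catalog"],
                    records=e.get("records", []),
                    run_check=e.get("run_check", False),
                )
                for e in raw["executions"]
            ],
        )
```

Keeping `run_check` defaulted to `False` matches the opt-in check behavior discussed below, so existing single-execution scenarios stay valid as one-element `executions` lists.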

Design Considerations

  • Two executions are likely sufficient — schema evolution and most multi-sync scenarios only need a before/after pair; there is currently no known need for three-part scenarios.
  • Check operation: Evaluate whether including a check between executions adds value. It requires a full container spin-up/spin-down, so it should be opt-in per execution rather than automatic, to avoid unnecessary slowdown.
  • Discovery is not needed: The smoke test source declares its own catalog via scenarios — there's no external schema to discover.
  • Orchestrator changes: The run_destination_smoke_test() function (and the MCP tool / CLI) would need to understand that a scenario can be an ordered list of executions, running them sequentially against the same destination namespace.
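The orchestrator change could look roughly like the following. This is a minimal sketch assuming a scenario object with an `executions` list and a destination handle exposing `check()` and `sync()` — those method names and parameters are assumptions for illustration, not the real API of `run_destination_smoke_test()`:

```python
# Hypothetical extension of the orchestrator loop; the destination's
# check()/sync() interface shown here is assumed, not the actual API.
def run_destination_smoke_test(scenario, destination, namespace):
    """Run each execution in order against the same destination namespace."""
    for execution in scenario.executions:
        if execution.run_check:
            # Opt-in only: a check requires a full container spin-up/spin-down.
            destination.check()
        # All executions target the same namespace, so later syncs see the
        # tables created (or altered) by earlier ones.
        destination.sync(
            catalog=execution.catalog,
            records=execution.records,
            namespace=namespace,
        )
```

Running every execution against one namespace is the key property: it is what lets the second sync exercise schema evolution or refresh behavior against the first sync's tables.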

Candidate Multi-Execution Scenarios

  1. schema_evolution_add_column — Sync with schema v1, then sync with schema v2 (new column added)
  2. schema_evolution_drop_column — Sync with schema v1, then sync with schema v2 (column removed)
  3. schema_evolution_change_type — Sync with a string column, then change it to integer
  4. overwrite_refresh — Initial sync, then a second sync that replaces all data
  5. incremental_append — Initial sync with records 1-3, then second sync with records 4-6

Context

This issue was identified during an analysis of coverage gaps between destination-snowflake integration tests and source-smoke-test scenarios. The integration tests cover schema evolution, table operations, and sync mode transitions extensively, but source-smoke-test currently cannot test any of these because it only supports single-sync execution.

See also: Devin session for the full coverage analysis.

Labels

enhancement (New feature or request)