Add Airbyte protocol v2 compliance (stream status, per-stream state) #149

Open
Smidge wants to merge 3 commits into planetscale:main from anam-org:upstream-pr

Smidge commented Mar 31, 2026

Problem

The connector does not work with Airbyte 2.x. Every sync fails with one of two errors:

  1. Missing stream status messages:

    Input was fully read, but some streams did not receive a terminal stream status message.
    
  2. LEGACY state rejection (on incremental syncs):

    java.lang.IllegalArgumentException: LEGACY states are deprecated.
    

Stream status messages and per-stream state were optional in Airbyte 1.x but are enforced in 2.x (protocol v0.2.0+).

Root Cause

The connector has three issues:

  1. No stream status trace messages. Airbyte 2.x requires STARTED and COMPLETE/INCOMPLETE status messages for every stream. The connector emits none.

  2. Legacy global state format. The connector emits a single global state blob ({"data": {"streams": {...}}}). Airbyte 2.x requires per-stream state messages with {"type": "STREAM", "stream": {"stream_descriptor": {...}, "stream_state": {...}}}.

  3. Cannot parse v2 state input. On incremental syncs, Airbyte 2.x passes state back as a JSON array of per-stream state objects. The connector only understands the legacy global format, so json.Unmarshal fails and the process exits before reading any data.
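
One way to accept both input shapes is to branch on the first non-whitespace byte of the state file: `[` means the v2 per-stream array, anything else is treated as the legacy blob. The sketch below is a minimal illustration, not the connector's code. It assumes the legacy input arrives as {"streams": {...}} and that per-stream state is keyed as "namespace:name"; both are assumptions, and the helper and type names (parseStateInput, perStreamState, legacySyncState) are hypothetical. The connector's real SyncState and ShardStates types are not reproduced here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// streamDescriptor mirrors the protocol's stream_descriptor object.
type streamDescriptor struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

// perStreamState mirrors one element of the v2 per-stream state array.
type perStreamState struct {
	Type   string `json:"type"`
	Stream struct {
		StreamDescriptor streamDescriptor `json:"stream_descriptor"`
		StreamState      json.RawMessage  `json:"stream_state"`
	} `json:"stream"`
}

// legacySyncState is a hypothetical stand-in for the legacy global blob.
type legacySyncState struct {
	Streams map[string]json.RawMessage `json:"streams"`
}

// parseStateInput returns raw per-stream state keyed by "namespace:name"
// (a key format assumed for this sketch). Empty namespaces default to
// defaultNamespace so keys match what the read loop looks up.
func parseStateInput(raw []byte, defaultNamespace string) (map[string]json.RawMessage, error) {
	out := map[string]json.RawMessage{}
	trimmed := bytes.TrimSpace(raw)
	if len(trimmed) == 0 {
		return out, nil
	}
	if trimmed[0] == '[' {
		var entries []perStreamState
		if err := json.Unmarshal(trimmed, &entries); err != nil {
			return nil, fmt.Errorf("parse per-stream state: %w", err)
		}
		for _, e := range entries {
			if e.Type != "STREAM" {
				continue // GLOBAL or LEGACY entries are out of scope for this sketch
			}
			d := e.Stream.StreamDescriptor
			ns := d.Namespace
			if ns == "" {
				ns = defaultNamespace
			}
			out[ns+":"+d.Name] = e.Stream.StreamState
		}
		return out, nil
	}
	// Fallback: legacy global format.
	var legacy legacySyncState
	if err := json.Unmarshal(trimmed, &legacy); err != nil {
		return nil, fmt.Errorf("parse legacy state: %w", err)
	}
	for k, v := range legacy.Streams {
		out[k] = v
	}
	return out, nil
}

func main() {
	v2 := []byte(`[{"type":"STREAM","stream":{"stream_descriptor":{"name":"users"},"stream_state":{"shards":{}}}}]`)
	states, err := parseStateInput(v2, "mydb")
	if err != nil {
		panic(err)
	}
	for k, v := range states {
		fmt.Println(k, string(v)) // mydb:users {"shards":{}}
	}
}
```

Keeping stream_state as json.RawMessage defers decoding of the shard cursors to the code that owns that type, so the parser only has to know the envelope.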

Changes

Commit 1: Add stream status trace messages

  • Add TRACE message type and STREAM_STATUS constants (STARTED, COMPLETE, INCOMPLETE)
  • Add StreamDescriptor, AirbyteStreamStatus, AirbyteTraceMessage types
  • Replace State(SyncState) with StreamState(namespace, streamName, ShardStates) that emits per-stream state with type=STREAM
  • Add StreamStatus() method to emit stream lifecycle trace messages
  • Update AirbyteLogger interface and test mock

Commit 2: Update read loop and state parsing

  • Emit STARTED before reading each stream, COMPLETE on success, INCOMPLETE on error
  • Replace os.Exit(1) with break on per-stream errors so remaining streams still get status messages
  • Emit per-stream STATE after each stream completes (not one global blob at the end)
  • Parse Airbyte v2 per-stream state array on incremental syncs, with fallback to legacy format
  • Default empty namespace to source database name to prevent state key mismatches

Commit 3: Tests

  • 8 logger unit tests (per-stream state format, no legacy fields, stream status traces, JSON round-trip)
  • 6 read protocol integration tests (message ordering, multi-shard state, error handling, per-stream state emission)
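
A helper in the spirit of the message-ordering test might look like the sketch below. It is hypothetical and not taken from the PR's test suite: it assumes output lines have already been decoded into protoMsg values with RECORD messages filtered out, and it enforces STARTED -> STATE -> COMPLETE for successful streams and STARTED -> INCOMPLETE (with no STATE) for failed ones.

```go
package main

import "fmt"

// protoMsg is a hypothetical decoded view of one TRACE or STATE output line;
// RECORD messages are assumed to have been filtered out already.
type protoMsg struct {
	Type   string // "TRACE" or "STATE"
	Stream string
	Status string // STARTED, COMPLETE or INCOMPLETE for TRACE; empty for STATE
}

// checkStreamOrdering requires, per stream, STARTED -> STATE (one or more) -> COMPLETE,
// or STARTED -> INCOMPLETE with no STATE in between.
func checkStreamOrdering(msgs []protoMsg) error {
	seq := map[string][]string{}
	var order []string // streams in first-seen order, for deterministic errors
	for _, m := range msgs {
		tok := m.Status
		if m.Type == "STATE" {
			tok = "STATE"
		}
		if _, seen := seq[m.Stream]; !seen {
			order = append(order, m.Stream)
		}
		seq[m.Stream] = append(seq[m.Stream], tok)
	}
	for _, s := range order {
		toks := seq[s]
		n := len(toks)
		if n < 2 || toks[0] != "STARTED" {
			return fmt.Errorf("stream %s: want STARTED first and a terminal status last, got %v", s, toks)
		}
		switch toks[n-1] {
		case "INCOMPLETE":
			if n != 2 {
				return fmt.Errorf("stream %s: INCOMPLETE stream emitted extra messages: %v", s, toks)
			}
		case "COMPLETE":
			if n < 3 {
				return fmt.Errorf("stream %s: COMPLETE without a preceding STATE", s)
			}
			for _, t := range toks[1 : n-1] {
				if t != "STATE" {
					return fmt.Errorf("stream %s: unexpected %s between STARTED and COMPLETE", s, t)
				}
			}
		default:
			return fmt.Errorf("stream %s: missing terminal status, got %v", s, toks)
		}
	}
	return nil
}

func main() {
	msgs := []protoMsg{
		{"TRACE", "users", "STARTED"},
		{"STATE", "users", ""},
		{"TRACE", "users", "COMPLETE"},
		{"TRACE", "orders", "STARTED"},
		{"TRACE", "orders", "INCOMPLETE"},
	}
	fmt.Println(checkStreamOrdering(msgs)) // <nil>
}
```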

Testing

  • All existing tests pass
  • New tests cover the protocol changes
  • Tested against Airbyte OSS v2.0.2 with 28 PlanetScale streams: both initial (full refresh) and incremental syncs succeed
  • Verified that docker run ... read | grep STATE shows per-stream state with "type":"STREAM"
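
The grep check can be made stricter by decoding each output line. The small filter below is a hypothetical sketch (isLegacyState and outputLine are invented names, and the program is not part of the PR): it reads JSON lines from stdin, skips non-JSON and non-STATE lines, and fails if any STATE message lacks "type":"STREAM" or carries a legacy "data" field. It could be run as docker run ... read | go run with this file.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// outputLine decodes only the fields needed to classify a STATE message.
type outputLine struct {
	Type  string `json:"type"`
	State *struct {
		Type string          `json:"type"`
		Data json.RawMessage `json:"data"`
	} `json:"state"`
}

// isLegacyState reports whether one output line is a STATE message that is not
// per-stream. Lines that are not JSON, or are not STATE messages, report false.
func isLegacyState(line []byte) bool {
	var m outputLine
	if err := json.Unmarshal(line, &m); err != nil || m.Type != "STATE" {
		return false
	}
	return m.State == nil || m.State.Type != "STREAM" || len(m.State.Data) > 0
}

func main() {
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024) // RECORD lines can exceed the 64 KiB default
	legacy := 0
	for sc.Scan() {
		if isLegacyState(sc.Bytes()) {
			legacy++
		}
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if legacy > 0 {
		fmt.Printf("%d non-STREAM STATE message(s)\n", legacy)
		os.Exit(1)
	}
	fmt.Println("all STATE messages are per-stream")
}
```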
  • Verified STARTED and COMPLETE trace messages appear for each stream

Compatibility

  • Backwards compatible with the legacy state format (falls back automatically)
  • No changes to the connector spec, check, or discover commands
  • No changes to the Dockerfile or CI configuration

Smidge added 3 commits March 31, 2026 10:57
Airbyte 2.x requires sources to emit STREAM_STATUS trace messages
(STARTED, COMPLETE, INCOMPLETE) for each stream. Without these,
every sync fails with:

  "streams did not receive a terminal stream status message"

Changes:
- Add TRACE message type and stream status constants to types.go
- Add StreamDescriptor, AirbyteStreamStatus, AirbyteTraceMessage types
- Replace legacy global State() with per-stream StreamState() that
  emits state.type=STREAM (required by Airbyte 2.x, which rejects
  the LEGACY format with IllegalArgumentException)
- Add StreamStatus() method to emit STARTED/COMPLETE/INCOMPLETE traces
- Update AirbyteLogger interface and test mock accordingly

Update the read command to be fully compatible with Airbyte 2.x:

Read loop changes:
- Emit STARTED before reading each stream
- Emit COMPLETE after successful read, INCOMPLETE on error
- Replace os.Exit(1) with break on per-stream errors so remaining
  streams still get status messages
- Emit per-stream STATE (type=STREAM) after each stream completes
  instead of one global state blob at the end

State parsing changes:
- Handle Airbyte v2 per-stream state format on incremental syncs.
  Airbyte 2.x passes state back as a JSON array of per-stream state
  objects, not the legacy global SyncState blob. Without this, the
  second sync always fails because json.Unmarshal fails on the array
  format, causing os.Exit(1) before any streams are processed.
- Fall back to legacy format for backwards compatibility
- Default empty namespace to source database name to prevent state
  key mismatches

Logger tests:
- StreamState emits correct per-stream format with type=STREAM
- Multiple shards included in state output
- No legacy "data" field present (would cause LEGACY rejection)
- StreamStatus emits TRACE messages with correct status values
- JSON round-trip matches exact Airbyte protocol v2 structure

Read protocol tests:
- Read emits per-stream STATE, not legacy global state
- STARTED and COMPLETE emitted for each configured stream
- Correct message ordering: STARTED -> STATE -> COMPLETE
- Multi-shard state contains all shard cursors
- Read errors emit INCOMPLETE and skip state emission
orware requested a review from maxenglander April 1, 2026 16:02