Skip to content

feat: Add preflight check to destination smoke test tool#1007

Merged
Aaron ("AJ") Steers (aaronsteers) merged 9 commits intomainfrom
devin/1774553037-preflight-smoke-test
Mar 26, 2026
Merged

feat: Add preflight check to destination smoke test tool#1007
Aaron ("AJ") Steers (aaronsteers) merged 9 commits intomainfrom
devin/1774553037-preflight-smoke-test

Conversation

@aaronsteers
Copy link
Copy Markdown
Member

@aaronsteers Aaron ("AJ") Steers (aaronsteers) commented Mar 26, 2026

Summary

Closes #1006.

Adds an automatic preflight check to the destination smoke test tool. Before running the user-requested scenarios, the tool now writes basic_types to the destination as a lightweight connectivity/credential validation. If the preflight fails, the tool returns early with a structured error instead of running all scenarios against a broken destination.

Also significantly improves error surfacing throughout the smoke test:

  • _sanitize_error() now resolves errors in priority order: (1) connector log file TRACE ERROR internal_message, (2) original_exception chain, (3) top-level exception. This means failures like bad credentials now show java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available... instead of the generic AirbyteConnectorWriteError: Connector failed.
  • Readback failures (e.g. cache connection issues that cause table_statistics: null) are now captured in a new warnings field instead of being silently logged.

Changes:

  • DestinationSmokeTestResult gets preflight_passed: bool | None (True/False/None=skipped) and warnings: list[str] | None
  • _run_preflight() runs basic_types write, returns (passed, sanitized_error)
  • _extract_trace_error_from_log() parses the connector's log file for TRACE ERROR JSON and extracts the internal_message field
  • _sanitize_error() uses a 3-tier resolution: log file → exception chain → top-level message
  • run_destination_smoke_test() accepts skip_preflight: bool = False
  • Readback exceptions are captured into warnings with sanitized messages
  • MCP tool and CLI both accept the new skip_preflight parameter
  • _prepare_destination_config() moved before preflight so both preflight and main run use the prepared config

Updates since last revision

  • Log file error extraction: Added _extract_trace_error_from_log() to parse the connector's log file for TRACE ERROR internal_message. Many Java-based connectors emit a generic message (e.g. "Something went wrong in the connector") while the actionable detail lives only in internal_message. This is now tried first in _sanitize_error().
  • Preflight error preserved: _run_preflight() returns tuple[bool, str | None] so the sanitized connector error is included in the result instead of being replaced by a generic wrapper.
  • Readback warnings: Readback failures are now captured in a warnings field on the result model, making it clear why table_statistics is null (e.g. PyAirbyteSecretNotFoundError: Secret not found. for Snowflake).

Local test results

Postgres (Docker container, destination-postgres):

  • skip_preflight=True, basic_types — success, full table_statistics with column stats
  • ✅ Preflight enabled, basic_typespreflight_passed: true, full table_statistics
  • ✅ Bad credentials — preflight_passed: false, error: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 60001ms

Snowflake (GSM creds, destination-snowflake):

  • skip_preflight=True, basic_types — success, 3 records delivered
  • ✅ Preflight enabled — preflight_passed: true, 3 records delivered
  • ⚠️ table_statistics: null with warning: PyAirbyteSecretNotFoundError: Secret not found. — pre-existing bug in _dest_to_cache.py where Snowflake credentials dict lacks a password attribute (outside scope of this PR, but now surfaced via warnings)

Review & Testing Checklist for Human

  • _extract_trace_error_from_log reads entire log file into memory: For long-running syncs with verbose logging, the log file could be large. The function reads all lines then scans in reverse. Confirm this is acceptable for expected log sizes.
  • _sanitize_error prefers log file over exception chain: The log file internal_message is tried first. If the log contains a TRACE ERROR from a different failure than the one that raised the exception, the wrong message could be surfaced. Verify this ordering is correct for the common case.
  • _run_preflight catches broad Exception: Intentional (preflight should never crash the main flow), but verify this aligns with the project's exception handling philosophy.
  • basic_types may be written twice: If the user's scenarios include basic_types (e.g. fast), the preflight writes it first, then it's written again. Confirm append semantics are acceptable.
  • Test with a real destination: Run pyab destination-smoke-test --destination=destination-postgres with good and bad credentials. Verify the error message in the bad-credentials case is actionable (not "Connector failed.").

Notes

  • No new automated tests added — this feature primarily needs integration testing with real destinations, which is credential-gated.
  • The Snowflake table_statistics: null issue is a pre-existing bug in snowflake_destination_to_cache() where DestinationSnowflake.credentials is parsed as a dict without a password attribute. This PR surfaces the issue via warnings but does not fix the root cause.
  • elapsed_seconds is 0.0 in the preflight-failure early return since the preflight is not timed separately.

Link to Devin session: https://app.devin.ai/sessions/e2af697f2eb64507b4bbbced88c9af6c
Requested by: Aaron ("AJ") Steers (@aaronsteers)

Summary by CodeRabbit

Release Notes

  • New Features

    • Destination smoke tests now perform automatic preflight validation before main tests
    • Added --skip-preflight CLI option to optionally disable preflight checks
    • Tests now collect and report warnings for non-fatal issues encountered
  • Bug Fixes

    • Improved error detection and reporting with enhanced error message extraction from connector logs

Open with Devin

Important

Auto-merge enabled.

This PR is set to merge automatically when all requirements are met.

- Run basic_types scenario before requested scenarios to validate
  credentials and connectivity
- Return early with structured error if preflight fails
- Add skip_preflight parameter to core function, MCP tool, and CLI
- Improve error surfacing by walking exception chain to find the
  actual connector error message instead of generic wrapper
- Add preflight_passed field to DestinationSmokeTestResult model

Closes #1006

Co-Authored-By: AJ Steers <aj@airbyte.io>
@devin-ai-integration
Copy link
Copy Markdown
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Copy link
Copy Markdown

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

Testing This PyAirbyte Version

You can test this version of PyAirbyte using the following:

# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1774553037-preflight-smoke-test' pyairbyte --help

# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1774553037-preflight-smoke-test'

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /fix-pr - Fixes most formatting and linting issues
  • /uv-lock - Updates uv.lock file
  • /test-pr - Runs tests with the updated PyAirbyte
  • /prerelease - Builds and publishes a prerelease version to PyPI
📚 Show Repo Guidance

Helpful Resources

Community Support

Questions? Join the #pyairbyte channel in our Slack workspace.

📝 Edit this welcome message.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds an optional preflight (basic_types) write to destination smoke tests (skippable), surfaces connector internal errors from logs, returns preflight_passed and warnings in results, and exposes --skip-preflight CLI flag and MCP parameter.

Changes

Cohort / File(s) Summary
Core Preflight & Result Model
airbyte/_util/destination_smoke_tests.py
Introduced PREFLIGHT_SCENARIO/PREFLIGHT_STREAM_NAME, _build_preflight_scenario(...), _run_preflight(...); added `preflight_passed: bool
CLI Integration
airbyte/cli/pyab.py
Added --skip-preflight click option and forwarded skip_preflight into run_destination_smoke_test(...).
MCP Tool Integration
airbyte/mcp/local.py
Added skip_preflight: bool = False parameter to destination_smoke_test(...) and forwarded it to run_destination_smoke_test(...) with updated help text.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant User
participant CLI_MCP as CLI/MCP
participant Runner as SmokeTestRunner
participant Destination
participant Logger

User->>CLI_MCP: invoke destination_smoke_test (maybe --skip-preflight)
CLI_MCP->>Runner: run_destination_smoke_test(skip_preflight)
Runner->>Runner: _prepare_destination_config()
alt preflight not skipped
    Runner->>Destination: write(PREFLIGHT_SCENARIO, cache=False, state_cache=False)
    Destination-->>Runner: success / exception
    alt exception
        Runner->>Runner: sanitize error (log trace extraction)
        Runner->>Logger: log preflight failure (exc_info)
        Runner-->>CLI_MCP: return { success:false, records_delivered:0, error, preflight_passed:false }
        CLI_MCP-->>User: output result
        return
    end
end
Runner->>Destination: write(requested scenarios)
Destination-->>Runner: records written / exception
Runner->>Destination: readback()
Destination-->>Runner: readback data / exception
Runner->>Runner: collect/sanitize warnings (including readback failures)
Runner->>Logger: log warnings (if any)
Runner-->>CLI_MCP: return result(success, records_delivered, preflight_passed, warnings)
CLI_MCP-->>User: output result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Would you like a brief changelog entry or an example CLI usage line for --skip-preflight, wdyt?

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main feature: adding a preflight check to the destination smoke test tool, which aligns with the primary changeset objective.
Linked Issues check ✅ Passed The PR implements all coding requirements from issue #1006: preflight check with basic_types, early return on failure, skip_preflight parameter, improved error surfacing via trace logs, and warnings field for readback failures.
Out of Scope Changes check ✅ Passed All changes are directly aligned with the preflight check feature scope: smoke test utility, CLI integration, MCP integration, and error sanitization improvements.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch devin/1774553037-preflight-smoke-test

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
airbyte/_util/destination_smoke_tests.py (1)

269-283: Consider handling non-PyAirbyte exceptions in the chain?

The chain-walking logic is a nice improvement! One thing I noticed: if the deepest exception is a standard Python exception (like ValueError or ConnectionError) without get_message(), we skip it and fall back to the wrapper's message. This could lose the actual root cause message.

For example, with a chain like AirbyteConnectorWriteError(original_exception=ConnectionError("Connection refused")), we'd still get the generic wrapper message instead of "Connection refused".

What do you think about adding a fallback for non-PyAirbyte exceptions, something like this?

💡 Suggested enhancement
     # If we found a deeper exception with a specific message, use it
-    if deepest is not ex and hasattr(deepest, "get_message"):
+    if deepest is not ex:
+      if hasattr(deepest, "get_message"):
         deep_msg = deepest.get_message()
         # Only use the deep message if it's more specific than the wrapper
         if deep_msg and hasattr(ex, "get_message") and deep_msg != ex.get_message():
             return f"{type(ex).__name__}: {deep_msg}"
+      else:
+        # For standard exceptions, use str() to get the message
+        deep_msg = str(deepest)
+        if deep_msg and hasattr(ex, "get_message") and deep_msg != ex.get_message():
+            return f"{type(ex).__name__}: {deep_msg}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@airbyte/_util/destination_smoke_tests.py` around lines 269 - 283, The
chain-walking currently prefers deepest exceptions only if they implement
get_message(); update the logic in destination_smoke_tests.py around variables
ex and deepest so that if deepest lacks get_message() but str(deepest) yields a
non-empty, more specific message than ex.get_message(), you return
f"{type(ex).__name__}: {str(deepest)}" (i.e., treat standard Python exceptions
like ConnectionError/ValueError as valid root-cause messages); retain the
existing fallback to ex.get_message() and final fallback to str(ex) if no
get_message() exists.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@airbyte/_util/destination_smoke_tests.py`:
- Around line 269-283: The chain-walking currently prefers deepest exceptions
only if they implement get_message(); update the logic in
destination_smoke_tests.py around variables ex and deepest so that if deepest
lacks get_message() but str(deepest) yields a non-empty, more specific message
than ex.get_message(), you return f"{type(ex).__name__}: {str(deepest)}" (i.e.,
treat standard Python exceptions like ConnectionError/ValueError as valid
root-cause messages); retain the existing fallback to ex.get_message() and final
fallback to str(ex) if no get_message() exists.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9307797b-c09e-41ef-b161-6182b4909d09

📥 Commits

Reviewing files that changed from the base of the PR and between 65e1c03 and ab5c061.

📒 Files selected for processing (3)
  • airbyte/_util/destination_smoke_tests.py
  • airbyte/cli/pyab.py
  • airbyte/mcp/local.py

…walking

- Add 'warnings' field to DestinationSmokeTestResult to surface
  readback failures (e.g. cache connection issues) that cause
  table_statistics to be null
- Improve _sanitize_error to handle standard Python exceptions
  (ConnectionError, ValueError, etc.) in the exception chain,
  not just PyAirbyte exceptions with get_message()

Co-Authored-By: AJ Steers <aj@airbyte.io>
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 26, 2026

PyTest Results (Fast Tests Only, No Creds)

343 tests  ±0   343 ✅ ±0   5m 38s ⏱️ +16s
  1 suites ±0     0 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit e100ed2. ± Comparison against base commit 65e1c03.

♻️ This comment has been updated with latest results.

coderabbitai[bot]

This comment was marked as resolved.

_run_preflight() now returns (bool, str | None) so the caller
includes the sanitized connector error in the structured result
instead of a generic 'verify credentials' message.

Co-Authored-By: AJ Steers <aj@airbyte.io>
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 26, 2026

PyTest Results (Full)

413 tests  ±0   395 ✅ ±0   40m 16s ⏱️ + 15m 4s
  1 suites ±0    18 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit e100ed2. ± Comparison against base commit 65e1c03.

♻️ This comment has been updated with latest results.

…ter error surfacing

Co-Authored-By: AJ Steers <aj@airbyte.io>
coderabbitai[bot]

This comment was marked as resolved.

devin-ai-integration bot and others added 2 commits March 26, 2026 20:50
…main run

Co-Authored-By: AJ Steers <aj@airbyte.io>
The top-level import of PREDEFINED_SCENARIOS from
airbyte.cli.smoke_test_source._scenarios caused a circular import
that broke test collection.  Replace with inline scenario data
(mirrors the basic_types schema) to avoid the cycle.

Co-Authored-By: AJ Steers <aj@airbyte.io>
@aaronsteers Aaron ("AJ") Steers (aaronsteers) marked this pull request as ready for review March 26, 2026 20:56
Copilot AI review requested due to automatic review settings March 26, 2026 20:56
coderabbitai[bot]

This comment was marked as resolved.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an automatic preflight connectivity/credentials check to the destination smoke test workflow and improves how failures and non-fatal readback issues are surfaced to callers (CLI + MCP).

Changes:

  • Introduces skip_preflight flag to CLI and MCP tool interfaces.
  • Adds preflight execution (writes a small basic_types-derived scenario) and returns early with a structured error on failure.
  • Enhances error sanitization (log TRACE internal_message extraction, exception-chain selection) and surfaces readback failures as warnings.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
airbyte/mcp/local.py Adds skip_preflight parameter to the MCP tool and forwards it into the shared smoke test runner.
airbyte/cli/pyab.py Adds --skip-preflight CLI flag and forwards it into the shared smoke test runner.
airbyte/_util/destination_smoke_tests.py Implements preflight logic, extends result model with preflight_passed/warnings, and improves error/warning surfacing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…tatic logger template

Co-Authored-By: AJ Steers <aj@airbyte.io>
devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration bot and others added 2 commits March 26, 2026 21:01
… file reading

Co-Authored-By: AJ Steers <aj@airbyte.io>
…ions

Co-Authored-By: AJ Steers <aj@airbyte.io>
@aaronsteers Aaron ("AJ") Steers (aaronsteers) merged commit 3873093 into main Mar 26, 2026
22 checks passed
@aaronsteers Aaron ("AJ") Steers (aaronsteers) deleted the devin/1774553037-preflight-smoke-test branch March 26, 2026 21:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Add preflight check to destination smoke test tool

2 participants