feat(connectors): add source-smoke-test connector for destination regression testing #74058

Closed
Aaron ("AJ") Steers (aaronsteers) wants to merge 2 commits into master from devin/1772092664-source-smoke-test

@aaronsteers
Member

@aaronsteers Aaron ("AJ") Steers (aaronsteers) commented Feb 26, 2026

What

Adds a new source-smoke-test connector that generates synthetic data across 15 predefined scenarios designed to exercise common destination failure patterns. This is the monorepo counterpart to the PyAirbyte implementation in airbytehq/PyAirbyte#982.

Closes: airbytehq/PyAirbyte#981

How

Self-contained Python CDK-based source connector with:

  • source.py: SourceSmokeTest class implementing spec, check, discover, read
  • scenarios.py — 15 predefined scenario definitions covering type variations, null handling, naming edge cases, schema variations, batch sizes, and unicode strings
  • Config-driven stream selection via all_fast_streams (default true) and all_slow_streams (default false) boolean flags, plus an explicit scenario_filter list
  • Dynamic scenario injection via custom_scenarios config field
  • Standard connector scaffolding: metadata.yaml, pyproject.toml, main.py, README.md, icon.svg
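
As a rough illustration of the config surface listed above, a config for this connector might be assembled and sanity-checked as follows. The four field names and the two boolean defaults come from this PR. The helper names `with_defaults` and `validate_custom_scenarios`, the empty-list defaults, the stream name `my_custom_stream`, and the keys of a custom scenario entry (`name`, `json_schema`, `records`) are assumptions made for the sketch, not the connector's confirmed spec.

```python
from typing import Any

# Defaults for the two boolean flags are stated in the PR description.
# The empty-list defaults for the two list fields are an assumption.
DEFAULT_CONFIG: dict[str, Any] = {
    "all_fast_streams": True,
    "all_slow_streams": False,
    "scenario_filter": [],
    "custom_scenarios": [],
}


def with_defaults(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `config` with any missing field filled from DEFAULT_CONFIG."""
    return {**DEFAULT_CONFIG, **config}


def validate_custom_scenarios(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means the structure looks fine.

    The required keys checked here are hypothetical, chosen for illustration only.
    """
    problems: list[str] = []
    for index, scenario in enumerate(config.get("custom_scenarios") or []):
        if not isinstance(scenario, dict):
            problems.append(f"custom_scenarios[{index}] is not an object")
            continue
        for key in ("name", "json_schema", "records"):
            if key not in scenario:
                problems.append(f"custom_scenarios[{index}] is missing '{key}'")
    return problems


example_config = with_defaults(
    {
        "all_slow_streams": True,
        "scenario_filter": ["large_batch_stream"],
        "custom_scenarios": [
            {
                "name": "my_custom_stream",  # hypothetical stream name
                "json_schema": {"type": "object", "properties": {"id": {"type": "integer"}}},
                "records": [{"id": 1}],
            }
        ],
    }
)
```

Returning a list of problems rather than raising keeps the sketch easy to plug into a `check`-style call that reports every structural issue at once.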

Review guide

Start here — these are the files that matter:

  1. source_smoke_test/source.py — Core connector logic. Key areas:

    • _get_all_scenarios() (line ~190): scenario selection/filtering logic combining boolean flags + explicit filter + custom scenarios
    • check(): validates custom scenario structure before calling _get_all_scenarios
    • read(): emits records from selected streams
  2. source_smoke_test/scenarios.py — All 15 predefined scenario definitions. Large file but straightforward data declarations. The large_batch_stream scenario uses a generator function rather than inline records.

  3. metadata.yaml — Registry configuration. Note: cloud.enabled: false, oss.enabled: true, releaseStage: alpha.
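
For reviewers who want a mental model before opening `_get_all_scenarios()`, the selection step could look roughly like the sketch below. It is not the code in this PR. It assumes that "slow" streams are the ones flagged `high_volume`, that custom scenarios are always appended, and that a non-empty `scenario_filter` narrows the result by name. The function name `select_scenarios` and the scenario name `basic_types` are hypothetical.

```python
from typing import Any

Scenario = dict[str, Any]


def select_scenarios(config: dict[str, Any], predefined: list[Scenario]) -> list[Scenario]:
    """Combine the boolean flags, custom scenarios, and the explicit name filter."""
    include_fast = config.get("all_fast_streams", True)
    include_slow = config.get("all_slow_streams", False)
    name_filter = set(config.get("scenario_filter") or [])

    selected: list[Scenario] = []
    for scenario in predefined:
        # Assumption: a scenario counts as "slow" when it is flagged high_volume.
        is_slow = scenario.get("high_volume", False)
        if (is_slow and include_slow) or (not is_slow and include_fast):
            selected.append(scenario)

    # Assumption: custom scenarios are added regardless of the boolean flags.
    selected.extend(config.get("custom_scenarios") or [])

    # Assumption: a non-empty filter keeps only the scenarios it names.
    if name_filter:
        selected = [s for s in selected if s["name"] in name_filter]
    return selected


predefined = [
    {"name": "basic_types"},  # hypothetical fast scenario
    {"name": "large_batch_stream", "high_volume": True},
]
default_names = [s["name"] for s in select_scenarios({}, predefined)]
```

With the default flags, only the fast scenario is selected, so `default_names` contains just `basic_types`.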

Things that warrant reviewer attention:

  • Code is duplicated from PyAirbyte PR airbytehq/PyAirbyte#982. Reviewer should decide if maintaining two copies is acceptable or if this connector should depend on PyAirbyte directly.
  • HIGH_VOLUME_SCENARIO_NAMES is exported from scenarios.py but never imported by source.py (which checks scenario.get("high_volume", False) directly instead).
  • definitionId: 4745b886-d299-4a4f-b14b-4803e643159a is randomly generated — verify no conflict and that this is the intended permanent ID.
  • No poetry.lock — other connectors include one, though CI passes without it.
  • No sample config or configured catalog files — README references paths that don't exist (secrets/config.json, sample_files/configured_catalog.json).
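
On the `HIGH_VOLUME_SCENARIO_NAMES` point: if the constant is kept, one way to stop it from drifting away from the per-scenario `high_volume` flags is a small consistency check like the sketch below. The helper names `high_volume_names` and `find_drift` and the sample data are hypothetical. In the real connector the inputs would be the scenario list and the constant exported from scenarios.py.

```python
from typing import Any, Iterable


def high_volume_names(scenarios: Iterable[dict[str, Any]]) -> set[str]:
    """Names of scenarios whose high_volume flag is truthy."""
    return {s["name"] for s in scenarios if s.get("high_volume", False)}


def find_drift(scenarios: Iterable[dict[str, Any]], exported_names: Iterable[str]) -> set[str]:
    """Names present in exactly one of the two sources; an empty set means they agree."""
    return high_volume_names(scenarios) ^ set(exported_names)


# Stand-in data for illustration; "basic_types" is a hypothetical scenario name.
sample_scenarios = [
    {"name": "basic_types"},
    {"name": "large_batch_stream", "high_volume": True},
]
drift = find_drift(sample_scenarios, {"large_batch_stream"})
```

A unit test could assert that the drift set is empty, which would make the exported constant safe to keep even though source.py reads the flag directly.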

User Impact

Adds a new alpha-stage OSS source connector. No impact on existing connectors or users. The connector generates deterministic test data with no external dependencies.
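
To make "deterministic test data" concrete, a generator-backed scenario such as `large_batch_stream` might produce records along these lines. This is only a sketch: the function name `generate_large_batch`, the record fields, and the default count are assumptions, not the definitions in scenarios.py.

```python
from collections.abc import Iterator
from typing import Any


def generate_large_batch(count: int = 1000) -> Iterator[dict[str, Any]]:
    """Yield `count` records derived only from the loop index.

    No randomness, clock, or network access is involved, so every run
    yields the same records in the same order.
    """
    for i in range(1, count + 1):
        yield {"id": i, "name": f"record_{i}", "value": i * 1.5}


first_three = list(generate_large_batch(3))
```

Using a generator avoids holding the whole batch in memory, and deriving every field from the index keeps repeated runs identical, which is what makes regression comparisons against a destination meaningful.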

Can this PR be safely reverted and rolled back?

  • YES 💚

Requested by: Aaron ("AJ") Steers (@aaronsteers)

…ression testing

Co-Authored-By: AJ Steers <aj@airbyte.io>
@devin-ai-integration
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Contributor

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

PR Slash Commands

Airbyte Maintainers (that's you!) can execute the following slash commands on your PR:

  • 🛠️ Quick Fixes
    • /format-fix - Fixes most formatting issues.
    • /bump-version - Bumps connector versions, scraping changelog description from the PR title.
  • ❇️ AI Testing and Review (internal link: AI-SDLC Docs):
    • /ai-prove-fix - Runs prerelease readiness checks, including testing against customer connections.
    • /ai-canary-prerelease - Rolls out prerelease to 5-10 connections for canary testing.
    • /ai-review - AI-powered PR review for connector safety and quality gates.
  • 🚀 Connector Releases:
    • /publish-connectors-prerelease - Publishes pre-release connector builds (tagged as {version}-preview.{git-sha}) for all modified connectors in the PR.
    • /bump-progressive-rollout-version - Bumps connector version with an RC suffix (2.16.10-rc.1) for progressive rollouts (enableProgressiveRollout: true).
      • Example: /bump-progressive-rollout-version changelog="Add new feature for progressive rollout"
  • ☕️ JVM connectors:
    • /update-connector-cdk-version connector=<CONNECTOR_NAME> - Updates the specified connector to the latest CDK version.
      Example: /update-connector-cdk-version connector=destination-bigquery
    • /bump-bulk-cdk-version bump=patch changelog='foo' - Bump the Bulk CDK's version. bump can be major/minor/patch.
  • 🐍 Python connectors:
    • /poe connector source-example lock - Run the Poe lock task on the source-example connector, committing the results back to the branch.
    • /poe source example lock - Alias for /poe connector source-example lock.
    • /poe source example use-cdk-branch my/branch - Pin the source-example CDK reference to the branch name specified.
    • /poe source example use-cdk-latest - Update the source-example CDK dependency to the latest available version.
  • ⚙️ Admin commands:
    • /force-merge reason="<REASON>" - Force merges the PR using admin privileges, bypassing CI checks. Requires a reason.
      Example: /force-merge reason="CI is flaky, tests pass locally"

@github-actions
Contributor

github-actions bot commented Feb 26, 2026

source-smoke-test Connector Test Results

3 tests, 1 suite, 1 file; ran in 3s ⏱️
1 passed ✅, 2 skipped 💤, 0 failed ❌

Results for commit 41bdb3c.

♻️ This comment has been updated with latest results.

Co-Authored-By: AJ Steers <aj@airbyte.io>
@aaronsteers Aaron ("AJ") Steers (aaronsteers) marked this pull request as ready for review February 26, 2026 17:45
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

@devin-ai-integration
Contributor

Closing this PR in favor of tracking the monorepo source-smoke-test connector work as an issue in the PyAirbyte repo instead. A replacement issue will be created there linking back to this PR.

Requested by Aaron ("AJ") Steers (@aaronsteers).


Development

Successfully merging this pull request may close these issues.

feat: PyAirbyte-powered Smoke Test Source for Destination Regression Testing
