
feat: Add 7 new smoke test scenarios for improved destination coverage#1002

Merged
Aaron ("AJ") Steers (aaronsteers) merged 2 commits into main from devin/1774277339-add-smoke-test-scenarios
Mar 23, 2026

Conversation


@aaronsteers Aaron ("AJ") Steers (aaronsteers) commented Mar 23, 2026

Summary

Adds 7 new single-sync smoke test scenarios to source-smoke-test, addressing coverage gaps identified by comparing against destination-snowflake integration tests. Total scenario count goes from 15 → 22.

New scenarios:

| Scenario | What it tests |
| --- | --- |
| duplicate_primary_keys | Dedup behavior when multiple records share the same PK |
| time_types | Time-with-timezone, time-without-timezone, timestamp-without-timezone formats (with explicit airbyte_type annotations) |
| union_types | oneOf schemas (string-or-integer, number-or-null, object-or-string) |
| array_of_primitives | Arrays of strings, integers, mixed types (complements existing nested_json_objects) |
| large_string_values | 1 KB / 10 KB / 100 KB strings via a new generate_large_string_records() generator |
| sparse_records | Different rows populate different column subsets; includes a row with only the PK |
| special_number_values | Float64 boundary values (MAX, MIN_SUBNORMAL) and int64/int32 boundaries |

Also adds a new generate_large_string_records() function and extends get_scenario_records() to dispatch on the record_generator field (supporting both "large_batch" and the new "large_strings").
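
The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the record field names (`size_label`, `payload`), the exact byte sizes, and the body of `generate_large_batch_records()` are assumptions made for the example.

```python
from typing import Any

Record = dict[str, Any]


def generate_large_string_records() -> list[Record]:
    """Return one record per target size (sizes and field names are assumed)."""
    sizes = {"1KB": 1024, "10KB": 10 * 1024, "100KB": 100 * 1024}
    return [
        {"id": index, "size_label": label, "payload": "x" * size}
        for index, (label, size) in enumerate(sizes.items(), start=1)
    ]


def generate_large_batch_records(count: int = 1000) -> list[Record]:
    """Stand-in for the existing large-batch generator (hypothetical body)."""
    return [{"id": index, "value": f"row-{index}"} for index in range(1, count + 1)]


def get_scenario_records(scenario: dict[str, Any]) -> list[Record]:
    """Dispatch on the optional record_generator field, else use inline records."""
    generator = scenario.get("record_generator")
    if generator == "large_strings":
        return generate_large_string_records()
    if generator == "large_batch":
        return generate_large_batch_records()
    # No generator matched: fall back to the records declared inline.
    return scenario["records"]
```

Generating the large strings at call time, rather than storing them inline, keeps the scenario definitions small and readable.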

Minor cosmetic changes from ruff format collapsing implicit string concatenations in existing scenario descriptions.

Multi-execution scenarios (schema evolution, truncate refresh) are tracked separately in #1001.

Review & Testing Checklist for Human

  • union_types oneOf pattern: Confirm that oneOf is the correct JSON Schema construct for union types in the Airbyte protocol (vs. using "type": ["string", "integer"] shorthand). If destinations expect the shorthand form, this scenario may need adjustment.
  • special_number_values boundary floats: verify that 1.7976931348623157e308 (MAX_FLOAT64) and 5e-324 (MIN_SUBNORMAL) serialize correctly through the Airbyte protocol JSON layer and don't cause issues with any destination's JSON parser.
  • Run a smoke test against at least one destination to verify the new scenarios are accepted without errors (e.g. airbyte destination smoke-test against a test Snowflake or DuckDB destination). CI passes but no end-to-end destination test was performed.
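
The first two checklist items can be probed locally before running against a real destination. The sketch below uses only Python's standard `json` module, so it says nothing about how a given destination's parser behaves. The helper names (`round_trips`, `accepted_types`) are hypothetical and exist only for this illustration.

```python
import json

# Two JSON Schema spellings of "string or integer".
ONE_OF_FORM = {"oneOf": [{"type": "string"}, {"type": "integer"}]}
SHORTHAND_FORM = {"type": ["string", "integer"]}

# Boundary values named in the checklist, plus the int64 limits.
MAX_FLOAT64 = 1.7976931348623157e308
MIN_SUBNORMAL = 5e-324
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)


def round_trips(value):
    """Return True if value survives json.dumps followed by json.loads unchanged."""
    return json.loads(json.dumps(value)) == value


def accepted_types(schema):
    """List the primitive type names a simple union schema allows."""
    if "oneOf" in schema:
        return sorted(branch["type"] for branch in schema["oneOf"])
    declared = schema["type"]
    return sorted(declared if isinstance(declared, list) else [declared])


for boundary in (MAX_FLOAT64, MIN_SUBNORMAL, INT64_MAX, INT64_MIN):
    print(boundary, "round-trips:", round_trips(boundary))

print("same types:", accepted_types(ONE_OF_FORM) == accepted_types(SHORTHAND_FORM))
```

Both schema forms admit the same primitive types here; whether a destination handles them identically still has to be confirmed end to end.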

Notes

  • time_types now uses explicit airbyte_type annotations (time_without_timezone, time_with_timezone, timestamp_without_timezone) per review feedback from CodeRabbit and Copilot.
  • math.pi appears in a few test records because ruff enforces FURB152 (replace 3.14 literal with math.pi). Semantically harmless for test data.
  • The sparse_records scenario includes {"id": 7} — a record with only the primary key and zero optional fields — as an intentional edge case.
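
As an illustration of the notes above, a time_types schema with explicit airbyte_type annotations and a sparse_records data set might look like the sketch below. The field names `time_no_tz` and `time_with_tz` come from the follow-up commit message; `timestamp_no_tz`, `name`, `score`, and the helper `populated_columns()` are assumptions for the example and may not match the merged scenario definitions.

```python
# Assumed shape of the time_types schema; only the airbyte_type values
# are taken from the PR notes.
TIME_TYPES_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "time_no_tz": {
            "type": "string",
            "format": "time",
            "airbyte_type": "time_without_timezone",
        },
        "time_with_tz": {
            "type": "string",
            "format": "time",
            "airbyte_type": "time_with_timezone",
        },
        "timestamp_no_tz": {
            "type": "string",
            "format": "date-time",
            "airbyte_type": "timestamp_without_timezone",
        },
    },
}

# Hypothetical sparse rows: each populates a different subset of columns.
SPARSE_RECORDS = [
    {"id": 1, "name": "alpha", "score": 1.5},
    {"id": 2, "name": "beta"},
    {"id": 7},  # primary key only, the intentional edge case
]


def populated_columns(record, primary_key="id"):
    """Return the non-PK columns that a record actually populates."""
    return sorted(key for key in record if key != primary_key)


for record in SPARSE_RECORDS:
    print(record["id"], populated_columns(record))
```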

Link to Devin session: https://app.devin.ai/sessions/130db0d36ce9492cbfabc49102b11613
Requested by: Aaron ("AJ") Steers (@aaronsteers)



New scenarios:
- duplicate_primary_keys: Tests dedup behavior with repeated PKs
- time_types: Tests time-with/without-timezone formats
- union_types: Tests oneOf/anyOf schema columns
- array_of_primitives: Tests arrays of strings, integers, mixed types
- large_string_values: Tests 1KB/10KB/100KB string values
- sparse_records: Tests rows with different column subsets populated
- special_number_values: Tests float64 boundary values and large integers

These address coverage gaps identified when comparing destination-snowflake
integration tests against source-smoke-test scenarios.

Related: #1001 (multi-execution scenarios tracked separately)
Co-Authored-By: AJ Steers <aj@airbyte.io>
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.


Testing This PyAirbyte Version

You can test this version of PyAirbyte using the following:

# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1774277339-add-smoke-test-scenarios' pyairbyte --help

# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1774277339-add-smoke-test-scenarios'

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /fix-pr - Fixes most formatting and linting issues
  • /uv-lock - Updates uv.lock file
  • /test-pr - Runs tests with the updated PyAirbyte
  • /prerelease - Builds and publishes a prerelease version to PyPI

Community Support

Questions? Join the #pyairbyte channel in our Slack workspace.


@coderabbitai

coderabbitai bot commented Mar 23, 2026

Walkthrough

This PR fixes description formatting, adds seven new predefined smoke-test scenarios (including large_string_values), introduces generate_large_string_records(), and updates get_scenario_records() to dispatch on record_generator (supporting "large_strings" and "large_batch"), all within the scenarios module.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Scenario Definitions & Cleanup (airbyte/cli/smoke_test_source/_scenarios.py) | Fixed incidental string-concatenation artifacts in descriptions and added seven new scenarios: duplicate_primary_keys, time_types, union_types, array_of_primitives, large_string_values, sparse_records, special_number_values (each with JSON schema, primary key where applicable, and either inline records or record_generator). |
| Record Generation Logic (airbyte/cli/smoke_test_source/_scenarios.py) | Added generate_large_string_records() producing records with progressively larger string fields, and updated get_scenario_records() to use generator = scenario.get("record_generator") and handle "large_strings" and "large_batch", falling back to scenario["records"] when no generator matches. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Would you like me to suggest any additional test cases or edge inputs for the new large_strings generator?

🚥 Pre-merge checks | ✅ 3 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding 7 new smoke test scenarios to improve destination coverage, which aligns with the core objective of the PR. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |


coderabbitai[bot]

This comment was marked as resolved.

@aaronsteers Aaron ("AJ") Steers (aaronsteers) marked this pull request as ready for review March 23, 2026 14:58
Copilot AI review requested due to automatic review settings March 23, 2026 14:58
Adds explicit airbyte_type for time_no_tz (time_without_timezone) and
time_with_tz (time_with_timezone) so the test properly exercises how
destinations distinguish timezone semantics for time fields.

Co-Authored-By: AJ Steers <aj@airbyte.io>

@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.


Copilot AI left a comment


Pull request overview

Expands source-smoke-test’s predefined scenario suite to increase destination smoke-test coverage by adding new single-sync scenarios targeting edge cases (PK dedup, time formats, unions, arrays, large strings, sparse rows, numeric boundaries).

Changes:

  • Added 7 new predefined smoke test scenarios to broaden destination coverage.
  • Added a new large_strings record generator (generate_large_string_records()), wired via record_generator.
  • Applied minor formatting-only adjustments to existing scenario descriptions (implicit string concatenation cleanup).


@github-actions

PyTest Results (Fast Tests Only, No Creds)

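| Metric | Value | Change |
| --- | --- | --- |
| Tests | 343 | ±0 |
| Passed ✅ | 343 | ±0 |
| Skipped 💤 | 0 | ±0 |
| Failed ❌ | 0 | ±0 |
| Suites | 1 | ±0 |
| Files | 1 | ±0 |
| Duration ⏱️ | 5m 49s | +2s |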

Results for commit 0825e9e. ± Comparison against base commit a8666b2.

@github-actions

PyTest Results (Full)

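| Metric | Value | Change |
| --- | --- | --- |
| Tests | 413 | ±0 |
| Passed ✅ | 395 | ±0 |
| Skipped 💤 | 18 | ±0 |
| Failed ❌ | 0 | ±0 |
| Suites | 1 | ±0 |
| Files | 1 | ±0 |
| Duration ⏱️ | 29m 1s | +3m 14s |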

Results for commit 0825e9e. ± Comparison against base commit a8666b2.

@aaronsteers Aaron ("AJ") Steers (aaronsteers) merged commit 9d9e91b into main Mar 23, 2026
24 checks passed
@aaronsteers Aaron ("AJ") Steers (aaronsteers) deleted the devin/1774277339-add-smoke-test-scenarios branch March 23, 2026 15:35

2 participants