feat: Add 7 new smoke test scenarios for improved destination coverage #1002
New scenarios:
- duplicate_primary_keys: Tests dedup behavior with repeated PKs
- time_types: Tests time-with/without-timezone formats
- union_types: Tests oneOf/anyOf schema columns
- array_of_primitives: Tests arrays of strings, integers, mixed types
- large_string_values: Tests 1KB/10KB/100KB string values
- sparse_records: Tests rows with different column subsets populated
- special_number_values: Tests float64 boundary values and large integers

These address coverage gaps identified when comparing destination-snowflake integration tests against source-smoke-test scenarios.

Related: #1001 (multi-execution scenarios tracked separately)

Co-Authored-By: AJ Steers <aj@airbyte.io>
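To make the duplicate_primary_keys case concrete, here is a minimal sketch of what "dedup behavior with repeated PKs" could look like. The record shapes, the `dedupe_last_wins` helper, and the last-record-wins rule are illustrative assumptions, not the PR's actual fixture or any destination's guaranteed behavior.

```python
# Hypothetical records for a stream whose primary key is "id".
# Note that id=1 appears twice.
records = [
    {"id": 1, "name": "first"},
    {"id": 2, "name": "second"},
    {"id": 1, "name": "first (updated)"},
]


def dedupe_last_wins(rows, pk="id"):
    """Keep only the last record seen for each primary-key value.

    Assumption: the destination resolves duplicates as "last write wins".
    """
    latest = {}
    for row in rows:
        latest[row[pk]] = row  # later rows overwrite earlier ones
    return list(latest.values())


deduped = dedupe_last_wins(records)
print(deduped)
```

A destination running in an append-only mode would instead be expected to keep all three rows, so the expected row count depends on the sync mode under test.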
👋 Greetings, Airbyte Team Member! Here are some helpful tips and reminders for your convenience.

Testing This PyAirbyte Version

You can test this version of PyAirbyte using the following:

# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1774277339-add-smoke-test-scenarios' pyairbyte --help
# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1774277339-add-smoke-test-scenarios'

Community Support

Questions? Join the #pyairbyte channel in our Slack workspace.
📝 Walkthrough

This PR fixes description formatting and adds seven new predefined smoke-test scenarios.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 3 passed
Adds explicit airbyte_type for time_no_tz (time_without_timezone) and time_with_tz (time_with_timezone) so the test properly exercises how destinations distinguish timezone semantics for time fields.

Co-Authored-By: AJ Steers <aj@airbyte.io>
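The annotation described in this commit can be sketched as a JSON Schema fragment. The field names time_no_tz and time_with_tz and the two airbyte_type values come from the commit message; the surrounding schema layout, the "id" column, and the sample record are assumptions for illustration and may not match the PR's actual scenario definition.

```python
# Sketch of a stream schema that distinguishes the two time flavours.
# Assumption: both fields are JSON strings with format "time", and the
# airbyte_type key carries the timezone semantics.
TIME_TYPES_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "time_no_tz": {
            "type": "string",
            "format": "time",
            "airbyte_type": "time_without_timezone",
        },
        "time_with_tz": {
            "type": "string",
            "format": "time",
            "airbyte_type": "time_with_timezone",
        },
    },
}

# Hypothetical record matching the schema above.
example_record = {
    "id": 1,
    "time_no_tz": "12:34:56",
    "time_with_tz": "12:34:56+02:00",
}

annotations = {
    name: prop.get("airbyte_type")
    for name, prop in TIME_TYPES_SCHEMA["properties"].items()
}
print(annotations)
```

Without the airbyte_type key, both columns would carry an identical schema, leaving a destination no way to tell them apart.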
Pull request overview
Expands source-smoke-test’s predefined scenario suite to increase destination smoke-test coverage by adding new single-sync scenarios targeting edge cases (PK dedup, time formats, unions, arrays, large strings, sparse rows, numeric boundaries).
Changes:
- Added 7 new predefined smoke test scenarios to broaden destination coverage.
- Added a new large_strings record generator (generate_large_string_records()), wired via record_generator.
- Applied minor formatting-only adjustments to existing scenario descriptions (implicit string concatenation cleanup).
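The generator wiring described above might look roughly like the sketch below. Only the names generate_large_string_records, get_scenario_records, record_generator, "large_batch", and "large_strings" appear in this PR; the function bodies, the record fields, the `_GENERATORS` table, and `generate_large_batch_records` are hypothetical stand-ins, and the real implementation may differ.

```python
from typing import Any, Callable


def generate_large_string_records() -> list[dict[str, Any]]:
    """Build one record per target size: 1KB, 10KB and 100KB of text."""
    sizes = {"1KB": 1024, "10KB": 10 * 1024, "100KB": 100 * 1024}
    return [
        {"id": i, "label": label, "value": "x" * size}
        for i, (label, size) in enumerate(sizes.items(), start=1)
    ]


def generate_large_batch_records(count: int = 1000) -> list[dict[str, Any]]:
    """Hypothetical stand-in for the existing large_batch generator."""
    return [{"id": i} for i in range(1, count + 1)]


# Assumed dispatch table keyed by the scenario's record_generator value.
_GENERATORS: dict[str, Callable[[], list[dict[str, Any]]]] = {
    "large_batch": generate_large_batch_records,
    "large_strings": generate_large_string_records,
}


def get_scenario_records(scenario: dict[str, Any]) -> list[dict[str, Any]]:
    """Return generated records if a generator is named, else inline ones."""
    name = scenario.get("record_generator")
    if name is None:
        return list(scenario.get("records", []))
    return _GENERATORS[name]()


large = get_scenario_records({"record_generator": "large_strings"})
print([len(r["value"]) for r in large])
```

Generating the strings at call time keeps 100KB literals out of the scenario definitions themselves.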
Summary
Adds 7 new single-sync smoke test scenarios to source-smoke-test, addressing coverage gaps identified by comparing against destination-snowflake integration tests. Total scenario count goes from 15 → 22.

New scenarios:
- duplicate_primary_keys: dedup behavior with repeated PKs
- time_types: time-with/without-timezone formats (explicit airbyte_type annotations)
- union_types: oneOf schemas (string-or-integer, number-or-null, object-or-string)
- array_of_primitives: arrays of strings, integers, mixed types (… nested_json_objects)
- large_string_values: 1KB/10KB/100KB string values, using the generate_large_string_records() generator
- sparse_records: rows with different column subsets populated
- special_number_values: float64 boundary values and large integers

Also adds a new generate_large_string_records() function and extends get_scenario_records() to dispatch on the record_generator field (supporting both "large_batch" and the new "large_strings").

Minor cosmetic changes from ruff format collapsing implicit string concatenations in existing scenario descriptions.

Multi-execution scenarios (schema evolution, truncate refresh) are tracked separately in #1001.
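For the union_types scenario, the three oneOf combinations named in the summary could be written as below, alongside the "type": [...] shorthand for comparison. The property names and the `to_shorthand` helper are hypothetical. In JSON Schema, oneOf requires exactly one branch to match, whereas a type array accepts any listed type; the two coincide here only because the branch types do not overlap.

```python
# Assumed column definitions using oneOf, one per combination in the summary.
UNION_PROPERTIES = {
    "string_or_integer": {"oneOf": [{"type": "string"}, {"type": "integer"}]},
    "number_or_null": {"oneOf": [{"type": "number"}, {"type": "null"}]},
    "object_or_string": {"oneOf": [{"type": "object"}, {"type": "string"}]},
}


def to_shorthand(prop):
    """Collapse a oneOf of plain {"type": X} branches into {"type": [X, ...]}.

    Properties with richer branches (extra keywords) are returned unchanged.
    """
    branches = prop.get("oneOf")
    if not branches:
        return prop
    simple = all(
        set(branch) == {"type"} and isinstance(branch["type"], str)
        for branch in branches
    )
    if not simple:
        return prop
    return {"type": [branch["type"] for branch in branches]}


shorthand = {name: to_shorthand(p) for name, p in UNION_PROPERTIES.items()}
print(shorthand)
```

Running the same records through both forms would be one way to check whether a destination treats them equivalently.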
Review & Testing Checklist for Human
- union_types oneOf pattern: Confirm that oneOf is the correct JSON Schema construct for union types in the Airbyte protocol (vs. using the "type": ["string", "integer"] shorthand). If destinations expect the shorthand form, this scenario may need adjustment.
- special_number_values boundary floats: 1.7976931348623157e308 (MAX_FLOAT64) and 5e-324 (MIN_SUBNORMAL). Verify these serialize correctly through the Airbyte protocol JSON layer and don't cause issues with any destination's JSON parser.
- (… airbyte destination smoke-test against a test Snowflake or DuckDB destination). CI passes, but no end-to-end destination test was performed.

Notes
- time_types now uses explicit airbyte_type annotations (time_without_timezone, time_with_timezone, timestamp_without_timezone) per review feedback from CodeRabbit and Copilot.
- math.pi appears in a few test records because ruff enforces FURB152 (replace 3.14 literal with math.pi). Semantically harmless for test data.
- The sparse_records scenario includes {"id": 7}, a record with only the primary key and zero optional fields, as an intentional edge case.

Link to Devin session: https://app.devin.ai/sessions/130db0d36ce9492cbfabc49102b11613
Requested by: Aaron ("AJ") Steers (@aaronsteers)
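As a quick local sanity check for the boundary floats in the checklist, the sketch below round-trips them through Python's standard json module. It only exercises Python's own encoder and decoder, not the Airbyte protocol layer or any destination's parser, and the 2**53 + 1 integer is an assumed example of a "large integer" rather than a value taken from the PR.

```python
import json

MAX_FLOAT64 = 1.7976931348623157e308  # largest finite float64
MIN_SUBNORMAL = 5e-324  # smallest positive subnormal float64


def round_trips(value):
    """Return True if value survives a JSON encode/decode cycle unchanged."""
    return json.loads(json.dumps(value)) == value


float_results = {
    "MAX_FLOAT64": round_trips(MAX_FLOAT64),
    "MIN_SUBNORMAL": round_trips(MIN_SUBNORMAL),
}

# Assumed large-integer example: exact as a Python int, but not
# representable as a float64, so parsers that read every JSON number
# as a double would silently change it.
big_int = 2**53 + 1
int_round_trips = round_trips(big_int)
lossy_as_float = float(big_int) != big_int

print(float_results, int_round_trips, lossy_as_float)
```

The last check illustrates why large integers are worth testing separately: the JSON text itself is exact, and any precision loss comes from how the reader parses it.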