-
Notifications
You must be signed in to change notification settings - Fork 0
Upgrade dlt to 1.20 #181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade dlt to 1.20 #181
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the 📝 WalkthroughWalkthroughDependency upgrades dlt from 1.15.0 to 1.20.0. Enhanced timestamp handling for PyIceberg with timezone support. Refactored opralogweb extraction to add resource limiting, improve incremental query handling via adapter callback, and update epoch constant to timezone-naive format. Changes
Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI Agents
In @elt-common/src/elt_common/dlt_destinations/pyiceberg/helpers.py:
- Around line 111-114: The reverse mapping function iceberg_to_dlt_type
currently only handles TimestamptzType and will raise a TypeError for Iceberg
TimestampType; update iceberg_to_dlt_type to detect both TimestamptzType and
TimestampType (the same types returned by the forward mapping that uses timezone
flag) and map TimestampType to the corresponding DLT timestamp type without
timezone, while keeping TimestamptzType mapped as before; modify the conditional
in iceberg_to_dlt_type to check for both classes (TimestampType and
TimestamptzType) and return the appropriate DLT types instead of only handling
TimestamptzType.
In @warehouses/accelerator/extract_load/opralogweb/extract_and_load.py:
- Line 94: The function entries_query_adapter_callback is implemented but never
used because the query_adapter_callback parameter is commented out; either
remove the unused function entries_query_adapter_callback to eliminate dead
code, or re-enable its use by restoring the query_adapter_callback argument
where the call is made (uncomment the line containing
query_adapter_callback=entries_query_adapter_callback) so the callback is
actually passed into the caller (ensure the caller signature accepts it). Make
the change consistently: if keeping the callback, verify any upstream function
expecting query_adapter_callback handles it; if removing, delete
entries_query_adapter_callback and any related imports or references.
🧹 Nitpick comments (1)
warehouses/accelerator/extract_load/opralogweb/extract_and_load.py (1)
109-139: Consider using parameterised queries if this callback is enabled.The callback constructs SQL using f-string interpolation on line 138. While
cursor_colcomes from a trusted internal source andformattedis derived from a datetime, usingsa.text()with string formatting is generally discouraged. If this code is intended to be used, consider using SQLAlchemy's column comparison directly.Note: This is currently dead code as the usage is commented out on line 94.
🔎 Suggested refactor using SQLAlchemy column comparison
- # Construct WHERE clause with an explicit CAST for safety - where_clause = sa.text(f"{cursor_col} >= CAST('{formatted}' AS datetime)") - return select_clause.where(where_clause) + # Use SQLAlchemy's column comparison with explicit CAST + cursor_column = table.c[cursor_col] + return select_clause.where(cursor_column >= sa.cast(formatted, sa.DateTime))
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
⛔ Files ignored due to path filters (1)
elt-common/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (3)
elt-common/pyproject.tomlelt-common/src/elt_common/dlt_destinations/pyiceberg/helpers.pywarehouses/accelerator/extract_load/opralogweb/extract_and_load.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: elt-common end-to-end tests
- GitHub Check: warehouses end-to-end tests
🔇 Additional comments (6)
elt-common/src/elt_common/dlt_destinations/pyiceberg/helpers.py (1)
32-34: LGTM!The
TimestampTypeimport is correctly added alongside the existingTimestamptzTypeto support the new timezone-aware/naive distinction.warehouses/accelerator/extract_load/opralogweb/extract_and_load.py (4)
28-28: LGTM!Making
OPRALOG_EPOCHtimezone-naive aligns with the dlt 1.20 datetime handling changes, where MSSQL datetime columns are now correctly reflected as tz-naive.
39-46: LGTM!Clean helper function that conditionally applies resource limits. The implementation correctly handles the
Nonecase and returns the resource for method chaining.
49-51: LGTM!The new
limit_max_itemsparameter provides useful debugging capability and is correctly documented as being for testing purposes.
173-174: LGTM!The type hint correctly specifies
sa.Selectfor the query parameter.elt-common/pyproject.toml (1)
11-14: Upgrade to dlt 1.20.0 is safe—the only documented breaking change does not apply to this project.The single breaking change in dlt 1.20.0 affects Ibis dataset/relation API access; the codebase does not use Ibis. No breaking changes are documented for the parquet or S3 extras. The datetime handling code present in the codebase aligns with the PR's stated improvements.
warehouses/accelerator/extract_load/opralogweb/extract_and_load.py
Outdated
Show resolved
Hide resolved
Datetime handling has been overhauled and as a result databases such as mssql now properly reflect tz-naive columns. We add in a mapping for the tz-naive column to the Iceberg type.
1978240 to
3736f93
Compare
Summary
Datetime handling has been overhauled and as a result databases such as mssql now properly reflect tz-naive columns. We add in a mapping for the tz-naive column to the Iceberg type. Fixes an error now raised for incremental loads in #177 for mssql dbs. See dlt-hub/dlt#3061.
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Chores
✏️ Tip: You can customize this high-level summary in your review settings.