Skip to content

Conversation

@martyngigg
Copy link
Contributor

@martyngigg martyngigg commented Jan 6, 2026

Summary

Datetime handling has been overhauled and as a result databases such as mssql now properly reflect tz-naive columns. We add in a mapping for the tz-naive column to the Iceberg type. Fixes an error now raised for incremental loads in #177 for mssql dbs. See dlt-hub/dlt#3061.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added resource item limiting capability for enhanced testing and debugging flexibility.
  • Bug Fixes

    • Improved timestamp handling to correctly support timezone-aware datetime values.
  • Chores

    • Updated dlt dependency to version 1.20.0 for improved compatibility and performance.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Jan 6, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Dependency upgrades dlt from 1.15.0 to 1.20.0. Enhanced timestamp handling for PyIceberg with timezone support. Refactored opralogweb extraction to add resource limiting, improve incremental query handling via adapter callback, and update epoch constant to timezone-naive format.

Changes

Cohort / File(s) Summary
Dependency Management
elt-common/pyproject.toml
Updated dlt[parquet,s3] dependency from ~=1.15.0 to =1.20.0; reformatted dependency array from single-line to multi-line TOML syntax; pyiceberg=0.9.1 preserved.
Timestamp Type Enhancement
elt-common/src/elt_common/dlt_destinations/pyiceberg/helpers.py
Added TimestampType import from pyiceberg.types; updated dlt_type_to_iceberg function to conditionally return TimestamptzType() for timezone-aware timestamps or TimestampType() for timezone-naive.
Extraction and Loading Refactoring
warehouses/accelerator/extract_load/opralogweb/extract_and_load.py
Replaced Select import with broader sa (SQLAlchemy) usage throughout; added with_resource_limit() helper for per-resource item limits; introduced entries_query_adapter_callback() for incremental query handling with timezone-naive conversion; updated OPRALOG_EPOCH constant to remove UTC timezone; extended opralogwebdb() signature to accept limit_max_items parameter; updated internal function signatures and enhanced docstrings.

Poem

🐰 A rabbit's verse on versions new

Dlt upgraded, two point oh's in view,
Timestamps dance 'tween zones with care,
Resources limited, spread so fair,
Incremental queries thrive at last—
The future's built upon the past! 🌟

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Upgrade dlt to 1.20' directly and clearly reflects the main objective of the changeset: upgrading the dlt dependency from version 1.15.0 to 1.20.0, which is the primary change across all modified files.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI Agents
In @elt-common/src/elt_common/dlt_destinations/pyiceberg/helpers.py:
- Around line 111-114: The reverse mapping function iceberg_to_dlt_type
currently only handles TimestamptzType and will raise a TypeError for Iceberg
TimestampType; update iceberg_to_dlt_type to detect both TimestamptzType and
TimestampType (the same types returned by the forward mapping that uses timezone
flag) and map TimestampType to the corresponding DLT timestamp type without
timezone, while keeping TimestamptzType mapped as before; modify the conditional
in iceberg_to_dlt_type to check for both classes (TimestampType and
TimestamptzType) and return the appropriate DLT types instead of only handling
TimestamptzType.

In @warehouses/accelerator/extract_load/opralogweb/extract_and_load.py:
- Line 94: The function entries_query_adapter_callback is implemented but never
used because the query_adapter_callback parameter is commented out; either
remove the unused function entries_query_adapter_callback to eliminate dead
code, or re-enable its use by restoring the query_adapter_callback argument
where the call is made (uncomment the line containing
query_adapter_callback=entries_query_adapter_callback) so the callback is
actually passed into the caller (ensure the caller signature accepts it). Make
the change consistently: if keeping the callback, verify any upstream function
expecting query_adapter_callback handles it; if removing, delete
entries_query_adapter_callback and any related imports or references.
🧹 Nitpick comments (1)
warehouses/accelerator/extract_load/opralogweb/extract_and_load.py (1)

109-139: Consider using parameterised queries if this callback is enabled.

The callback constructs SQL using f-string interpolation on line 138. While cursor_col comes from a trusted internal source and formatted is derived from a datetime, using sa.text() with string formatting is generally discouraged. If this code is intended to be used, consider using SQLAlchemy's column comparison directly.

Note: This is currently dead code as the usage is commented out on line 94.

🔎 Suggested refactor using SQLAlchemy column comparison
-    # Construct WHERE clause with an explicit CAST for safety
-    where_clause = sa.text(f"{cursor_col} >= CAST('{formatted}' AS datetime)")
-    return select_clause.where(where_clause)
+    # Use SQLAlchemy's column comparison with explicit CAST
+    cursor_column = table.c[cursor_col]
+    return select_clause.where(cursor_column >= sa.cast(formatted, sa.DateTime))
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 0a56faf and 1978240.

⛔ Files ignored due to path filters (1)
  • elt-common/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • elt-common/pyproject.toml
  • elt-common/src/elt_common/dlt_destinations/pyiceberg/helpers.py
  • warehouses/accelerator/extract_load/opralogweb/extract_and_load.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: elt-common end-to-end tests
  • GitHub Check: warehouses end-to-end tests
🔇 Additional comments (6)
elt-common/src/elt_common/dlt_destinations/pyiceberg/helpers.py (1)

32-34: LGTM!

The TimestampType import is correctly added alongside the existing TimestamptzType to support the new timezone-aware/naive distinction.

warehouses/accelerator/extract_load/opralogweb/extract_and_load.py (4)

28-28: LGTM!

Making OPRALOG_EPOCH timezone-naive aligns with the dlt 1.20 datetime handling changes, where MSSQL datetime columns are now correctly reflected as tz-naive.


39-46: LGTM!

Clean helper function that conditionally applies resource limits. The implementation correctly handles the None case and returns the resource for method chaining.


49-51: LGTM!

The new limit_max_items parameter provides useful debugging capability and is correctly documented as being for testing purposes.


173-174: LGTM!

The type hint correctly specifies sa.Select for the query parameter.

elt-common/pyproject.toml (1)

11-14: Upgrade to dlt 1.20.0 is safe—the only documented breaking change does not apply to this project.

The single breaking change in dlt 1.20.0 affects Ibis dataset/relation API access; the codebase does not use Ibis. No breaking changes are documented for the parquet or S3 extras. The datetime handling code present in the codebase aligns with the PR's stated improvements.

Datetime handling has been overhauled and as a result databases
such as mssql now properly reflect tz-naive columns. We
add in a mapping for the tz-naive column to the Iceberg type.
@martyngigg martyngigg merged commit 76816ba into main Jan 7, 2026
5 checks passed
@martyngigg martyngigg deleted the upgrade-dlt-1.20 branch January 7, 2026 08:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants