Sac 30240 tap codat standardize state#29
Sac 30240 tap codat standardize state#29akkumar-qlik wants to merge 15 commits intoSAC-28823-metadata-to-tapfrom
Conversation
* update circle config to use uv * fix yml * validate json schemas
* bump deps * changelog update
There was a problem hiding this comment.
Pull request overview
This PR refactors the state management system in tap-codat to use a properly nested bookmark structure organized by stream and company ID, replacing the previous composite key format. The changes improve state organization and align with Singer tap conventions by scoping bookmarks per company within each stream.
Changes:
- Refactored state bookmark structure from
bookmarks[stream.companyId]tobookmarks[stream][companyId]for proper hierarchical organization - Added comprehensive test suite for state management functions with 100% coverage of edge cases
- Modernized CI/CD pipeline from virtualenv/pip/nose to uv/pytest with Python 3.12
- Updated dependencies (singer-python 6.1.1→6.7.0, requests 2.32.4→2.32.5) and bumped version to 0.5.4
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tap_codat/state.py | Updated get_last_record_value_for_table and incorporate functions to support company-scoped bookmarks with nested dictionary structure |
| tap_codat/streams.py | Modified capture_state class to pass stream_id and company_id separately instead of composite string |
| tap_codat/context.py | Removed unused offset-related methods (get_offset, set_offset, clear_offsets) |
| test/test_state.py | Added comprehensive test suite covering all state management scenarios including edge cases and validation rules |
| setup.py | Bumped version to 0.5.4 and updated dependency versions |
| .circleci/config.yml | Modernized CI pipeline to use uv package manager, pytest testing framework, and Python 3.12 |
| CHANGELOG.md | Minor formatting fix (trailing newline) |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| new_state['bookmarks'][table] = {} | ||
|
|
||
| current_value = new_state['bookmarks'].get(table, {}).get(company_id, {}).get('last_record') | ||
| if current_value is None or current_value < parsed: |
There was a problem hiding this comment.
The comparison current_value < parsed on line 36 compares strings lexicographically. While ISO 8601 datetime strings in the format 'YYYY-MM-DDTHH:MM:SSZ' do sort correctly lexicographically, this relies on the format being consistent. If the format changes or if timezone offsets are included (e.g., '+00:00' vs 'Z'), the comparison could produce incorrect results. Consider parsing both values to datetime objects before comparison, or add a comment explaining that lexicographic comparison is intentional and safe for this specific ISO 8601 format.
| if 'bookmarks' not in new_state: | ||
| new_state['bookmarks'] = {} | ||
|
|
||
| if(new_state['bookmarks'].get(table, {}).get('last_record') is None or | ||
| new_state['bookmarks'].get(table, {}).get('last_record') < value): | ||
| new_state['bookmarks'][table] = { | ||
| if table not in new_state['bookmarks']: | ||
| new_state['bookmarks'][table] = {} | ||
|
|
||
| current_value = new_state['bookmarks'].get(table, {}).get(company_id, {}).get('last_record') | ||
| if current_value is None or current_value < parsed: | ||
| new_state['bookmarks'][table][company_id] = { |
There was a problem hiding this comment.
This change modifies the state bookmark structure from bookmarks[stream.companyId] to bookmarks[stream][companyId], which is a breaking change. Existing state files using the old format will not be recognized, causing the tap to resync all data from the start_date. Consider either: (1) adding migration logic to convert old state format to new format, (2) documenting this as a breaking change in CHANGELOG.md with clear upgrade instructions, or (3) adding backward compatibility to read from both old and new formats during a transition period.
There was a problem hiding this comment.
agree we need to mention this in change log as breaking change.
| new_state['bookmarks'][table][company_id] = { | ||
| 'field': field, | ||
| 'last_record': parsed, | ||
| } |
There was a problem hiding this comment.
Although as per the @skuttleman comments. we can keep the nested key value data, but it should not be null or {}, we need to check this before finalizing the state file. and should add one check for this.
There was a problem hiding this comment.
Updated the code to do sanity before finalizing the state file
| def get_max(self): | ||
| company_stream = "{}.{}".format(self.stream_id, self.company_id) | ||
| state_dt = get_last_record_value_for_table(self.ctx.state, company_stream) | ||
| state_dt = get_last_record_value_for_table(self.ctx.state, self.stream_id, self.company_id) |
There was a problem hiding this comment.
I think we'll need to support retrieving state values in the old format to keep the tap compatible with any existing connections.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
|
||
|
|
||
| def sync(ctx): | ||
| sanitize_bookmarks(ctx.state) |
There was a problem hiding this comment.
Sanitization is needed after sync is completed than at the begenning.
| for table in list(bookmarks.keys()): | ||
| _sanitize_stream_bookmark(state, table) |
There was a problem hiding this comment.
| for table in list(bookmarks.keys()): | |
| _sanitize_stream_bookmark(state, table) | |
| for stream_name in list(bookmarks.keys()): | |
| _sanitize_stream_bookmark(state, stream_name) |
| install_requires=[ | ||
| "singer-python==6.1.1", | ||
| "requests==2.32.4", | ||
| "singer-python==6.7.0", |
There was a problem hiding this comment.
| "singer-python==6.7.0", | |
| "singer-python==6.8.0", |
| setup( | ||
| name="tap-codat", | ||
| version="0.5.3", | ||
| version="0.5.4", |
There was a problem hiding this comment.
Versioning is not in sync with CHANGELOG, fix it.
Co-authored-by: Rushikesh Todkar <98420315+RushiT0122@users.noreply.github.com>
Co-authored-by: Rushikesh Todkar <98420315+RushiT0122@users.noreply.github.com>
Description of change
https://qlik-dev.atlassian.net/browse/SAC-30240
Added changes related to standardize state and bookmark update as per the discussion in the ticket:
The company-scoped bookmark pattern is incompatible with both target-qlik and stitch-menagerie-service. They are expecting the keys under the bookmarks to be the tap_stream_id.
tap-codat's Current Implementation:
How data gets nested under that shouldn’t matter to either the target or menagerie, so something like this would be acceptable:
As far as the custom bookmark structure goes, as long as the entire bookmark gets cleared out on reset, the target and menagerie should work as expected.