Sac 30240 tap codat standardize state #29

Open
akkumar-qlik wants to merge 15 commits into SAC-28823-metadata-to-tap from SAC-30240-tap-codat-standardize-state

@akkumar-qlik akkumar-qlik commented Feb 23, 2026

Description of change

https://qlik-dev.atlassian.net/browse/SAC-30240

Added changes to standardize state and bookmark updates, per the discussion in the ticket:
The company-scoped bookmark pattern is incompatible with both target-qlik and stitch-menagerie-service. Both expect the keys under bookmarks to be the tap_stream_id.

tap-codat's Current Implementation:

{
  "bookmarks": {
    "stream_name.companyId": {
      "field": "modifiedDate",
      "last_record": "2024-01-01T00:00:00Z"
    }
  }
}

How data is nested under each tap_stream_id key shouldn't matter to either the target or menagerie, so something like this would be acceptable:

{
  "bookmarks": {
    "companies": {
      "{COMPANY_ID_1}": {...},
      "{COMPANY_ID_2}": {...}
    }
  }
}

As far as the custom bookmark structure goes, as long as the entire bookmark gets cleared out on reset, the target and menagerie should work as expected.

// BAD
{
  "bookmarks": {
    "companies": {
      "{COMPANY_ID_1}": {},
      "{COMPANY_ID_2}": null
    }
  }
}
// GOOD
{
  "bookmarks": {
    "companies": {}
  }
}
// ALSO GOOD
{
  "bookmarks": {}
}
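The reset rule above can be sketched as a small cleanup pass. This is a minimal illustration only, not the PR's implementation: the function name `drop_empty_company_bookmarks` is hypothetical, and it assumes the state is already in the nested `bookmarks[stream][companyId]` shape.

```python
def drop_empty_company_bookmarks(state):
    """Remove company entries that are None or {} from every stream bookmark.

    Hypothetical helper for illustration; assumes the nested
    bookmarks[stream][company_id] layout described above.
    """
    bookmarks = state.get("bookmarks") or {}
    for stream_name in list(bookmarks.keys()):
        companies = bookmarks[stream_name]
        if not isinstance(companies, dict):
            continue  # leave anything that is not a company mapping untouched
        for company_id in list(companies.keys()):
            # A reset must not leave behind null or empty placeholders.
            if companies[company_id] is None or companies[company_id] == {}:
                del companies[company_id]
    state["bookmarks"] = bookmarks
    return state
```

Applied to the BAD example, this leaves `"companies": {}`, which matches the GOOD shape.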

skuttleman and others added 3 commits August 7, 2025 10:40
* update circle config to use uv

* fix yml

* validate json schemas
* bump deps

* changelog update
@akkumar-qlik akkumar-qlik self-assigned this Feb 23, 2026

Copilot AI left a comment

Pull request overview

This PR refactors the state management system in tap-codat to use a properly nested bookmark structure organized by stream and company ID, replacing the previous composite key format. The changes improve state organization and align with Singer tap conventions by scoping bookmarks per company within each stream.

Changes:

  • Refactored state bookmark structure from bookmarks[stream.companyId] to bookmarks[stream][companyId] for proper hierarchical organization
  • Added comprehensive test suite for state management functions with 100% coverage of edge cases
  • Modernized CI/CD pipeline from virtualenv/pip/nose to uv/pytest with Python 3.12
  • Updated dependencies (singer-python 6.1.1→6.7.0, requests 2.32.4→2.32.5) and bumped version to 0.5.4

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tap_codat/state.py | Updated get_last_record_value_for_table and incorporate functions to support company-scoped bookmarks with nested dictionary structure |
| tap_codat/streams.py | Modified capture_state class to pass stream_id and company_id separately instead of composite string |
| tap_codat/context.py | Removed unused offset-related methods (get_offset, set_offset, clear_offsets) |
| test/test_state.py | Added comprehensive test suite covering all state management scenarios including edge cases and validation rules |
| setup.py | Bumped version to 0.5.4 and updated dependency versions |
| .circleci/config.yml | Modernized CI pipeline to use uv package manager, pytest testing framework, and Python 3.12 |
| CHANGELOG.md | Minor formatting fix (trailing newline) |

Comment thread tap_codat/state.py
new_state['bookmarks'][table] = {}

current_value = new_state['bookmarks'].get(table, {}).get(company_id, {}).get('last_record')
if current_value is None or current_value < parsed:

Copilot AI Feb 23, 2026

The comparison current_value < parsed on line 36 compares strings lexicographically. While ISO 8601 datetime strings in the format 'YYYY-MM-DDTHH:MM:SSZ' do sort correctly lexicographically, this relies on the format being consistent. If the format changes or if timezone offsets are included (e.g., '+00:00' vs 'Z'), the comparison could produce incorrect results. Consider parsing both values to datetime objects before comparison, or add a comment explaining that lexicographic comparison is intentional and safe for this specific ISO 8601 format.
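One way to act on this suggestion is to compare parsed datetimes instead of raw strings. The sketch below is illustrative only: `parse_iso8601` and `is_newer` are hypothetical names, not functions from tap-codat, and it assumes timestamps without an offset should be treated as UTC.

```python
from datetime import datetime, timezone


def parse_iso8601(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' as UTC."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_newer(current_value, candidate):
    """Return True when candidate is a later instant than current_value."""
    if current_value is None:
        return True
    return parse_iso8601(current_value) < parse_iso8601(candidate)
```

With this comparison, `'2024-01-01T00:00:00+00:00'` and `'2024-01-01T00:00:00Z'` are treated as the same instant, whereas a plain string comparison orders the first before the second.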

Comment thread tap_codat/state.py Outdated
Comment thread tap_codat/state.py
Comment on lines 29 to +37
  if 'bookmarks' not in new_state:
      new_state['bookmarks'] = {}

- if(new_state['bookmarks'].get(table, {}).get('last_record') is None or
-    new_state['bookmarks'].get(table, {}).get('last_record') < value):
-     new_state['bookmarks'][table] = {
+ if table not in new_state['bookmarks']:
+     new_state['bookmarks'][table] = {}
+
+ current_value = new_state['bookmarks'].get(table, {}).get(company_id, {}).get('last_record')
+ if current_value is None or current_value < parsed:
+     new_state['bookmarks'][table][company_id] = {

Copilot AI Feb 23, 2026

This change modifies the state bookmark structure from bookmarks[stream.companyId] to bookmarks[stream][companyId], which is a breaking change. Existing state files using the old format will not be recognized, causing the tap to resync all data from the start_date. Consider either: (1) adding migration logic to convert old state format to new format, (2) documenting this as a breaking change in CHANGELOG.md with clear upgrade instructions, or (3) adding backward compatibility to read from both old and new formats during a transition period.
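Option (1), a one-time migration, might look roughly like the following. This is a sketch under stated assumptions rather than the tap's actual code: `migrate_legacy_bookmarks` is a hypothetical name, the caller is assumed to pass in the set of known stream IDs, and stream IDs are assumed not to contain a dot.

```python
def migrate_legacy_bookmarks(state, stream_ids):
    """Convert 'stream.companyId' keys into bookmarks[stream][companyId].

    Hypothetical helper; stream_ids is the set of known tap_stream_id values.
    """
    bookmarks = state.setdefault("bookmarks", {})
    for key in list(bookmarks.keys()):
        if key in stream_ids or "." not in key:
            continue  # already keyed by a plain tap_stream_id
        stream_id, company_id = key.split(".", 1)
        if stream_id not in stream_ids:
            continue  # unknown key; leave it untouched
        legacy = bookmarks.pop(key)
        nested = bookmarks.setdefault(stream_id, {})
        # Keep an existing nested entry rather than overwriting it.
        nested.setdefault(company_id, legacy)
    return state
```

Running the migration once at startup would let the rest of the tap read only the nested format, at the cost of rewriting state that older tap versions can no longer read.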

@atttiwari atttiwari Feb 23, 2026

Agreed, we need to mention this in the changelog as a breaking change.

Comment thread .circleci/config.yml Outdated
Comment thread tap_codat/state.py
new_state['bookmarks'][table][company_id] = {
'field': field,
'last_record': parsed,
}

Per @skuttleman's comments, we can keep the nested key-value data, but the entries must not be null or {}. We need to check for this before finalizing the state file and should add a check for it.

Author

Updated the code to run a sanity check on the bookmarks before finalizing the state file.

Comment thread tap_codat/streams.py
  def get_max(self):
-     company_stream = "{}.{}".format(self.stream_id, self.company_id)
-     state_dt = get_last_record_value_for_table(self.ctx.state, company_stream)
+     state_dt = get_last_record_value_for_table(self.ctx.state, self.stream_id, self.company_id)
Contributor

I think we'll need to support retrieving state values in the old format to keep the tap compatible with any existing connections.
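A read path that supports both formats could be sketched as below. The function name `read_last_record` is hypothetical and this is not the code the PR ended up with; it assumes the legacy key is the `"{stream_id}.{company_id}"` composite shown in the removed line above.

```python
def read_last_record(state, stream_id, company_id):
    """Return the last_record bookmark, preferring the nested format.

    Hypothetical helper: falls back to the legacy 'stream.companyId' key
    so that existing connections keep their bookmarks.
    """
    bookmarks = state.get("bookmarks") or {}

    # New format: bookmarks[stream_id][company_id]['last_record']
    nested = bookmarks.get(stream_id)
    if isinstance(nested, dict):
        entry = nested.get(company_id)
        if isinstance(entry, dict) and entry.get("last_record") is not None:
            return entry["last_record"]

    # Legacy format: bookmarks['stream_id.company_id']['last_record']
    legacy = bookmarks.get("{}.{}".format(stream_id, company_id))
    if isinstance(legacy, dict):
        return legacy.get("last_record")
    return None
```

Preferring the nested entry means that once a stream has written a bookmark in the new format, a stale legacy key no longer affects where the sync resumes.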

Author

Updated the code

akkumar-qlik and others added 3 commits February 24, 2026 10:27
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Comment thread tap_codat/__init__.py Outdated


def sync(ctx):
    sanitize_bookmarks(ctx.state)

Sanitization should run after the sync completes rather than at the beginning.

Comment thread tap_codat/state.py Outdated
Comment on lines +86 to +87
for table in list(bookmarks.keys()):
_sanitize_stream_bookmark(state, table)

Suggested change
- for table in list(bookmarks.keys()):
-     _sanitize_stream_bookmark(state, table)
+ for stream_name in list(bookmarks.keys()):
+     _sanitize_stream_bookmark(state, stream_name)

Comment thread setup.py Outdated
  install_requires=[
-     "singer-python==6.1.1",
-     "requests==2.32.4",
+     "singer-python==6.7.0",

Suggested change
- "singer-python==6.7.0",
+ "singer-python==6.8.0",

Comment thread setup.py Outdated
  setup(
      name="tap-codat",
-     version="0.5.3",
+     version="0.5.4",

The version is not in sync with CHANGELOG.md; please fix it.

@akkumar-qlik akkumar-qlik requested a review from RushiT0122 March 3, 2026 11:15
Comment thread CHANGELOG.md Outdated
Comment thread setup.py Outdated
akkumar-qlik and others added 2 commits March 10, 2026 19:19
Co-authored-by: Rushikesh Todkar <98420315+RushiT0122@users.noreply.github.com>
Co-authored-by: Rushikesh Todkar <98420315+RushiT0122@users.noreply.github.com>
@akkumar-qlik akkumar-qlik requested a review from RushiT0122 March 10, 2026 21:49
6 participants