
Merge development (with our current Datawarehouse code) into the AWS branch. #232

Open
jgrantr wants to merge 72 commits into feature/aws-sdk-v3-again from development


jgrantr commented Nov 3, 2025

Note

High Risk
High risk because it changes core data-warehouse ingest semantics (delete/merge behavior, dimension/fact upsert strategy, and Redshift load paths) and introduces new configuration-dependent branches that affect data correctness and performance.

Overview
Data warehouse ingest behavior is updated. common/datawarehouse/combine.js now lets a __leo_delete__ marker on either the current or the previous record override the merge, changing how delete-vs-update sequences collapse.
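A minimal sketch of the delete-override rule, assuming records are plain objects flagged with a truthy __leo_delete__ property and that an ordinary combine is a shallow merge in which newer fields win. combineRecords and isDelete are hypothetical names, not the actual exports of combine.js.

```javascript
// Hypothetical sketch of the delete-override rule described for combine.js.
// Assumption: a normal combine is a shallow merge where the newer record's fields win.
function isDelete(record) {
  return Boolean(record && record.__leo_delete__);
}

function combineRecords(previous, current) {
  if (!previous) return current;
  // If either side is a delete, skip the field merge and keep the newer record as-is.
  if (isDelete(previous) || isDelete(current)) {
    return current;
  }
  // Otherwise merge fields, letting the current record overwrite the previous one.
  return Object.assign({}, previous, current);
}
```

Under this reading, an update arriving after a delete replaces the deleted record instead of inheriting its stale __leo_delete__ flag through a merge.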

Failed validation routing is changed. common/datawarehouse/load.js updates the error pipeline to emit to ${source}_error via ls.toLeo() and adds payload.source_id to preserve the original event id.
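The error event described here might look roughly like the following. This is a sketch under assumptions: the original event is taken to expose its id as eventObj.id and its data as eventObj.payload, the error field name is invented, and buildErrorEvent is a hypothetical helper. The real code writes to the queue through ls.toLeo() rather than returning an object.

```javascript
// Hypothetical helper illustrating the failed-validation routing described above.
// Assumptions: eventObj has `id` and `payload`; failures go to the `${source}_error` queue.
function buildErrorEvent(source, eventObj, error) {
  return {
    // Failed records for a source are routed to a dedicated error queue.
    queue: `${source}_error`,
    payload: Object.assign({}, eventObj.payload, {
      // Keep the original event id so the failure can be traced back to its source event.
      source_id: eventObj.id,
      error: String(error),
    }),
  };
}
```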

Postgres connector gains Redshift-oriented load paths and hashed key options. postgres/lib/connect.js adds streamToTableFromS3() to stage CSVs in S3 and then COPY into tables (optionally deleting the S3 object), and postgres/lib/dwconnect.js adds config-driven branches for Redshift staging (DISTSTYLE/SORTKEY), adjustable delete flushing, safer temp-table cleanup, optional bypass of SCD processing, and hashedSurrogateKeys support (including bigint surrogate/dimension columns and fingerprint-based dimension linking).
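The Redshift staging and hashed-key ideas can be sketched as plain SQL and key builders. Everything here is illustrative: the helper names, the IAM role, and the choice of MD5 for the fingerprint are assumptions, not necessarily what connect.js or dwconnect.js do. The staging table uses CREATE TABLE AS because Redshift accepts DISTSTYLE and SORTKEY attributes in that form.

```javascript
const crypto = require("crypto");

// Hypothetical: empty temp staging table shaped like the target, with optional Redshift attributes.
function buildStagingTableSql(stagingTable, targetTable, { distStyle, sortKey } = {}) {
  const attrs = [];
  if (distStyle) attrs.push(`DISTSTYLE ${distStyle}`);
  if (sortKey) attrs.push(`SORTKEY (${sortKey})`);
  const attrSql = attrs.length ? ` ${attrs.join(" ")}` : "";
  return `CREATE TEMP TABLE ${stagingTable}${attrSql} AS SELECT * FROM ${targetTable} WHERE 1 = 0`;
}

// Hypothetical: load a CSV object that has already been uploaded to S3.
function buildCopySql(table, bucket, key, iamRoleArn) {
  return `COPY ${table} FROM 's3://${bucket}/${key}' IAM_ROLE '${iamRoleArn}' CSV`;
}

// Hypothetical: deterministic signed 64-bit (bigint) surrogate key from natural key parts.
function hashedSurrogateKey(naturalKeyParts) {
  const digest = crypto.createHash("md5").update(naturalKeyParts.map(String).join("|")).digest();
  // Interpret the first 8 bytes of the digest as a signed big-endian 64-bit integer.
  return digest.readBigInt64BE(0);
}
```

Identifiers and the role ARN are interpolated directly, so this assumes they come from trusted configuration rather than user input. Truncating a hash to 64 bits makes collisions possible but rare; whether that is acceptable for surrogate keys is a design decision for the real implementation.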

Packaging/release updates. Versions are bumped across common, entity-table, and postgres; Postgres now pins leo-connector-common@4.0.13-rc and adds a @dsco/layer-leo devDependency; and a new Publish-Postgres GitHub Actions workflow publishes postgres to GitHub Packages when a release is published.

Written by Cursor Bugbot for commit 4f13d68. This will update automatically on new commits.

mscranton-CH and others added 25 commits March 6, 2023 11:41
ES-2516 - reset deleted flag on update or insert
…fixes

ES-2516 - don't write deletes to the CSV file/staging table
jgrantr requested a review from czirker November 3, 2025 17:42

ch-snyk-sa commented Nov 3, 2025

Snyk checks have failed. 31 issues have been found so far.

| Scan Engine | Critical | High | Medium | Low | Total (31) |
| --- | --- | --- | --- | --- | --- |
| Open Source Security | 0 | 24 | 6 | 1 | 31 issues |
| Licenses | 0 | 0 | 0 | 0 | 0 issues |
| Code Security | 0 | 0 | 0 | 0 | 0 issues |


 * @param error {string}
 */
-function handleFailedValidation (ID, source, eventObj, error) {
+function handleFailedValidation(ID, source, eventObj, error) {

Bug: Incorrect Case-Sensitive Property Check Breaks Error Stream Initialization

Same incorrect property check as above. The code checks !errorStream.Writable but should check !errorStream.writable (lowercase w). This will cause the error stream initialization logic to be executed every time handleFailedValidation is called, potentially creating multiple pipelines for the same error stream.
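The distinction is easy to check with Node's built-in stream module: Writable is a class exported by the module, while writable is the boolean instance property. A minimal sketch of a guarded lazy initializer; getErrorStream, initCount, and the PassThrough stand-in are hypothetical and not the connector's actual code.

```javascript
const { PassThrough } = require("stream");

let errorStream = null;
let initCount = 0; // counts how many times the stream was (re)created

// Hypothetical lazy initializer for an error stream.
function getErrorStream() {
  // `writable` (lowercase) is the instance property. `errorStream.Writable` is always
  // undefined, so `!errorStream.Writable` would be true and rebuild the stream on every call.
  if (!errorStream || !errorStream.writable) {
    errorStream = new PassThrough({ objectMode: true });
    initCount += 1;
  }
  return errorStream;
}
```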


feat: add source id to dim error event

cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.


useSurrogateDateKeys: true,
}, columnConfig || {});

console.log(`has delete_fix in-place`);

Debug console.log left in production code

Medium Severity

A console.log(`has delete_fix in-place`) statement is left in the module initialization path of dwconnect.js. It prints to stdout every time the module is loaded, which is noisy in production, and it is clearly debug/development code that wasn't intended to be committed.
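If the diagnostic is worth keeping, one option is to gate it behind an opt-in flag instead of printing on every load; simply deleting the line is equally valid. DW_DEBUG is an assumed environment variable name and debugLog is a hypothetical helper.

```javascript
// Hypothetical: emit the diagnostic only when an opt-in flag is set.
// DW_DEBUG is an assumed variable name, not one the connector is known to read.
function debugLog(message, env = process.env) {
  if (env.DW_DEBUG !== "true") return false;
  console.log(message);
  return true;
}

// Would replace the unconditional console.log at module load.
debugLog(`has delete_fix in-place`);
```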


if (err) {
return done(err);
} else {
if (results[0].sortKey != null) {

Case mismatch prevents sortKey from being read in importFact

High Severity

In importFact, the check results[0].sortKey != null uses camelCase sortKey, but the SQL query returns lowercase column names (sortkey). This means results[0].sortKey is always undefined, so sortKey and sortKeyType are never assigned. This silently disables the sort key optimization for Redshift staging tables and the natural key filter logic in the hashed surrogate keys path, potentially causing full table scans and incorrect SORTKEY clauses.
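PostgreSQL folds unquoted identifiers to lowercase, and Redshift by default lowercases even quoted identifiers unless enable_case_sensitive_identifier is turned on, so reading the lowercase key is the more portable fix. A minimal sketch of a case-insensitive lookup that also tolerates an empty result set; getColumn is a hypothetical helper that importFact and linkDimensions could share, and the row shape is illustrative because the real query is not shown here.

```javascript
// Hypothetical helper: read a column from a result row regardless of identifier case.
function getColumn(row, name) {
  if (row == null) return undefined; // e.g. results[0] when the query returned no rows
  if (Object.prototype.hasOwnProperty.call(row, name)) return row[name];
  const wanted = name.toLowerCase();
  // Fall back to a case-insensitive match, e.g. "sortKey" -> "sortkey".
  const match = Object.keys(row).find((key) => key.toLowerCase() === wanted);
  return match === undefined ? undefined : row[match];
}

// Illustrative row as returned after identifier folding (assumed shape).
const results = [{ sortkey: "order_id" }];
const sortKey = getColumn(results[0], "sortKey");
if (sortKey != null) {
  console.log(`sort key column: ${sortKey}`);
}
```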


if (err) {
return done(err);
} else {
if (results[0].sortKey != null) {

Case mismatch prevents sortKey from being read in linkDimensions

High Severity

In linkDimensions, the check results[0].sortKey != null has the same camelCase typo as importFact. The SQL query returns sortkey (lowercase), so sortKey and sortKeyType are never assigned. This disables the natural key filter optimization for the dimension linking queries on Redshift, causing unnecessary full table scans during UPDATE operations.



5 participants