Skip to content

docs: migrate-v3 skill — 12 gotchas from Glue migration#1267

Open
tanishkhot wants to merge 11 commits intomainfrom
glue-findings
Open

docs: migrate-v3 skill — 12 gotchas from Glue migration#1267
tanishkhot wants to merge 11 commits intomainfrom
glue-findings

Conversation

@tanishkhot
Copy link
Copy Markdown
Contributor

Summary

  • Added 12 new Known Gotchas to the migrate-v3 skill, all discovered during the atlan-glue-app migration (API/boto3 connector)
  • Updated Phase 2a decision tree to detect boto3/AWS SDK connectors
  • Added fingerprinter misclassification warning for non-SQL connectors

Test plan

  • Review each gotcha for accuracy
  • Verify code examples compile
  • Run migrate-v3 skill on a test connector to confirm no regressions

🤖 Generated with Claude Code

@snykgituser
Copy link
Copy Markdown

snykgituser commented Apr 9, 2026

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Licenses 0 0 0 0 0 issues
Code Security 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

tanishkhot and others added 3 commits April 16, 2026 09:17
Findings from migrating atlan-glue-app (API/boto3 connector) to SDK v3:
- Workflow type name must match v2 class name for platform compatibility
- Dual-path credential resolution (context.get_secret + SecretStore)
- get_workflow_args must replicate v2 StateStore fallback
- ParquetFileWriter/JsonFileWriter lose files without Dapr
- Empty nested column dirs cause 60s Dapr timeouts
- run_dev.py must read Temporal host from env vars
- run_dev_combined() does not accept handler_class
- QueryBasedTransformer.transform_metadata() signature changed
- @task enforces typed Input/Output — no raw dicts
- pyproject.toml local path fails in Docker builds
- Updated Phase 2a decision tree for boto3/API connectors
- Added fingerprinter misclassification warning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ParquetFileWriter: upload failure raises an exception (not silent);
  local files are not deleted since cleanup runs only after success
- run_dev.py: prefer ATLAN_TEMPORAL_HOST (v3 primary) over legacy
  ATLAN_WORKFLOW_HOST + ATLAN_WORKFLOW_PORT split vars
Replace broken v2 SecretStore.get_credentials() fallback with the
correct v3 path: context.resolve_credential_raw(legacy_credential_ref(guid)).

application_sdk.services.secretstore does not exist in v3 — the import
would raise ModuleNotFoundError. The v3 resolver checks the local secret
store first (InMemorySecretStore for dev) then falls back to
DaprCredentialVault for platform-issued GUIDs.
@cmgrote
Copy link
Copy Markdown
Collaborator

cmgrote commented Apr 16, 2026

@sdk-review

@github-actions
Copy link
Copy Markdown
Contributor

🔄 SDK Review starting (review) — ~10 min. Watch live progress

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 16, 2026

SDK Review: PR #1267 — docs: migrate-v3 skill — 12 gotchas from Glue migration

Verdict: NEEDS FIXES

Documentation-only PR adding valuable migration gotchas from the Glue connector migration. The advice is technically sound and the gotchas capture real production pitfalls. However, several new entries contradict or duplicate pre-existing gotchas in the same file, which would give an AI agent conflicting instructions. These must be reconciled before merge.

Findings by File

.claude/skills/migrate-v3/SKILL.md

  • Critical [STRUCT] L1154 — Contradicts existing gotcha at L1072. The pre-existing "Credential resolution only via credential_guid" gotcha recommends self.context.get_secret(credential_guid). The new "Credential resolution — use context.resolve_credential_raw()" gotcha explicitly says get_secret() alone will NOT find platform-issued credentials and recommends resolve_credential_raw() instead. An AI agent following this file will encounter contradictory instructions depending on which entry it reads first. Fix: Update or delete the stale gotcha at L1072 to match the new (correct) advice at L1154. The new entry's recommendation of resolve_credential_raw() with legacy_credential_ref() is more accurate.

  • Important [STRUCT] L1184 — Duplicates existing gotcha at L1041. Both "Write to local disk, not ParquetFileWriter/JsonFileWriter" (L1041) and "ParquetFileWriter/JsonFileWriter require a running Dapr sidecar" (L1184) describe the same Dapr sidecar failure mode. The new entry adds a useful LocalParquetWriter class but ~80% overlaps with the existing advice. Fix: Merge the LocalParquetWriter example into the existing gotcha at L1041 and remove the duplicate at L1184.

  • Important [STRUCT] L1238 — Duplicates existing gotcha at L1080. Both "QueryBasedTransformer needs workflow_id and workflow_run_id" (L1080) and "QueryBasedTransformer.transform_metadata() signature changed" (L1238) address the same API. The existing one uses a kwargs dict approach while the new one says explicit positional args are required — a subtle contradiction. Fix: Merge the TypeError detail and explicit-args pattern from L1238 into the existing gotcha at L1080. Remove the duplicate.

  • Important [STRUCT] L773 — Dumping ground risk. The "Known Gotchas" section now spans ~500 lines (L773–L1272) with 30+ flat ### entries and no sub-categorization. This makes contradictory duplicates easy to introduce (as seen above) and hard to scan. Fix: Group gotchas under sub-headings by domain: #### Credentials & Secrets, #### Dapr & Infrastructure, #### Temporal & Workflow Runtime, #### File I/O & Output Paths, #### Build & Deployment. (This is a design change — could be done in a follow-up.)

  • Minor [DX] L1197 — LocalParquetWriter defines async def write_batches and async def close but contains only synchronous code (df.to_parquet(), os.makedirs). Copy-paste usage inside a @task would block the event loop. Fix: Make them plain def, or add a note about wrapping with run_in_thread().

  • Minor [DX] L1173 — "get_workflow_args must replicate v2 StateStore fallback" describes 5 steps but is the only complex gotcha without a code example. Fix: Add a skeleton showing the v3 equivalent.

  • Minor [DX] L1162 — Credential resolution example _get_client(self, workflow_args) lacks type annotations. Since these examples serve as copy-paste templates, missing types propagate to production code. Fix: Add workflow_args: dict[str, Any] at minimum.

  • Minor [ARCH] L1258 — WorkflowArgsInput(Input, allow_unbounded_fields=True) opts out of ADR-0008 payload safety without noting it's an escape hatch. Fix: Add a note that this should be replaced with bounded types once dict contents are known.

  • Minor [STRUCT] L1253 — "@task enforces typed Input/Output" overlaps with existing "Payload safety and allow_unbounded_fields" gotcha at L873. Both describe dict rejection in task contracts. Fix: Merge the TaskContractError detail into the existing entry at L873, or cross-reference.

  • Minor [BUG] L1255 — "raises TaskContractError at class definition time" is imprecise — the error is raised at @task decoration time by _validate_task_signature(). Fix: Change to "when the @task decorator validates the method signature".

  • Minor [QUAL] L1210 — "Empty nested column dirs" references process_column_data and transform_data which are Glue-specific methods. Non-Glue migrators won't recognize these. Fix: Add a qualifier: "applies to connectors that write hierarchical column data across relation-level subdirectories".

  • Minor [DX] L1266 — pyproject.toml gotcha pins to branch = "refactor-v3". This will become stale when v3 merges to main. Fix: Add note: "Switch to main or a release tag after v3 stabilizes."

Strengths

  • The 12 gotchas capture genuine production failure modes from a real migration — this is exactly the kind of hard-won knowledge that prevents repeat failures.
  • The fingerprinter misclassification warning (L110) is a valuable addition that could save hours of debugging for API-based connectors.
  • Code examples are concrete and actionable (credential resolution, LocalParquetWriter, temporal host env vars).
  • The commit history shows iterative refinement: commits 2 and 3 fixed stale descriptions caught during self-review.

Comment thread .claude/skills/migrate-v3/SKILL.md Outdated

Every migrated connector will hit this. The skill MUST check the v2 `@workflow.defn` class name and carry it over.

### Credential resolution — use context.resolve_credential_raw()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical [STRUCT] — Contradicts existing gotcha at L1072. The pre-existing gotcha recommends self.context.get_secret(credential_guid), but this new entry correctly says that's insufficient and recommends resolve_credential_raw() instead. An AI agent following this skill file will get conflicting instructions depending on which entry it reads first.

Fix: Update or delete the stale gotcha at L1072 to match this new (correct) advice.

Comment thread .claude/skills/migrate-v3/SKILL.md Outdated

The skill must replicate ALL of these steps. Missing the StateStore lookup means `credential_guid` won't be found in production (platform stores it in state store, not in the workflow start input).

### ParquetFileWriter/JsonFileWriter require a running Dapr sidecar
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important [STRUCT] — Duplicates existing gotcha at L1041 ("Write to local disk, not ParquetFileWriter/JsonFileWriter"). Both describe the same Dapr sidecar failure mode. The LocalParquetWriter class here is a useful addition, but the ~80% overlap means two entries for the same problem.

Fix: Merge the LocalParquetWriter example into the existing gotcha at L1041 and remove this duplicate entry.

Comment thread .claude/skills/migrate-v3/SKILL.md Outdated

Handler is auto-discovered by type inspection — importing the handler class in the App module is sufficient. Do NOT pass `handler_class=` to `run_dev_combined()`.

### QueryBasedTransformer.transform_metadata() signature changed
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important [STRUCT] — Duplicates existing gotcha at L1080 ("QueryBasedTransformer needs workflow_id and workflow_run_id"). The existing entry uses a kwargs dict approach while this one says explicit positional args are required — a subtle contradiction. The explicit-args advice here is more precise.

Fix: Merge the TypeError detail and explicit-positional-args pattern into the existing gotcha at L1080 and remove this duplicate.

Comment thread .claude/skills/migrate-v3/SKILL.md Outdated
self._chunk = 0
self._rows = 0

async def write_batches(self, batches):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor [DX] — These async def methods contain only synchronous code (df.to_parquet(), os.makedirs). Since this is a copy-paste template, migrators using it inside a @task would block the event loop.

Fix: Make them plain def, or add a note about wrapping to_parquet() with run_in_thread().


Note: `application_sdk.services.secretstore.SecretStore` (v2) does not exist in v3. Any code that imports it will raise `ModuleNotFoundError`.

### get_workflow_args must replicate v2 StateStore fallback
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor [DX] — This is the only complex gotcha (5 steps) without a code example. Every other gotcha with comparable complexity includes a Fix code block. A migrator would have to reverse-engineer the v2 source to implement this.

Fix: Add a skeleton code example showing the v3 equivalent of get_workflow_args().

Comment thread .claude/skills/migrate-v3/SKILL.md Outdated
)
```

### @task enforces typed Input/Output — no raw dicts
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor [STRUCT] — Overlaps with existing "Payload safety and allow_unbounded_fields" gotcha at L873. Both describe dict rejection in task contracts with the same allow_unbounded_fields=True solution.

Fix: Merge the TaskContractError detail and the WorkflowArgsInput/StatsOutput wrapper examples into the existing entry at L873, or add a cross-reference.


Use `ParquetFileReader` for reads (it works fine — reads local files first, only tries objectstore as fallback).

### Empty nested column dirs cause 60s Dapr timeouts
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor [QUAL] — process_column_data and transform_data are Glue-specific method names. A non-Glue migrator won't recognize them.

Fix: Add a qualifier: "This applies to connectors that write hierarchical column data across multiple relation-level subdirectories (e.g., Glue, Hive-style extractors)."

…d errors

Critical:
- Merge stale 'Credential resolution only via credential_guid' (get_secret)
  with new resolve_credential_raw gotcha; remove the duplicate entry

Important:
- Merge LocalParquetWriter into 'Write to local disk' gotcha; remove
  duplicate 'ParquetFileWriter require a running Dapr sidecar' entry
- Merge transform_metadata TypeError detail into existing QueryBasedTransformer
  gotcha; remove duplicate 'signature changed' entry

Minor:
- Fix LocalParquetWriter async def -> def (methods are synchronous)
- Add type annotation to _get_client workflow_args parameter
- Add escape hatch note on allow_unbounded_fields=True in @task gotcha
- Add cross-reference from @task gotcha to 'Payload safety' gotcha
- Fix '@task raises TaskContractError at class definition time' -> 'at decoration time'
- Add Glue-specific qualifier to empty column dirs gotcha
- Add note to pyproject.toml gotcha about switching to main after v3 stabilizes
@cmgrote
Copy link
Copy Markdown
Collaborator

cmgrote commented Apr 16, 2026

@sdk-review

@github-actions
Copy link
Copy Markdown
Contributor

🔄 SDK Re-review starting (review) — ~10 min. Watch live progress

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 16, 2026

SDK Re-review: PR #1267 — docs: migrate-v3 skill — 12 gotchas from Glue migration

Verdict: NEEDS FIXES

Significant improvement since last review — 9 of 12 prior findings resolved. The Critical credential-resolution contradiction and both Important duplicates were fixed by updating existing entries in-place (exactly the right approach). Two new issues surfaced: the ATLAN_TEMPORAL_HOST env var semantics may be incorrect, and the Known Gotchas section's flat structure continues to grow without categorization.

Delta from prior review

  • [RESOLVED] .claude/skills/migrate-v3/SKILL.md — prior Critical: credential resolution contradiction (get_secret vs resolve_credential_raw) — existing gotcha updated in-place to use resolve_credential_raw() with legacy_credential_ref()
  • [RESOLVED] .claude/skills/migrate-v3/SKILL.md — prior Important: ParquetFileWriter duplicate entry — LocalParquetWriter merged into existing "Write to local disk" gotcha
  • [RESOLVED] .claude/skills/migrate-v3/SKILL.md — prior Important: QueryBasedTransformer duplicate entry — existing gotcha updated with explicit-args pattern
  • [STILL] .claude/skills/migrate-v3/SKILL.md — prior Important: Known Gotchas section is a flat dumping ground (now 33 entries with no sub-categorization, up from 26)
  • [RESOLVED] .claude/skills/migrate-v3/SKILL.md — prior Minor: async def with sync code in LocalParquetWriter — now plain def with explicit note
  • [STILL] .claude/skills/migrate-v3/SKILL.md:1185 — prior Minor: get_workflow_args gotcha (5 steps) lacks a code example
  • [RESOLVED] .claude/skills/migrate-v3/SKILL.md — prior Minor: missing type annotations in credential example — workflow_args: dict[str, Any] added
  • [RESOLVED] .claude/skills/migrate-v3/SKILL.md — prior Minor: allow_unbounded_fields without escape-hatch note — comment added
  • [STILL] .claude/skills/migrate-v3/SKILL.md:1224 — prior Minor: @task enforces typed I/O overlaps with existing "Payload safety" gotcha — cross-ref added but redundancy remains
  • [RESOLVED] .claude/skills/migrate-v3/SKILL.md — prior Minor: imprecise error timing — "at decoration time" added
  • [RESOLVED] .claude/skills/migrate-v3/SKILL.md — prior Minor: Glue-specific references — qualifier added ("connectors that write hierarchical column data")
  • [RESOLVED] .claude/skills/migrate-v3/SKILL.md — prior Minor: pyproject.toml branch pinning — note added ("Switch to main or a pinned release tag")
  • [NEW] .claude/skills/migrate-v3/SKILL.md:1213 — Important [BUG]: ATLAN_TEMPORAL_HOST may be host-only, not combined host:port
  • [NEW] .claude/skills/migrate-v3/SKILL.md:1111 — Minor [BUG]: misleading **kwargs TypeError description
  • [NEW] .claude/skills/migrate-v3/SKILL.md:1062 — Minor [QUAL]: write_batches missing type hints
  • [NEW] .claude/skills/migrate-v3/SKILL.md:1069 — Minor [QUAL]: close() unparameterized dict return type
  • [NEW] .claude/skills/migrate-v3/SKILL.md:1099 — Minor [DX]: _get_client example ends abruptly — creds unused
  • [NEW] .claude/skills/migrate-v3/SKILL.md:1198 — Minor [STRUCT]: three Dapr-timeout gotchas with no cross-references

Findings by File

.claude/skills/migrate-v3/SKILL.md

  • Important [BUG] L1213 — ATLAN_TEMPORAL_HOST is described as a "combined host:port" variable, but existing SDK evidence (scale-tests.yaml) shows it as host-only with a separate ATLAN_TEMPORAL_PORT. The code example uses os.environ.get("ATLAN_TEMPORAL_HOST") directly as temporal_host, which would yield a bare hostname and fail to connect. Fix: If host-only, build host:port from both vars; if truly combined in v3, add a note clarifying this differs from existing conventions and update scale-tests for consistency.

  • Important [STRUCT] L773 — Known Gotchas section now has 33 flat ### entries across ~480 lines with no sub-categorization. This makes contradictions easy to introduce and hard to scan. The prior review flagged this and the PR adds 7 more flat entries. Fix: Group under sub-headings by domain (e.g. #### Credentials & Secrets, #### Dapr & Infrastructure, #### Temporal & Workflow Runtime, #### File I/O & Output Paths, #### Build & Deployment). Could be a follow-up PR.

  • Minor [BUG] L1111 — Description states "passing them inside a **kwargs dict raises TypeError" but Python dict unpacking via ** correctly passes named parameters. The old kwargs-dict pattern would work fine if the keys are present. The real risk is accidentally omitting these keys. Fix: Rephrase to "these are easy to accidentally omit when building a kwargs dict — pass them as explicit keyword arguments for clarity."

  • Minor [QUAL] L1062 — write_batches(self, batches) missing type hints on batches parameter and return type. Since this is a copy-paste template, migrators will propagate the omission. Fix: def write_batches(self, batches: list[Any]) -> None:.

  • Minor [QUAL] L1069 — close(self) -> dict uses unparameterized dict. The return shape is known. Fix: def close(self) -> dict[str, int]:.

  • Minor [DX] L1099 — _get_client example ends after resolve_credential_raw() — the resolved creds is never used or described. Migrators won't know the return shape. Fix: Add 1–2 lines showing typical usage (e.g. host = creds.get("host")).

  • Minor [STRUCT] L1224 — "@task enforces typed Input/Output" overlaps ~60% with existing "Payload safety and allow_unbounded_fields" (L873). Cross-ref was added (good) but the entries still duplicate the prohibition on dict[str, Any]. Fix: Merge the TaskContractError detail and wrapper examples into the existing entry at L873.

  • Minor [STRUCT] L1198 — Three separate Dapr-timeout gotchas exist without cross-references: "Write to local disk" (L1041), "auto_heartbeat_seconds" (L1146), and "Empty nested column dirs" (L1198). An agent hitting a 60s timeout must check all three independently. Fix: Add cross-references between them, or group under a "Dapr & Infrastructure" heading.

  • Minor [DX] L1185 — get_workflow_args gotcha describes 5 complex steps but is the only multi-step gotcha without a code example. Fix: Add a skeleton showing the v3 equivalent.

Strengths

  • Excellent response to prior review — 9 of 12 findings resolved by updating existing entries in-place rather than adding duplicates. This is the right editorial approach.
  • The 12 new gotchas capture real production failure modes (workflow type name, StateStore fallback, empty column dirs, Temporal host) that would cost hours to debug without this documentation.
  • The fingerprinter misclassification warning (L110) is a valuable safety net for API-based connectors.
  • Code examples are concrete and copy-paste-ready (credential resolution, LocalParquetWriter, temporal host env vars).
  • The LocalParquetWriter now correctly uses plain def with an explicit note explaining why — addresses both the bug and educates the reader.

Important:
- Fix ATLAN_TEMPORAL_HOST env var: guard for host-only case (scale-tests
  sets it without port); append ATLAN_TEMPORAL_PORT when no ':' present

Important (deferred):
- Known Gotchas flat structure (33 entries, no sub-categories) noted as
  follow-up PR

Minor fixes:
- Rephrase QueryBasedTransformer kwargs description: not a TypeError from
  dict unpacking, but easy to accidentally omit the keys
- Add type hints to LocalParquetWriter.write_batches() and close()
- Show creds.get() usage after resolve_credential_raw() in _get_client example
- Merge @task enforces typed I/O into Payload safety gotcha (removes overlap)
- Add cross-reference between three Dapr-timeout gotchas
- Add code skeleton to get_workflow_args StateStore fallback gotcha
@cmgrote
Copy link
Copy Markdown
Collaborator

cmgrote commented Apr 16, 2026

@sdk-review

@github-actions
Copy link
Copy Markdown
Contributor

🔄 SDK Re-review starting (review) — ~10 min. Watch live progress

@github-actions
Copy link
Copy Markdown
Contributor

🔄 SDK Re-review starting (review) — ~10 min. Watch live progress

@github-actions
Copy link
Copy Markdown
Contributor

🔄 SDK Re-review starting (review) — ~10 min. Watch live progress

1 similar comment
@github-actions
Copy link
Copy Markdown
Contributor

🔄 SDK Re-review starting (review) — ~10 min. Watch live progress

Base automatically changed from refactor-v3 to main April 19, 2026 01:01
@Aryamanz29
Copy link
Copy Markdown
Member

v3 relevance check: ✅ Still relevant — migrate-v3 skill gotchas from Glue migration. Updated 3 days ago. Valuable docs addition for the migration tooling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants