docs: migrate-v3 skill — 12 gotchas from Glue migration #1267
tanishkhot wants to merge 11 commits into main
✅ Snyk checks have passed. No issues have been found so far.
Findings from migrating atlan-glue-app (API/boto3 connector) to SDK v3:

- Workflow type name must match v2 class name for platform compatibility
- Dual-path credential resolution (context.get_secret + SecretStore)
- get_workflow_args must replicate v2 StateStore fallback
- ParquetFileWriter/JsonFileWriter lose files without Dapr
- Empty nested column dirs cause 60s Dapr timeouts
- run_dev.py must read Temporal host from env vars
- run_dev_combined() does not accept handler_class
- QueryBasedTransformer.transform_metadata() signature changed
- @task enforces typed Input/Output — no raw dicts
- pyproject.toml local path fails in Docker builds
- Updated Phase 2a decision tree for boto3/API connectors
- Added fingerprinter misclassification warning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ParquetFileWriter: upload failure raises an exception (it is not silent); local files are not deleted, since cleanup runs only after success
- run_dev.py: prefer ATLAN_TEMPORAL_HOST (the v3 primary variable) over the legacy split variables ATLAN_WORKFLOW_HOST + ATLAN_WORKFLOW_PORT
Replace broken v2 SecretStore.get_credentials() fallback with the correct v3 path: context.resolve_credential_raw(legacy_credential_ref(guid)). application_sdk.services.secretstore does not exist in v3 — the import would raise ModuleNotFoundError. The v3 resolver checks the local secret store first (InMemorySecretStore for dev) then falls back to DaprCredentialVault for platform-issued GUIDs.
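The resolution order described in this commit (local secret store first, then the Dapr-backed vault for platform-issued GUIDs) can be illustrated with a toy resolver. This is a minimal sketch under stated assumptions: `InMemoryStore`, `VaultStub`, `resolve_credential`, and `CredentialNotFound` are hypothetical stand-ins, not the SDK's `InMemorySecretStore`, `DaprCredentialVault`, or `context.resolve_credential_raw()`. In connector code you would call `context.resolve_credential_raw(legacy_credential_ref(guid))` and then read fields with `creds.get(...)`.

```python
from typing import Dict, List, Mapping, Optional


class CredentialNotFound(KeyError):
    """Hypothetical error: no store could resolve the credential GUID."""


class InMemoryStore:
    """Hypothetical stand-in for a local (dev) secret store."""

    def __init__(self, secrets: Optional[Mapping[str, Dict[str, str]]] = None) -> None:
        self._secrets = dict(secrets or {})

    def get(self, guid: str) -> Optional[Dict[str, str]]:
        return self._secrets.get(guid)


class VaultStub:
    """Hypothetical stand-in for the platform vault; records every lookup."""

    def __init__(self, secrets: Mapping[str, Dict[str, str]]) -> None:
        self._secrets = dict(secrets)
        self.lookups: List[str] = []

    def get(self, guid: str) -> Optional[Dict[str, str]]:
        self.lookups.append(guid)
        return self._secrets.get(guid)


def resolve_credential(guid: str, local: InMemoryStore, vault: VaultStub) -> Dict[str, str]:
    """Check the local store first; consult the vault only on a local miss."""
    creds = local.get(guid)
    if creds is None:
        creds = vault.get(guid)
    if creds is None:
        raise CredentialNotFound(guid)
    return creds
```

With a platform-issued GUID that is absent locally, the vault is consulted once, and optional fields are read defensively, e.g. `resolve_credential(guid, local, vault).get("region")`.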
@sdk-review

🔄 SDK Review starting (review) — ~10 min.
SDK Review: PR #1267 — docs: migrate-v3 skill — 12 gotchas from Glue migration
Verdict: NEEDS FIXES
Every migrated connector will hit this. The skill MUST check the v2 `@workflow.defn` class name and carry it over.
### Credential resolution — use context.resolve_credential_raw()
Critical [STRUCT] — Contradicts existing gotcha at L1072. The pre-existing gotcha recommends self.context.get_secret(credential_guid), but this new entry correctly says that's insufficient and recommends resolve_credential_raw() instead. An AI agent following this skill file will get conflicting instructions depending on which entry it reads first.
Fix: Update or delete the stale gotcha at L1072 to match this new (correct) advice.
The skill must replicate ALL of these steps. Missing the StateStore lookup means `credential_guid` won't be found in production (the platform stores it in the state store, not in the workflow start input).
### ParquetFileWriter/JsonFileWriter require a running Dapr sidecar
Important [STRUCT] — Duplicates existing gotcha at L1041 ("Write to local disk, not ParquetFileWriter/JsonFileWriter"). Both describe the same Dapr sidecar failure mode. The LocalParquetWriter class here is a useful addition, but the ~80% overlap means two entries for the same problem.
Fix: Merge the LocalParquetWriter example into the existing gotcha at L1041 and remove this duplicate entry.
Handler is auto-discovered by type inspection — importing the handler class in the App module is sufficient. Do NOT pass `handler_class=` to `run_dev_combined()`.
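Why importing is enough makes sense if discovery scans the App module's namespace for a handler subclass, because an import statement binds the class in that namespace. The toy below illustrates that idea only: `BaseHandler` and `discover_handler` are hypothetical, and the SDK's real discovery logic may differ (for example, in how it chooses among several candidates).

```python
import inspect
import types
from typing import Optional, Type


class BaseHandler:
    """Hypothetical stand-in for the SDK's handler base class."""


def discover_handler(module: types.ModuleType) -> Optional[Type[BaseHandler]]:
    """Return a BaseHandler subclass bound in `module`, or None.

    Imported names count: `from .handler import MyHandler` binds MyHandler
    in the module namespace, so type inspection finds it here.
    """
    for _, obj in inspect.getmembers(module, inspect.isclass):
        if issubclass(obj, BaseHandler) and obj is not BaseHandler:
            return obj
    return None
```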
### QueryBasedTransformer.transform_metadata() signature changed
Important [STRUCT] — Duplicates existing gotcha at L1080 ("QueryBasedTransformer needs workflow_id and workflow_run_id"). The existing entry uses a kwargs dict approach while this one says explicit positional args are required — a subtle contradiction. The explicit-args advice here is more precise.
Fix: Merge the TypeError detail and explicit-positional-args pattern into the existing gotcha at L1080 and remove this duplicate.
self._chunk = 0
self._rows = 0

async def write_batches(self, batches):
Minor [DX] — These async def methods contain only synchronous code (df.to_parquet(), os.makedirs). Since this is a copy-paste template, migrators using it inside a @task would block the event loop.
Fix: Make them plain def, or add a note about wrapping to_parquet() with run_in_thread().
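A sketch of the reviewer's suggestion: keep the writer's methods plain `def`, and move the blocking work onto a worker thread when calling from async code. Assumptions: `LocalChunkWriter` is a hypothetical, simplified stand-in for the skill's `LocalParquetWriter`, and it writes JSON Lines with the standard library so the sketch runs without pyarrow; in real code `write_batch` would call `df.to_parquet(path)` instead. The offload uses the standard library's `asyncio.to_thread` (Python 3.9+); the SDK's `run_in_thread()` mentioned in the review is an alternative whose signature is not shown here.

```python
import asyncio
import json
import os
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


class LocalChunkWriter:
    """Writes each non-empty batch to its own numbered file under output_dir.

    Every method does blocking file I/O, so none of them is `async def`.
    """

    def __init__(self, output_dir: str) -> None:
        self.output_dir = output_dir
        self._chunk = 0
        self._rows = 0
        os.makedirs(output_dir, exist_ok=True)

    def write_batch(self, rows: List[Row]) -> None:
        if not rows:
            return  # skip empty batches rather than writing empty files
        path = os.path.join(self.output_dir, f"chunk-{self._chunk}.jsonl")
        with open(path, "w", encoding="utf-8") as fh:
            for row in rows:
                fh.write(json.dumps(row) + "\n")
        self._chunk += 1
        self._rows += len(rows)

    def write_batches(self, batches: Iterable[List[Row]]) -> None:
        for batch in batches:
            self.write_batch(batch)

    def close(self) -> Dict[str, int]:
        """Return counts of chunks and rows written."""
        return {"chunks": self._chunk, "rows": self._rows}


async def write_from_async(writer: LocalChunkWriter, batches: List[List[Row]]) -> Dict[str, int]:
    # Run the blocking writes in a worker thread so the event loop stays free.
    await asyncio.to_thread(writer.write_batches, batches)
    return writer.close()
```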
Note: `application_sdk.services.secretstore.SecretStore` (v2) does not exist in v3. Any code that imports it will raise `ModuleNotFoundError`.
### get_workflow_args must replicate v2 StateStore fallback
Minor [DX] — This is the only complex gotcha (5 steps) without a code example. Every other gotcha with comparable complexity includes a Fix code block. A migrator would have to reverse-engineer the v2 source to implement this.
Fix: Add a skeleton code example showing the v3 equivalent of get_workflow_args().
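A minimal skeleton of the fallback this gotcha describes, under heavy assumptions: `state_lookup` is a hypothetical callable standing in for whatever v3 exposes for reading the state store, keying stored arguments by `workflow_id` is assumed, and letting start-input values win over stored ones is also assumed. It covers only the StateStore lookup, not the other steps of the v2 sequence; check the v2 `get_workflow_args` source for the exact key and remaining steps.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical: returns stored workflow arguments for a key, or None.
StateLookup = Callable[[str], Optional[Dict[str, Any]]]


def get_workflow_args(start_input: Dict[str, Any], state_lookup: StateLookup) -> Dict[str, Any]:
    """Merge the workflow start input with arguments persisted in the state store.

    The platform may put credential_guid only in the state store, so its
    absence from the start input is an error only if the fallback also misses.
    """
    args = dict(start_input)
    if "credential_guid" not in args:
        workflow_id = args.get("workflow_id")
        if not workflow_id:
            raise ValueError("workflow_id is required to look up stored workflow args")
        stored = state_lookup(workflow_id) or {}
        # Assumption: values from the start input take precedence over stored ones.
        args = {**stored, **args}
    if "credential_guid" not in args:
        raise ValueError("credential_guid not found in start input or state store")
    return args
```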
### @task enforces typed Input/Output — no raw dicts
Minor [STRUCT] — Overlaps with existing "Payload safety and allow_unbounded_fields" gotcha at L873. Both describe dict rejection in task contracts with the same allow_unbounded_fields=True solution.
Fix: Merge the TaskContractError detail and the WorkflowArgsInput/StatsOutput wrapper examples into the existing entry at L873, or add a cross-reference.
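To illustrate the failure mode behind this gotcha (raw `dict` annotations rejected when the decorator is applied, not when the task runs), here is a toy decorator with typed wrappers. Everything is hypothetical: `typed_task` is not the SDK's `@task`, it raises `TypeError` rather than `TaskContractError`, the fields on `WorkflowArgsInput` and `StatsOutput` are invented for illustration, and the `allow_unbounded_fields=True` escape hatch is not modeled.

```python
import dataclasses
import inspect
import typing
from typing import Any, Callable, Dict


def typed_task(fn: Callable) -> Callable:
    """Toy contract check, run once at decoration time.

    Every parameter (except self) and the return value must be annotated
    with a dataclass; raw dicts and missing annotations are rejected.
    """
    hints = typing.get_type_hints(fn)
    params = [p for p in inspect.signature(fn).parameters if p != "self"]
    for name in params + ["return"]:
        hint = hints.get(name)
        if hint is None or not dataclasses.is_dataclass(hint):
            raise TypeError(f"{fn.__name__}: '{name}' must be annotated with a dataclass, got {hint!r}")
    return fn


@dataclasses.dataclass
class WorkflowArgsInput:
    # Hypothetical fields wrapping what used to be passed as a raw dict.
    workflow_id: str
    credential_guid: str
    extra: Dict[str, Any] = dataclasses.field(default_factory=dict)


@dataclasses.dataclass
class StatsOutput:
    tables: int = 0
    columns: int = 0


@typed_task
def extract(args: WorkflowArgsInput) -> StatsOutput:
    return StatsOutput(tables=1, columns=len(args.extra))
```

Because the check runs at decoration time, a raw-dict signature fails as soon as the module is imported, before any workflow executes.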
Use `ParquetFileReader` for reads (it works fine: it reads local files first and only tries the object store as a fallback).
### Empty nested column dirs cause 60s Dapr timeouts
Minor [QUAL] — process_column_data and transform_data are Glue-specific method names. A non-Glue migrator won't recognize them.
Fix: Add a qualifier: "This applies to connectors that write hierarchical column data across multiple relation-level subdirectories (e.g., Glue, Hive-style extractors)."
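A sketch of the guard this gotcha implies: before handing a relation-level column directory to the next stage, confirm that it actually contains files, so empty subdirectories never reach a code path that waits on Dapr. `non_empty_subdirs` is a hypothetical helper, and the assumed layout (one subdirectory per relation under a columns root) may differ from your connector's.

```python
import os
from typing import List


def non_empty_subdirs(root: str) -> List[str]:
    """Return immediate subdirectories of `root` (sorted by name) that contain
    at least one file anywhere beneath them. A missing root yields []."""
    if not os.path.isdir(root):
        return []
    result: List[str] = []
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if not os.path.isdir(path):
            continue
        # os.walk yields (dirpath, dirnames, filenames); any non-empty filenames counts.
        if any(files for _, _, files in os.walk(path)):
            result.append(path)
    return result
```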
…d errors

Critical:
- Merge stale 'Credential resolution only via credential_guid' (get_secret) with new resolve_credential_raw gotcha; remove the duplicate entry

Important:
- Merge LocalParquetWriter into 'Write to local disk' gotcha; remove duplicate 'ParquetFileWriter require a running Dapr sidecar' entry
- Merge transform_metadata TypeError detail into existing QueryBasedTransformer gotcha; remove duplicate 'signature changed' entry

Minor:
- Fix LocalParquetWriter async def -> def (methods are synchronous)
- Add type annotation to _get_client workflow_args parameter
- Add escape hatch note on allow_unbounded_fields=True in @task gotcha
- Add cross-reference from @task gotcha to 'Payload safety' gotcha
- Fix '@task raises TaskContractError at class definition time' -> 'at decoration time'
- Add Glue-specific qualifier to empty column dirs gotcha
- Add note to pyproject.toml gotcha about switching to main after v3 stabilizes
@sdk-review

🔄 SDK Re-review starting (review) — ~10 min.
SDK Re-review: PR #1267 — docs: migrate-v3 skill — 12 gotchas from Glue migration
Verdict: NEEDS FIXES
Important:
- Fix ATLAN_TEMPORAL_HOST env var: guard for the host-only case (scale-tests sets it without a port); append ATLAN_TEMPORAL_PORT when no ':' is present

Important (deferred):
- Known Gotchas flat structure (33 entries, no sub-categories) noted as follow-up PR

Minor fixes:
- Rephrase QueryBasedTransformer kwargs description: not a TypeError from dict unpacking, but easy to accidentally omit the keys
- Add type hints to LocalParquetWriter.write_batches() and close()
- Show creds.get() usage after resolve_credential_raw() in _get_client example
- Merge @task enforces typed I/O into Payload safety gotcha (removes overlap)
- Add cross-reference between three Dapr-timeout gotchas
- Add code skeleton to get_workflow_args StateStore fallback gotcha
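A sketch of the env-var precedence described in these commits: prefer `ATLAN_TEMPORAL_HOST`, append `ATLAN_TEMPORAL_PORT` when that host has no `:`, and otherwise fall back to the legacy split variables. `resolve_temporal_address` is hypothetical, and the `localhost` / `7233` defaults are assumptions (7233 is Temporal's default frontend port), not values taken from run_dev.py.

```python
import os
from typing import Mapping, Optional

DEFAULT_TEMPORAL_HOST = "localhost"  # assumption for this sketch
DEFAULT_TEMPORAL_PORT = "7233"  # Temporal's default frontend port; assumed default here


def resolve_temporal_address(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a host:port string for the Temporal client from environment variables."""
    env = os.environ if env is None else env
    host = env.get("ATLAN_TEMPORAL_HOST")
    if host:
        # Host-only values (no ':') get ATLAN_TEMPORAL_PORT appended.
        if ":" not in host:
            host = f"{host}:{env.get('ATLAN_TEMPORAL_PORT', DEFAULT_TEMPORAL_PORT)}"
        return host
    # Legacy v2 split variables.
    legacy_host = env.get("ATLAN_WORKFLOW_HOST", DEFAULT_TEMPORAL_HOST)
    legacy_port = env.get("ATLAN_WORKFLOW_PORT", DEFAULT_TEMPORAL_PORT)
    return f"{legacy_host}:{legacy_port}"
```

The `":" not in host` check mirrors the commit's rule; it would misfire on bare IPv6 literals, which is worth keeping in mind if such hosts are ever used.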
@sdk-review

🔄 SDK Re-review starting (review) — ~10 min.
v3 relevance check: ✅ Still relevant — migrate-v3 skill gotchas from Glue migration. Updated 3 days ago. Valuable docs addition for the migration tooling.
🤖 Generated with Claude Code