
feat(templates): add EntityDef for declarative entity extraction#1289

Open
AtMrun wants to merge 126 commits into main from feat/entity-registry

Conversation


AtMrun (Collaborator) commented Apr 13, 2026

Summary

  • Adds EntityDef dataclass (templates/entity.py) for declaring extractable entity types — name, SQL, endpoint, phase, enabled, timeout
  • Rewrites SqlMetadataExtractor.run() to use entity-driven phased orchestration instead of hardcoded 4-entity asyncio.gather
  • Entities in the same phase run concurrently; phases execute sequentially
  • _fetch_entity() dispatches to fetch_{name}() methods by convention
  • Full backward compat: empty entities list falls back to default 4 (databases, schemas, tables, columns)

Before (hardcoded 4 entities)

# Must override run() entirely to add stages, streams, etc.
db, schema, table, column = await asyncio.gather(
    self.fetch_databases(...),
    self.fetch_schemas(...),
    self.fetch_tables(...),
    self.fetch_columns(...),
)

After (declarative)

class SnowflakeExtractor(SqlMetadataExtractor):
    entities = [
        EntityDef(name="databases", phase=1),
        EntityDef(name="schemas",   phase=1),
        EntityDef(name="tables",    phase=1),
        EntityDef(name="columns",   phase=1),
        EntityDef(name="stages",    phase=2),
        EntityDef(name="streams",   phase=2),
    ]
    # No run() override needed. Adding an entity = one line.
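As a rough sketch (illustrative names only; the real `EntityDef` carries more fields and `run()` does more work), the phased orchestration amounts to: sort enabled entities by phase, gather within a phase, await phases in order.

```python
import asyncio
from dataclasses import dataclass
from itertools import groupby

@dataclass(frozen=True)
class EntityDef:
    name: str
    phase: int = 1
    enabled: bool = True

async def run_phased(extractor, entities):
    """Group enabled entities by phase; gather within a phase, run phases sequentially."""
    active = sorted((e for e in entities if e.enabled), key=lambda e: e.phase)
    results = {}
    for _phase, group in groupby(active, key=lambda e: e.phase):
        batch = list(group)
        # Entities in the same phase run concurrently...
        fetched = await asyncio.gather(
            *(getattr(extractor, f"fetch_{e.name}")() for e in batch)
        )
        # ...while phases execute one after another.
        results.update(zip((e.name for e in batch), fetched))
    return results
```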

Part of: One-Prompt Connectors proposal (Phase 2 of 5)

Test plan

  • 7 tests for EntityDef dataclass (defaults, frozen, custom fields, API fields)
  • 4 tests for _get_entities() (defaults, custom, disabled filtering)
  • 5 tests for _fetch_entity() (dispatch, custom result key, missing method, credential forwarding)
  • 1 test for phase grouping
  • All 37 existing SqlMetadataExtractor tests pass unchanged
  • Full suite: 1849 pass, 0 regressions
  • Pre-commit clean (ruff, pyright, isort)

🤖 Generated with Claude Code

cmgrote and others added 30 commits March 18, 2026 02:08
- Add InfrastructureContext ContextVar (infrastructure/context.py)
- Create Dapr/InMemory infrastructure in main.py for all run modes
- Add secret/storage fields and convenience methods to AppContext
- Inject infrastructure into task activities via get_infrastructure()
- Replace v2 EventStore in interceptors/events.py and worker.py with
  v3 _publish_event_via_binding(); remove is_component_registered Dapr
  health-check calls from the hot path
- Add secret/state fields to HandlerContext; inject from ContextVar in
  handler service
- Add DeprecationWarning to all five v2 services modules
- Add migration guide (docs/migration-guide-v3.md)
- Fix unit test hangs: set ATLAN_ENABLE_OBSERVABILITY_DAPR_SINK=false
  in conftest.py to prevent metrics flush from connecting to Dapr
  sidecar; remove stale DaprClient mock targeting defunct eventstore
  import path
- Add application_sdk/storage/ module: direct obstore I/O (get_bytes,
  put, delete, exists, list_keys, delete_prefix, normalize_key) with
  optional store param that auto-resolves from infrastructure context,
  v2-compatible aliases (get_content, delete_file, list_files), and
  SHA-256 sidecar-based deduplication in transfer.py
- Add App.upload / App.download @task methods backed by storage.transfer;
  lower-level transfer.upload / transfer.download available for direct
  use inside existing activities
- Generalise FileReference: storage_key → storage_path, add file_count
  (covers both single-file and directory refs), drop checksum/size_bytes/
  content_type (handled transparently via sidecars)
- Add UploadInput/UploadOutput/DownloadInput/DownloadOutput contracts
  replacing SyncLocalToStorageInput/Output and SyncFromStorageInput/Output
- Restore transparent FileReference materialise/persist hooks in
  create_activity_from_task() via file_ref_sync.py
- Remove StorageBinding/InMemoryBinding (superseded by obstore module);
  update deprecation warnings in services/ to point to App.upload/download
- Add obstore>=0.9.1 and pyyaml>=6.0.2 to core dependencies
Introduces a typed credential pipeline ported from the experimental SDK,
with a backward-compatible bridge so existing apps can migrate incrementally.

New packages:
- application_sdk/credentials/ — CredentialRef, Credential Protocol,
  built-in types (Basic, ApiKey, BearerToken, OAuthClient, Certificate),
  Git types (GitSsh, GitToken), Atlan types (AtlanApiToken, AtlanOAuthClient),
  CredentialTypeRegistry singleton, CredentialResolver (new + legacy paths),
  and credential error hierarchy (CRD error codes)
- application_sdk/test_utils/credentials.py — MockCredentialStore for tests

Integrations:
- handler/contracts.py: Credential renamed to HandlerCredential (alias kept)
- templates/contracts/sql_{metadata,query}.py: add credential_ref field
- app/context.py: add resolve_credential() and resolve_credential_raw()
- sql_{metadata,query}_extractor.py: bridge credential_ref ↔ credential_guid
- docs/migration-guide-v3.md: Step 7 — Migrate to Typed Credentials

88 new unit tests; all existing tests pass.
Migrates three proven patterns from the experimental app-framework:

- GitReference: frozen dataclass for git repo coordinates in workflow inputs,
  with optional CredentialRef and commit > tag > branch checkout precedence.
- create_async_atlan_client: async-only factory for pyatlan_v9 AsyncAtlanClient
  from AtlanApiToken (api_key) or AtlanOAuthClient (oauth params).
- AtlanClientMixin: mixin for App subclasses providing cached async Atlan client
  access via get_or_create_async_atlan_client, reusing validated-phase clients.

Both AtlanClientMixin and create_async_atlan_client are re-exported from
application_sdk.credentials and application_sdk.app for ergonomic imports.
Replace in-memory put/get_bytes with streaming upload_file/download_file
primitives backed by obstore's multipart writer and streaming GET. Adaptive
part sizing (_compute_part_size) keeps uploads under S3's 10,000-part limit.

- ops.py: add upload_file (open_writer_async), download_file (stream),
  _compute_part_size; rename put/_get_bytes to private _put/_get_bytes
- reference.py: persist_file_reference handles directories (rglob + per-file
  upload); materialize_file_reference uses list_keys to detect prefix vs
  single key, streaming download for both cases
- transfer.py: _upload_one/_download_one use upload_file/download_file
- file_ref_sync.py: _local_sidecar_ok returns False for directories so they
  always go through full materialize check
- __init__.py: export upload_file/download_file; drop put/get_bytes/get_content
- context.py: remove upload_bytes/download_bytes (no callers, promoted
  in-memory pattern); add storage property for direct streaming API access
- base.py: use context.storage property instead of context._storage
- service.py: /workflows/v1/file streams via asyncio.to_thread + upload_file
  instead of file.read() + _put
- tests: new TestUploadFile/TestDownloadFile in test_ops.py; new
  TestDirectoryPersistMaterialize and updated sidecar tests in
  test_file_ref_sync.py
Resolve 4 conflict-marker files:
- version.py, pyproject.toml: keep v3.0.0 and pyatlan>=9,<10
- CHANGELOG.md: v3.0.0 entry first, then v2.8.0–v2.7.0 from main
- uv.lock: regenerated via `uv lock` after resolving pyproject.toml

Migrate observability and atlan_storage from Dapr to v3 storage:
- observability.py: replace ObjectStore.upload_file (×2) with v3
  upload_file, replace DaprClient state calls with StateStore protocol,
  replace DaprClient invoke_binding delete with v3 delete; add
  lazy-cached _deployment_store / _upstream_store class attrs
- atlan_storage.py: replace ObjectStore.get_content + DaprClient
  invoke_binding with download_file + upload_file streaming through
  a temp file; replace ObjectStore.list_files with list_keys; add
  lazy-cached store attrs; streams through disk instead of loading
  entire file into memory
- Update test_atlan_storage.py to mock v3 storage functions instead
  of Dapr; remove stale DaprClient patch from conftest.py
Category E — Move behind execution layer:
- New canonical locations for all Temporal interceptors:
  execution/_temporal/interceptors/{events,cleanup,correlation_context,lock,activity_failure_logging}.py
- New canonical locations for lock + activity utilities:
  execution/_temporal/{lock_activities,activity_utils}.py
- contracts/events.py: pure Pydantic event models moved out of interceptors/
- execution/_temporal/worker.py: updated to import from canonical locations
- Old interceptors/* and activities/lock_management.py: replaced with
  re-export shims emitting DeprecationWarning

Category A — Refactor implementations:
- clients/sql.py, io/json.py, io/parquet.py: removed module-level
  `activity.logger = logger` side-effects and `from temporalio import activity`
- activities/common/models.py → common/models.py (with re-export shim)
- activities/common/sql_utils.py → common/sql_utils.py (with re-export shim)
- observability/utils.py: lazy-imports temporalio inside try blocks
  instead of at module level

Category C — Add DeprecationWarning to 12 v2 modules:
- workflows/__init__.py, workflows/metadata_extraction/__init__.py,
  workflows/metadata_extraction/incremental_sql.py,
  workflows/query_extraction/__init__.py
- activities/__init__.py
- worker.py
- clients/workflow.py, clients/temporal.py, clients/atlan.py
- handlers/__init__.py, handlers/base.py, handlers/sql.py

Test updates: updated mock patch targets in 5 test files to point at
canonical module locations post-move. Fixed pre-existing unused imports
in tests/unit/storage/test_transfer.py.

All 1797 unit tests pass. Pre-commit (ruff, ruff-format, isort, pyright) passes.
credentials/oauth.py — new OAuthTokenService:
- General-purpose OAuth 2.0 client_credentials token service
- Wraps OAuthClientCredential; handles acquire, cache, and refresh
- asyncio.Lock prevents concurrent exchanges
- 60s pre-expiry refresh buffer; current_expires_at property for callers
- OAuthTokenError for failed exchanges
- Exported from credentials/__init__.py
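The acquire/cache/refresh behaviour described above (lock against concurrent exchanges, 60s pre-expiry buffer) can be sketched minimally; the class and callback names here are illustrative, with the HTTP exchange injected so the sketch stays self-contained:

```python
import asyncio
import time
from typing import Awaitable, Callable

REFRESH_BUFFER_S = 60  # refresh this long before actual expiry

class TokenCache:
    """Acquire/cache/refresh a client_credentials token; a lock serialises exchanges."""

    def __init__(self, exchange: Callable[[], Awaitable[tuple[str, float]]]):
        self._exchange = exchange  # returns (access_token, expires_in_seconds)
        self._lock = asyncio.Lock()
        self._token: str | None = None
        self._expires_at: float = 0.0

    @property
    def current_expires_at(self) -> float:
        return self._expires_at

    async def get_token(self) -> str:
        async with self._lock:
            # Re-exchange when missing or within the pre-expiry buffer.
            if self._token is None or time.time() >= self._expires_at - REFRESH_BUFFER_S:
                token, expires_in = await self._exchange()
                self._token = token
                self._expires_at = time.time() + expires_in
            return self._token
```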

execution/_temporal/auth.py — TemporalAuthManager refactored:
- Delegates HTTP token exchange to OAuthTokenService (eliminates ~40
  lines of duplicate httpx code from acquire_initial_token + _do_refresh)
- Constructs OAuthTokenService lazily from TemporalAuthConfig
- Background refresh loop, sleep calculation, client.api_key update, and
  event emission all remain in TemporalAuthManager (Temporal-specific)

execution/_temporal/interceptors/events.py:
- Replaces per-call AtlanAuthClient() with a module-level OAuthTokenService
  singleton (_get_event_token_service) — fixes token cache being discarded
  on every event publish

clients/atlan_auth.py — DeprecationWarning added, pointing to
  credentials.OAuthTokenService as the v3 replacement.
…egories B, D, F)

Observability context refactor (ADR-0005):
- Add ExecutionContext frozen dataclass + ContextVar to observability/context.py
- Add ExecutionContextInterceptor (sets ContextVar once at workflow/activity start)
- Rewrite get_workflow_context() to read ContextVar — zero temporalio imports
- Remove top-level temporalio import from app/context.py; lazy-import inside methods
- Register ExecutionContextInterceptor unconditionally first in create_worker()

Deprecation warning fixes:
- cleanup.py: CleanupResult BaseModel → @dataclass (eliminates pydantic_data_converter warning)
- cleanup.py: fix build_output_path import (activities.common.utils → execution._temporal.activity_utils)
- contracts/events.py, observability/logger_adaptor.py: class Config → model_config dict
- observability/models.py: remove dead Config.parse_obj (never invoked externally)

v3 migration (categories B, D, F):
- templates: add BaseMetadataExtractor template and base_metadata_extraction contract
- testing: migrate scale_data_generator from test_utils; add mocks, fixtures
- test_utils: stub deprecated modules pointing to testing equivalents
- activities/common/models, metadata_extraction/base: v3 activity model updates
- common/models, sql_utils, contracts/types, io: v3 type and utility alignment

Tests:
- New: test_execution_context.py, test_execution_context_interceptor.py
- New: test_base_metadata_extractor.py
- Updated: test_logger_adaptor.py (set_execution_context replaces temporalio mocks)
The stderr sink format references {extra[logger_name]} but no default
was set for log calls that bypass AtlanLoggerAdapter.process() or
InterceptHandler.emit(). Mirrors the existing pattern used for
_trace_id_str by calling setdefault in the format callable.
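The setdefault-in-the-format-callable pattern can be shown without loguru itself; here the record is a plain dict and the names are illustrative:

```python
# A format *callable* receives the record and returns the format string;
# setdefault guarantees {extra[logger_name]} resolves even for log calls
# that bypass the adapter's process() or emit().
def stderr_format(record: dict) -> str:
    record["extra"].setdefault("logger_name", "unknown")
    return "{time} | {level} | {extra[logger_name]} | {message}\n"
```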
…tor (H5)

- Add App.on_complete() called in workflow_run_wrapper finally block (before
  state teardown) so subclasses can run custom post-run logic
- Add App.cleanup_files() @task that deletes tracked FileReference local paths
  (+ .sha256 sidecars) and convention-based temp directories
- Track FileReference objects in _app_state during auto-materialize/persist in
  create_activity_from_task() via new _track_file_refs() helper
- Add CleanupInput/CleanupOutput contracts
- Add TRACKED_FILE_REFS_KEY constant
- Remove CleanupInterceptor from default worker interceptor chain; deprecate
  the module with DeprecationWarning
- Default InterceptorSettings.enable_cleanup_interceptor to False (interceptor
  no longer registered; cleanup runs via on_complete() instead)
- Fix obstore API: LocalStore(root_path=) -> LocalStore(prefix=),
  AzureStore(container=) -> AzureStore(container_name=)
…up (H6)

- Add StorageCleanupInput / StorageCleanupOutput contracts
- Add PROTECTED_STORAGE_PREFIXES constant (guards persistent-artifacts/)
- Add cleanup_storage task: deletes tracked file_refs/ objects + .sha256
  sidecars; opt-in run-scoped prefix deletion via include_prefix_cleanup
- Stream-and-delete with asyncio.Semaphore(20) to cap concurrency
- Update on_complete() to run cleanup_files + cleanup_storage concurrently
  via asyncio.gather; each failure is swallowed independently
- Add tests/unit/app/test_cleanup_storage.py (9 cases)
- Update test_on_complete.py and test_activities.py for new task
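The stream-and-delete concurrency cap works roughly like this (a sketch under assumed names; the real task also handles sidecars and protected prefixes):

```python
import asyncio

async def delete_all(keys, delete_one, max_concurrency: int = 20):
    """Delete objects concurrently, capped by a semaphore."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(key):
        async with sem:
            await delete_one(key)

    # return_exceptions=True lets one failed delete not cancel the rest,
    # mirroring the "each failure is swallowed independently" behaviour.
    return await asyncio.gather(*(bounded(k) for k in keys), return_exceptions=True)
```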
…hServer

- entrypoint.sh: replace dapr-run wrapper with daprd directly; enables
  --dapr-graceful-shutdown-seconds for KEDA scale-to-zero drain; robust
  PID tracking and ordered shutdown (Python first, then daprd); restore
  --metrics-port (3100) and --max-request-body-size (1024Mi)
- Dockerfile: align env vars to DAPR_* prefix; add DAPR_METRICS_PORT,
  DAPR_MAX_BODY_SIZE, DAPR_GRACEFUL_SHUTDOWN_SECONDS; keep bash for
  entrypoint shebang; update ENTRYPOINT to use script directly
- helm/atlan-app/: port full Helm chart from experimental-app-sdk with
  adaptations (image repo default, app.kubernetes.io/part-of label);
  fix dapr-eventstore gap — ConfigMap and projected volume mount are now
  conditional on eventstore.url being set
- application_sdk/server/health.py: add WorkerHealthServer (asyncio HTTP,
  no FastAPI dependency) with /health, /ready, /live endpoints for k8s
  liveness/readiness probes
- application_sdk/main.py: wire WorkerHealthServer into worker and
  combined modes; add health_port to AppConfig (--health-port /
  ATLAN_HEALTH_PORT, default 8081)
- .github/workflows/publish-helm-chart.yml: publish chart to GHCR OCI
  registry on push to main or helm/v* tags
- .github/workflows/pull_request.yaml: add label-gated helm-lint job
- tests/unit/server/test_health.py: 14 tests covering all endpoints,
  404/405, manual start/stop, activity tracking, convenience function
Ports e2e testing infrastructure from experimental-app-sdk as a
first-class package in application_sdk.testing.e2e, and removes the
OTel sidecar from the Helm chart in favour of direct DaemonSet emission.

Helm OTel simplification:
- Remove otel-collector sidecar from handler deployment
- Remove configmap-otel.yaml (sidecar config)
- Remove otelConfigMapName and workerOtelEndpoint helpers from _helpers.tpl
- Both handler and worker now emit OTLP directly to $(K8S_NODE_IP):4317
- K8S_NODE_IP (downward API status.hostIP) now always set on both deployments
- Remove otel.collector block from values.yaml

Testing infrastructure (application_sdk/testing/e2e/):
- AppConfig dataclass for deployment configuration
- AppDeployer: helm upgrade --install / uninstall with DeploymentError
- kube_http_call: ephemeral port-forward pattern for HTTP calls to services
- LogCollector: kubectl-based pod/event log collection (best-effort)
- run_workflow / wait_for_workflow: handler API helpers
- get_pods / wait_for_pods_ready / get_pod_logs: pod status helpers

Test plumbing:
- Move autouse mock fixtures from tests/conftest.py to tests/unit/conftest.py
  so e2e tests get real infrastructure
- Add tests/e2e/conftest.py with app_config, deployed_app, handler_call fixtures
- 26 new unit tests in tests/unit/testing/e2e/
- Register e2e pytest marker in pyproject.toml
- Add httpx to tests optional dependencies
- Add label-gated e2e-k8s CI job (kind cluster, helm install, pytest)
Introduces IncrementalSqlMetadataExtractor — a v3 typed-contract template
that replaces the v2 IncrementalSQLMetadataExtractionActivities pattern.

Key changes:
- `templates/contracts/incremental_sql.py`: IncrementalRunContext dataclass
  (with is_incremental_ready()), IncrementalExtractionInput/Output,
  IncrementalTaskInput, and typed per-task contracts for all six incremental
  phases (fetch_incremental_marker, read_current_state, prepare_column_queries,
  execute_single_column_batch, write_current_state, update_incremental_marker)
- `templates/incremental_sql_metadata_extractor.py`: Full v3 implementation
  with Phase 1-5 run() orchestration, chunked column batch fan-out
  (MAX_CONCURRENT_COLUMN_BATCHES=3, matching v2 semantics), and thorough
  docstrings for abstract tasks (especially fetch_tables) guiding implementers
- `templates/contracts/sql_metadata.py`: Replace workflow_args dict[str,Any]
  blobs with typed ExtractionTaskInput base; add FetchProceduresInput/Output;
  remove any dict[str,Any] smells from parent contracts
- `templates/sql_metadata_extractor.py`: Wire typed task inputs; add
  fetch_procedures task (raises NotImplementedError; not called from base run())
- `common/incremental/helpers.py`, `marker.py`, `state/state_reader.py`,
  `state/state_writer.py`: Replace workflow_args: Dict[str,Any] parameter
  blobs with explicit typed parameters
- `activities/metadata_extraction/incremental.py`: Add module-level
  DeprecationWarning pointing to IncrementalSqlMetadataExtractor
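The chunked column-batch fan-out described above (concurrent within a chunk, sequential across chunks) could be sketched like this, with hypothetical helper names:

```python
import asyncio

MAX_CONCURRENT_COLUMN_BATCHES = 3  # matching the v2 semantics noted above

async def run_column_batches(batch_inputs, execute_one):
    """Fan out column batches in chunks of MAX_CONCURRENT_COLUMN_BATCHES."""
    results = []
    for i in range(0, len(batch_inputs), MAX_CONCURRENT_COLUMN_BATCHES):
        chunk = batch_inputs[i : i + MAX_CONCURRENT_COLUMN_BATCHES]
        # Each chunk runs concurrently; chunks run one after another.
        results.extend(await asyncio.gather(*(execute_one(b) for b in chunk)))
    return results
```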
…ype tightening, lifecycle integration

- Phase 1a: Remove dead LogRecord + base parquet_sink (superseded by AtlanLoggerAdapter.parquet_sink)
- Phase 1b/1c: Extract parse_otel_resource_attributes and build_otel_resource to utils.py; remove three identical copies from logger/metrics/traces adaptors
- Phase 2a: Tighten AtlanObservability abstract method signatures (Any → T); buffer typed as list[dict[str, object]]
- Phase 2b: Tighten observability_decorator.py params from Any to concrete adapter types (TYPE_CHECKING)
- Phase 3: MetricType(str, Enum) → MetricType(SerializableEnum) — backwards-compatible, MetricType.COUNTER == "counter" preserved
- Phase 4: Move TraceRecord from traces_adaptor.py to models.py; re-export for backwards compat
- Phase 5a/5b: Add ClassVar annotations to _instances/_deployment_store/_upstream_store; add _reset_for_testing() to base + metrics + traces
- Phase 6a: Rename _flush_all_instances → flush_all (public classmethod)
- Phase 6b: Call AtlanObservability.flush_all() from App.on_complete() to flush on workflow completion
- Phase 7: Add __all__ re-exports to observability/__init__.py for context + correlation public surface
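`SerializableEnum` is internal to the SDK, but the backwards-compat guarantee in Phase 3 (`MetricType.COUNTER == "counter"`) rests on the str mixin, which a plain `str, Enum` illustrates:

```python
from enum import Enum

class MetricType(str, Enum):
    COUNTER = "counter"
    GAUGE = "gauge"

# Because the enum also subclasses str, members compare equal to their raw
# string values and serialise as plain strings.
```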
Add ATLAN_APP_BUILD_ID and ATLAN_APP_DEPLOYMENT_NAME constants (deprecating
the old TEMPORAL_* equivalents) and wire them into the Temporal worker setup:

- ATLAN_APP_BUILD_ID alone: legacy build-ID mode
- ATLAN_APP_BUILD_ID + ATLAN_APP_DEPLOYMENT_NAME: full Worker Deployment
  versioning via WorkerDeploymentConfig / WorkerDeploymentVersion

The worker_start lifecycle event now includes build_id and
use_worker_versioning fields so downstream consumers can observe the
versioning mode in use.
Add MCPServer.register_tools_from_registry() for v3 task discovery
(reads TaskRegistry instead of v2 WorkflowInterface/ActivitiesInterface
pairs) and wire it into create_app_handler_service() via a FastAPI
lifespan when ENABLE_MCP=true. Both run_handler_mode and
run_combined_mode in main.py inherit MCP support automatically.
The Deprecations section only listed 6 of the 40 deprecated v2 module
paths. Add the full set, grouped by category: application, worker,
workflows, activities, handlers, services, clients, interceptors,
test utilities, and within-v3 (CleanupInterceptor).
eventstore.py imported interceptors.models (a shim) instead of the
canonical contracts.events — triggering a second warning on top of
its own. statestore.py imported services.objectstore — same problem.

Fix: import from contracts.events directly in eventstore.py; wrap the
objectstore import in warnings.catch_warnings() in statestore.py.
Each deprecated module now emits exactly one DeprecationWarning.
…esting

entrypoint.sh:
- Switch shebang to /bin/sh and set -eu (no pipelines, bash not needed)
- Export all DAPR_* vars so Python child process sees them via os.environ
  (was falling back to InMemory mode because DAPR_HTTP_PORT was unset)
- Use uv run --no-sync to invoke Python (ensures virtualenv is active)
- Fix daprd flag: --max-request-body-size → --max-body-size (renamed in newer daprd)

Dockerfile:
- apk del bash alongside curl — no longer needed at runtime (Dapr CLI
  install uses it at build time only; entrypoint now uses /bin/sh)

helm/atlan-app:
- Add imagePullSecrets to handler and worker deployments (Kubed copies
  github-docker-registry into each namespace but pods must reference it)
- Default imagePullSecret.create=false, name=github-docker-registry to
  match what Kubed provides out of the box
- Add emptyDir /tmp mount to handler pod (uv needs writable /tmp to start)
- Fix dapr-eventstore ConfigMap: always create it so projected volume
  never fails with "configMap not found" when eventstore.url is unset
- Fix storage binding component name: app-storage → objectstore to match
  what create_store_from_binding() looks up in /app/components

application_sdk/infrastructure/_dapr/client.py:
- Fix binding metadata field: result.metadata → result.binding_metadata

docs/migration-guide-v3.md:
- Fix deprecated import in credential testing example
- Add Steps 3 (incremental SQL), 6 (worker setup), 10 (Atlan client),
  11 (heartbeating), 12 (lifecycle hooks), 13 (test utilities)
- Expand Quick Reference table from 9 to 14 rows
- Expand "Removed in v3.1.0" to full categorised table (was 6 items)
Signed-off-by: Chris (He/Him) <cgrote@gmail.com>
…ntics

- Check per-credential named cache first (was checking validated key
  first, causing wrong client to be returned for different cred refs)
- Claim the validated hand-off key: move to named slot and clear the
  well-known key so it cannot be consumed by a second caller
- Add debug logging throughout: client creation, cache hit, and
  validated-key reuse
- Add OAuth access_token warning comment to create_async_atlan_client
Add a new lint-helm-chart.yml workflow that runs on every push/PR
touching helm/atlan-app/**. Exercises three render scenarios:
default values, full feature set (TLS, auth, storage, eventstore,
imagePullSecret), and KEDA disabled.

Also gate publish-helm-chart.yml behind helm lint so a broken chart
can never be pushed to GHCR.
Replace the stale v2 architecture doc with a comprehensive v3 overview
covering the App + @task mental model, typed contracts, infrastructure
abstraction, handler/worker deployment split, storage, observability,
and module structure.

Add docs/adr/ with 11 Architecture Decision Records ported from the
experimental-app-sdk, adapted for application_sdk namespaces:
  0001 — per-app handlers vs uber-handler
  0002 — per-app workers vs uber-worker
  0003 — per-app observability with correlation-based tracing
  0004 — build-time type safety (dataclasses, not Pydantic)
  0005 — complete abstraction of Temporal and Dapr
  0006 — schema-driven contracts with additive evolution rules
  0007 — apps as the unit of inter-app coordination (child workflows)
  0008 — import-time payload safety validation (2MB Temporal limit)
  0009 — separate handler/worker deployments + env var convention
  0011 — async-first design; run_in_thread() for blocking code
  0013 — structured logging level conventions

Adaptations from experimental: OTel sidecar section removed (not
present in this SDK); workers export to $(K8S_NODE_IP):4317 central
collector; run_in_thread() exposed as self.task_context.run_in_thread().
… e2e test generation

New FAIL checks in check_migration.py:
- handler-typed-signatures: fires when a Handler subclass method still uses
  *args/**kwargs instead of typed AuthInput/PreflightInput/MetadataInput
- no-unbounded-escape-hatch: fires on any allow_unbounded_fields=True in
  connector code (escape hatch is SDK-internal only)

New WARN check:
- no-v2-directory-structure: fires when app/activities/ or app/workflows/
  directories are still present after migration

MIGRATION_PROMPT.md:
- §7: explicit prohibition on connector use of allow_unbounded_fields=True
- §9 (new): BaseTest → AppConfig/AppDeployer/run_workflow/wait_for_workflow
  before/after e2e test reference
- §10 (new): step-by-step directory consolidation instructions

SKILL.md:
- Phase 2b: hard constraint blocks for handler signatures and escape hatch
- Phase 2c (new): directory consolidation phase
- Phase 4b (new): e2e test generation phase
- Phase 5: summary template updated for new phases

tests/unit/tools/test_check_migration.py (new): 12 tests covering all 3 new
checks including the non-handler false-positive guard for handler-typed-signatures

README.md: updated FAIL/WARN tables and workflow diagram

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Always mount eventstore component; use bindings.localstorage no-op
  sink when url is empty so SDK calls never get "binding not found"
- Fix external credentials skip-guard to ignore type/base_url metadata
  fields, preventing innocent defaults from blocking OAuth credential
  creation
- Rename storage binding data key from app-storage.yaml to objectstore.yaml
- Add namespace.disableImageMapper flag for SAP image-mapper webhook
  clusters; add namespace.extraLabels for additional namespace labels
- Rename handler volume tmp -> handler-tmp to avoid ambiguity
Fills the gap in ADR numbering (0010, 0012 were absent):
- Rename 0011-async-first-blocking-code → 0010
- Rename 0013-logging-level-guidelines → 0011
- Update all cross-references in docs/guides/architecture.md

Removes the specific prohibition of Pydantic in ADR-0004. The decision
now prescribes strongly-typed contracts (dataclasses, Pydantic v2,
msgspec, etc.) — the implementation mechanism is left to the author
provided it supports pyright analysis and Temporal JSON serialisation.

Also pins obstore upper bound to <0.10.0 in pyproject.toml to prevent
silent breakage from Rust/Python binding API changes between minors.
vaibhavatlan and others added 24 commits April 8, 2026 12:35
Co-authored-by: Satabrata Paul <satabrata.paul@atlan.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Vaibhav Chopra <vaibhav.chopra@atlan.com>
Co-authored-by: Aryaman <56113566+Aryamanz29@users.noreply.github.com>
Co-authored-by: Aryamanz29 <aryamanz29@gmail.com>
…aft multi-file reads (#1186)

Co-authored-by: Aryamanz29 <aryamanz29@gmail.com>
…iles (#1225)

Co-authored-by: Aryamanz29 <aryamanz29@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Vaibhav Chopra <vaibhav.chopra@atlan.com>
…s dict to v2 SecretStore (#1226)

Co-authored-by: Vaibhav Chopra <vaibhav.chopra@atlan.com>
Co-authored-by: Aryaman <56113566+Aryamanz29@users.noreply.github.com>
…actionOutput AE fields (#1182)

Co-authored-by: Aryamanz29 <aryamanz29@gmail.com>
Co-authored-by: Aryaman <56113566+Aryamanz29@users.noreply.github.com>
#1277)

Co-authored-by: Aryaman <56113566+Aryamanz29@users.noreply.github.com>
Remove all deprecated v2 modules that have been superseded by v3
equivalents. This includes activities/, application/, workflows/,
handlers/, interceptors/, deprecated client files, worker.py,
the deprecated APIServer, and deprecated test utilities.

Key changes:
- Relocate workflows/outputs/ to application_sdk/outputs/ (active code)
- Update non-deprecated imports to use v3 paths (handler.base.Handler,
  contracts.events, execution._temporal.activity_utils)
- Remove deprecated register_tools() from MCP server (v3 uses
  register_tools_from_registry)
- Clean up server/fastapi/models.py (remove WorkflowTrigger classes)
- Delete ~23K lines of deprecated source and test code

Deferred (still actively used by v3 code):
- services/objectstore.py, statestore.py, secretstore.py
- test_utils/hypothesis/, test_utils/e2e/
- standards/documentation.md: replace deleted v2 module-to-doc mappings
  with v3 equivalents (app/, handler/, templates/, execution/, etc.)
- standards/exceptions.md: update directory paths (handlers/ -> handler/,
  workflows/ -> app/task.py)
- concepts/clients.md: remove deleted TemporalWorkflowClient,
  WorkflowClient, get_workflow_client sections; point to execution module
…MetadataExtractor

Introduces declarative entity definitions that replace hardcoded
asyncio.gather of 4 entities in SqlMetadataExtractor.run().

- EntityDef dataclass: name, sql, endpoint, phase, enabled, timeout
- Phased orchestration: entities grouped by phase, run concurrently
  within phase, sequentially across phases
- _fetch_entity() dispatches to fetch_{name}() methods by convention
- Full backward compat: empty entities list falls back to default 4
  (databases, schemas, tables, columns)
- Adding a new entity type is one line in the entities list

What a Snowflake extractor looks like:

    class SnowflakeExtractor(SqlMetadataExtractor):
        entities = [
            EntityDef(name="databases", phase=1),
            EntityDef(name="schemas",   phase=1),
            EntityDef(name="stages",    phase=2),
            EntityDef(name="streams",   phase=2),
        ]
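The `fetch_{name}()` dispatch-by-convention could look roughly like this (an illustrative sketch; the real `_fetch_entity()` signature and error handling may differ):

```python
import asyncio

class ExtractorSketch:
    """Illustrative only: dispatch an entity to fetch_{name}() by convention."""

    async def _fetch_entity(self, entity):
        method = getattr(self, f"fetch_{entity.name}", None)
        if method is None:
            # A missing method is a clear configuration error, not a silent skip.
            raise AttributeError(f"no fetch_{entity.name}() defined for {entity.name!r}")
        return await method()

class DemoExtractor(ExtractorSketch):
    async def fetch_tables(self):
        return ["t1", "t2"]
```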
@snykgituser

snykgituser commented Apr 13, 2026

Snyk checks have failed. 3 issues have been found so far.

| Scan Engine | Critical | High | Medium | Low | Total (3) |
| --- | --- | --- | --- | --- | --- |
| Open Source Security | 0 | 0 | 0 | 0 | 0 issues |
| Licenses | 0 | 0 | 0 | 0 | 0 issues |
| Code Security | 0 | 3 | 0 | 0 | 3 issues |


    return True
finally:
    if os.path.exists(tmp):
        os.unlink(tmp)

  Path Traversal

Unsanitized input from an HTTP parameter flows into os.unlink, where it is used as a path. This may result in a Path Traversal vulnerability and allow an attacker to remove arbitrary files.

Line 1022 | CWE-23 | Priority score 814
Data flow: 4 steps

Step 1 - 2

filename: str | None = Form(None),

Step 3 - 4



    await _upload_file(key, tmp_path, _storage)
finally:
    try:
        os.unlink(tmp_path)

  Path Traversal

Unsanitized input from an HTTP parameter flows into os.unlink, where it is used as a path. This may result in a Path Traversal vulnerability and allow an attacker to remove arbitrary files.

Line 1155 | CWE-23 | Priority score 814
Data flow: 18 steps

Step 1 - 2

filename: str | None = Form(None),

Step 3 - 4 application_sdk/handler/service.py#L1126

Step 5 - 9 application_sdk/handler/service.py#L1127

Step 10 - 16 application_sdk/handler/service.py#L1142

Step 17 - 18



    return os.path.getsize(tmp_path)

file_size = await asyncio.to_thread(_drain_to_tmp)
await _upload_file(key, tmp_path, _storage)

  Path Traversal

Unsanitized input from an HTTP parameter flows into pathlib.Path, where it is used as a path. This may result in a Path Traversal vulnerability and allow an attacker to read arbitrary files.

Line 1152 | CWE-23 | Priority score 814
Data flow: 20 steps

Step 1 - 2

filename: str | None = Form(None),

Step 3 - 4 application_sdk/handler/service.py#L1126

Step 5 - 9 application_sdk/handler/service.py#L1127

Step 10 - 16 application_sdk/handler/service.py#L1142

Step 17 application_sdk/handler/service.py#L1152

Step 18 application_sdk/storage/ops.py#L149

Step 19 - 20

path = Path(local_path)

@Aryamanz29
Member

v3 relevance check: ✅ Active feature — EntityDef for declarative entity extraction. Note: duplicate PRs #1297 and #1298 were closed. This is the remaining one. @AtMrun is this still the active branch?

