feat(templates): add EntityDef for declarative entity extraction#1289
feat(templates): add EntityDef for declarative entity extraction#1289
Conversation
- Add InfrastructureContext ContextVar (infrastructure/context.py) - Create Dapr/InMemory infrastructure in main.py for all run modes - Add secret/storage fields and convenience methods to AppContext - Inject infrastructure into task activities via get_infrastructure() - Replace v2 EventStore in interceptors/events.py and worker.py with v3 _publish_event_via_binding(); remove is_component_registered Dapr health-check calls from the hot path - Add secret/state fields to HandlerContext; inject from ContextVar in handler service - Add DeprecationWarning to all five v2 services modules - Add migration guide (docs/migration-guide-v3.md) - Fix unit test hangs: set ATLAN_ENABLE_OBSERVABILITY_DAPR_SINK=false in conftest.py to prevent metrics flush from connecting to Dapr sidecar; remove stale DaprClient mock targeting defunct eventstore import path
- Add application_sdk/storage/ module: direct obstore I/O (get_bytes, put, delete, exists, list_keys, delete_prefix, normalize_key) with optional store param that auto-resolves from infrastructure context, v2-compatible aliases (get_content, delete_file, list_files), and SHA-256 sidecar-based deduplication in transfer.py - Add App.upload / App.download @task methods backed by storage.transfer; lower-level transfer.upload / transfer.download available for direct use inside existing activities - Generalise FileReference: storage_key → storage_path, add file_count (covers both single-file and directory refs), drop checksum/size_bytes/ content_type (handled transparently via sidecars) - Add UploadInput/UploadOutput/DownloadInput/DownloadOutput contracts replacing SyncLocalToStorageInput/Output and SyncFromStorageInput/Output - Restore transparent FileReference materialise/persist hooks in create_activity_from_task() via file_ref_sync.py - Remove StorageBinding/InMemoryBinding (superseded by obstore module); update deprecation warnings in services/ to point to App.upload/download - Add obstore>=0.9.1 and pyyaml>=6.0.2 to core dependencies
Introduces a typed credential pipeline ported from the experimental SDK,
with a backward-compatible bridge so existing apps can migrate incrementally.
New packages:
- application_sdk/credentials/ — CredentialRef, Credential Protocol,
built-in types (Basic, ApiKey, BearerToken, OAuthClient, Certificate),
Git types (GitSsh, GitToken), Atlan types (AtlanApiToken, AtlanOAuthClient),
CredentialTypeRegistry singleton, CredentialResolver (new + legacy paths),
and credential error hierarchy (CRD error codes)
- application_sdk/test_utils/credentials.py — MockCredentialStore for tests
Integrations:
- handler/contracts.py: Credential renamed to HandlerCredential (alias kept)
- templates/contracts/sql_{metadata,query}.py: add credential_ref field
- app/context.py: add resolve_credential() and resolve_credential_raw()
- sql_{metadata,query}_extractor.py: bridge credential_ref ↔ credential_guid
- docs/migration-guide-v3.md: Step 7 — Migrate to Typed Credentials
88 new unit tests; all existing tests pass.
Migrates three proven patterns from the experimental app-framework: - GitReference: frozen dataclass for git repo coordinates in workflow inputs, with optional CredentialRef and commit > tag > branch checkout precedence. - create_async_atlan_client: async-only factory for pyatlan_v9 AsyncAtlanClient from AtlanApiToken (api_key) or AtlanOAuthClient (oauth params). - AtlanClientMixin: mixin for App subclasses providing cached async Atlan client access via get_or_create_async_atlan_client, reusing validated-phase clients. Both AtlanClientMixin and create_async_atlan_client are re-exported from application_sdk.credentials and application_sdk.app for ergonomic imports.
Replace in-memory put/get_bytes with streaming upload_file/download_file primitives backed by obstore's multipart writer and streaming GET. Adaptive part sizing (_compute_part_size) keeps uploads under S3's 10,000-part limit. - ops.py: add upload_file (open_writer_async), download_file (stream), _compute_part_size; rename put/_get_bytes to private _put/_get_bytes - reference.py: persist_file_reference handles directories (rglob + per-file upload); materialize_file_reference uses list_keys to detect prefix vs single key, streaming download for both cases - transfer.py: _upload_one/_download_one use upload_file/download_file - file_ref_sync.py: _local_sidecar_ok returns False for directories so they always go through full materialize check - __init__.py: export upload_file/download_file; drop put/get_bytes/get_content - context.py: remove upload_bytes/download_bytes (no callers, promoted in-memory pattern); add storage property for direct streaming API access - base.py: use context.storage property instead of context._storage - service.py: /workflows/v1/file streams via asyncio.to_thread + upload_file instead of file.read() + _put - tests: new TestUploadFile/TestDownloadFile in test_ops.py; new TestDirectoryPersistMaterialize and updated sidecar tests in test_file_ref_sync.py
Resolve 4 conflict-marker files: - version.py, pyproject.toml: keep v3.0.0 and pyatlan>=9,<10 - CHANGELOG.md: v3.0.0 entry first, then v2.8.0–v2.7.0 from main - uv.lock: regenerated via `uv lock` after resolving pyproject.toml Migrate observability and atlan_storage from Dapr to v3 storage: - observability.py: replace ObjectStore.upload_file (×2) with v3 upload_file, replace DaprClient state calls with StateStore protocol, replace DaprClient invoke_binding delete with v3 delete; add lazy-cached _deployment_store / _upstream_store class attrs - atlan_storage.py: replace ObjectStore.get_content + DaprClient invoke_binding with download_file + upload_file streaming through a temp file; replace ObjectStore.list_files with list_keys; add lazy-cached store attrs; streams through disk instead of loading entire file into memory - Update test_atlan_storage.py to mock v3 storage functions instead of Dapr; remove stale DaprClient patch from conftest.py
Category E — Move behind execution layer:
- New canonical locations for all Temporal interceptors:
execution/_temporal/interceptors/{events,cleanup,correlation_context,lock,activity_failure_logging}.py
- New canonical locations for lock + activity utilities:
execution/_temporal/{lock_activities,activity_utils}.py
- contracts/events.py: pure Pydantic event models moved out of interceptors/
- execution/_temporal/worker.py: updated to import from canonical locations
- Old interceptors/* and activities/lock_management.py: replaced with
re-export shims emitting DeprecationWarning
Category A — Refactor implementations:
- clients/sql.py, io/json.py, io/parquet.py: removed module-level
`activity.logger = logger` side-effects and `from temporalio import activity`
- activities/common/models.py → common/models.py (with re-export shim)
- activities/common/sql_utils.py → common/sql_utils.py (with re-export shim)
- observability/utils.py: lazy-imports temporalio inside try blocks
instead of at module level
Category C — Add DeprecationWarning to 12 v2 modules:
- workflows/__init__.py, workflows/metadata_extraction/__init__.py,
workflows/metadata_extraction/incremental_sql.py,
workflows/query_extraction/__init__.py
- activities/__init__.py
- worker.py
- clients/workflow.py, clients/temporal.py, clients/atlan.py
- handlers/__init__.py, handlers/base.py, handlers/sql.py
Test updates: updated mock patch targets in 5 test files to point at
canonical module locations post-move. Fixed pre-existing unused imports
in tests/unit/storage/test_transfer.py.
All 1797 unit tests pass. Pre-commit (ruff, ruff-format, isort, pyright) passes.
credentials/oauth.py — new OAuthTokenService: - General-purpose OAuth 2.0 client_credentials token service - Wraps OAuthClientCredential; handles acquire, cache, and refresh - asyncio.Lock prevents concurrent exchanges - 60s pre-expiry refresh buffer; current_expires_at property for callers - OAuthTokenError for failed exchanges - Exported from credentials/__init__.py execution/_temporal/auth.py — TemporalAuthManager refactored: - Delegates HTTP token exchange to OAuthTokenService (eliminates ~40 lines of duplicate httpx code from acquire_initial_token + _do_refresh) - Constructs OAuthTokenService lazily from TemporalAuthConfig - Background refresh loop, sleep calculation, client.api_key update, and event emission all remain in TemporalAuthManager (Temporal-specific) execution/_temporal/interceptors/events.py: - Replaces per-call AtlanAuthClient() with a module-level OAuthTokenService singleton (_get_event_token_service) — fixes token cache being discarded on every event publish clients/atlan_auth.py — DeprecationWarning added, pointing to credentials.OAuthTokenService as the v3 replacement.
…egories B, D, F) Observability context refactor (ADR-0005): - Add ExecutionContext frozen dataclass + ContextVar to observability/context.py - Add ExecutionContextInterceptor (sets ContextVar once at workflow/activity start) - Rewrite get_workflow_context() to read ContextVar — zero temporalio imports - Remove top-level temporalio import from app/context.py; lazy-import inside methods - Register ExecutionContextInterceptor unconditionally first in create_worker() Deprecation warning fixes: - cleanup.py: CleanupResult BaseModel → @DataClass (eliminates pydantic_data_converter warning) - cleanup.py: fix build_output_path import (activities.common.utils → execution._temporal.activity_utils) - contracts/events.py, observability/logger_adaptor.py: class Config → model_config dict - observability/models.py: remove dead Config.parse_obj (never invoked externally) v3 migration (categories B, D, F): - templates: add BaseMetadataExtractor template and base_metadata_extraction contract - testing: migrate scale_data_generator from test_utils; add mocks, fixtures - test_utils: stub deprecated modules pointing to testing equivalents - activities/common/models, metadata_extraction/base: v3 activity model updates - common/models, sql_utils, contracts/types, io: v3 type and utility alignment Tests: - New: test_execution_context.py, test_execution_context_interceptor.py - New: test_base_metadata_extractor.py - Updated: test_logger_adaptor.py (set_execution_context replaces temporalio mocks)
The stderr sink format references {extra[logger_name]} but no default
was set for log calls that bypass AtlanLoggerAdapter.process() or
InterceptHandler.emit(). Mirrors the existing pattern used for
_trace_id_str by calling setdefault in the format callable.
…tor (H5) - Add App.on_complete() called in workflow_run_wrapper finally block (before state teardown) so subclasses can run custom post-run logic - Add App.cleanup_files() @task that deletes tracked FileReference local paths (+ .sha256 sidecars) and convention-based temp directories - Track FileReference objects in _app_state during auto-materialize/persist in create_activity_from_task() via new _track_file_refs() helper - Add CleanupInput/CleanupOutput contracts - Add TRACKED_FILE_REFS_KEY constant - Remove CleanupInterceptor from default worker interceptor chain; deprecate the module with DeprecationWarning - Default InterceptorSettings.enable_cleanup_interceptor to False (interceptor no longer registered; cleanup runs via on_complete() instead) - Fix obstore API: LocalStore(root_path=) -> LocalStore(prefix=), AzureStore(container=) -> AzureStore(container_name=)
…up (H6) - Add StorageCleanupInput / StorageCleanupOutput contracts - Add PROTECTED_STORAGE_PREFIXES constant (guards persistent-artifacts/) - Add cleanup_storage task: deletes tracked file_refs/ objects + .sha256 sidecars; opt-in run-scoped prefix deletion via include_prefix_cleanup - Stream-and-delete with asyncio.Semaphore(20) to cap concurrency - Update on_complete() to run cleanup_files + cleanup_storage concurrently via asyncio.gather; each failure is swallowed independently - Add tests/unit/app/test_cleanup_storage.py (9 cases) - Update test_on_complete.py and test_activities.py for new task
…hServer - entrypoint.sh: replace dapr-run wrapper with daprd directly; enables --dapr-graceful-shutdown-seconds for KEDA scale-to-zero drain; robust PID tracking and ordered shutdown (Python first, then daprd); restore --metrics-port (3100) and --max-request-body-size (1024Mi) - Dockerfile: align env vars to DAPR_* prefix; add DAPR_METRICS_PORT, DAPR_MAX_BODY_SIZE, DAPR_GRACEFUL_SHUTDOWN_SECONDS; keep bash for entrypoint shebang; update ENTRYPOINT to use script directly - helm/atlan-app/: port full Helm chart from experimental-app-sdk with adaptations (image repo default, app.kubernetes.io/part-of label); fix dapr-eventstore gap — ConfigMap and projected volume mount are now conditional on eventstore.url being set - application_sdk/server/health.py: add WorkerHealthServer (asyncio HTTP, no FastAPI dependency) with /health, /ready, /live endpoints for k8s liveness/readiness probes - application_sdk/main.py: wire WorkerHealthServer into worker and combined modes; add health_port to AppConfig (--health-port / ATLAN_HEALTH_PORT, default 8081) - .github/workflows/publish-helm-chart.yml: publish chart to GHCR OCI registry on push to main or helm/v* tags - .github/workflows/pull_request.yaml: add label-gated helm-lint job - tests/unit/server/test_health.py: 14 tests covering all endpoints, 404/405, manual start/stop, activity tracking, convenience function
Ports e2e testing infrastructure from experimental-app-sdk as a first-class package in application_sdk.testing.e2e, and removes the OTel sidecar from the Helm chart in favour of direct DaemonSet emission. Helm OTel simplification: - Remove otel-collector sidecar from handler deployment - Remove configmap-otel.yaml (sidecar config) - Remove otelConfigMapName and workerOtelEndpoint helpers from _helpers.tpl - Both handler and worker now emit OTLP directly to $(K8S_NODE_IP):4317 - K8S_NODE_IP (downward API status.hostIP) now always set on both deployments - Remove otel.collector block from values.yaml Testing infrastructure (application_sdk/testing/e2e/): - AppConfig dataclass for deployment configuration - AppDeployer: helm upgrade --install / uninstall with DeploymentError - kube_http_call: ephemeral port-forward pattern for HTTP calls to services - LogCollector: kubectl-based pod/event log collection (best-effort) - run_workflow / wait_for_workflow: handler API helpers - get_pods / wait_for_pods_ready / get_pod_logs: pod status helpers Test plumbing: - Move autouse mock fixtures from tests/conftest.py to tests/unit/conftest.py so e2e tests get real infrastructure - Add tests/e2e/conftest.py with app_config, deployed_app, handler_call fixtures - 26 new unit tests in tests/unit/testing/e2e/ - Register e2e pytest marker in pyproject.toml - Add httpx to tests optional dependencies - Add label-gated e2e-k8s CI job (kind cluster, helm install, pytest)
Introduces IncrementalSqlMetadataExtractor — a v3 typed-contract template that replaces the v2 IncrementalSQLMetadataExtractionActivities pattern. Key changes: - `templates/contracts/incremental_sql.py`: IncrementalRunContext dataclass (with is_incremental_ready()), IncrementalExtractionInput/Output, IncrementalTaskInput, and typed per-task contracts for all six incremental phases (fetch_incremental_marker, read_current_state, prepare_column_queries, execute_single_column_batch, write_current_state, update_incremental_marker) - `templates/incremental_sql_metadata_extractor.py`: Full v3 implementation with Phase 1-5 run() orchestration, chunked column batch fan-out (MAX_CONCURRENT_COLUMN_BATCHES=3, matching v2 semantics), and thorough docstrings for abstract tasks (especially fetch_tables) guiding implementers - `templates/contracts/sql_metadata.py`: Replace workflow_args dict[str,Any] blobs with typed ExtractionTaskInput base; add FetchProceduresInput/Output; remove any dict[str,Any] smells from parent contracts - `templates/sql_metadata_extractor.py`: Wire typed task inputs; add fetch_procedures task (raises NotImplementedError; not called from base run()) - `common/incremental/helpers.py`, `marker.py`, `state/state_reader.py`, `state/state_writer.py`: Replace workflow_args: Dict[str,Any] parameter blobs with explicit typed parameters - `activities/metadata_extraction/incremental.py`: Add module-level DeprecationWarning pointing to IncrementalSqlMetadataExtractor
…ype tightening, lifecycle integration - Phase 1a: Remove dead LogRecord + base parquet_sink (superseded by AtlanLoggerAdapter.parquet_sink) - Phase 1b/1c: Extract parse_otel_resource_attributes and build_otel_resource to utils.py; remove three identical copies from logger/metrics/traces adaptors - Phase 2a: Tighten AtlanObservability abstract method signatures (Any → T); buffer typed as list[dict[str, object]] - Phase 2b: Tighten observability_decorator.py params from Any to concrete adapter types (TYPE_CHECKING) - Phase 3: MetricType(str, Enum) → MetricType(SerializableEnum) — backwards-compatible, MetricType.COUNTER == "counter" preserved - Phase 4: Move TraceRecord from traces_adaptor.py to models.py; re-export for backwards compat - Phase 5a/5b: Add ClassVar annotations to _instances/_deployment_store/_upstream_store; add _reset_for_testing() to base + metrics + traces - Phase 6a: Rename _flush_all_instances → flush_all (public classmethod) - Phase 6b: Call AtlanObservability.flush_all() from App.on_complete() to flush on workflow completion - Phase 7: Add __all__ re-exports to observability/__init__.py for context + correlation public surface
Add ATLAN_APP_BUILD_ID and ATLAN_APP_DEPLOYMENT_NAME constants (deprecating the old TEMPORAL_* equivalents) and wire them into the Temporal worker setup: - ATLAN_APP_BUILD_ID alone: legacy build-ID mode - ATLAN_APP_BUILD_ID + ATLAN_APP_DEPLOYMENT_NAME: full Worker Deployment versioning via WorkerDeploymentConfig / WorkerDeploymentVersion The worker_start lifecycle event now includes build_id and use_worker_versioning fields so downstream consumers can observe the versioning mode in use.
Add MCPServer.register_tools_from_registry() for v3 task discovery (reads TaskRegistry instead of v2 WorkflowInterface/ActivitiesInterface pairs) and wire it into create_app_handler_service() via a FastAPI lifespan when ENABLE_MCP=true. Both run_handler_mode and run_combined_mode in main.py inherit MCP support automatically.
The Deprecations section only listed 6 of the 40 deprecated v2 module paths. Add the full set, grouped by category: application, worker, workflows, activities, handlers, services, clients, interceptors, test utilities, and within-v3 (CleanupInterceptor).
eventstore.py imported interceptors.models (a shim) instead of the canonical contracts.events — triggering a second warning on top of its own. statestore.py imported services.objectstore — same problem. Fix: import from contracts.events directly in eventstore.py; wrap the objectstore import in warnings.catch_warnings() in statestore.py. Each deprecated module now emits exactly one DeprecationWarning.
…esting entrypoint.sh: - Switch shebang to /bin/sh and set -eu (no pipelines, bash not needed) - Export all DAPR_* vars so Python child process sees them via os.environ (was falling back to InMemory mode because DAPR_HTTP_PORT was unset) - Use uv run --no-sync to invoke Python (ensures virtualenv is active) - Fix daprd flag: --max-request-body-size → --max-body-size (renamed in newer daprd) Dockerfile: - apk del bash alongside curl — no longer needed at runtime (Dapr CLI install uses it at build time only; entrypoint now uses /bin/sh) helm/atlan-app: - Add imagePullSecrets to handler and worker deployments (Kubed copies github-docker-registry into each namespace but pods must reference it) - Default imagePullSecret.create=false, name=github-docker-registry to match what Kubed provides out of the box - Add emptyDir /tmp mount to handler pod (uv needs writable /tmp to start) - Fix dapr-eventstore ConfigMap: always create it so projected volume never fails with "configMap not found" when eventstore.url is unset - Fix storage binding component name: app-storage → objectstore to match what create_store_from_binding() looks up in /app/components application_sdk/infrastructure/_dapr/client.py: - Fix binding metadata field: result.metadata → result.binding_metadata docs/migration-guide-v3.md: - Fix deprecated import in credential testing example - Add Steps 3 (incremental SQL), 6 (worker setup), 10 (Atlan client), 11 (heartbeating), 12 (lifecycle hooks), 13 (test utilities) - Expand Quick Reference table from 9 to 14 rows - Expand "Removed in v3.1.0" to full categorised table (was 6 items)
Signed-off-by: Chris (He/Him) <cgrote@gmail.com>
…ntics - Check per-credential named cache first (was checking validated key first, causing wrong client to be returned for different cred refs) - Claim the validated hand-off key: move to named slot and clear the well-known key so it cannot be consumed by a second caller - Add debug logging throughout: client creation, cache hit, and validated-key reuse - Add OAuth access_token warning comment to create_async_atlan_client
Add a new lint-helm-chart.yml workflow that runs on every push/PR touching helm/atlan-app/**. Exercises three render scenarios: default values, full feature set (TLS, auth, storage, eventstore, imagePullSecret), and KEDA disabled. Also gate publish-helm-chart.yml behind helm lint so a broken chart can never be pushed to GHCR.
Replace the stale v2 architecture doc with a comprehensive v3 overview covering the App + @task mental model, typed contracts, infrastructure abstraction, handler/worker deployment split, storage, observability, and module structure. Add docs/adr/ with 11 Architecture Decision Records ported from the experimental-app-sdk, adapted for application_sdk namespaces: 0001 — per-app handlers vs uber-handler 0002 — per-app workers vs uber-worker 0003 — per-app observability with correlation-based tracing 0004 — build-time type safety (dataclasses, not Pydantic) 0005 — complete abstraction of Temporal and Dapr 0006 — schema-driven contracts with additive evolution rules 0007 — apps as the unit of inter-app coordination (child workflows) 0008 — import-time payload safety validation (2MB Temporal limit) 0009 — separate handler/worker deployments + env var convention 0011 — async-first design; run_in_thread() for blocking code 0013 — structured logging level conventions Adaptations from experimental: OTel sidecar section removed (not present in this SDK); workers export to $(K8S_NODE_IP):4317 central collector; run_in_thread() exposed as self.task_context.run_in_thread().
… e2e test generation New FAIL checks in check_migration.py: - handler-typed-signatures: fires when a Handler subclass method still uses *args/**kwargs instead of typed AuthInput/PreflightInput/MetadataInput - no-unbounded-escape-hatch: fires on any allow_unbounded_fields=True in connector code (escape hatch is SDK-internal only) New WARN check: - no-v2-directory-structure: fires when app/activities/ or app/workflows/ directories are still present after migration MIGRATION_PROMPT.md: - §7: explicit prohibition on connector use of allow_unbounded_fields=True - §9 (new): BaseTest → AppConfig/AppDeployer/run_workflow/wait_for_workflow before/after e2e test reference - §10 (new): step-by-step directory consolidation instructions SKILL.md: - Phase 2b: hard constraint blocks for handler signatures and escape hatch - Phase 2c (new): directory consolidation phase - Phase 4b (new): e2e test generation phase - Phase 5: summary template updated for new phases tests/unit/tools/test_check_migration.py (new): 12 tests covering all 3 new checks including the non-handler false-positive guard for handler-typed-signatures README.md: updated FAIL/WARN tables and workflow diagram Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Always mount eventstore component; use bindings.localstorage no-op sink when url is empty so SDK calls never get "binding not found" - Fix external credentials skip-guard to ignore type/base_url metadata fields, preventing innocent defaults from blocking OAuth credential creation - Rename storage binding data key from app-storage.yaml to objectstore.yaml - Add namespace.disableImageMapper flag for SAP image-mapper webhook clusters; add namespace.extraLabels for additional namespace labels - Rename handler volume tmp -> handler-tmp to avoid ambiguity
Fills the gap in ADR numbering (0010, 0012 were absent): - Rename 0011-async-first-blocking-code → 0010 - Rename 0013-logging-level-guidelines → 0011 - Update all cross-references in docs/guides/architecture.md Removes the specific prohibition of Pydantic in ADR-0004. The decision now prescribes strongly-typed contracts (dataclasses, Pydantic v2, msgspec, etc.) — the implementation mechanism is left to the author provided it supports pyright analysis and Temporal JSON serialisation. Also pins obstore upper bound to <0.10.0 in pyproject.toml to prevent silent breakage from Rust/Python binding API changes between minors.
Co-authored-by: Satabrata Paul <satabrata.paul@atlan.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Vaibhav Chopra <vaibhav.chopra@atlan.com> Co-authored-by: Aryaman <56113566+Aryamanz29@users.noreply.github.com> Co-authored-by: Aryamanz29 <aryamanz29@gmail.com>
…aft multi-file reads (#1186) Co-authored-by: Aryamanz29 <aryamanz29@gmail.com>
…iles (#1225) Co-authored-by: Aryamanz29 <aryamanz29@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Vaibhav Chopra <vaibhav.chopra@atlan.com>
…s dict to v2 SecretStore (#1226) Co-authored-by: Vaibhav Chopra <vaibhav.chopra@atlan.com>
Co-authored-by: Aryaman <56113566+Aryamanz29@users.noreply.github.com>
…actionOutput AE fields (#1182) Co-authored-by: Aryamanz29 <aryamanz29@gmail.com> Co-authored-by: Aryaman <56113566+Aryamanz29@users.noreply.github.com>
#1277) Co-authored-by: Aryaman <56113566+Aryamanz29@users.noreply.github.com>
Remove all deprecated v2 modules that have been superseded by v3 equivalents. This includes activities/, application/, workflows/, handlers/, interceptors/, deprecated client files, worker.py, the deprecated APIServer, and deprecated test utilities. Key changes: - Relocate workflows/outputs/ to application_sdk/outputs/ (active code) - Update non-deprecated imports to use v3 paths (handler.base.Handler, contracts.events, execution._temporal.activity_utils) - Remove deprecated register_tools() from MCP server (v3 uses register_tools_from_registry) - Clean up server/fastapi/models.py (remove WorkflowTrigger classes) - Delete ~23K lines of deprecated source and test code Deferred (still actively used by v3 code): - services/objectstore.py, statestore.py, secretstore.py - test_utils/hypothesis/, test_utils/e2e/
…eprecated-v2-modules
- standards/documentation.md: replace deleted v2 module-to-doc mappings with v3 equivalents (app/, handler/, templates/, execution/, etc.) - standards/exceptions.md: update directory paths (handlers/ -> handler/, workflows/ -> app/task.py) - concepts/clients.md: remove deleted TemporalWorkflowClient, WorkflowClient, get_workflow_client sections; point to execution module
…MetadataExtractor
Introduces declarative entity definitions that replace hardcoded
asyncio.gather of 4 entities in SqlMetadataExtractor.run().
- EntityDef dataclass: name, sql, endpoint, phase, enabled, timeout
- Phased orchestration: entities grouped by phase, run concurrently
within phase, sequentially across phases
- _fetch_entity() dispatches to fetch_{name}() methods by convention
- Full backward compat: empty entities list falls back to default 4
(databases, schemas, tables, columns)
- Adding a new entity type is one line in the entities list
What a Snowflake extractor looks like:
class SnowflakeExtractor(SqlMetadataExtractor):
entities = [
EntityDef(name="databases", phase=1),
EntityDef(name="schemas", phase=1),
EntityDef(name="stages", phase=2),
EntityDef(name="streams", phase=2),
]
⛔ Snyk checks have failed. 3 issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
| return True | ||
| finally: | ||
| if os.path.exists(tmp): | ||
| os.unlink(tmp) |
There was a problem hiding this comment.
Path Traversal
Unsanitized input from an HTTP parameter flows into os.unlink, where it is used as a path. This may result in a Path Traversal vulnerability and allow an attacker to remove arbitrary files.
Line 1022 | CWE-23 | Priority score 814 | Learn more about this vulnerability
Data flow: 4 steps
Step 1 - 2
Step 3 - 4
Commands
- ⚡ To see AI-powered Snyk Agent Fix suggestions, reply with:
@snyk /fix. You'll need to refresh the page 🔄
| await _upload_file(key, tmp_path, _storage) | ||
| finally: | ||
| try: | ||
| os.unlink(tmp_path) |
There was a problem hiding this comment.
Path Traversal
Unsanitized input from an HTTP parameter flows into os.unlink, where it is used as a path. This may result in a Path Traversal vulnerability and allow an attacker to remove arbitrary files.
Line 1155 | CWE-23 | Priority score 814 | Learn more about this vulnerability
Data flow: 18 steps
Step 1 - 2
Step 3 - 4 application_sdk/handler/service.py#L1126
Step 5 - 9 application_sdk/handler/service.py#L1127
Step 10 - 16 application_sdk/handler/service.py#L1142
Step 17 - 18
Commands
- ⚡ To see AI-powered Snyk Agent Fix suggestions, reply with:
@snyk /fix. You'll need to refresh the page 🔄
| return os.path.getsize(tmp_path) | ||
|
|
||
| file_size = await asyncio.to_thread(_drain_to_tmp) | ||
| await _upload_file(key, tmp_path, _storage) |
There was a problem hiding this comment.
Path Traversal
Unsanitized input from an HTTP parameter flows into pathlib.Path, where it is used as a path. This may result in a Path Traversal vulnerability and allow an attacker to read arbitrary files.
Line 1152 | CWE-23 | Priority score 814 | Learn more about this vulnerability
Data flow: 20 steps
Step 1 - 2
Step 3 - 4 application_sdk/handler/service.py#L1126
Step 5 - 9 application_sdk/handler/service.py#L1127
Step 10 - 16 application_sdk/handler/service.py#L1142
Step 17 application_sdk/handler/service.py#L1152
Step 18 application_sdk/storage/ops.py#L149
Step 19 - 20
Summary
EntityDefdataclass (templates/entity.py) for declaring extractable entity types — name, SQL, endpoint, phase, enabled, timeoutSqlMetadataExtractor.run()to use entity-driven phased orchestration instead of hardcoded 4-entityasyncio.gather_fetch_entity()dispatches tofetch_{name}()methods by conventionentitieslist falls back to default 4 (databases, schemas, tables, columns)Before (hardcoded 4 entities)
After (declarative)
Part of: One-Prompt Connectors proposal (Phase 2 of 5)
Test plan
EntityDefdataclass (defaults, frozen, custom fields, API fields)_get_entities()(defaults, custom, disabled filtering)_fetch_entity()(dispatch, custom result key, missing method, credential forwarding)SqlMetadataExtractortests pass unchanged🤖 Generated with Claude Code