Skip to content

feat: support sourceDescriptions in operationId/operationPath#73

Open
displague wants to merge 5 commits intojentic:mainfrom
displague:sourceDescriptions_refs
Open

feat: support sourceDescriptions in operationId/operationPath#73
displague wants to merge 5 commits intojentic:mainfrom
displague:sourceDescriptions_refs

Conversation

@displague
Copy link
Copy Markdown

@displague displague commented Aug 28, 2025

I have multiple Openapi specs in my Arazzo spec and found that arazzo-runner was confused by the sourceDescriptions references.

arazzo-runner - ERROR - Error executing step foo: Operation bar not found in source descriptions

This PR adds support for those. Feel free to close this (AI slop?) and implement it directly, or let me know if there are specific criteria to make this mergeable and I'll accomodate.

Relevant snippets from the spec used in prompting:

  • example given in https://spec.openapis.org/arazzo/latest.html#arazzo-specification-object-example
  • https://spec.openapis.org/arazzo/latest.html#fixed-fields-3
    • operationId | string | The name of an existing, resolvable operation, as defined with a unique operationId and existing within one of the sourceDescriptions. The referenced operation will be invoked by this workflow step. If multiple (non arazzo type) sourceDescriptions are defined, then the operationId MUST be specified using a Runtime Expression (e.g., $sourceDescriptions..) to avoid ambiguity or potential clashes. This field is mutually exclusive of the operationPath and workflowId fields respectively.

    • operationPath | string | A reference to a Source Description Object combined with a JSON Pointer to reference an operation. This field is mutually exclusive of the operationId and workflowId fields respectively. The operation being referenced MUST be described within one of the sourceDescriptions descriptions. A Runtime Expression syntax MUST be used to identify the source description document. If the referenced operation has an operationId defined then the operationId SHOULD be preferred over the operationPath.

  • https://spec.openapis.org/arazzo/latest.html#examples

    Runtime expressions preserve the type of the referenced value. Expressions can be embedded into string values by surrounding the expression with {} curly braces.

I did not include dependsOn...sourceDescriptions or sourceDescriptions...workflowId support in this PR

The referenced operation will be invoked by this workflow step. If multiple (non arazzo type) sourceDescriptions are defined, then the operationId MUST be specified using a [Runtime Expression (https://spec.openapis.org/arazzo/latest.html#runtime-expressions) (e.g., $sourceDescriptions..)

I did not strictly abide by the MUST requirement. The existing lookup behavior is used when $sourceDescriptions is not specificied (presumably first matching operationId from first openapi spec)


🤖 Generated with GPT 5

  • feat: support $sourceDescriptions operationId references
  • feat: support sourceDescriptions with curly braces for operationPath refs

Copilot AI review requested due to automatic review settings August 28, 2025 14:05

This comment was marked as outdated.

@displague
Copy link
Copy Markdown
Author

My local version of python differs and I needed this local change to get pytest to run (excluded from the PR under the presumption that this isn't wanted)

diff --git a/runner/arazzo_runner/blob_store.py b/runner/arazzo_runner/blob_store.py
index a9e34f0..0d84d25 100644
--- a/runner/arazzo_runner/blob_store.py
+++ b/runner/arazzo_runner/blob_store.py
@@ -11,7 +11,7 @@ import logging
 import os
 import time
 import uuid
-from typing import Any, Protocol
+from typing import Any, Protocol, Optional, Dict, List
 
 # Configure logging
 logger = logging.getLogger("arazzo-runner.blob_store")
@@ -20,7 +20,7 @@ logger = logging.getLogger("arazzo-runner.blob_store")
 class BlobStore(Protocol):
     """Protocol for blob storage backends."""
 
-    def save(self, data: bytes, meta: dict[str, Any]) -> str:
+    def save(self, data: bytes, meta: Dict[str, Any]) -> str:
         """Save binary data and return a blob ID."""
         ...
 
@@ -28,7 +28,7 @@ class BlobStore(Protocol):
         """Load binary data by blob ID."""
         ...
 
-    def info(self, blob_id: str) -> dict[str, Any]:
+    def info(self, blob_id: str) -> Dict[str, Any]:
         """Get metadata for a blob."""
         ...
 
@@ -40,7 +40,7 @@ class BlobStore(Protocol):
 class LocalFileBlobStore:
     """File-based blob storage implementation."""
 
-    def __init__(self, root: str | None = None, janitor_after_h: int = 24):
+    def __init__(self, root: Optional[str] = None, janitor_after_h: int = 24):
         """
         Initialize the local file blob store.
 
@@ -182,10 +182,10 @@ class InMemoryBlobStore:
         Args:
             max_size: Maximum number of blobs to keep in memory
         """
-        self.blobs: dict[str, bytes] = {}
-        self.metadata: dict[str, dict[str, Any]] = {}
+        self.blobs: Dict[str, bytes] = {}
+        self.metadata: Dict[str, Dict[str, Any]] = {}
         self.max_size = max_size
-        self.access_order: list[str] = []  # For LRU eviction
+        self.access_order: List[str] = []  # For LRU eviction
 
     def _evict_if_needed(self) -> None:
         """Evict oldest blobs if we've exceeded max_size."""
@@ -194,7 +194,7 @@ class InMemoryBlobStore:
             self.blobs.pop(oldest_id, None)
             self.metadata.pop(oldest_id, None)
 
-    def save(self, data: bytes, meta: dict[str, Any]) -> str:
+    def save(self, data: bytes, meta: Dict[str, Any]) -> str:
         """Save binary data with metadata."""
         self._evict_if_needed()
 
diff --git a/runner/arazzo_runner/models.py b/runner/arazzo_runner/models.py
index e94e504..5585adc 100644
--- a/runner/arazzo_runner/models.py
+++ b/runner/arazzo_runner/models.py
@@ -6,13 +6,21 @@ This module defines the data models and enums used by the Arazzo Runner.
 """
 
 from dataclasses import dataclass, field
-from enum import Enum, StrEnum
-from typing import Any, Optional
+from enum import Enum
+from typing import Any, Optional, Dict, List
+
+# StrEnum was added in Python 3.11. Provide a lightweight fallback for older Pythons.
+try:
+    from enum import StrEnum  # type: ignore
+except Exception:  # pragma: no cover - fallback for older Python
+    class StrEnum(str, Enum):
+        def __str__(self) -> str:  # keep behavior similar to stdlib
+            return str(self.value)
 
 from pydantic import BaseModel, ConfigDict, Field
 
-OpenAPIDoc = dict[str, Any]
-ArazzoDoc = dict[str, Any]
+OpenAPIDoc = Dict[str, Any]
+ArazzoDoc = Dict[str, Any]
 
 
 class StepStatus(Enum):
@@ -69,10 +77,10 @@ class WorkflowExecutionResult:
 
     status: WorkflowExecutionStatus
     workflow_id: str
-    outputs: dict[str, Any] = field(default_factory=dict)
-    step_outputs: dict[str, dict[str, Any]] | None = None
-    inputs: dict[str, Any] | None = None
-    error: str | None = None
+    outputs: Dict[str, Any] = field(default_factory=dict)
+    step_outputs: Optional[Dict[str, Dict[str, Any]]] = None
+    inputs: Optional[Dict[str, Any]] = None
+    error: Optional[str] = None
 
 
 @dataclass
@@ -80,12 +88,12 @@ class ExecutionState:
     """Represents the current execution state of a workflow"""
 
     workflow_id: str
-    current_step_id: str | None = None
-    inputs: dict[str, Any] = None
-    step_outputs: dict[str, dict[str, Any]] = None
-    workflow_outputs: dict[str, Any] = None
-    dependency_outputs: dict[str, dict[str, Any]] = None
-    status: dict[str, StepStatus] = None
+    current_step_id: Optional[str] = None
+    inputs: Optional[Dict[str, Any]] = None
+    step_outputs: Optional[Dict[str, Dict[str, Any]]] = None
+    workflow_outputs: Optional[Dict[str, Any]] = None
+    dependency_outputs: Optional[Dict[str, Dict[str, Any]]] = None
+    status: Optional[Dict[str, StepStatus]] = None
     runtime_params: Optional["RuntimeParams"] = None
 
     def __post_init__(self):
@@ -105,9 +113,9 @@ class ExecutionState:
 class ServerVariable(BaseModel):
     """Represents a variable for server URL template substitution."""
 
-    description: str | None = None
-    default_value: str | None = Field(None, alias="default")
-    enum_values: list[str] | None = Field(None, alias="enum")
+    description: Optional[str] = None
+    default_value: Optional[str] = Field(None, alias="default")
+    enum_values: Optional[List[str]] = Field(None, alias="enum")
 
     model_config = ConfigDict(populate_by_name=True, extra="allow")
 
@@ -116,9 +124,9 @@ class ServerConfiguration(BaseModel):
     """Represents an API server configuration with a templated URL and variables."""
 
     url_template: str = Field(alias="url")
-    description: str | None = None
-    variables: dict[str, ServerVariable] = Field(default_factory=dict)
-    api_title_prefix: str | None = None  # Derived from spec's info.title
+    description: Optional[str] = None
+    variables: Dict[str, ServerVariable] = Field(default_factory=dict)
+    api_title_prefix: Optional[str] = None  # Derived from spec's info.title
 
     model_config = ConfigDict(populate_by_name=True, extra="allow")
 
@@ -128,6 +136,6 @@ class RuntimeParams(BaseModel):
     Container for all runtime parameters that may influence workflow or operation execution.
     """
 
-    servers: dict[str, str] | None = Field(
+    servers: Optional[Dict[str, str]] = Field(
         default=None, description="Server variable overrides for server resolution."
     )

@char0n
Copy link
Copy Markdown
Member

char0n commented Aug 28, 2025

Hi @displague,

Thanks for your contribution! We appreciate you taking the time to work on this feature. We'll review it and get back to you with our thoughts.

Signed-off-by: Marques Johansson <mjohansson@equinix.com>
Signed-off-by: Marques Johansson <mjohansson@equinix.com>
Signed-off-by: Marques Johansson <mjohansson@equinix.com>
@displague displague force-pushed the sourceDescriptions_refs branch from c4fb15b to 7691a5b Compare August 28, 2025 21:58
Signed-off-by: Marques Johansson <mjohansson@equinix.com>
@displague displague requested a review from Copilot August 28, 2025 22:10
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for Arazzo specification sourceDescriptions references in both operationId and operationPath fields. The changes enable the arazzo-runner to properly resolve operations when multiple OpenAPI specs are defined in an Arazzo workflow, addressing confusion when source descriptions references are used.

  • Added support for $sourceDescriptions.<name>.<operationId> syntax in operationId fields
  • Enhanced operationPath handling to support runtime expressions with sourceDescriptions references
  • Implemented braced runtime expression evaluation for operationPath fields

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
runner/arazzo_runner/executor/operation_finder.py Added sourceDescriptions parsing logic to find_by_id and enhanced operationPath resolution in get_operations_for_workflow
runner/tests/executor/test_operation_finder.py Added comprehensive test coverage for new sourceDescriptions functionality

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

decoded_path = encoded_path.replace("~1", "/").replace("~0", "~")
# Normalize leading slashes: decoded_path may start with multiple slashes
# because encoded_path often begins with a leading '/'. Ensure a single leading slash.
decoded_path = "/" + decoded_path.lstrip("/")
Copy link

Copilot AI Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This path normalization logic is duplicated from existing code and could cause unexpected behavior. The comment mentions "multiple slashes" but the logic strips all leading slashes then adds one back, which could incorrectly modify paths that legitimately start with multiple slashes. Consider extracting this to a dedicated path normalization utility function to ensure consistent behavior across the codebase.

Copilot uses AI. Check for mistakes.
Comment on lines +834 to +845
(
name
for name, desc in self.source_descriptions.items()
if desc.get("url") and (
desc.get("url") == resolved_left
or resolved_left.endswith(desc.get("url"))
or desc.get("url") in resolved_left
or resolved_left in desc.get("url")
)
),
None,
)
Copy link

Copilot AI Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URL matching logic is overly permissive and could lead to false positives. Using substring matching (in operator) on both directions could match unintended sources. For example, if one source has URL "api.com" and another has "myapi.com", both would match "api.com". Consider using more precise matching logic such as exact matches or proper URL parsing.

Copilot uses AI. Check for mistakes.
Comment on lines +812 to +815
eval_state = ExecutionState(workflow_id="__internal__")

def _eval_braced(m: re.Match) -> str:
expr = m.group(1)
Copy link

Copilot AI Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating an ExecutionState with a hardcoded workflow_id "internal" for expression evaluation may not provide the proper context for runtime expressions. This could lead to incorrect evaluation results if the expressions depend on actual workflow state. Consider passing the actual workflow context or creating a more appropriate evaluation context.

Suggested change
eval_state = ExecutionState(workflow_id="__internal__")
def _eval_braced(m: re.Match) -> str:
expr = m.group(1)
workflow_id = workflow.get("id", "__internal__")
eval_state = ExecutionState(workflow_id=workflow_id)
def _eval_braced(m: re.Match) -> str:

Copilot uses AI. Check for mistakes.
@char0n char0n added enhancement New feature or request runner labels Aug 29, 2025
@char0n char0n self-requested a review August 29, 2025 11:00
@char0n char0n self-assigned this Aug 29, 2025
@char0n
Copy link
Copy Markdown
Member

char0n commented Sep 1, 2025

My local version of python differs and I needed this local change to get pytest to run (excluded from the PR under the presumption that this isn't wanted)

Hi @displague,

This project requires Python 3.11 or higher. You're probably using older version?

@char0n
Copy link
Copy Markdown
Member

char0n commented Sep 1, 2025

I did not strictly abide by the MUST requirement. The existing lookup behavior is used when $sourceDescriptions is not specificied (presumably first matching operationId from first openapi spec)

That's fine. This can be a lenient fallback, first matching operation from one of the sources (in the order as defined in the Arazzo Document) SHOULD be matched.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request runner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants