198 changes: 198 additions & 0 deletions CLAUDE.md
@@ -1317,6 +1317,204 @@ Context about the task.
| Standard | 15-25 | 150,000 | 600 |
| Complex | 30-50 | 300,000 | 1800 |

### Workflow Pages (Integrated Editor Views)

Workflow pages define interactive views that open from workflow steps in the web UI.
All media panels follow the **agent-as-editor model**: the UI is a viewer with
lightweight intent-capture controls. Editing requests are dispatched to the Claude
Code agent, which performs the actual file modifications (editing Motion Canvas
`.tsx` scenes, running `ffmpeg` commands, using ImageMagick/Pillow for images).

Pages are defined in the `pages` array in workflow frontmatter (YAML/TOML) and
displayed as clickable buttons on workflow steps.

#### Page Types

| Type | Component | Use Case |
|------|-----------|----------|
| `data-table` | Editable spreadsheet | Seed data, extracted results, configuration |
| `image` | Image viewer + intent toolbar | Generated images, charts, screenshots |
| `motion-canvas` | Rendered MC output viewer + intent toolbar | Animated graphics, data visualizations |
| `video` | Video player with trim/cut + intent toolbar | Individual video files |
| `video-sequence` | Multi-scene composition timeline | Composed videos (MC scenes + clips) |

#### Architecture: Agent-as-Editor

```
User clicks "Add Text" --> UI captures intent (what, where) -->
dispatches to agent --> agent edits .tsx / runs ffmpeg / uses Pillow -->
file changes --> panel polls mtime --> auto-refresh preview
```

Panels never modify files directly. The **IntentToolbar** (shared across image,
motion-canvas, video, and video-sequence panels) captures structured intents:

| Control | Intent Dispatched |
|---------|-------------------|
| Add Text | `{ action: "add_text", text, position, style: { font, size, color } }` |
| Add Shape | `{ action: "add_shape", shape_id, position, size, animated }` |
| Move | `{ action: "move_element", element_id, new_position }` |
| Resize | `{ action: "resize_element", element_id, new_size }` |
| Delete | `{ action: "delete_element", element_id }` |

Intents are converted to natural language prompts server-side and dispatched to
the agent via `POST /api/workflows/{id}/pages/{page_id}/edit`.
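The exact wire format of that request is not spelled out here; as a sketch, assuming the endpoint accepts a JSON body with the intents listed under an `intents` key (field names mirror the IntentToolbar table above):

```python
import json

# Hypothetical request body for POST /api/workflows/{id}/pages/{page_id}/edit.
# The top-level "intents" key is an assumption; the per-intent fields follow
# the IntentToolbar table.
payload = {
    "intents": [
        {
            "action": "add_text",
            "text": "Q3 Results",
            "position": {"x": 120, "y": 40},
            "style": {"font": "Montserrat", "size": 32, "color": "#ffffff"},
        },
        {"action": "delete_element", "element_id": "old-caption"},
    ]
}

body = json.dumps(payload)  # serialized request body
```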

#### Video Composition with Motion Canvas

The `video-sequence` page type uses Motion Canvas as the composition layer.
Multiple scenes (MC animations + video clips) compose into a single video:

```yaml
pages:
  - id: final-video
    type: video-sequence
    title: Final Presentation
    step: compose_video
    output_path: output/final.mp4
    resolution: [1920, 1080]
    scenes:
      - id: intro
        type: motion-canvas
        title: Animated Intro
        scene_path: scenes/intro.tsx
        rendered_path: output/intro.mp4
      - id: interview
        type: clip
        title: Interview Footage
        source_path: footage/interview.mp4
        trim_start: 5.0
        trim_end: 30.0
```
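The `trim_start`/`trim_end` fields determine how many seconds each clip contributes to the composition; a minimal sketch of that arithmetic over an in-memory copy of the scene list (rendered MC scene durations are not declared in the YAML, so only `clip` scenes are computed):

```python
from typing import Optional

# Scene entries mirroring the YAML example above.
scenes = [
    {"id": "intro", "type": "motion-canvas", "scene_path": "scenes/intro.tsx"},
    {
        "id": "interview",
        "type": "clip",
        "source_path": "footage/interview.mp4",
        "trim_start": 5.0,
        "trim_end": 30.0,
    },
]

def clip_duration(scene: dict) -> Optional[float]:
    """Seconds a trimmed clip contributes; None for rendered MC scenes."""
    if scene.get("type") != "clip":
        return None
    return scene["trim_end"] - scene["trim_start"]

durations = [clip_duration(s) for s in scenes]  # [None, 25.0]
```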

#### Defining Pages in Markdown

```yaml
---
name: topic-research
title: Topic Research Workflow
agent:
  model: claude-sonnet-4-20250514
  max_turns: 25

pages:
  - id: seed-topics
    type: data-table
    title: Seed Topics
    step: research_topics
    seed: true
    data_path: data/topics.csv
    columns:
      - name: topic
        label: Topic
        type: text
        required: true
      - name: priority
        label: Priority
        type: select
        options: [high, medium, low]

  - id: output-chart
    type: image
    title: Research Chart
    step: generate_chart
    image_path: output/research-chart.png
    editable: true
    assets_dir: assets

  - id: summary-animation
    type: motion-canvas
    title: Summary Animation
    step: create_animation
    scene_path: scenes/summary.tsx
    output_path: output/summary.mp4
    duration: 10
    fps: 30

  - id: presentation-video
    type: video
    title: Presentation
    step: render_video
    video_path: output/presentation.mp4
    trim: true
    max_duration: 120
---
```

#### Seed Data Tables

When a page has `seed: true`, it acts as input data for the workflow:

1. User opens the seed data table from the workflow step
2. Adds/edits rows (e.g., adding new research topics)
3. Clicks **Save** to persist changes to `data_path`
4. UI prompts: "Seed data updated. Run the workflow with new data?"
5. Clicking **Run Workflow** starts a new workflow execution with the updated data
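Before the save in step 3, edited rows should match the column schema; a minimal sketch of that check (the validator itself is hypothetical, the column fields come from the frontmatter example above):

```python
# Column schema mirroring the seed-topics page definition.
columns = [
    {"name": "topic", "type": "text", "required": True},
    {"name": "priority", "type": "select", "options": ["high", "medium", "low"]},
]

def validate_row(row: dict, columns: list) -> list:
    """Return a list of problems; an empty list means the row can be saved."""
    errors = []
    for col in columns:
        value = row.get(col["name"])
        if col.get("required") and not value:
            errors.append(f"{col['name']} is required")
        if col["type"] == "select" and value is not None and value not in col["options"]:
            errors.append(f"{col['name']} must be one of {col['options']}")
    return errors

validate_row({"topic": "LLM agents", "priority": "high"}, columns)  # []
validate_row({"priority": "urgent"}, columns)  # two errors
```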

#### Column Types

| Type | Input | Description |
|------|-------|-------------|
| `text` | Text input | Default, free-form text |
| `number` | Number input | Numeric values |
| `boolean` | Checkbox | True/false toggle |
| `url` | Text input | URL values |
| `date` | Text input | Date strings |
| `select` | Dropdown | Choose from `options` list |

#### File Refresh After Agent Edits

All media panels use the `useFileWatch` hook, which polls the page data endpoint
every 2 seconds. When the file's `mtime` changes, the panel auto-refreshes the
preview. Cache-busting is handled via `?v=<timestamp>` query parameters on
file URLs.
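Server-side, this only requires the page data endpoint to expose the file's `mtime`; a sketch of the metadata the panel could poll (the field names and helper are assumptions, not the actual endpoint implementation):

```python
import os

def file_metadata(path: str) -> dict:
    """mtime plus a cache-busted URL, as consumed by the polling panel."""
    mtime = int(os.stat(path).st_mtime)
    return {
        "path": path,
        "mtime": mtime,
        # ?v=<timestamp> forces the browser to refetch after an agent edit
        "url": f"/api/file/raw?path={path}&v={mtime}",
    }
```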

#### API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/workflows/{id}/pages` | GET | List pages for a workflow |
| `/api/workflows/{id}/pages/{page_id}/data` | GET | Get page data (table rows, file metadata with mtime) |
| `/api/workflows/{id}/pages/{page_id}/data` | PUT | Update data-table rows |
| `/api/workflows/{id}/pages/{page_id}/run` | POST | Re-run workflow from seed data |
| `/api/workflows/{id}/pages/{page_id}/edit` | POST | Dispatch editing intents to agent |
| `/api/file/raw?path=<path>` | GET | Serve raw media files (video, image, font, SVG) |
| `/api/assets/shapes` | GET | Get SVG shape library manifest |
| `/api/assets/fonts` | GET | Get custom font library manifest |

#### Custom SVG Shapes

Store SVG shapes in `assets/shapes/` with a `manifest.json`:

```json
{
  "shapes": [
    {
      "id": "arrow-expand",
      "category": "arrows",
      "title": "Expanding Arrow",
      "svg_path": "arrows/arrow-expand.svg",
      "animated": true
    }
  ]
}
```
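A sketch of how the server or agent might index this manifest so a toolbar can show one section per category (the grouping helper is hypothetical):

```python
import json

# Inline copy of the manifest.json example above.
manifest_json = """
{
  "shapes": [
    {"id": "arrow-expand", "category": "arrows", "title": "Expanding Arrow",
     "svg_path": "arrows/arrow-expand.svg", "animated": true}
  ]
}
"""

def shapes_by_category(manifest: dict) -> dict:
    """Group manifest entries by category for a sectioned shape picker."""
    grouped = {}
    for shape in manifest.get("shapes", []):
        grouped.setdefault(shape["category"], []).append(shape)
    return grouped

grouped = shapes_by_category(json.loads(manifest_json))
```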

#### Custom Fonts

Store fonts in `assets/fonts/` with a `manifest.json`:

```json
{
  "fonts": [
    { "id": "montserrat-bold", "family": "Montserrat", "weight": 700, "path": "Montserrat-Bold.woff2" }
  ]
}
```

Panels load fonts dynamically via the Font Loading API. The agent references
fonts by family name in MC scenes or by file path in ffmpeg/ImageMagick commands.
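Referencing a font by file path in ffmpeg means resolving the manifest entry to a `drawtext` filter; a sketch under the manifest layout above (`fontfile`, `text`, and `fontsize` are standard ffmpeg `drawtext` options, the helper and `assets/fonts/` prefix are assumptions):

```python
# Font entry mirroring the manifest.json example above.
font = {"id": "montserrat-bold", "family": "Montserrat", "weight": 700,
        "path": "Montserrat-Bold.woff2"}

def drawtext_filter(font: dict, text: str, size: int = 48) -> str:
    """Build an ffmpeg drawtext filter pointing at the manifest's font file.

    Note: drawtext renders via FreeType, so a .woff2 may first need
    converting to .ttf/.otf before ffmpeg can load it.
    """
    fontfile = f"assets/fonts/{font['path']}"
    return f"drawtext=fontfile={fontfile}:text='{text}':fontsize={size}"

drawtext_filter(font, "Summary")
# "drawtext=fontfile=assets/fonts/Montserrat-Bold.woff2:text='Summary':fontsize=48"
```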

---

## Workflow Observability API
183 changes: 183 additions & 0 deletions src/kurt/web/api/intent_dispatch.py
@@ -0,0 +1,183 @@
"""Intent-to-prompt conversion and dispatch for workflow page editing.

Converts structured editing intents from the UI into natural language prompts
for the Claude Code agent. Supports two dispatch modes:
- Chat dispatch: inject into existing Claude WebSocket session
- Background dispatch: start a new agent workflow execution
"""

from __future__ import annotations

from typing import Any, Optional


def build_edit_prompt(page: dict, intents: list[dict]) -> str:
"""Convert structured editing intents into a natural language prompt.

Args:
page: Page config dict (type, scene_path, video_path, image_path, etc.)
intents: List of intent dicts with action, position, text, etc.

Returns:
Natural language prompt string for the agent.
"""
page_type = page.get("type", "")
target_file = _get_target_file(page)

lines = []
lines.append(f"Edit the file at `{target_file}`:")
lines.append("")

for intent in intents:
line = _format_intent(intent, page_type)
if line:
lines.append(f"- {line}")

lines.append("")

# Add type-specific context
if page_type == "motion-canvas":
lines.append("This is a Motion Canvas scene file (.tsx). Use Motion Canvas 2D API.")
lines.append("Import shapes from '@motion-canvas/2d' and use generator functions for animations.")
assets_dir = page.get("assets_dir")
if assets_dir:
lines.append(f"Custom assets (shapes, fonts) are in `{assets_dir}/`.")
elif page_type == "video":
lines.append("Use ffmpeg CLI commands to apply the edits to the video file.")
lines.append("Preserve the original file as a backup before modifying.")
elif page_type == "video-sequence":
lines.append("This is a video sequence project using Motion Canvas as the composition layer.")
lines.append("Each scene is a .tsx file. Video clips are wrapped as MC Video elements.")
elif page_type == "image":
lines.append("Use ImageMagick or Python Pillow to apply the edits to the image file.")
lines.append("Preserve the original file as a backup before modifying.")
assets_dir = page.get("assets_dir")
if assets_dir:
lines.append(f"Custom fonts are in `{assets_dir}/fonts/` and shapes in `{assets_dir}/shapes/`.")

return "\n".join(lines)


def _get_target_file(page: dict) -> str:
"""Get the primary target file path for a page type."""
page_type = page.get("type", "")
if page_type == "motion-canvas":
return page.get("scene_path", "scene.tsx")
elif page_type == "video":
return page.get("video_path", "video.mp4")
elif page_type == "video-sequence":
return page.get("output_path", "output/final.mp4")
elif page_type == "image":
return page.get("image_path", "image.png")
return "unknown"


def _format_intent(intent: dict, page_type: str) -> str:
"""Format a single intent into a human-readable instruction."""
action = intent.get("action", "")

if action == "add_text":
text = intent.get("text", "")
pos = intent.get("position", {})
style = intent.get("style", {})
parts = [f"Add text '{text}'"]
if pos:
parts.append(f"at position ({pos.get('x', 0)}, {pos.get('y', 0)})")
if style.get("font"):
parts.append(f"using font '{style['font']}'")
if style.get("size"):
parts.append(f"size {style['size']}")
if style.get("color"):
parts.append(f"color {style['color']}")
time_range = intent.get("time_range")
if time_range and page_type in ("video", "motion-canvas", "video-sequence"):
parts.append(f"visible from {time_range.get('start', 0)}s to {time_range.get('end', 0)}s")
return " ".join(parts)

elif action == "add_shape":
shape_id = intent.get("shape_id", "shape")
pos = intent.get("position", {})
size = intent.get("size", {})
animated = intent.get("animated", False)
parts = [f"Add SVG shape '{shape_id}'"]
if pos:
parts.append(f"at ({pos.get('x', 0)}, {pos.get('y', 0)})")
if size:
parts.append(f"size {size.get('width', 100)}x{size.get('height', 100)}")
if animated:
parts.append("with entrance animation")
return " ".join(parts)

elif action == "move_element":
element_id = intent.get("element_id", "element")
pos = intent.get("position", {})
return f"Move element '{element_id}' to ({pos.get('x', 0)}, {pos.get('y', 0)})"

elif action == "resize_element":
element_id = intent.get("element_id", "element")
size = intent.get("size", {})
return f"Resize element '{element_id}' to {size.get('width', 100)}x{size.get('height', 100)}"

elif action == "delete_element":
element_id = intent.get("element_id", "element")
return f"Delete element '{element_id}'"

elif action == "trim":
time_range = intent.get("time_range", {})
return f"Trim to {time_range.get('start', 0)}s - {time_range.get('end', 0)}s"

elif action == "cut":
time_range = intent.get("time_range", {})
return f"Cut segment from {time_range.get('start', 0)}s to {time_range.get('end', 0)}s"

return f"Unknown action: {action}"


def dispatch_edit(
workflow_id: str,
page: dict,
prompt: str,
session_id: Optional[str] = None,
) -> dict[str, Any]:
"""Dispatch an editing prompt to the agent.

Args:
workflow_id: DBOS workflow ID
page: Page config dict
prompt: Natural language editing prompt
session_id: If provided, inject into existing Claude session

Returns:
dict with dispatch result (status, workflow_id or session info)
"""
if session_id:
# Mode A: Chat dispatch - inject into existing session
# This would use the StreamSession to send a message
return {
"status": "dispatched",
"mode": "chat",
"session_id": session_id,
"prompt": prompt,
}
else:
# Mode B: Background dispatch - start new agent workflow
try:
from kurt.workflows.agents import run_definition
from kurt.workflows.agents.registry import get_definition_for_workflow

definition = get_definition_for_workflow(workflow_id)
if not definition:
return {"status": "error", "detail": "Workflow definition not found"}

result = run_definition(
definition["name"],
inputs={"task": prompt},
background=True,
)
return {
"status": "started",
"mode": "background",
"workflow_id": result.get("workflow_id"),
}
except Exception as e:
return {"status": "error", "detail": str(e)}