## Issue Description

During `kurt content fetch` operations, the CLI shows only a spinning progress bar, with no detail about what is happening:

```
⠏ Fetching content ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0%
```
The detailed logs (document ID, title, size, errors) appear only after all fetches complete. This provides poor visibility during long-running operations and makes it hard to:
- See which documents are being processed
- Identify issues as they occur
- Understand progress beyond just the spinner
## Current Behavior

During fetch:

```
⠏ Fetching content ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0%
(no logs visible)
```

After fetch completes:

```
✓ Fetched [04303ee5] Article Title (45.3KB)
✓ Fetched [12ab34cd] Another Article (23.1KB)
✗ Fetch failed [56ef78gh] Connection timeout...
```
## Expected Behavior

Show live logs during the fetch operation in a scrolling window:

```
⠏ Fetching content ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40%
Recent activity:
✓ Fetched [04303ee5] Article Title (45.3KB)
⠋ Fetching [12ab34cd] from https://example.com/article-2
✓ Fetched [12ab34cd] Another Article (23.1KB)
⠋ Generating embedding for [56ef78gh]
✗ Fetch failed [56ef78gh] Connection timeout
... (scrolling window of last 10 operations)
```
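The "scrolling window of last 10 operations" can be modeled with a bounded deque. A minimal sketch (`RecentActivity` is a hypothetical name for illustration; the real `LiveProgressDisplay` internals may differ):

```python
from collections import deque


class RecentActivity:
    """Fixed-size log window: appending past max_lines drops the oldest entry."""

    def __init__(self, max_lines: int = 10):
        self.lines: deque = deque(maxlen=max_lines)

    def log(self, line: str) -> None:
        self.lines.append(line)

    def render(self) -> str:
        return "\n".join(self.lines)


window = RecentActivity(max_lines=3)
for i in range(5):
    window.log(f"✓ Fetched [doc-{i}]")
# Only the 3 most recent operations remain visible.
```

Because `deque(maxlen=...)` evicts from the opposite end automatically, the display logic never has to trim the log itself.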
## Root Cause

The workflow logs are written using `logger.info()` but are not piped to the `LiveProgressDisplay` in real time.

Workflow logging (workflow.py:242-283):

```python
logger.info(f"Resolving document: {identifier}")
logger.info(f"Fetching content for {doc_id} from {doc_info['source_url']}")
logger.info(f"Generating embedding for {doc_id}")
logger.info(f"Saving document {doc_id} to database")
logger.info(f"Extracting links for {doc_id}")
```

These logs exist but are only displayed after completion (fetch.py:544-560).
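The buffered-then-replayed behavior can be reproduced in miniature with the stdlib's `MemoryHandler` (an analogy for the symptom, not Kurt's actual mechanism): records accumulate silently and only reach the output when flushed at the end.

```python
import io
import logging
import logging.handlers

log_output = io.StringIO()
target = logging.StreamHandler(log_output)
target.setFormatter(logging.Formatter("%(message)s"))

# flushLevel=CRITICAL: INFO records never trigger a flush, so nothing is
# written until we flush manually -- like logs appearing only after the batch.
buffered = logging.handlers.MemoryHandler(
    capacity=100, flushLevel=logging.CRITICAL, target=target
)

logger = logging.getLogger("kurt.demo")
logger.setLevel(logging.INFO)
logger.addHandler(buffered)

logger.info("Fetching content for 04303ee5")
logger.info("Generating embedding for 04303ee5")
during = log_output.getvalue()  # "" -- nothing visible while "fetching"

buffered.flush()
after = log_output.getvalue()   # both lines arrive at once, after the fact
```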
## Proposed Solution

### Option 1: DBOS Event Streaming (Recommended)

Use DBOS's built-in event system to stream progress updates to the CLI:

```python
# In workflow.py
@DBOS.step()
def fetch_content_step(...):
    DBOS.set_event("status", "fetching")
    DBOS.set_event("url", source_url)
    # ... fetch ...
    DBOS.set_event("status", "completed")
    DBOS.set_event("size_kb", len(content) / 1024)
```
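The `poll_workflow_events` helper used in the fetch.py snippet below is not a stock DBOS call; one way to sketch it is a generator that repeatedly reads the events back (for example via `DBOS.get_event(workflow_id, key)`) and yields a snapshot whenever a value changes. The lookup is injected as a plain callable here so the sketch runs without a workflow engine:

```python
import time
from typing import Callable, Iterator


def poll_workflow_events(
    get_event: Callable[[str, str], object],
    workflow_id: str,
    interval: float = 0.5,
) -> Iterator[dict]:
    """Yield a {status, url, size_kb} snapshot each time an event value changes.

    `get_event` stands in for an event lookup such as DBOS.get_event;
    injecting it keeps this sketch testable without a DBOS runtime.
    """
    last = None
    while True:
        snapshot = {key: get_event(workflow_id, key)
                    for key in ("status", "url", "size_kb")}
        if snapshot != last:          # suppress duplicate polls
            yield snapshot
            last = snapshot
        if snapshot["status"] in ("completed", "failed"):
            return                    # terminal state: stop polling
        time.sleep(interval)
```

A real implementation would also need a timeout for workflows that never reach a terminal state.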
```python
# In fetch.py
with LiveProgressDisplay(console, max_log_lines=10) as display:
    # Poll DBOS events and update the display
    for event in poll_workflow_events(workflow_id):
        if event["status"] == "fetching":
            display.log(f"⠋ Fetching from {event['url']}")
        elif event["status"] == "completed":
            display.log(f"✓ Fetched ({event['size_kb']:.1f}KB)")
```

### Option 2: Progress Callback
Pass a callback function to the workflow to report progress:
```python
def fetch_batch_workflow(identifiers, progress_callback=None):
    for identifier in identifiers:
        if progress_callback:
            progress_callback({"stage": "fetching", "doc_id": identifier})
        result = fetch_document_workflow(identifier)
        if progress_callback:
            progress_callback({"stage": "completed", "result": result})
```

### Option 3: Log Handler Hook
Create a custom log handler that pipes workflow logs to the display:
```python
class LiveDisplayLogHandler(logging.Handler):
    def __init__(self, display):
        super().__init__()  # required: initializes the handler's lock and level
        self.display = display

    def emit(self, record):
        if "Fetched" in record.msg:
            self.display.log(record.msg, style="dim green")

# Attach to workflow logger
workflow_logger.addHandler(LiveDisplayLogHandler(display))
```

## Benefits
- Better UX: Users can see progress in real-time
- Faster debugging: Errors visible immediately, not after batch completes
- Progress visibility: See which documents are slow/stuck
- Confidence: Users know the system is working, not frozen
## Impact
- Currently: No visibility during fetch operations (looks frozen for large batches)
- Improved: Live feedback with document-level progress and error visibility
## Related Files

- `src/kurt/content/fetch/workflow.py:240-370` - Workflow logging
- `src/kurt/commands/content/fetch.py:521-563` - CLI display logic
- `src/kurt/commands/content/_live_display.py:198-300` - `LiveProgressDisplay` class
## Similar Issue

The same problem likely affects other long-running operations:

- `kurt content index` (metadata extraction)
- `kurt content map` with `--max-depth` (crawling)
- Background workflows when monitored via the CLI