UX: Show live detailed logs during fetch operations #37

@boringdata


Issue Description

During kurt content fetch operations, the CLI shows only a spinning progress bar without any detailed information about what's happening:

⠏ Fetching content ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0%

The detailed logs (document ID, title, size, errors) only appear after all fetches complete. This provides poor visibility during long-running operations and makes it hard to:

  • See which documents are being processed
  • Identify issues as they occur
  • Understand progress beyond just the spinner

Current Behavior

During fetch:

⠏ Fetching content ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0%
(no logs visible)

After fetch completes:

✓ Fetched [04303ee5] Article Title (45.3KB)
✓ Fetched [12ab34cd] Another Article (23.1KB)
✗ Fetch failed [56ef78gh] Connection timeout...

Expected Behavior

Show live logs during the fetch operation in a scrolling window:

⠏ Fetching content ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  40%

Recent activity:
✓ Fetched [04303ee5] Article Title (45.3KB)
⠋ Fetching [12ab34cd] from https://example.com/article-2
✓ Fetched [12ab34cd] Another Article (23.1KB)
⠋ Generating embedding for [56ef78gh]
✗ Fetch failed [56ef78gh] Connection timeout
... (scrolling window of last 10 operations)
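
A minimal sketch of the "last 10 operations" window described above, assuming a bounded `collections.deque` is an acceptable buffer. `RecentActivity` is a hypothetical name for illustration only; it is not the existing `LiveProgressDisplay` API.

```python
from collections import deque


class RecentActivity:
    """Keeps only the most recent `max_lines` log lines (hypothetical helper)."""

    def __init__(self, max_lines=10):
        # A deque with maxlen silently drops the oldest entry when full.
        self._lines = deque(maxlen=max_lines)

    def log(self, line):
        self._lines.append(line)

    def render(self):
        # Oldest line first, newest last, ready to print under the progress bar.
        return "\n".join(self._lines)


activity = RecentActivity(max_lines=3)
for i in range(5):
    activity.log(f"✓ Fetched doc {i}")
print(activity.render())
```

With `max_lines=3` and five logged lines, only the last three remain, which is the scrolling behavior the mock-up shows.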

Root Cause

The workflow writes its logs with logger.info(), but nothing forwards them to the LiveProgressDisplay in real time:

Workflow logging (workflow.py:242-283):

logger.info(f"Resolving document: {identifier}")
logger.info(f"Fetching content for {doc_id} from {doc_info['source_url']}")
logger.info(f"Generating embedding for {doc_id}")
logger.info(f"Saving document {doc_id} to database")
logger.info(f"Extracting links for {doc_id}")

These logs exist but are only displayed after completion (fetch.py:544-560).

Proposed Solution

Option 1: DBOS Event Streaming (Recommended)

Use DBOS's built-in event system to stream progress updates to the CLI:

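# In workflow.py
from dbos import DBOS

@DBOS.step()
def fetch_content_step(source_url):
    # Assumption: set_event is callable from this step. In some DBOS versions
    # it must be called from the enclosing workflow function instead.
    DBOS.set_event("status", "fetching")
    DBOS.set_event("url", source_url)
    # ... fetch, producing `content` ...
    # Publish the size before the terminal status so a reader that sees
    # "completed" can always find "size_kb".
    DBOS.set_event("size_kb", len(content) / 1024)
    DBOS.set_event("status", "completed")

# In fetch.py
with LiveProgressDisplay(console, max_log_lines=10) as display:
    # poll_workflow_events is a helper still to be written (not part of DBOS);
    # it is assumed to yield the workflow's current events as a dict.
    for event in poll_workflow_events(workflow_id):
        if event["status"] == "fetching":
            display.log(f"⠋ Fetching from {event['url']}")
        elif event["status"] == "completed":
            display.log(f"✓ Fetched ({event['size_kb']:.1f}KB)")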
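
One possible shape for the polling helper, as a sketch only. To keep it runnable without DBOS, this version takes a `read_events` callable instead of a workflow ID; in the CLI that callable would wrap whatever reads the events for `workflow_id` (for example DBOS's event getter, whose exact signature should be checked against the DBOS documentation). The `"failed"` status, `interval`, and `max_polls` are assumptions added for illustration.

```python
import time
from typing import Any, Callable, Dict, Iterator


def poll_workflow_events(
    read_events: Callable[[], Dict[str, Any]],
    interval: float = 0.5,
    max_polls: int = 1000,
) -> Iterator[Dict[str, Any]]:
    """Yield an event snapshot each time it changes; stop on a terminal status."""
    last = None
    for _ in range(max_polls):
        snapshot = read_events()
        if snapshot != last:
            yield snapshot
            last = dict(snapshot)  # copy, in case the reader reuses its dict
        if snapshot.get("status") in ("completed", "failed"):
            return
        time.sleep(interval)


# Stand-in reader that replays three snapshots (the second is a duplicate).
snapshots = iter([
    {"status": "fetching", "url": "https://example.com/article-2"},
    {"status": "fetching", "url": "https://example.com/article-2"},
    {"status": "completed", "size_kb": 23.1},
])
seen = list(poll_workflow_events(lambda: next(snapshots), interval=0))
```

Because events are read by polling, any state that is set and overwritten between two polls is never seen; a shorter interval reduces that risk but does not remove it.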

Option 2: Progress Callback

Pass a callback function to the workflow to report progress:

def fetch_batch_workflow(identifiers, progress_callback=None):
    for identifier in identifiers:
        if progress_callback:
            progress_callback({"stage": "fetching", "doc_id": identifier})
        result = fetch_document_workflow(identifier)
        if progress_callback:
            progress_callback({"stage": "completed", "result": result})
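
On the CLI side, the callback could simply format each event and hand it to the display. The sketch below repeats the batch function so it runs on its own. `fetch_document_workflow` is stubbed, and the shape of its result (`doc_id`, `size_kb`) is an assumption; a plain list stands in for the display.

```python
from typing import Any, Dict, List


def fetch_document_workflow(identifier: str) -> Dict[str, Any]:
    # Stub: the real workflow's return value is not specified in this issue.
    return {"doc_id": identifier, "size_kb": 45.3}


def fetch_batch_workflow(identifiers, progress_callback=None):
    for identifier in identifiers:
        if progress_callback:
            progress_callback({"stage": "fetching", "doc_id": identifier})
        result = fetch_document_workflow(identifier)
        if progress_callback:
            progress_callback({"stage": "completed", "result": result})


def format_progress(event: Dict[str, Any]) -> str:
    """Turn one progress event into one display line."""
    if event["stage"] == "fetching":
        return f"⠋ Fetching [{event['doc_id']}]"
    result = event["result"]
    return f"✓ Fetched [{result['doc_id']}] ({result['size_kb']:.1f}KB)"


lines: List[str] = []  # stand-in for display.log
fetch_batch_workflow(["04303ee5"], lambda event: lines.append(format_progress(event)))
print("\n".join(lines))
```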

Option 3: Log Handler Hook

Create a custom log handler that pipes workflow logs to the display:

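import logging

class LiveDisplayLogHandler(logging.Handler):
    def __init__(self, display):
        super().__init__()  # required: sets up the handler's lock, level, and filters
        self.display = display

    def emit(self, record):
        # getMessage() applies any %-style arguments; record.msg may be a raw template.
        message = record.getMessage()
        if "Fetched" in message:
            self.display.log(message, style="dim green")

# Attach to workflow logger
workflow_logger.addHandler(LiveDisplayLogHandler(display))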
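
A self-contained sketch of the same idea that can be run without kurt. It forwards every record rather than filtering on "Fetched", and it detaches the handler when the fetch ends. `ListDisplay` is a stand-in for `LiveProgressDisplay`, and the logger name and style values are assumptions for illustration.

```python
import logging


class ListDisplay:
    """Stand-in display that records what would be shown."""

    def __init__(self):
        self.lines = []

    def log(self, message, style=None):
        self.lines.append((message, style))


class LiveDisplayLogHandler(logging.Handler):
    def __init__(self, display):
        super().__init__()
        self.display = display

    def emit(self, record):
        try:
            style = "red" if record.levelno >= logging.ERROR else "dim"
            self.display.log(record.getMessage(), style=style)
        except Exception:
            self.handleError(record)  # standard logging fallback


display = ListDisplay()
handler = LiveDisplayLogHandler(display)
workflow_logger = logging.getLogger("kurt.demo.workflow")  # hypothetical name
workflow_logger.setLevel(logging.INFO)
workflow_logger.propagate = False  # keep the demo output out of the root logger
workflow_logger.addHandler(handler)
try:
    workflow_logger.info("Fetching content for %s", "12ab34cd")
    workflow_logger.error("Fetch failed for %s", "56ef78gh")
finally:
    # Detach so later commands do not write to a closed display.
    workflow_logger.removeHandler(handler)
```

Routing by log level instead of by message text avoids depending on the exact wording of the workflow's log lines.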

Benefits

  • Better UX: Users can see progress in real-time
  • Faster debugging: Errors visible immediately, not after batch completes
  • Progress visibility: See which documents are slow/stuck
  • Confidence: Users know the system is working, not frozen

Impact

  • Currently: No visibility during fetch operations (looks frozen for large batches)
  • Improved: Live feedback with document-level progress and error visibility

Related Files

  • src/kurt/content/fetch/workflow.py:240-370 - Workflow logging
  • src/kurt/commands/content/fetch.py:521-563 - CLI display logic
  • src/kurt/commands/content/_live_display.py:198-300 - LiveProgressDisplay class

Similar Issue

The same problem likely affects other long-running operations:

  • kurt content index (metadata extraction)
  • kurt content map with --max-depth (crawling)
  • Background workflows when monitored via CLI

Labels: enhancement
