fix: add MCP tool timeout + running state to prevent silent stall after hub_repo_details (#127) by parthwhy · Pull Request #190 · huggingface/ml-intern

parthwhy · 2026-04-29T19:25:40Z

Problem

When running ml-intern headless from CLI, the agent silently stalls
for 5+ minutes after a hub_repo_details tool call with zero progress
output. The user has no way to know if the agent is working or frozen.

Fixes #127

Root Cause

Two overlapping bugs:

1. No timeout on MCP tool calls (agent/core/tools.py)
hub_repo_details is an MCP tool served by huggingface.co/mcp. The
call await self.mcp_client.call_tool(tool_name, arguments) had zero
timeout — if the HF Hub API is slow or rate-limiting, this await hangs
indefinitely, blocking the entire agent loop.

2. No progress event emitted during tool execution (agent/core/agent_loop.py)
After tool_call fires (showing ▸ hub_repo_details in terminal),
_exec_tool went silent until results arrived. No tool_state_change: running event was emitted for non-approval tools, so the CLI showed
nothing for the entire duration of the stall.

Changes

agent/core/tools.py

Added import asyncio
Added MCP_TOOL_TIMEOUT = 30 constant (easy to tune)
Wrapped mcp_client.call_tool() with asyncio.wait_for(timeout=MCP_TOOL_TIMEOUT)
Added except asyncio.TimeoutError handler that returns a clear error
message to the agent so it can retry or continue instead of hanging

agent/core/agent_loop.py

Emit tool_state_change: running event inside _exec_tool immediately
before call_tool() so the CLI/frontend shows the tool is actively
executing

Before / After

Before:
▸ hub_repo_details {"repo_ids": ["openai/whisper-small"]}
[5 minutes of silence → user hits Ctrl+C]

After:
▸ hub_repo_details {"repo_ids": ["openai/whisper-small"]} [running]
[if slow]: Tool 'hub_repo_details' timed out after 30s — agent continues

Testing

Reproduced locally with the command from the issue:
ml-intern "Fine-tune a small Whisper model for Arabic speech recognition
on a GPU under 5 dollars and compare the fine-tuned model's metrics to
the original model."
Agent no longer stalls silently after hub_repo_details.

…e#127)

fglogan · 2026-05-03T14:57:37Z

closed per maintainer request

fix: add timeout + running state for MCP tool calls (fixes huggingfac…

694239b

…e#127)

parthwhy mentioned this pull request Apr 29, 2026

Agent appears to stall for minutes after hub_repo_details with no progress logs #127

Open

lewtun mentioned this pull request May 4, 2026

ML Intern backlog prioritization report - 2026-05-04 #223

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: add MCP tool timeout + running state to prevent silent stall after hub_repo_details (#127)#190

fix: add MCP tool timeout + running state to prevent silent stall after hub_repo_details (#127)#190
parthwhy wants to merge 1 commit intohuggingface:mainfrom
parthwhy:fix/hub-repo-details-stall-127

parthwhy commented Apr 29, 2026

Uh oh!

fglogan commented May 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

parthwhy commented Apr 29, 2026

Problem

Root Cause

Changes

Before / After

Testing

Uh oh!

fglogan commented May 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants