Hard to troubleshoot, agent stays running for hours

## Question

What is the correct way to troubleshoot this system? I had a big job in progress and submitted a couple more jobs while it was running. After that everything stopped working: tasks showed as running, but nothing was getting done. More visibility is needed. I tried to check whether LiteLLM was still working, and it was frozen; restarting it didn't help. I then checked the SWE-AF agents, but there was no easy way to see whether they were doing anything. I restarted them, but the system still didn't unblock.

I restarted agentfield as well, and the tasks still showed up as running while nothing was actually getting done. I had to delete the request, so now the question is: how do I resume that work?

I think a queue system with a cap on concurrent executions is needed, along with some way to tell where the problem is: the agent or the LLM. Maybe an agent health system and an LLM health system. Even the agent node is not consistently shown as up/down correctly.
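To make the suggestion concrete, here is a rough sketch of the two pieces I have in mind: a job runner that caps how many jobs execute at once, and a health check that reports a hung component as unresponsive instead of waiting on it forever. This is only an illustration in plain Python `asyncio`, not AgentField's or SWE-AF's actual API; `BoundedJobQueue`, `check_health`, the probe functions, and the component names are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class BoundedJobQueue:
    """Hypothetical runner: at most `max_concurrent` jobs execute at once."""

    def __init__(self, max_concurrent: int) -> None:
        # Create this inside a running event loop (see demo below).
        self._sem = asyncio.Semaphore(max_concurrent)
        self.running = 0        # jobs executing right now
        self.peak_running = 0   # highest concurrency observed
        self.completed = 0      # jobs finished (success or failure)

    async def run(self, job: Callable[[], Awaitable[object]]) -> object:
        # Extra submissions wait here instead of piling onto the backend.
        async with self._sem:
            self.running += 1
            self.peak_running = max(self.peak_running, self.running)
            try:
                return await job()
            finally:
                self.running -= 1
                self.completed += 1


async def check_health(
    probes: Dict[str, Callable[[], Awaitable[bool]]], timeout: float
) -> Dict[str, str]:
    """Run every probe concurrently and map each component to a status."""

    async def one(name: str, probe: Callable[[], Awaitable[bool]]):
        try:
            ok = await asyncio.wait_for(probe(), timeout)
            return name, "up" if ok else "down"
        except asyncio.TimeoutError:
            # The probe hung: distinguish "frozen" from "cleanly down".
            return name, "unresponsive"
        except Exception:
            return name, "down"

    pairs = await asyncio.gather(*(one(n, p) for n, p in probes.items()))
    return dict(pairs)


async def demo():
    queue = BoundedJobQueue(max_concurrent=2)

    async def job(i: int) -> int:
        await asyncio.sleep(0.01)  # stand-in for real agent work
        return i

    results = await asyncio.gather(
        *(queue.run(lambda i=i: job(i)) for i in range(5))
    )

    async def healthy() -> bool:
        return True

    async def frozen() -> bool:
        await asyncio.sleep(10)  # simulates a component that never answers
        return True

    status = await check_health({"agent": healthy, "llm": frozen}, timeout=0.05)
    return list(results), queue.peak_running, status
```

Calling `asyncio.run(demo())` submits five jobs against a cap of two and probes one healthy and one frozen component. With something like this, the UI could show queued versus running counts and say which component stopped answering, instead of showing everything as "running".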

It feels like the UI was put together with AI but not really tested or used in real-world scenarios. I do find this project very interesting and would even like to participate.

## Context



Hard to troubleshoot, agent stays running for hours #316
