ERROR - engine.py:781 - Task <EngineMainLoop> failed #3459

Open
KDD2018 opened this issue Apr 21, 2025 · 2 comments

KDD2018 commented Apr 21, 2025

I get an error when running inference with the InternVL3-78B-AWQ model. How can I solve it?

lmdeploy==0.7.3
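Since the traceback below passes through several packages, it may help to report their versions together. This is a minimal sketch using only the standard library; the package list (`lmdeploy`, `ray`, `triton`, `torch`) is an assumption about what is relevant here, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version}, or "not installed" when a package is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# Assumed list of packages; adjust it to match the environment being reported.
report = installed_versions(["lmdeploy", "ray", "triton", "torch"])
for name, version in report.items():
    print(f"{name}: {version}")
```

The output can be pasted into the issue next to the lmdeploy version.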

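`
import os

from lmdeploy import ChatTemplateConfig, PytorchEngineConfig, pipeline
from lmdeploy.vl import load_image

model_path = "/home/ai-admin/llm-models/InternVL3-78B-AWQ"
image_path = "./downloads"

# Build (prompt, image) pairs from the first two files in the image directory.
prompts = []
for file in os.listdir(image_path)[:2]:
    image = load_image(os.path.join(image_path, file))
    # Prompt: "Please describe the image content in detail."
    prompts.append(('请详细描述图片内容。', image))

# PyTorch engine with tensor parallelism across 2 GPUs.
pipe = pipeline(
    model_path,
    backend_config=PytorchEngineConfig(session_len=16384, tp=2),
    chat_template_config=ChatTemplateConfig(model_name='internvl2_5'),
)
response = pipe(prompts)
`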

ERROR:

`
2025-04-21 10:43:58,013 - lmdeploy - ERROR - engine.py:781 - Task <EngineMainLoop> failed
Traceback (most recent call last):
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 827, in async_loop
await self._async_loop_main(resp_que=resp_que, has_runable_event=has_runable_event)
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 762, in _async_loop_main
out = await self.executor.get_output_async()
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 339, in get_output_async
return await self.remote_outs.get()
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/asyncio/queues.py", line 159, in get
await getter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 776, in __task_callback
task.result()
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 829, in async_loop
self._loop_finally()
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 793, in _loop_finally
self.executor.release()
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 308, in release
self.collective_rpc('exit')
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 243, in collective_rpc
return ray.get([getattr(worker, method).remote(*args, **kwargs) for worker in self.workers], timeout=timeout)
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 243, in <listcomp>
return ray.get([getattr(worker, method).remote(*args, **kwargs) for worker in self.workers], timeout=timeout)
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/ray/actor.py", line 1549, in __getattr__
raise AttributeError(
AttributeError: 'ActorHandle' object has no attribute 'exit'
unhandled exception during worker thread shutdown
task: <Task finished name='EngineMainLoop' coro=<Engine.async_loop() done, defined at /home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py:795> exception=AttributeError("'ActorHandle' object has no attribute 'exit'")>
Traceback (most recent call last):
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 827, in async_loop
await self._async_loop_main(resp_que=resp_que, has_runable_event=has_runable_event)
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 762, in _async_loop_main
out = await self.executor.get_output_async()
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 339, in get_output_async
return await self.remote_outs.get()
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/asyncio/queues.py", line 159, in get
await getter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 776, in __task_callback
task.result()
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 829, in async_loop
self._loop_finally()
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 793, in _loop_finally
self.executor.release()
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 308, in release
self.collective_rpc('exit')
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 243, in collective_rpc
return ray.get([getattr(worker, method).remote(*args, **kwargs) for worker in self.workers], timeout=timeout)
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 243, in <listcomp>
return ray.get([getattr(worker, method).remote(*args, **kwargs) for worker in self.workers], timeout=timeout)
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/ray/actor.py", line 1549, in __getattr__
raise AttributeError(
AttributeError: 'ActorHandle' object has no attribute 'exit'
(RayWorkerWrapper pid=451466) loc("/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/kernels/cuda/pagedattention.py":207:11): error: operation scheduled before its operands
Future exception was never retrieved
future: <Future finished exception=AttributeError("'ActorHandle' object has no attribute 'exit'")>
Traceback (most recent call last):
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 827, in async_loop
await self._async_loop_main(resp_que=resp_que, has_runable_event=has_runable_event)
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 762, in _async_loop_main
out = await self.executor.get_output_async()
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 339, in get_output_async
return await self.remote_outs.get()
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/asyncio/queues.py", line 159, in get
await getter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 776, in __task_callback
task.result()
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 829, in async_loop
self._loop_finally()
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 793, in _loop_finally
self.executor.release()
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 308, in release
self.collective_rpc('exit')
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 243, in collective_rpc
return ray.get([getattr(worker, method).remote(*args, **kwargs) for worker in self.workers], timeout=timeout)
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 243, in <listcomp>
return ray.get([getattr(worker, method).remote(*args, **kwargs) for worker in self.workers], timeout=timeout)
File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/ray/actor.py", line 1549, in __getattr__
raise AttributeError(
AttributeError: 'ActorHandle' object has no attribute 'exit'
`
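
A note on reading the log above: the line "During handling of the above exception, another exception occurred" means the `AttributeError` was raised while a `CancelledError` was already propagating, so the failed `exit` lookup happens during shutdown and may not be the original cause. The sketch below reproduces that pattern in plain Python and walks the chain through `__context__`. `FakeWorkerHandle`, `release`, `run_loop`, and `exception_chain` are hypothetical stand-ins for illustration, not lmdeploy or Ray code.

```python
import asyncio


class FakeWorkerHandle:
    """Hypothetical stand-in for a remote handle that exposes no `exit` method."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        raise AttributeError(f"'FakeWorkerHandle' object has no attribute '{name}'")


def release(workers):
    # Cleanup step that looks up a method the handles do not provide.
    for worker in workers:
        getattr(worker, "exit")


def run_loop(workers):
    try:
        # Stand-in for the main loop being cancelled.
        raise asyncio.CancelledError()
    finally:
        # An error raised here replaces the one already in flight.
        release(workers)


def exception_chain(exc):
    """Follow __cause__ / __context__ links from the last exception back to the first."""
    chain = []
    while exc is not None:
        chain.append(exc)
        exc = exc.__cause__ or exc.__context__
    return chain


try:
    run_loop([FakeWorkerHandle()])
except AttributeError as err:
    names = [type(e).__name__ for e in exception_chain(err)]
    print(names)
```

In this sketch the cleanup error is what surfaces, and the cancellation is only reachable through the chain. The same reading suggests looking at what cancelled the loop in the report above, such as the worker-side message in the log.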

ccccwb commented Apr 22, 2025

I encountered a similar problem:

lmdeploy - ERROR - engine.py:950 - Task failed
Traceback (most recent call last):
File "llm_CoT/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 945, in __task_callback
task.result()
File "llm_CoT/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 998, in async_loop
await self._async_loop_main(resp_que=resp_que,
File "llm_CoT/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 922, in _async_loop_main
forward_inputs, next_running = await inputs_maker.send_next_inputs()

hufangjian commented Apr 28, 2025

I get the same error:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 776, in __task_callback
task.result()
File "/usr/local/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 829, in async_loop
self._loop_finally()
File "/usr/local/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 793, in _loop_finally
self.executor.release()
File "/usr/local/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 308, in release
self.collective_rpc('exit')
File "/usr/local/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 243, in collective_rpc
return ray.get([getattr(worker, method).remote(*args, **kwargs) for worker in self.workers], timeout=timeout)
File "/usr/local/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 243, in <listcomp>
return ray.get([getattr(worker, method).remote(*args, **kwargs) for worker in self.workers], timeout=timeout)
File "/usr/local/lib/python3.10/site-packages/ray/actor.py", line 1549, in __getattr__
raise AttributeError(
AttributeError: 'ActorHandle' object has no attribute 'exit'
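
When comparing reports like the ones in this thread, note that the first report contains one line with a `(RayWorkerWrapper pid=...)` prefix, which comes from a worker process, not from the driver traceback, and is easy to miss among the repeated tracebacks. A minimal sketch for pulling such lines out of a saved log follows. The prefix pattern is an assumption based on that single line, `worker_lines` is a hypothetical helper, and the sample text abbreviates the file path.

```python
import re

# Assumed prefix shape: "(ActorName pid=12345) message", optionally with extra
# fields after the pid inside the parentheses.
WORKER_LINE = re.compile(
    r"^\((?P<actor>\w+) pid=(?P<pid>\d+)[^)]*\)\s*(?P<message>.*)$"
)


def worker_lines(log_text):
    """Return (actor, pid, message) tuples for lines that carry a worker prefix."""
    found = []
    for line in log_text.splitlines():
        match = WORKER_LINE.match(line.strip())
        if match:
            found.append((match["actor"], int(match["pid"]), match["message"]))
    return found


# Abbreviated sample modelled on the first report; the path is shortened here.
sample = """\
AttributeError: 'ActorHandle' object has no attribute 'exit'
(RayWorkerWrapper pid=451466) loc("pagedattention.py":207:11): error: operation scheduled before its operands
Future exception was never retrieved
"""

hits = worker_lines(sample)
for actor, pid, message in hits:
    print(f"{actor} [{pid}]: {message}")
```

Including any worker-side lines in a follow-up report may help show whether the different tracebacks share an underlying failure.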
