
ninja -v command fails, leaving transformer_inference.so missing #12

@Debouter


Hi~
I got the following error when running demo.py:

Traceback (most recent call last):
  File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "/mnt/petrelfs/klk/anaconda3/envs/ds/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
    ......
ImportError: /mnt/petrelfs/klk/.cache/torch_extensions/py310_cu118/transformer_inference/transformer_inference.so: cannot open shared object file: No such file or directory

My initial guess is that the ninja -v command fails, so the shared object file transformer_inference.so is never generated.
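One way to check this guess is to re-run the build by hand in the extension's build directory (the transformer_inference directory named in the ImportError above), since ninja then prints the failing compiler or linker command together with its error message. A minimal sketch, assuming ninja is on PATH and that the directory still contains the build.ninja file written by torch.utils.cpp_extension; `rerun_ninja` is a hypothetical helper, not part of PyTorch or DeepSpeed:

```python
import subprocess
from pathlib import Path


def rerun_ninja(build_dir: str) -> subprocess.CompletedProcess:
    """Re-run the JIT build in build_dir and capture ninja's combined output.

    Hypothetical helper: assumes ninja is on PATH and that build_dir holds
    the build.ninja file generated by torch.utils.cpp_extension.
    """
    build_path = Path(build_dir).expanduser()
    if not (build_path / "build.ninja").is_file():
        raise FileNotFoundError(f"no build.ninja in {build_path}")
    # -v makes ninja echo every compiler/linker command it runs;
    # stderr is merged into stdout so the error appears in order.
    return subprocess.run(
        ["ninja", "-v"],
        cwd=build_path,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
```

For example, `print(rerun_ninja("~/.cache/torch_extensions/py310_cu118/transformer_inference").stdout)` would show the first command that fails and why, which is usually more informative than the CalledProcessError itself.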

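I have already tried the various fixes for Command '['ninja', '-v']' returned non-zero exit status 1 that I found online, such as installing or disabling the ninja library and downgrading PyTorch, but none of them solved the problem.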
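Because the failing command is the ninja executable itself, it can help to confirm which ninja binary the environment actually picks up and that it runs at all. A minimal sketch using only the standard library; `ninja_version` is a hypothetical helper (PyTorch's own `torch.utils.cpp_extension.is_ninja_available()` performs a similar yes/no check):

```python
import shutil
import subprocess
from typing import Optional


def ninja_version() -> Optional[str]:
    """Return the version string of the ninja on PATH, or None if it is unusable."""
    exe = shutil.which("ninja")
    if exe is None:
        return None  # no ninja executable on PATH
    try:
        out = subprocess.run(
            [exe, "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # found on PATH, but it fails to run
    return out.stdout.strip()
```

Printing `shutil.which("ninja")` alongside the version also shows whether the binary comes from the active conda environment or from somewhere else on PATH.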

My environment is as follows:

  • python==3.10.12
  • torch/cuda/deepspeed versions are identical to yours
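To compare environments exactly rather than by description, the relevant versions can be collected in one place. A minimal sketch, assuming the packages are installed under the distribution names `torch`, `deepspeed`, and `ninja`; the CUDA entry is the version PyTorch was built with (`torch.version.cuda`), which may differ from the system nvcc used for the JIT build. `env_report` is a hypothetical helper:

```python
import platform
from importlib import metadata


def env_report() -> dict:
    """Collect versions relevant to building DeepSpeed's JIT ops."""
    report = {"python": platform.python_version()}
    for dist in ("torch", "deepspeed", "ninja"):
        try:
            report[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = None  # not installed in this environment
    try:
        import torch

        report["torch_cuda"] = torch.version.cuda  # CUDA version torch was built with
    except ImportError:
        report["torch_cuda"] = None
    return report


for key, value in env_report().items():
    print(f"{key}: {value}")
```

DeepSpeed also ships a `ds_report` command that prints a similar summary, including which ops are compatible with the installed toolchain.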

Have you run into this problem? If not, could you share your transformer_inference.so file? It should be located somewhere under <user_path>/.cache/torch_extensions/pyXX_cuXX/transformer_inference.
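The expected location can be derived from the running interpreter, which makes it easy to confirm the file is really absent. A minimal sketch, assuming the default Linux cache root `~/.cache/torch_extensions` with the `pyXX_cuXX` subfolder naming seen in the traceback (e.g. py310_cu118), and assuming that when `TORCH_EXTENSIONS_DIR` is set PyTorch uses it directly as the root; `expected_so_path` is a hypothetical helper:

```python
import os
import sys
from pathlib import Path


def expected_so_path(name: str, cuda_version: str) -> Path:
    """Guess where the torch JIT loader puts <name>.so.

    Hypothetical helper mirroring the layout in the traceback:
    ~/.cache/torch_extensions/py{major}{minor}_cu{cuda}/{name}/{name}.so,
    or $TORCH_EXTENSIONS_DIR/{name}/{name}.so when that variable is set.
    """
    override = os.environ.get("TORCH_EXTENSIONS_DIR")
    if override:
        root = Path(override)
    else:
        py_tag = f"py{sys.version_info.major}{sys.version_info.minor}"
        cu_tag = "cu" + cuda_version.replace(".", "")  # "11.8" -> "cu118"
        root = Path.home() / ".cache" / "torch_extensions" / f"{py_tag}_{cu_tag}"
    return root / name / f"{name}.so"


so = expected_so_path("transformer_inference", "11.8")
print(so, "exists" if so.exists() else "missing")
```

Note that a .so copied from another machine only loads if it was built against the same Python, PyTorch, and CUDA versions.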

Thanks!
