ImportError: undefined symbol: _ZN3c104cuda9SetDeviceEa on environments with CUDA 12.x #4

@KrisLuo-cyber

1. Description
I am hitting an ImportError when running the deepconf-baseline.py and deepconf-online.py scripts after setting up the environment and compiling the project according to the official instructions.

The error is undefined symbol: _ZN3c104cuda9SetDeviceEa, raised while loading the vllm._C.abi3.so shared library. The symbol demangles to c10::cuda::SetDevice(signed char), a function provided by PyTorch's c10 CUDA library.
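
To make the demangling step concrete, here is a minimal sketch of how a mangled name of this shape decodes under the Itanium C++ ABI. The helper `demangle_simple` is hypothetical and written only for illustration: it handles plain nested names with builtin parameter types and nothing else (no templates, substitutions, or qualifiers). For anything more complex, a real demangler such as `c++filt` is the right tool.

```python
# Builtin type codes from the Itanium C++ ABI (a small subset only).
BUILTIN_TYPES = {
    "v": "void", "b": "bool", "c": "char", "a": "signed char",
    "h": "unsigned char", "s": "short", "t": "unsigned short",
    "i": "int", "j": "unsigned int", "l": "long", "m": "unsigned long",
    "x": "long long", "y": "unsigned long long", "f": "float", "d": "double",
}


def demangle_simple(mangled):
    """Decode '_ZN<len><name>...E<types>' into 'a::b::c(type, ...)'.

    Hypothetical helper: supports only nested names made of
    length-prefixed identifiers followed by builtin parameter codes.
    """
    if not mangled.startswith("_ZN"):
        raise ValueError("only simple nested names (_ZN...E) are supported")
    pos = 3
    parts = []
    # Each name component is a decimal length followed by that many characters.
    while pos < len(mangled) and mangled[pos] != "E":
        start = pos
        while pos < len(mangled) and mangled[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("unsupported construct at offset %d" % pos)
        length = int(mangled[start:pos])
        parts.append(mangled[pos:pos + length])
        pos += length
    if pos >= len(mangled):
        raise ValueError("missing 'E' terminator")
    pos += 1  # skip the 'E' that closes the nested name
    params = []
    for code in mangled[pos:]:
        if code not in BUILTIN_TYPES:
            raise ValueError("unsupported type code %r" % code)
        params.append(BUILTIN_TYPES[code])
    if params == ["void"]:
        params = []  # a lone 'v' means an empty parameter list
    return "::".join(parts) + "(" + ", ".join(params) + ")"


print(demangle_simple("_ZN3c104cuda9SetDeviceEa"))
```

The parameter list is part of the mangled name, so the same function with a different signature produces a different symbol. As a purely hypothetical illustration, adding a `bool` parameter would give a name ending in `Eab` instead of `Ea`. An extension compiled against one signature therefore cannot resolve against a library that only exports the other.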

2. Environment
Configuration 1:
GPU: NVIDIA GeForce RTX 4090
CUDA Toolkit Version (Driver): 12.4

Configuration 2:
GPU: NVIDIA GeForce RTX 5090
CUDA Toolkit Version (Driver): 12.8

3. Details

(deepconf-online) cicc@cicc-Test-Platform:/home/projects/deepconf/deepconf$ python deepconf-baseline.py --qid 0 --rid 0
INFO 09-08 17:04:55 [__init__.py:241] Automatically detected platform cuda.
Traceback (most recent call last):
  File "/home/projects/deepconf/deepconf/deepconf-baseline.py", line 7, in <module>
    from vllm import LLM, SamplingParams
  File "/home/projects/deepconf/vllm/vllm/__init__.py", line 64, in __getattr__
    module = import_module(module_name, __package__)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cicc/anaconda3/envs/deepconf-online/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projects/deepconf/vllm/vllm/entrypoints/llm.py", line 20, in <module>
    from vllm.config import (CompilationConfig, ModelDType, TokenizerMode,
  File "/home/projects/deepconf/vllm/vllm/config/__init__.py", line 36, in <module>
    from vllm.config.parallel import DistributedExecutorBackend, ParallelConfig
  File "/home/projects/deepconf/vllm/vllm/config/parallel.py", line 17, in <module>
    from vllm.platforms import current_platform
  File "/home/projects/deepconf/vllm/vllm/platforms/__init__.py", line 273, in __getattr__
    _current_platform = resolve_obj_by_qualname(
                        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projects/deepconf/vllm/vllm/utils/__init__.py", line 2568, in resolve_obj_by_qualname
    module = importlib.import_module(module_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cicc/anaconda3/envs/deepconf-online/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projects/deepconf/vllm/vllm/platforms/cuda.py", line 18, in <module>
    import vllm._C  # noqa
    ^^^^^^^^^^^^^^
ImportError: /home/projects/deepconf/vllm/vllm/_C.abi3.so: undefined symbol: _ZN3c104cuda9SetDeviceEa
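
One way to narrow this down is to check whether the PyTorch installed in the active environment actually exports the symbol that `_C.abi3.so` expects. The sketch below is an assumption-laden diagnostic, not part of the project: `exports_symbol` and `report_torch_symbol` are hypothetical helpers, the `lib/libc10_cuda.so` location assumes the usual Linux wheel layout, and passing `None` as the path (meaning "the current process") relies on POSIX behaviour.

```python
import ctypes
import os

MISSING_SYMBOL = "_ZN3c104cuda9SetDeviceEa"


def exports_symbol(lib_path, symbol):
    """Return True if the shared library at lib_path exports symbol.

    lib_path=None inspects the current process (POSIX only).
    Returns False if the library cannot be loaded at all.
    """
    try:
        lib = ctypes.CDLL(lib_path)
    except OSError:
        return False
    try:
        getattr(lib, symbol)  # ctypes raises AttributeError if lookup fails
    except AttributeError:
        return False
    return True


def report_torch_symbol():
    """Print the installed torch build and whether it exports the symbol.

    Assumes torch is importable and ships lib/libc10_cuda.so next to
    torch/__init__.py (typical for Linux CUDA wheels).
    """
    import torch

    lib_path = os.path.join(os.path.dirname(torch.__file__), "lib", "libc10_cuda.so")
    print("torch version:", torch.__version__)
    print("torch built for CUDA:", torch.version.cuda)
    print("library:", lib_path)
    print("exports", MISSING_SYMBOL, "->", exports_symbol(lib_path, MISSING_SYMBOL))
```

Calling `report_torch_symbol()` inside the `deepconf-online` environment prints the PyTorch build in use. If the last line reports `False`, the installed PyTorch does not provide the symbol the compiled extension was linked against, which would suggest the extension was built against a different PyTorch build than the one being imported.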
