
build fails with unknown type name errors #29801

@sebastian-de

Description

I'm trying to build JAX v0.6.2 with ROCm 6.4.1.

Build command:

python build/build.py build --wheels=jaxlib,rocm-plugin,rocm-pjrt \
  --clang_path="${ROCM_PATH}/lib/llvm/bin/clang" \
  --target_cpu_features=release \
  --rocm_path="${ROCM_PATH}"
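For scripting the build (for example from a PKGBUILD), the same invocation can be assembled in Python. This is a minimal sketch: build_command is a hypothetical helper that only reproduces the flags above, and falling back to /opt/rocm when ROCM_PATH is unset is an assumption based on the ROCM_PATH shown in the compile log.

```python
import os
import shlex


def build_command(rocm_path: str) -> list[str]:
    """Assemble the build/build.py argv used in this report (hypothetical helper)."""
    return [
        "python", "build/build.py", "build",
        "--wheels=jaxlib,rocm-plugin,rocm-pjrt",
        f"--clang_path={rocm_path}/lib/llvm/bin/clang",
        "--target_cpu_features=release",
        f"--rocm_path={rocm_path}",
    ]


if __name__ == "__main__":
    # Assumption: fall back to /opt/rocm when ROCM_PATH is unset.
    rocm = os.environ.get("ROCM_PATH", "/opt/rocm")
    # Print a shell-quoted command line; pass the list to subprocess.run() to execute it.
    print(shlex.join(build_command(rocm)))
```

Keeping the arguments as a list avoids shell quoting issues if ROCM_PATH ever contains spaces.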

This results in the following error:

[...]
INFO: Analyzed target //jaxlib/tools:build_wheel (296 packages loaded, 40933 targets configured).
ERROR: /home/user/.cache/bazel/_bazel_user/8eb645d2a1da7caf1a5eb11d9777bbb1/external/gloo/BUILD.bazel:40:11: Compiling gloo/types.cc failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing CppCompile command (from target @@gloo//:gloo) 
  (cd /home/user/.cache/bazel/_bazel_user/8eb645d2a1da7caf1a5eb11d9777bbb1/execroot/__main__ && \
  exec env - \
    CLANG_COMPILER_PATH=/opt/rocm/lib/llvm/bin/clang \
    PATH=/home/user/.local/bin:/home/user/bin:/home/user/.cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/home/user/.local/share/flatpak/exports/bin:/var/lib/flatpak/exports/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/opt/rocm/bin:/usr/lib/rustup/bin \
    PWD=/proc/self/cwd \
    ROCM_PATH=/opt/rocm \
    TF_HIPCC_CLANG=1 \
    TF_ROCM_AMDGPU_TARGETS=gfx900,gfx906,gfx908,gfx90a,gfx942,gfx1030,gfx1100,gfx1101,gfx1200,gfx1201 \
    TF_ROCM_CLANG=1 \
  external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++17' -MD -MF bazel-out/k8-opt/bin/external/gloo/_objs/gloo/types.pic.d '-frandom-seed=bazel-out/k8-opt/bin/external/gloo/_objs/gloo/types.pic.o' -fPIC -iquote external/gloo -iquote bazel-out/k8-opt/bin/external/gloo -isystem external/gloo -isystem bazel-out/k8-opt/bin/external/gloo '-fvisibility=hidden' -Wno-sign-compare -Wno-unknown-warning-option -Wno-stringop-truncation -Wno-array-parameter '-DMLIR_PYTHON_PACKAGE_PREFIX=jaxlib.mlir.' '-DNB_DOMAIN=jax' -Wno-gnu-offsetof-extensions -Qunused-arguments '-Werror=mismatched-tags' '-Wno-error=c23-extensions' -mavx -Wno-gnu-offsetof-extensions -Qunused-arguments -Wl,--enable-new-dtags '--rocm-path=/opt/rocm' -frtlib-add-rpath -Wno-gnu-offsetof-extensions -Qunused-arguments '-Werror=mismatched-tags' '-Wno-error=c23-extensions' -mavx -Wno-gnu-offsetof-extensions -Qunused-arguments -Wl,--enable-new-dtags '--rocm-path=/opt/rocm' -frtlib-add-rpath '-std=c++17' -fexceptions -Wno-unused-variable -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' '-DTENSORFLOW_USE_ROCM=1' -D__HIP_PLATFORM_AMD__ -DEIGEN_USE_HIP -DUSE_ROCM -no-canonical-prefixes -c external/gloo/gloo/types.cc -o bazel-out/k8-opt/bin/external/gloo/_objs/gloo/types.pic.o)
# Configuration: 33fc69229a7839c643a01674a3dee801a1c20e2d9ae5ad4742563610c2e95572
# Execution platform: @@local_execution_config_platform//:platform
In file included from external/gloo/gloo/types.cc:9:
external/gloo/gloo/types.h:66:11: error: unknown type name 'uint8_t'
   66 | constexpr uint8_t kGatherSlotPrefix = 0x01;
      |           ^
external/gloo/gloo/types.h:67:11: error: unknown type name 'uint8_t'
   67 | constexpr uint8_t kAllgatherSlotPrefix = 0x02;
      |           ^
external/gloo/gloo/types.h:68:11: error: unknown type name 'uint8_t'
   68 | constexpr uint8_t kReduceSlotPrefix = 0x03;
      |           ^
external/gloo/gloo/types.h:69:11: error: unknown type name 'uint8_t'
   69 | constexpr uint8_t kAllreduceSlotPrefix = 0x04;
      |           ^
external/gloo/gloo/types.h:70:11: error: unknown type name 'uint8_t'
   70 | constexpr uint8_t kScatterSlotPrefix = 0x05;
      |           ^
external/gloo/gloo/types.h:71:11: error: unknown type name 'uint8_t'
   71 | constexpr uint8_t kBroadcastSlotPrefix = 0x06;
      |           ^
external/gloo/gloo/types.h:72:11: error: unknown type name 'uint8_t'
   72 | constexpr uint8_t kBarrierSlotPrefix = 0x07;
      |           ^
external/gloo/gloo/types.h:73:11: error: unknown type name 'uint8_t'
   73 | constexpr uint8_t kAlltoallSlotPrefix = 0x08;
      |           ^
external/gloo/gloo/types.h:77:21: error: unknown type name 'uint8_t'
   77 |   static Slot build(uint8_t prefix, uint32_t tag);
      |                     ^
external/gloo/gloo/types.h:77:37: error: unknown type name 'uint32_t'
   77 |   static Slot build(uint8_t prefix, uint32_t tag);
      |                                     ^
external/gloo/gloo/types.h:79:12: error: unknown type name 'uint64_t'
   79 |   operator uint64_t() const {
      |            ^
external/gloo/gloo/types.h:86:17: error: unknown type name 'uint64_t'
   86 |   explicit Slot(uint64_t base, uint64_t delta) : base_(base), delta_(delta) {}
      |                 ^
external/gloo/gloo/types.h:86:32: error: unknown type name 'uint64_t'
   86 |   explicit Slot(uint64_t base, uint64_t delta) : base_(base), delta_(delta) {}
      |                                ^
external/gloo/gloo/types.h:88:9: error: unknown type name 'uint64_t'
   88 |   const uint64_t base_;
      |         ^
external/gloo/gloo/types.h:89:9: error: unknown type name 'uint64_t'
   89 |   const uint64_t delta_;
      |         ^
external/gloo/gloo/types.h:97:3: error: unknown type name 'uint16_t'
   97 |   uint16_t x;
      |   ^
external/gloo/gloo/types.cc:16:18: error: unknown type name 'uint8_t'
   16 | Slot Slot::build(uint8_t prefix, uint32_t tag) {
      |                  ^
external/gloo/gloo/types.cc:16:34: error: unknown type name 'uint32_t'
   16 | Slot Slot::build(uint8_t prefix, uint32_t tag) {
      |                                  ^
external/gloo/gloo/types.cc:17:3: error: unknown type name 'uint64_t'
   17 |   uint64_t u64prefix = ((uint64_t)prefix) << 56;
      |   ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
Target //jaxlib/tools:build_wheel failed to build
INFO: Elapsed time: 23.931s, Critical Path: 10.31s
INFO: 163 processes: 78 internal, 85 local.
ERROR: Build did NOT complete successfully
ERROR: Build failed. Not running target
Traceback (most recent call last):
  File "/home/user/src/PKBUILDS/python-jax-rocm-aur/src/jax-rocm-jax-v0.6.0/build/build.py", line 778, in <module>
    asyncio.run(main())
    ~~~~~~~~~~~^^^^^^^^
  File "/usr/lib/python3.13/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ~~~~~~~~~~^^^^^^
  File "/usr/lib/python3.13/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/usr/lib/python3.13/asyncio/base_events.py", line 725, in run_until_complete
    return future.result()
           ~~~~~~~~~~~~~^^
  File "/home/user/src/PKBUILDS/python-jax-rocm-aur/src/jax-rocm-jax-v0.6.0/build/build.py", line 723, in main
    raise RuntimeError(f"Command failed with return code {result.return_code}")
RuntimeError: Command failed with return code 1
==> ERROR: A failure occurred in build().
    Aborting...

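The issue seems to be a missing include in gloo: uint8_t, uint16_t, uint32_t and uint64_t are declared in <cstdint>, which gloo/types.h does not appear to include directly. I opened a PR upstream: pytorch/gloo#452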
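A minimal sketch of the kind of change involved, assuming the fix is to include <cstdint> directly wherever the fixed-width types are used (the exact diff of pytorch/gloo#452 is not reproduced here). A plausible trigger is GCC 15's libstdc++, which no longer pulls in <cstdint> transitively from some other standard headers, so code relying on that stops compiling. The helpers needs_cstdint and add_cstdint are hypothetical and only scan text; they do not follow includes.

```python
import re

# Fixed-width integer types declared by <cstdint> / <stdint.h>.
FIXED_WIDTH = re.compile(r"\bu?int(?:8|16|32|64)_t\b")
# A direct include of either header (transitive includes are not followed).
STDINT_INCLUDE = re.compile(r'^\s*#\s*include\s*[<"](?:cstdint|stdint\.h)[>"]', re.MULTILINE)


def needs_cstdint(source: str) -> bool:
    """Heuristic: uses a fixed-width int type but has no direct <cstdint>/<stdint.h> include.

    Also matches inside comments and strings, so treat a hit as a hint, not proof.
    """
    return bool(FIXED_WIDTH.search(source)) and not STDINT_INCLUDE.search(source)


def add_cstdint(source: str) -> str:
    """Insert '#include <cstdint>' after '#pragma once' if present, else at the top."""
    if not needs_cstdint(source):
        return source
    lines = source.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.strip() == "#pragma once":
            if not line.endswith("\n"):
                lines[i] = line + "\n"
            lines.insert(i + 1, "#include <cstdint>\n")
            return "".join(lines)
    return "#include <cstdint>\n" + source
```

Including the header directly, rather than relying on whatever another header happens to include, keeps the file compiling across standard library versions.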

System info (python version, jaxlib version, accelerator, etc.)

ERROR:2025-06-27 10:30:10,358:jax._src.xla_bridge:647: Jax plugin configuration error: Exception when calling jax_plugins.xla_rocm60.initialize()
Traceback (most recent call last):
  File "/home/sepp/src/PKBUILDS/python-jax-rocm-aur/venv-jax/lib/python3.12/site-packages/jax/_src/xla_bridge.py", line 645, in discover_pjrt_plugins
    plugin_module.initialize()
  File "/home/sepp/src/PKBUILDS/python-jax-rocm-aur/venv-jax/lib/python3.12/site-packages/jax_plugins/xla_rocm60/__init__.py", line 137, in initialize
    c_api = xb.register_plugin(
            ^^^^^^^^^^^^^^^^^^^
  File "/home/sepp/src/PKBUILDS/python-jax-rocm-aur/venv-jax/lib/python3.12/site-packages/jax/_src/xla_bridge.py", line 744, in register_plugin
    c_api = xla_client.load_pjrt_plugin_dynamically(plugin_name, library_path)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sepp/src/PKBUILDS/python-jax-rocm-aur/venv-jax/lib/python3.12/site-packages/jaxlib/xla_client.py", line 165, in load_pjrt_plugin_dynamically
    return _xla.load_pjrt_plugin(plugin_name, library_path, c_api=None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Failed to open /home/sepp/src/PKBUILDS/python-jax-rocm-aur/venv-jax/lib/python3.12/site-packages/jax_plugins/xla_rocm60/xla_rocm_plugin.so: libamd_comgr.so.2: cannot open shared object file: No such file or directory
jax:    0.5.0
jaxlib: 0.5.0
numpy:  2.3.1
python: 3.12.11 (main, Jun 24 2025, 14:35:57) [GCC 15.1.1 20250425]
device info: cpu-1, 1 local devices"
process_count: 1
platform: uname_result(system='Linux', node='dosa', release='6.15.3-arch1-1', version='#1 SMP PREEMPT_DYNAMIC Thu, 19 Jun 2025 14:41:19 +0000', machine='x86_64')

This points to another issue with the JAX packages from PyPI (installed via pip install jax[rocm]): the prebuilt ROCm plugin links against libamd_comgr.so.2, but ROCm 6.4.1 ships libamd_comgr.so.3.
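To see which comgr soname the dynamic loader can actually resolve on a given system, Python's ctypes is enough. A minimal sketch; probe_sonames is a hypothetical helper, and the result depends on the loader's search path (ld.so cache, LD_LIBRARY_PATH), so "missing" does not by itself prove the file is absent from /opt/rocm.

```python
import ctypes


def probe_sonames(names: list[str]) -> dict[str, bool]:
    """Try to dlopen each shared-library name; report which ones the loader finds."""
    found = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            found[name] = True
        except OSError:
            found[name] = False
    return found


if __name__ == "__main__":
    # The prebuilt plugin asks for .so.2; ROCm 6.4.1 provides .so.3.
    for soname, ok in probe_sonames(["libamd_comgr.so.2", "libamd_comgr.so.3"]).items():
        print(f"{soname}: {'found' if ok else 'missing'}")
```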

Labels: AMD GPU (Issues pertaining to AMD GPUs (ROCM)), bug (Something isn't working)