Initializing and listing devices crashes #1320

@wsmoses

Description

Describe the bug

ubuntu@ip-172-31-40-172:~/Reactant.jl$ TF_CPP_MIN_VLOG_LEVEL=3 TF_CPP_MIN_LOG_LEVEL=0 TF_CPP_MAX_VLOG_LEVEL=3 julia --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.9 (2026-02-06)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using Libdl
       # Load the Python library globally

julia> Libdl.dlopen("/usr/lib/python3.10/config-3.10-x86_64-linux-gnu/libpython3.10.so", Libdl.RTLD_GLOBAL)
       # Initialize the Python interpreter
Ptr{Nothing} @0x000000003f80df30

julia> ccall(:Py_Initialize, Cvoid, ())

julia> println("Python initialized successfully")
Python initialized successfully

julia> using Reactant

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1777926420.394462    9050 ffi_registry.cc:128] Register XLA FFI handler for 'enzymexla_compile_gpu'; platform=CUDA (canonical=cuda), stages=[instantiate, prepare, initialize, execute], metadata={api_version: 0.3, traits: [], state: unknown}, registry=0x7974da8daf18
I0000 00:00:1777926420.394510    9050 ffi_registry.cc:128] Register XLA FFI handler for 'enzymexla_compile_gpu_with_error'; platform=CUDA (canonical=cuda), stages=[instantiate, prepare, initialize, execute], metadata={api_version: 0.3, traits: [], state: unknown}, registry=0x7974da8daf18
I0000 00:00:1777926420.394530    9050 ffi_registry.cc:128] Register XLA FFI handler for 'xla_throw_error'; platform=Host (canonical=host), stages=[execute], metadata={api_version: 0.3, traits: [], state: unknown}, registry=0x7974da8daf18
I0000 00:00:1777926420.394538    9050 ffi_registry.cc:128] Register XLA FFI handler for 'xla_always_throw_error'; platform=Host (canonical=host), stages=[execute], metadata={api_version: 0.3, traits: [], state: unknown}, registry=0x7974da8daf18
I0000 00:00:1777926420.394546    9050 ffi_registry.cc:128] Register XLA FFI handler for 'reactant_julia_callback'; platform=Host (canonical=host), stages=[execute], metadata={api_version: 0.3, traits: [], state: unknown}, registry=0x7974da8daf18

julia> Reactant.devices()
I0000 00:00:1777926420.647839    9050 parse_flags_from_env.cc:214] For env var XLA_FLAGS found arguments:
I0000 00:00:1777926420.647871    9050 parse_flags_from_env.cc:216]   argv[0] = <argv[0]>
I0000 00:00:1777926420.648973    9050 cpu_client.cc:330] PjRtCpuClient created.
2026-05-04 20:27:01.131647: I ./neuron/hlo_validator/hlo_validator_runner.h:220] Registering Verifier.....CollectivesComputeOrderVerifier

2026-05-04 20:27:01.131686: I ./neuron/hlo_validator/hlo_validator_runner.h:222] Registering Verifier.....HloSummarizer

2026-05-04 20:27:01.131694: I ./neuron/hlo_validator/hlo_validator_runner.h:223] Registering Verifier.....FsdpTpPassVerifier

2026-05-04 20:27:01.135261: I ./neuron/hlo_validator/hlo_validator_runner.h:220] Registering Verifier.....CollectivesComputeOrderVerifier

2026-05-04 20:27:01.135284: I ./neuron/hlo_validator/hlo_validator_runner.h:222] Registering Verifier.....HloSummarizer

2026-05-04 20:27:01.135296: I ./neuron/hlo_validator/hlo_validator_runner.h:223] Registering Verifier.....FsdpTpPassVerifier

I0000 00:00:1777926421.135374    9050 pjrt_api.cc:118] GetPjrtApi was found for trainium at /home/ubuntu/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/pjrt_plugin_trainium/libneuronpjrt.so
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1777926421.135585    9050 neuronpjrt.cc:1569] Check failed: hook == nullptr (0x41181080 vs. (null)) code=3: RunNullaryVoidImpl: error condition !(py_module != __null): 
*** Check failure stack trace: ***
    @     0x79745242cbb1  absl::lts_20230802::log_internal::LogMessage::SendToLog()
    @     0x79745242d089  absl::lts_20230802::log_internal::LogMessageFatal::~LogMessageFatal()
    @     0x79744a8eda88  neuron::GetPjrtApi()
    @     0x7974cdbdd0a7  pjrt::LoadPjrtPlugin()
    @     0x7974cb290f34  LoadPjrtPlugin
    @     0x7974cb2912dc  MakeClientUsingPluginAPI
    @     0x79751450d502  (unknown)
    @     0x79751450d85d  (unknown)
    @     0x79751450dab3  (unknown)
    @     0x79751450db93  (unknown)
    @     0x79751450242f  (unknown)
    @     0x797514503de5  (unknown)
    @     0x797514504daf  (unknown)
    @     0x797514504df1  (unknown)
    @     0x7974a756df16  julia_initialize_default_clientsNOT._46697
    @     0x797514501aa1  (unknown)
    @     0x797514501b40  (unknown)
    @     0x797520870335  do_call
    @     0x79752086fdfd  eval_value
    @     0x797520870f68  eval_body
    @     0x797520871b0e  jl_interpret_toplevel_thunk
    @     0x79752088e2ce  jl_toplevel_eval_flex
    @     0x79752088ec1a  jl_toplevel_eval_flex
    @     0x79752088fc86  ijl_toplevel_eval_in
    @     0x7974eaa90ba8  japi1_eval_user_input_10119.2
    @     0x7974ea34f051  julia_repl_backend_loop_10154.2
    @     0x7974ea992908  japi1_YY.start_repl_backendYY.59_10151.1
    @     0x7974eaa3b16f  japi1_start_repl_backend_10734.2
    @     0x7974ea6098f8  julia_YY.run_replYY.76_10235.1
    @     0x7974ea7f46c3  julia_run_repl_10226.1
    @     0x7974ea763c33  jfptr_run_repl_10227.1
    @     0x7974ea586da5  julia_YY.1152_14894.1
    @     0x7974ea9a9c88  jfptr_YY.1152_14895.1
    @     0x79752085f85a  jl_f__call_latest
    @     0x79750c9f5912  julia_run_main_repl_73611.2
    @     0x79750bff7b85  julia__start_73651.2
    @     0x79750ad8ef44  jfptr__start_73652.1
    @     0x7975208c5066  true_main
    @     0x7975208c5aff  jl_repl_entrypoint
    @           0x401089  main

[9050] signal 6 (-6): Aborted
in expression starting at REPL[6]:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_ZN4absl12lts_2023080212log_internal10LogMessage21FailWithoutStackTraceEv at /home/ubuntu/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/pjrt_plugin_trainium/libneuronpjrt.so (unknown line)
_ZN4absl12lts_2023080212log_internal10LogMessage3DieEv at /home/ubuntu/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/pjrt_plugin_trainium/libneuronpjrt.so (unknown line)
_ZN4absl12lts_2023080212log_internal10LogMessage9SendToLogEv at /home/ubuntu/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/pjrt_plugin_trainium/libneuronpjrt.so (unknown line)
_ZN4absl12lts_2023080212log_internal15LogMessageFatalD1Ev at /home/ubuntu/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/pjrt_plugin_trainium/libneuronpjrt.so (unknown line)
_ZN6neuron10GetPjrtApiEv.cold at /home/ubuntu/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/pjrt_plugin_trainium/libneuronpjrt.so (unknown line)
_ZN4pjrt14LoadPjrtPluginESt17basic_string_viewIcSt11char_traitsIcEES3_ at /home/ubuntu/.julia/artifacts/a50a9e76a1693cebca768009b78a09562f711ea9/lib/libReactantExtra.so (unknown line)
LoadPjrtPlugin at /home/ubuntu/.julia/artifacts/a50a9e76a1693cebca768009b78a09562f711ea9/lib/libReactantExtra.so (unknown line)
MakeClientUsingPluginAPI at /home/ubuntu/.julia/artifacts/a50a9e76a1693cebca768009b78a09562f711ea9/lib/libReactantExtra.so (unknown line)
MakeClientUsingPluginAPI at /home/ubuntu/Reactant.jl/src/mlir/libMLIR_h.jl:14299
MakeClientUsingPluginAPI at /home/ubuntu/Reactant.jl/src/xla/PJRT/Client.jl:101
#make_pjrt_client#1 at /home/ubuntu/Reactant.jl/src/accelerators/Trainium.jl:40 [inlined]
make_pjrt_client at /home/ubuntu/Reactant.jl/src/accelerators/Trainium.jl:26
unknown function (ip: 0x79751450db92)
#make_client#3 at /home/ubuntu/Reactant.jl/src/accelerators/Registration.jl:71
make_client at /home/ubuntu/Reactant.jl/src/accelerators/Registration.jl:60 [inlined]
#initialize_backends#4 at /home/ubuntu/Reactant.jl/src/accelerators/Registration.jl:107
initialize_backends at /home/ubuntu/Reactant.jl/src/accelerators/Registration.jl:79
unknown function (ip: 0x797514504df0)
initialize_default_clients! at /home/ubuntu/Reactant.jl/src/xla/XLA.jl:284
getproperty at /home/ubuntu/Reactant.jl/src/xla/XLA.jl:96 [inlined]
default_backend at /home/ubuntu/Reactant.jl/src/xla/XLA.jl:164 [inlined]
devices at /home/ubuntu/Reactant.jl/src/Devices.jl:9
unknown function (ip: 0x797514501b3f)
jl_apply at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
do_call at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_value at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/interpreter.c:666
jl_interpret_toplevel_thunk at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/interpreter.c:824
jl_toplevel_eval_flex at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_toplevel_eval_flex at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:430 [inlined]
eval_user_input at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:261
repl_backend_loop at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:368
#start_repl_backend#59 at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:343
start_repl_backend at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:340
#run_repl#76 at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:500
run_repl at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:486
jfptr_run_repl_10227.1 at /home/ubuntu/.julia/juliaup/julia-1.11.9+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_CBMnm.so (unknown line)
#1152 at ./client.jl:439
jfptr_YY.1152_14895.1 at /home/ubuntu/.julia/juliaup/julia-1.11.9+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_CBMnm.so (unknown line)
jl_apply at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
run_main_repl at ./client.jl:423
repl_main at ./client.jl:560 [inlined]
_start at ./client.jl:534
jfptr__start_73652.1 at /home/ubuntu/.julia/juliaup/julia-1.11.9+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
true_main at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/jlapi.c:1059
main at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/cli/loader_exe.c:58
unknown function (ip: 0x797521429d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 2223755 (Pool: 2223586; Big: 169); GC: 4
Aborted (core dumped)

cc @ptiede

Model Name

N/A

Describe the workload type

setup

Instance Type

trn1.2xlarge

Release version

ubuntu@ip-172-31-40-172:~/Reactant.jl$ apt list --installed | grep -i -e neuron
pip list | grep -i -e neuron -e torch -e transformers -e jax

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

aws-neuronx-collectives/unknown,now 2.31.24.0-1a31ba186 amd64 [installed]
aws-neuronx-dkms/unknown,now 2.27.4.0 all [installed]
aws-neuronx-runtime-lib/unknown,now 2.31.24.0-0b044f4ce amd64 [installed]
aws-neuronx-tools/unknown,now 2.29.22.0-b486b0ade amd64 [installed]
Command 'pip' not found, but can be installed with:
sudo apt install python3-pip

Reproduction Steps

Install Julia 1.11, clone Reactant.jl and check out EnzymeAD/Reactant.jl#2852, then run the REPL commands shown above under "Describe the bug".
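
For convenience, the REPL commands from the session under "Describe the bug" can be collected into one script. This is only a sketch: the file name `repro.jl` is hypothetical, the libpython path is copied from that session and may differ on other machines, and it assumes a trn1 instance with the Reactant.jl checkout as the active project.

```julia
# repro.jl (hypothetical file name): the REPL steps from the report in one script.
# Assumed invocation, using the same logging variables as the original session:
#   TF_CPP_MIN_VLOG_LEVEL=3 TF_CPP_MIN_LOG_LEVEL=0 TF_CPP_MAX_VLOG_LEVEL=3 julia --project repro.jl
using Libdl

# Path copied from the reported session; adjust it if libpython lives elsewhere.
const LIBPYTHON = "/usr/lib/python3.10/config-3.10-x86_64-linux-gnu/libpython3.10.so"

# Load the Python library with global symbol visibility.
Libdl.dlopen(LIBPYTHON, Libdl.RTLD_GLOBAL)

# Initialize the embedded Python interpreter.
ccall(:Py_Initialize, Cvoid, ())
println("Python initialized successfully")

using Reactant

# In the reported session this call aborts with the check failure
# raised from neuron::GetPjrtApi().
Reactant.devices()
```

If the script reproduces the issue, it should end with the same `Check failed: hook == nullptr` abort rather than printing a device list.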

Regression Issue

  • Select this option if this issue appears to be a regression.

Possible Solution

No response

Logs/Context/Additional Information

No response
