Open
Labels
Description
Backend
VL (Velox)
Bug description
When I run, for example, Nexmark q4 with the RocksDB state backend, the job fails. The failure happens because the TaskManager goes down for a reason I have not identified. The TaskManager log simply stops mid-stream, which points to an unexpected crash of the process. However, stdout contains some information:
WARNING: Unknown module: org.apache.arrow.memory.core specified to --add-opens
Created work directory /tmp/velox4j-307006935592892106.
Found required libraries in container velox4j-lib/Linux/amd64.
Copying library velox4j-lib/Linux/amd64/libevent-2.0.so.5 to /tmp/velox4j-307006935592892106/lib/libevent-2.0.so.5...
Copying library velox4j-lib/Linux/amd64/libvelox.so to /tmp/velox4j-307006935592892106/lib/libvelox.so...
Copying library velox4j-lib/Linux/amd64/libcrypto.so.1.1 to /tmp/velox4j-307006935592892106/lib/libcrypto.so.1.1...
Copying library velox4j-lib/Linux/amd64/libfmt.so.9 to /tmp/velox4j-307006935592892106/lib/libfmt.so.9...
Copying library velox4j-lib/Linux/amd64/libaio.so.1 to /tmp/velox4j-307006935592892106/lib/libaio.so.1...
Copying library velox4j-lib/Linux/amd64/libzstd.so.1 to /tmp/velox4j-307006935592892106/lib/libzstd.so.1...
Copying library velox4j-lib/Linux/amd64/libgflags.so.2.2 to /tmp/velox4j-307006935592892106/lib/libgflags.so.2.2...
Copying library velox4j-lib/Linux/amd64/libfolly.so.0.58.0-dev to /tmp/velox4j-307006935592892106/lib/libfolly.so.0.58.0-dev...
Copying library velox4j-lib/Linux/amd64/libssl.so.1.1 to /tmp/velox4j-307006935592892106/lib/libssl.so.1.1...
Copying library velox4j-lib/Linux/amd64/libicuuc.so.66 to /tmp/velox4j-307006935592892106/lib/libicuuc.so.66...
Copying library velox4j-lib/Linux/amd64/libsodium.so.23 to /tmp/velox4j-307006935592892106/lib/libsodium.so.23...
Copying library velox4j-lib/Linux/amd64/libsasl2.so.2 to /tmp/velox4j-307006935592892106/lib/libsasl2.so.2...
Copying library velox4j-lib/Linux/amd64/liblzma.so.5 to /tmp/velox4j-307006935592892106/lib/liblzma.so.5...
Copying library velox4j-lib/Linux/amd64/libboost_program_options.so.1.81.0 to /tmp/velox4j-307006935592892106/lib/libboost_program_options.so.1.81.0...
Copying library velox4j-lib/Linux/amd64/libboost_filesystem.so.1.81.0 to /tmp/velox4j-307006935592892106/lib/libboost_filesystem.so.1.81.0...
Copying library velox4j-lib/Linux/amd64/libdouble-conversion.so.3 to /tmp/velox4j-307006935592892106/lib/libdouble-conversion.so.3...
Copying library velox4j-lib/Linux/amd64/libboost_context.so.1.81.0 to /tmp/velox4j-307006935592892106/lib/libboost_context.so.1.81.0...
Copying library velox4j-lib/Linux/amd64/librdkafka.so.1 to /tmp/velox4j-307006935592892106/lib/librdkafka.so.1...
Copying library velox4j-lib/Linux/amd64/libstdc++.so.6 to /tmp/velox4j-307006935592892106/lib/libstdc++.so.6...
Copying library velox4j-lib/Linux/amd64/libunwind.so.8 to /tmp/velox4j-307006935592892106/lib/libunwind.so.8...
Copying library velox4j-lib/Linux/amd64/libz.so.1 to /tmp/velox4j-307006935592892106/lib/libz.so.1...
Copying library velox4j-lib/Linux/amd64/liblz4.so.1 to /tmp/velox4j-307006935592892106/lib/liblz4.so.1...
Copying library velox4j-lib/Linux/amd64/libicudata.so.66 to /tmp/velox4j-307006935592892106/lib/libicudata.so.66...
Copying library velox4j-lib/Linux/amd64/libglog.so.0 to /tmp/velox4j-307006935592892106/lib/libglog.so.0...
Copying library velox4j-lib/Linux/amd64/libvelox4j.so to /tmp/velox4j-307006935592892106/lib/libvelox4j.so...
Copying library velox4j-lib/Linux/amd64/libbz2.so.1.0 to /tmp/velox4j-307006935592892106/lib/libbz2.so.1.0...
Copying library velox4j-lib/Linux/amd64/libsnappy.so.1 to /tmp/velox4j-307006935592892106/lib/libsnappy.so.1...
Copying library velox4j-lib/Linux/amd64/libcppkafka.so.0.4.1 to /tmp/velox4j-307006935592892106/lib/libcppkafka.so.0.4.1...
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0319 14:33:22.678424 2944230 JniLoader.cc:29] Initializing Velox4J...
I0319 14:33:22.679422 2944230 JniLoader.cc:43] Velox4J initialized.
All required libraries were successfully loaded.
I0319 14:33:22.850020 2944230 HiveConnector.cpp:56] Hive connector connector-hive created with maximum of 20000 cached file handles with expiration of 0ms.
E0319 14:33:41.576027 2945031 MemoryManager.cc:264] [Velox4J MemoryManager DTOR] Memory leak found on Velox memory pool: Memory Pool[Decoding Memory Pool LEAF root[root] parent[root] MALLOC track-usage thread-safe]<unlimited max capacity capacity 128.00MB used 512B available 1023.50KB reservation [used 512B, reserved 1.00MB, min 0B] counters [allocs 4, frees 0, reserves 0, releases 0, collisions 0])>. Please make sure your code released all opened resources already.
E0319 14:33:41.576107 2945031 MemoryManager.cc:264] [Velox4J MemoryManager DTOR] Memory leak found on Velox memory pool: Memory Pool[root AGGREGATE root[root] parent[null] MALLOC track-usage thread-safe]<unlimited max capacity capacity 128.00MB used 512B available 0B reservation [used 0B, reserved 1.00MB, min 0B] counters [allocs 0, frees 0, reserves 0, releases 0, collisions 0])>. Please make sure your code released all opened resources already.
E0319 14:33:41.576125 2945031 MemoryPool.cpp:461] [MEM] Memory leak (Used memory): Memory Pool[Decoding Memory Pool LEAF root[root] parent[root] MALLOC track-usage thread-safe]<unlimited max capacity capacity 128.00MB used 512B available 1023.50KB reservation [used 512B, reserved 1.00MB, min 0B] counters [allocs 4, frees 0, reserves 0, releases 0, collisions 0])>
E0319 14:33:41.576150 2945031 Exceptions.h:66] Line: ~/velox4j/src/main/cpp/main/velox4j/memory/MemoryManager.cc:92, Function:removePool, Expression: pool->reservedBytes() == 0 (1048576 vs. 0), Source: RUNTIME, ErrorCode: INVALID_STATE
W0319 14:33:41.576576 2945031 ExceptionTracer.cpp:187] Invalid trace stack for exception of type: facebook::velox::VeloxRuntimeError
terminate called after throwing an instance of 'facebook::velox::VeloxRuntimeError'
what(): Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: (1048576 vs. 0)
Retriable: False
Expression: pool->reservedBytes() == 0
Function: removePool
File: ~/velox4j/src/main/cpp/main/velox4j/memory/MemoryManager.cc
Line: 92
Stack trace:
# 0 _ZN8facebook5velox7process10StackTraceC1Ei
# 1 _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2 _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
# 3 _ZN7velox4j12_GLOBAL__N_120ListenableArbitrator10removePoolEPN8facebook5velox6memory10MemoryPoolE
# 4 _ZN8facebook5velox6memory13MemoryManager8dropPoolEPNS1_10MemoryPoolE
# 5 _ZN8facebook5velox6memory14MemoryPoolImplD2Ev
# 6 _ZN7velox4j13MemoryManager11tryDestructEv
# 7 _ZN7velox4j13MemoryManagerD2Ev
# 8 _ZNSt10_HashtableIjSt4pairIKjSt10shared_ptrIvEESaIS4_ENSt8__detail10_Select1stESt8equal_toIjESt4hashIjENS6_18_Mod_range_hashingENS6_20_Default_ranged_hashENS6_20_Prime_rehash_policyENS6_17_Hashtable_traitsILb0ELb0ELb1EEEE8_M_eraseEmPNS6_15_Hash_node_baseEPNS6_10_Hash_nodeIS4_Lb0EEE
# 9 _ZN7velox4j11ResourceMapISt10shared_ptrIvEE5eraseEj
# 10 _ZN7velox4j11ObjectStore15releaseInternalEj
# 11 _ZN7velox4j12_GLOBAL__N_116releaseCppObjectEP7JNIEnv_P8_jobjectl
# 12 0x00007f92d87cea0f
# 13 0x00007f92d87c9272
# 14 0x00007f92d87c9272
# 15 0x00007f92d87c9272
# 16 0x00007f92d87c92b7
# 17 0x00007f92d87c9272
# 18 0x00007f92d87c92b7
# 19 0x00007f92d87c9272
# 20 0x00007f92d87c92b7
# 21 0x00007f92d87c9272
# 22 0x00007f92d87c92b7
# 23 0x00007f92d87c9272
# 24 0x00007f92d87c9272
# 25 0x00007f92d87c9272
# 26 0x00007f92d87c92b7
# 27 0x00007f92d87c9272
# 28 0x00007f92d87c92b7
# 29 0x00007f92d87c9272
# 30 0x00007f92d87c9272
# 31 0x00007f92d87c9272
# 32 0x00007f92d87c92b7
# 33 0x00007f92d87bfcc8
# 34 _ZN9JavaCalls11call_helperEP9JavaValueRK12methodHandleP17JavaCallArgumentsP6Thread
# 35 _ZN9JavaCalls12call_virtualEP9JavaValue6HandleP5KlassP6SymbolS6_P6Thread
# 36 _ZL12thread_entryP10JavaThreadP6Thread
# 37 _ZN10JavaThread17thread_main_innerEv
# 38 _ZN6Thread8call_runEv
# 39 _ZL19thread_native_entryP6Thread
# 40 start_thread
# 41 clone
Gluten version
No response
Spark version
None
Spark configurations
No response
System information
No response
Relevant logs
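As a triage aid, the `what()` block printed in the stdout above can be read mechanically. The following is only a minimal sketch: the helper names `parse_what_block` and `parse_mismatch` are hypothetical, the text is an excerpt copied from the log, and the reading of `(1048576 vs. 0)` as "actual vs. expected" is an assumption based on the failed expression `pool->reservedBytes() == 0`.

```python
import re

# Excerpt copied from the what() block in the stdout above.
WHAT_BLOCK = """\
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: (1048576 vs. 0)
Retriable: False
Expression: pool->reservedBytes() == 0
Function: removePool
Line: 92
"""


def parse_what_block(text):
    """Split 'Key: value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def parse_mismatch(reason):
    """Extract the two integers from a reason like '(1048576 vs. 0)'.

    Assumption: the first number is the observed value and the second
    is the value the check expected.
    """
    match = re.fullmatch(r"\((\d+) vs\. (\d+)\)", reason)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


fields = parse_what_block(WHAT_BLOCK)
actual, expected = parse_mismatch(fields["Reason"])
print(
    f"{fields['Function']}: check '{fields['Expression']}' failed, "
    f"{actual} bytes ({actual // 1024} KiB) still reserved, expected {expected}"
)
```

Under that reading, the 1048576 bytes in the reason line match the `reserved 1.00MB` reported for the leaked "Decoding Memory Pool" in the `MemoryManager.cc:264` messages, so the abort appears to come from dropping a pool whose reservation was never released.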