Closed
Description
I have previously run bladebit CUDA with my Tesla P4, but after noticing a few other people reporting issues with the card, I tried again and was able to consistently reproduce the crash. For this first failure, I used the Ubuntu binary from https://github.com/Chia-Network/bladebit/actions/runs/4129720923/jobs/7135639600#step:3:5.
https://gist.github.com/altendky/3ad52845cbb71c106dbe276f3d95bba1
Completed table 1 in 29.27 seconds with 3429027681 / 4294803672 entries ( 79.84% ).
Compressing tables 2 and 3...
Step 1 completed step in 4.59 seconds.
CUDA error: 700 (0x2bc) cudaErrorIllegalAddress : an illegal memory access was encountered
*** Panic!!! *** Fatal Error:
CUDA error cudaErrorIllegalAddress : an illegal memory access was encountered.
./bladebit_cuda(+0xcf8cb)[0x564cf43288cb]
./bladebit_cuda(+0xcf0af)[0x564cf43280af]
./bladebit_cuda(+0x5217a)[0x564cf42ab17a]
./bladebit_cuda(+0x52443)[0x564cf42ab443]
./bladebit_cuda(+0x36e6d)[0x564cf428fe6d]
./bladebit_cuda(+0x2e7f0)[0x564cf42877f0]
./bladebit_cuda(+0x1c98b)[0x564cf427598b]
./bladebit_cuda(+0x18245)[0x564cf4271245]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f30b9f79083]
./bladebit_cuda(+0x1974e)[0x564cf427274e]
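Each `./bladebit_cuda(+0x...)[0x...]` frame above pairs an offset inside the binary with the runtime address. As a rough sketch of how those frames could be split apart for symbolization, the helper below parses one line; `Frame`, `parseFrame`, and `loadBase` are hypothetical names for illustration and are not part of bladebit.

```cpp
#include <cstdint>
#include <regex>
#include <string>

// One frame of the form "module(+0xOFFSET)[0xADDRESS]".
struct Frame {
    std::string module;     // e.g. "./bladebit_cuda"
    std::uint64_t offset;   // offset within the module
    std::uint64_t address;  // runtime address of the frame
};

// Returns true and fills `out` if `line` has the "module(+0x..)[0x..]" shape.
// Frames that name a symbol, such as "(__libc_start_main+0xf3)", are rejected.
bool parseFrame(const std::string& line, Frame& out) {
    static const std::regex pattern(
        R"((.+)\(\+0x([0-9a-fA-F]+)\)\[0x([0-9a-fA-F]+)\])");
    std::smatch m;
    if (!std::regex_match(line, m, pattern)) {
        return false;
    }
    out.module = m[1].str();
    out.offset = std::stoull(m[2].str(), nullptr, 16);
    out.address = std::stoull(m[3].str(), nullptr, 16);
    return true;
}

// Runtime address minus module offset gives the address the module was loaded at.
std::uint64_t loadBase(const Frame& f) {
    return f.address - f.offset;
}
```

For the first two frames above this gives the same load base (0x564cf4259000), so the offsets look self-consistent. With a build that still has symbols, an offset such as 0xcf8cb can typically be resolved with `addr2line -e ./bladebit_cuda 0xcf8cb`; whether the release binary retains enough symbol information for that is not something the log shows.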
After Harold requested debug info, I opened #271 to get debug builds. The debug build from https://github.com/Chia-Network/bladebit/actions/runs/4149269955 produced the following results:
https://gist.github.com/altendky/25ef339f5cfd28345dd641bdd9a1e4bb
Completed table 1 in 505.43 seconds with 3429368445 / 4294952657 entries ( 79.85% ).
Compressing tables 2 and 3...
Step 1 completed step in 40.28 seconds.
Assertion Failed @ /home/runner/work/bladebit/bladebit/cuda/GpuStreams.cpp:571 UploadArray().
fish: “./bladebit_cuda -f b0a374845f4f…” terminated by signal SIGTRAP (Trace or breakpoint trap)
cuda/GpuStreams.cpp, line 571 in 62af659:
ASSERT( self->outgoingSequence - self->lockSequence < 2 );
void GpuUploadBuffer::UploadArray( const void* hostBuffer, uint32 length, uint32 elementSize, uint32 srcStride,
                                   uint32 countStride, const uint32* counts, cudaStream_t workStream )
{
    ASSERT( hostBuffer );
    ASSERT( self->outgoingSequence - self->lockSequence < 2 );
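To make the failing condition concrete, here is a minimal sketch of the check on its own. It assumes both counters are unsigned 32-bit integers, which the excerpt does not show, and `uploadInvariantHolds` is a hypothetical name used only for illustration.

```cpp
#include <cstdint>

// Mirrors the asserted condition: outgoingSequence - lockSequence < 2.
// ASSUMPTION: both counters are unsigned 32-bit values; the excerpt above
// does not show how bladebit declares them.
bool uploadInvariantHolds(std::uint32_t outgoingSequence, std::uint32_t lockSequence) {
    // Unsigned subtraction wraps around, so this is false both when
    // outgoingSequence is two or more ahead of lockSequence and when
    // lockSequence is ahead of outgoingSequence.
    return static_cast<std::uint32_t>(outgoingSequence - lockSequence) < 2u;
}
```

Under that assumption the assertion can trip in two different ways: the outgoing counter running two or more ahead of the lock counter, or the lock counter getting ahead so that the subtraction wraps to a very large value. The log above does not say which of the two happened.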