Hi! I was trying to run this repo on Arch Linux POWER on a G5 and ran into the following issue.
After converting the model and tokenizer with ullm_eswap, I get the following error:
```
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7cca038 in strlen () from /usr/lib/libc.so.6
(gdb) where
#0 0x00007ffff7cca038 in strlen () from /usr/lib/libc.so.6
#1 0x00007ffff7c6dc28 in ?? () from /usr/lib/libc.so.6
#2 0x00007ffff7c8a190 in ?? () from /usr/lib/libc.so.6
#3 0x00007ffff7c68530 in sprintf () from /usr/lib/libc.so.6
#4 0x00000001000070e4 in UllmLlama2Encode (config=<optimized out>, state=0x7ffffffff278, bos=1 '\001', eos=0 '\000', tokens=0x1000b50c0, n_tokens=<optimized out>) at ullm/llama2.c:614
#5 UllmLlama2Generate (config=0x7ffffffff388, state=0x7ffffffff278) at ullm/llama2.c:842
```
Investigating a bit further, I found that the cause is that the model's vocabulary isn't read correctly:
```
(gdb) n
460 t->vocab[i] = (char *)UllmMemoryAlloc(len + 1);
(gdb) p len
$37 = 5
(gdb) c
Continuing.
Breakpoint 5, UllmLlama2BuildTokenizer (config=0x7ffffffff1f8, state=0x7ffffffff238) at ullm/llama2.c:458
458 ULLM_GOTO_IF_ERROR(cleanup, status, UllmFileRead(&tokenizer_file,
(gdb) p len
$38 = 5
(gdb) n
460 t->vocab[i] = (char *)UllmMemoryAlloc(len + 1);
(gdb) p len
$39 = 83886080
(gdb)
```
At https://github.com/aarossig/ullm/blob/main/ullm/llama2.c#L456-L458, the first token length reads correctly as 5, but the next token length reads as 83886080, which is 5 with its bytes swapped (0x05000000 instead of 0x00000005). Could this be a bug in the endian-swapper that doesn't show up on Mac OS?
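For reference, here is the arithmetic behind that: 5 stored as a 32-bit integer is 0x00000005, and reading its four bytes in the opposite order gives 0x05000000 = 83886080. Below is a minimal C sketch that checks this, assuming the length field is a 32-bit integer (I haven't confirmed the exact type used in llama2.c). `swap_u32` and `looks_byte_swapped` are hypothetical helpers, not part of ullm, and `max_token_len` is an assumed upper bound on a token's length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 5 is 0x00000005; with its four bytes reversed it becomes 0x05000000. */
static_assert(0x05000000u == 83886080u, "83886080 is 5 with its bytes reversed");

/* Reverse the byte order of a 32-bit value: 0xAABBCCDD -> 0xDDCCBBAA. */
static uint32_t swap_u32(uint32_t v) {
  return ((v & 0x000000FFu) << 24) |
         ((v & 0x0000FF00u) << 8) |
         ((v & 0x00FF0000u) >> 8) |
         ((v & 0xFF000000u) >> 24);
}

/* Hypothetical diagnostic (not part of ullm): a token length that exceeds
 * max_token_len but becomes plausible once its bytes are reversed was most
 * likely written in the other byte order. */
static bool looks_byte_swapped(uint32_t len, uint32_t max_token_len) {
  return len > max_token_len && swap_u32(len) <= max_token_len;
}
```

For example, `swap_u32(5)` returns 83886080, and `looks_byte_swapped(83886080, 64)` returns true. The swap is written out by hand to stay portable; GCC and Clang also provide `__builtin_bswap32` for the same operation.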