Error reading endianness-swapped model on PPC64 Linux

Hi! I was trying to run this repo on Arch Linux POWER on a G5 and ran into the following issue.
After converting the model and tokenizer with ullm_eswap, the program crashes with a segmentation fault:
```
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7cca038 in strlen () from /usr/lib/libc.so.6
(gdb) where
#0  0x00007ffff7cca038 in strlen () from /usr/lib/libc.so.6
#1  0x00007ffff7c6dc28 in ?? () from /usr/lib/libc.so.6
#2  0x00007ffff7c8a190 in ?? () from /usr/lib/libc.so.6
#3  0x00007ffff7c68530 in sprintf () from /usr/lib/libc.so.6
#4  0x00000001000070e4 in UllmLlama2Encode (config=<optimized out>, state=0x7ffffffff278, bos=1 '\001', eos=0 '\000', tokens=0x1000b50c0, n_tokens=<optimized out>) at ullm/llama2.c:614
#5  UllmLlama2Generate (config=0x7ffffffff388, state=0x7ffffffff278) at ullm/llama2.c:842
```

Investigating a bit further, the cause appears to be that the model's vocabulary isn't read correctly, as this gdb session shows:
```
(gdb) n
460	    t->vocab[i] = (char *)UllmMemoryAlloc(len + 1);
(gdb) p len
$37 = 5
(gdb) c
Continuing.

Breakpoint 5, UllmLlama2BuildTokenizer (config=0x7ffffffff1f8, state=0x7ffffffff238) at ullm/llama2.c:458
458	    ULLM_GOTO_IF_ERROR(cleanup, status, UllmFileRead(&tokenizer_file,
(gdb) p len
$38 = 5
(gdb) n
460	    t->vocab[i] = (char *)UllmMemoryAlloc(len + 1);
(gdb) p len
$39 = 83886080
(gdb) 
```
At https://github.com/aarossig/ullm/blob/main/ullm/llama2.c#L456-L458, the first token length reads correctly as 5, but the next token length reads as 83886080 (0x05000000), which is 5 with its bytes swapped. Could this be a bug in the endian-swapper that doesn't manifest on Mac OS?
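For anyone who wants to check the arithmetic, here is a minimal sketch of the byte swap that turns 5 into 83886080. The helper names `swap32` and `looks_byte_swapped`, and the idea of a maximum plausible token length, are assumptions made for illustration; none of them come from ullm.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value (portable, no compiler builtins). */
static uint32_t swap32(uint32_t v) {
  return ((v & 0x000000FFu) << 24) |
         ((v & 0x0000FF00u) << 8)  |
         ((v & 0x00FF0000u) >> 8)  |
         ((v & 0xFF000000u) >> 24);
}

/* Hypothetical diagnostic, not part of ullm: returns 1 when a length is
 * implausibly large as read but plausible once its bytes are swapped.
 * max_len is an assumed upper bound on a sane token length. */
static int looks_byte_swapped(uint32_t len, uint32_t max_len) {
  return len > max_len && swap32(len) <= max_len;
}
```

Here `swap32(5)` gives 0x05000000, i.e. 83886080, which matches the value gdb printed. A check like `looks_byte_swapped(len, 512)` before the allocation could flag the mismatch in the tokenizer file early, instead of crashing later in `strlen`.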
