Hi,
I am trying to reproduce your Llama-3.1-8B results from the paper. I followed the exact steps in the README, using your Docker image and the vLLM engine, and this is what I see at a sequence length of 131072:
|       | vt    | avg (cwe, fwe)          | avg (qa_1, qa_2)      |
|-------|-------|-------------------------|-----------------------|
| paper | 70.4  | 36.2                    | 58.8                  |
| repro | 88.36 | avg (0.04, 53.4) = 26.72 | avg (71.4, 42.6) = 57 |
Could you perhaps share more detail on how your paper setup differs from the setup outlined in the README here?