fix a bug when calculating `neuron_cap` before invoking the solver by KiritoHugh · Pull Request #231 · Tiiny-AI/PowerInfer

KiritoHugh · 2024-12-10T08:33:43Z

For example, with ReluLLaMA-7B on an NVIDIA GeForce RTX 2080 Ti (11264 MiB), ffn_up, ffn_gate, and ffn_down_t all have shape [4096, 11008].
A neuron should be a [4096, 1] slice, not a [1, 11008] slice.

Running `env CUDA_VISIBLE_DEVICES=0 ./build/bin/main -m ./ReluLLaMA-7B/llama-7b-relu.powerinfer.gguf -n 128 -t 8 -p "Once upon a time"` gives:

Before the fix:
slice_size=22016
vram_bytes_per_slice=99072
vram_allocatable_bytes=4212178944
neuron_cap=170064

After the fix:
slice_size=8192
vram_bytes_per_slice=24576
vram_allocatable_bytes=4212178944
neuron_cap=171394
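
The reported slice sizes can be sanity-checked with a little arithmetic. The sketch below is not PowerInfer's code. It assumes 2-byte (16-bit) weight elements, which is an inference from the reported numbers, and the helper name `slice_bytes` is hypothetical. It shows that the old `slice_size` matches a slice of 11008 elements and the new one a slice of 4096 elements, and that the new `neuron_cap` equals the allocatable VRAM divided by the new per-slice cost.

```python
# Sanity check of the numbers reported above (not PowerInfer's actual code).
HIDDEN_DIM = 4096       # first dimension of the FFN weight shape
FFN_DIM = 11008         # second dimension of the FFN weight shape
BYTES_PER_ELEMENT = 2   # assumption: 16-bit weights, inferred from the logs


def slice_bytes(elements_per_neuron: int,
                bytes_per_element: int = BYTES_PER_ELEMENT) -> int:
    """Bytes occupied by one neuron's slice of a single weight matrix."""
    return elements_per_neuron * bytes_per_element


# Old behaviour: a neuron was treated as an 11008-element slice.
slice_before = slice_bytes(FFN_DIM)
# New behaviour: a neuron is a 4096-element slice.
slice_after = slice_bytes(HIDDEN_DIM)

# Values copied verbatim from the "after" log.
vram_allocatable_bytes = 4212178944
vram_bytes_per_slice_after = 24576

# Integer division reproduces the reported neuron_cap for the fixed code.
neuron_cap_after = vram_allocatable_bytes // vram_bytes_per_slice_after

print(slice_before, slice_after, neuron_cap_after)
```

With these assumptions the script reproduces `slice_size=22016`, `slice_size=8192`, and `neuron_cap=171394` from the logs above.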

KiritoHugh wants to merge 1 commit into Tiiny-AI:main from KiritoHugh:main
