Hi,
Wonderful work you've done.
I have a question about the speedup from structured pruning.
I pruned a llama2-7b to 0.3 sparsity and measured the forward time. Strangely, the pruned model takes more time to complete the forward pass than the dense one.
I then timed the attention forward and the feed-forward (FFN) separately; the timings are below.
The first is the dense model and the second is the 0.3-sparsity model. The pruned model takes more time in the FFN.
Don't you have this problem?
Then I tested on the following toy example.
My server is behaving so strangely!
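For reference, here is a minimal sketch of how a dense-vs-pruned timing comparison could be set up. It is not the script used above. The helper name `benchmark` and its parameters `warmup`, `iters`, and `sync` are hypothetical, and the GPU remark assumes a PyTorch/CUDA setup, which is not stated in this issue. The idea is to run a few warm-up iterations first and, on a GPU, to synchronize before reading the clock, since CUDA kernels are launched asynchronously and an unsynchronized timer can stop before the work has finished.

```python
import statistics
import time


def benchmark(fn, warmup=3, iters=10, sync=None):
    """Return the median wall-clock time of fn() in milliseconds.

    fn   -- zero-argument callable to time (e.g. one forward pass)
    sync -- optional zero-argument callable that blocks until pending
            work is done. Assumption: with PyTorch on a CUDA device
            this would be torch.cuda.synchronize.
    """
    # Warm-up runs are not timed; they absorb one-off costs such as
    # lazy initialization and cache population.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous work before stopping the clock
        samples.append((time.perf_counter() - start) * 1e3)

    # The median is less sensitive to occasional outliers than the mean.
    return statistics.median(samples)
```

With this, the dense and pruned FFN blocks would each be wrapped in a zero-argument callable and passed to `benchmark` with the same `warmup` and `iters`, so that both numbers are measured under the same conditions.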

