Is it possible to use this with Llama 2? I'm mainly interested in improving inference speed, so the accuracy loss doesn't matter right now.