1. Integrate flash attention
   - CPU
   - CUDA: [flash attention inference](https://github.com/Bruce-Lee-LY/flash_attention_inference)
   - Metal
2. Add unit tests
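As a rough illustration of what a flash attention integration computes, here is a minimal NumPy sketch of the tiled, online-softmax forward pass, paired with a naive reference that a unit test could compare against. All function and parameter names here are illustrative and are not the project's actual API:

```python
import numpy as np

def naive_attention(Q, K, V):
    """Reference: softmax(Q K^T / sqrt(d)) V, materializing the full score matrix."""
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def flash_attention(Q, K, V, block=16):
    """FlashAttention-style forward pass: process K/V in tiles and keep a
    running row max and row sum so the full N x N matrix is never stored."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q, dtype=np.float64)   # unnormalized output accumulator
    m = np.full(n, -np.inf)                  # running row max of scores
    l = np.zeros(n)                          # running row sum of exp(scores - m)
    for j in range(0, K.shape[0], block):
        Kj, Vj = K[j:j+block], V[j:j+block]
        S = Q @ Kj.T * scale                 # n x block score tile
        m_new = np.maximum(m, S.max(axis=-1))
        alpha = np.exp(m - m_new)            # rescale old accumulators to new max
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=-1)
        O = O * alpha[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]                    # normalize at the end
```

A unit test for item 2 could simply assert that the tiled result matches the naive reference within floating-point tolerance on random inputs.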