As the title says: is there a performance comparison (token rate, CPU, memory, etc.) between mllm v1 and v2 (NPU) for LLM inference? I'm mainly interested in token rate; the high memory usage in v1 can be ignored.
I'd like to know whether CPU-NPU heterogeneous execution can achieve better performance than AOT (almost NPU-only).
Thanks a lot.