Performance comparison between mllm v1 and v2 with the Snapdragon NPU backend

As the title says: is there a performance comparison (token rate, CPU usage, memory, etc.) between mllm v1 and v2 (NPU) for LLM inference? I'm mainly interested in token rate; the high memory usage in v1 can be ignored.
I'd like to know whether CPU-NPU heterogeneous execution can achieve better performance than AOT (which runs almost entirely on the NPU).
Thanks a lot.