MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Implement the method described in this [paper](https://arxiv.org/abs/2407.02490).

This is similar to `class KVCacheFastGen` in that it involves a profiling step.
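
To make the profiling step concrete, here is a rough NumPy sketch of what per-head pattern selection could look like. As I read the paper, each attention head is assigned offline to one of three sparse patterns (A-shape, Vertical-Slash, Block-Sparse). Everything below is an assumption for illustration: the function names (`profile_head`, `a_shape_mask`, etc.), the default budgets, and the selection criterion (recall of dense attention mass) are hypothetical. The paper's search is kernel-aware and compares patterns under a matched compute budget, which this sketch does not attempt, and none of this reflects how `KVCacheFastGen` is actually structured.

```python
import numpy as np


def causal_attention(q, k):
    """Dense causal softmax attention weights for one head; q and k are (n, d)."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(np.tril(np.ones((n, n), dtype=bool)), scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


def a_shape_mask(n, sink, window):
    """Keep the first `sink` key positions plus a local causal window."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & ((j < sink) | (i - j < window))


def vertical_slash_mask(attn, n_vertical, n_slash):
    """Keep the heaviest key columns (vertical) and diagonals (slash)."""
    n = attn.shape[0]
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    top_cols = np.argsort(attn.sum(axis=0))[-n_vertical:]
    # Diagonal with offset o holds the entries where i - j == o.
    diag_mass = np.array([np.trace(attn, offset=-o) for o in range(n)])
    top_offsets = np.argsort(diag_mass)[-n_slash:]
    return (np.isin(j, top_cols) | np.isin(i - j, top_offsets)) & (j <= i)


def block_sparse_mask(attn, block, n_blocks):
    """Keep the `n_blocks` heaviest (block x block) tiles; n must divide by block."""
    n = attn.shape[0]
    nb = n // block
    mass = attn.reshape(nb, block, nb, block).sum(axis=(1, 3))
    keep = np.zeros(nb * nb, dtype=bool)
    keep[np.argsort(mass.ravel())[-n_blocks:]] = True
    tiles = keep.reshape(nb, nb).repeat(block, axis=0).repeat(block, axis=1)
    return tiles & np.tril(np.ones((n, n), dtype=bool))


def profile_head(q, k, sink=4, window=16, n_vertical=8, n_slash=8,
                 block=8, n_blocks=12):
    """Return (best_pattern, recall_per_pattern) for one head.

    Assumption: "best" means the mask that retains the most dense attention
    mass. The budgets here are NOT compute-matched across patterns.
    """
    attn = causal_attention(q, k)
    n = attn.shape[0]
    masks = {
        "a_shape": a_shape_mask(n, sink, window),
        "vertical_slash": vertical_slash_mask(attn, n_vertical, n_slash),
        "block_sparse": block_sparse_mask(attn, block, n_blocks),
    }
    recall = {name: float((attn * m).sum() / attn.sum())
              for name, m in masks.items()}
    return max(recall, key=recall.get), recall


if __name__ == "__main__":
    # Random stand-ins for one profiling sample; real profiling would use
    # the model's actual per-head Q/K on a representative long prompt.
    rng = np.random.default_rng(0)
    for head in range(4):
        q = rng.standard_normal((64, 32))
        k = rng.standard_normal((64, 32))
        best, recall = profile_head(q, k)
        print(head, best, {name: round(r, 3) for name, r in recall.items()})
```

The result of such a pass would be a small per-layer, per-head table of pattern names that is computed once and reused at inference time, which is presumably where the resemblance to the existing profiling flow comes in.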