MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention #29

@griff4692

Description

Implement the paper named in the title (MInference 1.0).

This is similar to class KVCacheFastGen in that it involves a profiling step.
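
To make the profiling step concrete, here is a rough sketch of what a per-head pattern search could look like. It is not the paper's actual search and not this repo's API: every name here (`profile_head`, `a_shape_mask`, `vertical_slash_mask`, `toy_head`) is hypothetical. It assumes we already have one head's causal attention probabilities for a sample prompt as a list of rows, and it simply picks the candidate sparse mask that recovers the most attention mass. As I understand the paper, its search also covers a block-sparse pattern and is constrained by a compute budget; both are left out here to keep the sketch short.

```python
def a_shape_mask(n, sink, window):
    """Causal mask keeping the first `sink` keys plus a local window per query."""
    return [[k <= q and (k < sink or q - k < window) for k in range(n)]
            for q in range(n)]


def vertical_slash_mask(attn, n_vertical, n_slash):
    """Causal mask keeping the heaviest key columns and the heaviest diagonals."""
    n = len(attn)
    # Total attention mass received by each key column ("vertical" lines).
    col_mass = [sum(attn[q][k] for q in range(k, n)) for k in range(n)]
    # Total attention mass on each diagonal, indexed by offset q - k ("slash" lines).
    diag_mass = [sum(attn[q][q - d] for q in range(d, n)) for d in range(n)]
    cols = set(sorted(range(n), key=lambda k: -col_mass[k])[:n_vertical])
    diags = set(sorted(range(n), key=lambda d: -diag_mass[d])[:n_slash])
    return [[k <= q and (k in cols or (q - k) in diags) for k in range(n)]
            for q in range(n)]


def recall(attn, mask):
    """Fraction of total attention mass that falls inside the mask."""
    kept = sum(a for row, mrow in zip(attn, mask) for a, m in zip(row, mrow) if m)
    total = sum(sum(row) for row in attn)
    return kept / total


def profile_head(attn, sink=1, window=2, n_vertical=1, n_slash=2):
    """Score each candidate pattern on one head and return (best_name, scores)."""
    n = len(attn)
    candidates = {
        "a_shape": a_shape_mask(n, sink, window),
        "vertical_slash": vertical_slash_mask(attn, n_vertical, n_slash),
    }
    scores = {name: recall(attn, mask) for name, mask in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


def toy_head(n, hot_col):
    """Made-up attention: half the mass on the diagonal, half on one key column."""
    attn = [[0.0] * n for _ in range(n)]
    for q in range(n):
        attn[q][q] += 0.5
        # Before the hot column is visible (causality), put everything on the diagonal.
        attn[q][hot_col if q >= hot_col else q] += 0.5
    return attn


best, scores = profile_head(toy_head(6, hot_col=3))
print(best, scores)
```

The chosen pattern per head would be computed once offline and cached, which is the part that resembles a profiling step; how that result gets stored and consumed at pre-fill time is not specified in this issue.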
