Add PagedAttention support, basic continuous batching support, and a KVCacheManager for KV cache management with scheduler-driven scheduling by jinTang-coder · Pull Request #24 · zjhellofss/KuiperLLama

jinTang-coder · 2026-02-03T07:42:08Z

Adds support for PagedAttention, a Scheduler, and a KVCacheManager.

BlockAllocator: manages allocation of the physical blocks that back the KV cache.
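For readers unfamiliar with block allocators, the idea can be sketched as a free list of block indices. This is a minimal illustration and not the code in this PR: the class layout, the method names `allocate`, `release`, and `num_free`, and the `-1` exhaustion sentinel are all assumptions.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: hands out fixed-size physical KV cache blocks by index.
class BlockAllocator {
 public:
  explicit BlockAllocator(int32_t num_blocks) {
    free_list_.reserve(num_blocks);
    // Push in reverse so that block 0 is handed out first.
    for (int32_t i = num_blocks - 1; i >= 0; --i) free_list_.push_back(i);
  }

  // Returns a free block id, or -1 when the pool is exhausted.
  int32_t allocate() {
    if (free_list_.empty()) return -1;
    const int32_t id = free_list_.back();
    free_list_.pop_back();
    return id;
  }

  // Returns a block to the pool. The caller must not release a block twice.
  void release(int32_t block_id) { free_list_.push_back(block_id); }

  int32_t num_free() const { return static_cast<int32_t>(free_list_.size()); }

 private:
  std::vector<int32_t> free_list_;  // ids of blocks that are currently unused
};
```

Both operations are O(1), which matters because blocks are requested and returned on every scheduling step.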

KVCacheManager: manages the KV cache on top of BlockAllocator, supporting paged allocation and release of cache blocks.
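A rough sketch of paged allocation and release follows. It keeps a per-sequence block table and grows it only when a sequence crosses a block boundary. The method names (`append_tokens`, `free_sequence`, `block_table`) are hypothetical, and the free list is inlined here instead of delegating to BlockAllocator so that the example stays self-contained.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: maps each sequence to the physical blocks holding its KV entries.
class KVCacheManager {
 public:
  KVCacheManager(int32_t num_blocks, int32_t block_size) : block_size_(block_size) {
    for (int32_t i = num_blocks - 1; i >= 0; --i) free_blocks_.push_back(i);
  }

  // Reserves room for `num_new_tokens` more tokens of `seq_id`.
  // Returns false without allocating anything if the pool cannot cover the request.
  bool append_tokens(int64_t seq_id, int32_t num_new_tokens) {
    Entry& e = table_[seq_id];
    const int32_t total = e.num_tokens + num_new_tokens;
    const int32_t needed = (total + block_size_ - 1) / block_size_;  // ceiling division
    const int32_t extra = needed - static_cast<int32_t>(e.blocks.size());
    if (extra > static_cast<int32_t>(free_blocks_.size())) return false;
    for (int32_t i = 0; i < extra; ++i) {
      e.blocks.push_back(free_blocks_.back());
      free_blocks_.pop_back();
    }
    e.num_tokens = total;
    return true;
  }

  // Releases all blocks of a finished or preempted sequence.
  void free_sequence(int64_t seq_id) {
    auto it = table_.find(seq_id);
    if (it == table_.end()) return;
    for (int32_t b : it->second.blocks) free_blocks_.push_back(b);
    table_.erase(it);
  }

  // Logical-to-physical block table in token order. The sequence must exist.
  const std::vector<int32_t>& block_table(int64_t seq_id) const {
    return table_.at(seq_id).blocks;
  }

  int32_t num_free_blocks() const { return static_cast<int32_t>(free_blocks_.size()); }

 private:
  struct Entry {
    int32_t num_tokens = 0;
    std::vector<int32_t> blocks;
  };
  int32_t block_size_;
  std::vector<int32_t> free_blocks_;
  std::unordered_map<int64_t, Entry> table_;
};
```

With a block size of 16, a 20-token prompt takes two blocks, and the next 12 tokens fit in the second block without any new allocation.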

BatchMetadata: manages the metadata of batched requests, covering both the Prefill and Decode phases.
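As an illustration of what such metadata might contain, the sketch below flattens per-sequence information into arrays that a kernel could index. The field names, the `SeqInput` helper, and the `-1` padding convention are assumptions made for this example; the actual BatchMetadata in this PR may be organized differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-sequence input to the batch builder.
struct SeqInput {
  int32_t cached_len;      // tokens already present in the KV cache
  int32_t num_new_tokens;  // prompt length in Prefill, 1 in Decode
  std::vector<int32_t> block_table;
};

// Hypothetical flattened view of one batch.
struct BatchMetadata {
  bool is_prefill = false;
  int32_t max_blocks_per_seq = 0;
  std::vector<int32_t> query_start;   // num_seqs + 1 offsets into the flattened token buffer
  std::vector<int32_t> context_lens;  // tokens each sequence attends over after this step
  std::vector<int32_t> block_tables;  // num_seqs x max_blocks_per_seq, padded with -1
};

BatchMetadata build_batch_metadata(const std::vector<SeqInput>& seqs, bool is_prefill) {
  BatchMetadata meta;
  meta.is_prefill = is_prefill;
  meta.query_start.push_back(0);
  for (const SeqInput& s : seqs) {
    meta.query_start.push_back(meta.query_start.back() + s.num_new_tokens);
    meta.context_lens.push_back(s.cached_len + s.num_new_tokens);
    meta.max_blocks_per_seq =
        std::max(meta.max_blocks_per_seq, static_cast<int32_t>(s.block_table.size()));
  }
  // Pad every row to the same width so the table can be indexed as a 2D array.
  meta.block_tables.assign(seqs.size() * meta.max_blocks_per_seq, -1);
  for (std::size_t i = 0; i < seqs.size(); ++i) {
    std::copy(seqs[i].block_table.begin(), seqs[i].block_table.end(),
              meta.block_tables.begin() + i * meta.max_blocks_per_seq);
  }
  return meta;
}
```

The same structure serves both phases: in Prefill each sequence contributes its whole prompt to the token buffer, while in Decode each contributes a single token.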

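Scheduler: implements the request scheduling mechanism, including:
- Sequence: sequence state management
- SchedulerConfig: scheduling configuration
- Scheduler: core scheduling logic (Prefill/Decode phase scheduling and the preemption/eviction policy)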
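One way the Prefill/Decode scheduling and preemption described above could fit together is sketched below. It is an assumption-laden illustration, not this PR's logic: block accounting is reduced to a counter, the eviction victim is always the most recently admitted sequence, and an evicted sequence is later recomputed from scratch. Every name other than Sequence, SchedulerConfig, and Scheduler is invented for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

struct SchedulerConfig {
  int32_t max_batch_size;  // upper bound on concurrently running sequences
  int32_t num_blocks;      // total KV cache blocks in the pool
  int32_t block_size;      // tokens per block
};

struct Sequence {
  int64_t id = 0;
  int32_t prompt_len = 0;
  int32_t num_generated = 0;  // decode slots reserved so far
  int32_t num_blocks = 0;     // blocks currently held
};

// Hypothetical result of one scheduling step: sequence ids per action.
struct StepPlan {
  std::vector<int64_t> prefill, decode, preempted;
};

class Scheduler {
 public:
  explicit Scheduler(const SchedulerConfig& cfg) : cfg_(cfg), free_blocks_(cfg.num_blocks) {}

  void add_request(int64_t id, int32_t prompt_len) {
    Sequence s;
    s.id = id;
    s.prompt_len = prompt_len;
    waiting_.push_back(s);
  }

  StepPlan schedule() {
    StepPlan plan;
    // Decode: each running sequence needs room for one more token.
    for (std::size_t i = 0; i < running_.size();) {
      Sequence& s = running_[i];
      const int32_t need = blocks_for(s.prompt_len + s.num_generated + 1) - s.num_blocks;
      // Evict the most recently admitted sequences until the new block fits.
      while (need > free_blocks_ && running_.size() > i + 1) preempt_last(plan);
      if (need > free_blocks_) {  // nothing left to evict except s itself
        preempt_last(plan);
        break;
      }
      free_blocks_ -= need;
      s.num_blocks += need;
      s.num_generated += 1;
      plan.decode.push_back(s.id);
      ++i;
    }
    // Prefill: admit waiting sequences only if this step evicted nobody.
    while (plan.preempted.empty() && !waiting_.empty() &&
           running_.size() < static_cast<std::size_t>(cfg_.max_batch_size)) {
      Sequence s = waiting_.front();
      const int32_t need = blocks_for(s.prompt_len + s.num_generated);
      if (need > free_blocks_) break;
      waiting_.pop_front();
      free_blocks_ -= need;
      s.num_blocks = need;
      running_.push_back(s);
      plan.prefill.push_back(s.id);
    }
    return plan;
  }

 private:
  int32_t blocks_for(int32_t tokens) const {
    return (tokens + cfg_.block_size - 1) / cfg_.block_size;
  }

  // Frees the victim's blocks and puts it at the front of the waiting queue.
  void preempt_last(StepPlan& plan) {
    Sequence victim = running_.back();
    running_.pop_back();
    free_blocks_ += victim.num_blocks;
    victim.num_blocks = 0;
    plan.preempted.push_back(victim.id);
    waiting_.push_front(victim);
  }

  SchedulerConfig cfg_;
  int32_t free_blocks_;
  std::deque<Sequence> waiting_;
  std::vector<Sequence> running_;
};
```

With three blocks of four tokens and two 4-token prompts, the first step prefills both sequences, the second step decodes the first and evicts the second for lack of blocks, and the third step re-admits the evicted sequence.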

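PagedAttention and related CUDA operators:
- PagedAttention kernel: attention computation over a paged KV cache
- GEMM kernel: optimized matrix multiplication
- RoPE kernel: rotary position embedding with batch support
- Updated the kernels_interface interface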
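To make the paged lookup concrete, here is a CPU reference for a single head and a single query token. It is only a sketch of the general technique, not the CUDA kernel in this PR. The cache layout `[num_blocks][block_size][head_dim]` and the function name `paged_attention_ref` are assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Single-head, single-query attention over a paged KV cache (CPU reference).
// Assumed layout of k_cache and v_cache: [num_blocks][block_size][head_dim].
std::vector<float> paged_attention_ref(const std::vector<float>& q,
                                       const std::vector<float>& k_cache,
                                       const std::vector<float>& v_cache,
                                       const std::vector<int32_t>& block_table,
                                       int32_t context_len, int32_t block_size) {
  const int32_t head_dim = static_cast<int32_t>(q.size());
  const float scale = 1.0f / std::sqrt(static_cast<float>(head_dim));

  // Translates logical token position t into an offset in the physical cache.
  auto offset_of = [&](int32_t t) {
    const int32_t block = block_table[t / block_size];
    return (static_cast<std::size_t>(block) * block_size + t % block_size) * head_dim;
  };

  // Scaled dot-product scores against every cached key.
  std::vector<float> scores(context_len);
  float max_score = -INFINITY;
  for (int32_t t = 0; t < context_len; ++t) {
    const std::size_t base = offset_of(t);
    float dot = 0.0f;
    for (int32_t d = 0; d < head_dim; ++d) dot += q[d] * k_cache[base + d];
    scores[t] = dot * scale;
    max_score = std::max(max_score, scores[t]);
  }

  // Numerically stable softmax.
  float denom = 0.0f;
  for (int32_t t = 0; t < context_len; ++t) {
    scores[t] = std::exp(scores[t] - max_score);
    denom += scores[t];
  }

  // Weighted sum of the cached values.
  std::vector<float> out(head_dim, 0.0f);
  for (int32_t t = 0; t < context_len; ++t) {
    const std::size_t base = offset_of(t);
    const float w = scores[t] / denom;
    for (int32_t d = 0; d < head_dim; ++d) out[d] += w * v_cache[base + d];
  }
  return out;
}
```

The only difference from attention over a contiguous cache is `offset_of`: consecutive tokens of one sequence may live in non-adjacent physical blocks, and slots beyond `context_len` are never read.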

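Tests:
- test_cu_gemm: unit test for the GEMM kernel
- test_cu_rope_batch: unit test for the batched RoPE kernel

Model and LLaMA3:
- Added the Request structure
- Added a paged inference interface to the Model base class
- model_paged.cpp implements the paged inference logic
- LLaMA3 adapted to PagedAttention and continuous batching

Base modules:
- base.h: added the related type definitions
- tensor: support for the new memory layout
- swiglu: operator optimizations and adjustments

Demos:
- mainpaged.cpp: single-batch paged inference demo
- main_continuous.cpp: continuous batching inference demo
- main_unified.cpp: unified inference interface demo
- Updated the demo CMakeLists.txt

Build configuration:
- CMakeLists.txt: build the new modules
- .vscode: IDE configuration files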

jinTang-coder added 10 commits February 3, 2026 15:26

feat: add BlockAllocator block allocator

8a5b040

Implements allocation management of the physical blocks for the KV cache

feat: add KVCacheManager KV cache manager

d06cdbc

Implements KV cache management on top of BlockAllocator, supporting paged allocation and release of cache blocks

feat: add BatchMetadata batch metadata

c6edb23

Manages the metadata of batched requests, covering the Prefill and Decode phases

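feat: add Scheduler

17a4b50

Implements the request scheduling mechanism, including:
- Sequence: sequence state management
- SchedulerConfig: scheduling configuration
- Scheduler: core scheduling logic (Prefill/Decode phase scheduling and the preemption/eviction policy)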

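feat: add PagedAttention and related CUDA operators

4105547

- PagedAttention kernel: attention computation over a paged KV cache
- GEMM kernel: optimized matrix multiplication
- RoPE kernel: rotary position embedding with batch support
- Updated the kernels_interface interface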

test: add tests for the GEMM and batched RoPE operators

0afac65

- test_cu_gemm: unit test for the GEMM kernel
- test_cu_rope_batch: unit test for the batched RoPE kernel

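feat: support PagedAttention in Model and LLaMA3

86b6298

- Added the Request structure
- Added a paged inference interface to the Model base class
- model_paged.cpp implements the paged inference logic
- LLaMA3 adapted to PagedAttention and continuous batching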

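refactor: adapt the base modules to PagedAttention

89034bd

- base.h: added the related type definitions
- tensor: support for the new memory layout
- swiglu: operator optimizations and adjustments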

feat: add PagedAttention inference demos

0e854d8

- mainpaged.cpp: single-batch paged inference demo
- main_continuous.cpp: continuous batching inference demo
- main_unified.cpp: unified inference interface demo
- Updated the demo CMakeLists.txt

chore: update the build configuration and VSCode settings

fefa60e

- CMakeLists.txt: build the new modules
- .vscode: IDE configuration files

jinTang-coder wants to merge 10 commits into zjhellofss:main from jinTang-coder:main
