
PackKV

This is the official repository for the paper "PackKV: Reducing KV Cache Memory Footprint through LLM-Aware Lossy Compression" (IPDPS 2026). [Paper]

PackKV is a high-performance framework designed to reduce the memory footprint of the KV cache during large language model (LLM) inference. By combining custom CUDA kernels with LLM-aware lossy compression, PackKV reduces memory usage and improves inference throughput.

Installation

Prerequisites

  • Linux
  • NVIDIA GPU with CUDA support (CUDA 13.0 for the RTX Pro 6000 Blackwell Workstation Edition, CUDA 12.8 for 4×A100)
  • Anaconda or Miniconda or Miniforge

Step 1: Set up the Environment

Create a new Conda environment using the provided configuration file:

For the RTX Pro 6000 machine:

conda env create -f environment.yml
conda activate packkv_pub
pip install torch==2.10.0 --index-url https://download.pytorch.org/whl/cu130 # to support RTX Pro 6000
pip install -r requirements.txt
pip install flash-attn==2.8.1 --no-build-isolation # this may take quite a while, since flash-attn is compiled from source

For the 4×A100 machine:

conda env create -f environment.yml
conda activate packkv_pub
pip install torch==2.9.0 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
pip install "https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.1/flash_attn-2.8.1+cu12torch2.9cxx11abiTRUE-cp312-cp312-linux_x86_64.whl"

Step 2: Install CUDA Extensions

Compile and install the custom CUDA kernels required for PackKV:

cd packkv_cuda_ext
pip install -e . --no-build-isolation
cd ..

If compilation fails, try editing setup.py so that the nvcc -gencode flag matches your GPU's compute capability:

'nvcc': [
    '-O3',
    # '-gencode=arch=compute_70,code=compute_70',
    # '-gencode=arch=compute_80,code=sm_80', # this works for A100
    # '-gencode=arch=compute_89,code=sm_89',
    '-gencode=arch=compute_120,code=sm_120', # this works for RTX Pro 6000
]
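To pick the right flag for another GPU, query its compute capability (for example with torch.cuda.get_device_capability()) and build the matching architecture number. A minimal Python sketch (the gencode_flag helper is illustrative, not part of PackKV):

```python
# Illustrative helper (not part of PackKV): map a CUDA compute
# capability (major, minor) to the corresponding nvcc -gencode flag.
# Query the local capability with torch.cuda.get_device_capability().

def gencode_flag(major, minor):
    """Return the nvcc -gencode flag for a given compute capability."""
    arch = major * 10 + minor
    return f"-gencode=arch=compute_{arch},code=sm_{arch}"

# A100 has compute capability (8, 0); RTX Pro 6000 Blackwell has (12, 0).
print(gencode_flag(8, 0))   # -gencode=arch=compute_80,code=sm_80
print(gencode_flag(12, 0))  # -gencode=arch=compute_120,code=sm_120
```

The flag printed for your GPU is the one to keep uncommented in the nvcc list above.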

Step 3: Test Run

cd scripts
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=.. python ./rebuttal_throughout.py
cd ..

Project Structure

  • packkv_cuda_ext/: C++ and CUDA source code for the custom PackKV extension.
  • models/: Implementations of the LLMs used in the PackKV experiments.
  • scripts/: Scripts for automation, benchmarking, and generating experimental results.
  • evaluation/: Code related to model evaluation.
  • utils/: Helper functions and utilities.

License

This project is licensed under the MIT License - see the LICENSE file for details.
