GitHub - ZJU-DIVER/CoKV

Repository Structure

├── adaptive_kv/                    # Core adaptive KV implementation
│   ├── assets/                     # Pre-computed scores and datasets
│   │   ├── datasets/              # Dataset files
│   │   └── head_scores/           # Pre-computed attention head scores
│   └── monkeypatch/               # Model-specific implementations
│       ├── adaptive_qwen3_hijack.py   # Qwen3 model adaptations
│       ├── monkeypatch.py             # Main monkey-patching logic
│       └── utils.py                   # Utility functions
├── experiments/                    # Experimental scripts and evaluations
│   ├── gsm8k/                     # GSM8K math reasoning experiments
│   ├── longbench/                 # LongBench evaluation scripts
│   ├── math/                      # Mathematical reasoning tasks
│   ├── memory_latency/            # Memory and latency benchmarks
│   └── needle/                    # Needle-in-haystack experiments
└── environment.yml                # Conda environment specification

code_llama_mistrial/CoKV-main/      # Code for Llama and Mistral models

Installation

Setup Environment

Create conda environment:

conda env create -f environment.yml
conda activate qwen3

Install additional dependencies (if needed):

pip install transformers torch numpy

Model Setup

Update the model paths in the configuration files and scripts:

Replace the /path/to/models/ placeholders in the experiment scripts with the path to your model directory.
Ensure you have access to the required models (Qwen3-32B, LLaMA, and Mistral variants).
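
Before launching a long run, it can help to confirm that the model directory you substituted for /path/to/models/ actually contains a checkpoint. The sketch below is not part of the repository; `missing_model_files` is a hypothetical helper, and it only assumes that a locally saved Hugging Face model directory contains a config.json file.

```python
from pathlib import Path


def missing_model_files(model_dir, required=("config.json",)):
    """Return the names in `required` that are absent from `model_dir`.

    Hypothetical helper for illustration; not part of the CoKV code base.
    """
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    # Placeholder path from the README; replace it with your own directory.
    missing = missing_model_files("/path/to/models/Qwen3-32B")
    if missing:
        print("Model directory is missing:", ", ".join(missing))
    else:
        print("Model directory looks usable.")
```

An empty list means every required file was found; extend `required` if your setup needs tokenizer or weight files as well.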

Usage

Running Experiments

LongBench Evaluation

cd experiments/longbench
python qwen3_inference.py \
    --model_name_or_path /path/to/models/Qwen3-32B \
    --max_length 32768 \
    --compress_args_path c128_w32_k7_maxpool.json \
    --out_name qwen3_cokv_results
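
To evaluate several cache sizes in one go, the command above can be generated once per configuration file. This is a minimal sketch, not a script shipped with the repository: `longbench_command` and the `qwen3_cokv_c<size>` output naming are hypothetical, while the flags and config file names are taken from this README. It only prints the commands; run it from experiments/longbench and swap the `print` for `subprocess.run(cmd, check=True)` to execute them.

```python
import shlex


def longbench_command(model_path, config_name, out_name, max_length=32768):
    """Build the argument list for one LongBench run (flags as in the README)."""
    return [
        "python", "qwen3_inference.py",
        "--model_name_or_path", model_path,
        "--max_length", str(max_length),
        "--compress_args_path", config_name,
        "--out_name", out_name,
    ]


# Cache sizes for which the README lists a configuration file.
CACHE_SIZES = (64, 128, 256, 512, 1024)

for size in CACHE_SIZES:
    cmd = longbench_command(
        "/path/to/models/Qwen3-32B",      # placeholder path, replace with yours
        f"c{size}_w32_k7_maxpool.json",
        f"qwen3_cokv_c{size}",            # hypothetical output name
    )
    print(shlex.join(cmd))
```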

GSM8K Math Reasoning

cd experiments/gsm8k
python inference.py \
    --model_name_or_path /path/to/models/Qwen3-32B \
    --mode sv \
    --compress_args_path c128_w32_k7_maxpool.json

Memory and Latency Analysis

cd experiments/memory_latency
python memory.py  # For memory usage analysis
python latency.py # For latency benchmarking

Configuration Files

Configuration files in experiments/longbench/config/ control cache behavior:

c64_w32_k7_maxpool.json: configuration with a cache size of 64
c128_w32_k7_maxpool.json: configuration with a cache size of 128
c256_w32_k7_maxpool.json: configuration with a cache size of 256
c512_w32_k7_maxpool.json: configuration with a cache size of 512
c1024_w32_k7_maxpool.json: configuration with a cache size of 1024
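
The file names follow a regular pattern, so they can be split into fields when you need to select or label runs programmatically. The sketch below is illustrative only: `parse_config_name` is a hypothetical helper, and while the `c` field is the cache size stated above, the meaning of the `w` and `k` fields (plausibly an observation window and a pooling kernel size) is an assumption that this README does not confirm, so they are kept under their literal letters.

```python
import re
from typing import NamedTuple


class CompressConfigName(NamedTuple):
    cache_size: int   # "c" field: cache size, as described in the README
    w: int            # "w" field: meaning not documented here (assumed window size)
    k: int            # "k" field: meaning not documented here (assumed kernel size)
    pooling: str      # trailing token, e.g. "maxpool"


_PATTERN = re.compile(r"^c(\d+)_w(\d+)_k(\d+)_([a-z]+)\.json$")


def parse_config_name(filename):
    """Split a name such as 'c128_w32_k7_maxpool.json' into its fields."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected config file name: {filename!r}")
    c, w, k, pooling = match.groups()
    return CompressConfigName(int(c), int(w), int(k), pooling)


if __name__ == "__main__":
    print(parse_config_name("c128_w32_k7_maxpool.json"))
```

Parsing the name does not replace reading the JSON file itself, which remains the authoritative source for the values the scripts use.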

