TurboQuant KV cache compression for MLX with fused Metal kernels. 4.6x compression at 98% FP16 speed.
Updated Mar 29, 2026 · Python
First open-source TurboQuant KV cache compression for LLM inference. Drop-in for HuggingFace. pip install turboquant.
Near-optimal vector quantization from Google's ICLR 2026 paper — 95% recall, 5x compression, zero preprocessing, pure Python FAISS replacement
Minimal, zero-dependency LLM inference in pure C11. CPU-first with NEON/AVX2 SIMD. Flash MoE (pread + LRU expert cache). TurboQuant 3-bit KV compression (8.9x less memory per session). 20+ GGUF quant formats. Compiles to WASM.
No BS theatrics. Real automated pentesting. macOS only.
TurboQuant: Native 3-Bit Quantization for Ollama - Achieve 25-28% better compression than Q4_0 while maintaining high-speed CPU inference. Experimentally integrated into Ollama with custom GGML kernels for LLM efficiency.
Hardware-agnostic machine learning infrastructure for .NET. Implements high-performance neural network layers in C# that are transpiled to run on WebGPU, CUDA, OpenCL, WebGL, CPU, and Wasm via SpawnDev.ILGPU. Optimized for Blazor WebAssembly and native GPU execution.
TurboQuant-style embedding compression for RAG: an SDK using fixed rotations, PolarQuant, and QJL residual sketches for compact storage and fast similarity search
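The QJL ingredient mentioned in the description above amounts to a Johnson-Lindenstrauss random projection whose output is quantized to one bit per coordinate (a SimHash-style sign sketch). A minimal numpy illustration of that idea — not the SDK's actual API, and all names here are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def sign_sketch(x, R):
    """1-bit random-projection sketch: keep only the signs of x @ R."""
    return (x @ R) > 0

def cosine_from_sketch(a_bits, b_bits):
    """Estimate cos(a, b) from the fraction of agreeing sign bits (SimHash identity)."""
    agree = np.mean(a_bits == b_bits)
    return np.cos(np.pi * (1.0 - agree))

d, m = 128, 1024                          # embedding dim, sketch size in bits
R = rng.standard_normal((d, m))           # shared random projection ("fixed rotation")

a = rng.standard_normal(d)
b = a + 0.3 * rng.standard_normal(d)      # a noisy near-duplicate of a

a_bits, b_bits = sign_sketch(a, R), sign_sketch(b, R)
true_cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
est_cos = cosine_from_sketch(a_bits, b_bits)
```

Each vector shrinks from 128 float32 values (512 bytes) to 1024 bits (128 bytes), a 4x reduction, while the estimated cosine stays close to the true one.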
KV Cache with PagedAttention vs PagedAttention + TurboQuant - experiments across token sizes comparing memory, latency, and accuracy.
TurboQuant (ICLR 2026) ported to Apple Silicon — KV cache compression with MLX Metal kernels + PyTorch CPU
Near-optimal vector quantization for LLM KV cache compression. Python implementation of TurboQuant (ICLR 2026) — PolarQuant + QJL for 3-bit quantization with minimal accuracy loss and up to 8x memory reduction.
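For readers unfamiliar with low-bit KV cache quantization, the core mechanism can be sketched in a few lines of numpy. This is a plain per-row symmetric 3-bit quantizer for illustration only; TurboQuant's actual method layers PolarQuant rotations and QJL residual sketches on top, and the function names below are invented for the example:

```python
import numpy as np

def quantize_3bit(x):
    """Symmetric per-row 3-bit quantization: round to integers in [-4, 3]."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 4.0
    scale = np.where(scale == 0, 1.0, scale)           # guard all-zero rows
    q = np.clip(np.round(x / scale), -4, 3).astype(np.int8)
    return q, scale

def dequantize_3bit(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
kv = rng.standard_normal((8, 128, 64)).astype(np.float32)  # heads x tokens x head_dim

q, scale = quantize_3bit(kv)
kv_hat = dequantize_3bit(q, scale)
mean_abs_err = float(np.abs(kv - kv_hat).mean())
```

At 3 bits plus one fp16 scale per 64-value row, storage is roughly 16 / (3 + 16/64) ≈ 4.9x smaller than fp16; the higher ratios the repos quote come from the additional rotation and residual-sketch machinery.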
AI agent skill implementing Google's TurboQuant compression algorithm (ICLR 2026) — 6x KV cache memory reduction, 8x speedup, zero accuracy loss. Compatible with Claude Code, Codex CLI, and all Agent Skills-compatible tools.
Interactive Benchmarking Tool for TurboQuant KV Cache Compression. Supports 2-4 bit quantization with Real-time Metrics
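A toy version of the 2-4 bit sweep such a benchmarking tool performs: uniform symmetric quantization at each bit width on synthetic data, measuring reconstruction MSE. This is a generic sketch under assumed uniform quantization, not the tool's own code:

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of x at a given bit width."""
    levels = 2 ** (bits - 1)                  # e.g. 3 bits -> integers in [-4, 3]
    scale = np.abs(x).max() / levels
    q = np.clip(np.round(x / scale), -levels, levels - 1)
    return q * scale

rng = np.random.default_rng(42)
kv = rng.standard_normal(4096).astype(np.float32)  # stand-in for cached K/V values

mse = {bits: float(np.mean((kv - quantize(kv, bits)) ** 2)) for bits in (2, 3, 4)}
```

Each extra bit halves the quantization step, so the measured MSE drops sharply from 2 bits to 4 bits — the memory/accuracy trade-off the benchmark visualizes.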
AI Code Review Memory - learns from your team's bug history and warns when similar patterns appear
Implementation of https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/
Turbo Index
Experimental TurboQuant implementation and llama.cpp-style integration path for long-context inference
ChatMind: Semantic search for Discord & KakaoTalk chat messages. Search by meaning, not keywords. Powered by TurboQuant compression (ICLR 2026).
CommitMind: Semantic search for Git commit history powered by TurboQuant vector compression (ICLR 2026). Search commits by meaning, not just keywords.