RAMP: RL-guided Adaptive Mixed-Precision quantization for GGUF models. Data-free sensitivity analysis, evolutionary search, per-tensor type optimization. Produces hardware-optimized GGUF for consumer GPUs.
moe quantization sensitivity-analysis ramp mixed-precision llm llama-cpp qwen gguf qwen3 consumer-gpu imatrix ik-llama
Updated Apr 16, 2026 - Python