JavaScript/TypeScript implementation of LLMLingua-2 (Experimental)
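The repository above ports LLMLingua-2 to JavaScript/TypeScript; its own API is not shown here. As a point of reference, here is a minimal sketch of how LLMLingua-2 is typically driven from Microsoft's Python `llmlingua` package, with the model name and parameters taken from that package's documented defaults:

```python
from llmlingua import PromptCompressor

# LLMLingua-2 uses a token-classification model to decide which tokens survive.
compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
    device_map="cpu",  # switch to "cuda" if a GPU is available
)

result = compressor.compress_prompt(
    "A long context document goes here ...",
    rate=0.33,                 # keep roughly a third of the tokens
    force_tokens=["\n", "?"],  # tokens that must survive compression
)
print(result["compressed_prompt"])
print(result["origin_tokens"], "->", result["compressed_tokens"])
```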
Python command-line tool for interacting with AI models through the OpenRouter API, Cloudflare AI Gateway, or a local self-hosted Ollama instance. Optionally supports Microsoft LLMLingua prompt token compression.
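A minimal sketch of the compress-then-send flow such a tool implements, assuming the Python `llmlingua` and `openai` packages; the model ID, compression rate, and environment variable name are illustrative, not this tool's actual flags or defaults:

```python
import os
from llmlingua import PromptCompressor
from openai import OpenAI

long_context = "...a long retrieved document..."  # placeholder input

# Compress first; model name follows the llmlingua package docs.
compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)
compressed = compressor.compress_prompt(long_context, rate=0.5)

# OpenRouter exposes an OpenAI-compatible endpoint, so the stock client
# works once base_url is pointed at it.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],  # env var name is an assumption
)
reply = client.chat.completions.create(
    model="openai/gpt-4o-mini",  # example OpenRouter model ID, not this tool's default
    messages=[{"role": "user", "content": compressed["compressed_prompt"]}],
)
print(reply.choices[0].message.content)
```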
This repository is the official implementation of "Generative Context Distillation."
Compress LLM prompts in Python and save 80%+ on GPT-4 costs.
A fast, Unix-style CLI tool for semantic prompt compression. Cuts LLM prompt token counts by 10-20x at >90% fidelity, reducing both cost and latency.
This repository contains the code and data of the paper titled "FrugalPrompt: Reducing Contextual Overhead in Large Language Models via Token Attribution."
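FrugalPrompt's exact attribution method is not reproduced here; the sketch below only illustrates the broader family it belongs to: scoring tokens by self-information under a small causal LM and dropping the most predictable (least informative) ones. The scoring model (GPT-2) and keep-ratio are arbitrary assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def prune_by_self_information(text: str, keep_ratio: float = 0.6) -> str:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits
    # Self-information of token t given its prefix: -log p(t | prefix).
    logp = torch.log_softmax(logits[0, :-1], dim=-1)
    surprisal = -logp.gather(1, ids[0, 1:, None]).squeeze(1)
    k = max(1, int(keep_ratio * surprisal.numel()))
    # Keep the highest-surprisal tokens; +1 because scores align to tokens 1..n-1.
    keep = torch.topk(surprisal, k).indices.sort().values + 1
    kept_ids = torch.cat([ids[0, :1], ids[0, keep]])  # always keep the first token
    return tok.decode(kept_ids)

print(prune_by_self_information(
    "The quick brown fox jumps over the lazy dog near the riverbank."))
```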
Enhance the performance and cost-efficiency of large-scale Retrieval Augmented Generation (RAG) applications. Learn to integrate vector search with traditional database operations and apply techniques like prefiltering, postfiltering, projection, and prompt compression.
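A database-agnostic sketch of three of the listed techniques (prefiltering, postfiltering, and projection) on an in-memory toy corpus; in production these map onto the vector store's native query operators, and all data and thresholds below are made up:

```python
import numpy as np

docs = [
    {"id": 1, "year": 2023, "text": "intro to RAG",       "emb": np.array([0.9, 0.1])},
    {"id": 2, "year": 2021, "text": "old RAG notes",      "emb": np.array([0.8, 0.2])},
    {"id": 3, "year": 2024, "text": "prompt compression", "emb": np.array([0.1, 0.9])},
]
query = np.array([1.0, 0.0])

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Prefiltering: restrict the candidate set by metadata BEFORE the vector search.
candidates = [d for d in docs if d["year"] >= 2023]

# Vector search over the filtered candidates.
scored = sorted(candidates, key=lambda d: cos(query, d["emb"]), reverse=True)

# Postfiltering: drop low-similarity hits AFTER the search.
hits = [d for d in scored if cos(query, d["emb"]) > 0.5]

# Projection: return only the fields the prompt actually needs,
# which is itself a simple form of prompt compression.
context = [{"id": d["id"], "text": d["text"]} for d in hits]
print(context)
```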
RL-Prompt-Compression employs graph-enhanced reinforcement learning to optimize prompt compression and improve model efficiency: a Phi-3 compressor is trained via GRPO, with a TinyLlama evaluator and a MiniLM cross-encoder providing the feedback signal.
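The GRPO training loop itself is out of scope here; the sketch below only illustrates the shape of the feedback signal such a setup could use: a MiniLM cross-encoder scoring how well the compressed prompt preserves the original, minus a length penalty. The checkpoint ID and reward weights are assumptions, not taken from the repository:

```python
from sentence_transformers import CrossEncoder

# Illustrative MiniLM cross-encoder; outputs an unnormalized relevance score
# per (original, compressed) pair. The repo's actual checkpoint may differ.
scorer = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def compression_reward(original: str, compressed: str,
                       alpha: float = 1.0, beta: float = 0.5) -> float:
    """Reward = alpha * fidelity - beta * retained-length fraction (weights assumed)."""
    fidelity = float(scorer.predict([(original, compressed)])[0])
    length_frac = len(compressed.split()) / max(1, len(original.split()))
    return alpha * fidelity - beta * length_frac

print(compression_reward(
    "Please summarize the following meeting transcript in three bullet points.",
    "Summarize meeting transcript, three bullets.",
))
```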