# llm-toolkit

A practical Python library for developers working with Large Language Models.
Stop reinventing the wheel. This toolkit provides battle-tested utilities for the common challenges every AI developer faces: token counting, cost tracking, retry logic, caching, rate limiting, and output validation.
Building with LLMs means dealing with the same problems over and over:
- "How many tokens is this prompt?"
- "How much is this API call going to cost?"
- "The API timed outβnow what?"
- "I'm hitting rate limits constantly"
- "How do I reliably extract JSON from model outputs?"
- "I'm making the same expensive calls repeatedly"
This library solves all of these with clean, typed, well-tested code.
## Installation

```bash
pip install llm-toolkit
```

Or install from source:

```bash
git clone https://github.com/ThePagePage/llm-toolkit.git
cd llm-toolkit
pip install -e .
```

## Quick Start

### Token Counting

```python
from llm_toolkit import count_tokens

# Count tokens for any major model
tokens = count_tokens("Hello, how are you today?", model="gpt-4")
print(f"Token count: {tokens}")  # Token count: 7

# Works with messages too
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"},
]
tokens = count_tokens(messages, model="gpt-4")
```

### Cost Estimation & Tracking

```python
from llm_toolkit import estimate_cost, CostTracker

# Quick estimate
cost = estimate_cost(
    input_tokens=1000,
    output_tokens=500,
    model="gpt-4-turbo",
)
print(f"Estimated cost: ${cost:.4f}")

# Track costs across your application
tracker = CostTracker()

# Log each API call
tracker.log(model="gpt-4-turbo", input_tokens=1500, output_tokens=800)
tracker.log(model="gpt-3.5-turbo", input_tokens=3000, output_tokens=1200)

# Get summaries
print(tracker.summary())
# {
#     'total_cost': 0.0845,
#     'total_input_tokens': 4500,
#     'total_output_tokens': 2000,
#     'by_model': {...}
# }
```

### Retries with Exponential Backoff

```python
import openai

from llm_toolkit import retry_with_backoff, RetryConfig

# Simple retry with sensible defaults
@retry_with_backoff()
def call_api(prompt):
    return openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )

# Custom retry configuration
config = RetryConfig(
    max_retries=5,
    initial_delay=1.0,
    max_delay=60.0,
    exponential_base=2,
    retry_on=[openai.RateLimitError, openai.APITimeoutError],
)

@retry_with_backoff(config)
def robust_call(prompt):
    return openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
```

### Response Caching

```python
from llm_toolkit import ResponseCache

# In-memory cache (default)
cache = ResponseCache()

# File-based persistent cache
cache = ResponseCache(backend="file", path="./llm_cache")

# Redis cache for distributed systems
cache = ResponseCache(backend="redis", url="redis://localhost:6379")

# Use the cache
cache_key = cache.make_key(model="gpt-4", messages=messages, temperature=0)
if cached := cache.get(cache_key):
    response = cached
else:
    response = openai.chat.completions.create(...)
    cache.set(cache_key, response, ttl=3600)  # Cache for 1 hour
```

### Rate Limiting

```python
from llm_toolkit import RateLimiter, TokenBucket

# Simple rate limiter (10 requests per minute)
limiter = RateLimiter(requests_per_minute=10)

for prompt in prompts:
    limiter.wait()  # Blocks if necessary
    response = call_api(prompt)

# Token bucket for more control
bucket = TokenBucket(
    capacity=100,         # Max burst
    refill_rate=10,       # Tokens per second
    tokens_per_request=1,
)

# Async support (inside an async function)
async with limiter:
    response = await async_call_api(prompt)
```

### Output Parsing & Validation

```python
from pydantic import BaseModel

from llm_toolkit import extract_json, validate_output

# Extract JSON from messy LLM output
response_text = (
    "Sure! Here's the data you requested:\n"
    "```json\n"
    '{"name": "Alice", "age": 30, "city": "London"}\n'
    "```\n"
    "Let me know if you need anything else!"
)
data = extract_json(response_text)

# Validate against a Pydantic model
class Person(BaseModel):
    name: str
    age: int
    city: str

person = validate_output(response_text, Person)

# Extract multiple JSON objects from one string
text_with_multiple = 'First: {"a": 1} and second: {"b": 2}'
all_json = extract_json(text_with_multiple, multiple=True)
```
## Supported Models & Pricing
Token counting and cost estimation supports:
| Provider | Models |
|----------|--------|
| OpenAI | GPT-4, GPT-4 Turbo, GPT-4o, GPT-3.5 Turbo, o1, o1-mini |
| Anthropic | Claude 3.5 Sonnet, Claude 3 Opus/Sonnet/Haiku |
| Google | Gemini 1.5 Pro/Flash, Gemini 1.0 Pro |
| Mistral | Mistral Large, Medium, Small, Mixtral |
| Meta | Llama 3.1 (405B, 70B, 8B) |
| Cohere | Command R+, Command R |
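Per-1K-token pricing works the same way across all of these providers. As a quick illustration of the arithmetic (the rates and the `cost_usd` helper below are made up for the example, not taken from this library's pricing table):

```python
def cost_usd(input_tokens: int, output_tokens: int,
             input_per_1k: float, output_per_1k: float) -> float:
    """Compute API cost from token counts and per-1K-token dollar rates."""
    return (input_tokens / 1000) * input_per_1k + (output_tokens / 1000) * output_per_1k


# 1,000 input tokens at $0.003/1K plus 500 output tokens at $0.006/1K
print(round(cost_usd(1000, 500, 0.003, 0.006), 6))  # 0.006
```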
Pricing is updated regularly. You can also provide custom pricing:
```python
from llm_toolkit import estimate_cost, set_custom_pricing

set_custom_pricing("my-fine-tuned-model", {
    "input": 0.003,   # $ per 1K input tokens
    "output": 0.006,  # $ per 1K output tokens
})

cost = estimate_cost(1000, 500, model="my-fine-tuned-model")
```

## Configuration

```python
from llm_toolkit import configure

configure(
    default_model="gpt-4-turbo",
    cache_backend="redis",
    cache_url="redis://localhost:6379",
    rate_limit_rpm=60,
    retry_max_attempts=3,
)
```

Environment variables are also supported:

```bash
LLM_TOOLKIT_DEFAULT_MODEL=gpt-4-turbo
LLM_TOOLKIT_CACHE_BACKEND=file
LLM_TOOLKIT_CACHE_PATH=./cache
```

## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=llm_toolkit
```

## Project Structure

```
llm-toolkit/
├── llm_toolkit/
│   ├── __init__.py    # Public API exports
│   ├── tokens.py      # Token counting
│   ├── costs.py       # Cost estimation & tracking
│   ├── retry.py       # Retry with exponential backoff
│   ├── cache.py       # Response caching
│   ├── rate_limit.py  # Rate limiting utilities
│   ├── validation.py  # Output parsing & validation
│   └── models.py      # Model definitions & pricing
├── tests/
├── examples/
├── pyproject.toml
└── README.md
```
## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

Ideas for contributions:

- Add support for more model providers
- Improve token counting accuracy
- Add more caching backends
- Better async support
- Documentation improvements
## License

MIT License - see LICENSE for details.

## Acknowledgments

- tiktoken - OpenAI's token counting library
- tenacity - Inspiration for retry patterns
- The AI developer community for feedback and ideas

Built by developers, for developers. If this saves you time, consider giving it a ⭐!