πŸ› οΈ LLM Toolkit

A practical Python library for developers working with Large Language Models.

Stop reinventing the wheel. This toolkit provides battle-tested utilities for the common challenges every AI developer faces: token counting, cost tracking, retry logic, caching, rate limiting, and output validation.

Python 3.8+ · License: MIT

## 🎯 Why This Exists

Building with LLMs means dealing with the same problems over and over:

- "How many tokens is this prompt?"
- "How much is this API call going to cost?"
- "The API timed out—now what?"
- "I'm hitting rate limits constantly"
- "How do I reliably extract JSON from model outputs?"
- "I'm making the same expensive calls repeatedly"

This library solves all of these with clean, typed, well-tested code.

## 📦 Installation

```bash
pip install llm-toolkit
```

Or install from source:

```bash
git clone https://github.com/ThePagePage/llm-toolkit.git
cd llm-toolkit
pip install -e .
```

## 🚀 Quick Start

### Token Counting

```python
from llm_toolkit import count_tokens, estimate_cost

# Count tokens for any major model
tokens = count_tokens("Hello, how are you today?", model="gpt-4")
print(f"Token count: {tokens}")  # Token count: 7

# Works with messages too
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"}
]
tokens = count_tokens(messages, model="gpt-4")
```

### Cost Estimation

```python
from llm_toolkit import estimate_cost, CostTracker

# Quick estimate
cost = estimate_cost(
    input_tokens=1000,
    output_tokens=500,
    model="gpt-4-turbo"
)
print(f"Estimated cost: ${cost:.4f}")

# Track costs across your application
tracker = CostTracker()

# Log each API call
tracker.log(model="gpt-4-turbo", input_tokens=1500, output_tokens=800)
tracker.log(model="gpt-3.5-turbo", input_tokens=3000, output_tokens=1200)

# Get summaries
print(tracker.summary())
# {
#     'total_cost': 0.0845,
#     'total_input_tokens': 4500,
#     'total_output_tokens': 2000,
#     'by_model': {...}
# }
```

### Smart Retry with Backoff

```python
from llm_toolkit import retry_with_backoff, RetryConfig
import openai

# Simple retry with sensible defaults
@retry_with_backoff()
def call_api(prompt):
    return openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )

# Custom retry configuration
config = RetryConfig(
    max_retries=5,
    initial_delay=1.0,
    max_delay=60.0,
    exponential_base=2,
    retry_on=[openai.RateLimitError, openai.APITimeoutError]
)

@retry_with_backoff(config)
def robust_call(prompt):
    return openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
```
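For intuition, the delay schedule these settings produce can be sketched in a few lines of plain Python (a standalone illustration of exponential backoff, not the library's internal code):

```python
def backoff_delays(max_retries=5, initial_delay=1.0, max_delay=60.0, base=2):
    """Return the wait time (in seconds) before each retry attempt, capped at max_delay."""
    return [min(initial_delay * base ** attempt, max_delay)
            for attempt in range(max_retries)]

print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

The cap matters: with `max_retries=8` the last few delays all flatten out at 60 seconds instead of growing unboundedly.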

### Response Caching

```python
from llm_toolkit import ResponseCache

# In-memory cache (default)
cache = ResponseCache()

# File-based persistent cache
cache = ResponseCache(backend="file", path="./llm_cache")

# Redis cache for distributed systems
cache = ResponseCache(backend="redis", url="redis://localhost:6379")

# Use the cache
cache_key = cache.make_key(model="gpt-4", messages=messages, temperature=0)

if cached := cache.get(cache_key):
    response = cached
else:
    response = openai.chat.completions.create(...)
    cache.set(cache_key, response, ttl=3600)  # Cache for 1 hour
```
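Cache keys must be deterministic: the same model, messages, and parameters should always hash to the same key. One plausible sketch of how such a key can be built (the library's actual scheme may differ):

```python
import hashlib
import json

def make_cache_key(**params):
    """Serialize parameters with sorted keys so identical requests hash identically."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

k1 = make_cache_key(model="gpt-4", temperature=0,
                    messages=[{"role": "user", "content": "Hi"}])
k2 = make_cache_key(temperature=0, model="gpt-4",
                    messages=[{"role": "user", "content": "Hi"}])
assert k1 == k2  # argument order doesn't affect the key
```

Canonical serialization (sorted keys, fixed separators) is what makes the key stable across processes and machines.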

### Rate Limiting

```python
from llm_toolkit import RateLimiter, TokenBucket

# Simple rate limiter (10 requests per minute)
limiter = RateLimiter(requests_per_minute=10)

for prompt in prompts:
    limiter.wait()  # Blocks if necessary
    response = call_api(prompt)

# Token bucket for more control
bucket = TokenBucket(
    capacity=100,        # Max burst
    refill_rate=10,      # Tokens per second
    tokens_per_request=1
)

# Async support (inside an async function)
async with limiter:
    response = await async_call_api(prompt)
```
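The token-bucket algorithm itself is simple to sketch standalone: tokens refill at a fixed rate up to a capacity cap, and a request proceeds only if enough tokens are available. A minimal illustration (not the library's implementation, and without the blocking or async machinery):

```python
import time

class SimpleTokenBucket:
    """Minimal token bucket: refill_rate tokens/sec, burst up to capacity."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last = time.monotonic()

    def acquire(self, n=1):
        now = time.monotonic()
        # Refill proportionally to elapsed time, never exceeding capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False  # caller should wait and retry

bucket = SimpleTokenBucket(capacity=3, refill_rate=10)
print([bucket.acquire() for _ in range(4)])  # [True, True, True, False]
```

The burst of 3 succeeds immediately; the fourth call is throttled until the bucket refills.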

### Output Validation & Parsing

````python
from llm_toolkit import extract_json, validate_output, OutputSchema
from pydantic import BaseModel

# Extract JSON from messy LLM output
response_text = """
Sure! Here's the data you requested:
```json
{"name": "Alice", "age": 30, "city": "London"}
```
Let me know if you need anything else!
"""

data = extract_json(response_text)
# {'name': 'Alice', 'age': 30, 'city': 'London'}

# Validate against a Pydantic model
class Person(BaseModel):
    name: str
    age: int
    city: str

person = validate_output(response_text, Person)
# Person(name='Alice', age=30, city='London')

# Handle multiple JSON objects
text_with_multiple = 'First: {"a": 1} and second: {"b": 2}'
all_json = extract_json(text_with_multiple, multiple=True)
# [{'a': 1}, {'b': 2}]
````
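A common strategy behind this kind of extraction (a simplified sketch, not the library's actual code; inputs with nested objects need balanced-brace scanning) is to look for a fenced `json` code block first and fall back to a raw brace match:

```python
import json
import re

def extract_json_sketch(text):
    """Pull the first flat JSON object out of messy text: fenced block first, then raw braces."""
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    candidate = fenced.group(1) if fenced else None
    if candidate is None:
        raw = re.search(r"\{.*\}", text, re.DOTALL)
        candidate = raw.group(0) if raw else None
    return json.loads(candidate) if candidate else None

messy = 'Sure! ```json\n{"name": "Alice", "age": 30}\n``` Hope that helps!'
print(extract_json_sketch(messy))  # {'name': 'Alice', 'age': 30}
```

Trying the fenced form first matters: models often wrap JSON in prose, and the fence is the strongest signal of where the payload actually starts.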


## πŸ“Š Supported Models & Pricing

Token counting and cost estimation support:

| Provider | Models |
|----------|--------|
| OpenAI | GPT-4, GPT-4 Turbo, GPT-4o, GPT-3.5 Turbo, o1, o1-mini |
| Anthropic | Claude 3.5 Sonnet, Claude 3 Opus/Sonnet/Haiku |
| Google | Gemini 1.5 Pro/Flash, Gemini 1.0 Pro |
| Mistral | Mistral Large, Medium, Small, Mixtral |
| Meta | Llama 3.1 (405B, 70B, 8B) |
| Cohere | Command R+, Command R |

Pricing is updated regularly. You can also provide custom pricing:

```python
from llm_toolkit import estimate_cost, set_custom_pricing

set_custom_pricing("my-fine-tuned-model", {
    "input": 0.003,   # $ per 1K tokens
    "output": 0.006
})

cost = estimate_cost(1000, 500, model="my-fine-tuned-model")
```

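With per-1K-token rates like these, the arithmetic is just a pair of scaled multiplications. A quick standalone sketch (using the hypothetical custom rates above, not the library's built-in pricing table):

```python
def estimate_cost_sketch(input_tokens, output_tokens, pricing):
    """Cost in dollars given per-1K-token input/output rates."""
    return (input_tokens / 1000) * pricing["input"] \
         + (output_tokens / 1000) * pricing["output"]

pricing = {"input": 0.003, "output": 0.006}  # $ per 1K tokens
print(f"${estimate_cost_sketch(1000, 500, pricing):.4f}")  # $0.0060
```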
## 🔧 Configuration

```python
from llm_toolkit import configure

configure(
    default_model="gpt-4-turbo",
    cache_backend="redis",
    cache_url="redis://localhost:6379",
    rate_limit_rpm=60,
    retry_max_attempts=3
)
```

Environment variables are also supported:

```bash
LLM_TOOLKIT_DEFAULT_MODEL=gpt-4-turbo
LLM_TOOLKIT_CACHE_BACKEND=file
LLM_TOOLKIT_CACHE_PATH=./cache
```
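The usual pattern behind this kind of override is an environment lookup with a code-level fallback. A generic sketch (the variable-name prefix matches the examples above, but this is not the library's actual loader):

```python
import os

def load_setting(name, default):
    """Read LLM_TOOLKIT_<NAME> from the environment, falling back to a default."""
    return os.environ.get(f"LLM_TOOLKIT_{name.upper()}", default)

os.environ["LLM_TOOLKIT_DEFAULT_MODEL"] = "gpt-4-turbo"
print(load_setting("default_model", "gpt-3.5-turbo"))  # gpt-4-turbo
print(load_setting("cache_backend", "memory"))         # memory
```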

## 🧪 Testing

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=llm_toolkit
```

πŸ“ Project Structure

```
llm-toolkit/
├── llm_toolkit/
│   ├── __init__.py      # Public API exports
│   ├── tokens.py        # Token counting
│   ├── costs.py         # Cost estimation & tracking
│   ├── retry.py         # Retry with exponential backoff
│   ├── cache.py         # Response caching
│   ├── rate_limit.py    # Rate limiting utilities
│   ├── validation.py    # Output parsing & validation
│   └── models.py        # Model definitions & pricing
├── tests/
├── examples/
├── pyproject.toml
└── README.md
```

## 🤝 Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

### Ideas for Contributions

- Add support for more model providers
- Improve token counting accuracy
- Add more caching backends
- Better async support
- Documentation improvements

## 📄 License

MIT License - see LICENSE for details.

πŸ™ Acknowledgments

- `tiktoken` - OpenAI's token counting library
- `tenacity` - inspiration for the retry patterns
- The AI developer community for feedback and ideas

Built by developers, for developers. If this saves you time, consider giving it a ⭐!
