A local reranker service with a Jina-compatible API.
This project provides a FastAPI-based web service that implements a reranking API endpoint (/v1/rerank) compatible with the Jina AI Rerank API. It allows you to host a reranking model entirely on your own infrastructure for enhanced privacy and performance.
- Jina-Compatible API: Implements the `/v1/rerank` endpoint structure
- Local Hosting: Run the reranker model entirely on your own infrastructure
- Multiple Backends: Supports both PyTorch and MLX backends for optimal performance
- Apple Silicon Optimization: MLX backend provides optimized performance for M1/M2/M3 chips
- MLX Fallback Reranker: Automatically wraps MLX-converted Hugging Face models that do not ship a `rerank.py` helper
- Sentence Transformers: Uses the powerful `sentence-transformers` library for the PyTorch backend
- Configurable Model: Easily switch between different reranker models and backends
- Modern FastAPI: Built using modern FastAPI features like `lifespan` for resource management
- Modern Dependencies: Updated to latest stable versions with sensible minimum requirements
- Python 3.12+
- uv (recommended, for installation and package management)
- Sufficient RAM and compute resources (CPU or GPU) depending on the chosen reranker model
PyTorch Backend:
- PyTorch 2.5+ (automatically installed)
- CUDA/MPS support for GPU acceleration (optional)
MLX Backend (Apple Silicon only):
- Apple Silicon (M1/M2/M3) Mac
- MLX and MLX-LM libraries (automatically installed)
- Optimized for memory efficiency and performance on Apple chips
```shell
# Clone the repository
git clone https://github.com/olafgeibig/local-reranker.git
cd local-reranker

# Create virtual environment
uv venv
source .venv/bin/activate

# Install dependencies
uv pip install -e ".[dev]"
```

The CLI supports both modern subcommands and legacy arguments for backward compatibility.
```shell
# Start server with subcommand
cli serve --backend <backend_type> [options]

# Show configuration
cli config show

# Old-style arguments still work
cli --backend <backend_type> --model <model> --host <host> --port <port>
```

Backends:

- `pytorch`: PyTorch-based reranker (default, cross-platform)
- `mlx`: MLX-based reranker (Apple Silicon optimized)

Options:

- `--backend`: Backend type to use (default: `pytorch`)
- `--model`: Model name to use (overrides the reranker default)
- `--host`: Host to bind the server to (default: `0.0.0.0`)
- `--port`: Port to bind the server to (default: `8010`)
- `--log-level`: Uvicorn log level (`debug`, `info`, `warning`, `error`, `critical`; default: `info`)
- `--reload`: Enable auto-reload for development
PyTorch Backend (default):

```shell
cli serve --backend pytorch --model jinaai/jina-reranker-v2-base-multilingual
```

MLX Backend (Apple Silicon):

```shell
cli serve --backend mlx --model jinaai/jina-reranker-v3-mlx
```

Development Mode:

```shell
cli serve --backend pytorch --reload --log-level debug
```

Configuration Management:

```shell
cli config show
```

Once the server is running, you can send requests to the `/v1/rerank` endpoint. Here's an example using curl:
```shell
curl -X POST "http://localhost:8010/v1/rerank" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jina-reranker-v2-base-multilingual",
    "query": "What are the benefits of using FastAPI?",
    "documents": [
      "FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.7+ based on standard Python type hints.",
      "Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design.",
      "The key features are: Fast, Fast to code, Fewer dependencies, Intuitive, Easy, Short, Robust, Standards-based.",
      "Flask is a micro web framework written in Python."
    ],
    "top_n": 3,
    "return_documents": true
  }'
```

Request fields:

- `model`: The name of the reranker model (currently ignored by the API; the configured default is used)
- `query`: The search query string
- `documents`: A list of strings or dictionaries (`{"text": "..."}`) to be reranked against the query
- `top_n`: (Optional) The maximum number of results to return
- `return_documents`: (Optional, default `false`) Whether to include the document text in results
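The same request can be made from Python with only the standard library. The helper names below are illustrative (not part of the project); the URL and port match the server defaults above:

```python
import json
import urllib.request

def build_payload(query, documents, model="jina-reranker-v2-base-multilingual",
                  top_n=None, return_documents=False):
    """Assemble the JSON body for a /v1/rerank request."""
    payload = {"model": model, "query": query, "documents": documents,
               "return_documents": return_documents}
    if top_n is not None:
        payload["top_n"] = top_n
    return payload

def rerank_request(query, documents, url="http://localhost:8010/v1/rerank",
                   **kwargs):
    """POST the payload and return the parsed JSON response."""
    body = json.dumps(build_payload(query, documents, **kwargs)).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For production use you would likely swap `urllib` for an HTTP client with timeouts and retries, but the payload shape stays the same.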
1. Clone the repository:

   ```shell
   git clone https://github.com/olafgeibig/local-reranker.git
   cd local-reranker
   ```

2. Create a virtual environment:

   ```shell
   uv venv
   source .venv/bin/activate
   ```

3. Install development dependencies:

   ```shell
   uv pip install -e ".[dev]"
   ```

4. Verify installation:

   ```shell
   # Test CLI works
   cli config show
   # Test server starts
   cli serve --backend pytorch --help
   ```
Tests are implemented using pytest. To run tests:
```shell
# Ensure virtual environment is active
python -m pytest

# Or using uv run
uv run pytest

# Run specific test categories
uv run pytest -m "not integration"  # Skip integration tests
uv run pytest -m "integration"      # Only integration tests
uv run pytest -m "slow"             # Only slow tests
```

The project uses modern development tools:
```shell
# Run linting
uv run ruff check

# Run type checking
uv run mypy src/

# Run both
uv run ruff check && uv run mypy src/
```

MLX not found:
```shell
# Ensure you're on Apple Silicon
uname -m  # Should show arm64

# Install MLX dependencies
uv add mlx mlx-lm safetensors
```

Model download fails:

```shell
# Check internet connection
# Try manual download
huggingface-cli download jinaai/jina-reranker-v3-mlx
```

Performance issues:

```shell
# Check MLX is using GPU (if available)
python -c "import mlx; print(mlx.metal.is_available())"

# Monitor memory usage
top -o mem | grep python
```

The MLX backend includes an internal cross-encoder reranker that automatically loads any MLX-converted Hugging Face repository, even when the repo does not provide a `rerank.py` helper. When `rerank.py` is missing or cannot be imported, the server logs a message similar to "Using internal cross-encoder fallback" so you can confirm which path is active. If your model ships a `projector.safetensors` file, place it next to the weights; the fallback reranker will load it to project hidden states into the correct embedding space. When the projector file is absent, the reranker falls back to raw hidden states, so you can still experiment with newly converted models without extra work.
If you're having trouble with MLX backend configuration, try these explicit CLI commands:

```shell
# Force MLX backend with explicit model
cli serve --backend mlx --model jinaai/jina-reranker-v3-mlx

# Check current configuration
cli config show

# Use development mode for debugging
cli serve --backend mlx --reload --log-level debug
```

The application uses pydantic-settings for configuration management. You can set the following environment variables to override the defaults:
```shell
# Force MLX backend
export RERANKER_RERANKER_TYPE=mlx

# Custom model name
export RERANKER_MODEL_NAME=custom-mlx-model

# Custom host and port
export RERANKER_HOST=0.0.0.0
export RERANKER_PORT=8080

# Enable debug logging
export RERANKER_LOG_LEVEL=debug

# Enable auto-reload
export RERANKER_RELOAD=true
```

Note: Using the CLI command-line options is recommended over environment variables for clarity.
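The `RERANKER_` prefix convention can be illustrated with a small stdlib-only parser. The real service uses pydantic-settings for this; the sketch below only demonstrates the prefix-and-coerce mapping, with defaults matching those documented above:

```python
import os

# Defaults mirroring the documented CLI defaults.
DEFAULTS = {
    "reranker_type": "pytorch",
    "model_name": None,
    "host": "0.0.0.0",
    "port": 8010,
    "log_level": "info",
    "reload": False,
}

def load_settings(environ=os.environ) -> dict:
    """Overlay RERANKER_-prefixed environment variables on the defaults,
    coercing each value to the type of its default."""
    settings = dict(DEFAULTS)
    for key, default in DEFAULTS.items():
        raw = environ.get(f"RERANKER_{key.upper()}")
        if raw is None:
            continue
        if isinstance(default, bool):  # check bool before int (bool is an int)
            settings[key] = raw.lower() in ("1", "true", "yes")
        elif isinstance(default, int):
            settings[key] = int(raw)
        else:
            settings[key] = raw
    return settings
```

pydantic-settings does the same work declaratively (via an `env_prefix` and typed fields), with validation errors instead of silent coercion failures.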
```
local-reranker/
├── src/local_reranker/
│   ├── __init__.py
│   ├── api.py                  # FastAPI application
│   ├── cli.py                  # Command line interface
│   ├── config.py               # Configuration management
│   ├── models.py               # Pydantic models
│   ├── reranker.py             # Base reranker interface (protocol)
│   ├── reranker_pytorch.py     # PyTorch implementation
│   ├── reranker_mlx.py         # MLX implementation
│   ├── batch_manager.py        # Batch size and document management
│   ├── batch_processor.py      # Shared batch processing logic
│   ├── result_aggregator.py    # Result aggregation and ordering
│   ├── streaming_processor.py  # Streaming result processing
│   ├── tokenization_cache.py   # Tokenization caching
│   └── utils.py                # Utility functions
├── tests/                      # Test suite
├── pyproject.toml              # Project configuration
└── README.md                   # This file
```
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.