In this example we show how to deploy a cross-encoder model with Triton Inference Server and use it as a LangChain reranker. We use the BGE Reranker v2 model, but the tutorial can be adapted to any cross-encoder model (with some possible changes to the model inference code).
To run this example, copy the model files into the `model-repository/bge-reranker-v2-m3/1/` directory (models downloaded from Hugging Face usually live under `~/.cache/huggingface/hub`). The `model-repository` structure should look like this:
```
model-repository/
└── bge-reranker-v2-m3/
    ├── 1/
    │   ├── model.py
    │   ├── config.json
    │   ├── sentencepiece.bpe.model
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   └── tokenizer.json
    └── config.pbtxt
```
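For orientation, here is a minimal sketch of what `model.py` can look like with Triton's Python backend. The actual script in the repository may differ; the tensor names `INPUT_TEXT` and `SCORES` are assumptions that must match the names declared in `config.pbtxt`, and it assumes the model weights were copied into the version directory alongside the tokenizer files:

```python
import os

import numpy as np
import torch
import triton_python_backend_utils as pb_utils
from transformers import AutoModelForSequenceClassification, AutoTokenizer


class TritonPythonModel:
    def initialize(self, args):
        # The tokenizer and model files sit next to this script in the version
        # directory; this assumes the weights (e.g. model.safetensors) were
        # copied here along with the tokenizer files.
        model_dir = os.path.dirname(os.path.abspath(__file__))
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_dir)
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model.to(self.device).eval()

    def execute(self, requests):
        responses = []
        for request in requests:
            # INPUT_TEXT (assumed name) is a BYTES tensor of shape [N, 2]:
            # each row is a (query, passage) pair of UTF-8 strings.
            raw = pb_utils.get_input_tensor_by_name(request, "INPUT_TEXT").as_numpy()
            pairs = [[q.decode("utf-8"), p.decode("utf-8")] for q, p in raw]
            inputs = self.tokenizer(
                pairs, padding=True, truncation=True, max_length=512, return_tensors="pt"
            ).to(self.device)
            with torch.no_grad():
                # The BGE reranker emits one relevance logit per pair.
                scores = self.model(**inputs).logits.view(-1).float().cpu().numpy()
            out = pb_utils.Tensor("SCORES", scores.astype(np.float32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```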
- Create and run the Docker container:

  ```bash
  docker compose up
  ```

- Install the required Python packages:

  ```bash
  pip install -r requirements.txt
  ```

- Run the example script:

  ```bash
  python reranker.py
  ```
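The `reranker.py` script is where Triton meets LangChain. A minimal sketch, again assuming an `INPUT_TEXT` BYTES input of shape `[N, 2]` and a `SCORES` FP32 output (adjust both to your `config.pbtxt`); the repository's actual script may differ:

```python
from typing import List, Tuple

import numpy as np
import tritonclient.http as httpclient
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import BaseCrossEncoder
from langchain_core.documents import Document


class TritonCrossEncoder(BaseCrossEncoder):
    """Scores (query, passage) pairs by calling the Triton-hosted reranker."""

    def __init__(self, url: str = "localhost:8000", model_name: str = "bge-reranker-v2-m3"):
        self.client = httpclient.InferenceServerClient(url=url)
        self.model_name = model_name

    def score(self, text_pairs: List[Tuple[str, str]]) -> List[float]:
        # Pack the pairs into a BYTES tensor named INPUT_TEXT (assumed name).
        data = np.array(text_pairs, dtype=object)
        inp = httpclient.InferInput("INPUT_TEXT", list(data.shape), "BYTES")
        inp.set_data_from_numpy(data)
        out = httpclient.InferRequestedOutput("SCORES")
        result = self.client.infer(self.model_name, inputs=[inp], outputs=[out])
        return result.as_numpy("SCORES").tolist()


# Rerank a few documents against a query and keep the best three.
reranker = CrossEncoderReranker(model=TritonCrossEncoder(), top_n=3)
docs = [
    Document(page_content=t)
    for t in ["pandas eat bamboo", "the sky is blue", "pandas are bears"]
]
print(reranker.compress_documents(docs, query="what do pandas eat?"))
```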
If you have more than one GPU, there are several useful parameters you can set:

- Inside `docker-compose.yml`:
  ```yaml
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            capabilities: [gpu]
            count: 2 # number of GPUs to use
            device_ids: ['0', '1'] # IDs of the GPUs to use (do not set together with count)
  environment:
    - NVIDIA_VISIBLE_DEVICES=0,1 # IDs of the GPUs visible inside the container
  ```

- Inside `config.pbtxt`:
  ```
  instance_group [
    {
      count: 1
      kind: KIND_GPU
      gpus: [ 0 ] # IDs of the GPUs to use, as seen inside the container
                  # (e.g. with device_ids: ['1'], the only visible GPU is 0 inside the container)
    }
  ]
  ```