This is a simple FastAPI-based service that provides two main endpoints:
- `/embed`: Generates normalized sentence embeddings using the `intfloat/e5-large` model.
- `/count-tokens`: Counts the number of tokens for each input sentence using the Hugging Face tokenizer.
Requirements:

- Docker installed
To include the model in the build context, clone the Hugging Face repository inside the `app` folder:

```
git clone https://huggingface.co/intfloat/e5-large app/e5_model
```

Make sure the directory structure looks like this:
```
.
├── app
│   ├── main.py
│   ├── requirements.txt
│   └── e5_model
│       ├── config.json
│       ...
```
You can build the Docker image and assign it a specific version (e.g. 0.2.0):

```
docker build -t e5-embedder:0.2.0 .
```

To run the service and expose it on port 8000:

```
docker run -p 8000:8000 e5-embedder:0.2.0
```

The API will be accessible at: http://localhost:8000
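Once the container is running, the endpoints can be called from any HTTP client. The sketch below uses only the Python standard library; the helper names are illustrative, and only the endpoint paths and payload shape come from this README.

```python
# Illustrative stdlib-only client; function names are our own, only the
# endpoint paths and request body shape come from the README.
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_payload(sentences):
    # Both endpoints accept the same JSON body: {"sentences": [...]}
    return json.dumps({"sentences": sentences}).encode("utf-8")


def post(path, sentences):
    req = urllib.request.Request(
        BASE_URL + path,
        data=build_payload(sentences),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example usage (requires the container to be running):
# vectors = post("/embed", ["What is the capital of France?"])["vectors"]
# counts = post("/count-tokens", ["Hello world!"])["token_counts"]
```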
`POST /embed`

Request:

```json
{
  "sentences": ["What is the capital of France?", "Tell me about Python."]
}
```

Response:

```json
{
  "vectors": [[...], [...]]
}
```

Each vector is a normalized embedding of the input sentence.
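"Normalized" here means unit length: each vector is divided by its Euclidean (L2) norm. A plain-Python sketch of that operation, with a made-up sample vector:

```python
# L2-normalize a vector so its Euclidean length is 1.
# The sample vector is made up for illustration.
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


v = l2_normalize([3.0, 4.0])
# v == [0.6, 0.8]; its length is sqrt(0.36 + 0.64) == 1.0
```

A practical consequence of unit length: the dot product of two normalized embeddings equals their cosine similarity, which is why normalized vectors are convenient for semantic search.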
`POST /count-tokens`

Request:

```json
{
  "sentences": ["Hello world!", "This is a test."]
}
```

Response:

```json
{
  "token_counts": [
    {"sentence": "Hello world!", "token_count": 4},
    {"sentence": "This is a test.", "token_count": 6}
  ]
}
```

The final project structure:

```
.
├── Dockerfile
└── app
    ├── main.py
    ├── requirements.txt
    └── e5_model/   # Cloned model files here
```
To remove the container after testing:

```
docker ps -a               # Find container ID
docker rm <container_id>
```

To remove the image:

```
docker rmi e5-embedder:0.2.0
```