Student: Sam Touahri (300234041)
This project implements a REST API that serves a pretrained Hugging Face embedding model inside a Docker container.
The service exposes:
- GET /health – service status and model info
- POST /embed – returns embeddings for a list of texts
- POST /similarity – returns cosine similarity between two texts
The model used:
sentence-transformers/all-MiniLM-L6-v2
- Embedding dimension: 384
- Lightweight (~90MB model)
- CPU-friendly
- Suitable for semantic similarity and search tasks
lab1-model-service/
- src/
  - app.py
- tests/
  - smoke.sh
- Dockerfile
- requirements.txt
- README.md
GET /health
Example:
curl http://localhost:5000/health
Response:
{ "status": "ok", "model": "sentence-transformers/all-MiniLM-L6-v2", "embedding_dimension": 384 }
POST /embed
Request body:
{ "texts": ["pavement", "sidewalk", "bottle", "cup"] }
Example:
curl -X POST http://localhost:5000/embed \
  -H "Content-Type: application/json" \
  -d '{"texts":["pavement","sidewalk","bottle","cup"]}'
Response:
{ "model": "sentence-transformers/all-MiniLM-L6-v2", "dimension": 384, "embeddings": [[...], [...], [...], [...]] }
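A minimal sketch of how the /embed response shape could be assembled; the encoder below is a stub standing in for the real model, and the actual handler in src/app.py may differ:

```python
# Sketch: build the /embed response payload from a request body.
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
DIMENSION = 384

def encode(texts):
    # Stub: the real service would call the sentence-transformers model here.
    return [[0.0] * DIMENSION for _ in texts]

def embed_response(body):
    # Mirror the response fields shown in the example above.
    texts = body["texts"]
    return {
        "model": MODEL_NAME,
        "dimension": DIMENSION,
        "embeddings": encode(texts),
    }

resp = embed_response({"texts": ["pavement", "sidewalk", "bottle", "cup"]})
print(len(resp["embeddings"]))  # one embedding per input text
```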
POST /similarity
Request body:
{ "text_a": "pavement", "text_b": "sidewalk" }
Example:
curl -X POST http://localhost:5000/similarity \
  -H "Content-Type: application/json" \
  -d '{"text_a":"pavement","text_b":"sidewalk"}'
Response:
{ "model": "sentence-transformers/all-MiniLM-L6-v2", "dimension": 384, "similarity": 0.68, "text_a": "pavement", "text_b": "sidewalk" }
conda create -n seg4180-lab1 python=3.11 -y
conda activate seg4180-lab1
pip install -r requirements.txt
python src/app.py
Server runs at:
http://localhost:5000
Run the smoke test:
./tests/smoke.sh http://localhost:5000
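The smoke test script presumably validates responses such as /health; a hedged Python equivalent of that kind of check, using the field names from the example response above:

```python
import json

def check_health(payload: str) -> None:
    # Verify the /health response carries the fields shown in this README.
    data = json.loads(payload)
    assert data["status"] == "ok"
    assert data["model"] == "sentence-transformers/all-MiniLM-L6-v2"
    assert data["embedding_dimension"] == 384

# Sample payload copied from the /health example above.
sample = '{"status": "ok", "model": "sentence-transformers/all-MiniLM-L6-v2", "embedding_dimension": 384}'
check_health(sample)
print("health check passed")
```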
docker build -t lab1-embed-service:latest .
docker run --rm -p 5000:5000 lab1-embed-service:latest
./tests/smoke.sh http://localhost:5000
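For reference, a plausible Dockerfile for this stack (Python 3.11, dependencies from requirements.txt, app listening on port 5000); the actual Dockerfile in the repository may differ:

```dockerfile
# Sketch of a Dockerfile consistent with the stack described in this README;
# the repository's actual Dockerfile may differ.
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer is cached across code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY src/ ./src/

EXPOSE 5000
CMD ["python", "src/app.py"]
```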
Docker Hub Link:
https://hub.docker.com/repository/docker/otoua046/seg4180-embed-service/general
Image repository:
otoua046/seg4180-embed-service
Pull image:
docker pull otoua046/seg4180-embed-service:latest
Run from Docker Hub image:
docker run --rm -p 5000:5000 otoua046/seg4180-embed-service:latest
- Python 3.11
- Flask
- Waitress (WSGI server)
- sentence-transformers
- Torch
- Docker
- Model loads once at container startup.
- Embeddings are L2-normalized, so cosine similarity reduces to a simple dot product.
- Image built for linux/arm64.
- The service runs under Waitress rather than the Flask development server, making it suitable for production use.
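The note about normalized embeddings can be illustrated numerically with toy 3-dimensional vectors (not real model outputs):

```python
import math

def normalize(v):
    # Scale a vector to unit length (L2 norm of 1).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 0.5])

# For unit-length vectors, the dot product IS the cosine similarity.
cosine = dot(a, b)
print(round(cosine, 4))
```

The same shortcut applies to the 384-dimensional model outputs: once vectors have unit length, no extra division by norms is needed.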
- Application code (src/app.py)
- Dockerfile
- requirements.txt
- Smoke test script
- Screenshots demonstrating:
- Running container
- Working API calls
- Docker Hub repository page
- Docker Hub link