VLM Node processes images captured by smart glasses in retail environments. By leveraging Vision Language Models (VLMs) and Large Language Models (LLMs), it determines when specific tasks start and end within the store, enabling automated task tracking and analysis.
- Vision Language Model Integration: Uses Ollama for image analysis
- Job Queue System: Asynchronous job processing with PostgreSQL backend
- REST API: HTTP API for job submission and status tracking
- Docker Support: Full containerization with Docker Compose
- Kubernetes Ready: Helm charts for production deployment
Before you begin, make sure you have the following installed on your system:
| Variable | Description | Default | Required |
|---|---|---|---|
| `POSTGRES_URL` | PostgreSQL connection string | - | Yes |
| `VLM_MODEL` | Vision model used for analyzing images and detecting task events | `moondream:1.8b` | Yes |
| `LLM_MODEL` | Language model used for interpreting and reasoning about detected events | `llama3:latest` | Yes |
| `OLLAMA_HOST` | Ollama server URL | `http://localhost:11434` | Yes |
| `DATA_DIR` | Directory for storing data | - | Yes |
| `API_URL` | External API URL | - | Yes |
| `DDS_URL` | Data delivery service URL | - | Yes |
| `CLIENT_ID` | Client identifier; any string that helps us identify you | `vlm-node` | Yes |
| `POSEMESH_EMAIL` | Email for external service | - | Yes |
| `POSEMESH_PASSWORD` | Password for external service | - | Yes |
| `IMAGE_BATCH_SIZE` | Number of images to process in a batch | `5` | No |
The system supports various Ollama models; browse https://ollama.com/search and check each model's supported input types. To use different models, update the `VLM_MODEL` and `LLM_MODEL` environment variables.
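Putting the variables above together, a minimal `.env.local` could look like the following. Every value here is a placeholder, not a real endpoint or credential; substitute your own connection strings and account details.

```
POSTGRES_URL=postgres://vlm:change-me@localhost:5432/vlm_node
VLM_MODEL=moondream:1.8b
LLM_MODEL=llama3:latest
OLLAMA_HOST=http://localhost:11434
DATA_DIR=/var/lib/vlm-node
API_URL=https://api.example.com
DDS_URL=https://dds.example.com
CLIENT_ID=vlm-node
POSEMESH_EMAIL=you@example.com
POSEMESH_PASSWORD=change-me
IMAGE_BATCH_SIZE=5
```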
- Use a managed PostgreSQL database
- Set up proper SSL/TLS certificates
- Configure resource limits and requests
- Set up monitoring and logging
- Use secrets management for sensitive data
- Configure backup strategies
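As a sketch of the resource-limit and secrets points above, a Helm values override might look like the following. The key names here are illustrative assumptions, not the chart's actual schema; check the chart's `values.yaml` for the real structure.

```yaml
# Illustrative values override -- key names are assumptions,
# verify against the chart's values.yaml.
resources:
  requests:
    cpu: "2"
    memory: 8Gi
  limits:
    cpu: "4"
    memory: 16Gi
# Reference credentials from a Kubernetes Secret instead of plain values
envFrom:
  - secretRef:
      name: vlm-node-credentials
```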
- Clone the repository:

```shell
git clone git@github.com:aukilabs/vlm-node.git
cd vlm-node
```

- Set up environment variables:

Create a `.env.local` file with your configuration:

```
POSEMESH_EMAIL=
POSEMESH_PASSWORD=
```
- Start all services:

(Optional) Install the NVIDIA Container Toolkit: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installation

```shell
make docker-cpu
# or
make docker-gpu
```

Note: `shm_size` should at least match the model size, ideally 1.5–2× for safety. In `docker-compose.yml`, adjust the `shm_size` parameter under the `ollama-gpu` service. For example:

```yaml
ollama-gpu:
  ...
  shm_size: 16gb
```

Set this value according to the requirements of the models you intend to use. Insufficient shared memory may cause model loading or inference to fail.
- Verify the setup:

```shell
# Check if services are running
docker compose ps

# Test the API
curl http://localhost:8080/api/v1/jobs?limit=10
```

Start the API server:

```shell
# start ollama for cpu only by default
make server
```

Start the worker:

```shell
# start ollama for cpu only by default
make worker
```

Submit a job using the REST API:
```shell
curl -X POST http://localhost:8080/api/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "job_type": "task_timing_v1",
    "query": {"ids": []},
    "domain_id": "",
    "input": {
      "prompt": "Analyze this image for task completion",
      "webhook_url": "",
      "vlm_prompt": "Describe what you see in this image"
    }
  }'
```

```shell
# List all jobs
curl "http://localhost:8080/api/v1/jobs?limit=100"

# Get specific job details
curl "http://localhost:8080/api/v1/jobs/{job_id}"
```

You can perform real-time image inference by connecting to the WebSocket endpoint at `ws://localhost:8080/api/v1/ws` (or `wss://domain.com/api/v1/ws` for secure connections).
Note: None of the images or results are persisted.
Protocol Overview:

- Image Upload: Send image data as binary messages over the WebSocket. The server processes images in batches of `IMAGE_BATCH_SIZE`, or after a 10-second timeout, whichever comes first.
- Prompt Submission: Send the prompt as a UTF-8 encoded text message.
- Server Response: The server returns inference results as binary WebSocket messages containing a JSON object:

```
{"done": <bool>, "response": <string>}
```
Note: Browser WebSocket implementations handle keepalive automatically. If your client is not written in JavaScript (or does not run in a browser), you must respond to ping messages from the server with pong messages to keep the connection alive.
Example (TypeScript):

```typescript
let websocketInstance: WebSocket | null = null;

export function initializeWebSocket(): WebSocket {
  const url = process.env.COMPUTE_NODE_URL;
  if (!url) {
    throw new Error("COMPUTE_NODE_URL environment variable is not set");
  }
  if (websocketInstance) {
    return websocketInstance;
  }
  // Keep a local reference so the handlers below don't have to
  // re-check the nullable module-level variable.
  const ws = new WebSocket(url);
  websocketInstance = ws;
  console.log("WebSocket URL: ", url);
  ws.onopen = () => {
    console.log("WebSocket connected");
    // Send the prompt as a UTF-8 text message
    ws.send("Describe the art work you see in the photo.");
  };
  let response = "";
  ws.onmessage = (event) => {
    try {
      let bufferPromise: Promise<ArrayBuffer>;
      if (event.data instanceof ArrayBuffer) {
        bufferPromise = Promise.resolve(event.data);
      } else if (event.data instanceof Blob) {
        bufferPromise = event.data.arrayBuffer();
      } else {
        bufferPromise = Promise.resolve(new TextEncoder().encode(event.data).buffer);
      }
      bufferPromise.then((buffer) => {
        // Try to decode as UTF-8 string
        let text: string;
        try {
          text = new TextDecoder("utf-8").decode(buffer);
        } catch (e) {
          console.error("Failed to decode WebSocket binary message as UTF-8", e);
          return;
        }
        // Try to parse as JSON
        try {
          const parsed = JSON.parse(text);
          if (
            typeof parsed === "object" &&
            parsed !== null &&
            typeof parsed.response === "string" &&
            typeof parsed.done === "boolean"
          ) {
            // Accumulate streamed chunks until the server signals completion
            response += parsed.response;
            if (parsed.done) {
              console.log("Compute Node response done:", response);
              response = "";
            }
          } else {
            console.warn("Received message is not in expected format:", parsed);
          }
        } catch (e) {
          console.error("Failed to parse WebSocket message as JSON", e, "Raw text:", text);
        }
      });
    } catch (err) {
      console.error("Error handling WebSocket binary message", err);
    }
  };
  ws.onclose = () => {
    websocketInstance = null;
    console.log("WebSocket closed");
  };
  ws.onerror = (event) => {
    console.error("WebSocket error", event);
  };
  return ws;
}

export function sendPhotoToComputeNode(photo: PhotoData): void {
  // Reuse the existing connection, or open a new one if none exists
  const ws = websocketInstance ?? initializeWebSocket();
  console.log("[STREAMING] Sending photo to Compute Node", photo.filename);
  ws.send(photo.buffer);
}
```

API endpoints:

- `GET /api/v1/jobs` - List jobs
- `POST /api/v1/jobs` - Create a new job
- `GET /api/v1/jobs/{id}` - Get job details
- `PUT /api/v1/jobs/{id}` - Retry a job
- Server doesn't start:

If the model is not already loaded into Ollama's memory, the server may need to pull the model first, which can take some time. To check the progress, run:

```shell
docker compose logs -f ollama-cpu
```

and look for messages indicating that the model is being downloaded.
View logs for specific services:

```shell
docker compose logs server
docker compose logs worker
docker compose logs ui
docker compose logs postgres
docker compose logs ollama-gpu
docker compose logs ollama-cpu
```