Eye-Q is a cloud-native Video-to-Vector Search Engine capable of identifying and retrieving specific person attributes from live video streams using natural language queries (e.g., "person in white shirt").
The system is engineered for scalability and resilience, deploying AI inference models as microservices on a Kubernetes (K3s) cluster. The entire infrastructure is provisioned via Terraform on AWS, ensuring consistent and reproducible deployments.
The pipeline follows an Event-Driven Architecture to decouple ingestion, inference, and storage:
```mermaid
graph LR
    subgraph AWS_Cloud["AWS Cloud (EC2 t3.small)"]
        subgraph Kubernetes_Cluster["Kubernetes Cluster (K3s)"]
            A[Producer Pod] -->|Video Frames| B{Kafka Service}
            B -->|Consume| C[AI Worker Pod]
            C -->|YOLOv8 + CLIP| D[Vector Embedding]
            D -->|Upsert| E[(Qdrant Vector DB)]
            F[API Server Pod] -->|Vector Search| E
            G[Frontend Pod] -->|HTTP Request| F
        end
    end
    User -->|Browser| G
```
All system components (video producers, AI inference workers, API services, and the frontend) are deployed as containerized microservices. Kubernetes (K3s) handles orchestration, providing service isolation, independent scaling, and operational resilience.
The system adopts an event-driven architecture using Apache Kafka to decouple video ingestion from AI inference. This design improves throughput, enables horizontal scaling of AI workers, and prevents backpressure from propagating across services.
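A minimal sketch of how frames could travel through Kafka in this design. The message envelope (`frame_id` plus a base64-encoded JPEG payload), the `video-frames` topic name, and the broker address are assumptions for illustration, not the project's actual wire format; the producer section requires a reachable broker and kafka-python.

```python
import base64
import json

# Hypothetical message envelope: frame bytes are base64-encoded so the
# payload survives JSON serialization on its way through Kafka.
def encode_frame(frame_id: int, jpeg_bytes: bytes) -> bytes:
    return json.dumps({
        "frame_id": frame_id,
        "data": base64.b64encode(jpeg_bytes).decode("ascii"),
    }).encode("utf-8")

def decode_frame(message: bytes) -> tuple:
    """Invert encode_frame: returns (frame_id, raw_jpeg_bytes)."""
    payload = json.loads(message)
    return payload["frame_id"], base64.b64decode(payload["data"])

if __name__ == "__main__":
    # Requires a running broker; the address and topic are assumptions.
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers="kafka:9092")
    producer.send("video-frames", encode_frame(0, b"\xff\xd8..."))
    producer.flush()
```

Because the producer only serializes and publishes, an AI worker consuming from the same topic can be scaled out independently, which is exactly the decoupling described above.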
All cloud infrastructure resources (EC2 instances, networking, security groups, and access keys) are provisioned using Terraform. This ensures reproducible deployments, environment consistency, and simplified infrastructure management.
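For orientation, a Terraform fragment of roughly this shape could provision such an instance. Every identifier below (resource names, the AMI placeholder, the security-group rule) is a hypothetical sketch, not the project's actual configuration:

```hcl
provider "aws" {
  region = "ap-southeast-1"   # example region, an assumption
}

resource "aws_security_group" "eye_q" {
  name = "eye-q-sg"

  ingress {
    from_port   = 8501        # Streamlit frontend
    to_port     = 8501
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "eye_q" {
  ami                    = "ami-xxxxxxxx"   # placeholder Ubuntu AMI
  instance_type          = "t3.small"
  key_name               = "eye-q-key"
  vpc_security_group_ids = [aws_security_group.eye_q.id]
}
```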
The AI pipeline is optimized to run on low-cost CPU-only instances:
- PyTorch CPU inference is used to avoid GPU dependency
- Memory usage is carefully controlled during model initialization
- Swap memory is leveraged to handle transient memory spikes
This allows the entire system to operate reliably on AWS EC2 t3.small instances.
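One common pattern for the memory control described above is lazy, cached model loading, so weights are read exactly once and only when first needed. The sketch below assumes the pipeline's YOLOv8/ultralytics stack; the thread count and weight file are illustrative defaults, not the project's measured values:

```python
import functools
import os

# Cap thread fan-out before any heavy framework is imported; on a
# 2-vCPU t3.small, oversubscribing CPU threads hurts more than it helps.
os.environ.setdefault("OMP_NUM_THREADS", "2")

@functools.lru_cache(maxsize=1)
def get_model():
    """Load the model once, on first use, instead of at import time.

    Deferring the load keeps worker start-up memory low, and the cache
    guarantees only one copy of the weights lives in RAM.
    """
    import torch  # imported lazily so the pod starts fast and lean
    torch.set_num_threads(int(os.environ["OMP_NUM_THREADS"]))
    from ultralytics import YOLO  # assumption: same model family as the pipeline
    return YOLO("yolov8n.pt")     # nano weights: smallest CPU footprint
```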
Kubernetes Deployments are configured to automatically restart failed pods caused by crashes or out-of-memory (OOM) events. This self-healing behavior ensures continuous availability without manual intervention.
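The self-healing behavior comes for free from the Deployment controller; an illustrative manifest fragment (names, image tag, and the memory limit are assumptions, not the project's actual `*.yaml` files) might look like:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ai-worker
  template:
    metadata:
      labels:
        app: ai-worker
    spec:
      containers:
        - name: ai-worker
          image: eye-q:latest
          resources:
            limits:
              memory: "1Gi"   # container is OOM-killed at this limit...
      # ...and because pods in a Deployment default to
      # restartPolicy: Always, the kubelet restarts it automatically.
```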
A single optimized Docker image is reused across multiple services. Runtime behavior is configured via environment variables, reducing image sprawl, minimizing build times, and ensuring consistent execution environments across the cluster.
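A dispatcher of roughly this shape is one way the "one image, many services" pattern can work. The `EYEQ_ROLE` variable name and the role-to-module mapping are assumptions for illustration (the module names mirror the scripts used later in the deployment steps):

```python
import os

# Hypothetical role -> entrypoint-module mapping for a shared image.
ROLES = {
    "producer": "producer",
    "worker": "ai_memory",
    "api": "api_server",
}

def resolve_entrypoint(role=None):
    """Map a role name (normally from the environment) to a module to run."""
    role = role or os.getenv("EYEQ_ROLE", "producer")
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return ROLES[role]

if __name__ == "__main__":
    # The same container image runs a different service depending only
    # on the EYEQ_ROLE environment variable set in its pod spec.
    import runpy
    runpy.run_module(resolve_entrypoint(), run_name="__main__")
```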
Semantic embeddings generated from YOLOv8 and CLIP models are stored in Qdrant, a high-performance vector database optimized for similarity search. This enables real-time natural language queries over large-scale video-derived embeddings.
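Conceptually, the query path ranks stored embeddings by cosine similarity to the query's embedding. Qdrant does this with indexed approximate search at scale; the brute-force sketch below is only an illustration of the ranking it performs, with toy 2-D vectors standing in for CLIP embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def search(query, points, top_k=3):
    """points: list of (payload, vector); returns top_k payloads by similarity.

    This is what a vector-DB lookup does conceptually; Qdrant replaces the
    linear scan with an approximate nearest-neighbor index.
    """
    ranked = sorted(points, key=lambda p: cosine(query, p[1]), reverse=True)
    return [payload for payload, _ in ranked[:top_k]]
```

So a text query like "person in white shirt" is embedded once by CLIP, and the frames whose detection embeddings score highest come back as matches.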
- AWS EC2
- Terraform
- Docker
- K3s (Lightweight Kubernetes)
- Apache Kafka
- Zookeeper
- YOLOv8 (Object Detection)
- CLIP (Visual-Semantic Embedding)
- PyTorch
- FastAPI
- Qdrant (Vector Database)
- Streamlit
Live demonstration of semantic search functionality
This system follows a GitOps-style deployment workflow.
```sh
# 1. Provision the infrastructure
cd terraform
terraform apply

# 2. Connect to the instance and copy application files
ssh -i eye-q-key.pem ubuntu@<YOUR_EC2_IP>
cd ec2-files
scp -i eye-q-key.pem *.py *.yaml ubuntu@<YOUR_EC2_IP>:~/

# 3. Install Python dependencies
sudo apt update && sudo apt install -y python3-pip
pip3 install kafka-python opencv-python-headless numpy ultralytics torch transformers uvicorn fastapi streamlit requests --no-cache-dir
pip3 install qdrant-client==1.7.0

# 4. Deploy Kafka and Qdrant to the cluster
kubectl apply -f kafka-lite.yaml
kubectl apply -f qdrant-lite.yaml

# 5. Download a test video
wget https://raw.githubusercontent.com/opencv/opencv/master/samples/data/vtest.avi -O video_test.avi

# 6. Install OpenCV runtime libraries
sudo apt update
sudo apt install -y libgl1 libglib2.0-0

# 7. Configure kubectl access to K3s
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown ubuntu:ubuntu ~/.kube/config
chmod 600 ~/.kube/config
export KUBECONFIG=~/.kube/config
kubectl get nodes

# 8. Run the pipeline
python3 producer.py    # wait 30-60 s, then stop
python3 ai_memory.py   # wait 30-60 s, then stop
python3 api_server.py
streamlit run frontend.py   # Local URL: http://<YOUR_EC2_IP>:8501

# 9. Monitor and tear down
watch kubectl get pods -A
terraform destroy
```
- Real-time surveillance and person search
- Semantic video analytics
- Edge-to-cloud AI pipelines
- Large-scale video understanding systems
This project is developed for experimental and educational purposes only, with a primary focus on exploring and validating modern system architectures, including Event-Driven Microservices (Apache Kafka), Cloud-Native Orchestration (Kubernetes/K3s), and Scalable MLOps Pipelines.
The system and its components are designed as a Proof of Concept (PoC) to demonstrate technical integration between AI inference models and distributed systems. It is not intended for production-grade environments, commercial deployment, or mission-critical security systems.
As this project involves Video Analytics and Person Attribute Retrieval, users must be mindful of Data Privacy and Ethical AI practices. This software should not be used for unauthorized surveillance, tracking individuals without consent, or any activities that violate privacy laws (such as GDPR or PDPA). The developer does not advocate for or support the use of this technology in ways that infringe upon civil liberties.
The developer assumes no responsibility for any consequences, damages, or legal issues arising from the use or misuse of this software. The implementation is provided "as-is" without any warranties regarding performance, reliability, or security.
Sitta Boonkaew
AI Engineer Intern @ AI SmartTech
© 2025 Sitta Boonkaew. All rights reserved.
This is a personal project.



