Eye-Q is a cloud-native Video-to-Vector Search Engine capable of identifying and retrieving specific person attributes from live video streams using natural language queries (e.g., "person in white shirt").
The system is engineered for scalability and resilience, deploying AI inference models as microservices on a Kubernetes (K3s) cluster. The entire infrastructure is provisioned via Terraform on AWS, ensuring consistent and reproducible deployments.
The pipeline follows an Event-Driven Architecture to decouple ingestion, inference, and storage:
```mermaid
graph LR
    subgraph AWS_Cloud["AWS Cloud (EC2 t3.small)"]
        subgraph Kubernetes_Cluster["Kubernetes Cluster (K3s)"]
            A[Producer Pod] -->|Video Frames| B{Kafka Service}
            B -->|Consume| C[AI Worker Pod]
            C -->|YOLOv8 + CLIP| D[Vector Embedding]
            D -->|Upsert| E[(Qdrant Vector DB)]
            F[API Server Pod] -->|Vector Search| E
            G[Frontend Pod] -->|HTTP Request| F
        end
    end
    User -->|Browser| G
```
All system components (video producers, AI inference workers, API services, and the frontend) are deployed as containerized microservices. Kubernetes (K3s) handles orchestration, providing service isolation, independent scaling, and operational resilience.
The system adopts an event-driven architecture using Apache Kafka to decouple video ingestion from AI inference. This design improves throughput, enables horizontal scaling of AI workers, and prevents backpressure from propagating across services.
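A minimal sketch of how frames could travel through Kafka in this design. The message envelope (`frame_id` plus a base64-encoded JPEG payload), the `video-frames` topic name, and the broker address are assumptions for illustration, not the project's actual wire format; the producer section requires a reachable broker and kafka-python.

```python
import base64
import json

# Hypothetical message envelope: frame bytes are base64-encoded so the
# payload survives JSON serialization on its way through Kafka.
def encode_frame(frame_id: int, jpeg_bytes: bytes) -> bytes:
    return json.dumps({
        "frame_id": frame_id,
        "data": base64.b64encode(jpeg_bytes).decode("ascii"),
    }).encode("utf-8")

def decode_frame(message: bytes) -> tuple:
    """Invert encode_frame: returns (frame_id, raw_jpeg_bytes)."""
    payload = json.loads(message)
    return payload["frame_id"], base64.b64decode(payload["data"])

if __name__ == "__main__":
    # Requires a running broker; the address and topic are assumptions.
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers="kafka:9092")
    producer.send("video-frames", encode_frame(0, b"\xff\xd8..."))
    producer.flush()
```

Because the producer only serializes and publishes, an AI worker consuming from the same topic can be scaled out independently, which is exactly the decoupling described above.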
All cloud infrastructure resources (EC2 instances, networking, security groups, and access keys) are provisioned using Terraform. This ensures reproducible deployments, environment consistency, and simplified infrastructure management.
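For orientation, a Terraform fragment of roughly this shape could provision such an instance. Every identifier below (resource names, the AMI placeholder, the security-group rule) is a hypothetical sketch, not the project's actual configuration:

```hcl
provider "aws" {
  region = "ap-southeast-1"   # example region, an assumption
}

resource "aws_security_group" "eye_q" {
  name = "eye-q-sg"

  ingress {
    from_port   = 8501        # Streamlit frontend
    to_port     = 8501
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "eye_q" {
  ami                    = "ami-xxxxxxxx"   # placeholder Ubuntu AMI
  instance_type          = "t3.small"
  key_name               = "eye-q-key"
  vpc_security_group_ids = [aws_security_group.eye_q.id]
}
```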
The AI pipeline is optimized to run on low-cost CPU-only instances:
- PyTorch CPU inference is used to avoid GPU dependency
- Memory usage is carefully controlled during model initialization
- Swap memory is leveraged to handle transient memory spikes
This allows the entire system to operate reliably on AWS EC2 t3.small instances.
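One common pattern for the memory control described above is lazy, cached model loading, so weights are read exactly once and only when first needed. The sketch below assumes the pipeline's YOLOv8/ultralytics stack; the thread count and weight file are illustrative defaults, not the project's measured values:

```python
import functools
import os

# Cap thread fan-out before any heavy framework is imported; on a
# 2-vCPU t3.small, oversubscribing CPU threads hurts more than it helps.
os.environ.setdefault("OMP_NUM_THREADS", "2")

@functools.lru_cache(maxsize=1)
def get_model():
    """Load the model once, on first use, instead of at import time.

    Deferring the load keeps worker start-up memory low, and the cache
    guarantees only one copy of the weights lives in RAM.
    """
    import torch  # imported lazily so the pod starts fast and lean
    torch.set_num_threads(int(os.environ["OMP_NUM_THREADS"]))
    from ultralytics import YOLO  # assumption: same model family as the pipeline
    return YOLO("yolov8n.pt")     # nano weights: smallest CPU footprint
```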
Kubernetes Deployments are configured to automatically restart failed pods caused by crashes or out-of-memory (OOM) events. This self-healing behavior ensures continuous availability without manual intervention.
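The self-healing behavior comes for free from the Deployment controller; an illustrative manifest fragment (names, image tag, and the memory limit are assumptions, not the project's actual `*.yaml` files) might look like:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ai-worker
  template:
    metadata:
      labels:
        app: ai-worker
    spec:
      containers:
        - name: ai-worker
          image: eye-q:latest
          resources:
            limits:
              memory: "1Gi"   # container is OOM-killed at this limit...
      # ...and because pods in a Deployment default to
      # restartPolicy: Always, the kubelet restarts it automatically.
```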
A single optimized Docker image is reused across multiple services. Runtime behavior is configured via environment variables, reducing image sprawl, minimizing build times, and ensuring consistent execution environments across the cluster.
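A dispatcher of roughly this shape is one way the "one image, many services" pattern can work. The `EYEQ_ROLE` variable name and the role-to-module mapping are assumptions for illustration (the module names mirror the scripts used later in the deployment steps):

```python
import os

# Hypothetical role -> entrypoint-module mapping for a shared image.
ROLES = {
    "producer": "producer",
    "worker": "ai_memory",
    "api": "api_server",
}

def resolve_entrypoint(role=None):
    """Map a role name (normally from the environment) to a module to run."""
    role = role or os.getenv("EYEQ_ROLE", "producer")
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return ROLES[role]

if __name__ == "__main__":
    # The same container image runs a different service depending only
    # on the EYEQ_ROLE environment variable set in its pod spec.
    import runpy
    runpy.run_module(resolve_entrypoint(), run_name="__main__")
```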
Semantic embeddings generated from YOLOv8 and CLIP models are stored in Qdrant, a high-performance vector database optimized for similarity search. This enables real-time natural language queries over large-scale video-derived embeddings.
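Conceptually, the query path ranks stored embeddings by cosine similarity to the query's embedding. Qdrant does this with indexed approximate search at scale; the brute-force sketch below is only an illustration of the ranking it performs, with toy 2-D vectors standing in for CLIP embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def search(query, points, top_k=3):
    """points: list of (payload, vector); returns top_k payloads by similarity.

    This is what a vector-DB lookup does conceptually; Qdrant replaces the
    linear scan with an approximate nearest-neighbor index.
    """
    ranked = sorted(points, key=lambda p: cosine(query, p[1]), reverse=True)
    return [payload for payload, _ in ranked[:top_k]]
```

So a text query like "person in white shirt" is embedded once by CLIP, and the frames whose detection embeddings score highest come back as matches.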
- AWS EC2
- Terraform
- Docker
- K3s (Lightweight Kubernetes)
- Apache Kafka
- Zookeeper
- YOLOv8 (Object Detection)
- CLIP (Visual-Semantic Embedding)
- PyTorch
- FastAPI
- Qdrant (Vector Database)
- Streamlit
Live demonstration of semantic search functionality
This system follows a GitOps-style deployment workflow.
```sh
# 1. Provision the infrastructure
cd terraform
terraform apply

# 2. Connect to the instance and copy application files
ssh -i eye-q-key.pem ubuntu@<YOUR_EC2_IP>
cd ec2-files
scp -i eye-q-key.pem *.py *.yaml ubuntu@<YOUR_EC2_IP>:~/

# 3. Install Python dependencies
sudo apt update && sudo apt install -y python3-pip
pip3 install kafka-python opencv-python-headless numpy ultralytics torch transformers uvicorn fastapi streamlit requests --no-cache-dir
pip3 install qdrant-client==1.7.0

# 4. Deploy Kafka and Qdrant to the cluster
kubectl apply -f kafka-lite.yaml
kubectl apply -f qdrant-lite.yaml

# 5. Download a test video
wget https://raw.githubusercontent.com/opencv/opencv/master/samples/data/vtest.avi -O video_test.avi

# 6. Install OpenCV runtime libraries
sudo apt update
sudo apt install -y libgl1 libglib2.0-0

# 7. Configure kubectl access to K3s
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown ubuntu:ubuntu ~/.kube/config
chmod 600 ~/.kube/config
export KUBECONFIG=~/.kube/config
kubectl get nodes

# 8. Run the pipeline
python3 producer.py    # wait 30-60 s, then stop
python3 ai_memory.py   # wait 30-60 s, then stop
python3 api_server.py
streamlit run frontend.py   # Local URL: http://<YOUR_EC2_IP>:8501

# 9. Monitor and tear down
watch kubectl get pods -A
terraform destroy
```
- Real-time surveillance and person search
- Semantic video analytics
- Edge-to-cloud AI pipelines
- Large-scale video understanding systems
This project is developed for experimental and educational purposes only, with a primary focus on exploring and validating modern system architectures, including Event-Driven Microservices (Apache Kafka), Cloud-Native Orchestration (Kubernetes/K3s), and Scalable MLOps Pipelines.
The system and its components are designed as a Proof of Concept (PoC) to demonstrate technical integration between AI inference models and distributed systems. It is not intended for production-grade environments, commercial deployment, or mission-critical security systems.
As this project involves Video Analytics and Person Attribute Retrieval, users must be mindful of Data Privacy and Ethical AI practices. This software should not be used for unauthorized surveillance, tracking individuals without consent, or any activities that violate privacy laws (such as GDPR or PDPA). The developer does not advocate for or support the use of this technology in ways that infringe upon civil liberties.
The developer assumes no responsibility for any consequences, damages, or legal issues arising from the use or misuse of this software. The implementation is provided "as-is" without any warranties regarding performance, reliability, or security.
Sitta Boonkaew
AI Engineer Intern @ AI SmartTech
© 2025 Sitta Boonkaew. All rights reserved.
This is a personal project.



