Applied AI Engineer with a strong full-stack background, focused on building production-grade RAG and LLM systems.
I work at the intersection of:
- Retrieval-Augmented Generation (RAG)
- Embeddings & semantic search
- Backend systems & APIs
- Product-focused AI engineering
In practice, that means:
- Designing end-to-end RAG systems for large document collections
- Building ingestion, chunking, and embedding pipelines
- Semantic & hybrid retrieval with citation-aware context assembly
- Streaming LLM responses integrated into real products
- Making RAG systems observable, tunable, and maintainable
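The chunking step above can be sketched as a sliding window with overlap. A minimal illustration only: the window sizes are placeholders, and a real pipeline would count tokens with the embedding model's tokenizer rather than splitting on whitespace.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word-window chunks for embedding.

    chunk_size and overlap are in words here; production pipelines
    typically measure in tokens and respect sentence boundaries.
    """
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the final window already covers the tail
    return chunks
```

The overlap keeps sentences that straddle a boundary retrievable from at least one chunk, at the cost of some duplicated embedding work.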
Core stack:
- LLMs & RAG: OpenAI-compatible APIs, local models
- Backend: FastAPI, Node.js
- Data: PostgreSQL, vector databases (Qdrant / FAISS)
- Infra: background workers, async pipelines
- Frontend: React / Next.js (when product UX matters)
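One common way to combine keyword and vector results from the stack above into a hybrid ranking is reciprocal rank fusion. A minimal sketch, not tied to any specific product here; the `k=60` constant and the toy doc ids are assumptions:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of doc ids into one fused ranking.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in;
    k dampens the influence of a single retriever's top ranks.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: a BM25 ranking and a vector-search ranking that disagree.
bm25 = ["d1", "d2", "d3"]
vector = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25, vector])
```

Rank-based fusion sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.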
Working principles:
- RAG is a system, not a prompt.
- Data quality beats clever prompting.
- Production constraints shape good AI.
Selected projects:
- RAG Platform — production-oriented retrieval system for unstructured data
- Observer — AI-native tooling for working with complex data flows
- Open-source libraries used in production
New York, NY
https://github.com/keske