Senior AI Engineer building production AI systems — multi-agent orchestration, RAG & search infrastructure, observability pipelines, and streaming data platforms. 12+ years shipping systems at scale, 3+ years focused on AI/ML engineering.
I don't build prototypes. Everything I ship runs in production.
| Project | Description | Tech Stack |
|---|---|---|
| Agentic Patterns | Multi-protocol AI agent platform with production observability | FastAPI · Pydantic AI · MCP · A2A · gRPC · SSE · PostgreSQL · pgvector · Kafka Streams · ClickHouse · OpenTelemetry |
| MLOps-GCP | MLOps platform on GCP with multi-framework training and RAG pipelines | FastAPI · Vertex AI · BigQuery · Cloud Run · Terraform · MLflow |
| DocumentMind AI | Enterprise RAG system on AWS Bedrock | LangChain · AWS Bedrock · Pinecone · FastAPI |
| AgentBench | Multi-agent evaluation framework | LangGraph · Agno · MCP · Docker |
| BetterBets AI | Recommendation engine with BERT classification | SageMaker · PyTorch · Pinecone · HuggingFace |
| VoiceFlow POS | Voice-enabled POS with multi-agent orchestration | LangGraph · Whisper · ChromaDB · Streamlit |
| Field GenAI MCP | Multi-document chat over MCP | Socket.io · Redis · Gemini AI · MCP |
| SD-MLOps-Studio | Stable Diffusion platform with LoRA training | PyTorch · ComfyUI · FastAPI · Docker · Kubernetes |
Production-grade architecture for AI agent platforms across multiple industry verticals.
3-Layer Architecture:
- Agent Layer: Pydantic AI runtime, tool orchestration, RAG with ColBERT reranking, cognitive memory (episodic/semantic/procedural), MCP tool integrations
- Observability Layer: OpenTelemetry → Kafka Streams (Java 21) → ClickHouse with tiered storage (hot/warm/cold), SLO error budget policies, and streaming statistics via T-Digest (percentiles), HyperLogLog (cardinality), and Welford's algorithm (variance)
- Domain Verticals: Built across Healthcare, Education, Logistics, Finance, Trading, and Creative industries
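As an illustration of the streaming statistics named in the observability layer, here is a minimal Python sketch of Welford's algorithm for running mean and variance. The production topology is described as Kafka Streams on Java 21, so this is only a language-neutral illustration; the `RunningStats` class and the sample latencies are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online algorithm: one pass, O(1) memory per metric."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the *updated* mean; this is what keeps the update numerically stable.
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance (n - 1 denominator); 0.0 until two samples are seen.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


# Hypothetical latency samples in milliseconds.
stats = RunningStats()
for latency_ms in [120.0, 95.0, 130.0, 110.0]:
    stats.update(latency_ms)
```

Because the state is just three numbers, it can live in a per-key state store and be updated once per event without buffering the raw samples.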
7 Protocols: REST · MCP · A2A v0.3 · gRPC · Webhook · Temporal · SSE
Agentic Patterns Implemented:
| Pattern | How It's Applied |
|---|---|
| ReAct Loop | Reusable tool-use loop with SSE streaming, session persistence, and parallel tool execution |
| Supervisor + Specialist | Orchestrator delegates to domain-specific agents with mandatory execution chains and behavioral detection |
| Role-Based Routing | Same agent backend serves different personas with different tools, prompts, and data visibility per role |
| Human-in-the-Loop | Two-step confirmation: agent drafts, user confirms, agent executes — for approvals, outreach, bookings |
| Cross-Agent State | Agents share state across domains — an action in one agent triggers effects in another |
| Parallel Tool Execution | Multiple tool calls resolved concurrently within a single agent turn |
| Multi-Protocol Endpoints | Single registration generates REST + MCP + A2A + SSE endpoints per vertical |
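The parallel tool execution pattern in the table above can be sketched with `asyncio.gather`. This is a minimal illustration rather than the platform's actual code: the registry shape, the result dictionaries, and the two demo tools (`lookup_order`, `check_inventory`) are assumptions.

```python
import asyncio
from typing import Any, Awaitable, Callable

ToolFn = Callable[..., Awaitable[Any]]


async def run_tools_concurrently(
    calls: list[tuple[str, dict[str, Any]]],
    registry: dict[str, ToolFn],
) -> list[dict[str, Any]]:
    """Resolve every tool call from one agent turn concurrently.

    Failures are captured per call so one broken tool does not
    cancel the others; results come back in request order.
    """
    async def run_one(name: str, args: dict[str, Any]) -> dict[str, Any]:
        try:
            return {"tool": name, "ok": True, "result": await registry[name](**args)}
        except Exception as exc:  # includes unknown tool names (KeyError)
            return {"tool": name, "ok": False, "error": str(exc)}

    return await asyncio.gather(*(run_one(n, a) for n, a in calls))


# Hypothetical tools used only for this demo.
async def lookup_order(order_id: str) -> str:
    await asyncio.sleep(0.01)
    return f"order {order_id}: shipped"


async def check_inventory(sku: str) -> int:
    await asyncio.sleep(0.01)
    return 7


results = asyncio.run(
    run_tools_concurrently(
        [("lookup_order", {"order_id": "A1"}),
         ("check_inventory", {"sku": "X9"}),
         ("missing_tool", {})],
        {"lookup_order": lookup_order, "check_inventory": check_inventory},
    )
)
```

Returning errors as data instead of raising lets the agent loop feed a failed tool result back to the model on the next turn.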
Guardrails: 5-layer system (hallucination, compliance, PII, toxicity, industry-specific)
Stack: FastAPI · Pydantic AI · MCP · A2A · gRPC · SSE · PostgreSQL · pgvector · Qdrant · Redis · Kafka Streams · ClickHouse · OpenTelemetry · Composio · Temporal · Docker
Production RAG with swappable backends — same pipeline abstraction runs on self-hosted pgvector or managed Pinecone with zero code changes.
End-to-end pipeline:
Ingest → Chunk (recursive 1500/150) → Enrich (SHA-256, category, timestamps)
→ Scan (injection detection) → Embed → Index
→ Query → Hybrid Search (dense + sparse) → Rerank → Guardrails → Generate
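A rough sketch of the Chunk and Enrich steps, assuming "1500/150" means a 1500-character chunk size with a 150-character overlap. The separator order, the way overlap is applied (prefixing each chunk with the tail of the previous one), and the record field names are assumptions made for illustration.

```python
import hashlib
from datetime import datetime, timezone


def recursive_split(text: str, size: int, seps: tuple[str, ...] = ("\n\n", "\n", " ")) -> list[str]:
    """Split on the coarsest separator present, recursing on oversized pieces."""
    if len(text) <= size:
        return [text] if text.strip() else []
    for i, sep in enumerate(seps):
        if sep not in text:
            continue
        chunks, buf = [], ""
        for part in text.split(sep):
            candidate = part if not buf else buf + sep + part
            if len(candidate) <= size:
                buf = candidate
                continue
            if buf:
                chunks.append(buf)
            if len(part) > size:
                # Piece is still too large: retry with finer separators.
                chunks.extend(recursive_split(part, size, seps[i + 1:]))
                buf = ""
            else:
                buf = part
        if buf:
            chunks.append(buf)
        return chunks
    # No separator left: fall back to a hard character cut.
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_with_overlap(text: str, size: int = 1500, overlap: int = 150) -> list[str]:
    """Reserve `overlap` characters so every final chunk stays within `size`."""
    base = recursive_split(text, size - overlap)
    return [(base[i - 1][-overlap:] if i else "") + c for i, c in enumerate(base)]


def enrich(chunks: list[str], category: str) -> list[dict]:
    """Attach a content hash, category, and ingestion timestamp to each chunk."""
    now = datetime.now(timezone.utc).isoformat()
    return [
        {"text": c, "sha256": hashlib.sha256(c.encode("utf-8")).hexdigest(),
         "category": category, "ingested_at": now}
        for c in chunks
    ]


doc = "word " * 1000  # 5000 characters of filler text
chunks = chunk_with_overlap(doc)
records = enrich(chunks, category="demo")
```

The SHA-256 digest doubles as a deduplication key: re-ingesting an unchanged chunk produces the same hash, so it can be skipped before embedding.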
Two production implementations against the same ABCs:
| Component | pgvector (Self-Hosted) | Pinecone (Managed) |
|---|---|---|
| Indexing | HNSW + GIN indexes, manual tuning | Serverless, auto-scaling |
| Hybrid Search | Dense + BM25 + RRF fusion (k=60) in single SQL query | Dense + sparse in single query, alpha-weighted fusion |
| Reranking | ms-marco-MiniLM-L-6-v2 (local Docker) | bge-reranker-v2-m3 (Pinecone managed) + local fallback |
| Embeddings | all-MiniLM-L6-v2 (384d, local) | multilingual-e5-large (1024d, Pinecone Inference API) |
| Multi-tenancy | tenant_id column + WHERE clause | Namespace isolation (physically separated) |
| Fusion | RRF in Python | Built-in (single API call) |
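For reference, Reciprocal Rank Fusion with k=60 can be sketched in a few lines of Python. The function name and the sample document IDs are hypothetical; only the scoring rule, the sum of 1 / (k + rank) across rankings, is the standard RRF formula.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["a", "b", "c"]   # hypothetical vector-search order
sparse_hits = ["b", "d", "a"]  # hypothetical BM25 order
fused = rrf_fuse([dense_hits, sparse_hits])
# fused == ["b", "a", "d", "c"]
```

RRF only looks at ranks, so dense cosine scores and BM25 scores never need to be normalized onto a common scale.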
Confidence-driven response routing:
- >= 0.85: Answer directly
- 0.60 to < 0.85: Hedge with caveats
- 0.40 to < 0.60: Cite sources explicitly
- < 0.40: Decline to answer
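The routing thresholds above map directly onto a small function. This is a sketch: the `Route` enum and function name are hypothetical, and it assumes scores lie in [0, 1] with each band's lower bound inclusive.

```python
from enum import Enum


class Route(Enum):
    ANSWER = "answer_directly"
    HEDGE = "hedge_with_caveats"
    CITE = "cite_sources_explicitly"
    DECLINE = "decline_to_answer"


def route_by_confidence(confidence: float) -> Route:
    """Pick a response strategy from a confidence score in [0, 1]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= 0.85:
        return Route.ANSWER
    if confidence >= 0.60:
        return Route.HEDGE
    if confidence >= 0.40:
        return Route.CITE
    return Route.DECLINE
```

Checking thresholds from highest to lowest means a score such as 0.845 falls cleanly into the hedging band rather than between two bands.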
5-layer confidence pipeline: domain check → grounding → claim verification → weighted confidence score → compliance (FCRA, HIPAA, GDPR, FERPA per-tenant config)
Dual reranker system: Cross-encoder + ColBERT late-interaction, RRF fusion of both, 3-level fallback (both → one → original). Query agent with rewriting and sub-query decomposition.
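The 3-level fallback (both → one → original) might look roughly like the following. The rerankers are modeled as plain callables and the stubs are hypothetical stand-ins for the cross-encoder and ColBERT models; this is a sketch of the control flow, not the actual implementation.

```python
from typing import Callable, Sequence

# A reranker takes (query, docs) and returns the docs in a new order.
Reranker = Callable[[str, Sequence[str]], list[str]]


def rerank_with_fallback(
    query: str,
    docs: Sequence[str],
    cross_encoder: Reranker,
    colbert: Reranker,
    k: int = 60,
) -> list[str]:
    rankings: list[list[str]] = []
    for reranker in (cross_encoder, colbert):
        try:
            rankings.append(reranker(query, docs))
        except Exception:
            continue  # this reranker is unavailable; try to get by without it
    if not rankings:
        return list(docs)   # level 3: keep the original retrieval order
    if len(rankings) == 1:
        return rankings[0]  # level 2: only one reranker succeeded
    # Level 1: both succeeded, fuse their orderings with RRF.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: -scores[d])


# Hypothetical stubs standing in for real models.
def reversing_reranker(query: str, docs: Sequence[str]) -> list[str]:
    return list(reversed(docs))


def failing_reranker(query: str, docs: Sequence[str]) -> list[str]:
    raise RuntimeError("model server unavailable")


docs = ["d1", "d2", "d3"]
both_ok = rerank_with_fallback("q", docs, reversing_reranker, reversing_reranker)
one_ok = rerank_with_fallback("q", docs, failing_reranker, reversing_reranker)
none_ok = rerank_with_fallback("q", docs, failing_reranker, failing_reranker)
```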
Evaluation: Correction rate decay (primary metric), MRR, NDCG@k, confidence calibration (ECE), A/B comparison (with RAG vs without)
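Two of the retrieval metrics listed above, MRR and NDCG@k, can be computed as in this sketch. It assumes the linear-gain form of DCG and hypothetical inputs (ranked ID lists, relevant-ID sets, and a graded-gain mapping); the actual evaluation harness may differ.

```python
import math


def mrr(ranked_lists: list[list[str]], relevant_sets: list[set[str]]) -> float:
    """Mean reciprocal rank of the first relevant hit per query (0 if none)."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists) if ranked_lists else 0.0


def ndcg_at_k(ranked: list[str], gains: dict[str, float], k: int) -> float:
    """NDCG@k with linear gains: DCG = sum(gain_i / log2(i + 1))."""
    dcg = sum(
        gains.get(doc_id, 0.0) / math.log2(i + 1)
        for i, doc_id in enumerate(ranked[:k], start=1)
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical judgments for two queries.
mrr_score = mrr([["a", "b"], ["c", "d"]], [{"b"}, {"c"}])  # (1/2 + 1/1) / 2
gains = {"a": 3.0, "b": 2.0, "c": 1.0}
perfect = ndcg_at_k(["a", "b", "c"], gains, k=3)
reversed_order = ndcg_at_k(["c", "b", "a"], gains, k=3)
```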
Stack: pgvector · Pinecone Serverless · Pinecone Inference API · PostgreSQL 16 · FastAPI · MCP · Docker
- Pydantic AI for type-safe agent orchestration with structured outputs and dependency injection
- LangGraph (StateGraph) for workflow orchestration and checkpointing
- Raw Anthropic SDK tool-use for maximum control when frameworks add too much abstraction
- MCP servers with FastMCP, discovery endpoints, multi-server catalogs
- A2A v0.3 agent cards for agent-to-agent communication
- Google ADK and Agno for additional agent patterns
- Temporal for durable execution with retry logic
- Composio for 500+ tool integrations
- OpenTelemetry → Kafka Streams → ClickHouse full pipeline
- 8-processor OTel collector with tail-based sampling and attribute enrichment
- Kafka Streams dual-path topology: real-time aggregation + raw archival
- ClickHouse MergeTree with tiered storage and automatic rollup
- SLO framework: latency P99, availability, throughput, error rate with burn-rate alerts
- T-Digest, HyperLogLog, and Welford's algorithm for streaming percentiles, cardinality, and variance, respectively
- TensorFlow/Keras + PyTorch + scikit-learn training pipelines
- MLflow experiment tracking and model versioning
- Vertex AI + SageMaker deployment with auto-scaling
- LoRA/QLoRA fine-tuning, DistilBERT for classification tasks
- Terraform for reproducible GCP/AWS infrastructure
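For the SLO burn-rate alerts mentioned in the list above, a minimal sketch of the calculation follows. Burn rate is the observed error rate divided by the error budget (1 minus the SLO target). The two-window check and the 14.4 threshold follow the commonly cited multi-window alerting pattern and are assumptions here, not confirmed settings of this framework.

```python
def burn_rate(errors: int, total: int, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are accruing."""
    if total == 0:
        return 0.0
    error_budget = 1.0 - slo_target
    return (errors / total) / error_budget


def should_page(
    long_window: tuple[int, int],
    short_window: tuple[int, int],
    slo_target: float = 0.999,
    threshold: float = 14.4,
) -> bool:
    """Page only if both windows burn fast: the long window shows the
    problem is significant, the short window shows it is still happening."""
    return (
        burn_rate(*long_window, slo_target) >= threshold
        and burn_rate(*short_window, slo_target) >= threshold
    )


# Hypothetical (errors, total) counts for a 1-hour and a 5-minute window.
rate = burn_rate(20, 1000, slo_target=0.999)      # 2% errors against a 0.1% budget
paging = should_page((20, 1000), (3, 100))
recovered = should_page((20, 1000), (0, 100))
```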
Agent Systems: Pydantic AI · LangGraph · MCP · A2A · Anthropic SDK · Google ADK · Agno · Composio · Temporal · n8n
RAG & Search: pgvector · Pinecone · Qdrant · ChromaDB · HNSW · BM25 · RRF · Cross-Encoder Reranking · ColBERT · Hybrid Search · Recursive Chunking · Confidence Routing · Injection Defense
ML/AI: PyTorch · TensorFlow · HuggingFace · BERT/DistilBERT · LoRA/QLoRA · SageMaker · Vertex AI · MLflow · AWS Bedrock · LLM-as-Judge
Observability: OpenTelemetry · Kafka Streams · ClickHouse · T-Digest · HyperLogLog · SLO Frameworks
Infrastructure: Python · TypeScript · FastAPI · Next.js · React · PostgreSQL · Redis · Docker · Kubernetes · Terraform · AWS · GCP · gRPC · GitHub Actions



