Wington Brito wingtonrbrito

Wington Brito

Senior AI Engineer building production AI systems — multi-agent orchestration, RAG & search infrastructure, observability pipelines, and streaming data platforms. 12+ years shipping systems at scale, 3+ years focused on AI/ML engineering.

I don't build prototypes. Everything I ship runs in production.

🤖 AI & Infrastructure Portfolio

Project	Description	Tech Stack
Agentic Patterns	Multi-protocol AI agent platform with production observability	FastAPI · Pydantic AI · MCP · A2A · gRPC · SSE · PostgreSQL · pgvector · Kafka Streams · ClickHouse · OpenTelemetry
MLOps-GCP	MLOps platform on GCP with multi-framework training and RAG pipelines	FastAPI · Vertex AI · BigQuery · Cloud Run · Terraform · MLflow
DocumentMind AI	Enterprise RAG system on AWS Bedrock	LangChain · AWS Bedrock · Pinecone · FastAPI
AgentBench	Multi-agent evaluation framework	LangGraph · Agno · MCP Protocol · Docker
BetterBets AI	Recommendation engine with BERT classification	SageMaker · PyTorch · Pinecone · HuggingFace
VoiceFlow POS	Voice-enabled POS with multi-agent orchestration	LangGraph · Whisper · ChromaDB · Streamlit
Field GenAI MCP	Multi-document chat with MCP Protocol	Socket.io · Redis · Gemini AI · MCP
SD-MLOps-Studio	Stable Diffusion platform with LoRA training	PyTorch · ComfyUI · FastAPI · Docker · Kubernetes

🏗️ Agentic Patterns — Multi-Protocol Agent Infrastructure

Production-grade architecture for AI agent platforms across multiple industry verticals.

3-Layer Architecture:

Agent Layer — Pydantic AI runtime, tool orchestration, RAG with ColBERT reranking, cognitive memory (episodic/semantic/procedural), MCP tool integrations
Observability Layer — OpenTelemetry → Kafka Streams (Java 21) → ClickHouse with tiered storage (hot/warm/cold). SLO error budget policies, T-Digest/HyperLogLog/Welford's for streaming stats
Domain Verticals — Built across Healthcare, Education, Logistics, Finance, Trading, and Creative industries

7 Protocols: REST · MCP · A2A v0.3 · gRPC · Webhook · Temporal · SSE

Agentic Patterns Implemented:

Pattern	How It's Applied
ReAct Loop	Reusable tool-use loop with SSE streaming, session persistence, and parallel tool execution
Supervisor + Specialist	Orchestrator delegates to domain-specific agents with mandatory execution chains and behavioral detection
Role-Based Routing	Same agent backend serves different personas with different tools, prompts, and data visibility per role
Human-in-the-Loop	Two-step confirmation: agent drafts, user confirms, agent executes — for approvals, outreach, bookings
Cross-Agent State	Agents share state across domains — an action in one agent triggers effects in another
Parallel Tool Execution	Multiple tool calls resolved concurrently within a single agent turn
Multi-Protocol Endpoints	Single registration generates REST + MCP + A2A + SSE endpoints per vertical
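
As an illustration of the Parallel Tool Execution row, the sketch below resolves several tool calls within one agent turn using `asyncio.gather`. The tool functions (`lookup_order`, `check_inventory`) and the `(tool, kwargs)` call format are hypothetical stand-ins, not the platform's actual interfaces.

```python
import asyncio


async def run_tools_concurrently(calls):
    """Resolve every tool call of a single agent turn concurrently.

    `calls` is a list of (tool_fn, kwargs) pairs (a hypothetical format).
    Results come back in the same order as `calls`; a failing tool yields
    its exception object instead of aborting the whole turn.
    """
    tasks = [tool(**kwargs) for tool, kwargs in calls]
    return await asyncio.gather(*tasks, return_exceptions=True)


# Hypothetical tools that simulate I/O-bound work.
async def lookup_order(order_id: str) -> dict:
    await asyncio.sleep(0.01)
    return {"order_id": order_id, "status": "shipped"}


async def check_inventory(sku: str) -> dict:
    await asyncio.sleep(0.01)
    return {"sku": sku, "in_stock": 3}
```

Calling `asyncio.run(run_tools_concurrently([(lookup_order, {"order_id": "A1"}), (check_inventory, {"sku": "X9"})]))` returns both results in call order, so the agent can map each one back to its originating tool call.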

Guardrails: 5-layer system (hallucination, compliance, PII, toxicity, industry-specific)

Stack: FastAPI · Pydantic AI · MCP · A2A · gRPC · SSE · PostgreSQL · pgvector · Qdrant · Redis · Kafka Streams · ClickHouse · OpenTelemetry · Composio · Temporal · Docker

🔍 Knowledge Engine — Production RAG Infrastructure

Production RAG with swappable backends: the same pipeline abstraction runs on self-hosted pgvector or managed Pinecone with zero code changes.

End-to-end pipeline:

Ingest → Chunk (recursive 1500/150) → Enrich (SHA-256, category, timestamps)
  → Scan (injection detection) → Embed → Index
  → Query → Hybrid Search (dense + sparse) → Rerank → Guardrails → Generate
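
The Chunk and Enrich stages above might look roughly like the sketch below. It assumes that "1500/150" means a chunk size of 1500 characters with a 150-character overlap, and it uses a plain sliding window in place of a true recursive splitter. The function names and metadata keys are illustrative only.

```python
import hashlib
from datetime import datetime, timezone


def chunk_text(text: str, size: int = 1500, overlap: int = 150) -> list[str]:
    """Sliding-window split (a simplification of recursive splitting).

    Each chunk starts `size - overlap` characters after the previous one,
    so consecutive chunks share `overlap` characters.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop early enough that no chunk lies entirely inside the previous one.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def enrich(chunk: str, category: str) -> dict:
    """Attach a content hash, a category, and an ingestion timestamp."""
    return {
        "text": chunk,
        "sha256": hashlib.sha256(chunk.encode("utf-8")).hexdigest(),
        "category": category,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
```

The SHA-256 digest gives each chunk a stable identity, which is one common way to skip re-embedding content that has not changed between ingests.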

Two production implementations against the same abstract base classes (ABCs):

Component	pgvector (Self-Hosted)	Pinecone (Managed)
Indexing	HNSW + GIN indexes, manual tuning	Serverless, auto-scaling
Hybrid Search	Dense + BM25 + RRF fusion (k=60) in single SQL query	Dense + sparse in single query, alpha-weighted fusion
Reranking	ms-marco-MiniLM-L-6-v2 (local Docker)	bge-reranker-v2-m3 (Pinecone managed) + local fallback
Embeddings	all-MiniLM-L6-v2 (384d, local)	multilingual-e5-large (1024d, Pinecone Inference API)
Multi-tenancy	`tenant_id` column + WHERE clause	Namespace isolation (physically separated)
Fusion	RRF in Python	Built-in (single API call)
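
For reference, reciprocal rank fusion (RRF) with k=60 can be sketched in a few lines of Python. This is a generic version of the technique, not the project's code: each document scores the sum of 1 / (k + rank) over every ranking in which it appears. The function name and input format are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Ranks are 1-based. A document missing from a ranking contributes
    nothing for that ranking.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With a dense ranking `["a", "b", "c"]` and a sparse ranking `["b", "d", "a"]`, the fused order is `b, a, d, c`: documents found by both retrievers rise above those found by only one.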

Confidence-driven response routing:

>= 0.85: Answer directly
0.60 to < 0.85: Hedge with caveats
0.40 to < 0.60: Cite sources explicitly
< 0.40: Decline to answer
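
The routing bands above reduce to a small threshold function. In this sketch the bands are treated as half-open intervals, so a score such as 0.845 falls into the hedging band. The function name and the returned labels are illustrative.

```python
def route_response(confidence: float) -> str:
    """Map a confidence score in [0, 1] to a response strategy."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= 0.85:
        return "answer"   # answer directly
    if confidence >= 0.60:
        return "hedge"    # answer with caveats
    if confidence >= 0.40:
        return "cite"     # cite sources explicitly
    return "decline"      # decline to answer
```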

5-layer confidence pipeline: domain check → grounding → claim verification → weighted confidence score → compliance (FCRA, HIPAA, GDPR, FERPA per-tenant config)

Dual reranker system: Cross-encoder + ColBERT late-interaction, RRF fusion of both, 3-level fallback (both → one → original). Query agent with rewriting and sub-query decomposition.
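
One way to express the 3-level fallback (both → one → original) is sketched below. The rerankers and the fusion step are passed in as plain callables. Their signatures are assumptions for illustration and do not reflect the real cross-encoder or ColBERT interfaces.

```python
from typing import Callable, Sequence

# Hypothetical signature: (query, docs) -> docs reordered by relevance.
Ranker = Callable[[str, Sequence[str]], list[str]]


def rerank_with_fallback(
    query: str,
    docs: Sequence[str],
    rerankers: Sequence[Ranker],
    fuse: Callable[[list[list[str]]], list[str]],
) -> tuple[list[str], str]:
    """Return (ranking, level), where level is 'both', 'one', or 'original'."""
    rankings = []
    for reranker in rerankers:
        try:
            rankings.append(reranker(query, docs))
        except Exception:
            # A failed reranker is skipped rather than failing the request.
            continue
    if len(rankings) >= 2:
        return fuse(rankings), "both"
    if len(rankings) == 1:
        return rankings[0], "one"
    # Neither reranker succeeded: keep the retrieval order unchanged.
    return list(docs), "original"
```

Returning the level alongside the ranking lets the caller log how often the system runs in a degraded mode.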

Evaluation: Correction rate decay (primary metric), MRR, NDCG@k, confidence calibration (ECE), A/B comparison (with RAG vs without)
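
Two of the metrics listed above, MRR and expected calibration error (ECE), can be computed as in the following sketch. It uses the standard textbook definitions with equal-width confidence bins. The function names and input shapes are assumptions, not the project's evaluation harness.

```python
def mean_reciprocal_rank(ranked_results: list[list[str]], relevant: list[set[str]]) -> float:
    """Average of 1 / rank of the first relevant document per query (0 if none)."""
    total = 0.0
    for ranking, rel in zip(ranked_results, relevant):
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break
    return total / len(ranked_results)


def expected_calibration_error(confidences: list[float], correct: list[bool], n_bins: int = 10) -> float:
    """Weighted average gap between mean confidence and accuracy per bin."""
    n = len(confidences)
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / n) * abs(avg_conf - accuracy)
    return ece
```

A well-calibrated system has an ECE near zero, which matters here because the confidence score decides whether a response is answered, hedged, or declined.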

Stack: pgvector · Pinecone Serverless · Pinecone Inference API · PostgreSQL 16 · FastAPI · MCP Protocol · Docker

🧠 What I Work On

Multi-Agent Systems

Pydantic AI for type-safe agent orchestration with structured outputs and dependency injection
LangGraph (StateGraph) for workflow orchestration and checkpointing
Raw Anthropic SDK tool-use for maximum control when frameworks add too much abstraction
MCP servers built with FastMCP, with discovery endpoints and multi-server catalogs
A2A v0.3 agent cards for agent-to-agent communication
Google ADK and Agno for additional agent patterns
Temporal for durable execution with retry logic
Composio for 500+ tool integrations

Observability & Streaming

OpenTelemetry → Kafka Streams → ClickHouse full pipeline
8-processor OTel collector with tail-based sampling and attribute enrichment
Kafka Streams dual-path topology: real-time aggregation + raw archival
ClickHouse MergeTree with tiered storage and automatic rollup
SLO framework: latency P99, availability, throughput, error rate with burn-rate alerts
T-Digest, HyperLogLog, and Welford's algorithm for streaming percentiles, cardinality, and variance, respectively
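
Of the three streaming algorithms in this list, Welford's is the simplest to show. The sketch below keeps a running mean and variance in constant memory, one observation at a time. The class name is illustrative, and the production version presumably lives in the Kafka Streams (Java) topology rather than in Python.

```python
class RunningStats:
    """Single-pass mean and sample variance using Welford's algorithm."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the updated mean, which keeps the recurrence numerically stable.
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); 0.0 until two values are seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0
```

Because each update touches only three numbers, the same state can be stored per key (for example, per endpoint or tenant) without buffering raw latency samples.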

MLOps & Training

TensorFlow/Keras + PyTorch + sklearn training pipelines
MLflow experiment tracking and model versioning
Vertex AI + SageMaker deployment with auto-scaling
LoRA/QLoRA fine-tuning, DistilBERT for classification tasks
Terraform for reproducible GCP/AWS infrastructure

🛠️ Tech Stack

Agent Systems: Pydantic AI · LangGraph · MCP · A2A · Anthropic SDK · Google ADK · Agno · Composio · Temporal · n8n

RAG & Search: pgvector · Pinecone · Qdrant · ChromaDB · HNSW · BM25 · RRF · Cross-Encoder Reranking · ColBERT · Hybrid Search · Recursive Chunking · Confidence Routing · Injection Defense

ML/AI: PyTorch · TensorFlow · HuggingFace · BERT/DistilBERT · LoRA/QLoRA · SageMaker · Vertex AI · MLflow · AWS Bedrock · LLM-as-Judge

Observability: OpenTelemetry · Kafka Streams · ClickHouse · T-Digest · HyperLogLog · SLO Frameworks

Infrastructure: Python · TypeScript · FastAPI · Next.js · React · PostgreSQL · Redis · Docker · Kubernetes · Terraform · AWS · GCP · gRPC · GitHub Actions

📫 Connect

💼 LinkedIn
✉️ Email