I work at the intersection of applied mathematics, machine learning, and distributed systems.
I am less interested in training another model and more interested in making intelligent systems behave correctly under real constraints—latency, partial data, failure modes, scale, and ambiguity.
My background in Mathematics & Scientific Computing shapes how I approach AI: I think in terms of assumptions, trade-offs, stability, and convergence, not just APIs.
Most of my work focuses on the last mile of AI systems:
- Retrieval-Augmented Generation that does not hallucinate under pressure
- Agentic workflows that degrade gracefully instead of failing silently
- Low-latency inference pipelines (< 1 s) with predictable behavior
- Systems that can explain why they produced an output, not just what
I care about repeatability, observability, and failure analysis as much as accuracy.
- Linear Algebra, Optimization, Probability
- ML model behavior analysis (not blind fine-tuning)
- Async systems and real-time data pipelines
- Designing for latency, throughput, and cost
| Area | Stack |
|---|---|
| LLM & RAG | HuggingFace, vLLM, FAISS, Milvus, BGE Rerankers |
| Agentic Systems | LangChain, LlamaIndex, custom orchestration |
| Backend | Python, FastAPI (async), WebSockets |
| Streaming / Infra | Kafka, Docker, Kubernetes |
| ML / Math | PyTorch, TensorFlow, NumPy, SciPy |
I choose tools based on constraints, not trends.
Most of my recent production work is proprietary. Below are the kinds of problems I solve.
Context: Streaming market data, news, and macro signals
Problem: Generate reasoning-aware outputs with sub-second latency
Approach:
- Hybrid RAG combining historical embeddings with live streams
- Task-specialized models instead of one large general model
- Aggressive caching + async pipelines
Outcome: ~85% latency reduction with more stable outputs
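
The production code is proprietary, so the following is only a minimal sketch of the caching and async ideas in the approach above, not the actual system. All names (`TTLCache`, `cached_fetch`, `gather_context`, and the shape of `sources`) are hypothetical and chosen for illustration. It assumes each retrieval source is an async callable, queries them concurrently, and drops any source that misses the latency budget instead of blocking the response.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, List, Optional, Tuple

# Hypothetical type: an async retrieval source that maps a query to a text snippet.
Fetch = Callable[[str], Awaitable[str]]


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        hit = self._store.get(key)
        if hit is None:
            return None
        stored_at, value = hit
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # entry is stale, evict it
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self._store[key] = (time.monotonic(), value)


async def cached_fetch(key: str, query: str, fetch: Fetch, cache: TTLCache) -> str:
    """Return the cached value for `key`, calling `fetch` only on a miss."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = await fetch(query)
    cache.put(key, value)
    return value


async def gather_context(
    query: str, sources: Dict[str, Fetch], cache: TTLCache, timeout: float
) -> List[str]:
    """Query all sources concurrently; skip any that exceed `timeout` seconds."""

    async def one(name: str, fetch: Fetch) -> Optional[str]:
        try:
            return await asyncio.wait_for(
                cached_fetch(f"{name}:{query}", query, fetch, cache), timeout
            )
        except asyncio.TimeoutError:
            return None  # degrade: answer with whatever arrived in time

    results = await asyncio.gather(*(one(n, f) for n, f in sources.items()))
    return [r for r in results if r is not None]
```

The trade-off in this sketch is explicit: a slow source costs at most `timeout` seconds and reduces context rather than stalling the whole request.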
Context: Enterprise automation in noisy, changing environments
Problem: Traditional RPA breaks when UIs or workflows change
Approach:
- Vision-Language models to interpret screens semantically
- Speech pipelines with Whisper + VAD + diarization
- Robust handling of partial or overlapping inputs
Outcome: Systems that adapt instead of failing on small changes
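
One small piece of handling partial or overlapping inputs can be sketched without any model code. The example below assumes a VAD stage emits `(start, end)` speech segments in seconds, possibly out of order and overlapping, and merges them before transcription. The function name `merge_segments` and the `max_gap` parameter are hypothetical; this is an illustration of the idea, not the pipeline described above.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)


def merge_segments(segments: List[Segment], max_gap: float = 0.2) -> List[Segment]:
    """Merge segments that overlap or are separated by at most `max_gap` seconds."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if end <= start:
            continue  # drop empty or inverted segments
        if merged and start - merged[-1][1] <= max_gap:
            # Overlapping or nearly adjacent: extend the previous segment.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

For example, `merge_segments([(0.0, 1.0), (0.9, 2.0), (3.0, 3.5)])` joins the first two segments and leaves the third separate.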
I document the hard parts.
- Applied Soft Computing (Under Review): Co-author of a Dynamic Adaptive Large Neighborhood Search (DALNS) algorithm for probabilistic multi-objective routing problems.
- Technical Writing: Deep dives on topic modeling (BERTopic), object detection (YOLO), and applied ML systems.
I believe good research should survive contact with production.
This profile contains:
- Research experiments
- System prototypes
- Exploratory implementations
- Iterative work (not just polished demos)
Some repositories are intentionally raw—they show how an idea evolved.
If you’re looking for flashy demos, this may disappoint.
If you care about engineering judgment, you’ll feel at home.
