Data Scientist with 3+ years building production ML systems. I architect AI pipelines that process terabyte-scale datasets, orchestrate multi-agent workflows, and deploy models that researchers and engineers actually use.
- Multi-agent systems orchestrating LLMs for document classification, entity extraction, and semantic search
- Computer vision pipelines for page layout analysis, image segmentation, and fine-grained classification
- Predictive models integrated into mobile apps and production environments
- RAG architectures that retrieve, reason, and generate insights from massive text corpora
- MLOps frameworks for remote fine-tuning, model deployment, and API orchestration
Machine Learning Data Scientist @ MiiE Lab, University of Chicago
- Building multi-agent pipelines (Gemini 2.0) for 1TB+ corpus processing
- Automating face-name matching with orchestrated LLM workflows
- Extracting structured insights from 1.2M research papers
- Designing CNNs for document layout classification
- AI agents | PyTorch | HuggingFace | GCP | Python | SQL | LLM | LMM | CV
Architected 3-agent system with DeepSeek-V3 for product search and support. Built hybrid retrieval tools with semantic search, end-to-end LLM-database integration, and Streamlit UI. Stack: AI Agents, Agent Orchestration, DeepSeek-V3, Semantic Search, Streamlit
Multimodal search application using CLIP-ViT-B/32 for embedding-based similarity matching across images and text. Deployed on GCP with FAISS vector storage. Stack: CLIP, FAISS, GCP, Streamlit
RAG-based system generating role-specific questions from curated job descriptions. Chain-of-Thought prompting evaluates responses and delivers personalized feedback. Stack: RAG, Chain-of-Thought, LLMs, Streamlit
Cloud platform with Streamlit UI for real-time model tuning on GPU VMs. Models deployed via API for immediate downstream integration. Stack: Cloud VMs, GPU, Flask API, Streamlit
ML Data Scientist @ University of Chicago (Jun 2025 - Present)
- Built agent-orchestrated pipeline (Gemini 2.0 Flash) automating face-name matching at 92% F1
- Leveraged few-shot prompting for page layout classification across 1TB+ yearbook corpus
- Prompted GPT-5-nano to extract insights from 1.2M papers, saving 12K review hours
Data Scientist @ Schneider Electric (Aug 2021 - Jul 2023)
- Built LightGBM classifier predicting equipment failures 5 days in advance at 87% precision
- Integrated predictive maintenance model into mobile app used by 10+ field engineers
- Engineered features from high-frequency sensor data and built SQL pipeline for scalable training
Graduate Research Assistant @ CIRES (May 2024 - Aug 2024)
- Designed Siamese CNN architecture on satellite data for regional drought forecasting
- Identified drought signals by linking anomalies in monthly averages to observed events
Graduate Student Assistant @ CIRES (Jan 2024 - Apr 2024)
- Trained four XGBoost models on DistilBERT embeddings, achieving up to 78% F1 across climate-change text classification tasks
- Deployed models as a Flask REST API integrated into Label Studio, automating annotation and saving 2 hours/week of manual effort
┌──────────────────────────────────────────────────────┐
│ 12K+ review hours saved  │ 1TB+ data processed       │
│ 6 industrial assets      │ 1.2M papers analyzed      │
│ 100+ researchers enabled │ 10+ engineers using my ML │
└──────────────────────────────────────────────────────┘
- From Scratch: MLOps Fine-Tuning System — Building production ML infrastructure
- Can ML Solve Your Problem? — Framework for determining when ML is the right solution
- MY DATA ACTION — Moving from analysis to implementation
- More than a Chatbot — Workshop at Environmental Data Science Summit 2025
- Mathematical Problems in Industry 2024 — Research on noise reduction in image processing
Chicago, IL • Open to relocation • trrshla@gmail.com