Hi, I'm Trishala Thakur

About

Data Scientist with 3+ years building production ML systems. I architect AI pipelines that process terabyte-scale datasets, orchestrate multi-agent workflows, and deploy models that researchers and engineers actually use.

What I Build

Multi-agent systems orchestrating LLMs for document classification, entity extraction, and semantic search

Computer vision pipelines for page layout analysis, image segmentation, and fine-grained classification

Predictive models integrated into mobile apps and production environments

RAG architectures that retrieve, reason, and generate insights from massive text corpora

MLOps frameworks for remote fine-tuning, model deployment, and API orchestration

Current

Machine Learning Data Scientist @ MiiE Lab, University of Chicago

Building multi-agent pipelines (Gemini 2.0) for 1TB+ corpus processing
Automating face-name matching with orchestrated LLM workflows
Extracting structured insights from 1.2M research papers
Designing CNNs for document layout classification

AI agents | PyTorch | HuggingFace | GCP | Python | SQL | LLM | LMM | CV

Tech Stack

Core

ML & DL

AI & Specialized

Infrastructure

Projects

Multi-Agent E-Commerce Bot

Architected 3-agent system with DeepSeek-V3 for product search and support. Built hybrid retrieval tools with semantic search, end-to-end LLM-database integration, and Streamlit UI.

Stack: AI Agents, Agent Orchestration, DeepSeek-V3, Semantic Search, Streamlit

Lost & Found AI

Multimodal search application using CLIP-ViT-B/32 for embedding-based similarity matching across images and text. Deployed on GCP with FAISS vector storage.

Stack: CLIP, FAISS, GCP, Streamlit
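
A minimal sketch of the embedding-based matching described above, assuming the embeddings are already computed (for example by CLIP) and using a brute-force cosine-similarity search in place of FAISS. The toy vectors, file names, and the `top_k_matches` helper are hypothetical and only illustrate the idea; they are not taken from the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_matches(query, index, k=2):
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-d vectors standing in for CLIP embeddings (CLIP ViT-B/32 produces 512-d vectors).
ITEM_EMBEDDINGS = {
    "black_backpack.jpg": [0.9, 0.1, 0.0],
    "red_umbrella.jpg": [0.1, 0.9, 0.1],
    "blue_water_bottle.jpg": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a text query such as "lost black bag".
query_embedding = [0.8, 0.2, 0.1]
matches = top_k_matches(query_embedding, ITEM_EMBEDDINGS, k=2)
for name, score in matches:
    print(f"{name}: {score:.3f}")
```

Because image and text embeddings share one vector space, the same search serves both text-to-image and image-to-image queries. A vector index such as FAISS replaces the linear scan once the item collection grows large.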

Mock Interview Agent

RAG-based system generating role-specific questions from curated job descriptions. Chain-of-Thought prompting evaluates responses and delivers personalized feedback.

Stack: RAG, Chain-of-Thought, LLMs, Streamlit

Remote Fine-tuning Framework

Cloud platform with Streamlit UI for real-time model tuning on GPU VMs. Models deployed via API for immediate downstream integration.

Stack: Cloud VMs, GPU, Flask API, Streamlit

Professional Timeline

ML Data Scientist @ University of Chicago (Jun 2025 - Present)

Built agent-orchestrated pipeline (Gemini 2.0 Flash) automating face-name matching at 92% F1
Leveraged few-shot prompting for page layout classification across 1TB+ yearbook corpus
Prompted GPT-5-nano to extract insights from 1.2M papers, saving 12K review hours

Data Scientist @ Schneider Electric (Aug 2021 - Jul 2023)

Built LightGBM classifier predicting equipment failures 5 days in advance at 87% precision
Integrated predictive maintenance model into mobile app used by 10+ field engineers
Engineered features from high-frequency sensor data and built SQL pipeline for scalable training

Graduate Research Assistant @ CIRES (May 2024 - Aug 2024)

Designed Siamese CNN architecture on satellite data for regional drought forecasting
Identified drought signals by linking anomalies in monthly averages to observed events

Graduate Student Assistant @ CIRES (Jan 2024 - Apr 2024)

Trained four XGBoost models on DistilBERT embeddings, achieving up to 78% F1 across climate change text classification tasks
Deployed models as a Flask REST API integrated into Label Studio, automating annotation and saving 2 hours/week of manual effort

Impact

┌─────────────────────────────────────────────────────────────┐
│  12K+ review hours saved     │  1TB+ data processed         │
│  6 industrial assets         │  1.2M papers analyzed        │
│  100+ researchers enabled    │  10+ engineers using my ML   │
└─────────────────────────────────────────────────────────────┘

Writing

From Scratch: MLOps Fine-Tuning System — Building production ML infrastructure
Can ML Solve Your Problem? — Framework for determining when ML is the right solution
MY DATA ACTION — Moving from analysis to implementation
More than a Chatbot — Workshop at Environmental Data Science Summit 2025
Mathematical Problems in Industry 2024 — Research on noise reduction in image processing

Chicago, IL • Open to relocation • trrshla@gmail.com
