trishthakur/README.md

Hi, I'm Trishala Thakur

Portfolio LinkedIn Medium Email

About

Data Scientist with 3+ years building production ML systems. I architect AI pipelines that process terabyte-scale datasets, orchestrate multi-agent workflows, and deploy models that researchers and engineers actually use.

What I Build

  • Multi-agent systems orchestrating LLMs for document classification, entity extraction, and semantic search
  • Computer vision pipelines for page layout analysis, image segmentation, and fine-grained classification
  • Predictive models integrated into mobile apps and production environments
  • RAG architectures that retrieve, reason, and generate insights from massive text corpora
  • MLOps frameworks for remote fine-tuning, model deployment, and API orchestration

Current

Machine Learning Data Scientist @MiiE Lab, University of Chicago

  • Building multi-agent pipelines (Gemini 2.0) for 1TB+ corpus processing
  • Automating face-name matching with orchestrated LLM workflows
  • Extracting structured insights from 1.2M research papers
  • Designing CNNs for document layout classification
  • AI agents | PyTorch | HuggingFace | GCP | Python | SQL | LLM | LMM | CV

Tech Stack

Core: Python, R, SQL, Linux

ML & DL: PyTorch, TensorFlow, scikit-learn, XGBoost

AI & Specialized: HuggingFace, OpenAI, OpenCV, LangChain

Infrastructure: GCP, Docker, Git, APIs

Projects

Architected 3-agent system with DeepSeek-V3 for product search and support. Built hybrid retrieval tools with semantic search, end-to-end LLM-database integration, and Streamlit UI.

Stack: AI Agents, Agent Orchestration, DeepSeek-V3, Semantic Search, Streamlit

Multimodal search application using CLIP-ViT-B/32 for embedding-based similarity matching across images and text. Deployed on GCP with FAISS vector storage.

Stack: CLIP, FAISS, GCP, Streamlit
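
To make the embedding-based matching idea concrete, here is a minimal sketch of cosine-similarity top-k search in plain Python. It is not the project's code: the function names (`normalize`, `top_k`), the file names, and the 3-dimensional toy vectors are all made up for illustration. Real CLIP ViT-B/32 embeddings have 512 dimensions, and a FAISS index would likely replace the brute-force loop shown here.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query, index, k=2):
    """Return the k (item_id, score) pairs with the highest cosine similarity."""
    q = normalize(query)
    scored = [
        (item_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for item_id, vec in index.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy image embeddings (hypothetical values, not real CLIP output).
index = {
    "red_backpack.jpg": [0.9, 0.1, 0.0],
    "blue_umbrella.jpg": [0.0, 0.2, 0.9],
    "black_wallet.jpg": [0.1, 0.9, 0.1],
}

# Stand-in for the text embedding of a query such as "red bag".
query = [0.8, 0.2, 0.1]

results = top_k(query, index, k=2)
for item_id, score in results:
    print(f"{item_id}: {score:.3f}")
```

Because both vectors are normalized first, the dot product is the cosine similarity, which is the same quantity an inner-product vector index computes over unit-length embeddings.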

RAG-based system generating role-specific questions from curated job descriptions. Chain-of-Thought prompting evaluates responses and delivers personalized feedback.

Stack: RAG, Chain-of-Thought, LLMs, Streamlit

Cloud platform with Streamlit UI for real-time model tuning on GPU VMs. Models deployed via API for immediate downstream integration.

Stack: Cloud VMs, GPU, Flask API, Streamlit

Professional Timeline

ML Data Scientist @ University of Chicago (Jun 2025 - Present)

  • Built agent-orchestrated pipeline (Gemini 2.0 Flash) automating face-name matching at 92% F1
  • Leveraged few-shot prompting for page layout classification across 1TB+ yearbook corpus
  • Prompted GPT-5-nano to extract insights from 1.2M papers, saving 12K review hours

Data Scientist @ Schneider Electric (Aug 2021 - Jul 2023)

  • Built LightGBM classifier predicting equipment failures 5 days in advance at 87% precision
  • Integrated predictive maintenance model into mobile app used by 10+ field engineers
  • Engineered features from high-frequency sensor data and built SQL pipeline for scalable training

Graduate Research Assistant @ CIRES (May 2024 - Aug 2024)

  • Designed Siamese CNN architecture on satellite data for regional drought forecasting
  • Identified drought signals by linking anomalies in monthly averages to observed events

Graduate Student Assistant @ CIRES (Jan 2024 - Apr 2024)

  • Trained four XGBoost models on DistilBERT embeddings, achieving up to 78% F1 across climate change text classification tasks
  • Deployed models as a Flask RESTful API integrated with Label Studio, automating annotation and saving 2 hours/week of manual effort

Impact

┌─────────────────────────────────────────────────────────────┐
│  12K+ review hours saved     │  1TB+ data processed         │
│  6 industrial assets         │  1.2M papers analyzed        │
│  100+ researchers enabled    │  10+ engineers using my ML   │
└─────────────────────────────────────────────────────────────┘

Chicago, IL • Open to relocation • trrshla@gmail.com

Pinned

  1. partSelectBot (Public)

    Multi-agent orchestrated chatbot assistant for an e-commerce website

    Jupyter Notebook

  2. Fact-Verification-of-LLM-Outputs (Public)

    This project focuses on fact verification of outputs generated by Google’s Gemma language models. The goal is to automatically check whether statements produced by the model are factually accurate …

    Jupyter Notebook 1

  3. mobilenet-finetuning-local (Public)

    A scalable GCP framework for fine-tuning models with a Streamlit UI for hyperparameter tuning, GPU-powered training, and RESTful API integration for model deployment in pipelines (note: the repo sh…

    Python

  4. MoreThanAChatBot (Public)

    Forked from ahuang11/MoreThanAChatBot

    Resources about using LLMs beyond a chatbot.

  5. Mock-Interview-Agent (Public)

    An AI-powered interview practice system that generates role-specific questions and provides personalized feedback using Retrieval-Augmented Generation (RAG) and Chain-of-Thought evaluation.

    Python 1

  6. lost-and-found-ai (Public)

    A multimodal search application that helps users find lost items using AI-powered image and text matching. Built with OpenAI's CLIP (Contrastive Language-Image Pre-training) model for semantic simi…

    Python