유미 LimPark996

About Me

name: Yumi Park
location: Seoul, Korea
education: M.S. Statistics, Dongguk University
focus: RAG Systems · Video Retrieval · Data Pipelines

background:
  - Designed and built RAG systems for enterprise document search
  - Built a Video RAG prototype for a government R&D project (passed the first-round selection)
  - ML modeling at EdTech company (Woongjin ThinkBig AI Labs)
  - Taught generative AI to non-developers at Samsung C&T AI Academy
  - Wrote technical deep-dives on VQGAN, Transformer, BERT source code (byumm315.tistory.com)

Featured Projects

VideoRAG — AI-Powered Video Retrieval & Synthesis

Hybrid retrieval pipeline combining BM25, InternVideo2, and ColBERT. Passed the first-round selection for government R&D funding.

InternVideo2 FAISS ColBERT BM25 Runway DINOv2 C2PA Gradio

Indexed 7,010 MSR-VTT videos with hybrid search: BM25 · Dense · WRRF fusion · ColBERT · ITM reranking
MSR-VTT 1k-A benchmark: ITC dense retrieval alone R@1 3.5% → 44.4% after full ITM reranking (paper: 51.9%; the 7.5%p gap is attributed to unresolved ITC collapse)
PD workstation with Scene Graph → 2-path routing (USE_AS_IS / TRANSFORM)
DINOv2 transition scoring + DreamColour 3D LUT color grading + C2PA ES256 provenance signing
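
"WRRF fusion" in the pipeline above merges the BM25 and dense rankings before reranking. The sketch below shows weighted reciprocal rank fusion in general form, assuming each retriever returns video IDs best-first; the weights, the video IDs, and the smoothing constant `k = 60` (the usual default from the original RRF formulation) are illustrative assumptions, not the project's actual settings.

```python
from collections import defaultdict


def weighted_rrf(rankings: dict[str, list[str]],
                 weights: dict[str, float],
                 k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists: score(d) = sum over retrievers r of w_r / (k + rank_r(d))."""
    scores: dict[str, float] = defaultdict(float)
    for name, ranked_ids in rankings.items():
        w = weights.get(name, 1.0)
        # Ranks start at 1; a document missing from a list gets no contribution from it.
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += w / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical top-3 results from two retrievers (IDs and weights are made up).
rankings = {
    "bm25": ["video12", "video7", "video3"],
    "dense": ["video7", "video3", "video99"],
}
weights = {"bm25": 0.4, "dense": 0.6}
fused = weighted_rrf(rankings, weights)
print([doc_id for doc_id, _ in fused])
```

Because only ranks enter the formula, raw BM25 scores and dense similarities never need to be put on a common scale, which is the usual reason to prefer rank-based fusion over averaging scores.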

Construction Law RAG Chatbot

AI chatbot over 9 construction regulation PDFs · Samsung C&T AI Academy project · Team lead

Python FAISS BM25 bge-reranker GPT-4o-mini Streamlit

FAISS + BM25 hybrid retrieval, reranked with bge-reranker to improve precision
7-type query classification via GPT-4o-mini with per-type response strategy branching
Designed step-by-step implementation notebooks for non-developer teammates
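
The BM25 half of the hybrid retriever above scores keyword overlap between a query and each regulation chunk. Below is a minimal pure-Python sketch of Okapi BM25, assuming pre-tokenized text, the common defaults `k1 = 1.5` and `b = 0.75`, and the non-negative (Lucene-style) IDF variant. The tokenizer, library, and parameters the project actually used are not stated, and the sample clauses are hypothetical.

```python
import math
from collections import Counter


class BM25:
    """Okapi BM25 over pre-tokenized documents, with a non-negative IDF."""

    def __init__(self, corpus: list[list[str]], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.doc_tfs = [Counter(doc) for doc in corpus]
        self.doc_lens = [len(doc) for doc in corpus]
        self.avgdl = sum(self.doc_lens) / len(corpus)
        n_docs = len(corpus)
        # Document frequency: in how many documents each term appears.
        df = Counter(term for doc in corpus for term in set(doc))
        self.idf = {
            term: math.log((n_docs - freq + 0.5) / (freq + 0.5) + 1.0)
            for term, freq in df.items()
        }

    def score(self, query: list[str], index: int) -> float:
        tf, dl = self.doc_tfs[index], self.doc_lens[index]
        total = 0.0
        for term in query:
            if term not in tf:
                continue  # absent terms contribute nothing
            freq = tf[term]
            # Length normalization: longer-than-average documents are penalized.
            denom = freq + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * freq * (self.k1 + 1) / denom
        return total

    def top_k(self, query: list[str], k: int = 3) -> list[int]:
        scores = [self.score(query, i) for i in range(len(self.doc_tfs))]
        return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


# Hypothetical regulation clauses, whitespace-tokenized for simplicity.
corpus = [
    "scaffolding must be inspected daily".split(),
    "fall protection is required above two meters".split(),
    "scaffolding load limits must be posted".split(),
]
bm25 = BM25(corpus)
print(bm25.top_k(["scaffolding", "inspected"]))
```

In a hybrid setup, the BM25 candidates and the FAISS candidates would then be merged and passed to the reranker; how this project merged them is not described here.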

Metal Defect Synthesis

Image synthesis PoC to address manufacturing defect data scarcity

VQGAN MaskGIT PyTorch HuggingFace Gradio

v1 (taming VQGAN + custom MaskGIT) → v2 (LlamaGen VQGAN + Halton-MaskGIT). The migration was driven by codebook incompatibility: Halton-MaskGIT (ICLR 2025) is built on the LlamaGen VQGAN and cannot be combined with taming VQGAN, and pretrained weights were available only for LlamaGen, which made the switch the practical choice.
Merged 3 datasets, including NEU-DET: 2,659 images, expanded 8× through augmentation (≈21K images)
VQGAN fine-tuning: Edge IoU +10.6%, PSNR +3.1%, SSIM +0.73%
MaskGIT training loss reached only 6.77 (target ~4.0), i.e. the model failed to converge; attributed to a structural limitation of a 69M-parameter model trained on ~21K images
Deployed Gradio inpainting demo on HuggingFace Spaces
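
Halton-MaskGIT, used in v2 above, decides which tokens to unmask at each sampling step from a fixed order derived from the Halton low-discrepancy sequence, so that tokens revealed together are spread across the image. The sketch below illustrates only that ordering idea on a square token grid. It assumes bases 2 and 3 for rows and columns and an equal number of tokens per step (`split_into_steps` is a hypothetical helper), so it is not the paper's or this project's code.

```python
import math


def radical_inverse(index: int, base: int) -> float:
    """Van der Corput radical inverse: mirror the base-`base` digits of index after the point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_order(grid_size: int) -> list[tuple[int, int]]:
    """Visit every (row, col) cell of a grid exactly once, in 2-D Halton (bases 2, 3) order."""
    order: list[tuple[int, int]] = []
    seen: set[tuple[int, int]] = set()
    index = 1
    while len(order) < grid_size * grid_size:
        # min(...) guards against float rounding pushing a coordinate to grid_size.
        row = min(int(radical_inverse(index, 2) * grid_size), grid_size - 1)
        col = min(int(radical_inverse(index, 3) * grid_size), grid_size - 1)
        if (row, col) not in seen:  # skip points that land in an already-visited cell
            seen.add((row, col))
            order.append((row, col))
        index += 1
    return order


def split_into_steps(order: list[tuple[int, int]],
                     num_steps: int) -> list[list[tuple[int, int]]]:
    """Hypothetical equal-size schedule: cells to unmask at each sampling step."""
    size = math.ceil(len(order) / num_steps)
    return [order[i:i + size] for i in range(0, len(order), size)]


order = halton_order(16)           # e.g. a 16x16 token grid (size is an assumption)
steps = split_into_steps(order, 8)
print(order[:4], len(steps))
```

Because the Halton sequence is dense in the unit square, the loop eventually visits every cell, and consecutive points fall far apart, which is the property the ordering relies on.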

Term Search System

Hybrid search engine for construction standard terminology

Python FAISS ColBERT OpenAI API

Weighted fusion: OpenAI Embedding semantic similarity (60%) + ColBERT token-level MaxSim (40%)
Maps colloquial queries to the correct standard terms
Circuit Breaker + Rate Limiting for API failure resilience
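
The 60/40 fusion above combines a single-vector similarity with ColBERT-style late interaction. Below is a minimal pure-Python sketch under stated assumptions. Cosine similarity stands in for the OpenAI embedding score. MaxSim is averaged over query tokens, whereas ColBERT itself sums them, so that both scores lie in [-1, 1]. The toy 2-D vectors and the names `term_A` and `term_B` are hypothetical placeholders for real embeddings and terms.

```python
import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def maxsim(query_tokens: list[Vector], doc_tokens: list[Vector]) -> float:
    """Late interaction: each query token keeps its best-matching document token.
    ColBERT sums these maxima; averaging here (an assumption) keeps the score in [-1, 1]."""
    best = [max(cosine(q, d) for d in doc_tokens) for q in query_tokens]
    return sum(best) / len(best)


def fused_score(query_vec: Vector, term_vec: Vector,
                query_tokens: list[Vector], term_tokens: list[Vector],
                w_semantic: float = 0.6, w_colbert: float = 0.4) -> float:
    return (w_semantic * cosine(query_vec, term_vec)
            + w_colbert * maxsim(query_tokens, term_tokens))


# Toy example: one query, two candidate standard terms (all vectors hypothetical).
query_vec = [1.0, 0.0]
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "term_A": ([1.0, 0.0], [[1.0, 0.0], [0.6, 0.8]]),
    "term_B": ([0.0, 1.0], [[0.0, 1.0]]),
}
scores = {
    name: fused_score(query_vec, vec, query_tokens, tokens)
    for name, (vec, tokens) in candidates.items()
}
print(max(scores, key=scores.get), scores)
```

The token-level term rewards a candidate whose individual tokens each match part of the query, which a single pooled embedding can blur. The fixed weights simply trade that off against overall semantic similarity.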

Ticketmon — Concert Ticketing Platform

High-concurrency ticketing system · Bootcamp final project · Team lead

Spring Boot Redis MySQL React Docker AWS

Designed and implemented an AI review summarization feature (migrated from Together AI to OpenAI)
Refactored seat management from section-based to grade-based architecture
Implemented dynamic seat layout rendering based on venue scale (small / medium / large)

Work Experience

Period	Role	Key Work
2025.08–11	AI Instructor, Samsung C&T AI Academy (Elice)	Generative AI curriculum & RAG chatbot training for construction industry professionals
2023.09–2024.12	Research Team, Woongjin ThinkBig AI Labs	CatBoost difficulty prediction model for exam items (R²=0.57), reverse-engineering analysis of the ALP system
