PaweekornS/DIP_MT

DIP Project: English-to-Thai Machine Translation system for WIPO Nice Classification Products and Services

📌 Project Overview

This project focuses on English → Thai machine translation (En2Th) with domain-specific emphasis on product and service categories defined by the WIPO Nice Classification system.
The main goal is to evaluate and improve translation quality for category-specific terminology, so that translations are accurate, consistent, and context-aware for intellectual property and business use.

🎯 Objectives

  • Benchmark multiple Large Language Models (LLMs) on En2Th and Th2En translation tasks.
  • Explore fine-tuning and Retrieval-Augmented Generation (RAG) methods for domain adaptation.
  • Provide insights into translation quality across different WIPO Nice product categories.

⚙️ Methodology

  1. Data Preparation

    • Source: WIPO Nice Classification dataset (product and service categories).
    • Preprocessed into English–Thai parallel pairs.
  2. Approaches

    • Baseline Translation: transformer-based NLLB-200-3.3B model with LoRA fine-tuning.
    • Fine-Tuning: Domain-specific fine-tuning on product-category text.
    • RAG (Retrieval-Augmented Generation): Incorporating WIPO classification documents as external knowledge for context-aware translations.
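
The data-preparation step (turning the classification data into English–Thai parallel pairs) can be sketched as follows. This is a minimal illustration, not the project's pipeline: it assumes a CSV with hypothetical `en` and `th` columns, and the sample rows are toy values written for the example, not taken from the dataset. It applies Unicode NFC normalization, collapses repeated whitespace, drops rows missing either side, and removes duplicates.

```python
import csv
import io
import unicodedata


def normalize(text: str) -> str:
    # NFC-normalize and collapse runs of whitespace; Thai has no spaces
    # between words, so only leading, trailing, or repeated spaces change.
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())


def build_parallel_pairs(rows, en_key="en", th_key="th"):
    """Return de-duplicated (english, thai) pairs, skipping incomplete rows."""
    seen = set()
    pairs = []
    for row in rows:
        en = normalize(row.get(en_key) or "")
        th = normalize(row.get(th_key) or "")
        if not en or not th:
            continue  # drop rows missing either language
        key = (en.lower(), th)
        if key in seen:
            continue  # drop duplicates (English compared case-insensitively)
        seen.add(key)
        pairs.append((en, th))
    return pairs


# Toy rows for illustration only (hypothetical column names).
sample_csv = """class,en,th
9,Computer software,ซอฟต์แวร์คอมพิวเตอร์
9,computer  software,ซอฟต์แวร์คอมพิวเตอร์
25,Clothing,
"""

pairs = build_parallel_pairs(csv.DictReader(io.StringIO(sample_csv)))
print(pairs)
```

The second row collapses into the first after normalization, and the third is dropped because its Thai side is empty.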
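
For reference, LoRA freezes the pretrained weight matrix and learns only a low-rank update. The standard formulation (general to LoRA, not a detail stated by this project; the rank and scaling values used here are not given) is:

```math
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
```

Only $A$ and $B$ are trained, so a $d \times k$ layer adds $r(d + k)$ trainable parameters instead of $dk$. For a hypothetical $2048 \times 2048$ projection with $r = 16$, that is $16 \cdot 4096 = 65{,}536$ trainable parameters versus $4{,}194{,}304$ in the full matrix.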
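
The RAG approach retrieves related WIPO terms and places them in the prompt as context. The sketch below shows only that retrieval-and-prompt step, under stated assumptions: the project lists faiss and langchain, which would normally index dense embeddings, but here a brute-force cosine search over character-trigram counts stands in so the example runs with the standard library alone. The glossary entries, the `retrieve`/`build_prompt` helpers, and the prompt wording are all hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    # Bag of character n-grams over the lowercased, space-padded text
    # (a stand-in for a learned embedding).
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, glossary, k=2):
    """Return the k (en, th) pairs whose English side is most similar to query."""
    q = char_ngrams(query)
    ranked = sorted(glossary, key=lambda pair: cosine(q, char_ngrams(pair[0])), reverse=True)
    return ranked[:k]


def build_prompt(query, glossary, k=2):
    # Retrieved pairs become few-shot context ahead of the term to translate.
    lines = ["Translate the WIPO Nice product/service term from English to Thai."]
    lines += [f"English: {en}\nThai: {th}" for en, th in retrieve(query, glossary, k)]
    lines.append(f"English: {query}\nThai:")
    return "\n\n".join(lines)


# Hypothetical glossary for illustration.
glossary = [
    ("Computer software", "ซอฟต์แวร์คอมพิวเตอร์"),
    ("Clothing", "เสื้อผ้า"),
    ("Computer hardware", "ฮาร์ดแวร์คอมพิวเตอร์"),
]

print(build_prompt("Downloadable computer software", glossary))
```

Swapping in a real vector index would change only `retrieve`; the prompt assembly stays the same.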

🛠️ Tech Stack

  • Fine-tuning: unsloth
  • RAG: faiss, langchain
  • Inference: vllm
  • Evaluation: pythainlp, nltk, jiwer
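
Character error rate (CER) is a natural metric for Thai because it does not depend on word segmentation. The sketch below implements the standard CER definition (total character edits divided by total reference characters), which jiwer also provides; it is written in plain Python as an illustration, and the `levenshtein` and `cer` helpers are hypothetical names, not the project's code.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: summed edit distance over summed reference length."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total = sum(len(r) for r in references)
    if total == 0:
        raise ValueError("references must contain at least one character")
    edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return edits / total


print(cer(["abcd"], ["abce"]))
```

Lengths here count Unicode code points, so Thai vowel and tone marks each count as one character. Word-level metrics such as BLEU need Thai text split into words first, which is the kind of task pythainlp's tokenizers handle.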

About

internship project - SuperAI Engineer Season 5
