horde-research/kaz-mm

🇰🇿 Multimodal Kazakh Research Repository

Welcome to the Multimodal Kazakh Research repo, a work-in-progress collection of tools, scripts, and models aimed at advancing research in multimodal learning (mostly vision) for the Kazakh language.


📁 Structure

.
├── projects/
│   └── horde-vision/
│       ├── benchmark/       # Benchmarking apps and model request scripts
│       ├── evaluate/        # Evaluation and metrics calculation
│       ├── scripts/         # Data processing and utility scripts
│       └── train_scripts/   # Training (SFT, RL) and inference scripts
└── results/                 # Evaluation results and visualizations

🧠 Goals

  • 🖼️ Build and curate high-quality Kazakh-language multimodal datasets
  • 🤖 Train and evaluate multimodal models on Kazakh content
  • 🧪 Support downstream tasks like retrieval, captioning, VQA

📊 Results

Horde Vision Model Performance Summary
| Model | caption | vqa | ocr | reason | instruct_follow | Avg Rank |
|---|---|---|---|---|---|---|
| horde-vision | 83.5 (↑12.3%) | 68.1 (↑5.3%) | 64.7 (↑2.6%) | 77.4 (↑5.7%) | 70.5 (↑5.9%) | #1 |
| Qolda | 75.2 (↑8.7%) | 61.7 (↑3.0%) | 60.6 (↑2.0%) | 70.3 (↑2.9%) | 62.2 (↑2.8%) | #2 |
| Qwen3-VL-8B-Instruct | 41.3 (↑0.5%) | 53.6 (↑1.1%) | 59.3 (↑2.1%) | 55.5 (↑0.7%) | 49.5 (↑0.9%) | #3 |
| gemma-3-4b-it | 42.0 (↑0.1%) | 41.8 (↑0.4%) | 50.3 (↑2.3%) | 53.0 (↑0.6%) | 42.5 (↑0.5%) | #4 |
| Qwen2.5-VL-7B-Instruct | 35.4 (↑0.0%) | 41.6 (↑0.4%) | 51.0 (↑0.9%) | 44.6 (↑0.3%) | 37.7 (↑0.3%) | #5 |
| Llama-3.2-11B-Vision | 36.2 (↑0.1%) | 38.0 (↑0.3%) | 15.0 (↑0.1%) | 43.4 (↑0.3%) | 36.4 (↑0.3%) | #6 |
| InternVL3-8B | 26.1 (↑0.6%) | 29.0 (↑0.0%) | 29.1 (↑0.3%) | 27.3 (↑0.0%) | 25.7 (↑0.0%) | #7 |
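
The README does not state how the Avg Rank column is derived. One plausible reading, sketched here as an assumption and not as the repository's actual evaluation code, is to rank the models within each task and then order them by mean rank. The names `TASKS`, `SCORES`, `average_ranks`, and `leaderboard` are hypothetical; the scores are copied from the table above, without the parenthesised deltas.

```python
from statistics import mean

# Task columns, in the order they appear in the results table.
TASKS = ["caption", "vqa", "ocr", "reason", "instruct_follow"]

# Per-task scores copied from the table (parenthesised deltas omitted).
SCORES = {
    "horde-vision": [83.5, 68.1, 64.7, 77.4, 70.5],
    "Qolda": [75.2, 61.7, 60.6, 70.3, 62.2],
    "Qwen3-VL-8B-Instruct": [41.3, 53.6, 59.3, 55.5, 49.5],
    "gemma-3-4b-it": [42.0, 41.8, 50.3, 53.0, 42.5],
    "Qwen2.5-VL-7B-Instruct": [35.4, 41.6, 51.0, 44.6, 37.7],
    "Llama-3.2-11B-Vision": [36.2, 38.0, 15.0, 43.4, 36.4],
    "InternVL3-8B": [26.1, 29.0, 29.1, 27.3, 25.7],
}


def average_ranks(scores):
    """Return each model's mean rank across tasks (1 = best score).

    Assumption: higher scores are better and ties do not occur,
    which holds for the numbers in the table.
    """
    ranks = {model: [] for model in scores}
    for task_index in range(len(TASKS)):
        ordered = sorted(scores, key=lambda m: scores[m][task_index], reverse=True)
        for position, model in enumerate(ordered, start=1):
            ranks[model].append(position)
    return {model: mean(per_task) for model, per_task in ranks.items()}


def leaderboard(scores):
    """Order models from best to worst mean rank."""
    avg = average_ranks(scores)
    return sorted(avg, key=avg.get)


if __name__ == "__main__":
    avg = average_ranks(SCORES)
    for place, model in enumerate(leaderboard(SCORES), start=1):
        print(f"#{place} {model} (mean rank {avg[model]:.1f})")
```

With the table's numbers, this ordering agrees with the #1 to #7 placement shown in the Avg Rank column, although ordering by mean score would give the same result, so the data alone does not confirm which method was used.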

[Figures: Overall Radar Chart; Task Comparison Bar]

