This project implements and compares multiple transformer architectures for:
- Text → Gloss Translation: Convert English text to ASL gloss notation
- Gloss → Text Translation: Convert ASL gloss notation to English text
- ✅ Baseline Transformer: Standard seq2seq transformer (Vaswani et al., 2017)
- ✅ Modern Transformer: State-of-the-art architecture with:
- Rotary Position Embeddings (RoPE)
- Grouped-Query Attention (GQA)
- RMSNorm (Root Mean Square Normalization)
- SwiGLU Activation Functions
- ✅ Bidirectional Training: Both text→gloss and gloss→text directions
- ✅ Comprehensive Evaluation: BLEU, METEOR, chrF++, ROUGE metrics
- ✅ Modular Architecture: Easy to extend and experiment
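To give a concrete feel for two of the modern components listed above, here is a minimal NumPy sketch of RMSNorm and a SwiGLU feed-forward block. This is an illustrative standalone sketch, not the project's `models/modern_transformer.py` implementation; all names and shapes here are hypothetical:

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: rescale by root-mean-square only (no mean-centering, unlike LayerNorm)."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gain

def swiglu(x, W, V, W2):
    """SwiGLU feed-forward: Swish(xW) gated elementwise by xV, projected back by W2."""
    swish = (x @ W) * (1.0 / (1.0 + np.exp(-(x @ W))))  # Swish/SiLU activation
    return (swish * (x @ V)) @ W2

d_model, d_hidden = 8, 16
rng = np.random.default_rng(0)
x = rng.normal(size=(2, d_model))
y = rms_norm(x, gain=np.ones(d_model))
out = swiglu(x,
             rng.normal(size=(d_model, d_hidden)),
             rng.normal(size=(d_model, d_hidden)),
             rng.normal(size=(d_hidden, d_model)))
print(y.shape, out.shape)
```

Note that RMSNorm drops the mean-subtraction step of LayerNorm, which is what makes it cheaper while performing comparably in practice.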
Current performance on the ASLG-PC12 dataset (results pending):
| Model | Direction | BLEU-4 | METEOR | chrF++ | ROUGE-L |
|---|---|---|---|---|---|
| Baseline | Text→Gloss | TBD | TBD | TBD | TBD |
| Baseline | Gloss→Text | TBD | TBD | TBD | TBD |
| Modern | Text→Gloss | TBD | TBD | TBD | TBD |
| Modern | Gloss→Text | TBD | TBD | TBD | TBD |
```bash
# Clone the repository
git clone https://github.com/kagozi/asl.git
cd asl

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Train Baseline Model (Text → Gloss):
```bash
python experiments/train_baseline.py \
    --direction text2gloss \
    --config config/baseline_config.yaml \
    --gpu 0
```

Train Modern Model (Gloss → Text):
```bash
python experiments/train_modern.py \
    --direction gloss2text \
    --config config/modern_config.yaml \
    --gpu 0
```

Train Both Directions:
```bash
python experiments/train_baseline.py --direction both
python experiments/train_modern.py --direction both
```

```bash
# Evaluate a specific model
python evaluation/evaluator.py \
    --checkpoint results/checkpoints/modern_text2gloss_best.pt \
    --test-data data/processed/test.pkl

# Compare all models
python experiments/compare_models.py
```

```
text-gloss-translation/
├── config/        # Configuration files
├── data/          # Data loading and preprocessing
├── models/        # Model architectures
├── training/      # Training utilities
├── evaluation/    # Evaluation metrics and scripts
├── utils/         # Helper functions
├── experiments/   # Training scripts
├── results/       # Model checkpoints and results
└── notebooks/     # Jupyter notebooks for analysis
```
Edit `config/baseline_config.yaml` or `config/modern_config.yaml`:

```yaml
model:
  embedding_dim: 512
  num_heads: 8
  num_encoder_layers: 6
  num_decoder_layers: 6
  dropout: 0.1

training:
  batch_size: 32
  epochs: 100
  learning_rate: 0.0001
  warmup_steps: 4000
  gradient_accumulation_steps: 1
```

Using ASLG-PC12 (American Sign Language Gloss Parallel Corpus):
- Training: 82,710 sentence pairs
- Validation: 4,000 sentence pairs
- Test: 4,145 sentence pairs
The dataset is automatically downloaded from HuggingFace:

```python
from datasets import load_dataset

dataset = load_dataset("achrafothman/aslg_pc12")
```

Compare the standard transformer with the modern improvements:
```bash
python experiments/compare_models.py --experiment architecture
```

Analyze performance differences between text→gloss and gloss→text:
```bash
python experiments/compare_models.py --experiment bidirectional
```

Test individual components:
```bash
python experiments/ablation.py --component rope  # Test without RoPE
python experiments/ablation.py --component gqa   # Test without GQA
```

View training progress with TensorBoard:
```bash
tensorboard --logdir results/logs
```

Or use the built-in plotting:
```bash
python utils/visualization.py --log-dir results/logs/baseline_text2gloss
```

- Baseline Transformer (`models/baseline_transformer.py`): Standard transformer with sinusoidal positional encoding
- Modern Transformer (`models/modern_transformer.py`): Enhanced with RoPE, GQA, RMSNorm, SwiGLU
- Warmup Scheduler (`training/scheduler.py`): Noam learning rate schedule
- Label Smoothing (`training/loss.py`): Regularization technique
- Mixed Precision (`training/trainer.py`): Faster training with AMP
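The Noam schedule mentioned above pairs a linear warmup with inverse-square-root decay, peaking at `warmup_steps`. A minimal standalone sketch (a hypothetical helper for intuition, not the project's `training/scheduler.py`):

```python
def noam_lr(step, d_model=512, warmup_steps=4000):
    """Noam schedule (Vaswani et al., 2017): linear warmup, then ~1/sqrt(step) decay."""
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# The learning rate rises until step == warmup_steps, then decays
peak = noam_lr(4000)
```

With the config values above (`embedding_dim: 512`, `warmup_steps: 4000`), the two branches of the `min` meet exactly at step 4000, giving a peak learning rate of `512**-0.5 * 4000**-0.5 ≈ 7e-4`.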
- BLEU Score: Standard MT metric based on n-gram precision
- METEOR: Unigram-matching metric that also credits stems and synonyms
- chrF++: Character n-gram F-score, augmented with word n-grams
- ROUGE: Recall-oriented overlap metric
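To make the character-level idea behind chrF concrete, here is a simplified sentence-level version (without the word-n-gram extension of chrF++, and with β=2 as in the original metric). This is a sketch for intuition only; actual evaluation should use a standard implementation:

```python
from collections import Counter

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF: average char n-gram precision/recall, combined as an F-beta score."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = Counter(hypothesis[i:i + n] for i in range(len(hypothesis) - n + 1))
        ref = Counter(reference[i:i + n] for i in range(len(reference) - n + 1))
        if not hyp or not ref:
            continue  # string shorter than n
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

print(chrf("hello world", "hello world"))  # identical strings score 1.0
```

Because it matches character n-grams, chrF is more forgiving of morphological variants than word-level BLEU, which is one reason it is popular for gloss evaluation.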
If you use this code in your research, please cite:

```bibtex
@thesis{kagozi2025textgloss,
  title={Modern Transformer Architectures for Bidirectional Text-Gloss Translation},
  author={Alex Kagozi},
  year={2025},
  school={University of South Dakota}
}
```

MIT License - see LICENSE file for details
- Multi-dataset training (PHOENIX-2014T, CSL-Daily)
- Data augmentation (back-translation, paraphrasing)
- Attention visualization
- Human evaluation study
- Real-time inference API