This repository is a Python-based demonstration of a lightweight and deployable chatbot. The chatbot leverages optimization techniques such as distillation, pruning, and quantization to create an efficient and performant language model suitable for deployment in resource-constrained environments.
- Distillation: A student model is trained to mimic the behavior of a larger teacher model, reducing model size while retaining performance.
- Pruning: Redundant weights in the model are removed to decrease model size and computation.
- Quantization: Model weights are converted to lower precision (e.g., INT8) for improved inference speed and reduced memory usage.
- Chatbot Functionality: Lightweight chatbot capable of generating conversational responses.
```
api/
├── models/                   # Models and related utilities
│   ├── flan_t5_model.py      # Model loading and tokenizer setup
│   ├── student_model/        # Optimized student model
│   ├── teacher_model/        # Teacher model for distillation
│
├── optimizations/            # Optimization scripts
│   ├── pruning.py            # Pruning utilities
│   ├── quantization.py       # Quantization utilities
│   ├── distillation.py       # Distillation training loop
│
├── scripts/                  # Utility scripts for running tasks
│   ├── run_distillation.py   # Script for running distillation
│   ├── evaluate_model.py     # Script for evaluating model performance
│   ├── prepare_model.py      # Script for preparing models for deployment
│
├── services/                 # Chatbot service
│   ├── chatbot_service.py    # Core chatbot functionality
│
├── requirements.txt          # Python dependencies
├── README.md                 # Project overview and instructions
├── run.py                    # Run script
└── .gitignore                # Git ignore rules
```
- Python 3.8+
- TensorFlow
- PyTorch
- Hugging Face Transformers
- TensorFlow Model Optimization Toolkit
- Datasets
Install all required dependencies using:
```bash
pip install -r requirements.txt
```

Distillation trains a student model to mimic the teacher model:
```bash
python api/scripts/run_distillation.py
```

The trained student model will be saved to api/models/student_model/.
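Under the hood, the distillation objective typically combines a soft loss against the teacher's output distribution with the standard hard loss on reference labels. Below is a minimal sketch of a single distillation step, assuming a PyTorch loop like the one in api/optimizations/distillation.py; the temperature, loss weighting alpha, and label handling are illustrative, not the repository's exact settings:

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
teacher = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base").eval()
student = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

def distillation_step(texts, targets, temperature=2.0, alpha=0.5):
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    labels = tokenizer(targets, return_tensors="pt", padding=True, truncation=True).input_ids
    with torch.no_grad():
        teacher_logits = teacher(**inputs, labels=labels).logits
    student_out = student(**inputs, labels=labels)
    # Soft loss: KL divergence between temperature-scaled output distributions
    soft_loss = F.kl_div(
        F.log_softmax(student_out.logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    # Hard loss: cross-entropy against the reference labels (a full loop would
    # also mask padding tokens with -100 before computing this)
    return alpha * soft_loss + (1 - alpha) * student_out.loss
```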
Evaluate the model's performance using the ROUGE metric on a subset of the validation dataset:
```bash
python api/scripts/evaluate_model.py
```
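For reference, ROUGE can be computed with the rouge_score package (an assumption; the evaluation script may use a different library). The dataset config "3.0.0", the validation slice, and the "summarize:" prompt prefix below are likewise illustrative:

```python
from datasets import load_dataset
from rouge_score import rouge_scorer
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("api/models/student_model")
model = AutoModelForSeq2SeqLM.from_pretrained("api/models/student_model")
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

# Score the student model on a small validation slice of CNN/DailyMail
dataset = load_dataset("cnn_dailymail", "3.0.0", split="validation[:100]")
rouge_l = []
for example in dataset:
    inputs = tokenizer("summarize: " + example["article"], return_tensors="pt",
                       truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=128)
    prediction = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    rouge_l.append(scorer.score(example["highlights"], prediction)["rougeL"].fmeasure)

print(f"Mean ROUGE-L F1: {sum(rouge_l) / len(rouge_l):.3f}")
```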
Apply pruning to the model to reduce its size:

```bash
python api/scripts/prepare_model.py
```
Quantize the pruned model for deployment:

```bash
python api/scripts/prepare_model.py
```

The quantized model will be saved as a .tflite file for lightweight deployment.
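Once converted, the .tflite file can be served with TensorFlow Lite's interpreter. A minimal smoke test follows; the file path and input shape are assumptions and depend on how prepare_model.py saves the model:

```python
import numpy as np
import tensorflow as tf

# Path is illustrative; use wherever prepare_model.py writes the .tflite file
interpreter = tf.lite.Interpreter(model_path="api/models/student_model/model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a zero tensor matching the model's expected shape and dtype
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]["index"]).shape)
```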
Run the chatbot service to interact with the optimized language model:

```python
from api.services.chatbot_service import generate_response

# Example usage
conversation_history = []
response = generate_response("Hello, how are you?", conversation_history)
print(response)
```
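For interactive use, the same function can drive a simple multi-turn loop. This sketch assumes generate_response reads (and possibly appends to) conversation_history; check chatbot_service.py for the actual contract:

```python
from api.services.chatbot_service import generate_response

conversation_history = []
while True:
    user_input = input("You: ")
    if user_input.strip().lower() in {"quit", "exit"}:
        break
    # Pass the running history so replies stay in context
    response = generate_response(user_input, conversation_history)
    print(f"Bot: {response}")
```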
- Teacher Model: google/flan-t5-base
- Student Model: google/flan-t5-small
- Trained using a subset of the CNN/DailyMail dataset.
Prunes dense and convolutional layers using the TensorFlow Model Optimization Toolkit's PolynomialDecay sparsity schedule.
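As a sketch of how that schedule is wired up with the TensorFlow Model Optimization Toolkit (the sparsity targets, step counts, and toy model here are illustrative; see api/optimizations/pruning.py for the real values):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Ramp sparsity from 0% to 50% over the first 1,000 training steps
pruning_params = {
    "pruning_schedule": tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0,
        final_sparsity=0.5,
        begin_step=0,
        end_step=1000,
    )
}

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(128,)),
    tf.keras.layers.Dense(10),
])
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(model, **pruning_params)
pruned_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
# Training must include the UpdatePruningStep callback, e.g.:
# pruned_model.fit(x, y, callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
```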
Quantizes the model to INT8 precision using TensorFlow Lite for fast inference.
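A generic sketch of that post-training INT8 conversion follows; the model path and the representative-dataset shape are assumptions, and prepare_model.py presumably adapts this to the student model:

```python
import tensorflow as tf

# Load the pruned Keras model (assumed saved with pruning wrappers already
# stripped via tfmot.sparsity.keras.strip_pruning); the path is illustrative
model = tf.keras.models.load_model("api/models/pruned_model")

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

def representative_data():
    # A few calibration batches shaped like the model's real inputs
    for _ in range(100):
        yield [tf.random.normal([1, 128])]

converter.representative_dataset = representative_data
# Restrict to INT8 kernels so weights and activations are fully quantized
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

tflite_model = converter.convert()
with open("api/models/student_model/model.tflite", "wb") as f:
    f.write(tflite_model)
```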
Contributions are welcome! Feel free to submit issues or pull requests.
This project is licensed under the MIT License. See the LICENSE file for details.