PaliGemma: A versatile 3B VLM for transfer

PyTorch implementation based on Umar Jamil's "Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation".

To set up and run this project:
- Create a new environment with the provided requirements.txt file:

      virtualenv venv
      source venv/bin/activate
      pip3 install -r requirements.txt

- Run inference:

      bash launch_inference.sh

To do:
- Implement a Streamlit/Gradio interface for interacting with the model
- Fix an issue with generating tokens for the model
- Add a requirements file
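
After the installation step in the setup instructions, it can help to confirm that the virtual environment is active and that the dependencies resolved before launching inference. The following is a minimal sketch, not part of the repository: the helper names `in_virtualenv` and `missing_packages` are hypothetical, and the assumption that `torch` appears in requirements.txt is inferred only from this being a PyTorch implementation.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # In an environment created by venv (or a recent virtualenv), sys.prefix
    # points at the environment and sys.base_prefix at the base interpreter.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: "torch" is listed in requirements.txt; extend this list
    # to match the actual contents of the file.
    required = ["torch"]
    print("virtualenv active:", in_virtualenv())
    print("missing packages:", missing_packages(required))
```

If the script reports missing packages, re-running the `pip3 install -r requirements.txt` step inside the activated environment is the first thing to try.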
