A Python app that listens to your voice, transcribes it, analyzes an image, and returns a caption or answer using Google's Gemma 3n multimodal model.
## Features

- Speech Recognition: Captures and transcribes spoken input from your microphone.
- Image Analysis: Loads and processes an image for visual context.
- Multimodal Reasoning: Uses the Gemma 3n E4B model to generate answers or captions from the combined text and image input.
## Requirements

- Python 3.8+
- transformers >= 4.53.1
- SpeechRecognition >= 3.14.3
- pillow >= 11.3.0
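If the project does not already ship one, the constraints above can be captured in a `requirements.txt` like the following (a minimal sketch listing only the packages named in this README):

```
transformers>=4.53.1
SpeechRecognition>=3.14.3
pillow>=11.3.0
```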
## Installation & Usage

Install dependencies:

```
pip install -r requirements.txt
```

- Place an image file named `sample.jpg` in the project directory (or modify the code to use your own image path).
- Run the app:

```
python main.py
```

- Speak into your microphone when prompted. The app will transcribe your speech, analyze the image, and generate a response using Gemma 3n.
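The speech step can be sketched with the SpeechRecognition library's documented `Recognizer` interface; the helper names and fallback messages below are illustrative, not taken from this project's actual code:

```python
# Sketch of the microphone capture and transcription step.
# Assumes: pip install SpeechRecognition pyaudio
from typing import Optional


def normalize_transcript(text: str) -> str:
    """Trim whitespace and make sure the question ends with punctuation."""
    text = text.strip()
    if text and text[-1] not in ".?!":
        text += "?"
    return text


def listen_and_transcribe() -> Optional[str]:
    # Heavy import kept local so the pure helper above has no dependencies.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("🎙️ Speak now...")
        recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
        audio = recognizer.listen(source)
    try:
        return normalize_transcript(recognizer.recognize_google(audio))
    except sr.UnknownValueError:
        print("Could not understand the audio.")
    except sr.RequestError as err:
        print(f"Recognition service error: {err}")
    return None
```

Catching `UnknownValueError` and `RequestError` separately lets the app distinguish unintelligible speech from network failures instead of crashing mid-session.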
## Example

```
🎙️ Speak now...
📝 Transcribed: What is the dog doing on the beach?
🤖 Response: The dog is sitting on the beach, possibly enjoying the view or resting.
```
## Model

This project uses the google/gemma-3n-E4B model, a state-of-the-art, open, multimodal model from Google DeepMind. Gemma 3n accepts text, image, and audio input and is optimized for efficient execution on a wide range of devices. Learn more in the official documentation.
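The text-plus-image reasoning step can be sketched with the Hugging Face transformers chat API. The message layout follows the documented chat-template convention for image-text models; the function names and the 128-token limit here are illustrative assumptions, not this project's exact code:

```python
# Sketch of the multimodal reasoning step (helper names are illustrative).
# Assumes: pip install transformers torch pillow


def build_messages(question: str, image_path: str) -> list:
    """Assemble a chat-style message pairing the image with the transcribed question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]


def answer(question: str, image_path: str = "sample.jpg") -> str:
    # Heavy imports kept local so build_messages stays dependency-free.
    from transformers import AutoModelForImageTextToText, AutoProcessor

    model_id = "google/gemma-3n-E4B"  # gated on Hugging Face; accept the license first
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

    inputs = processor.apply_chat_template(
        build_messages(question, image_path),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    output = model.generate(**inputs, max_new_tokens=128)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return processor.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```

Slicing the output at the prompt length before decoding keeps the returned string to just the model's answer rather than the full conversation.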
## Notes

- The app requires a working microphone and an image file named `sample.jpg` in the project directory.
- For best results, speak clearly and use high-quality images.
- The first run may take some time while the model weights download from Hugging Face.