An intelligent terminal-based agent that lets you query an image using natural language. The system uses the LLaMA-3.2 Vision model (via Groq) to understand and answer questions about the content of a local image.
- 📷 Accepts any local image file
- 🧠 Uses LLaMA-3.2 90B Vision model via Groq for multimodal reasoning
- 🗣️ Interactive CLI interface with feedback loop
- 🔁 Follow-up questions and clarification supported
- Python 3.x
- Agno Framework
- Groq API (LLaMA-3.2 Vision)
- Base64 encoding for image handling
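Base64 encoding is how the local image file gets into a text-based API request. A minimal sketch of that step (the function name `encode_image` is illustrative, not necessarily what the agent's source uses):

```python
import base64
from pathlib import Path

def encode_image(path: str) -> str:
    """Read a local image file and return its contents as a base64 string."""
    data = Path(path).read_bytes()
    return base64.b64encode(data).decode("utf-8")
```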
- Clone the repo:

  ```bash
  git clone https://github.com/yourusername/image-llama-agent.git
  cd image-llama-agent
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Create a `.env` file with your API keys:

  ```
  # .env.example
  GROQ_API_KEY=your_groq_key_here
  ```

- Run the agent:

  ```bash
  python image_analysis_llama_agent.py
  ```

- Provide the path to your image
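The agent's actual request-building code is not shown here, but as a sketch, assuming Groq's OpenAI-compatible chat-completions format for vision models, a user message combining a question with a base64-encoded image might look like this (`build_vision_message` is an illustrative helper, not part of the repo's API):

```python
def build_vision_message(question: str, image_b64: str) -> dict:
    """Build an OpenAI-style multimodal chat message: one text part for the
    question, one image_url part carrying the image as a data URI."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
            },
        ],
    }
```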
- Ask any visual question about the image
- Interact with the LLM and refine queries if needed
- “What is shown in this image?”
- “Is this image from a city or a rural area?”
- “What animals or objects can you detect?”
MIT License.
- Groq for LLaMA Vision
- Agno Agent Framework