Simple Python wrapper around Ollama's /api/generate endpoint.
Requirements:
- Ollama running locally at `http://localhost:11434`
- Python 3.8+
`OLLAMA_HOST` defaults to `http://localhost:11434` in `ollama_run.py`.
- If you run Ollama on another machine or inside a container, update this value.
- Make sure Ollama has "Expose Ollama on network" enabled if you're accessing it remotely.
`MODEL` is set to `nous-hermes2:latest` by default. Update it to any installed model.
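As described, `ollama_run.py` hard-codes these two values. If you switch hosts or models often, one option is to let environment variables override the defaults — a sketch of an alternative, not the script's actual behavior:

```python
import os

# Sketch: fall back to the documented defaults when the environment
# variables are unset. The names mirror the constants in ollama_run.py.
OLLAMA_HOST = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
MODEL = os.environ.get("MODEL", "nous-hermes2:latest")

print(OLLAMA_HOST, MODEL)
```

With this in place, `OLLAMA_HOST=http://192.168.1.10:11434 python ollama_run.py "hello"` would target a remote machine without editing the file.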
To list available models:

```shell
curl -i http://localhost:11434/api/tags
```

To add/pull a model:

```shell
ollama pull llama3
```

(Replace `llama3` with the model you want.)
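The `/api/tags` endpoint returns JSON with a `models` array; each entry carries a `name` field. A short sketch of extracting the installed model names from that response (the sample payload below is illustrative):

```python
import json

def list_model_names(tags_body):
    # /api/tags returns {"models": [{"name": "...", ...}, ...]};
    # keep only the "name" field of each entry.
    return [m["name"] for m in json.loads(tags_body).get("models", [])]

# Illustrative payload in the shape /api/tags returns:
sample = '{"models": [{"name": "nous-hermes2:latest"}, {"name": "llama3:latest"}]}'
print(list_model_names(sample))  # → ['nous-hermes2:latest', 'llama3:latest']
```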
Regular (non-streaming) output:

```shell
python ollama_run.py "hello"
```

Streaming output as the model responds:

```shell
python ollama_run.py --stream "explain docker like I'm 5 years old"
```

The script supports two modes:
- Regular mode: waits for the full response, then prints it.
- Stream mode: prints tokens as they arrive.
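The two modes map onto the `stream` flag in the `/api/generate` request body: with `"stream": false` the server returns a single JSON object, while in stream mode it sends one JSON object per line, each carrying a `response` fragment until the final one sets `"done": true`. A minimal sketch of building the payload and reassembling a streamed reply (function names are illustrative, not the actual internals of `ollama_run.py`):

```python
import json

def build_payload(prompt, stream=False, model="nous-hermes2:latest"):
    # /api/generate takes the model name, the prompt, and a stream flag.
    return {"model": model, "prompt": prompt, "stream": stream}

def collect_stream(lines):
    # Stream mode emits one JSON object per line; concatenate the
    # "response" fragments and stop at the object marked "done": true.
    parts = []
    for line in lines:
        obj = json.loads(line)
        parts.append(obj.get("response", ""))
        if obj.get("done"):
            break
    return "".join(parts)

# Reassembling two illustrative stream chunks:
chunks = ['{"response": "Hel", "done": false}',
          '{"response": "lo!", "done": true}']
print(collect_stream(chunks))  # → Hello!
```

In the real script, regular mode would POST the payload and print the single `response` field, while stream mode would iterate over the HTTP response line by line and print each fragment as it arrives.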