feat: Add native ollama_generate endpoint type support #448
base: main
Conversation
Try out this PR

Quick install:

```bash
pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ajc/ollama
```

Recommended with a virtual environment (using uv):

```bash
uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ajc/ollama
```
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Walkthrough

Adds a new Ollama Generate endpoint implementation to the aiperf framework: an endpoint class with payload formatting and response parsing logic, enum registration, package exports, comprehensive test coverage, and documentation of configuration options and usage examples.
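For orientation, here is a minimal sketch of the non-streaming response shape the new parser consumes. Field names follow Ollama's documented /api/generate API; the values are illustrative, not taken from the PR.

```python
# A non-streaming /api/generate response (illustrative values).
# The endpoint reads "response" for the generated text and
# "prompt_eval_count" / "eval_count" for token usage.
ollama_response = {
    "model": "llama3",
    "response": "The sky is blue because of Rayleigh scattering...",
    "done": True,
    "prompt_eval_count": 26,  # input tokens
    "eval_count": 113,        # output tokens
}
```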
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20–25 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- docs/tutorials/ollama-endpoint.md (1 hunks)
- src/aiperf/common/enums/plugin_enums.py (1 hunks)
- src/aiperf/endpoints/__init__.py (2 hunks)
- src/aiperf/endpoints/ollama_generate.py (1 hunks)
- tests/endpoints/test_ollama_generate.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-23T03:16:02.685Z
Learnt from: ajcasagrande
Repo: ai-dynamo/aiperf PR: 389
File: src/aiperf/endpoints/openai_chat.py:41-46
Timestamp: 2025-10-23T03:16:02.685Z
Learning: In the aiperf project, the ChatEndpoint at src/aiperf/endpoints/openai_chat.py supports video inputs (supports_videos=True) through custom extensions, even though the standard OpenAI /v1/chat/completions API does not natively support raw video inputs.
Applied to files:
src/aiperf/endpoints/__init__.py
🧬 Code graph analysis (3)
src/aiperf/endpoints/__init__.py (1)
- src/aiperf/endpoints/ollama_generate.py (1)
  - OllamaGenerateEndpoint (20-118)

src/aiperf/endpoints/ollama_generate.py (6)
- src/aiperf/common/enums/plugin_enums.py (1)
  - EndpointType (19-35)
- src/aiperf/common/factories.py (1)
  - EndpointFactory (477-495)
- src/aiperf/common/models/record_models.py (2)
  - ParsedResponse (628-661)
  - RequestInfo (755-808)
- src/aiperf/common/models/metadata.py (1)
  - EndpointMetadata (11-45)
- src/aiperf/common/protocols.py (2)
  - EndpointProtocol (347-358)
  - InferenceServerResponse (362-403)
- src/aiperf/endpoints/base_endpoint.py (2)
  - BaseEndpoint (31-257)
  - make_text_response_data (88-90)

tests/endpoints/test_ollama_generate.py (7)
- src/aiperf/common/models/record_models.py (2)
  - ParsedResponse (628-661)
  - RequestInfo (755-808)
- src/aiperf/common/models/metadata.py (1)
  - EndpointMetadata (11-45)
- src/aiperf/common/models/model_endpoint_info.py (4)
  - EndpointInfo (57-114)
  - ModelEndpointInfo (117-150)
  - ModelInfo (19-30)
  - ModelListInfo (33-54)
- src/aiperf/common/models/dataset_models.py (1)
  - Turn (48-78)
- src/aiperf/common/protocols.py (1)
  - InferenceServerResponse (362-403)
- src/aiperf/endpoints/base_endpoint.py (1)
  - make_text_response_data (88-90)
- src/aiperf/common/models/usage_models.py (2)
  - prompt_tokens (37-39)
  - completion_tokens (42-44)
🪛 markdownlint-cli2 (0.18.1)
docs/tutorials/ollama-endpoint.md
- 214-214: Emphasis used instead of a heading (MD036, no-emphasis-as-heading)
- 218-218: Emphasis used instead of a heading (MD036, no-emphasis-as-heading)
- 222-222: Emphasis used instead of a heading (MD036, no-emphasis-as-heading)
- 226-226: Emphasis used instead of a heading (MD036, no-emphasis-as-heading)
🪛 Ruff (0.14.3)
src/aiperf/endpoints/ollama_generate.py
- 49-49: Avoid specifying long messages outside the exception class (TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: build (macos-latest, 3.10)
- GitHub Check: build (macos-latest, 3.11)
- GitHub Check: build (macos-latest, 3.12)
- GitHub Check: build (ubuntu-latest, 3.10)
- GitHub Check: integration-tests (ubuntu-latest, 3.12)
- GitHub Check: integration-tests (macos-latest, 3.12)
- GitHub Check: integration-tests (macos-latest, 3.10)
- GitHub Check: integration-tests (macos-latest, 3.11)
- GitHub Check: integration-tests (ubuntu-latest, 3.11)
- GitHub Check: integration-tests (ubuntu-latest, 3.10)
🔇 Additional comments (3)
src/aiperf/common/enums/plugin_enums.py (1)

33-33: Enum registration aligns with new endpoint

Adding `OLLAMA_GENERATE` keeps the factory enum in sync with the new endpoint implementation. Looks good.

src/aiperf/endpoints/__init__.py (1)

22-52: Public API update looks correct

Importing and exporting `OllamaGenerateEndpoint` keeps the package barrel consistent with the new endpoint. Thanks for wiring it up.

docs/tutorials/ollama-endpoint.md (1)

1-233: Great coverage of Ollama-specific knobs

The tutorial does a nice job documenting how to drive Ollama-specific options (system prompt, JSON mode, keep_alive, streaming) from `--extra-inputs`. This should help users get productive quickly.
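As a point of reference, here is a minimal sketch of the /api/generate request body those options produce. Field names come from Ollama's documented API; the example values are illustrative, and the exact merge behavior of `--extra-inputs` is defined by the endpoint class, not by this snippet.

```python
# Hedged sketch of a request body combining the tutorial's knobs.
payload = {
    "model": "llama3",
    "prompt": "Why is the sky blue?",
    "stream": False,                            # toggle for streaming runs
    "system": "You are a concise assistant.",   # system prompt
    "format": "json",                           # JSON mode
    "keep_alive": "5m",                         # keep the model loaded between requests
}
```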
```python
text = json_obj.get("response")
if not text:
    self.debug(lambda: f"No 'response' field in Ollama response: {json_obj}")
    return None
```
Don't drop streaming completion chunks
For streaming /api/generate, the final chunk arrives with `"response": ""` but still carries `prompt_eval_count`, `eval_count`, and other usage metrics (ollama.readthedocs.io). Because `if not text:` treats the empty string as missing, the final chunk is discarded and we lose usage accounting for every streaming run. Please treat only a missing/None field as absent so that empty strings still flow through.
```diff
-        text = json_obj.get("response")
-        if not text:
+        text = json_obj.get("response")
+        if text is None:
             self.debug(lambda: f"No 'response' field in Ollama response: {json_obj}")
             return None
```

`make_text_response_data` already returns None for empty strings, so this keeps usage while preserving the logging for truly absent fields. Adding a regression test that feeds a streaming final chunk (`response=""`, `done=True`) would help ensure this path stays covered.
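A minimal regression-test sketch along those lines. The fixture names (`endpoint`, `make_response`) and the `parse_response` call are assumptions; adapt them to the helpers already in tests/endpoints/test_ollama_generate.py.

```python
import json

def test_final_streaming_chunk_keeps_usage(endpoint, make_response):
    # Final streaming chunk: empty text, done=True, but usage counts present.
    final_chunk = json.dumps({
        "response": "",
        "done": True,
        "prompt_eval_count": 26,
        "eval_count": 113,
    })
    parsed = endpoint.parse_response(make_response(final_chunk))
    # With the `text is None` check, the chunk must not be discarded,
    # so usage accounting survives streaming runs.
    assert parsed is not None
    assert parsed.usage.prompt_tokens == 26
    assert parsed.usage.completion_tokens == 113
```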
For streaming, support will need to be added for …
Summary by CodeRabbit

New Features
- Native `ollama_generate` endpoint type with payload formatting and response parsing for Ollama's /api/generate API.

Documentation
- Tutorial covering configuration options and usage examples for the new endpoint.