
Conversation

@ajcasagrande (Contributor) commented Nov 6, 2025

Summary by CodeRabbit

  • New Features

    • Added support for Ollama Generate endpoint with streaming and non-streaming generation modes.
    • Includes support for system prompts, JSON formatting, raw mode, vision inputs, and model-specific options.
  • Documentation

    • Added comprehensive guide for Ollama Generate endpoint with usage scenarios, configuration options, example workflows, and troubleshooting tips.

github-actions bot added the feat label Nov 6, 2025
github-actions bot commented Nov 6, 2025

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ajc/ollama

Recommended: install inside a virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ajc/ollama

codecov bot commented Nov 6, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


coderabbitai bot commented Nov 6, 2025

Walkthrough

Adds a new Ollama Generate Endpoint implementation to the aiperf framework. The change includes an endpoint class with payload formatting and response-parsing logic, enum registration, package exports, comprehensive test coverage, and documentation of configuration options and usage examples.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| **Documentation**<br>`docs/tutorials/ollama-endpoint.md` | New documentation page covering Ollama Generate Endpoint usage, configuration options, streaming/non-streaming workflows, model options, JSON formatting, system prompts, and troubleshooting guidance. |
| **Enum & Package Registration**<br>`src/aiperf/common/enums/plugin_enums.py` | Added the `OLLAMA_GENERATE = "ollama_generate"` enum member to the `EndpointType` class to register the new endpoint type. |
| **Package Exports**<br>`src/aiperf/endpoints/__init__.py` | Imported and exported `OllamaGenerateEndpoint` in the endpoints package's public API via the `__all__` list. |
| **Core Implementation**<br>`src/aiperf/endpoints/ollama_generate.py` | Implemented the `OllamaGenerateEndpoint` class extending `BaseEndpoint`, with a metadata classmethod, `format_payload()` for Ollama API payload construction (handling streaming, max_tokens, system prompts, options merging), and `parse_response()` for extracting text and computing token usage metrics. |
| **Test Suite**<br>`tests/endpoints/test_ollama_generate.py` | Added comprehensive test coverage for metadata, payload formatting across multiple scenarios (streaming toggles, system prompts, images, options handling), response parsing (completion detection, token counting), and edge cases. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

  • src/aiperf/endpoints/ollama_generate.py requires careful review of the format_payload() method's option merging logic and parse_response()'s token counting and defensive checks for missing fields.
  • Verify that streaming vs. non-streaming flag handling correctly maps request configuration to Ollama API expectations.
  • Confirm the max_tokens to options.num_predict mapping and potential boundary conditions (a rough sketch of one possible mapping follows this list).
  • Validate that the test suite's edge cases align with actual Ollama API behavior and error conditions.
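
To make the max_tokens and options review points concrete, here is a minimal, self-contained sketch of how an Ollama /api/generate payload could be assembled. The function name, argument names, and merge precedence are assumptions for illustration only and are not the PR's actual format_payload() implementation; only the payload field names (model, prompt, stream, system, images, options, and pass-through fields such as format, raw, keep_alive) follow the public Ollama API.

```python
# Illustrative sketch only -- not the PR's format_payload() implementation.
# Field names follow the public Ollama /api/generate API; everything else
# (function name, argument names, merge precedence) is assumed for this example.
from typing import Any


def build_ollama_generate_payload(
    model: str,
    prompt: str,
    *,
    stream: bool = False,
    max_tokens: int | None = None,
    system: str | None = None,
    images: list[str] | None = None,            # base64-encoded images for vision models
    extra_options: dict[str, Any] | None = None,
    extra_fields: dict[str, Any] | None = None,  # e.g. {"format": "json", "raw": True, "keep_alive": "5m"}
) -> dict[str, Any]:
    payload: dict[str, Any] = {"model": model, "prompt": prompt, "stream": stream}

    # max_tokens maps onto Ollama's options.num_predict.
    options: dict[str, Any] = {}
    if max_tokens is not None:
        options["num_predict"] = max_tokens

    # Merge user-supplied options; in this sketch, user values win over derived ones.
    if extra_options:
        options.update(extra_options)
    if options:
        payload["options"] = options

    if system is not None:
        payload["system"] = system
    if images:
        payload["images"] = images

    # Pass through any remaining top-level fields (format, raw, keep_alive, ...).
    if extra_fields:
        payload.update(extra_fields)
    return payload


if __name__ == "__main__":
    print(build_ollama_generate_payload(
        "llama3",
        "Why is the sky blue?",
        stream=True,
        max_tokens=128,
        system="Answer briefly.",
        extra_options={"temperature": 0.2},
        extra_fields={"keep_alive": "5m"},
    ))
```

In this sketch the user-supplied options override the derived num_predict; the real implementation may choose the opposite precedence, which is exactly the kind of boundary condition the review checklist above calls out.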

Poem

🐰 A new endpoint hops into place,
Ollama's generate finds its space,
With streaming and tokens in flight,
Tests ensure everything's right,
Documentation lights up the trace! 🌟

Pre-merge checks

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 11.54%, which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: adding native support for the ollama_generate endpoint type across the codebase. |


coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e5194fa and e2915b3.

📒 Files selected for processing (5)
  • docs/tutorials/ollama-endpoint.md (1 hunks)
  • src/aiperf/common/enums/plugin_enums.py (1 hunks)
  • src/aiperf/endpoints/__init__.py (2 hunks)
  • src/aiperf/endpoints/ollama_generate.py (1 hunks)
  • tests/endpoints/test_ollama_generate.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-23T03:16:02.685Z
Learnt from: ajcasagrande
Repo: ai-dynamo/aiperf PR: 389
File: src/aiperf/endpoints/openai_chat.py:41-46
Timestamp: 2025-10-23T03:16:02.685Z
Learning: In the aiperf project, the ChatEndpoint at src/aiperf/endpoints/openai_chat.py supports video inputs (supports_videos=True) through custom extensions, even though the standard OpenAI /v1/chat/completions API does not natively support raw video inputs.

Applied to files:

  • src/aiperf/endpoints/__init__.py
🧬 Code graph analysis (3)
src/aiperf/endpoints/__init__.py (1)
src/aiperf/endpoints/ollama_generate.py (1)
  • OllamaGenerateEndpoint (20-118)
src/aiperf/endpoints/ollama_generate.py (6)
src/aiperf/common/enums/plugin_enums.py (1)
  • EndpointType (19-35)
src/aiperf/common/factories.py (1)
  • EndpointFactory (477-495)
src/aiperf/common/models/record_models.py (2)
  • ParsedResponse (628-661)
  • RequestInfo (755-808)
src/aiperf/common/models/metadata.py (1)
  • EndpointMetadata (11-45)
src/aiperf/common/protocols.py (2)
  • EndpointProtocol (347-358)
  • InferenceServerResponse (362-403)
src/aiperf/endpoints/base_endpoint.py (2)
  • BaseEndpoint (31-257)
  • make_text_response_data (88-90)
tests/endpoints/test_ollama_generate.py (7)
src/aiperf/common/models/record_models.py (2)
  • ParsedResponse (628-661)
  • RequestInfo (755-808)
src/aiperf/common/models/metadata.py (1)
  • EndpointMetadata (11-45)
src/aiperf/common/models/model_endpoint_info.py (4)
  • EndpointInfo (57-114)
  • ModelEndpointInfo (117-150)
  • ModelInfo (19-30)
  • ModelListInfo (33-54)
src/aiperf/common/models/dataset_models.py (1)
  • Turn (48-78)
src/aiperf/common/protocols.py (1)
  • InferenceServerResponse (362-403)
src/aiperf/endpoints/base_endpoint.py (1)
  • make_text_response_data (88-90)
src/aiperf/common/models/usage_models.py (2)
  • prompt_tokens (37-39)
  • completion_tokens (42-44)
🪛 markdownlint-cli2 (0.18.1)
docs/tutorials/ollama-endpoint.md

214-214: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


218-218: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


222-222: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


226-226: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🪛 Ruff (0.14.3)
src/aiperf/endpoints/ollama_generate.py

49-49: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: build (macos-latest, 3.10)
  • GitHub Check: build (macos-latest, 3.11)
  • GitHub Check: build (macos-latest, 3.12)
  • GitHub Check: build (ubuntu-latest, 3.10)
  • GitHub Check: integration-tests (ubuntu-latest, 3.12)
  • GitHub Check: integration-tests (macos-latest, 3.12)
  • GitHub Check: integration-tests (macos-latest, 3.10)
  • GitHub Check: integration-tests (macos-latest, 3.11)
  • GitHub Check: integration-tests (ubuntu-latest, 3.11)
  • GitHub Check: integration-tests (ubuntu-latest, 3.10)
🔇 Additional comments (3)
src/aiperf/common/enums/plugin_enums.py (1)

33-33: Enum registration aligns with new endpoint

Adding OLLAMA_GENERATE keeps the factory enum in sync with the new endpoint implementation. Looks good.
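
For context, the registration itself amounts to a single enum member. The sketch below is illustrative only: the base classes and surrounding members are assumptions, and only the OLLAMA_GENERATE = "ollama_generate" member is confirmed by this PR.

```python
# Illustrative only: the base class and elided members are assumed, not taken from the PR.
from enum import Enum


class EndpointType(str, Enum):
    # ... existing endpoint types elided ...
    OLLAMA_GENERATE = "ollama_generate"  # new member registering the Ollama Generate endpoint
```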

src/aiperf/endpoints/__init__.py (1)

22-52: Public API update looks correct

Importing and exporting OllamaGenerateEndpoint keeps the package barrel consistent with the new endpoint. Thanks for wiring it up.

docs/tutorials/ollama-endpoint.md (1)

1-233: Great coverage of Ollama-specific knobs

The tutorial does a nice job documenting how to drive Ollama-specific options (system prompt, JSON mode, keep_alive, streaming) from --extra-inputs. This should help users get productive quickly.

Comment on lines +98 to +101

    text = json_obj.get("response")
    if not text:
        self.debug(lambda: f"No 'response' field in Ollama response: {json_obj}")
        return None

⚠️ Potential issue | 🔴 Critical

Don't drop streaming completion chunks

For streaming /api/generate, the final chunk arrives with "response": "" but still carries prompt_eval_count, eval_count, and other usage metrics.(ollama.readthedocs.io) Because if not text: treats the empty string as missing, the final chunk is discarded and we lose usage accounting for every streaming run. Please treat only a missing/None field as absent so that empty strings still flow through.

-        text = json_obj.get("response")
-        if not text:
+        text = json_obj.get("response")
+        if text is None:
             self.debug(lambda: f"No 'response' field in Ollama response: {json_obj}")
             return None

make_text_response_data already returns None for empty strings, so this keeps usage while preserving the logging for truly absent fields. Adding a regression test that feeds a streaming final chunk (response="", done=True) would help ensure this path stays covered.
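
Here is a standalone sketch of that regression test. It mirrors the suggested `is None` check in a local helper rather than invoking the real parse_response(), since the endpoint's constructor and response wrappers are not reproduced here; the field names in the fake final chunk (response, done, prompt_eval_count, eval_count) follow Ollama's streaming /api/generate output.

```python
# Standalone sketch of the suggested regression test. extract_text mirrors the
# proposed `is None` check locally; it is not the PR's parse_response().
def extract_text(json_obj: dict) -> str | None:
    text = json_obj.get("response")
    if text is None:  # only a truly missing field counts as absent
        return None
    return text


def test_final_streaming_chunk_is_not_dropped():
    # Final chunk of a streaming /api/generate run: empty text, but usage metrics present.
    final_chunk = {
        "model": "llama3",
        "response": "",
        "done": True,
        "prompt_eval_count": 26,
        "eval_count": 290,
    }
    # The empty string must survive the check so usage accounting still happens.
    assert extract_text(final_chunk) == ""

    # A chunk with no "response" key at all is still treated as absent.
    assert extract_text({"done": False}) is None
```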

🤖 Prompt for AI Agents
In src/aiperf/endpoints/ollama_generate.py around lines 98 to 101, the code
treats an empty string response as missing (using a falsy check) and drops the
final streaming chunk that contains usage metrics; change the check to only
treat a truly missing/None field as absent (e.g., use json_obj.get("response")
is None or explicit key presence check) so empty string responses still pass
through, keep the existing make_text_response_data behavior which will return
None for empty strings, and add a regression test that feeds a streaming final
chunk with response="" and done=True to ensure usage metrics are preserved.

@ajcasagrande marked this pull request as draft November 7, 2025 15:27
@ajcasagrande (Contributor, Author) commented:

For streaming, support will need to be added for the application/x-ndjson content type.
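
For reference, Ollama streams /api/generate responses as newline-delimited JSON (Content-Type: application/x-ndjson): each line is a complete JSON object, and the final line carries done: true plus the usage counters. Below is a minimal parsing sketch; the use of aiohttp and the helper name are assumptions for illustration, not aiperf's actual transport layer.

```python
# Minimal sketch of consuming an application/x-ndjson stream from Ollama.
# aiohttp and the function name are assumptions; only the endpoint URL and
# response field names follow the public Ollama API.
import json

import aiohttp


async def stream_generate(url: str, payload: dict) -> tuple[str, dict]:
    text_parts: list[str] = []
    final: dict = {}
    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=payload) as resp:
            async for raw_line in resp.content:  # one NDJSON line per iteration
                line = raw_line.strip()
                if not line:
                    continue
                chunk = json.loads(line)
                text_parts.append(chunk.get("response") or "")
                if chunk.get("done"):
                    final = chunk  # carries prompt_eval_count, eval_count, ...
    return "".join(text_parts), final


# Example usage (requires a local Ollama server):
#   import asyncio
#   payload = {"model": "llama3", "prompt": "Why is the sky blue?", "stream": True}
#   asyncio.run(stream_generate("http://localhost:11434/api/generate", payload))
```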
