feat: add LlamaIndex memory integration example (Python + TypeScript) #545

m1lestones wants to merge 2 commits into plastic-labs:main
Conversation
Walkthrough

Adds a LlamaIndex example demonstrating Honcho-backed persistent memory, with parallel Python and TypeScript implementations, documentation, and tools to save and retrieve conversation turns and expose memory-query tooling to LlamaIndex agents.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant App as LlamaIndex App
    participant Honcho as Honcho Client
    participant Agent as ReAct Agent
    participant LLM as OpenAI LLM
    User->>App: send message
    App->>Honcho: save_memory(user_id, message, "user")
    Honcho-->>App: saved
    App->>Honcho: get_context(ctx, tokens=2000)
    Honcho-->>App: conversation history
    App->>App: build system prompt + history
    App->>Agent: init with system prompt + query_memory tool
    Agent->>Agent: process message (may call query_memory)
    Agent->>Honcho: peer.chat(query) if invoked
    Honcho-->>Agent: memory results
    Agent->>LLM: request with prompt + context
    LLM-->>Agent: response
    Agent-->>App: response
    App->>Honcho: save_memory(user_id, response, "assistant")
    Honcho-->>App: saved
    App-->>User: return response
```
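The turn lifecycle in the diagram can be sketched in plain Python. This is a hypothetical stub, not code from the PR: `app`, its `honcho` and `agent` attributes, and `agent.respond` are stand-ins for the real Honcho SDK client and LlamaIndex ReAct agent.

```python
def run_turn(app, user_id: str, user_input: str) -> str:
    """One conversation turn following the sequence diagram above.

    `app` bundles stubbed honcho/agent backends; in the real example these
    would be the Honcho SDK client and a LlamaIndex ReAct agent.
    """
    # 1. Persist the user message before the agent runs.
    app.honcho.save_memory(user_id, user_input, "user")
    # 2. Pull recent history and fold it into the system prompt.
    history = app.honcho.get_context(user_id, tokens=2000)
    prompt = "You are a helpful assistant.\n" + "\n".join(
        f"{role}: {content}" for role, content in history
    )
    # 3. Run the agent, then persist its reply.
    response = app.agent.respond(prompt, user_input)
    app.honcho.save_memory(user_id, response, "assistant")
    return response
```

The ordering matters: saving the user message first means `get_context` on the *next* turn already includes it, which is exactly what the diagram shows.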
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 7
🧹 Nitpick comments (1)
examples/llamaindex/typescript/tools/client.ts (1)
19-24: Consider memoizing the Honcho client instance.
`getClient()` creates a new SDK client on every call. In this example flow, that happens multiple times per turn; a shared instance keeps setup overhead lower.

♻️ Proposed refactor

```diff
+let cachedClient: Honcho | null = null;
+
 export function getClient(): Honcho {
+  if (cachedClient) return cachedClient;
   const apiKey = process.env.HONCHO_API_KEY;
   if (!apiKey) throw new Error("HONCHO_API_KEY is required.");
   const workspaceId = process.env.HONCHO_WORKSPACE_ID ?? "default";
-  return new Honcho({ apiKey, workspaceId });
+  cachedClient = new Honcho({ apiKey, workspaceId });
+  return cachedClient;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llamaindex/typescript/tools/client.ts` around lines 19 - 24, getClient() currently constructs a new Honcho SDK client on every call; change it to return a cached singleton by adding a module-level variable (e.g., let cachedClient: Honcho | null = null) and only instantiate new Honcho({ apiKey, workspaceId }) when cachedClient is null, then assign and return cachedClient; ensure you still validate HONCHO_API_KEY and HONCHO_WORKSPACE_ID as before and reference the same getClient and Honcho symbols.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/llamaindex/python/main.py`:
- Around line 78-85: Wrap the interactive loop's external calls in error
handling so one SDK/provider exception doesn't exit the REPL: surround the call
to chat(_user_id, _user_input, _session_id) (and the subsequent print) with a
try/except, catch broad exceptions, log or print a concise error message
including exception info, and continue the loop (preserving _user_id/_session_id
and prompting again) so the session remains alive after transient failures.
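The guarded loop described above can be sketched as a minimal, hypothetical Python version. The `chat` callable and the `prompt`/`echo` hooks are stand-ins for the example's real REPL, not code from this PR:

```python
def run_repl(chat, user_id: str, session_id: str, prompt=input, echo=print) -> None:
    """Interactive loop that survives transient chat failures.

    `chat` plays the role of the example's chat(user_id, message, session_id)
    function; `prompt`/`echo` default to input/print and are injectable so the
    loop can be exercised in tests.
    """
    while True:
        user_input = prompt("You: ").strip()
        if user_input.lower() in ("quit", "exit"):
            break
        if not user_input:
            continue
        try:
            echo(f"Agent: {chat(user_id, user_input, session_id)}\n")
        except Exception as exc:  # broad on purpose: keep the session alive
            echo(f"Error during chat turn: {exc!r} -- please try again.")
```

The broad `except Exception` is deliberate here: a transient SDK or provider error prints a concise message and the loop prompts again, preserving the session.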
In `@examples/llamaindex/python/tools/get_context.py`:
- Around line 19-20: Update the docstring that describes the returned list to
reflect the actual shape produced by to_openai(): state that entries may include
roles "user", "assistant", or "system" and that entries can optionally include a
"name" field, not just {"role","content"}; reference the helper that converts
messages (to_openai()) and update the description in the function (get_context)
so callers know the list may contain system-role items and optional name keys
and that an empty list is returned when there are no messages.
In `@examples/llamaindex/python/tools/query_memory.py`:
- Around line 30-31: The current guard if not query: in the function handling
queries allows whitespace-only strings; update the validation to treat strings
containing only whitespace as empty by using a trimmed check (e.g., check
query.strip() or equivalent) before raising ValueError("query must not be
empty"), so any whitespace-only input triggers the same ValueError; locate the
validation where query is inspected and replace or augment the condition
accordingly.
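As a standalone sketch, the trimmed check might look like this (the function name `validate_query` is illustrative, not from the PR):

```python
def validate_query(query: str) -> str:
    """Return the trimmed query, rejecting empty or whitespace-only input."""
    cleaned = query.strip()
    if not cleaned:
        # `if not query:` alone would let "   " through; strip first.
        raise ValueError("query must not be empty")
    return cleaned
```

Returning the cleaned string (rather than just validating) also lets the caller forward the trimmed query downstream in one step.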
In `@examples/llamaindex/python/tools/save_memory.py`:
- Around line 25-36: The code currently treats any non-"assistant" role as a
user message, risking silent corruption; update the logic around role, sender,
and message creation (variables: role, sender, assistant_peer, user_peer,
session.add_messages) to validate role explicitly—allow only "assistant" or
"user" (or your canonical enum), raise a ValueError on invalid values, and only
map "assistant"->assistant_peer and "user"->user_peer before calling
session.add_messages to prevent misattributed writes.
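A minimal sketch of the explicit role mapping described above; the peer arguments are placeholders here, whereas the real code would pass Honcho peer objects:

```python
VALID_ROLES = frozenset({"user", "assistant"})

def resolve_peer(role: str, user_peer, assistant_peer):
    """Map a message role to the peer that should own the message.

    Raises ValueError for anything outside the canonical roles, so a typo
    can never be silently written as a user message.
    """
    if role not in VALID_ROLES:
        raise ValueError(f"invalid role {role!r}; expected one of {sorted(VALID_ROLES)}")
    return assistant_peer if role == "assistant" else user_peer
```

Failing loudly on an unknown role is the point: the original fallback (`else: user_peer`) would misattribute, say, `"system"` messages to the user without any signal.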
In `@examples/llamaindex/README.md`:
- Line 9: Update the Features wording to clarify that context injection differs
by implementation: change the sentence referencing `prefix_messages` to indicate
that non-TypeScript implementations use `prefix_messages` while the TypeScript
client uses `chatHistory` for injecting conversation history into the LLM;
mention both symbols (`prefix_messages`, `chatHistory`) and phrase it like
"Conversation history is retrieved from Honcho and formatted for the LLM before
every request (uses `prefix_messages` in most SDKs; TypeScript client uses
`chatHistory`)."
In `@examples/llamaindex/typescript/main.ts`:
- Around line 75-90: The readline interface `rl` is not guaranteed to be closed
if `chat(...)` throws; wrap the interactive loop in a try/finally so
`rl.close()` always runs: create `rl` as now, then put the while loop and calls
to `chat(userId, userInput, sessionId)` inside a try block and call `rl.close()`
in the finally block (keeping the existing early-close on "quit"/"exit"
behavior, but still ensure `rl.close()` in finally for error-safe cleanup).
- Around line 27-92: Export the chat function as a named export and prevent
automatic CLI execution on import by wrapping the main() invocation in a
module-entry guard; specifically, add a named export for chat (export async
function chat...) and replace the unconditional main().catch(console.error) call
with a runtime check (e.g., if (require && require.main === module) {
main().catch(console.error); } for CommonJS or if (import.meta &&
import.meta.main) { main().catch(console.error); } for ESM) so importing this
module does not run the CLI.
---
Nitpick comments:
In `@examples/llamaindex/typescript/tools/client.ts`:
- Around line 19-24: getClient() currently constructs a new Honcho SDK client on
every call; change it to return a cached singleton by adding a module-level
variable (e.g., let cachedClient: Honcho | null = null) and only instantiate new
Honcho({ apiKey, workspaceId }) when cachedClient is null, then assign and
return cachedClient; ensure you still validate HONCHO_API_KEY and
HONCHO_WORKSPACE_ID as before and reference the same getClient and Honcho
symbols.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 80a33bc3-5ab4-475f-a12a-9783acf12aac
📒 Files selected for processing (15)
- examples/llamaindex/README.md
- examples/llamaindex/python/main.py
- examples/llamaindex/python/pyproject.toml
- examples/llamaindex/python/tools/__init__.py
- examples/llamaindex/python/tools/client.py
- examples/llamaindex/python/tools/get_context.py
- examples/llamaindex/python/tools/query_memory.py
- examples/llamaindex/python/tools/save_memory.py
- examples/llamaindex/typescript/main.ts
- examples/llamaindex/typescript/package.json
- examples/llamaindex/typescript/tools/client.ts
- examples/llamaindex/typescript/tools/getContext.ts
- examples/llamaindex/typescript/tools/queryMemory.ts
- examples/llamaindex/typescript/tools/saveMemory.ts
- examples/llamaindex/typescript/tsconfig.json
```
    A list of message dicts: ``[{"role": "user" | "assistant", "content": "..."}]``.
    Returns an empty list if the session has no messages yet.
```
Return contract is too narrow for actual to_openai() output.
to_openai() can include "system" role entries and optional "name" fields, so this docstring currently over-promises a stricter shape than returned.
✏️ Suggested doc fix
```diff
-        A list of message dicts: ``[{"role": "user" | "assistant", "content": "..."}]``.
-        Returns an empty list if the session has no messages yet.
+        OpenAI-format message dicts with ``role``/``content`` (and optional ``name``).
+        Depending on available memory artifacts, the list may include ``"system"``
+        messages (e.g., summary/peer metadata) before conversation turns.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/llamaindex/python/tools/get_context.py` around lines 19 - 20, Update
the docstring that describes the returned list to reflect the actual shape
produced by to_openai(): state that entries may include roles "user",
"assistant", or "system" and that entries can optionally include a "name" field,
not just {"role","content"}; reference the helper that converts messages
(to_openai()) and update the description in the function (get_context) so
callers know the list may contain system-role items and optional name keys and
that an empty list is returned when there are no messages.
- **Persistent Memory**: Every conversation turn is saved to Honcho and automatically injected into the agent's system prompt on the next turn.
- **Natural Language Recall**: The agent can query Honcho's Dialectic API to answer questions like "What are my hobbies?" or "What did we talk about last time?"
- **Context Injection**: Conversation history is retrieved from Honcho and formatted for the LLM before every request via `prefix_messages`.
Clarify TypeScript context-injection wording in Features.
Line 9 currently implies `prefix_messages` is used by all implementations, but the TypeScript example uses `chatHistory`.
📝 Proposed fix

```diff
-- **Context Injection**: Conversation history is retrieved from Honcho and formatted for the LLM before every request via `prefix_messages`.
+- **Context Injection**: Conversation history is retrieved from Honcho and formatted for the LLM before every request via `prefix_messages` (Python) or `chatHistory` (TypeScript).
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/llamaindex/README.md` at line 9, Update the Features wording to
clarify that context injection differs by implementation: change the sentence
referencing `prefix_messages` to indicate that non-TypeScript implementations
use `prefix_messages` while the TypeScript client uses `chatHistory` for
injecting conversation history into the LLM; mention both symbols
(`prefix_messages`, `chatHistory`) and phrase it like "Conversation history is
retrieved from Honcho and formatted for the LLM before every request (uses
`prefix_messages` in most SDKs; TypeScript client uses `chatHistory`)."
```typescript
async function chat(
  userId: string,
  message: string,
  sessionId: string
): Promise<string> {
  const ctx: HonchoContext = createContext(userId, sessionId);

  const base =
    "You are a helpful assistant with persistent memory powered by Honcho. " +
    "You remember users across conversations. " +
    "When a user asks what you remember about them, use the query_memory tool.";

  const history = await getContext(ctx, 2000);
  const systemContent =
    history.length > 0
      ? `${base}\n\n## Conversation History\n${history
          .map(
            (m) =>
              `${m.role.charAt(0).toUpperCase() + m.role.slice(1)}: ${m.content}`
          )
          .join("\n")}`
      : base;

  const llm = new OpenAI({ model: "gpt-4.1-mini" });
  const agent = new ReActAgent({
    tools: [makeQueryMemoryTool(ctx)],
    llm,
    verbose: false,
  });

  await saveMemory(userId, message, "user", sessionId);

  const result = await agent.chat({
    message,
    chatHistory: [{ role: "system", content: systemContent }],
  });
  const response = result.message.content as string;

  await saveMemory(userId, response, "assistant", sessionId);

  return response;
}

async function main() {
  console.log("LlamaIndex HonchoMemoryAgent — type 'quit' to exit\n");
  const userId = "demo-user";
  const sessionId = "demo-session";

  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });

  while (true) {
    const userInput = (await rl.question("You: ")).trim();
    if (!userInput) continue;
    if (["quit", "exit"].includes(userInput.toLowerCase())) {
      rl.close();
      break;
    }
    const response = await chat(userId, userInput, sessionId);
    console.log(`Agent: ${response}\n`);
  }
}

main().catch(console.error);
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Verify the import-side-effect issue in examples/llamaindex/typescript/main.ts
rg -nP '^\s*async function chat\(' examples/llamaindex/typescript/main.ts
rg -nP '^\s*main\(\)\.catch\(console\.error\);' examples/llamaindex/typescript/main.ts
# Expected: both patterns match in current code.
```

Repository: plastic-labs/honcho
Length of output: 118
🏁 Script executed:
```shell
#!/bin/bash
# Check if README mentions importing chat from main.ts or this file
find . -name README.md -o -name readme.md | head -20
```

Repository: plastic-labs/honcho
Length of output: 437
🏁 Script executed:
```shell
#!/bin/bash
# Search for any references to chat import or this module in docs
rg -i "llamaindex.*chat|import.*chat.*main" --type md
```

Repository: plastic-labs/honcho
Length of output: 126
🏁 Script executed:
```shell
#!/bin/bash
# Check if any other files import from this main.ts
rg "from.*examples/llamaindex/typescript/main|from.*\./main" --type ts --type tsx --type js --type jsx
```

Repository: plastic-labs/honcho
Length of output: 90
🏁 Script executed:
```shell
#!/bin/bash
# Verify import.meta.main support in TypeScript/Node.js context
# Check the package.json and tsconfig to understand the environment
cat examples/llamaindex/typescript/package.json 2>/dev/null || echo "package.json not found"
cat examples/llamaindex/typescript/tsconfig.json 2>/dev/null || echo "tsconfig.json not found"
```

Repository: plastic-labs/honcho
Length of output: 861
Export chat and guard CLI bootstrap to avoid import side effects.
The README documents importing `chat` as a named export, but the function is not exported. Additionally, line 92 unconditionally invokes `main().catch(console.error)`, which runs the CLI as a side effect of module import instead of exposing a reusable function.
Proposed fix

```diff
-async function chat(
+export async function chat(
   userId: string,
   message: string,
   sessionId: string
 ): Promise<string> {
@@
-main().catch(console.error);
+if (import.meta.main) {
+  main().catch(console.error);
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/llamaindex/typescript/main.ts` around lines 27 - 92, Export the chat
function as a named export and prevent automatic CLI execution on import by
wrapping the main() invocation in a module-entry guard; specifically, add a
named export for chat (export async function chat...) and replace the
unconditional main().catch(console.error) call with a runtime check (e.g., if
(require && require.main === module) { main().catch(console.error); } for
CommonJS or if (import.meta && import.meta.main) { main().catch(console.error);
} for ESM) so importing this module does not run the CLI.
```typescript
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });

  while (true) {
    const userInput = (await rl.question("You: ")).trim();
    if (!userInput) continue;
    if (["quit", "exit"].includes(userInput.toLowerCase())) {
      rl.close();
      break;
    }
    const response = await chat(userId, userInput, sessionId);
    console.log(`Agent: ${response}\n`);
  }
}
```
Close readline in a finally block for error-safe cleanup.
If chat(...) throws, rl.close() is skipped. Wrap the loop in try/finally so cleanup always runs.
💡 Proposed fix

```diff
   const rl = readline.createInterface({
     input: process.stdin,
     output: process.stdout,
   });
-  while (true) {
-    const userInput = (await rl.question("You: ")).trim();
-    if (!userInput) continue;
-    if (["quit", "exit"].includes(userInput.toLowerCase())) {
-      rl.close();
-      break;
-    }
-    const response = await chat(userId, userInput, sessionId);
-    console.log(`Agent: ${response}\n`);
+  try {
+    while (true) {
+      const userInput = (await rl.question("You: ")).trim();
+      if (!userInput) continue;
+      if (["quit", "exit"].includes(userInput.toLowerCase())) {
+        break;
+      }
+      const response = await chat(userId, userInput, sessionId);
+      console.log(`Agent: ${response}\n`);
+    }
+  } finally {
+    rl.close();
   }
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/llamaindex/typescript/main.ts` around lines 75 - 90, The readline
interface `rl` is not guaranteed to be closed if `chat(...)` throws; wrap the
interactive loop in a try/finally so `rl.close()` always runs: create `rl` as
now, then put the while loop and calls to `chat(userId, userInput, sessionId)`
inside a try block and call `rl.close()` in the finally block (keeping the
existing early-close on "quit"/"exit" behavior, but still ensure `rl.close()` in
finally for error-safe cleanup).
…PL error handling
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/llamaindex/python/main.py`:
- Around line 27-28: Add a whitespace-only input guard at the start of the
chat(user_id: str, message: str, session_id: str) -> str function: call message
= message.strip() (or a separate local variable) and if the result is empty,
raise a ValueError (with a brief message like "message must not be empty or
whitespace") so the function does not save/send blank content; apply the same
strip+ValueError pattern to the other handler referenced around the same area
(the second chat-like call at lines ~66-67) to ensure consistent validation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: aec3604c-e9f3-4ad8-8254-e971a98155bd
📒 Files selected for processing (3)
- examples/llamaindex/python/main.py
- examples/llamaindex/python/tools/query_memory.py
- examples/llamaindex/python/tools/save_memory.py
🚧 Files skipped from review as they are similar to previous changes (2)
- examples/llamaindex/python/tools/query_memory.py
- examples/llamaindex/python/tools/save_memory.py
```python
def chat(user_id: str, message: str, session_id: str) -> str:
    """Run one conversation turn with persistent Honcho memory.
```
Guard chat() against whitespace-only input.
chat() is reusable outside the REPL, and currently accepts " " which gets saved and sent to the agent. Add a local strip() validation at function entry.
Suggested patch

```diff
 def chat(user_id: str, message: str, session_id: str) -> str:
@@
-    ctx = HonchoContext(user_id=user_id, session_id=session_id)
+    cleaned_message = message.strip()
+    if not cleaned_message:
+        raise ValueError("message must not be empty or whitespace")
+
+    ctx = HonchoContext(user_id=user_id, session_id=session_id)
@@
-    save_memory(user_id, message, "user", session_id)
-    response = str(agent.chat(message))
+    save_memory(user_id, cleaned_message, "user", session_id)
+    response = str(agent.chat(cleaned_message))
```

Also applies to: 66-67
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/llamaindex/python/main.py` around lines 27 - 28, Add a
whitespace-only input guard at the start of the chat(user_id: str, message: str,
session_id: str) -> str function: call message = message.strip() (or a separate
local variable) and if the result is empty, raise a ValueError (with a brief
message like "message must not be empty or whitespace") so the function does not
save/send blank content; apply the same strip+ValueError pattern to the other
handler referenced around the same area (the second chat-like call at lines
~66-67) to ensure consistent validation.
Closing this as part of a broader prioritization shift and in an effort to minimize maintenance burden. Thanks for putting in the work on this!
Summary
- Adds `examples/llamaindex/` with both Python and TypeScript implementations of Honcho memory for LlamaIndex agents
- Python: `ReActAgent.from_tools()` with `prefix_messages` for dynamic system prompt injection
- TypeScript: `new ReActAgent({ tools, llm })` with `chatHistory` for system context
- Follows the existing `examples/openai-agents/` example

What's included
How it works
- Conversation history is injected via `prefix_messages` (Python) or `chatHistory` (TypeScript) before every LLM call.
- `make_query_memory_tool(ctx)` / `makeQueryMemoryTool(ctx)` wraps a `FunctionTool` that calls Honcho's Dialectic API, closing over the user context.
- `chat()` persists the user message before the agent runs and the assistant response after.
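The closure-over-context pattern for the memory tool can be sketched in plain Python. The Dialectic call is stubbed behind an assumed `ctx.query` interface; in the actual example the returned function would be wrapped in a LlamaIndex `FunctionTool`:

```python
def make_query_memory_tool(ctx):
    """Build a query_memory callable closed over the user context.

    `ctx.query` stands in for the Honcho Dialectic API call; the real
    example wraps the returned function in a LlamaIndex FunctionTool so
    the agent can invoke it by name.
    """
    def query_memory(query: str) -> str:
        cleaned = query.strip()
        if not cleaned:
            raise ValueError("query must not be empty")
        return ctx.query(cleaned)  # delegate to the (stubbed) memory backend
    return query_memory
```

Closing over `ctx` means the tool signature the agent sees is just `query_memory(query)`; the user and session identity never leak into the tool's arguments.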
Python:
- `pip install llama-index llama-index-llms-openai honcho-ai python-dotenv`
- `cd python && python main.py`

TypeScript:

- `cd typescript && bun install && bun run main.ts`

🤖 Generated with Claude Code