tinygent/examples/memory/buffer-summary-memory at master · filchy/tinygent

# BufferSummaryChatMemory — Summarizing Conversation Memory

`BufferSummaryChatMemory` is a memory backend that keeps recent messages in a buffer and automatically summarizes older messages when the buffer exceeds a token limit. This gives the model both precise recent context and a compressed summary of earlier conversation.

---

## Concept

* Stores messages in a `BaseChatHistory`.
* When the buffer exceeds `max_token_limit`, it prunes the oldest messages and uses an LLM to generate/update a running summary.
* The summary is stored as a `TinySummaryMessage` and prepended to the buffer when loading variables.
* Suited to long conversations where the model needs awareness of earlier context without exceeding token limits. Recent messages stay verbatim; older ones survive only in compressed form.
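
The mechanism described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not tinygent's implementation: `SketchSummaryBuffer`, the whitespace-based `count_tokens`, and the `fake_summarize` callable are hypothetical stand-ins. A real setup would call an LLM and use a proper tokenizer.

```python
from typing import Callable, List, Optional


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokens.
    return len(text.split())


class SketchSummaryBuffer:
    """Hypothetical illustration of a message buffer with a running summary."""

    def __init__(
        self,
        summarize: Callable[[Optional[str], List[str]], str],
        max_token_limit: int = 2000,
    ) -> None:
        self.summarize = summarize
        self.max_token_limit = max_token_limit
        self.buffer: List[str] = []
        self.summary: Optional[str] = None

    def save_context(self, message: str) -> None:
        # Append the new message, then prune if the buffer is over the limit.
        self.buffer.append(message)
        self.prune()

    def prune(self) -> None:
        pruned: List[str] = []
        # Pop the oldest messages until the buffer fits the limit again.
        while self.buffer and sum(count_tokens(m) for m in self.buffer) > self.max_token_limit:
            pruned.append(self.buffer.pop(0))
        if pruned:
            # Fold the pruned messages into the existing summary.
            self.summary = self.summarize(self.summary, pruned)

    def load_variables(self) -> List[str]:
        # Summary (if any) first, then the recent buffer.
        head = [f"[summary] {self.summary}"] if self.summary else []
        return head + self.buffer


def fake_summarize(previous: Optional[str], pruned: List[str]) -> str:
    # Stand-in for an LLM call: only tracks how many messages were folded in.
    already = int(previous.split()[0]) if previous else 0
    return f"{already + len(pruned)} earlier message(s) summarized"


memory = SketchSummaryBuffer(fake_summarize, max_token_limit=8)
for text in ["Hello, assistant.", "Hi there! How can I help?", "Plan my weekend, please."]:
    memory.save_context(text)

print(memory.load_variables())
```

With a limit of 8 "tokens", the third message pushes the buffer over the limit, so the two oldest messages are folded into the summary and only the latest one stays verbatim.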

---

## API

* `llm`: the LLM instance used to generate summaries.
* `max_token_limit`: integer (default `2000`). Maximum tokens before pruning and summarizing.
* `return_messages`: boolean (default `False`). If `True`, returns a list of messages; otherwise returns a formatted string.
* `save_context(message)`: add a new message and trigger pruning if needed.
* `load_variables()`: returns a dict whose `summarized_chat` key holds the summary (if any) followed by the recent buffer.
* `prune()`: manually trigger the summarization of older messages.
* `clear()`: reset all stored messages and summary.
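
To make the effect of `return_messages` concrete, here is a small self-contained sketch of how the two return shapes might differ. The `Msg` dataclass and this standalone `load_variables` function are hypothetical stand-ins, and the `type: content` string layout is an assumption; only the `summarized_chat` key comes from the example output in this README.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Msg:
    # Hypothetical stand-in for tinygent's message classes.
    type: str
    content: str


def load_variables(
    summary: Optional[Msg],
    buffer: List[Msg],
    return_messages: bool = False,
) -> Dict[str, Union[List[Msg], str]]:
    # Summary (if any) goes first, followed by the recent buffer.
    messages = ([summary] if summary else []) + buffer
    if return_messages:
        return {"summarized_chat": messages}
    # Assumption: one "type: content" line per message; the real format may differ.
    text = "\n".join(f"{m.type}: {m.content}" for m in messages)
    return {"summarized_chat": text}


summary = Msg("summary", "User greeted the assistant.")
buffer = [Msg("human", "Plan my weekend.")]

as_list = load_variables(summary, buffer, return_messages=True)
as_text = load_variables(summary, buffer)

print(as_list["summarized_chat"])
print(as_text["summarized_chat"])
```

A list of message objects is convenient when feeding a chat model directly, while the string form fits prompt templates that expect plain text.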

---

## Example

```python
from tinygent.core.datamodels.messages import TinyChatMessage, TinyHumanMessage, TinyPlanMessage
from tinygent.core.factory import build_llm
from tinygent.memory import BufferSummaryChatMemory

memory = BufferSummaryChatMemory(
    build_llm('openai:gpt-4o-mini'),
    max_token_limit=30,  # deliberately tiny so summarization triggers in this short example
    return_messages=True,
)

# First exchange
memory.save_context(TinyHumanMessage(content='Hello, assistant.'))
memory.save_context(TinyChatMessage(content='Hi there! How can I help you today?'))

# Second exchange
memory.save_context(TinyHumanMessage(content='Can you make a plan for my weekend?'))
memory.save_context(TinyPlanMessage(content='Sure! 1. Go hiking. 2. Watch a movie. 3. Relax.'))

print('Full memory:', memory.load_variables())
```

**Output** (the summary text is generated by the LLM, so the exact wording will vary):

```
Full memory: {'summarized_chat': [TinySummaryMessage(type='summary', metadata={}, content='The human initiated a conversation with the assistant, requesting assistance in planning their weekend.'), TinyPlanMessage(type='plan', metadata={}, content='Sure! 1. Go hiking. 2. Watch a movie. 3. Relax.')]}
```

The oldest messages (greeting exchange and weekend planning request) are summarized into a `TinySummaryMessage`, while the most recent `TinyPlanMessage` remains in the buffer.

---

## When to Use

* Long-running conversations where full history would exceed token limits.
* When you need to preserve important context from earlier in the conversation.
* Ideal for agents that require awareness of past interactions without passing every message.

---

## Notes

* The quality of summaries depends on the LLM you provide.
* Lower `max_token_limit` values trigger more frequent summarization (useful for testing).
* The summary accumulates over time: newly pruned messages are merged into the existing summary rather than replacing it.
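
The trade-off behind `max_token_limit` can be illustrated with a rough simulation that counts how often the summarizer would run. This is a sketch under assumptions, not a measurement of tinygent: `count_summarizer_calls` is a hypothetical helper, tokens are approximated as whitespace-separated words, and pruning is assumed to run after every saved message.

```python
from typing import List


def count_summarizer_calls(messages: List[str], max_token_limit: int) -> int:
    """Count pruning passes, each of which would cost one summarizer (LLM) call."""
    buffer: List[str] = []
    calls = 0
    for message in messages:
        buffer.append(message)
        pruned = False
        # Drop the oldest messages until the buffer fits the limit.
        while buffer and sum(len(m.split()) for m in buffer) > max_token_limit:
            buffer.pop(0)
            pruned = True
        if pruned:
            calls += 1
    return calls


chat = ["one two three four"] * 6  # six messages of four "tokens" each

for limit in (8, 16, 100):
    print(limit, count_summarizer_calls(chat, max_token_limit=limit))
```

In this toy conversation the tightest limit triggers a summarization pass on most saves, while the loosest never summarizes at all, which is why a small limit is handy for testing but costly in production.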
