Agent Memory is a production-ready, secure, multi-tenant AI agent architecture built on the Microsoft Agent Framework and .NET 10.
At its core, this project demonstrates how to build a multi-layered memory system (Working, Episodic, and Semantic) combined with cryptographically isolated PII scrubbing and rehydration. By leveraging the native components of the Microsoft Agent Framework (`ChatHistoryProvider`, `ContextProvider`, and middleware), it delivers data privacy and O(1) performance scaling without compromising on context window management or cross-thread knowledge recall.
To provide the LLM with infinite-feeling context while maintaining strict tokenizer budget limits, the system implements a sophisticated tiered memory strategy:
- Hot / Working Memory (Redis LRU Cache)
  - Purpose: Millisecond-latency caching for active conversational turns, session metadata, and instantaneous PII token unmasking.
  - Features: Auto-expiring keys and an `allkeys-lru` eviction policy ensure optimal memory utilization without manual cleanup.
- Episodic Memory (Cosmos DB `ThreadTranscripts`)
  - Purpose: Persistent, chronological, append-only chat history.
  - Features: Protected by a native `MultiHash` partition key design (`/UserId` + `/ThreadId`). It serves as the durable system of record for exactly what was said, maintaining strict tenant isolation.
- Semantic / Long-Term Memory (Cosmos DB Vector Store)
  - Purpose: Cross-thread knowledge and fact retrieval.
  - Features: Uses an asynchronous background `KnowledgeGraphExtractor` to continuously synthesize conversational facts. Built-in Cosmos DB vector indexing (`quantizedFlat` cosine search) lets the system recall historical tenant facts seamlessly across entirely disjoint threads.
- Context Window Compaction & Sliding Window
  - Purpose: Prevents LLM context exhaustion and unbounded prompt growth.
  - Features: Rather than loading the entire Episodic Memory into the prompt, the system relies on the custom context pipeline to dynamically "compact" history. It guarantees the inclusion of hot semantic facts and recent turns, safely truncating older episodic history using exact Tiktoken counts.
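For reference, the `allkeys-lru` policy mentioned above corresponds to standard Redis settings (values illustrative; Azure Cache for Redis exposes the same `maxmemory-policy` knob through the portal):

```shell
# Cap memory and let Redis evict the least-recently-used keys automatically
redis-cli CONFIG SET maxmemory 256mb
redis-cli CONFIG SET maxmemory-policy allkeys-lru
```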
This architecture depends on Azure Cosmos DB as the backbone because it uniquely supports our scale and isolation requirements natively. We heavily utilize Cosmos DB well-architected best practices:
- Hierarchical Partition Keys (HPK): We use a `MultiHash` partition key design (`/UserId` followed by `/ThreadId`). This overcomes the standard 20 GB logical partition limit, eliminates "hot partitions" for highly active users, and enforces tenant isolation at the partition level.
- Zero Cross-Partition Queries: Because the Agent Framework routing and the PII Vault resolver always possess the `UserId` and `ThreadId` context, every database interaction is an O(1) point read or a strictly scoped single-partition query. This yields single-digit millisecond latency regardless of data scale.
- Unified Vector & Operational Data: By leveraging Cosmos DB NoSQL's native `vectorEmbeddings` and `quantizedFlat` indexes, we store the transactional chat schemas and the 1536-dimensional semantic embeddings in the same database. This eliminates the architectural complexity and consistency bugs of syncing an operational database with a standalone vector DB.
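As a sketch of how the hierarchical partition key and point reads might look with the Cosmos .NET SDK (v3.x); the container, item, and identifier names here are illustrative, and a real `CosmosClient` is required to execute the read:

```csharp
using System.Collections.Generic;
using Microsoft.Azure.Cosmos;

// Declare the /UserId + /ThreadId hierarchical partition key on the container.
var containerProperties = new ContainerProperties(
    id: "ThreadTranscripts",
    partitionKeyPaths: new List<string> { "/UserId", "/ThreadId" });

// Every read is then a single-partition point operation:
var key = new PartitionKeyBuilder()
    .Add("user-42")        // first level: tenant
    .Add("thread-7")       // second level: conversation
    .Build();
// await container.ReadItemAsync<TranscriptItem>(id: messageId, partitionKey: key);
```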
Instead of polluting the system prompt or arbitrarily mutating chat histories, we heavily leverage the Microsoft Agent Framework's Context Providers Pipeline.
```mermaid
graph TD
    UI["User Input"] --> Middleware["PII Masking Middleware"]
    Middleware -->|"Masked: {{PII_Person_hash_1}}"| AF["Agent Framework"]
    subgraph AFCP ["Agent Framework Context Pipeline"]
        AF --> CP1["1. AgentChatProvider"]
        CP1 -.->|"Loads Cosmos DB History"| AF
        AF --> CP2["2. AgentSemanticContextProvider"]
        CP2 -.->|"Injects Stamped Vector Facts"| AF
        AF --> CP3["3. TokenTruncationContextProvider"]
        CP3 -.->|"Tiktoken Exact Math Eviction"| AF
    end
    AF --> LLM["LLM Backend"]
    LLM --> Unmask["PII Unmasking Middleware"]
    Unmask -->|"Resolves Tokens via PiiVault"| Output["User Output"]
```
- `AgentChatProvider` (`ChatHistoryProvider`)
  - Responsible only for fetching exact Cosmos DB `ThreadTranscripts`.
  - Crucial feature: Excludes injected memory facts automatically by ignoring messages tagged with an `AgentRequestMessageSourceId`. This prevents semantic memories from bleeding into the permanent chronological transcript, preventing unbounded DB growth.
- `AgentSemanticContextProvider`
  - Retrieves vectorized facts about the user.
  - Leverages `.WithAgentRequestMessageSource(AgentRequestMessageSourceType.AIContextProvider)` to transparently "stamp" the memories into the context window as a `ChatRole.User` message. This keeps the system prompt clean and framework-compliant.
- `TokenTruncationContextProvider`
  - Runs last in the DI pipeline.
  - Evaluates the aggregate Tiktoken count of the system prompt, semantic facts, and chat history.
  - Safely evicts only unstamped historical messages from the top of the history until the exact token budget is met.
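The eviction step above reduces to pure token math. A minimal sketch, where `Truncate`, the tuple message shape, and the word-count "tokenizer" are illustrative stand-ins for the real provider and Tiktoken:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Messages are (Text, Stamped) tuples; Stamped marks an injected semantic
// fact that must never be evicted. countTokens stands in for Tiktoken.
static List<(string Text, bool Stamped)> Truncate(
    List<(string Text, bool Stamped)> history,
    int systemAndFactTokens,
    int budget,
    Func<string, int> countTokens)
{
    var kept = new List<(string Text, bool Stamped)>(history);
    int Total() => systemAndFactTokens + kept.Sum(m => countTokens(m.Text));

    while (Total() > budget)
    {
        // Evict the oldest *unstamped* message from the top of the history.
        int victim = kept.FindIndex(m => !m.Stamped);
        if (victim < 0) break;            // only stamped facts remain
        kept.RemoveAt(victim);
    }
    return kept;
}

// Crude word-count "tokenizer" for the demo only.
Func<string, int> count = s => s.Split(' ').Length;
var history = new List<(string Text, bool Stamped)>
{
    ("old turn one two three", false),
    ("semantic fact", true),
    ("recent turn", false),
};
var compacted = Truncate(history, systemAndFactTokens: 0, budget: 4, count);
Console.WriteLine(string.Join(" | ", compacted.Select(m => m.Text)));
// semantic fact | recent turn
```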
To keep the primary chat request/response loop lightning fast, Agent Memory completely offloads heavy database writes and LLM fact-extraction to asynchronous .NET `IHostedService` background workers:
- `RedisToCosmosSyncService` (Eventual Consistency)
  - Role: Drains the hot conversational buffers from Redis and gracefully persists them into the cold `ThreadTranscripts` Episodic Memory in Cosmos DB.
  - Why: Safely decouples the user's chat experience from database write latencies. The UI feels instant, while the system guarantees durability asynchronously.
- `MemoryProcessingService` & `KnowledgeGraphExtractor`
  - Role: Tails the conversation transcripts in the background, invokes an LLM to extract declarative facts about the user, generates 1536-dimensional embeddings, and upserts them into the `SemanticMemory` vector store.
  - Why: Extracting semantic graph memories and generating vector embeddings takes several seconds. By moving this out of band, the agent replies instantly, and the semantic knowledge graph quietly updates itself moments later.
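The drain pattern behind these workers can be sketched with an in-memory `Channel` standing in for the Redis buffer and a delegate standing in for the Cosmos DB write (names and shapes here are illustrative, not the real service):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// The chat path enqueues turns without waiting on the database; a
// background loop batches and persists them off the hot path.
static async Task DrainAsync(ChannelReader<string> buffer,
                             Func<IReadOnlyList<string>, Task> persistBatch,
                             CancellationToken ct)
{
    var batch = new List<string>();
    while (await buffer.WaitToReadAsync(ct))
    {
        while (buffer.TryRead(out var turn)) batch.Add(turn);
        if (batch.Count > 0)
        {
            await persistBatch(batch);   // durable write happens asynchronously
            batch.Clear();
        }
    }
}

var channel = Channel.CreateUnbounded<string>();
var persisted = new List<string>();
var worker = DrainAsync(channel.Reader,
                        b => { persisted.AddRange(b); return Task.CompletedTask; },
                        CancellationToken.None);

channel.Writer.TryWrite("user: hi");
channel.Writer.TryWrite("assistant: hello!");
channel.Writer.Complete();   // in the real service the loop runs for the host lifetime
await worker;
Console.WriteLine($"persisted {persisted.Count} turns");
// persisted 2 turns
```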
The system supports multiple PII detection engines—including a high-speed Regex-based provider and a cloud-scale Azure Text Analytics provider (NLP)—combined with an intelligent O(1) Hash Tokenizer. This allows the system to securely mask sensitive data before it reaches the LLM, and flawlessly rehydrate it in the response stream before it reaches the client.
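A minimal sketch of what a regex-based provider might look like for a single entity type, using the token format documented below; `MaskEmails` and the plain-dictionary vault are illustrative stand-ins for the real providers and the encrypted PiiVault:

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;
using System.Text.RegularExpressions;

// Replaces email addresses with {{PII_Email_<threadHash>_<n>}} tokens and
// records the originals in a vault keyed by token.
static string MaskEmails(string text, string threadId, Dictionary<string, string> vault)
{
    // 8-hex-char hash of the *thread*, not the PII value (no rainbow tables).
    var threadHash = Convert.ToHexString(
        SHA256.HashData(Encoding.UTF8.GetBytes(threadId)))[..8].ToLowerInvariant();

    int counter = 0;
    return Regex.Replace(text, @"[\w.+-]+@[\w-]+\.[\w.]+", m =>
    {
        var token = $"{{{{PII_Email_{threadHash}_{++counter}}}}}";
        vault[token] = m.Value;        // stored encrypted in the real vault
        return token;
    });
}

var vault = new Dictionary<string, string>();
var masked = MaskEmails("Reach me at jane@contoso.com", "thread-A", vault);
Console.WriteLine(masked);   // e.g. Reach me at {{PII_Email_..._1}}
```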
To ensure absolute security, the PiiVault implements a cryptographically secure Envelope Encryption pattern:
- Data Encryption Key (DEK): Every original PII string is encrypted locally in memory using a DEK before hitting the database. Cosmos DB only ever stores the ciphertext.
- Key Encryption Key (KEK): The DEK is "wrapped" using a Master Key permanently secured within Azure Key Vault. Only the AGUI Server's Managed Identity has the cryptographic rights to unwrap the DEK and decrypt the underlying string value.
- Master Key Rotation: Because the data is secured via enveloped DEKs, you can invoke a master key rotation in Key Vault seamlessly without needing to perform an expensive, system-wide database re-encryption job.
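The envelope pattern can be sketched locally. In this sketch a local AES key stands in for the Key Vault master key, and AES-ECB key wrapping stands in for Key Vault's `WrapKey` operation; it illustrates the DEK/KEK split, not the production implementation:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// A fresh DEK encrypts the PII string; the DEK itself is then wrapped by
// the KEK. Only the wrapped DEK and the ciphertext would reach Cosmos DB.
static (byte[] WrappedDek, byte[] Nonce, byte[] Tag, byte[] Ciphertext) Protect(string pii, byte[] kek)
{
    var dek = RandomNumberGenerator.GetBytes(32);
    var nonce = RandomNumberGenerator.GetBytes(12);
    var plaintext = Encoding.UTF8.GetBytes(pii);
    var ciphertext = new byte[plaintext.Length];
    var tag = new byte[16];
    using (var aes = new AesGcm(dek, tagSizeInBytes: 16))
        aes.Encrypt(nonce, plaintext, ciphertext, tag);

    // Wrap the DEK under the master key (Key Vault WrapKey in production).
    using var kw = Aes.Create();
    kw.Key = kek;
    var wrapped = kw.EncryptEcb(dek, PaddingMode.None);   // 32-byte DEK, block-aligned
    CryptographicOperations.ZeroMemory(dek);
    return (wrapped, nonce, tag, ciphertext);
}

static string Unprotect((byte[] WrappedDek, byte[] Nonce, byte[] Tag, byte[] Ciphertext) env, byte[] kek)
{
    using var kw = Aes.Create();
    kw.Key = kek;
    var dek = kw.DecryptEcb(env.WrappedDek, PaddingMode.None);
    var plaintext = new byte[env.Ciphertext.Length];
    using (var aes = new AesGcm(dek, tagSizeInBytes: 16))
        aes.Decrypt(env.Nonce, env.Ciphertext, plaintext, env.Tag);
    return Encoding.UTF8.GetString(plaintext);
}

var kek = RandomNumberGenerator.GetBytes(32);
var envelope = Protect("john.doe@contoso.com", kek);
Console.WriteLine(Unprotect(envelope, kek));   // john.doe@contoso.com
```

Because the stored record holds only ciphertext and a wrapped DEK, rotating the master key means re-wrapping DEKs, not re-encrypting every row.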
Tokens are deterministically bound to their source thread:

```
{{PII_EntityType_ThreadHash_Counter}}    e.g. {{PII_Email_12ab34cd_1}}
```
Why this matters:
- No Rainbow Tables: We do not hash the PII value itself (e.g., "John"); we hash the Thread ID. This makes reverse-engineering the underlying string from the token alone computationally infeasible.
- True Tenant Security: Partitioned natively by `ThreadId` and protected by `UserId`. User B can never resolve User A's real name, even if they guess the token.
When the LLM encounters or regurgitates a token, the PiiVault elegantly resolves it using a cascading fallback strategy:
```mermaid
flowchart LR
    Token["{{PII_Email_12ab34cd_1}}"] --> Vault["PiiVault.GetOriginalValueAsync"]
    Vault --> Hot{"1. Redis Hot Cache"}
    Hot -- "Hit" --> Output["some@email.com"]
    Hot -- "Miss" --> Cold{"2. Cosmos Target Partition"}
    Cold -- "Hit" --> Cache["Rehydrate Redis Cache"] --> Output
    Cold -- "Miss" --> Vector{"3. Cross-Thread Vector Recall"}
    Vector -->|"O(n) Cosmos DB Query<br/>FILTER BY UserId"| Cache
```
If the user types their email in Thread A, it is masked as `{{PII_Email_hashA_1}}`.
A month later in Thread B, the `AgentSemanticContextProvider` retrieves a vector memory containing that token. The LLM reads it and responds: "I will send it to {{PII_Email_hashA_1}}".
The PiiVault recognizes that the hash doesn't match the current Thread B. It falls through to Layer 3 (Cross-Thread Recall), validates the `UserId` for security, decodes the token from Thread A's partition, and caches it into Thread B's Redis memory for the remainder of the conversation.
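The cascade can be sketched with in-memory dictionaries standing in for the three layers; the method name and data shapes are illustrative, not the real `PiiVault` API:

```csharp
using System;
using System.Collections.Generic;

// Resolve a token through: 1) Redis hot cache, 2) the current thread's
// Cosmos partition, 3) the UserId-scoped cross-thread vault. Layers 2 and 3
// rehydrate the hot cache on a hit.
static string? Resolve(string token,
                       Dictionary<string, string> redis,
                       Dictionary<string, string> threadPartition,
                       Dictionary<string, string> userWideVault)
{
    if (redis.TryGetValue(token, out var hot)) return hot;            // 1. hot cache

    if (threadPartition.TryGetValue(token, out var cold))             // 2. point read
    {
        redis[token] = cold;                                          //    rehydrate cache
        return cold;
    }

    if (userWideVault.TryGetValue(token, out var crossThread))        // 3. cross-thread recall
    {
        redis[token] = crossThread;                                   //    cache for this thread
        return crossThread;
    }
    return null;
}

var redis = new Dictionary<string, string>();
var threadB = new Dictionary<string, string>();                       // current thread: no hit
var userVault = new Dictionary<string, string>
{
    ["{{PII_Email_hashA_1}}"] = "some@email.com"                      // minted in Thread A
};

Console.WriteLine(Resolve("{{PII_Email_hashA_1}}", redis, threadB, userVault));
// some@email.com
Console.WriteLine(redis.Count);   // 1 — token is now cached for Thread B
```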
This repository provides the backend AGUI (Agent GUI) Server implementation. It is intentionally headless and expects the connected AGUI client to handle user authentication (e.g., acquiring Entra ID tokens) and secure negotiation with this service.
Sample Client: We are also open-sourcing a companion frontend sample (`copilot-app`) alongside this C# backend. It provides a reference implementation for securely connecting a rich chat interface to this memory backend.
Follow these steps to deploy the infrastructure and run the Agent Memory backend locally.
All infrastructure (Cosmos DB, Redis, Key Vault, Web Apps, and Text Analytics) is fully codified in Bicep. To deploy via the Azure Portal, first compile the Bicep file to an ARM JSON template:
```shell
az bicep build --file AgentMemory/mainTemplate.bicep
```

Upload the generated `mainTemplate.json` file into the Azure Portal using "Deploy a custom template".
Because the application uses identity-based access (`DefaultAzureCredential`) for local development, you must assign your own Azure user account the necessary RBAC roles on the deployed resources:
- Key Vault: `Key Vault Crypto Service Encryption User` (to wrap/unwrap the hash tokens).
- Text Analytics: `Cognitive Services User` (if using the Azure PII Masking Provider).
- Cosmos DB & Redis: Ensure you have appropriate data-plane access or use connection strings.
Once the infrastructure is deployed, grab the resource endpoints and connection strings from the portal, along with your LLM configuration, and populate `appsettings.json`:
```json
{
  "Vault": {
    "KeyVaultUri": "https://<your-prefix>-kv.vault.azure.net/"
  },
  "ConnectionStrings": {
    "Cosmos": "AccountEndpoint=...;AccountKey=...;",
    "Redis": "<your-prefix>-redis.redis.cache.windows.net:6380,password=..."
  },
  "AzureTextAnalytics": {
    "Endpoint": "https://<your-prefix>-ta.cognitiveservices.azure.com/"
  },
  "LLM": {
    // Your LLM deployment configuration here
  }
}
```

See `ENTRA_SETUP.md` for instructions on setting up Microsoft Entra ID for the SPA client and Backend API authentication.
Run the .NET backend API, then navigate to the companion frontend sample (`copilot-app`) and start the client development server:

```shell
cd ../copilot-app
npm install
npm run dev
```

Chat transcripts with masked PII in storage, rehydrated at runtime:




