
Agent Memory

.NET 10 Microsoft Agent Framework

Agent Memory is a production-ready, highly secure multi-tenant AI agent architecture built on the Microsoft Agent Framework and .NET 10.

At its core, this project demonstrates how to build a multi-layered memory system (Working, Episodic, and Semantic) combined with cryptographically isolated PII scrubbing and rehydration. By leveraging the native components of the Microsoft Agent Framework (ChatHistoryProvider, ContextProvider, and Middleware), it delivers data security and O(1) performance scaling without compromising on context window management or cross-thread knowledge recall.


🧠 The Multi-Layer Memory Architecture

To provide the LLM with infinite-feeling context while maintaining strict tokenizer budget limits, the system implements a sophisticated tiered memory strategy:

  1. Hot / Working Memory (Redis LRU Cache)
    • Purpose: Millisecond lookup caching for active conversational turns, session metadata, and instantaneous PII token unmasking.
    • Features: Auto-expiring keys and an allkeys-lru eviction policy ensure optimal memory utilization without manual cleanup.
  2. Episodic Memory (Cosmos DB ThreadTranscripts)
    • Purpose: Persistent, chronological, append-only chat history.
    • Features: Protected by a native MultiHash partition key design (/UserId + /ThreadId). It serves as the durable system of record for exactly what was said, maintaining absolute tenant isolation.
  3. Semantic / Long-Term Memory (Cosmos DB Vector Store)
    • Purpose: Cross-thread knowledge and fact retrieval.
    • Features: Uses an asynchronous background KnowledgeGraphExtractor to continuously synthesize conversational facts. Built-in Cosmos DB vector indexing (a quantizedFlat index with cosine similarity) allows the system to recall historical tenant facts seamlessly across entirely separate threads.
  4. Context Window Compaction & Sliding Window
    • Purpose: Preventing LLM context exhaustion and infinite prompt bloat.
    • Features: Rather than loading the entire Episodic Memory into the prompt, the system relies on the custom Context Pipeline to dynamically "compact" history. It guarantees the inclusion of hot Semantic facts and recent turns, safely truncating older episodic history against an exact token budget computed with Tiktoken.

Why Azure Cosmos DB?

This architecture depends on Azure Cosmos DB as the backbone because it uniquely supports our scale and isolation requirements natively. We heavily utilize Cosmos DB well-architected best practices:

  • Hierarchical Partition Keys (HPK): We use a MultiHash partition key design (/UserId followed by /ThreadId). This overcomes the standard 20GB logical partition limit, eliminates "hot partitions" for highly active users, and guarantees absolute tenant isolation at the hardware level.
  • Zero Cross-Partition Queries: Because the Agent Framework routing and the PII Vault resolver always possess the UserId and ThreadId context, every database interaction is an O(1) point-read or a strictly scoped single-partition query. This results in single-digit millisecond latency regardless of data scale.
  • Unified Vector & Operational Data: By leveraging Cosmos DB NoSQL's native vectorEmbeddings and quantizedFlat indexes, we store the transactional chat schemas and the 1536-dimensional semantic embeddings in the exact same database. This completely eliminates the architectural complexity and consistency bugs of syncing operational databases with standalone Vector DBs.
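The hierarchical partition key described above is declared on the container itself. A hypothetical fragment, as it might appear in the compiled ARM template (the container name is taken from this README; note that MultiHash keys require partition key version 2):

```json
{
  "name": "ThreadTranscripts",
  "partitionKey": {
    "paths": [ "/UserId", "/ThreadId" ],
    "kind": "MultiHash",
    "version": 2
  }
}
```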

Microsoft Agent Framework Pipeline

Instead of polluting the system prompt or arbitrarily mutating chat histories, we heavily leverage the Microsoft Agent Framework's Context Providers Pipeline.

```mermaid
graph TD
    UI["User Input"] --> Middleware["PII Masking Middleware"]
    Middleware -->|"Masked: {{PII_Person_hash_1}}"| AF["Agent Framework"]

    subgraph AFCP ["Agent Framework Context Pipeline"]
        AF --> CP1["1. AgentChatProvider"]
        CP1 -.->|"Loads Cosmos DB History"| AF

        AF --> CP2["2. AgentSemanticContextProvider"]
        CP2 -.->|"Injects Stamped Vector Facts"| AF

        AF --> CP3["3. TokenTruncationContextProvider"]
        CP3 -.->|"Tiktoken Exact Math Eviction"| AF
    end

    AF --> LLM["LLM Backend"]
    LLM --> Unmask["PII Unmasking Middleware"]
    Unmask -->|"Resolves Tokens via PiiVault"| Output["User Output"]
```

Context Pipeline Breakdown

  1. AgentChatProvider (ChatHistoryProvider)

    • Responsible only for fetching exact Cosmos DB ThreadTranscripts.
    • Crucial Feature: Automatically excludes injected memory facts by ignoring messages tagged with an AgentRequestMessageSourceId. This keeps Semantic Memories from bleeding into the permanent chronological transcript and prevents unbounded database growth.
  2. AgentSemanticContextProvider

    • Retrieves vectorized facts about the user.
    • Leverages .WithAgentRequestMessageSource(AgentRequestMessageSourceType.AIContextProvider) to transparently "stamp" the memories into the context window as a ChatRole.User message. This keeps the System Prompt clean and framework-compliant.
  3. TokenTruncationContextProvider

    • Runs last in the DI pipeline.
    • Evaluates the total aggregate Tiktoken count of the System Prompt, Semantic Facts, and Chat History.
    • Safely and mathematically evicts only unstamped historical messages from the top of the history until the exact token budget is met.
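The eviction step can be sketched outside the framework. A minimal Python sketch (the repository itself is C#), using a whitespace tokenizer as a stand-in for Tiktoken and a simplified message shape; the real provider operates on Agent Framework message types:

```python
def truncate_history(messages, budget, count_tokens):
    """Evict the oldest unstamped messages until the total fits the budget.

    messages: oldest-first list of dicts with "text" and a "stamped" flag
    marking provider-injected content (system prompt, semantic facts) that
    must never be evicted. Stops early if only stamped content remains.
    """
    total = sum(count_tokens(m["text"]) for m in messages)
    kept = list(messages)
    i = 0
    while total > budget and i < len(kept):
        if kept[i]["stamped"]:
            i += 1            # stamped content is protected; skip it
            continue
        total -= count_tokens(kept[i]["text"])
        kept.pop(i)           # evict the oldest unprotected message
    return kept

# Stand-in tokenizer: one token per whitespace-separated word.
count = lambda s: len(s.split())

history = [
    {"text": "system prompt rules here", "stamped": True},
    {"text": "old turn one two three", "stamped": False},
    {"text": "old turn four five six", "stamped": False},
    {"text": "latest user question", "stamped": False},
]
trimmed = truncate_history(history, budget=10, count_tokens=count)
```

The two oldest unstamped turns are evicted, leaving the stamped system prompt and the latest turn within budget.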

⚙️ Asynchronous Background Workers

To keep the primary chat request/response loop lightning fast, Agent Memory completely offloads heavy database writes and LLM fact-extraction to .NET asynchronous IHostedService background workers:

  1. RedisToCosmosSyncService (Eventual Consistency)
    • Role: Drains the hot conversational buffers from Redis and gracefully persists them into the cold ThreadTranscripts Episodic memory in Cosmos DB.
    • Why: Safely decouples the user's chat experience from database write latencies. The UI feels instant, while the system guarantees durability asynchronously.
  2. MemoryProcessingService & KnowledgeGraphExtractor
    • Role: Tails the conversation transcripts in the background, invokes an LLM to extract declarative facts about the user, generates 1536-dimensional embeddings, and upserts them into the SemanticMemory Vector Store.
    • Why: Extracting semantic graph memories and generating vector embeddings takes several seconds. By moving this out-of-band, the agent replies instantly, and the semantic knowledge graph quietly updates itself moments later.
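The drain pattern behind RedisToCosmosSyncService can be illustrated with in-memory stand-ins for both stores. A Python sketch; all names and the batch size are illustrative, and a real hosted service would run the drain on a timer:

```python
import queue

# In-memory stand-ins for the Redis buffer and the Cosmos container.
redis_buffer = queue.Queue()     # hot conversational turns land here first
cosmos_transcripts = []          # durable episodic system of record

def drain_once(batch_size=100):
    """Drain up to batch_size buffered turns into the durable store."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(redis_buffer.get_nowait())
        except queue.Empty:
            break                # buffer empty; stop this pass
    if batch:
        cosmos_transcripts.extend(batch)   # one bulk write per batch
    return len(batch)

# The chat path only enqueues (fast); persistence happens out-of-band.
for turn in ["user: hi", "agent: hello", "user: bye"]:
    redis_buffer.put(turn)
drained = drain_once()
```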

Background Worker Execution Logs


🔐 PII Vault

The system supports multiple PII detection engines—including a high-speed Regex-based provider and a cloud-scale Azure Text Analytics provider (NLP)—combined with an intelligent O(1) Hash Tokenizer. This allows the system to securely mask sensitive data before it reaches the LLM, and flawlessly rehydrate it in the response stream before it reaches the client.
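A regex-based provider of this kind can be sketched in a few lines. Python is used for illustration (the repository is C#); the email pattern, the vault shape, and the way the thread hash is passed in are simplified assumptions:

```python
import re

# Illustrative detector for one entity type; a real provider would
# carry a pattern set (emails, phones, names via NLP, etc.).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def mask(text, thread_hash, vault):
    """Replace each detected email with a thread-bound token and record
    the token -> original mapping in the vault for later rehydration."""
    count = 0
    def repl(match):
        nonlocal count
        count += 1
        token = f"{{{{PII_Email_{thread_hash}_{count}}}}}"
        vault[token] = match.group(0)
        return token
    return EMAIL.sub(repl, text)

vault = {}
masked = mask("Reach me at some@email.com", "12ab34cd", vault)
```

The LLM only ever sees `masked`; the vault entry drives unmasking on the way back out.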

Envelope Encryption (DEK/KEK) & Key Rotation

To ensure absolute security, the PiiVault implements a cryptographically secure Envelope Encryption pattern:

  • Data Encryption Key (DEK): Every original PII string is encrypted locally in memory using a DEK before hitting the database. Cosmos DB only ever stores the ciphertext.
  • Key Encryption Key (KEK): The DEK is "wrapped" using a Master Key permanently secured within Azure Key Vault. Only the AGUI Server's Managed Identity has the cryptographic rights to unwrap the DEK and decrypt the underlying string value.
  • Master Key Rotation: Because the data is secured via enveloped DEKs, you can invoke a master key rotation in Key Vault seamlessly without needing to perform an expensive, system-wide database re-encryption job.
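The DEK/KEK roundtrip can be sketched with a symmetric cipher standing in for both keys. A Python sketch using Fernet from the `cryptography` package; in the real system the KEK never leaves Azure Key Vault and wrap/unwrap happens via its API, so the locally generated KEK here is purely illustrative:

```python
from cryptography.fernet import Fernet

# KEK: stands in for the master key held in Azure Key Vault.
kek = Fernet(Fernet.generate_key())

# 1. Generate a fresh DEK and encrypt the PII value with it locally.
dek_bytes = Fernet.generate_key()
ciphertext = Fernet(dek_bytes).encrypt(b"john.doe@example.com")

# 2. Wrap the DEK with the KEK. Only the wrapped DEK and the
#    ciphertext are persisted; Cosmos DB never sees plaintext.
wrapped_dek = kek.encrypt(dek_bytes)

# 3. To read back: unwrap the DEK, then decrypt the value.
recovered_dek = kek.decrypt(wrapped_dek)
plaintext = Fernet(recovered_dek).decrypt(ciphertext)
```

Rotating the KEK only requires re-wrapping the small DEKs, not re-encrypting every stored value, which is why master key rotation is cheap.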

Intelligent Hash Tokens

Tokens are generated with a mathematical bind to their source thread: {{PII_EntityType_ThreadHash_Counter}} => e.g., {{PII_Email_12ab34cd_1}}

Why this matters:

  1. No Rainbow Tables: We do not hash the target word (e.g., "John"); we hash the Thread ID. This makes recovering the underlying string from the token alone computationally infeasible.
  2. True Tenant Security: Partitioned natively by ThreadId and protected by UserId. User B can never query User A's real name, even if they guess the token.
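One way such a token could be derived (the concrete hash function and the eight-character truncation are assumptions for illustration, not necessarily the repository's exact scheme):

```python
import hashlib

def make_token(entity_type: str, thread_id: str, counter: int) -> str:
    """Build a {{PII_EntityType_ThreadHash_Counter}} token.

    The hash is derived from the thread ID, not from the PII value
    itself, so the token reveals nothing about the underlying string.
    """
    thread_hash = hashlib.sha256(thread_id.encode()).hexdigest()[:8]
    return f"{{{{PII_{entity_type}_{thread_hash}_{counter}}}}}"

token = make_token("Email", "thread-a", 1)
```

Because the hash depends only on the thread, every entity masked in the same thread shares one hash and is distinguished by the counter.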

PII Vault Resolution Flow

When the LLM encounters or regurgitates a token, the PiiVault elegantly resolves it using a cascading fallback strategy:

```mermaid
flowchart LR
    Token["{{PII_Email_12ab34cd_1}}"] --> Vault["PiiVault.GetOriginalValueAsync"]
    Vault --> Hot{"1. Redis Hot Cache"}
    Hot -- "Hit" --> Output["some@email.com"]

    Hot -- "Miss" --> Cold{"2. Cosmos Target Partition"}
    Cold -- "Hit" --> Cache["Rehydrate Redis Cache"] --> Output

    Cold -- "Miss" --> Vector{"3. Cross-Thread Vector Recall"}
    Vector -->|"O(n) Cosmos DB Query<br/>FILTER BY UserId"| Cache
```

Cross-Thread Recall (The Magic)

If the user types their email in Thread A, it gets {{PII_Email_hashA_1}}. A month later in Thread B, the AgentSemanticContextProvider retrieves a vector memory containing that token. The LLM reads it, and responds: "I will send it to {{PII_Email_hashA_1}}".

The PiiVault recognizes that the hash doesn't match the current Thread B. It seamlessly drops into Layer 3 (Cross-Thread Recall), validates the UserId for absolute security, decodes the token from Thread A's partition, and caches it instantly into Thread B's Redis memory for the remainder of the conversation.
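The three-layer cascade can be sketched with dictionaries standing in for Redis, the token's own thread partition, and the user's other partitions. A Python sketch; all structures and names are illustrative:

```python
def resolve(token, user_id, hot_cache, partition_store, user_threads):
    """Cascading fallback: Redis hot cache -> the token's own thread
    partition -> cross-thread scan strictly filtered by user."""
    # 1. Hot cache (Redis): O(1), hit ends the lookup immediately.
    if token in hot_cache:
        return hot_cache[token]
    # 2. Point lookup in the thread partition encoded in the token.
    value = partition_store.get(token)
    if value is None:
        # 3. Cross-thread recall: scan only THIS user's other threads,
        #    so another tenant can never resolve the token.
        for thread in user_threads.get(user_id, []):
            if token in thread:
                value = thread[token]
                break
    if value is not None:
        hot_cache[token] = value   # rehydrate the hot cache for next time
    return value

hot = {}
current_partition = {}   # token was minted in another thread: guaranteed miss
threads = {"user-a": [{"{{PII_Email_12ab34cd_1}}": "some@email.com"}]}
email = resolve("{{PII_Email_12ab34cd_1}}", "user-a", hot,
                current_partition, threads)
```

After the first cross-thread resolution, the token lives in the hot cache, so the rest of the conversation stays on the fast path.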


🖥️ Frontend & AGUI Client Integration

This repository provides the backend AGUI (Agent GUI) Server implementation. It is intentionally headless and expects the connected AGUI client to handle user authentication (e.g., acquiring Entra ID tokens) and secure negotiation with this service.

Sample Client: We are also open-sourcing a companion frontend sample (copilot-app) alongside this C# backend. It provides a reference implementation for securely connecting a rich chat interface to this memory backend.


🚀 Getting Started

Follow these steps to deploy the infrastructure and run the Agent Memory backend locally.

1. Deploy Azure Infrastructure

All infrastructure (Cosmos DB, Redis, Key Vault, Web Apps, and Text Analytics) is fully codified in Bicep. To deploy via the Azure Portal, first compile the Bicep file to an ARM JSON template:

```shell
az bicep build --file AgentMemory/mainTemplate.bicep
```

Upload the generated mainTemplate.json file into the Azure Portal using "Deploy a custom template".

2. Assign IAM Roles (Azure RBAC)

Because the application uses identity-based access (DefaultAzureCredential) for local development, you must assign your own Azure user account the necessary RBAC roles on the deployed resources:

  • Key Vault: Key Vault Crypto Service Encryption User (to wrap/unwrap the hash tokens).
  • Text Analytics: Cognitive Services User (if using the Azure PII Masking Provider).
  • Cosmos DB & Redis: Ensure you have appropriate Data Plane access or use connection strings.

3. Configure Local Settings

Once the infrastructure is deployed, grab the resource endpoints and connection strings from the portal, along with your LLM configuration, and populate appsettings.json:

```jsonc
{
  "Vault": {
    "KeyVaultUri": "https://<your-prefix>-kv.vault.azure.net/"
  },
  "ConnectionStrings": {
    "Cosmos": "AccountEndpoint=...;AccountKey=...;",
    "Redis": "<your-prefix>-redis.redis.cache.windows.net:6380,password=..."
  },
  "AzureTextAnalytics": {
    "Endpoint": "https://<your-prefix>-ta.cognitiveservices.azure.com/"
  },
  "LLM": {
    // Your LLM deployment configuration here
  }
}
```

4. Client Authentication (Entra ID)

See ENTRA_SETUP.md for instructions on setting up Microsoft Entra ID for the SPA client and Backend API authentication.

5. Start the Application

Run the .NET backend API, then navigate to your companion frontend sample (copilot-app) and start the client development server:

```shell
cd ../copilot-app
npm install
npm run dev
```

6. Screenshots

Screenshots in the original repository illustrate:

  • Chat transcripts with masked PII in storage, rehydrated at runtime
  • Envelope encryption of PII data stored in Cosmos
  • Extracted semantic memory
  • Context window compaction

About

A working codebase intended as a reference architecture, showcasing the AGUI Microsoft Agent Framework with layered memory and PII handling.
