From aa5a777c7fbd4c934c3cc24c16b596a43f54c430 Mon Sep 17 00:00:00 2001
From: GitHub Actions <actions@github.com>
Date: Tue, 22 Oct 2024 17:05:04 +0000
Subject: [PATCH 01/10] Auto-generated: README.md and related content

---
 README.md | 91 ++++++++++++++++++++++++-------------------------------
 1 file changed, 40 insertions(+), 51 deletions(-)

diff --git a/README.md b/README.md
index 73eb9ee..8ddb440 100644
--- a/README.md
+++ b/README.md
@@ -39,37 +39,34 @@ accuracy and reliability of your generative AI models, [read this
 blog](https://developer.nvidia.com/blog/enhancing-rag-applications-with-nvidia-nim/).
 
 Get started with NIM Anywhere now with the [quick-start](#quick-start)
-instructions and build your first RAG application using NIMs\!
+instructions and build your first RAG application using NIMs!
 
 ![NIM Anywhere Screenshot](.static/_static/nim-anywhere.png)
  
-  - [Quick-start](#quick-start)
-      - [Generate your NGC Personal
-        Key](#generate-your-ngc-personal-key)
-      - [Install AI Workbench](#install-ai-workbench)
-      - [Download this project](#download-this-project)
-      - [Configure this project](#configure-this-project)
-      - [Start This Project](#start-this-project)
-      - [Populating the Knowledge Base](#populating-the-knowledge-base)
-  - [Developing Your Own
-    Applications](#developing-your-own-applications)
-  - [Application Configuration](#application-configuration)
-      - [Config from a file](#config-from-a-file)
-      - [Config from a custom file](#config-from-a-custom-file)
-      - [Config from env vars](#config-from-env-vars)
-      - [Chain Server config schema](#chain-server-config-schema)
-      - [Chat Frontend config schema](#chat-frontend-config-schema)
-  - [Contributing](#contributing)
-      - [Code Style](#code-style)
-      - [Updating the frontend](#updating-the-frontend)
-      - [Updating documentation](#updating-documentation)
-  - [Managing your Development
-    Environment](#managing-your-development-environment)
-      - [Environment Variables](#environment-variables)
-      - [Python Environment Packages](#python-environment-packages)
-      - [Operating System
-        Configuration](#operating-system-configuration)
-      - [Updating Dependencies](#updating-dependencies)
+- [Quick-start](#quick-start)
+  - [Generate your NGC Personal Key](#generate-your-ngc-personal-key)
+  - [Install AI Workbench](#install-ai-workbench)
+  - [Download this project](#download-this-project)
+  - [Configure this project](#configure-this-project)
+  - [Start This Project](#start-this-project)
+  - [Populating the Knowledge Base](#populating-the-knowledge-base)
+- [Developing Your Own Applications](#developing-your-own-applications)
+- [Application Configuration](#application-configuration)
+  - [Config from a file](#config-from-a-file)
+  - [Config from a custom file](#config-from-a-custom-file)
+  - [Config from env vars](#config-from-env-vars)
+  - [Chain Server config schema](#chain-server-config-schema)
+  - [Chat Frontend config schema](#chat-frontend-config-schema)
+- [Contributing](#contributing)
+  - [Code Style](#code-style)
+  - [Updating the frontend](#updating-the-frontend)
+  - [Updating documentation](#updating-documentation)
+- [Managing your Development
+  Environment](#managing-your-development-environment)
+  - [Environment Variables](#environment-variables)
+  - [Python Environment Packages](#python-environment-packages)
+  - [Operating System Configuration](#operating-system-configuration)
+  - [Updating Dependencies](#updating-dependencies)
 
 # Quick-start
 
@@ -278,17 +275,17 @@ run these steps as `root`.
     machine to the remote machine. If this is not currently enabled, the
     following commands will enable this is most situations. Change
     `REMOTE_USER` and `REMOTE-MACHINE` to reflect your remote address.
-    
-      - From a Windows local client, use the following PowerShell:
-        ``` powershell
-        ssh-keygen -f "C:\Users\local-user\.ssh\id_rsa" -t rsa -N '""'
-        type $env:USERPROFILE\.ssh\id_rsa.pub | ssh REMOTE_USER@REMOTE-MACHINE "cat >> .ssh/authorized_keys"
-        ```
-      - From a MacOS or Linux local client, use the following shell:
-        ``` bash
-        if [ ! -e ~/.ssh/id_rsa ]; then ssh-keygen -f ~/.ssh/id_rsa -t rsa -N ""; fi
-        ssh-copy-id REMOTE_USER@REMOTE-MACHINE
-        ```
+
+    - From a Windows local client, use the following PowerShell:
+      ``` powershell
+      ssh-keygen -f "C:\Users\local-user\.ssh\id_rsa" -t rsa -N '""'
+      type $env:USERPROFILE\.ssh\id_rsa.pub | ssh REMOTE_USER@REMOTE-MACHINE "cat >> .ssh/authorized_keys"
+      ```
+    - From a MacOS or Linux local client, use the following shell:
+      ``` bash
+      if [ ! -e ~/.ssh/id_rsa ]; then ssh-keygen -f ~/.ssh/id_rsa -t rsa -N ""; fi
+      ssh-copy-id REMOTE_USER@REMOTE-MACHINE
+      ```
 
 2.  SSH into the remote host. Then, use the following commands to
     download and execute the NVIDIA AI Workbench Installer.
@@ -343,10 +340,6 @@ Cloning this repository is the recommended way to start. This will not
 allow for local modifications, but is the fastest to get started. This
 also allows for the easiest way to pull updates.
 
-Forking this repository is recommended for development as changes will
-be able to be saved. However, to get updates, the fork maintainer will
-have to regularly pull from the upstream repo. To work from a fork,
-follow [GitHub's
 Forking this repository is recommended for development as changes will
 be able to be saved. However, to get updates, the fork maintainer will
 have to regularly pull from the upstream repo. To work from a fork,
@@ -363,9 +356,8 @@ section.
 1.  Open the local NVIDIA AI Workbench window. From the list of
     locations displayed, select either the remote one you just set up,
     or local if you're going to work locally.
-    
-    ![AI Workbench Locations
-    Menu](.static/da9474cbe2ca0da073b0ced28dd1dc492dfb3cf5.png)
+
+    ![AI Workbench Locations Menu](.static/_static/nvwb_locations.png)
 
 2.  Once inside the location, select *Clone Project*.
 
@@ -376,9 +368,8 @@ section.
     as the default of
     `/home/REMOTE_USER/nvidia-workbench/nim-anywhere.git`. Click
     *Clone*.\`
-    
-    ![AI Workbench Clone Project
-    Menu](.static/eb6d2e60199d06d752eb6e34478c683f2a084d28.png)
+
+    ![AI Workbench Clone Project Menu](.static/_static/nvwb_clone.png)
 
 4.  You will be redirected to the new project’s page. Workbench will
     automatically bootstrap the development environment. You can view
@@ -650,7 +641,6 @@ milvus:
 
 log_level: 
 
-
 ```
 
 ## Chat Frontend config schema
@@ -676,7 +666,6 @@ chain_config_file: ./config.yaml
 
 log_level: 
 
-
 ```
 
 # Contributing

From 035a2f260a834da12828ba2c9014cd64c4f19c19 Mon Sep 17 00:00:00 2001
From: Amelia Taihui Ye <taihui@umich.edu>
Date: Tue, 22 Oct 2024 18:13:29 +0000
Subject: [PATCH 02/10] created amelia-new and making new pull request

---
 docs/_SUMMARY.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/_SUMMARY.md b/docs/_SUMMARY.md
index e7cc95d..0c83b1b 100644
--- a/docs/_SUMMARY.md
+++ b/docs/_SUMMARY.md
@@ -2,7 +2,7 @@ Please join #cdd-nim-anywhere slack channel if you are a internal user, open an
 
 One of the primary benefit of using AI for Enterprises is their ability to work with and learn from their internal data. Retrieval-Augmented Generation ([RAG](https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/)) is one of the best ways to do so. NVIDIA has developed a set of micro-services called [NIM micro-service](https://docs.nvidia.com/nim/large-language-models/latest/introduction.html) to help our partners and customers build effective RAG pipeline with ease. 
 
-NIM Anywhere contains all the tooling required to start integrating NIMs. It natively scales out to full-sized labs and up to production environments. This is great news for building a RAG architecture and easily adding NIMs as needed. If you're unfamiliar with RAG, it dynamically retrieves relevant
+NIM Anywhere contains all the tooling required to start integrating NIMs for RAG. It natively scales out to full-sized labs and up to production environments. This is great news for building a RAG architecture and easily adding NIMs as needed. If you're unfamiliar with RAG, it dynamically retrieves relevant
 external information during inference without modifying the model
 itself. Imagine you're the tech lead of a company with a local database containing confidential, up-to-date information. You don’t want OpenAI to access your data, but you need the model to understand it to answer questions accurately. The solution is to connect your language model to the database and feed them with the information. 
 

From 8a50857d05782d5bc83e69a20a818dcfa98ac914 Mon Sep 17 00:00:00 2001
From: Amelia Taihui Ye <taihui@umich.edu>
Date: Fri, 1 Nov 2024 17:37:16 +0000
Subject: [PATCH 03/10] code walk through

---
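Reviewer note (not part of the commit): the retrieval routing that code_walkthrough.txt describes can be tried without LangChain, Milvus, or a NIM endpoint. This is a minimal sketch under stated assumptions. `Doc`, `stub_retriever`, and `stub_reranking_retriever` are hypothetical stand-ins for `Document`, `retriever`, and `reranking_retriever`, and the length-based "reranking" is purely illustrative. Only the branching on `use_kb` and `use_reranker` and the blank-line join in `format_docs` mirror the walkthrough.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str


def format_docs(docs: list[Doc]) -> str:
    """Concatenate document contents, separated by a blank line."""
    return "\n\n".join(doc.page_content for doc in docs)


def stub_retriever(question: str) -> list[Doc]:
    """Hypothetical base retriever: returns fixed documents in index order."""
    return [Doc("a"), Doc("bbb"), Doc("cc")]


def stub_reranking_retriever(question: str) -> list[Doc]:
    """Hypothetical reranker: longest documents first, keep the top two."""
    ranked = sorted(stub_retriever(question), key=lambda d: len(d.page_content), reverse=True)
    return ranked[:2]


def retrieve_context(msg: dict) -> str:
    """Route a request using the same three-way branch as the walkthrough."""
    if not msg["use_kb"]:
        return ""  # knowledge base disabled: skip retrieval entirely
    if msg["use_reranker"]:
        return format_docs(stub_reranking_retriever(msg["question"]))
    return format_docs(stub_retriever(msg["question"]))
```

Calling `retrieve_context` with `use_kb` set to `False` short-circuits before any retriever runs, which is why the chain can serve plain chat requests even when the vector store is empty.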
 .project/spec.yaml                     |  20 +++
 code/walkthrough/code_walkthrough.html | 162 +++++++++++++++++++++
 code/walkthrough/code_walkthrough.txt  | 189 +++++++++++++++++++++++++
 3 files changed, 371 insertions(+)
 create mode 100644 code/walkthrough/code_walkthrough.html
 create mode 100644 code/walkthrough/code_walkthrough.txt
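Reviewer note (not part of the commit): the question-condensing fallback and the fan-out into prompt inputs from the walkthrough can be sketched in plain Python. This is an illustration under assumptions, not the project's chain. `fake_condense` is a hypothetical stand-in for `condense_question_prompt | llm | StrOutputParser()`, and `make_question_parser` and `build_prompt_inputs` are names invented here to show the data flow.

```python
from typing import Callable


def make_question_parser(condense: Callable[[dict], str]) -> Callable[[dict], str]:
    """Return a parser that condenses only when chat history is present."""
    def question_parsing(msg: dict) -> str:
        if msg["history"]:
            return condense(msg)
        return msg["question"]  # first turn: nothing to condense against
    return question_parsing


def fake_condense(msg: dict) -> str:
    """Hypothetical condenser: appends the latest history entry to the question."""
    last_turn = msg["history"][-1]
    return f"{msg['question']} (in the context of: {last_turn})"


def build_prompt_inputs(
    msg: dict,
    retrieve: Callable[[dict], str],
    parse: Callable[[dict], str],
) -> dict:
    """Fan one input out to the three keys the chat prompt expects."""
    return {
        "context": retrieve(msg),
        "question": parse(msg),
        "history": msg["history"],
    }
```

In the walkthrough the same fan-out is written as a dict of runnables piped into `prompts.CHAT_PROMPT`. The explicit function here only makes visible that all three values are computed from the same input message.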

diff --git a/.project/spec.yaml b/.project/spec.yaml
index 5b758b4..d8cafde 100644
--- a/.project/spec.yaml
+++ b/.project/spec.yaml
@@ -206,6 +206,26 @@ execution:
       proxy:
         trim_prefix: true
       url: http://localhost:3030
+<<<<<<< Updated upstream
+=======
+  - name: Code Walkthrough
+    type: custom
+    class: webapp
+    start_command: cd /project/code/walkthrough && python -m http.server --directory
+      /project/code/walkthrough 8000
+    health_check_command: curl -f "http://localhost:8000/"
+    stop_command: pkill -f "^python -m http.server --directory /project/code/walkthrough"
+    user_msg: ""
+    logfile_path: ""
+    timeout_seconds: 60
+    icon_url: ""
+    webapp_options:
+      autolaunch: true
+      port: "8000"
+      proxy:
+        trim_prefix: true
+      url: http://localhost:8000/
+>>>>>>> Stashed changes
   resources:
     gpu:
       requested: 0
diff --git a/code/walkthrough/code_walkthrough.html b/code/walkthrough/code_walkthrough.html
new file mode 100644
index 0000000..358d5d2
--- /dev/null
+++ b/code/walkthrough/code_walkthrough.html
@@ -0,0 +1,162 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+<head>
+  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+  <meta http-equiv="Content-Style-Type" content="text/css" />
+  <meta name="generator" content="pandoc" />
+  <meta name="author" content="EPG TME" />
+  <meta name="date" content="2024-01-01" />
+  <title>How to build a RAG Chain</title>
+  <style type="text/css">
+    code{white-space: pre-wrap;}
+    span.smallcaps{font-variant: small-caps;}
+    span.underline{text-decoration: underline;}
+    div.column{display: inline-block; vertical-align: top; width: 50%;}
+    div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+    ul.task-list{list-style: none;}
+  </style>
+  <link rel="stylesheet" type="text/css" media="screen, projection, print"
+    href="https://www.w3.org/Talks/Tools/Slidy2/styles/slidy.css" />
+  <script src="https://www.w3.org/Talks/Tools/Slidy2/scripts/slidy.js"
+    charset="utf-8" type="text/javascript"></script>
+</head>
+<body>
+<div class="slide titlepage">
+  <h1 class="title">How to build a RAG Chain</h1>
+  <p class="author">
+EPG TME
+  </p>
+  <p class="date">2024</p>
+</div>
+<div id="imports" class="title-slide slide section level1">
+<h1>Imports</h1>
+
+</div>
+<div id="langchain-nvidia-integration" class="slide section level2">
+<h1>Langchain NVIDIA Integration</h1>
+<ul>
+<li>from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings, NVIDIARerank</li>
+<li>explanation:</li>
+</ul>
+<p>LangChain is a framework for developing applications powered by large language models (LLMs). When developing an LLM application, there are many things to consider: which language model to choose, how to load documents, how to embed and retrieve loaded documents, etc. Here LangChain acts as an orchestrator of all the components in the development process and unifies them in a single framework to make them compatible with one another.</p>
+<p>Here, langchain_nvidia_ai_endpoints is one of the integrations provided by LangChain. NVIDIA wraps popular language models in NIMs that are optimized to deliver the best performance on NVIDIA accelerated infrastructure. One easy way to use a NIM is through the LangChain endpoint.</p>
+<p>NIMs provide easy, consistent, and familiar APIs for running optimized inference on the AI models you want.</p>
+</div>
+<div id="from-langchain_nvidia_ai_endpoints-import-chatnvidia" class="slide section level2">
+<h1>from langchain_nvidia_ai_endpoints import ChatNVIDIA</h1>
+<ul>
+<li>This class provides access to an NVIDIA NIM for chat. By default, it connects to a hosted NIM, but it can be configured to connect to a local NIM using the base_url parameter.</li>
+</ul>
+</div>
+<div id="other-langchain-libraries" class="slide section level2">
+<h1>Other LangChain Libraries</h1>
+<ul>
+<li><p>from langchain_core.documents import Document</p></li>
+<li><p>from langchain.retrievers import ContextualCompressionRetriever</p></li>
+<li><p>from langchain_milvus.vectorstores.milvus import Milvus</p></li>
+<li><p>from langchain_community.chat_message_histories import RedisChatMessageHistory</p></li>
+<li><p>from langchain_core.runnables.history import RunnableWithMessageHistory</p></li>
+<li><p>from langchain_core.output_parsers import StrOutputParser</p></li>
+<li><p>from langchain_core.runnables import RunnablePassthrough, chain</p></li>
+</ul>
+<p>Here we import other LangChain packages. This is a good opportunity to explain the other components used in developing an LLM chain.</p>
+</div>
+<div id="retriever" class="slide section level2">
+<h1>retriever</h1>
+<p>Packages that are related to retrievers are:</p> <ul><li>from langchain_core.documents import Document</li><li>from langchain.retrievers import ContextualCompressionRetriever</li><li>from langchain_milvus.vectorstores.milvus import Milvus</li></ul>
+<ul>
+<li><p>A retriever is an interface that returns documents given an unstructured query. In most cases, a retriever relies on a vector store: it fetches documents from a database that stores vector representations of the documents.</p></li>
+<li><p>Document in LangChain is the class for storing a piece of text and associated information.</p></li>
+<li><p>The Contextual Compression Retriever passes queries to the base retriever, takes the initial documents, and passes them through the Document Compressor. The Document Compressor takes a list of documents and shortens it by reducing the contents of documents or dropping documents altogether. Here we can use the reranking NIM, which is optimized for NVIDIA infrastructure, as the compressor.</p></li>
+<li><p>Milvus is a database that stores, indexes, and manages embedding vectors generated by machine learning models. We choose Milvus here because it is GPU accelerated.</p></li>
+</ul>
+</div>
+<div id="chat-history" class="slide section level2">
+<h1>Chat history</h1>
+<ul>
+<li><p>from langchain_community.chat_message_histories import RedisChatMessageHistory</p></li>
+<li><p>from langchain_core.runnables.history import RunnableWithMessageHistory</p></li>
+<li><p>The chat history is a sequence of messages. Each message has a role (e.g., "user" or "assistant", i.e., from the human or from the model), content (e.g., text, multimodal data), and additional metadata.</p></li>
+<li><p>Use the RedisChatMessageHistory class (imported here from langchain_community) to store and manage chat message history in Redis. Together with RunnableWithMessageHistory, it keeps track of the message history.</p></li>
+</ul>
+</div>
+
+<div id="actual-code-for-the-chain" class="title-slide slide section level1">
+<h1>Actual Code for the chain</h1>
+
+</div>
+<div id="embedding-model" class="slide section level2">
+<h1>Embedding Model</h1>
+<ul>
+<li><p>embedding_model = NVIDIAEmbeddings( model=app_config.embedding_model.name, base_url=str(app_config.embedding_model.url), api_key=app_config.nvidia_api_key, )</p></li>
+<li><p>This creates an embedding model using NVIDIA’s embeddings, which convert text or other data types into numerical vectors.</p></li>
+</ul>
+</div>
+<div id="vector-store" class="slide section level2">
+<h1>Vector store</h1>
+<ul>
+<li><p>vector_store = Milvus( embedding_function=embedding_model, connection_args={"uri": app_config.milvus.url}, collection_name=app_config.milvus.collection_name, auto_id=True, timeout=10, )</p></li>
+<li><p>This block initializes a vector store using Milvus, a database optimized for vector-based retrieval.</p></li>
+</ul>
+</div>
+<div id="vector-store-1" class="slide section level2">
+<h1>Retriever and reranker</h1>
+<ul>
+<li><p>retriever = vector_store.as_retriever()</p></li>
+<li><p>This converts the vector_store into a retriever object, enabling it to perform similarity searches. Given a query, the retriever can find vectors in Milvus that are close (in semantic space) to the query’s embedding.</p></li>
+<li><p>reranker = NVIDIARerank( model=app_config.reranking_model.name, base_url=str(app_config.reranking_model.url), api_key=app_config.nvidia_api_key, )</p></li>
+<li><p>reranking_retriever = ContextualCompressionRetriever(base_compressor=reranker, base_retriever=retriever)</p></li>
+<li><p>This defines a final retrieval pipeline combining the retriever with the reranker. ContextualCompressionRetriever uses the reranker (base_compressor) to compress and order the results from the initial retrieval step (base_retriever). This pipeline returns the most relevant results based on the reranking model’s assessment.</p></li>
+</ul>
+</div>
+<div id="document-formatting" class="slide section level2">
+<h1>Document Formatting</h1>
+<ul>
+<li>def format_docs(docs: list[Document]) -&gt; str: """Take in a list of docs and concatenate the content, separating by newlines.""" return "\n\n".join(doc.page_content for doc in docs)</li>
+<li>format_docs is a helper function that takes a list of documents (docs) and concatenates their content with two newline characters in between. This prepares documents for output in a readable format for the user.</li>
+</ul>
+</div>
+<div id="language-model-initialization" class="slide section level2">
+<h1>Language Model Initialization:</h1>
+<ul>
+<li>llm = ChatNVIDIA( model=app_config.llm_model.name, curr_mode="nim", base_url=str(app_config.llm_model.url), api_key=app_config.nvidia_api_key, )</li>
+<li>Initializes an NVIDIA-powered language model (ChatNVIDIA).</li>
+</ul>
+</div>
+<div id="document-retrieval-function" class="slide section level2">
+<h1>Document Retrieval Function</h1>
+<ul>
+<li><p><span class="citation">@chain</span> async def retrieve_context(msg, config) -&gt; str: """The Retrieval part of the RAG chain.""" use_kb = msg["use_kb"] use_reranker = msg["use_reranker"] question = msg["question"]</p>
+<p>if not use_kb: return ""</p>
+<p>if use_reranker: return (reranking_retriever | format_docs).invoke(question, config)</p>
+<p>return (retriever | format_docs).invoke(question, config)</p></li>
+<li><p>This asynchronous function (retrieve_context) retrieves relevant documents based on the user’s question.</p></li>
+<li><p>If use_kb is False, it returns an empty string (no retrieval). If use_reranker is True, it uses the reranking_retriever, which reranks the retrieved documents for accuracy, then formats the results. Otherwise, it uses the basic retriever without reranking.</p></li>
+</ul>
+</div>
+<div id="question-parsing-and-condensing" class="slide section level2">
+<h1>Question Parsing and Condensing</h1>
+<ul>
+<li><p><span class="citation">@chain</span> async def question_parsing(msg, config) -&gt; str: """Condense the question with chat history"""</p>
+<p>condense_question_prompt = prompts.CONDENSE_QUESTION_TEMPLATE.with_config(run_name="condense_question_prompt") condensed_chain = condense_question_prompt | llm | StrOutputParser().with_config(run_name="condense_question_chain") if msg["history"]: return condensed_chain.invoke(msg, config) return msg["question"]</p></li>
+<li><p>This function condenses the user’s question along with any existing chat history to maintain context.</p></li>
+</ul>
+</div>
+<div id="combining-the-chain" class="slide section level2">
+<h1>Combining the Chain</h1>
+<ul>
+<li>my_chain = ( { "context": retrieve_context, "question": question_parsing, "history": itemgetter("history"), } | RunnablePassthrough().with_config(run_name="LLM Prompt Input") | prompts.CHAT_PROMPT | llm )</li>
+<li>This combines the components into a chain for question-answering: retrieve_context fetches relevant documents. question_parsing formats the question with history. RunnablePassthrough and CHAT_PROMPT set up the prompt format for the language model (llm) to process and respond.</li>
+</ul>
+</div>
+<div id="configuring-message-history" class="slide section level2">
+<h1>Configuring Message History</h1>
+<ul>
+<li><p>my_chain = RunnableWithMessageHistory( my_chain, lambda session_id: RedisChatMessageHistory(session_id, url=str(app_config.redis_dsn)), input_messages_key="question", output_messages_key="output", history_messages_key="history", ).with_types(input_type=ChainInputs)</p></li>
+<li><p>Wraps my_chain with message history using Redis, storing the chat history under a specific session ID for continuity.</p></li>
+</ul>
+</div>
+</body>
+</html>
diff --git a/code/walkthrough/code_walkthrough.txt b/code/walkthrough/code_walkthrough.txt
new file mode 100644
index 0000000..edb0dc2
--- /dev/null
+++ b/code/walkthrough/code_walkthrough.txt
@@ -0,0 +1,189 @@
+% How to build a RAG Chain
+% EPG TME
+% 2024
+
+# Imports
+
+## Langchain NVIDIA Integration
+
+- from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings, NVIDIARerank
+- explanation: 
+
+LangChain is a framework for developing applications powered by large language models (LLMs).
+When developing an LLM application, there are many things to consider:
+which language model to choose, how to load documents, how to embed and retrieve loaded documents,
+etc. Here LangChain acts as an orchestrator of all the components in the development process
+and unifies them in a single framework to make them compatible with one another.
+
+Here, langchain_nvidia_ai_endpoints is one of the integrations provided by LangChain. NVIDIA wraps
+popular language models in NIMs that are optimized to deliver the best performance on
+NVIDIA accelerated infrastructure. One easy way to use a NIM is through the LangChain endpoint.
+
+NIMs provide easy, consistent, and familiar APIs for running optimized inference on the AI models you want.
+
+## from langchain_nvidia_ai_endpoints import ChatNVIDIA
+- This class provides access to an NVIDIA NIM for chat.
+By default, it connects to a hosted NIM, but it can be configured to connect
+to a local NIM using the base_url parameter.
+
+## Other LangChain Libraries
+
+- from langchain_core.documents import Document
+- from langchain.retrievers import ContextualCompressionRetriever
+- from langchain_milvus.vectorstores.milvus import Milvus
+
+- from langchain_community.chat_message_histories import RedisChatMessageHistory
+- from langchain_core.runnables.history import RunnableWithMessageHistory
+
+- from langchain_core.output_parsers import StrOutputParser
+- from langchain_core.runnables import RunnablePassthrough, chain
+
+
+Here we import other LangChain packages. This is a good opportunity to explain the other
+components used in developing an LLM chain.
+
+## retriever
+Packages that are related to retrievers are:
+- from langchain_core.documents import Document
+- from langchain.retrievers import ContextualCompressionRetriever
+- from langchain_milvus.vectorstores.milvus import Milvus
+
+- A retriever is an interface that returns documents given an unstructured query.
+In most cases, a retriever relies on a vector store: it fetches documents
+from a database that stores vector representations of the documents.
+
+- Document in LangChain is the class for storing a piece of text and associated information.
+
+- The Contextual Compression Retriever
+passes queries to the base retriever, takes the initial documents, and passes them
+through the Document Compressor. The Document Compressor takes a list of documents
+and shortens it by reducing the contents of documents or dropping documents altogether.
+Here we can use the reranking NIM, which is optimized for NVIDIA infrastructure, as the compressor.
+
+- Milvus is a database that stores, indexes, and manages embedding vectors generated by machine learning models.
+We choose Milvus here because it is GPU accelerated.
+
+
+## Chat history
+- from langchain_community.chat_message_histories import RedisChatMessageHistory
+- from langchain_core.runnables.history import RunnableWithMessageHistory
+
+- The chat history is a sequence of messages. Each message has a role (e.g., "user" or "assistant",
+i.e., from the human or from the model), content (e.g., text, multimodal data), and additional metadata.
+
+- Use the RedisChatMessageHistory class (imported here from langchain_community) to store
+and manage chat message history in Redis. Together with RunnableWithMessageHistory, it keeps track of the message history.
+
+
+# Actual Code for the chain
+## Embedding Model
+
+- embedding_model = NVIDIAEmbeddings(
+    model=app_config.embedding_model.name,
+    base_url=str(app_config.embedding_model.url),
+    api_key=app_config.nvidia_api_key,
+)
+
+- This creates an embedding model using NVIDIA’s embeddings, which convert text or other data types into numerical vectors.
+
+## Vector store
+- vector_store = Milvus(
+    embedding_function=embedding_model,
+    connection_args={"uri": app_config.milvus.url},
+    collection_name=app_config.milvus.collection_name,
+    auto_id=True,
+    timeout=10,
+)
+
+- This block initializes a vector store using Milvus, a database optimized for vector-based retrieval.
+
+## Retriever and reranker
+
+- retriever = vector_store.as_retriever()
+- This converts the vector_store into a retriever object, enabling it to perform similarity searches. Given a query, the retriever can find vectors in Milvus that are close (in semantic space) to the query’s embedding.
+
+- reranker = NVIDIARerank(
+    model=app_config.reranking_model.name,
+    base_url=str(app_config.reranking_model.url),
+    api_key=app_config.nvidia_api_key,
+)
+- reranking_retriever = ContextualCompressionRetriever(base_compressor=reranker, base_retriever=retriever)
+- This defines a final retrieval pipeline combining the retriever with the reranker. ContextualCompressionRetriever uses the reranker (base_compressor) to compress and order the results from the initial retrieval step (base_retriever). This pipeline returns the most relevant results based on the reranking model’s assessment.
+
+## Document Formatting
+
+- def format_docs(docs: list[Document]) -> str:
+    """Take in a list of docs and concatenate the content, separating by newlines."""
+    return "\n\n".join(doc.page_content for doc in docs)
+- format_docs is a helper function that takes a list of documents (docs) and concatenates their content with two newline characters in between.
+This prepares documents for output in a readable format for the user.
+
+## Language Model Initialization:
+- llm = ChatNVIDIA(
+    model=app_config.llm_model.name,
+    curr_mode="nim",
+    base_url=str(app_config.llm_model.url),
+    api_key=app_config.nvidia_api_key,
+)
+- Initializes an NVIDIA-powered language model (ChatNVIDIA)
+
+## Document Retrieval Function
+- @chain
+async def retrieve_context(msg, config) -> str:
+    """The Retrieval part of the RAG chain."""
+    use_kb = msg["use_kb"]
+    use_reranker = msg["use_reranker"]
+    question = msg["question"]
+
+    if not use_kb:
+        return ""
+
+    if use_reranker:
+        return (reranking_retriever | format_docs).invoke(question, config)
+
+    return (retriever | format_docs).invoke(question, config)
+
+- This asynchronous function (retrieve_context) retrieves relevant documents based on the user’s question.
+- If use_kb is False, it returns an empty string (no retrieval).
+If use_reranker is True, it uses the reranking_retriever, which reranks the retrieved documents for accuracy, then formats the results.
+Otherwise, it uses the basic retriever without reranking.
+
+## Question Parsing and Condensing
+- @chain
+async def question_parsing(msg, config) -> str:
+    """Condense the question with chat history"""
+
+    condense_question_prompt = prompts.CONDENSE_QUESTION_TEMPLATE.with_config(run_name="condense_question_prompt")
+    condensed_chain = condense_question_prompt | llm | StrOutputParser().with_config(run_name="condense_question_chain")
+    if msg["history"]:
+        return condensed_chain.invoke(msg, config)
+    return msg["question"]
+- This function condenses the user’s question along with any existing chat history to maintain context.
+
+## Combining the Chain
+- my_chain = (
+    {
+        "context": retrieve_context,
+        "question": question_parsing,
+        "history": itemgetter("history"),
+    }
+    | RunnablePassthrough().with_config(run_name="LLM Prompt Input")
+    | prompts.CHAT_PROMPT
+    | llm
+)
+- This combines the components into a chain for question-answering:
+retrieve_context fetches relevant documents.
+question_parsing formats the question with history.
+RunnablePassthrough and CHAT_PROMPT set up the prompt format for the language model (llm) to process and respond.
+
+## Configuring Message History
+
+- my_chain = RunnableWithMessageHistory(
+    my_chain,
+    lambda session_id: RedisChatMessageHistory(session_id, url=str(app_config.redis_dsn)),
+    input_messages_key="question",
+    output_messages_key="output",
+    history_messages_key="history",
+).with_types(input_type=ChainInputs)
+
+- Wraps my_chain with message history using Redis, storing the chat history under a specific session ID for continuity.

From c17c824be320f13fbf3fed4998cbdf4aecc482a2 Mon Sep 17 00:00:00 2001
From: Amelia <ameliay@nvidia.com>
Date: Tue, 19 Nov 2024 16:17:46 -0800
Subject: [PATCH 04/10] Update spec.yaml

deleted merge markers
---
 .project/spec.yaml | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/.project/spec.yaml b/.project/spec.yaml
index d8cafde..789702b 100644
--- a/.project/spec.yaml
+++ b/.project/spec.yaml
@@ -206,8 +206,6 @@ execution:
       proxy:
         trim_prefix: true
       url: http://localhost:3030
-<<<<<<< Updated upstream
-=======
   - name: Code Walkthrough
     type: custom
     class: webapp
@@ -225,7 +223,6 @@ execution:
       proxy:
         trim_prefix: true
       url: http://localhost:8000/
->>>>>>> Stashed changes
   resources:
     gpu:
       requested: 0

From 39223f9f9ca4784b2015d08446502455c85e3eba Mon Sep 17 00:00:00 2001
From: Sophie Watson <sophwats@gmail.com>
Date: Mon, 27 Jan 2025 11:22:14 +0000
Subject: [PATCH 05/10] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 8ddb440..68f6ddc 100644
--- a/README.md
+++ b/README.md
@@ -39,7 +39,7 @@ accuracy and reliability of your generative AI models, [read this
 blog](https://developer.nvidia.com/blog/enhancing-rag-applications-with-nvidia-nim/).
 
 Get started with NIM Anywhere now with the [quick-start](#quick-start)
-instructions and build your first RAG application using NIMs!
+instructions and build your first RAG application using NIM Microservices!
 
 ![NIM Anywhere Screenshot](.static/_static/nim-anywhere.png)
  

From 7af68930d21d756ffb1f66d996c460f18f764f4e Mon Sep 17 00:00:00 2001
From: Sophie Watson <sophwats@gmail.com>
Date: Mon, 27 Jan 2025 11:22:27 +0000
Subject: [PATCH 06/10] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 68f6ddc..58a018e 100644
--- a/README.md
+++ b/README.md
@@ -48,7 +48,7 @@ instructions and build your first RAG application using NIM Microservices!
   - [Install AI Workbench](#install-ai-workbench)
   - [Download this project](#download-this-project)
   - [Configure this project](#configure-this-project)
-  - [Start This Project](#start-this-project)
+  - [Start this project](#start-this-project)
   - [Populating the Knowledge Base](#populating-the-knowledge-base)
 - [Developing Your Own Applications](#developing-your-own-applications)
 - [Application Configuration](#application-configuration)

From cce258a8c6c702f062bed8a6bbb8b06b049838a4 Mon Sep 17 00:00:00 2001
From: Sophie Watson <sophwats@gmail.com>
Date: Mon, 27 Jan 2025 11:22:33 +0000
Subject: [PATCH 07/10] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 58a018e..1d9d318 100644
--- a/README.md
+++ b/README.md
@@ -50,7 +50,7 @@ instructions and build your first RAG application using NIM Microservices!
   - [Configure this project](#configure-this-project)
   - [Start this project](#start-this-project)
   - [Populating the Knowledge Base](#populating-the-knowledge-base)
-- [Developing Your Own Applications](#developing-your-own-applications)
+- [Developing your own applications](#developing-your-own-applications)
 - [Application Configuration](#application-configuration)
   - [Config from a file](#config-from-a-file)
   - [Config from a custom file](#config-from-a-custom-file)

From b628e0f1a29a32bfdab9d03a74b2b19b402f126b Mon Sep 17 00:00:00 2001
From: Sophie Watson <sophwats@gmail.com>
Date: Mon, 27 Jan 2025 11:22:43 +0000
Subject: [PATCH 08/10] Update docs/_SUMMARY.md

---
 docs/_SUMMARY.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/_SUMMARY.md b/docs/_SUMMARY.md
index 0c83b1b..a162e91 100644
--- a/docs/_SUMMARY.md
+++ b/docs/_SUMMARY.md
@@ -2,7 +2,7 @@ Please join #cdd-nim-anywhere slack channel if you are a internal user, open an
 
 One of the primary benefit of using AI for Enterprises is their ability to work with and learn from their internal data. Retrieval-Augmented Generation ([RAG](https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/)) is one of the best ways to do so. NVIDIA has developed a set of micro-services called [NIM micro-service](https://docs.nvidia.com/nim/large-language-models/latest/introduction.html) to help our partners and customers build effective RAG pipeline with ease. 
 
-NIM Anywhere contains all the tooling required to start integrating NIMs for RAG. It natively scales out to full-sized labs and up to production environments. This is great news for building a RAG architecture and easily adding NIMs as needed. If you're unfamiliar with RAG, it dynamically retrieves relevant
+NIM Anywhere contains all the tooling required to start integrating NIM microservices for RAG. It natively scales out to full-sized labs and up to production environments. This is great news for building a RAG architecture and easily adding NIM microservices as needed. If you're unfamiliar with RAG, it dynamically retrieves relevant
 external information during inference without modifying the model
 itself. Imagine you're the tech lead of a company with a local database containing confidential, up-to-date information. You don’t want OpenAI to access your data, but you need the model to understand it to answer questions accurately. The solution is to connect your language model to the database and feed them with the information. 
 

From bb0d498755e7fe2e2c6bc0546ce99496fae15044 Mon Sep 17 00:00:00 2001
From: GitHub Actions <actions@github.com>
Date: Mon, 27 Jan 2025 11:23:17 +0000
Subject: [PATCH 09/10] Auto-generated: README.md and related content

---
 README.md | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/README.md b/README.md
index 1d9d318..118311c 100644
--- a/README.md
+++ b/README.md
@@ -22,24 +22,24 @@ micro-service](https://docs.nvidia.com/nim/large-language-models/latest/introduc
 to help our partners and customers build effective RAG pipeline with
 ease.
 
-NIM Anywhere contains all the tooling required to start integrating
-NIMs. It natively scales out to full-sized labs and up to production
-environments. This is great news for building a RAG architecture and
-easily adding NIMs as needed. If you're unfamiliar with RAG, it
-dynamically retrieves relevant external information during inference
-without modifying the model itself. Imagine you're the tech lead of a
-company with a local database containing confidential, up-to-date
-information. You don’t want OpenAI to access your data, but you need the
-model to understand it to answer questions accurately. The solution is
-to connect your language model to the database and feed them with the
-information.
+NIM Anywhere contains all the tooling required to start integrating NIM
+microservices for RAG. It natively scales out to full-sized labs and up
+to production environments. This is great news for building a RAG
+architecture and easily adding NIM microservices as needed. If you're
+unfamiliar with RAG, it dynamically retrieves relevant external
+information during inference without modifying the model itself. Imagine
+you're the tech lead of a company with a local database containing
+confidential, up-to-date information. You don’t want OpenAI to access
+your data, but you need the model to understand it to answer questions
+accurately. The solution is to connect your language model to the
+database and feed them with the information.
 
 To learn more about why RAG is an excellent solution for boosting the
 accuracy and reliability of your generative AI models, [read this
 blog](https://developer.nvidia.com/blog/enhancing-rag-applications-with-nvidia-nim/).
 
 Get started with NIM Anywhere now with the [quick-start](#quick-start)
-instructions and build your first RAG application using NIM Microservices!
+instructions and build your first RAG application using NIMs!
 
 ![NIM Anywhere Screenshot](.static/_static/nim-anywhere.png)
  
@@ -48,9 +48,9 @@ instructions and build your first RAG application using NIM Microservices!
   - [Install AI Workbench](#install-ai-workbench)
   - [Download this project](#download-this-project)
   - [Configure this project](#configure-this-project)
-  - [Start this project](#start-this-project)
+  - [Start This Project](#start-this-project)
   - [Populating the Knowledge Base](#populating-the-knowledge-base)
-- [Developing your own applications](#developing-your-own-applications)
+- [Developing Your Own Applications](#developing-your-own-applications)
 - [Application Configuration](#application-configuration)
   - [Config from a file](#config-from-a-file)
   - [Config from a custom file](#config-from-a-custom-file)

From a96385a607d685a742d8449ebfd19e8a6933f19a Mon Sep 17 00:00:00 2001
From: GitHub Actions <actions@github.com>
Date: Mon, 27 Jan 2025 17:43:54 +0000
Subject: [PATCH 10/10] Auto-generated: README.md and related content

---
 README.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index 19b9915..1329d4b 100644
--- a/README.md
+++ b/README.md
@@ -21,18 +21,18 @@ micro-service](https://docs.nvidia.com/nim/large-language-models/latest/introduc
 to help our partners and customers build effective RAG pipeline with
 ease.
 
-
 NIM Anywhere contains all the tooling required to start integrating NIM
 microservices for RAG. It natively scales out to full-sized labs and up
 to production environments. This is great news for building a RAG
 architecture and easily adding NIM microservices as needed. If you're
-unfamiliar with RAG, it dynamically retrieves relevant external
-information during inference without modifying the model itself. Imagine
-you're the tech lead of a company with a local database containing
-confidential, up-to-date information. You don’t want OpenAI to access
-your data, but you need the model to understand it to answer questions
-accurately. The solution is to connect your language model to the
-database and feed them with the information.
+unfamiliar with RAG, it dynamically retrieves relevant
+
+external information during inference without modifying the model
+itself. Imagine you're the tech lead of a company with a local database
+containing confidential, up-to-date information. You don’t want OpenAI
+to access your data, but you need the model to understand it to answer
+questions accurately. The solution is to connect your language model to
+the database and feed them with the information.
 
 To learn more about why RAG is an excellent solution for boosting the
 accuracy and reliability of your generative AI models, [read this