9 changes: 8 additions & 1 deletion CHANGELOG.md
@@ -1,5 +1,12 @@
# Change Log

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).


[2.0.0] - 2024-03-27
- Added code for GAAB integration with Amazon OpenSearch Enbeddings Search and Neo4j
- Added code for GAAB integration with Amazon OpenSearch Embeddings Search and Neo4j

[1.0.0] - 2024-02-20
- Added code for GAAB integration with Amazon OpenSearch Text Search
1 change: 1 addition & 0 deletions CODE_OF_CONDUCT.md
@@ -1,4 +1,5 @@
## Code of Conduct
This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).

For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
opensource-codeofconduct@amazon.com with any additional questions or comments.
65 changes: 33 additions & 32 deletions Instructions/Amazon_OpenSearch.md
@@ -1,37 +1,38 @@
# generative-ai-application-builder-on-aws-plugin-sample-vector-databases
This sample code explains how to add Amazon OpenSearch (text search and embeddings search) as a knowledgebase for the "generative ai application builder on aws" solution.
https://aws.amazon.com/solutions/implementations/generative-ai-application-builder-on-aws/
# Generative AI Application Builder on AWS Plugin for Vector Databases

**Pre-requisites:**
This sample code explains how to add Amazon OpenSearch (text search and embeddings search) as a knowledge base, in addition to Amazon Kendra and Knowledge Bases for [Amazon Bedrock](https://aws.amazon.com/bedrock/knowledge-bases/), in the [Generative AI Application Builder on AWS](https://aws.amazon.com/solutions/implementations/generative-ai-application-builder-on-aws/) solution for Retrieval Augmented Generation (RAG) use cases.

Have the aws cli installed and python installed.
**_NOTE:_**
This is not a production-ready code base; it is intended for testing and proofs of concept.

## Pre-requisites
See the pre-requisites section in [README.md](https://github.com/aws-samples/generative-ai-application-builder-on-aws-plugin-sample-vector-databases?tab=readme-ov-file#pre-requisites)

## Steps to integrate OpenSearch for Retrieval Augmented Generation with Generative AI App builder on AWS.

## Steps to integrate OpenSearch for RAG with Generative AI App builder on AWS.

**Step 1: Clone the repository.**

**Step 2: Create initial resources through the publish script available inside source folder.**


```
cd source/publishAssets
cd source/publish_assets
chmod +x ./publish_amazon_opensearch.sh
./publish_amazon_opensearch.sh

chmod +x ./publish_Amazon_OpenSearch.sh
```

./publish_Amazon_OpenSearch.sh
The `publish_amazon_opensearch.sh` script creates the following (a quick verification sketch follows this list):

```
This would help create the following:
- An SSM parameter for storing a string in the SSM Parameter Store.
- An S3 bucket created for storing the lambda function assets and layers.
- Logic to zip the contents of lambda function, layers and upload to the s3 bucket.
- An SSM parameter for storing a string in the SSM Parameter Store.
- An S3 bucket created for storing the lambda function assets and layers.
- The lambda and lambda layer assets zipped and uploaded to the S3 bucket.
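If you want to confirm the publish step worked, a minimal check along these lines can be run. The bucket and parameter names below are placeholders, not values defined by this guide, so substitute the values reported by the publish script.

```
# Placeholders -- replace with the values reported by publish_amazon_opensearch.sh
ASSET_BUCKET=<your-asset-bucket-name>
PARAM_NAME=<your-ssm-parameter-name>

# Confirm the zipped Lambda assets and layers landed in the bucket
aws s3 ls "s3://${ASSET_BUCKET}" --recursive

# Confirm the SSM parameter exists and inspect its value
aws ssm get-parameter --name "${PARAM_NAME}" --query "Parameter.Value" --output text
```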

**Step 3: Deploy the cloudformtaion stack**
**Step 3: Deploy the CloudFormation stack**

From the deployment folder, create a cloudformation stack using provision_Amazon_OpenSearch.yaml. Name the cloudformation stack as rag22. At the end of this deployment the following resources will be created.
From the deployment folder, create a CloudFormation stack using `provision_amazon_opensearch.yaml`. Name the CloudFormation stack `rag22`. At the end of this deployment, the following resources will be created:

- SageMaker endpoint for the embedding mdodel
- SageMaker endpoint hosting the embeddings model used to embed the data stored in the OpenSearch cluster
- OpenSearch Service cluster
- SageMaker Notebook
- IAM roles
@@ -40,37 +41,38 @@ From the deployment folder, create a cloudformtaion stack using provision_Amazon

- You can use your own data or use sample data using the following steps:

- Navigate to SageMaker in the console. Open the notebook instance "aws-llm-apps-blog" and run it with Jupyter. Choose conda_python3 for the kernel and "restart and run all cells" from the 'kernel' tab
- Navigate to SageMaker in the console. Open the notebook instance `aws-llm-apps-blog` and run it with Jupyter. Choose the `conda_python3` kernel and select "restart and run all cells" from the Kernel tab.

- Path to follow is: llm-apps-workshop->blogs->opensearch-data-ingestion->2_kb_to_vectordb_opensearch.ipynb
- Path to follow is: llm-apps-workshop > blogs > opensearch-data-ingestion > 2_kb_to_vectordb_opensearch.ipynb

- note: In the sagemaker notebook, make sure that the cloudformation name parameter is the same as stack name created above.
CFN_STACK_NAME = ""
- Note: In the SageMaker notebook, make sure the CloudFormation stack name parameter is set to the stack name created above:
`CFN_STACK_NAME = ""`

- This notebook will ingest data from SageMaker docs (https://sagemaker.readthedocs.io/en/stable/) into an OpenSearch Service index. It also creates embeddings. In the lambda provided, a text search is performed to retrieve the documents from OpenSearch, but since the embeddings are created you can create a similar function for document search using embeddings as well. For more information also refer to https://aws.amazon.com/solutions/guidance/chatbots-with-vector-databases-on-aws/
- This notebook ingests data from [SageMaker docs](https://sagemaker.readthedocs.io/en/stable/) into an OpenSearch Service index and also creates embeddings. In the Lambda provided, a text search is performed to retrieve documents from OpenSearch; since the embeddings are also created, you can build a similar function for document search using embeddings. For more information, refer to [this guidance](https://aws.amazon.com/solutions/guidance/chatbots-with-vector-databases-on-aws/).


**Step 5: Deploy the cloudformation template bedrock_Amazon_OpenSearch.template in the deployment folder..**
**Step 5: Deploy the CloudFormation template `bedrock_amazon_opensearch.template` in the deployment folder**

In the parameters (an example deploy command follows this list):

1. For `OpenSearchChatLambdaBucket`, enter the name of the S3 bucket newly created in your AWS account in Step 2.

2. For ExistingOpenSearchHost and OpenSearchSecret enter the value of OpenSearchDomainEndpoint and OpenSearchSecret output values from the provision_Amazon_OpenSearch.yaml stack deployment.
2. For `ExistingOpenSearchHost` and `OpenSearchSecret`, enter the `OpenSearchDomainEndpoint` and `OpenSearchSecret` output values from the `provision_amazon_opensearch.yaml` stack deployment.

3. You can either create a new kendra index or provide an existing kendra index.
3. You can either create a new Kendra index or provide an existing one.
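If you prefer the AWS CLI to the console for this step, a deployment along the lines below should work. This is a sketch, not the documented procedure: the stack name is arbitrary, the parameter values must come from your own account (the Step 2 bucket and the `rag22` stack outputs), and any Kendra-related parameters from item 3 are omitted here and should be filled in as the template prompts.

```
aws cloudformation deploy \
  --template-file deployment/bedrock_amazon_opensearch.template \
  --stack-name <your-use-case-stack-name> \
  --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
  --parameter-overrides \
      OpenSearchChatLambdaBucket=<bucket-created-in-step-2> \
      ExistingOpenSearchHost=<OpenSearchDomainEndpoint-output> \
      OpenSearchSecret=<OpenSearchSecret-output>
```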

note: Once deployed, In the parameter store, you can toggle between Kendra and OpenSearch ("KnowledgeBaseType":"OpenSearch") for the parameter to test the differences between Kendra and Amazon OpenSearch.
**_NOTE:_** Once deployed, you can toggle the parameter in the Parameter Store between Kendra and OpenSearch (`"KnowledgeBaseType":"OpenSearch"`) to test the differences between Kendra and Amazon OpenSearch.

## Testing
Once deployed, you can get the UI url from the "Outputs" tab of the cloudformation stack. In the conversation interface, you can enter sagemaker related questions and receive responses. For example, "What is Sagemaker Model Monitor?"

Once deployed, you can get the CloudFront UI URL from the "Outputs" tab of the CloudFormation stack. In the conversation interface, you can ask SageMaker-related questions and receive responses. For example, "What is SageMaker Model Monitor?"

## Cleanup

- Delete the cloudformation stacks
- Delete the CloudFormation stacks

- Delete the s3 buckets and the index created.
- Delete the S3 buckets

- Delete the index created
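A sketch of the stack cleanup from the CLI; the use-case stack name is whatever you chose in Step 5, and non-empty S3 buckets and the index still need to be removed separately as noted above.

```
# Delete the use-case stack first (name chosen in Step 5), then the provisioning stack
aws cloudformation delete-stack --stack-name <your-use-case-stack-name>
aws cloudformation wait stack-delete-complete --stack-name <your-use-case-stack-name>

aws cloudformation delete-stack --stack-name rag22
aws cloudformation wait stack-delete-complete --stack-name rag22
```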

## Security

@@ -79,4 +81,3 @@ See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more inform
## License

This library is licensed under the MIT-0 License. See the LICENSE file.

58 changes: 31 additions & 27 deletions Instructions/Neo4j_search.md
@@ -1,46 +1,49 @@
# generative-ai-application-builder-on-aws-plugin-sample-vector-databases
This sample code explains how to add Neo4j as a knowledgebase for the "generative ai application builder on aws" solution.
https://aws.amazon.com/solutions/implementations/generative-ai-application-builder-on-aws/
# Generative AI Application Builder on AWS Plugin for Vector Databases

**Pre-requisites:**
This sample code explains how to add Neo4j as a knowledge base, in addition to Amazon Kendra and Knowledge Bases for [Amazon Bedrock](https://aws.amazon.com/bedrock/knowledge-bases/), in the [Generative AI Application Builder on AWS](https://aws.amazon.com/solutions/implementations/generative-ai-application-builder-on-aws/) solution for Retrieval Augmented Generation (RAG) use cases.

Have the aws cli installed and python installed.
**_NOTE:_**
This is not a production-ready code base; it is intended for testing and proofs of concept.

A free instance can be created for neo4j aura DB.
https://neo4j.com/cloud/platform/aura-graph-database/?utm_medium=PaidSearch&utm_source=Google&utm_campaign=UCGenAIutm_content=AMS-Search-SEMBrand-UCGenAI-None-SEM-SEM-NonABM&utm_adgroup=genai-llm&utm_term=neo4j%20ai&gclid=Cj0KCQjwwMqvBhCtARIsAIXsZpaStf3cKwDB7CTKLjpuJRx80URq3WBweTCmU7-r2PsdshNi1o0Y8u0aAl2iEALw_wcB
## Pre-requisites

See the pre-requisites section in [README.md](https://github.com/aws-samples/generative-ai-application-builder-on-aws-plugin-sample-vector-databases?tab=readme-ov-file#pre-requisites)

A free Neo4j AuraDB instance can be created [here](https://neo4j.com/cloud/platform/aura-graph-database/?utm_medium=PaidSearch&utm_source=Google&utm_campaign=UCGenAIutm_content=AMS-Search-SEMBrand-UCGenAI-None-SEM-SEM-NonABM&utm_adgroup=genai-llm&utm_term=neo4j%20ai&gclid=Cj0KCQjwwMqvBhCtARIsAIXsZpaStf3cKwDB7CTKLjpuJRx80URq3WBweTCmU7-r2PsdshNi1o0Y8u0aAl2iEALw_wcB).

## Steps to integrate Neo4j for Retrieval Augmented Generation with Generative AI Application Builder on AWS

**Step 1: Clone the repository.**

In the terminal, use the following aws cli command to create a secret in secrets manager. Provide the username, password within the strings.
In the terminal, use the following AWS CLI command to create a secret in Secrets Manager. Provide the username and password within the strings.

```
aws secretsmanager create-secret --name 'Neo4j-secret' --secret-string '{"username":"", "password":""}'

```
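To confirm the secret was stored as expected, you can read it back with the command below; note that this prints the credentials to your terminal.

```
aws secretsmanager get-secret-value --secret-id 'Neo4j-secret' --query SecretString --output text
```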

**Step 2: Create initial resources through the publish script available inside source folder.**

```
cd source/publishAssets
cd source/publish_assets

chmod +x ./publish_Neo4j.sh
chmod +x ./publish_neo4j.sh

./publish_Neo4j.sh
./publish_neo4j.sh

```

This would help create the following:
- An SSM parameter for storing a string in the SSM Parameter Store.
- An S3 bucket created for storing the lambda function assets and layers.
- Logic to zip the contents of lambda function, layers and upload to the s3 bucket.

- An SSM parameter for storing a string in the SSM Parameter Store.
- An S3 bucket created for storing the lambda function assets and layers.
- Logic to zip the contents of lambda function, layers and upload to the s3 bucket.

**Step 3: Ingest data into Neo4j.**

- To use sample data to ingest in Neo4j using the following steps:
To ingest sample data into Neo4j, use the following steps:

- Once you connect to the neo4j aura instance, please run the following cypher queries. Each of these queries would create 1 node, will set 2 properties, and 1 label.
- Once you connect to the Neo4j Aura instance, run the following Cypher queries. Each query creates one node, sets two properties, and adds one label. A verification sketch follows the queries.

```
CREATE (p:Recs {title: "Amazon Q", description: 'Amazon Q is a fully managed, generative-AI powered assistant that you can configure to answer questions, provide summaries, generate content, and complete tasks based on data in your enterprise. Amazon Q provides immediate and relevant information to employees, and helps streamline tasks and accelerate problem solving.'})
@@ -55,33 +58,35 @@ CREATE (p:Recs {title: "Benefits of Amazon Q", description: 'Amazon Q generates
CREATE (p:Recs {title: "How Amazon Q works", description: "With Amazon Q, you can build an interactive chat application for your organization’s end users, using a combination of your enterprise data and large language model knowledge, or enterprise data only."}
)
```

```
CREATE (p:Recs {title: "Enhancing an Amazon Q application", description: "After you finish configuring your application, you can optionally choose to enhance it.You can choose from the following available enhancements:Document enrichment – Control document attribute ingestion and build customized data solutions.Guardrails – Customize blocked topics and choose the knowledge sources your web experience uses for responses.
Plugins – Enable your end users to perform specific tasks related to third-party services from within their web experience chat—like creating Jira tickets."})
```
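If you have `cypher-shell` installed locally, a query along these lines can confirm the nodes were created; otherwise the same Cypher can be run in the Aura query console. The connection URI and credentials below are placeholders for your own instance.

```
# Placeholders -- use your Aura connection URI and the credentials stored in Neo4j-secret
cypher-shell -a 'neo4j+s://<your-aura-instance>.databases.neo4j.io' -u '<username>' -p '<password>' \
  "MATCH (p:Recs) RETURN p.title LIMIT 10;"
```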


**Step 4: Deploy the cloudformation template bedrock_Neo4j.template.**
**Step 4: Deploy the CloudFormation template `bedrock_neo4j.template`**

In the parameters:

1. Give the name of the Neo4jLambdaBucket the name of the newly created s3 bucket in your AWS account from step 2.
1. For `Neo4jLambdaBucket`, enter the name of the S3 bucket newly created in your AWS account in Step 2.

2. For `ExistingNeo4jHost`, enter the Neo4j instance URI.

2. For ExistingNeo4jHost enter the Neo4j instance uri.

3. You can either create a new kendra index or provide an existing kendra index.
3. You can either create a new Kendra index or provide an existing one.

note: Once deployed, In the parameter store, you can toggle between Kendra and Neo4j ("KnowledgeBaseType":"Neo4j") for the parameter to test the differences between Kendra and Neo4j.
**_NOTE:_** Once deployed, you can toggle the parameter in the Parameter Store between Kendra and Neo4j (`"KnowledgeBaseType":"Neo4j"`) to test the differences between Kendra and Neo4j.
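For example, switching the knowledge base over to Neo4j from the CLI could look like the sketch below. The parameter name is a placeholder (look up the configuration parameter created by your deployment in the Parameter Store); only the `"KnowledgeBaseType":"Neo4j"` field comes from this guide.

```
# Placeholder -- use the configuration parameter created by your deployment
PARAM_NAME=<your-use-case-configuration-parameter>

# Inspect the current configuration (a JSON document containing "KnowledgeBaseType")
aws ssm get-parameter --name "${PARAM_NAME}" --query "Parameter.Value" --output text

# Write the edited JSON back with "KnowledgeBaseType" set to "Neo4j" (or "Kendra" to compare)
aws ssm put-parameter --name "${PARAM_NAME}" --overwrite --value '<edited-json>'
```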

## Testing
Once deployed, you can get the UI url from the "Outputs" tab of the cloudformation stack. In the conversation interface, you can enter questions related to content in Neo4j and receive responses. For example, "what is amazon Q and how does it work?"

Once deployed, you can get the CloudFront UI URL from the "Outputs" tab of the CloudFormation stack. In the conversation interface, you can enter questions related to content in Neo4j and receive responses. For example, "What is Amazon Q and how does it work?"

## Cleanup

- Delete the cloudformation stacks
- Delete the CloudFormation stacks

- Delete the s3 buckets and the index created.
- Delete the S3 buckets

- Delete the created index

## Security

@@ -90,4 +95,3 @@ See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more inform
## License

This library is licensed under the MIT-0 License. See the LICENSE file.

41 changes: 27 additions & 14 deletions README.md
@@ -1,28 +1,41 @@
# generative-ai-application-builder-on-aws-plugin-sample-vector-databases
This sample code explains how to add a Knowledgebase in addition to Amazon Kendra for the "Generative AI Application Builder on AWS" solution.
https://aws.amazon.com/solutions/implementations/generative-ai-application-builder-on-aws/
# Generative AI Application Builder on AWS Plugin for Vector Databases

Note: This is not a production ready code base, but should rather be used for testing and proof of concepts.
This sample code explains how to add a Knowledge Base, in addition to Amazon Kendra and Knowledge Bases for [Amazon Bedrock](https://aws.amazon.com/bedrock/knowledge-bases/), to the [Generative AI Application Builder on AWS](https://aws.amazon.com/solutions/implementations/generative-ai-application-builder-on-aws/) solution for Retrieval Augmented Generation (RAG) use cases.

**Pre-requisites:**
**_NOTE:_**
This is not a production-ready code base; it is intended for testing and proofs of concept.

Have the aws cli installed and python installed.

## Pre-requisites

## Steps to integrate a Knowledgebase for Retrieval Augmented Generation with Generative AI App builder on AWS.
The following are pre-requisites to build and deploy locally:

The steps to ingest data into the knowledgebase and deploy the stack for the text use case is under Instructions folder.
- [Docker](https://www.docker.com/get-started/)
- [Nodejs 20.x](https://nodejs.org/en)
- [CDK v2.118.0](https://github.com/aws/aws-cdk)
- [Python >= 3.11, <=3.12.1](https://www.python.org/)
- _Note: normal python installations should include support for `ensurepip` and `pip`; however, if running in an environment without these packages (e.g. a minimal Docker image), you will need to install them manually. See [pip's installation guide](https://pip.pypa.io/en/stable/installation/) for details._
- [AWS CLI](https://aws.amazon.com/cli/)
- [jq](https://jqlang.github.io/jq/)

The currently supported knowledge base are
**Note: Configure the AWS CLI with your AWS credentials or have them exported in the CLI terminal environment. If the credentials are invalid or expired, running `cdk deploy` produces an error.**

1)Amazon Opensearch text search
2)Amazon OpenSearch embeddings search
3)Neo4j Search
**Also, if you have not run `cdk bootstrap` in this account and region, please follow the instructions [here](https://docs.aws.amazon.com/cdk/v2/guide/bootstrapping.html) to execute `cdk bootstrap` as a one-time process before proceeding with the steps below.**
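A minimal sketch of these two checks from the terminal; the account ID and Region are placeholders for your own environment.

```
# Confirm which account and credentials the AWS CLI is currently using
aws sts get-caller-identity

# One-time CDK bootstrap for the target account and Region
cdk bootstrap aws://<account-id>/<region>
```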

Choose the instructions for the knowledgebase that you would like to use from the Instructions folder.

Note:Once deployed and testing is complete, do not forget to cleanup resources so that no cost is incurred after testing.
## Steps to integrate a Knowledge Base for Retrieval Augmented Generation with Generative AI Application Builder on AWS

The steps to ingest data into the Knowledge Base and deploy the stack for the text use case are in the Instructions folder.

The currently supported knowledge bases are:

1. Amazon OpenSearch Text Search
2. Amazon OpenSearch Embeddings Search
3. Neo4j Search

Choose the instructions for the Knowledge Base that you would like to use from the Instructions folder.

**_NOTE:_** Once deployed and testing is complete, do not forget to clean up resources so that no costs are incurred after testing.

## Security

File renamed without changes.
File renamed without changes.