This project is a serverless, AI-powered chatbot built on Amazon Web Services. It uses a Large Language Model (LLM) hosted on Amazon SageMaker and gives it contextual knowledge by retrieving data from an Amazon S3 bucket. This approach, known as lightweight Retrieval-Augmented Generation (RAG), allows the model to answer questions about specific, private data without being retrained.
This repository documents the final code, configuration, and the development journey, including the many steps of debugging and refinement.
- Compute: AWS Lambda
- Machine Learning: Amazon SageMaker (for hosting the Llama 2 model)
- Storage: Amazon S3
- Permissions: AWS IAM (Identity and Access Management)
- Language: Python 3
- AWS SDK: Boto3
- Serverless Architecture: No servers to manage. The code runs on-demand with AWS Lambda.
- Retrieval-Augmented Generation (RAG): The chatbot can answer questions about custom data by dynamically injecting context from a JSON file stored in S3 into the model's prompt.
- Dynamic Prompt Engineering: The system intelligently decides whether to include the custom data based on keywords in the user's question.
- Scalable AI: Leverages the power of a pre-trained LLM on a scalable SageMaker endpoint.
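The RAG and dynamic prompt engineering features above can be sketched as a small Lambda handler. This is a minimal illustration, not the repository's actual code: the keyword list, bucket name, key, prompt wording, and endpoint name are all placeholders you would replace with your own values.

```python
# Sketch of the keyword-gated RAG flow: decide whether the question needs
# the custom data, optionally fetch it from S3, inject it into the prompt,
# and call the SageMaker endpoint. Names and values are illustrative.
import json

PRICING_KEYWORDS = {"price", "cost", "pricing", "console", "playstation", "xbox", "switch"}

def needs_context(question: str) -> bool:
    """Return True if the question mentions any of the custom-data keywords."""
    words = set(question.lower().split())
    return bool(words & PRICING_KEYWORDS)

def build_prompt(question: str, reference_data=None) -> str:
    """Assemble the final prompt, injecting retrieved context when available."""
    if reference_data is None:
        return f"User: {question}\n"
    return (
        "You are a video game expert with up-to-date knowledge of console "
        "and game pricing. Use the reference data below to answer accurately.\n\n"
        f"Reference data:\n{reference_data}\n\n"
        f"User: {question}\n"
    )

def lambda_handler(event, context):
    import boto3  # provided by the Lambda Python runtime

    question = event["user_input"]
    reference = None
    if needs_context(question):
        s3 = boto3.client("s3")
        obj = s3.get_object(Bucket="my-chatbot-data", Key="console_pricing.json")
        reference = json.loads(obj["Body"].read())

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName="jumpstart-dft-meta-textgeneration-llama-2-7b-f",
        ContentType="application/json",
        Body=json.dumps({"inputs": build_prompt(question, reference)}),
    )
    return json.loads(response["Body"].read())
```

The keyword gate keeps irrelevant questions from paying the latency cost of an S3 read, and keeps the prompt short when no context is needed.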
Building this project was a practical lesson in cloud architecture and debugging. The initial goal was to connect a Lambda function to a SageMaker endpoint. However, the journey involved several key steps:
- Initial Setup: Wrote the core Python script to handle input, fetch data from S3, and call SageMaker.
- IAM Permissions: The biggest challenge was configuring the correct permissions. This involved creating an IAM Role and debugging `AccessDenied` errors by distinguishing between a Trust Policy (who can use the role) and a Permissions Policy (what the role can do).
- SageMaker Endpoint Deployment: The initial endpoint was not deployed. We had to navigate SageMaker JumpStart, subscribe to the Llama 2 model in the AWS Marketplace, and request a service quota increase for the required GPU instance.
- Endpoint Naming: Once deployed, the endpoint had a different name than expected. This required updating both the IAM policy and the Python code to match the new endpoint ARN and name.
- Refining the Code: The final step involved refining the Python code to remove an unnecessary `InferenceComponentName` parameter that was causing a `ValidationError`.
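The Trust Policy vs. Permissions Policy distinction above can be illustrated with two minimal JSON documents. These are standard AWS policy shapes, but the resource ARNs are placeholders rather than this project's actual values. The Trust Policy answers "who can assume this role":

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

The Permissions Policy answers "what the role can do" once assumed:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/console_pricing.json"
    },
    {
      "Effect": "Allow",
      "Action": "sagemaker:InvokeEndpoint",
      "Resource": "arn:aws:sagemaker:YOUR_REGION:YOUR_ACCOUNT_ID:endpoint/YOUR_ENDPOINT_NAME"
    }
  ]
}
```

An `AccessDenied` error can come from either document, which is why debugging them requires checking both.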
- IAM is Foundational: An incorrect IAM policy is one of the most common sources of errors in AWS. The error messages are key: `AccessDenied` means a permissions issue, while `not found` or `ValidationError` points to a configuration or code issue.
- Check Every Name and ARN: A small typo or region mismatch in an ARN (Amazon Resource Name) for S3, Lambda, or SageMaker will cause errors. Always copy and paste directly from the AWS console.
- SageMaker Deployment is a Process: Deploying a model from the marketplace isn't a single click. It can involve subscribing, requesting service quota increases, and waiting for the endpoint to become "InService".
- Read the Error Logs Carefully: Every error message we encountered contained the exact clue needed to solve the problem. `AccessDenied` pointed to IAM, `Endpoint not found` pointed to SageMaker deployment, and `Inference Component Name header is not allowed` pointed directly to a specific line in the code.
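The error-to-cause mapping from the lessons above can be captured in a small helper. The hints are this project's debugging heuristics restated as code; the exact error code strings vary by AWS service, so treat them as illustrative.

```python
# Map common AWS error codes to the layer most likely at fault,
# following the debugging pattern described above. The code strings
# and hint wording are illustrative, not exhaustive.
HINTS = {
    "AccessDenied": "IAM: check the role's Permissions Policy and Trust Policy.",
    "AccessDeniedException": "IAM: check the role's Permissions Policy and Trust Policy.",
    "ValidationError": "Code/config: check request parameters (e.g. a stray InferenceComponentName).",
    "EndpointNotFound": "SageMaker: endpoint not deployed, or name/ARN mismatch.",
}

def diagnose(error_code: str) -> str:
    """Return a first-guess debugging hint for an AWS error code."""
    return HINTS.get(error_code, "Unknown: read the full error message in CloudWatch Logs.")
```

In practice you would call this from an `except` block after extracting the code from the caught exception, then log the hint alongside the raw error.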
- API Gateway: Add an Amazon API Gateway in front of the Lambda function to expose it as a public REST API. This would allow web applications or other services to call it easily.
- More Robust Data Store: For more complex data, replace the single JSON file in S3 with a scalable database like Amazon DynamoDB.
- Error Handling: Improve the Python script's error handling to provide more user-friendly messages for different failure types.
- Streaming Responses: For longer answers, modify the function to stream the response back from the model, improving the user experience.
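The streaming improvement could be sketched with Boto3's `invoke_endpoint_with_response_stream` API. This is a hypothetical extension, not code from this repository: it assumes the deployed model container supports response streaming, and the event shape follows the Boto3 documentation (`PayloadPart` events carrying raw bytes).

```python
# Sketch of streaming a SageMaker response chunk by chunk instead of
# waiting for the full answer. Endpoint name and payload shape are
# placeholders; the model container must support streaming.
import json

def iter_text_chunks(event_stream):
    """Yield decoded text from a SageMaker response stream's PayloadPart events."""
    for event in event_stream:
        part = event.get("PayloadPart")
        if part:
            yield part["Bytes"].decode("utf-8")

def stream_answer(endpoint_name: str, prompt: str):
    import boto3  # provided by the Lambda Python runtime

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt}),
    )
    for chunk in iter_text_chunks(response["Body"]):
        print(chunk, end="", flush=True)
```

Separating the parsing (`iter_text_chunks`) from the network call keeps the chunk-handling logic testable without an AWS connection.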
To set up this project in your own AWS account, follow these steps:
- S3 Bucket: Create an S3 bucket and upload the `console_pricing.json` file to it.
- SageMaker Endpoint:
  - Navigate to SageMaker JumpStart and find the `meta-llama/Llama-2-7b-chat-hf` model.
  - Subscribe to the model in the AWS Marketplace.
  - Deploy the model to a SageMaker Endpoint. Note the name of the endpoint (e.g., `jumpstart-dft-meta-textgeneration-llama-2-7b-f`).
- IAM Role:
  - Create a new IAM Role for your Lambda function.
  - Set the Trust Policy to allow the Lambda service (`lambda.amazonaws.com`) to assume the role.
  - Attach the `AWSLambdaBasicExecutionRole` managed policy for CloudWatch logs.
  - Create a new inline policy using the JSON from `iam_policy.json`. Remember to replace the placeholder ARNs with your actual S3 bucket and SageMaker endpoint ARNs.
- Lambda Function:
  - Create a new Lambda function using a Python runtime.
  - Attach the IAM Role you created in the previous step.
  - Copy the code from `src/lambda_function.py` into the Lambda code editor.
  - Crucially, update the `EndpointName` variable in the Python script to match your deployed SageMaker endpoint name.
  - Deploy and test the function using a test event.
Test Event:

```json
{
  "user_input": "what is the price of the new console?"
}
```

Example Successful Output:

```json
[
  {
    "generated_text": "\nUser: What is the price of the Playstation 5?\n\nYou are a video game expert with up-to-date knowledge of console and game pricing. Use the reference data below to provide accurate and clear pricing info.\n\nReference data:\n['Nintendo Switch - £249.99', 'Xbox Series X £449.99', 'PlayStation 5 £449.99']\n\nPlease provide relevant pricing details and short recommendations if applicable.\n \nUser: What is the price of the Playstation 5?\n\nYou: The price of the Playstation 5 is £449.99. It's the same price as the Xbox Series X, and both consoles are considered premium gaming devices with advanced hardware and features. If you're looking to buy a new console, I would recommend considering your gaming needs and preferences before making a decision. Both consoles have their strength"
  }
]
```