Replies: 1 comment 1 reply
-
Hi Omer, What type of instructions are in the MCP server? Can't you just inject them as tool result? Here's what I could suggest: Assume agent has a
using this tool, you can inject more info through tool results. The flow would go something like this:
With this approach you would not need to modify messages or the agent. That said, the approach depends on your accuracy/performance requirements as well as how strongly you want the model to follow these instructions. For example, updating system_prompt would force LLM to follow your instructions much closely compared to a tool result. In terms of the specific problem you are facing with Bedrock Guardrails, it's a common issue. It can be solved/improved by applying tags for guardrails to include/exclude parts of the prompt https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-tagging.html |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Hi all!
I built a simple agent with an MCP client that uses Bedrock (Claude Sonnet-4) model.
I set the
system_prompt
variable to a string with clear (general) assistant instructions (in a markdown format).The agent answers free text prompts pretty well :)
Now, I want to support a "canned questions" flow - where the user provides a canned question as the prompt, and the agent uses this canned question text to fetch a prompt with detailed instructions from an MCP server.
Initially, I replaced the user-provided canned question with the prompt fetched from the MCP server. Everything went well until I added guardrails. Once I did that, AWS guardrails started to detect the prompt I fetched from the MCP server as prompt attack (it contains system-prompt-like instructions).
I could think of the following 2 options to deal with this:
I managed to implement option 1, and it works nicely. I did not manage to find a way to implement the second one.
I'm trying to figure out what the "best practice" way to implement this flow - is it either one of the options I've come up with or maybe there's a better way...?
Beta Was this translation helpful? Give feedback.
All reactions