Python: Bug: In Handoff orchestration - when handling a request message from an agent in the handoff group, the persona adoption message is sent by the User which triggers openAI jailbreak guardrails #12294


Open
SaurabhNiket231 opened this issue May 28, 2025 · 4 comments
Assignees
Labels
agents bug Something isn't working python Pull requests for the Python Semantic Kernel

Comments

@SaurabhNiket231

Describe the bug
When handling a request message from an agent in the handoff group, the persona-adoption message is sent with the User role, which triggers OpenAI jailbreak guardrails.


```python
if self._agent_thread is None:
    self._chat_history.add_message(
        ChatMessageContent(
            role=AuthorRole.USER,
            content=f"Transferred to {self._agent.name}, adopt the persona immediately.",
        )
    )
    response_item = await self._agent.get_response(
        messages=self._chat_history.messages,  # type: ignore[arg-type]
        kernel=self._kernel,
    )
else:
    response_item = await self._agent.get_response(
        messages=ChatMessageContent(
            role=AuthorRole.USER,
            content=f"Transferred to {self._agent.name}, adopt the persona immediately.",
        ),
        thread=self._agent_thread,
        kernel=self._kernel,
    )
```

Changing the role to Assistant fixes the issue.
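A minimal sketch of the proposed fix, using stand-in classes (since `semantic_kernel` may not be installed; the real `AuthorRole` and `ChatMessageContent` live in `semantic_kernel.contents`):

```python
from dataclasses import dataclass
from enum import Enum


# Stand-in for semantic_kernel's AuthorRole; member names match the real enum.
class AuthorRole(str, Enum):
    USER = "user"
    ASSISTANT = "assistant"


@dataclass
class ChatMessageContent:
    """Minimal stand-in for semantic_kernel's ChatMessageContent."""
    role: AuthorRole
    content: str


def make_handoff_message(agent_name: str) -> ChatMessageContent:
    # Recording the handoff as ASSISTANT (rather than USER) keeps the
    # "adopt the persona immediately" instruction out of user-authored
    # content, which is what the jailbreak classifier flags.
    return ChatMessageContent(
        role=AuthorRole.ASSISTANT,
        content=f"Transferred to {agent_name}, adopt the persona immediately.",
    )


msg = make_handoff_message("TriageAgent")
print(msg.role.value)  # assistant
```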

To Reproduce
Steps to reproduce the behavior:

  1. Create a handoff orchestration.
  2. Make sure the Azure OpenAI content-filtering guardrails are deployed.
  3. Run the orchestration.
  4. See the error.

Expected behavior
It should hand off to other agents at runtime without triggering jailbreak filtering.


Platform

  • Language: Python
  • Source: pip package semantic-kernel 1.31.0
  • AI model: OpenAI:GPT-4o
  • IDE: VS Code
  • OS: Windows

Additional context
```
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "d:\repo\CG_GenAI_Accelerator_repos\sk-agentic-demo.venv\Lib\site-packages\openai_base_client.py", line 1549, in request
  raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "The response was filtered due to the prompt triggering Azure OpenAI's content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766", 'type': None, 'param': 'prompt', 'code': 'content_filter', 'status': 400, 'innererror': {'code': 'ResponsibleAIPolicyViolation', 'content_filter_result': {'hate': {'filtered': False, 'severity': 'safe'}, 'jailbreak': {'filtered': True, 'detected': True}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}}}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "d:\repo\CG_GenAI_Accelerator_repos\sk-agentic-demo.venv\Lib\site-packages\semantic_kernel\agents\runtime\in_process\in_process_runtime.py", line 470, in _on_message
    return await agent.on_message(
           ^^^^^^^^^^^^^^^^^^^^^^^
    ...<2 lines>...
    )
    ^
  File "d:\repo\CG_GenAI_Accelerator_repos\sk-agentic-demo.venv\Lib\site-packages\semantic_kernel\agents\runtime\core\base_agent.py", line 129, in on_message
    return await self.on_message_impl(message, ctx)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\repo\CG_GenAI_Accelerator_repos\sk-agentic-demo.venv\Lib\site-packages\semantic_kernel\agents\orchestration\agent_actor_base.py", line 35, in on_message_impl
    return await super().on_message_impl(message, ctx)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\repo\CG_GenAI_Accelerator_repos\sk-agentic-demo.venv\Lib\site-packages\semantic_kernel\agents\runtime\core\routed_agent.py", line 488, in on_message_impl
    return await h(self, message, ctx)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\repo\CG_GenAI_Accelerator_repos\sk-agentic-demo.venv\Lib\site-packages\semantic_kernel\agents\runtime\core\routed_agent.py", line 156, in wrapper
    return_value = await func(self, message, ctx)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\repo\CG_GenAI_Accelerator_repos\sk-agentic-demo.venv\Lib\site-packages\semantic_kernel\agents\orchestration\handoffs.py", line 285, in _handle_request_message
    response_item = await self._agent.get_response(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<2 lines>...
    )
    ^
```

@SaurabhNiket231 SaurabhNiket231 added the bug Something isn't working label May 28, 2025
@markwallace-microsoft markwallace-microsoft added python Pull requests for the Python Semantic Kernel triage labels May 28, 2025
@github-actions github-actions bot changed the title Bug: In Handoff orchestration - when handling a request message from an agent in the handoff group, the persona adoption message is sent by the User which triggers openAI jailbreak guardrails Python: Bug: In Handoff orchestration - when handling a request message from an agent in the handoff group, the persona adoption message is sent by the User which triggers openAI jailbreak guardrails May 28, 2025
@SaurabhNiket231
Author

I pasted the screenshot of the solution: changing the role to Assistant fixes it.

@TaoChenOSU
Contributor

Hey @SaurabhNiket231, thanks for reporting the issue!

Please help me better understand the issue. Are you saying that if you have Azure content filtering enabled on your model endpoint then the persona adoption message would trigger a policy violation?

@SaurabhNiket231
Author

@TaoChenOSU That is correct. Because the handoff message is created with the USER role, it raises the policy violation, but if I change it to ASSISTANT the issue no longer occurs. It took me a while to understand this; I initially thought some prompt in my agents was causing it. To verify, I created a planner agent with a very simple one-line prompt, and it still happened. So I started debugging and found this. Please let me know if you need more details.

@TaoChenOSU
Contributor

Got it! It makes sense that content filtering would think the user message is trying to jailbreak the model. We are actually debating whether the persona adaptation message is even needed. I'm not sure changing it to an assistant message would have the same effect as a user message; we will experiment with it more.
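To illustrate the role difference being discussed: the same handoff text lands in the request payload under a different role depending on the fix. The prompt-injection ("jailbreak") classifier reportedly inspects user-authored content, which would explain why the USER variant trips it. This is a sketch of the payload shapes only; the agent name is hypothetical.

```python
# The same handoff text under each role, as it would appear in an
# OpenAI-style chat messages list.
handoff_text = "Transferred to TriageAgent, adopt the persona immediately."

as_user = {"role": "user", "content": handoff_text}        # triggers the filter
as_assistant = {"role": "assistant", "content": handoff_text}  # proposed fix

messages = [
    {"role": "system", "content": "You are TriageAgent."},
    as_assistant,  # handoff recorded as model-side context, not user input
]
print(messages[1]["role"])  # assistant
```

Whether the model follows the persona instruction equally well from an assistant-role message is the open question noted above.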
