23 changes: 21 additions & 2 deletions .env.example
@@ -1,5 +1,5 @@
AGENTUITY_SDK_KEY=
AGENTUITY_PROJECT_KEY=

# example-composio
# COMPOSIO_API_KEY=
@@ -11,5 +11,24 @@ AGENTUITY_PROJECT_KEY=
# SLACK_SIGNING_SECRET=
# SLACK_BOT_TOKEN=

# example-telegram
# TELEGRAM_BOT_TOKEN=

# example-teams
# TEAMS_BOT_APP_ID=
# TEAMS_BOT_APP_PASSWORD=
# TEAMS_BOT_TENANT_ID=
# TEAMS_TEST_USER_KEY= # Get this by chatting with the bot, then check logs for "userKey" field

# gateway-byo-token - Only use if you want to bypass the AI gateway
# ANTHROPIC_API_KEY=

# io-email - Email address for the io-email agent (configure in Agentuity Console)
# IO_EMAIL_ADDRESS=

# Checkly Heartbeat URLs - Create monitors at https://app.checklyhq.com
# CHECKLY_TEST_SUITE_URL=
# CHECKLY_EXAMPLE_SLACK_URL=
# CHECKLY_EXAMPLE_TEAMS_URL=
# CHECKLY_IO_EMAIL_URL=
# CHECKLY_IO_SMS_URL=
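Checkly heartbeat monitors alert when pings stop arriving, so an agent (or the test-suite) reports each successful run by hitting its per-monitor URL. A minimal sketch, assuming only that the env vars above each hold such a URL (the helper name is illustrative):

```typescript
// Illustrative helper: ping a Checkly heartbeat monitor after a successful run.
// The env var names match .env.example; the function itself is hypothetical.
export async function pingHeartbeat(envVar: string): Promise<boolean> {
  const url = process.env[envVar];
  if (!url) return false; // heartbeats are optional, e.g. in local dev
  const res = await fetch(url, { method: 'GET' });
  return res.ok; // true when the monitor accepted the ping
}
```

A caller would invoke it at the end of a run, e.g. `await pingHeartbeat('CHECKLY_TEST_SUITE_URL')`.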
Binary file added .github/example-teams/teams-setup-5-channels.png
4 changes: 4 additions & 0 deletions README.md
@@ -94,7 +94,11 @@ Each agent demonstrates specific Agentuity features. Here's what you can explore
| **example-chat** | Conversational AI with persistent chat history |
| **example-composio** | Integration with Composio tools (Hacker News) |
| **example-discord** | Discord webhook notifications |
| **example-llm-judge** | LLM-as-a-judge pattern for evaluating AI outputs |
| **example-slack** | Slack bot integration with thread support |
| **example-streaming** | Real-time data streaming |
| **example-teams** | Microsoft Teams bot integration with persistent chat history |
| **example-telegram** | Telegram bot integration |

## How to Use in DevMode

18 changes: 15 additions & 3 deletions agentuity.yaml
@@ -50,11 +50,11 @@ deployment:
# You should tune the resources for the deployment
resources:
# The memory requirements
memory: 2Gi
memory: 4Gi
# The CPU requirements
cpu: 2000M
cpu: 4000M
# The disk size requirements
disk: 250Mi
disk: 3Gi
# The deployment mode
mode:
# on-demand or provisioned
@@ -145,3 +145,15 @@ agents:
  - id: agent_fee72e13a3fdc7f0783abd65220d352d
    name: test-suite
    description: Tests the functionality of all Kitchen Sink example agents
  - id: agent_52f73139a881ee0e1cdfafb3c6404e70
    name: example-telegram
    description: Demonstrates how to integrate with a Telegram bot
  - id: agent_c38f9a6d8d3edc6bde56047cbfd16c6f
    name: example-streaming
    description: Demonstrates advanced agent streaming patterns
  - id: agent_4268cac212e32dae1f6a7c394d2c6b9d
    name: example-llm-judge
    description: Demonstrates LLM-as-a-judge pattern for evaluating AI outputs
  - id: agent_ebc87cc45db854eff103e3d54cefa24a
    name: example-teams
    description: Demonstrates how to integrate with a Microsoft Teams bot
25 changes: 12 additions & 13 deletions bun.lock

Some generated files are not rendered by default.

9 changes: 6 additions & 3 deletions package.json
@@ -28,18 +28,21 @@
"typescript": "^5"
},
"dependencies": {
"@agentuity/sdk": "^0.0.146",
"@agentuity/sdk": "^0.0.157",
"@ai-sdk/anthropic": "^2.0.17",
"@ai-sdk/google": "^2.0.11",
"@ai-sdk/groq": "^2.0.24",
"@ai-sdk/openai": "^2.0.23",
"@composio/core": "^0.1.52",
"@composio/vercel": "^0.2.8",
"@mastra/core": "^0.15.2",
"@slack/types": "^2.16.0",
"@slack/web-api": "^7.10.0",
"ai": "^5.0.29",
"botbuilder": "^4.23.3",
"crypto": "^1.0.1",
"source-map-js": "^1.2.1"
"source-map-js": "^1.2.1",
"zod": "^4.1.12"
},
"module": "index.ts"
}
}
2 changes: 1 addition & 1 deletion src/agents/example-composio/README.md
@@ -41,7 +41,7 @@ This example uses the HackerNews toolkit, but Composio offers many more, including:
![Composio project settings showing API key](/.github/example-composio/composio-setup-2-api-key.png)

Copy the API key and add it to your `.env` file:
```
```env
COMPOSIO_API_KEY=your-api-key-here
```

2 changes: 1 addition & 1 deletion src/agents/example-discord/README.md
@@ -26,7 +26,7 @@
2. **Configure Environment Variable**

For local development, add to your `.env` file:
```
```env
DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/YOUR_WEBHOOK_ID/YOUR_WEBHOOK_TOKEN
```

82 changes: 82 additions & 0 deletions src/agents/example-llm-judge/index.ts
@@ -0,0 +1,82 @@
import type { AgentContext, AgentRequest, AgentResponse } from '@agentuity/sdk';
import { openai } from '@ai-sdk/openai';
import { generateObject } from 'ai';
import { evaluationSchema, formatReport } from './story-eval';

export default async function Agent(
  req: AgentRequest,
  resp: AgentResponse,
  ctx: AgentContext
) {
  try {
    // Get the prompt from request, or use a default
    const prompt =
      (await req.data.text()) ||
      'Write a short story about an AI learning to paint';

    ctx.logger.info('Starting LLM-as-a-judge evaluation');

    // Get stories from gateway-provider
    const gatewayAgent = await ctx.getAgent({ name: 'gateway-provider' });
    const stories = await gatewayAgent.run({ data: prompt });
    const storiesText = await stories.data.text();

    ctx.logger.debug('Received stories from gateway-provider');

    // Create evaluation prompt
    const evaluationPrompt = `
You are evaluating two AI-generated short stories.

Here are the stories:

${storiesText}

Extract each story text:
- OpenAI story: appears after "### OpenAI (GPT-5 Nano)"
- Google story: appears after "### Google (Gemini 2.0 Flash)"

For each story, provide:
1. Creativity score (1-10): How original and imaginative is it?
2. Quality score (1-10): Overall writing quality
3. Strengths: What works well (1-2 sentences)

Finally, provide a verdict declaring which story is better and why (2-3 sentences).`;

    ctx.logger.info('Generating structured evaluation');

    // Generate structured evaluation
    const evaluation = await generateObject({
      model: openai('gpt-5-nano'),
      schema: evaluationSchema,
      system:
        'You are a literary critic evaluating short AI-generated stories.',
      prompt: evaluationPrompt,
    });

    // Log key metrics
    ctx.logger.debug('Evaluation scores', {
      openai: evaluation.object.openai.quality,
      google: evaluation.object.google.quality,
    });

    // Return formatted report
    return resp.markdown(formatReport(evaluation.object));
  } catch (error) {
    ctx.logger.error('Error in LLM judge evaluation:', error);
    return resp.text(
      'Sorry, there was an error running the evaluation. Please ensure the gateway-provider agent is available.'
    );
  }
}
Comment on lines +11 to +70

⚠️ Potential issue | 🟠 Major

Verify the model name and consider more robust parsing.

The implementation follows structured error handling guidelines and correctly integrates with the evaluation schema. However, there are two concerns:

  1. Model name verification needed: Line 49 uses openai('gpt-5-nano'), which doesn't match known OpenAI model naming conventions (typically gpt-4, gpt-4-turbo, gpt-3.5-turbo, etc.). If this model doesn't exist, the agent will fail at runtime.

  2. Fragile format coupling: Lines 35-36 hardcode markdown section headers to extract stories from gateway-provider output. If the upstream agent changes its output format, the evaluation prompt won't correctly identify the stories, leading to poor extraction results or evaluation failures.

For issue 1, verify the correct model name:

What OpenAI models are available through the OpenAI API as of 2025? Is gpt-5-nano a valid model?

For issue 2, consider adding format validation or using a more robust parsing approach:

     // Get stories from gateway-provider
     const gatewayAgent = await ctx.getAgent({ name: 'gateway-provider' });
     const stories = await gatewayAgent.run({ data: prompt });
     const storiesText = await stories.data.text();
 
     ctx.logger.debug('Received stories from gateway-provider');
+    
+    // Validate expected format
+    if (!storiesText.includes('### OpenAI') || !storiesText.includes('### Google')) {
+      throw new Error('Unexpected format from gateway-provider agent');
+    }
 
     // Create evaluation prompt
🤖 Prompt for AI Agents
In src/agents/example-llm-judge/index.ts around lines 11 to 70, the review flags
two problems: the call to openai('gpt-5-nano') may reference a non-existent
model and the evaluation prompt relies on brittle hard-coded markdown headers to
locate stories. Fix by replacing the model name with a verified supported model
(e.g., gpt-4 or gpt-4o or configurable via env/ctx config) or make the model
string configurable and validate it at startup; and make story extraction
resilient by having the gateway-provider return structured JSON (preferred) or
include explicit delimiters you parse robustly (e.g., regex with safe fallbacks
and validation checks) and add error handling/logging when parsing fails so the
evaluation uses validated story texts instead of assuming exact headers.
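The review's parsing suggestion can be sketched as a small deterministic extractor, assuming the gateway-provider keeps the `### OpenAI …` / `### Google …` headers shown in the prompt above (the function name is hypothetical):

```typescript
// Hypothetical helper: pull the two stories out of gateway-provider's
// markdown output up front, instead of asking the judge LLM to locate them.
// The header strings are assumptions taken from the evaluation prompt.
function extractStories(
  text: string
): { openai: string; google: string } | null {
  const match = text.match(
    /### OpenAI[^\n]*\n([\s\S]*?)### Google[^\n]*\n([\s\S]*)/
  );
  if (!match) return null; // caller can throw or fall back to the raw text
  return { openai: match[1].trim(), google: match[2].trim() };
}
```

The agent could then fail fast with a clear log message when `extractStories` returns `null`, rather than sending a malformed prompt to the judge model.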


export const welcome = () => {
  return {
    welcome: `Welcome to the <span style="color: light-dark(#0AA, #0FF);">LLM-as-a-Judge</span> example agent.\n\n### About\n\nThis agent demonstrates the LLM-as-a-judge pattern, where one AI model evaluates the outputs of other models. It generates content using the gateway-provider agent (which uses two different AI models), then evaluates both outputs with structured scoring and feedback.\n\n### Testing\n\nTry the default prompt about AI learning to paint, or send your own story prompt. The agent calls \`gateway-provider\` to generate two stories (one from each AI model), then provides an evaluation comparing their creativity, quality, and strengths.\n\n### Questions?\n\nThe "Help" command is not available for this agent, as it's a specific example demonstration.`,
    prompts: [
      {
        data: 'Write a short story about an AI learning to paint',
        contentType: 'text/plain',
      },
    ],
  };
};