A high-performance Discord AI bot built with Bun and Drizzle ORM featuring codebase embeddings, GitHub integration, and conversation history.
Tech stack:
- Bun - Ultra-fast JavaScript runtime & server
- Drizzle ORM - TypeScript-first SQL ORM
- PostgreSQL + pgvecto.rs (VectorChord) - High-performance vector database supporting 3072-dimensional embeddings
- discord.js - Discord bot client
- Vercel AI SDK - OpenAI text-embedding-3-large integration
- Octokit - GitHub REST API integration
- Docker - Containerized deployment
Features:
- 🔍 Semantic Code Search - Natural language codebase queries
- 👥 Repository Metadata - Automatically indexes contributors, maintainers, and project stats
- 💬 Context-Aware Chat - Maintains last 100 messages per channel
- 🔗 GitHub Integration - Fetch issues and PRs in real-time
- ⚡ Real-time Processing - Immediate message responses
- 🚫 Read-Only - Information only, never modifies code
- 🐳 Docker Ready - Easy deployment with docker-compose
- 📊 Drizzle Studio - Visual database management
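The context-aware chat feature above keeps the last 100 messages per channel. A minimal in-memory sketch of that trimming rule follows. The `ChannelHistory` class and `ChatMessage` shape are hypothetical names used for illustration; the bot itself stores history in the `message_history` table, so treat this as a picture of the behaviour rather than the actual implementation.

```typescript
// Hypothetical in-memory sketch of "last 100 messages per channel".
// The real bot persists history in PostgreSQL; names here are assumptions.
interface ChatMessage {
  channelId: string;
  messageId: string;
  content: string;
}

const MAX_MESSAGES_PER_CHANNEL = 100; // limit stated in the feature list

class ChannelHistory {
  private readonly byChannel = new Map<string, ChatMessage[]>();
  private readonly limit: number;

  constructor(limit: number = MAX_MESSAGES_PER_CHANNEL) {
    this.limit = limit;
  }

  // Append a message and drop the oldest entries beyond the limit.
  add(message: ChatMessage): void {
    const messages = this.byChannel.get(message.channelId) ?? [];
    messages.push(message);
    if (messages.length > this.limit) {
      messages.splice(0, messages.length - this.limit);
    }
    this.byChannel.set(message.channelId, messages);
  }

  // Messages for a channel, oldest first; empty if the channel is unknown.
  recent(channelId: string): readonly ChatMessage[] {
    return this.byChannel.get(channelId) ?? [];
  }
}
```

With a database-backed history, the same rule would typically be a query ordered by insertion time with a limit of 100.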
Prerequisites:
- Bun installed
- Docker & Docker Compose
- Discord Bot Token
- OpenAI API Key
- GitHub Personal Access Token
bun install
cp env.example .env
Fill in your credentials in .env
docker-compose -f docker-compose.local.yml up -d
# Generate migrations from schema
bun run db:generate
# Run migrations
bun run db:migrate
bun start
Once the bot is running, trigger the sync:
curl -X POST http://localhost:3000/api/streamyfin/sync
This will fetch and embed the entire Streamyfin codebase from GitHub automatically (no local cloning needed).
Mention your bot:
@YourBot what is the authentication flow?
bun start # Start bot server
bun dev # Start with auto-reload
bun run db:generate # Generate Drizzle migrations
bun run db:migrate # Run database migrations
bun run db:studio # Open Drizzle Studio (database GUI)
bun run embeddings:init # Generate embeddings
bun run bot:test # Test bot connection only
Visual database management UI:
bun run db:studio
Opens at https://local.drizzle.studio
View and edit:
- Embeddings
- Message history
- GitHub cache
Located in src/lib/db/schema.ts:
// Embeddings with vector search
export const embeddings = pgTable('embeddings', {
id: serial('id').primaryKey(),
filePath: text('file_path').notNull(),
content: text('content').notNull(),
vector: vector('vector', { dimensions: 3072 }), // text-embedding-3-large produces 3072-dimensional vectors
metadata: jsonb('metadata'),
// ...
});
// Message history
export const messageHistory = pgTable('message_history', {
id: serial('id').primaryKey(),
channelId: text('channel_id').notNull(),
messageId: text('message_id').notNull(),
content: text('content').notNull(),
// ...
});
// GitHub cache
export const githubCache = pgTable('github_cache', {
id: serial('id').primaryKey(),
cacheKey: text('cache_key').notNull().unique(),
data: jsonb('data').notNull(),
// ...
});
curl http://localhost:3000/health
curl -X POST http://localhost:3000/api/embeddings/search \
-H "Content-Type: application/json" \
-d '{"query": "authentication", "limit": 5}'
curl -X POST http://localhost:3000/api/test/message \
-H "Content-Type: application/json" \
-d '{
"message": "Who is cagemaster?",
"channelId": "test-channel",
"username": "test-user"
}'
This endpoint simulates a Discord message without connecting to Discord. Useful for:
- Testing bot responses locally
- Debugging AI behavior
- CI/CD integration tests
- Rapid development iteration
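For the CI and local-testing uses listed above, the same request can be issued from TypeScript. This is a sketch under assumptions: the helper names `buildTestMessageRequest` and `sendTestMessage` are hypothetical, and the response is returned as raw text because the response format of `/api/test/message` is not documented here.

```typescript
// Payload fields mirror the curl example for /api/test/message.
interface TestMessagePayload {
  message: string;
  channelId: string;
  username: string;
}

interface BuiltRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Pure helper: builds the URL and fetch options without touching the network.
function buildTestMessageRequest(baseUrl: string, payload: TestMessagePayload): BuiltRequest {
  return {
    url: new URL("/api/test/message", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}

// Sends the simulated message; the body is returned as text because the
// response shape is an unknown here (assumption: it is safe to read as text).
async function sendTestMessage(
  baseUrl: string,
  payload: TestMessagePayload,
): Promise<{ status: number; body: string }> {
  const { url, init } = buildTestMessageRequest(baseUrl, payload);
  const response = await fetch(url, init);
  return { status: response.status, body: await response.text() };
}
```

A test could call `sendTestMessage("http://localhost:3000", { message: "Who is cagemaster?", channelId: "test-channel", username: "test-user" })` and assert on the returned status once the bot server is running.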
curl -X POST http://localhost:3000/api/streamyfin/sync \
-H "Content-Type: application/json" \
-d '{
"owner": "fredrikburmester",
"repo": "streamyfin",
"branch": "develop",
"forceRegenerate": false
}'
This will:
- Fetch repository metadata (contributors, stats, etc.)
- Fetch all repository files via GitHub API
- Generate embeddings for all files (on-demand, no cloning)
- Make the codebase searchable by the bot
Benefits:
- ✅ No disk space needed
- ✅ Always up-to-date
- ✅ No git dependencies
- ✅ Faster initial setup
- ✅ Indexes contributor information
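The `forceRegenerate` flag in the sync request is not explained further in this README. One plausible reading is sketched below: without the flag, only files whose content changed since the last sync are re-embedded; with it, everything is. The `RepoFile` type, the `selectFilesToEmbed` function, and the idea of tracking embedded files by Git blob SHA are all assumptions made for illustration, not the bot's confirmed behaviour.

```typescript
// Hypothetical shape: a file path plus the Git blob SHA reported for it.
interface RepoFile {
  path: string;
  sha: string;
}

// Decide which files need (re-)embedding.
// `embeddedShas` maps file path -> SHA that was embedded last time (assumed bookkeeping).
function selectFilesToEmbed(
  files: RepoFile[],
  embeddedShas: Map<string, string>,
  forceRegenerate: boolean,
): RepoFile[] {
  if (forceRegenerate) {
    return files.slice(); // re-embed everything
  }
  // Keep files that are new or whose content hash changed.
  return files.filter((file) => embeddedShas.get(file.path) !== file.sha);
}

const exampleFiles: RepoFile[] = [
  { path: "app/login.tsx", sha: "aaa" },
  { path: "app/home.tsx", sha: "bbb" },
  { path: "README.md", sha: "ccc" },
];
const previouslyEmbedded = new Map<string, string>([
  ["app/login.tsx", "aaa"], // unchanged
  ["app/home.tsx", "old"], // changed since last sync
]);

const incremental = selectFilesToEmbed(exampleFiles, previouslyEmbedded, false);
const full = selectFilesToEmbed(exampleFiles, previouslyEmbedded, true);
```

Here `incremental` contains `app/home.tsx` and `README.md`, while `full` contains all three files.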
curl -X POST http://localhost:3000/api/embeddings/generate \
-H "Content-Type: application/json" \
-d '{
"owner": "username",
"repo": "repository",
"branch": "main",
"forceRegenerate": false
}'
The bot can:
- Search codebase: "Find the user authentication code"
- Get file content: "Show me the database schema"
- Answer about contributors: "Who is cagemaster?", "Who are the main developers?"
- Repository info: "How many contributors does the project have?"
- List GitHub issues: "What are the open issues?"
- Get specific issue: "Tell me about issue #123"
- List PRs: "What pull requests are open?"
- Get specific PR: "Details on PR #45"
- Go to Discord Developer Portal
- Create application → Add Bot
- Enable intents:
  - ✅ MESSAGE CONTENT INTENT
  - ✅ SERVER MEMBERS INTENT
- Get credentials for .env
- Invite bot with this URL:
https://discord.com/api/oauth2/authorize?client_id=YOUR_APP_ID&permissions=274878221376&scope=bot
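The `permissions` value in the invite URL is a bitfield. The sketch below rebuilds it from individual permission bits and assembles the URL. The permission names are my reading of the number against Discord's documented permission flags, so verify them against the Discord documentation before relying on them; `permissionsValue` and `buildInviteUrl` are hypothetical helper names.

```typescript
// Bit positions that appear to make up 274878221376 (an interpretation, see above).
const BOT_PERMISSION_BITS: { name: string; bit: number }[] = [
  { name: "ADD_REACTIONS", bit: 6 },
  { name: "VIEW_CHANNEL", bit: 10 },
  { name: "SEND_MESSAGES", bit: 11 },
  { name: "EMBED_LINKS", bit: 14 },
  { name: "ATTACH_FILES", bit: 15 },
  { name: "USE_EXTERNAL_EMOJIS", bit: 18 },
  { name: "SEND_MESSAGES_IN_THREADS", bit: 38 },
];

// Sum of distinct powers of two equals their bitwise OR. Plain numbers are used
// because `1 << 38` would overflow JavaScript's 32-bit shift operator.
function permissionsValue(bits: number[]): number {
  return Array.from(new Set(bits)).reduce((sum, bit) => sum + Math.pow(2, bit), 0);
}

// Builds the bot invite URL in the same parameter order as the README.
function buildInviteUrl(clientId: string, permissions: number): string {
  const params = new URLSearchParams({
    client_id: clientId,
    permissions: String(permissions),
    scope: "bot",
  });
  return "https://discord.com/api/oauth2/authorize?" + params.toString();
}

const invitePermissions = permissionsValue(BOT_PERMISSION_BITS.map((p) => p.bit));
const inviteUrl = buildInviteUrl("YOUR_APP_ID", invitePermissions);
```

Adding or removing an entry in `BOT_PERMISSION_BITS` changes the resulting value, which is one way to adjust what the bot is allowed to do when invited.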
/
├── server.ts # Main Bun server
├── drizzle.config.ts # Drizzle configuration
├── drizzle/ # Generated migrations
├── src/
│ ├── lib/
│ │ ├── ai/ # AI chat, tools, prompts
│ │ ├── db/
│ │ │ ├── schema.ts # Drizzle schema
│ │ │ ├── client.ts # Drizzle client
│ │ │ └── migrate.ts # Migration runner
│ │ ├── discord/ # Discord bot client
│ │ ├── embeddings/ # Vector embeddings
│ │ ├── github/
│ │ │ ├── api.ts # GitHub API for file operations
│ │ │ └── client.ts # GitHub API (Octokit) for issues/PRs
│ │ └── message-history/ # Chat history
├── scripts/ # Utility scripts
└── docker-compose.yml # Multi-container setup
docker-compose up -d
- Push to Git repository
- Create new app in Dokploy
- Select Docker Compose deployment
- Set environment variables
- Deploy
Why pgvecto.rs (VectorChord):
- ✅ High-dimensional vectors - Supports the full 3072 dimensions (pgvector's indexes are limited to 2,000 dimensions)
- ✅ Better performance - Rust-based, optimized for large-scale vector operations
- ✅ HNSW indexing - Fast approximate nearest neighbor search
- ✅ PostgreSQL native - Drop-in replacement for pgvector
- ✅ Better for AI - Perfect for text-embedding-3-large (3072 dims)
Why Drizzle ORM:
- ✅ TypeScript-first - Full type safety
- ✅ SQL-like API - If you know SQL, you know Drizzle
- ✅ Zero dependencies - Lightweight and fast
- ✅ Serverless-ready - Perfect for edge deployments
- ✅ Drizzle Studio - Visual database management
- ✅ Auto-migrations - Generate migrations from schema
Performance:
- ⚡ ~3x faster server startup vs Node.js
- ⚡ ~2x faster HTTP requests
- ⚡ Native TypeScript support (no transpilation)
- ⚡ Type-safe queries with Drizzle
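To make the nearest-neighbor search mentioned in the HNSW bullet above concrete, here is an exact, brute-force version of the ranking that an HNSW index approximates. It is an in-memory illustration only: the database performs this inside PostgreSQL, and the `StoredChunk` type and `topK` function are hypothetical names. Cosine similarity is assumed as the metric.

```typescript
// Hypothetical in-memory stand-in for a row of the embeddings table.
interface StoredChunk {
  filePath: string;
  vector: number[];
}

// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new RangeError("vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors have no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact search: score every chunk, sort descending, keep the best k.
function topK(query: number[], chunks: StoredChunk[], k: number): { filePath: string; score: number }[] {
  return chunks
    .map((chunk) => ({ filePath: chunk.filePath, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Tiny 2-dimensional example; real vectors have 3072 dimensions.
const sampleChunks: StoredChunk[] = [
  { filePath: "auth.ts", vector: [1, 0] },
  { filePath: "player.ts", vector: [0, 1] },
  { filePath: "session.ts", vector: [1, 1] },
];
const nearest = topK([1, 0], sampleChunks, 2);
```

The exact scan costs one comparison per stored chunk. An approximate index such as HNSW trades a small amount of recall for far fewer comparisons on large tables.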
To change the system prompt, edit src/lib/ai/prompt.ts
To adjust chunking, edit src/lib/embeddings/chunker.ts
Current settings:
- Model: text-embedding-3-large (3072 dimensions)
- Chunk size: 24,000 characters (~6000 tokens)
- Overlap: 1,200 characters (~300 tokens)
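The chunk size and overlap interact as a sliding window: each chunk starts 22,800 characters (24,000 minus 1,200) after the previous one. The function below is a minimal character-based sketch of that idea using the settings listed above. It is not the contents of `chunker.ts`, which may split on other boundaries; `chunkText` is a hypothetical name.

```typescript
// Sliding-window chunking by character count (sketch; defaults from the settings above).
function chunkText(text: string, chunkSize: number = 24000, overlap: number = 1200): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // distance between consecutive chunk starts
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// A 50,000-character input yields chunks starting at 0, 22,800 and 45,600.
const sampleText = "x".repeat(50000);
const sampleChunkLengths = chunkText(sampleText).map((chunk) => chunk.length);
```

For the sample input, `sampleChunkLengths` is `[24000, 24000, 4400]`, and the last 1,200 characters of each chunk repeat at the start of the next one.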
To change the database schema, edit src/lib/db/schema.ts, then run bun run db:generate
- Enable MESSAGE CONTENT INTENT in Discord Dev Portal
- Verify DISCORD_TOKEN is correct
# Check PostgreSQL
docker ps | grep postgres
# Open Drizzle Studio
bun run db:studio
bun run db:generate
bun run db:migrate
- Quick Start Guide - Detailed setup instructions
- Drizzle ORM Guide - Database operations and patterns
License: MIT