DAOcord Knowledge & Research Bots

DAOcord, a fork of LLMCord adapted for DeSci DAOs, combines two cooperating Discord automations that share an LLM-first workflow: a knowledge bot that answers questions from your organization's documentation, and a report bot that monitors external sources to produce research updates. This README explains how the repository is structured, how to configure and run each bot, and which follow-up tasks remain before production deployment.

Quick Start for New Maintainers

If you're picking up this codebase:

  1. Read the Overview (below) to understand the two-bot architecture.
  2. Check Prerequisites and install dependencies via requirements.txt.
  3. Review existing configs (config.yaml, config_report.yaml) to see what environment variables are expected.
  4. Set up Discord bots at discord.com/developers/applications — you need two separate applications (one for knowledge, one for reports), each with MESSAGE CONTENT INTENT enabled.
  5. Populate environment variables for your deployment platform (Railway, local .env, etc.) — see Configuration sections below.
  6. Test locally before deploying: run python main.py and python main_report.py run --all to verify connectivity.
  7. Consult supporting docs: README_BOTS.md has detailed setup steps, ASSESSMENT.md documents known Reddit config issues, REPORTING_GUIDE.md covers report workflows.

Overview

  • main.py – Knowledge bot that serves DAO documentation to Discord users. Primary configuration: config.yaml. Highlights: document chunking via tools/dao_docs, /model switching, optional Google Docs sync, relevance gating.
  • main_report.py – Research report bot that monitors X/Twitter & Reddit and posts curated updates. Primary configuration: config_report.yaml. Highlights: cron-style scheduling, approval workflow, Reddit & X data collectors, LLM-powered summaries.

Repository layout

  • main.py – Discord bot answering questions with DAO docs.
  • main_report.py – Discord bot generating scheduled research reports.
  • config.yaml / config_report.yaml – Runtime configuration for the two bots.
  • tools/ – Support utilities (DAO doc ingestion, Reddit/X monitors, report generation helpers).
  • diagnose_*.py – Utilities for debugging report pipelines.
  • docs/ – Knowledge base consumed by the main bot (ignored in git; populated locally).
  • data/, logs/, reports/ – Persistent state for monitors and generated output.

Prerequisites

  • Python 3.10+ (3.11 recommended).
  • Discord applications for each bot with the MESSAGE CONTENT INTENT enabled.
  • API keys for whichever LLM providers you intend to use (OpenAI, Anthropic, Google, etc.).
  • Reddit and X/Twitter API credentials for the report bot if those sources are enabled.

Installation

  1. Clone the project and create a virtual environment (recommended).
  2. Install dependencies:
    python -m pip install -r requirements.txt
  3. Copy config-example.yaml to config.yaml and tailor it, or start from the provided config.yaml/config_report.yaml templates and fill in environment variables.

Configuration

Both bots expand environment variables referenced in their YAML config files, so secrets stay outside of version control.
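
For reference, a minimal sketch of that pattern — not the exact loader used in main.py or main_report.py, which may differ — is to expand $VAR / ${VAR} placeholders in the raw YAML text before parsing it:

    import os
    import yaml  # PyYAML

    def load_config(path: str) -> dict:
        """Read a YAML config and expand $VAR / ${VAR} references from the environment."""
        with open(path, "r", encoding="utf-8") as fh:
            raw = fh.read()
        # os.path.expandvars leaves unknown variables untouched (as literal "$VAR"),
        # which is why the troubleshooting steps below check for leftover "$" strings.
        return yaml.safe_load(os.path.expandvars(raw))

    config = load_config("config.yaml")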

Knowledge bot (main.py)

  • Core settings live in config.yaml, which is loaded with environment variable expansion.@config.yaml#1-128
  • DAO documentation is sourced from the docs/ directory. If the folder is empty, tools/dao_docs.py can clone the remote repository defined by REPO_URL and chunk markdown into searchable pieces.@tools/dao_docs.py#7-144
  • Supports hybrid corpora: GitHub-hosted markdown (via the default repo clone) and optional Google Docs exports synchronized through GDocsCache (JSON service-account credentials referenced in config.yaml). Google sync has regressed recently and may require debugging before production use; the GitHub flow is the current, primarily tested path.@tools/gdocs_cache.py#1-192 @main.py#261-295
  • Railway deployment notes: configs lean on environment-variable expansion so you can inject secrets via the Railway dashboard. For Google credentials, either supply individual GOOGLE_* vars or mount the JSON blob and point google_application_credentials accordingly.@config.yaml#19-39
  • Important environment variables:
    • KNOWLEDGE_BOT_DISCORD_TOKEN, KNOWLEDGE_BOT_CLIENT_ID, KNOWLEDGE_BOT_STATUS_MESSAGE – Discord authentication and status text.
    • SYSTEM_PROMPT_DAOCORD or SYSTEM_PROMPT_DAOCORD_B64 – Required persona for the bot; no fallback is bundled, so one must be provided.@main.py#156-199
    • LLM_PRIMARY_PROVIDER, LLM_PRIMARY_MODEL – Selects the default provider/model pair used by the bot.@config.yaml#88-127
    • Provider API keys (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, OPENROUTER_API_KEY) – Enables each configured LLM endpoint.@config.yaml#56-86
    • Optional Google service account fields (e.g. USE_GOOGLE_DOCS, GOOGLE_*) – Enable Google Docs synchronization via tools/gdocs_cache.py when set.@config.yaml#19-39
  • Adjust min_relevance_score in config.yaml to control how strict document lookup should be for DAO answers.@main.py#489-510
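
Because no fallback prompt ships with the repo, the bot needs SYSTEM_PROMPT_DAOCORD (plain text) or SYSTEM_PROMPT_DAOCORD_B64 (base64-encoded, which is convenient when a dashboard mangles multi-line values). A hedged sketch of how such a loader typically resolves the pair — the actual logic in main.py may differ:

    import base64
    import os

    def resolve_system_prompt() -> str:
        # Prefer the plain-text variable; fall back to the base64-encoded one.
        prompt = os.getenv("SYSTEM_PROMPT_DAOCORD")
        if not prompt:
            encoded = os.getenv("SYSTEM_PROMPT_DAOCORD_B64")
            if encoded:
                prompt = base64.b64decode(encoded).decode("utf-8")
        if not prompt:
            raise RuntimeError("Set SYSTEM_PROMPT_DAOCORD or SYSTEM_PROMPT_DAOCORD_B64")
        return prompt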

Report bot (main_report.py)

  • Operates from config_report.yaml, which also resolves $VARNAME placeholders via environment variables.@main_report.py#41-150
  • Data source coverage: Reddit monitoring is the most exercised path today; X/Twitter ingestion follows the same interface but remains largely untested, so sandbox it before turning on production posting.
  • Key environment variables:
    • REPORT_BOT_DISCORD_TOKEN, REPORT_BOT_CLIENT_ID, REPORT_BOT_STATUS_MESSAGE – Discord credentials for the report bot.
    • REPORT_CHANNEL_IDS – Comma-separated channel IDs where reports will be posted.@config_report.yaml#15-44
    • REPORT_CONTENT_INTENTION, REPORTGEN_SYSTEM_PROMPT – Direct the LLM summarization pipeline.@config_report.yaml#20-21
    • REPORT_BOT_APPROVERS_ROLE_ID – Grants the approval workflow access to a specific Discord role.@config_report.yaml#26-37
    • REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, REDDIT_REPORTGEN_SUBREDDITS, REDDIT_REPORTGEN_KEYWORDS, REDDIT_REPORTGEN_EXCLUDE_KEYWORDS, REDDIT_REPORTS_ENABLED – Reddit API authentication and monitoring scope.@config_report.yaml#78-107
    • X_BEARER_TOKEN, X_REPORTGEN_KEYWORDS, X_REPORTGEN_USERS, X_REPORTGEN_EXCLUDE_KEYWORDS, TWITTER_REPORTS_ENABLED – Enables and configures X/Twitter monitoring.@config_report.yaml#52-75
    • LLM provider keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, etc.) – Used for report summarization.@config_report.yaml#107-143
  • The approval workflow stores pending reports in data/quarantine and approved ones in data/approved. Ensure those directories exist and are writable.@config_report.yaml#26-45
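
A minimal sketch of keeping those directories ready and promoting a report on approval — illustrative only, since the bot's own file handling lives in main_report.py and may differ:

    from pathlib import Path
    import shutil

    QUARANTINE = Path("data/quarantine")
    APPROVED = Path("data/approved")

    def ensure_report_dirs() -> None:
        # Create the approval-workflow directories on a fresh deploy.
        for directory in (QUARANTINE, APPROVED):
            directory.mkdir(parents=True, exist_ok=True)

    def approve_report(filename: str) -> Path:
        # Move a pending report out of quarantine once an approver signs off.
        ensure_report_dirs()
        destination = APPROVED / filename
        shutil.move(str(QUARANTINE / filename), str(destination))
        return destination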

Running the bots

Knowledge bot (main.py)

  1. Ensure the DAO documentation exists in docs/ (or allow tools/dao_docs to clone it on first run).
  2. Verify the system prompt and provider settings are set via environment variables.
  3. Launch the bot:
    python main.py
  4. Mention the bot in Discord or DM it. It will:
    • Retrieve relevant markdown snippets with RapidFuzz scoring before issuing an LLM request (a sketch follows this list).@main.py#469-539
    • Reply with LLM output plus cited file sources.@main.py#555-561
    • Support /model for swapping among configured models.@main.py#297-324
  5. Optional: enable the Google Docs cache to sync additional knowledge sources on startup (expect to verify the integration when changing credentials or deploying to new environments).@main.py#261-295
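
The retrieval step in item 4 can be pictured roughly like this — a sketch using RapidFuzz, not the exact code in main.py or tools/dao_docs.py, whose chunking and scoring details differ:

    from rapidfuzz import fuzz

    MIN_RELEVANCE_SCORE = 60  # mirrors min_relevance_score in config.yaml

    def search_chunks(question: str, chunks: list[dict]) -> list[dict]:
        """Score pre-chunked markdown against the question and keep the best matches."""
        scored = []
        for chunk in chunks:  # each chunk: {"source": "docs/file.md", "text": "..."}
            score = fuzz.token_set_ratio(question, chunk["text"])
            if score >= MIN_RELEVANCE_SCORE:
                scored.append({**chunk, "score": score})
        # The highest-scoring chunks are passed to the LLM and cited in the reply.
        return sorted(scored, key=lambda c: c["score"], reverse=True)[:5]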

Report bot (main_report.py)

  1. Populate the Reddit and/or X/Twitter environment variables noted above.
  2. Confirm the approval role and target channels exist in Discord.
  3. Run the scheduled worker:
    python main_report.py run --all
    • The command loads every configured monitor, fetches new events, stores bundles under data/reddit/events / data/x/events, and posts summaries when approvals permit.@main_report.py#151-571
  4. Use python main_report.py preview --subreddits longevity to inspect recent Reddit posts without generating bundles.@tools/reddit_monitor.py#398-506
  5. Schedule recurring runs according to report_interval_cron; Railway users typically pair python main_report.py run --all with a process supervisor (e.g., PM2) or Railway’s cron add-on (a scheduling sketch follows this list).
  6. Trigger ad-hoc collection with the built-in slash commands: /run_reddit_now, /run_x_now, or /run_all_now. Access is limited to approvers or user IDs listed under approval.manual_run_allowlist in config_report.yaml.@main_report.py#759-913
  7. The bot records run history in data/last_report_run.json so you can audit schedules across restarts.@main_report.py#151-269
  8. Deployments often run both bots in separate processes or via a supervisor (PM2, systemd, etc.).
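
If you prefer an in-process scheduler over an external cron for item 5, a sketch using the croniter package is below. This is an assumption for illustration — croniter is not necessarily a dependency of this repo, and the cron expression shown is a placeholder for whatever report_interval_cron resolves to:

    import subprocess
    import time
    from datetime import datetime, timezone

    from croniter import croniter

    CRON_EXPRESSION = "0 */6 * * *"  # stand-in for report_interval_cron

    def run_forever() -> None:
        # Compute each next fire time from the cron expression, sleep until then,
        # and invoke the report bot's collection run.
        schedule = croniter(CRON_EXPRESSION, datetime.now(timezone.utc))
        while True:
            next_run = schedule.get_next(datetime)
            time.sleep(max(0.0, (next_run - datetime.now(timezone.utc)).total_seconds()))
            subprocess.run(["python", "main_report.py", "run", "--all"], check=False)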

Supporting tools

  • tools/dao_docs.py – Manages cloning/pulling the external DAO documentation repository, chunking markdown for search, and serving retrieval requests.@tools/dao_docs.py#7-145
  • tools/reddit_monitor.py – Provides CLI helpers for running and testing Reddit monitors outside the bot context (supports run and preview modes, caching, filtering, and JSON exports). Tune subreddit lists, keyword filters, and exclusion rules iteratively to avoid empty datasets or high-noise pulls.@tools/reddit_monitor.py#151-569
  • tools/event_report.py, tools/x_monitor.py, tools/reddit_search.py – Shared logic used by the report bot to build event bundles and request LLM summaries.
  • diagnose_reddit.py, diagnose_report_generation.py – Stand-alone diagnostics for checking API credentials and summarization pipelines before enabling automation.

Data directories & logs

  • data/ – Cached API responses, event bundles, and approval state (e.g. data/reddit/events, data/x/events, data/quarantine).
  • reports/ – JSON artifacts emitted by monitoring runs when not posted directly to Discord.@tools/reddit_monitor.py#547-566
  • logs/ – Runtime logs for both bots (e.g. logs/report_bot.log).@main_report.py#75-101
  • docs/ – Markdown corpus for the knowledge bot (ignored by git).

Troubleshooting checklist

  • Knowledge bot won’t start: verify SYSTEM_PROMPT_DAOCORD (or SYSTEM_PROMPT_DAOCORD_B64) is set; the loader raises if no prompt is found.@main.py#156-199
  • Responses lack citations: ensure docs/ contains markdown files; empty corpora will yield “No relevant documentation found.” replies.@main.py#469-487
  • Report bot posts nothing: confirm Reddit and X environment variables resolve to lists (no literal $VAR strings) and that REDDIT_REPORTS_ENABLED/TWITTER_REPORTS_ENABLED are truthy; if results remain empty, relax filters or expand keyword/user lists incrementally.@config_report.yaml#52-107
  • Approval queue stalls: check that the approver role ID is set and that users have permissions to react with the configured emojis.@config_report.yaml#26-45

Architecture & Code Organization

Core files

  • main.py (632 lines) – Knowledge bot entry point. Loads config.yaml, initializes DAODocsTool and optional GDocsCache, registers Discord event handlers, and serves LLM-powered answers with document citations.
  • main_report.py (1388 lines) – Report bot entry point. Loads config_report.yaml, runs self-tests for Reddit/X APIs, schedules cron-based report generation, registers slash commands for manual triggers, and manages approval workflows.
  • llm_config.py (13473 chars) – Unified LLM provider abstraction. Supports OpenAI-compatible APIs (OpenAI, Anthropic, Google, Groq, Mistral, OpenRouter, xAI, local Ollama/LMStudio/vLLM). Handles provider selection, model switching, and fallback logic (an illustrative sketch of this pattern follows the list).
  • config.yaml – Knowledge bot configuration (Discord tokens, LLM providers, system prompt, permissions, Google Docs settings).
  • config_report.yaml – Report bot configuration (Discord tokens, cron schedule, approval workflow, Reddit/X credentials, LLM providers for summarization).
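
The provider pattern behind llm_config.py can be illustrated with the openai client pointed at different OpenAI-compatible base URLs. This is a sketch under the assumption that the chosen providers expose such endpoints; the real abstraction, including fallback handling, lives in llm_config.py:

    import os
    from openai import OpenAI

    # Illustrative OpenAI-compatible endpoints; extend per provider as needed.
    PROVIDERS = {
        "openai": {"base_url": "https://api.openai.com/v1", "key_env": "OPENAI_API_KEY"},
        "openrouter": {"base_url": "https://openrouter.ai/api/v1", "key_env": "OPENROUTER_API_KEY"},
        "ollama": {"base_url": "http://localhost:11434/v1", "key_env": "OLLAMA_API_KEY"},
    }

    def get_client(provider: str) -> OpenAI:
        cfg = PROVIDERS[provider]
        return OpenAI(base_url=cfg["base_url"], api_key=os.getenv(cfg["key_env"], "not-needed"))

    client = get_client(os.getenv("LLM_PRIMARY_PROVIDER", "openai"))
    reply = client.chat.completions.create(
        model=os.getenv("LLM_PRIMARY_MODEL", "gpt-4o-mini"),
        messages=[{"role": "user", "content": "ping"}],
    )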

Tools directory

  • tools/dao_docs.py – Clones/pulls a GitHub repo into docs/, chunks markdown files (1500 words per chunk), and provides fuzzy search via RapidFuzz. Used by knowledge bot for document retrieval.
  • tools/gdocs_cache.py – Google Drive integration for syncing Google Docs as markdown exports. Requires service account credentials. Currently regressed and needs debugging.
  • tools/reddit_monitor.py – CLI and library for Reddit monitoring. Supports run (fetch + bundle events) and preview (inspect recent posts) modes. Configurable filters, caching, and JSON export.
  • tools/reddit_search.py – Low-level Reddit API wrapper using praw. Handles subreddit scanning, keyword searches, and caching (see the praw sketch after this list).
  • tools/x_monitor.py – X/Twitter monitoring (largely untested). Parallel structure to Reddit monitor.
  • tools/x_search.py – Low-level X API v2 wrapper. Requires bearer token.
  • tools/event_report.py – Bundles events from Reddit/X monitors and generates LLM summaries for Discord posting.
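
The praw layer underneath tools/reddit_search.py can be exercised on its own to sanity-check credentials. A minimal sketch — the credential names match the env vars above, but the wrapper's actual interface may differ:

    import os
    import praw

    reddit = praw.Reddit(
        client_id=os.environ["REDDIT_CLIENT_ID"],
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],
        user_agent="daocord-report-bot/0.1",  # any descriptive user agent
    )

    # Pull recent posts matching a keyword from one monitored subreddit.
    for submission in reddit.subreddit("longevity").search("clinical trial", sort="new", limit=10):
        print(submission.created_utc, submission.title, submission.url)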

Diagnostic scripts

  • diagnose_reddit.py – Tests Reddit API credentials and searches. Run before enabling Reddit monitoring.
  • diagnose_report_generation.py – End-to-end test of report generation pipeline (fetch events, summarize, format for Discord).

Data persistence

  • data/reddit/events/ – JSON bundles of Reddit posts + comments + metadata.
  • data/x/events/ – JSON bundles of X/Twitter posts + threads.
  • data/quarantine/ – Reports pending approval.
  • data/approved/ – Approved reports (can be ingested by knowledge bot if desired).
  • data/last_report_run.json – Tracks last successful run time and next scheduled run.
  • logs/report_bot.log – Report bot runtime logs.
  • reports/ – JSON artifacts from CLI monitor runs.
  • docs/ – Markdown corpus for knowledge bot (gitignored; populated via tools/dao_docs.py or manual copy).

Outstanding work

Knowledge bot (main.py)

  • System prompt – Provide a production-ready system prompt via SYSTEM_PROMPT_DAOCORD (mandatory on startup). No fallback is bundled.
  • Documentation source – Populate or point docs/ to the DAO knowledge source the organization wants to expose, and review REPO_URL in tools/dao_docs.py if a different repository should be cloned.
  • Relevance tuning – Tune min_relevance_score in config.yaml once you observe real queries to balance coverage and hallucination risk.@main.py#489-510
  • Model selection – Review LLM model lists in config.yaml and set primary_provider/primary_model appropriately.@config.yaml#88-127
  • Google Docs sync – If enabling Google Docs integration, debug tools/gdocs_cache.py and verify service account credentials are correctly mounted.

Report bot (main_report.py)

  • API credentials – Fill Reddit and X/Twitter environment variables (REDDIT_*, X_*) so monitors can authenticate; current defaults assume they remain to be provided.@config_report.yaml#52-107
  • Twitter validation – Decide when to flip twitter_reports_enabled to true after validating Twitter credentials. X/Twitter monitoring is largely untested.@config_report.yaml#52-75
  • Approval workflow – Finalize the approval workflow by creating the approver role in Discord and verifying the data/quarantine / data/approved directories are monitored as part of operations.@config_report.yaml#26-45
  • Scheduling – Schedule the bot (cron, PM2, or other supervisor) so report_interval_cron actually triggers regular runs; calibrate the cron frequency against API quotas because each run re-collects from scratch.@config_report.yaml#15-45 @main_report.py#151-571
  • Filter tuning – Iterate on monitor definitions (subreddits, keyword allow/exclude lists, tracked accounts) to balance recall vs noise. Reddit in particular can oscillate between zero hits and low-signal floods without tuning.@tools/reddit_monitor.py#151-569
  • Testing – Run python diagnose_reddit.py and python diagnose_report_generation.py to validate the pipeline before production deployment.
  • Prompt hygiene – REPORTGEN_SYSTEM_PROMPT sets the base formatting + tone, REPORT_CONTENT_INTENTION appends topical focus instructions, and the custom_instructions block in config_report.yaml is currently unused. Merge important guidance into the env-driven strings until code explicitly consumes custom_instructions.
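
Until custom_instructions is wired up, the effective summarization prompt can be thought of as a simple concatenation of the two env-driven strings. An illustrative sketch, not the exact assembly in the report pipeline:

    import os

    def build_reportgen_prompt() -> str:
        base = os.getenv("REPORTGEN_SYSTEM_PROMPT", "")
        focus = os.getenv("REPORT_CONTENT_INTENTION", "")
        # Base formatting/tone first, topical focus appended, as described above.
        return "\n\n".join(part for part in (base, focus) if part)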

Common Pitfalls & Debugging

Knowledge bot issues

  • Bot won't start: Verify SYSTEM_PROMPT_DAOCORD (or SYSTEM_PROMPT_DAOCORD_B64) is set; the loader raises if no prompt is found.@main.py#156-199
  • No responses: Ensure docs/ contains markdown files; empty corpora will yield "No relevant documentation found." replies.@main.py#469-487
  • Google Docs sync fails: Check service account credentials and folder permissions. This integration is currently regressed and may need debugging.

Report bot issues

  • No reports generated: Confirm Reddit and X environment variables resolve to lists (no literal $VAR strings) and that REDDIT_REPORTS_ENABLED/TWITTER_REPORTS_ENABLED are true.@config_report.yaml#52-107
  • Empty results: If results remain empty, relax filters or expand keyword/user lists incrementally. Check data/reddit/events/ and data/x/events/ for event bundles.
  • Approval queue stalls: Check that the approver role ID is set and that users have permissions to react with the configured emojis.@config_report.yaml#26-45
  • Cron not triggering: Verify report_interval_cron syntax and ensure the bot process stays running (use PM2, systemd, or Railway's process manager; a lightweight supervisor may be worth writing).

Environment variable debugging

  • Railway: Set variables in the Railway dashboard. Multi-line values (like system prompts) are supported.
  • Local: Use a .env file (not tracked in git) or export variables in your shell.
  • Validation: Run python -c "import os; print(os.getenv('REDDIT_CLIENT_ID'))" to verify variables are accessible.
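
A slightly more thorough check than the one-liner above — verify that the variables a bot needs are set and that none still contain unexpanded $VAR text. The variable list here is illustrative; adjust it to your deployment:

    import os

    REQUIRED = [
        "REPORT_BOT_DISCORD_TOKEN",
        "REDDIT_CLIENT_ID",
        "REDDIT_CLIENT_SECRET",
        "REPORT_CHANNEL_IDS",
    ]

    def check_env() -> None:
        missing = [name for name in REQUIRED if not os.getenv(name)]
        stale = [name for name in REQUIRED if "$" in (os.getenv(name) or "")]
        if missing:
            print("Missing:", ", ".join(missing))
        if stale:
            print("Contains a literal '$' (unexpanded?):", ", ".join(stale))
        if not missing and not stale:
            print("Environment looks sane.")

    if __name__ == "__main__":
        check_env()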

Logs and diagnostics

  • Knowledge bot: Logs to stdout/stderr. Set debug_prompt: true in config.yaml to see full prompts.
  • Report bot: Logs to logs/report_bot.log. Check data/last_report_run.json for scheduling info.
  • Reddit API: Check data/logs/reddit_calls.jsonl for API call history.
  • X API: Check data/logs/x_calls.jsonl for API call history.

Deployment Notes

Railway-specific

  • Both bots are designed for Railway deployment with environment-variable expansion.
  • For Google credentials, either supply individual GOOGLE_* vars or mount the JSON blob and point google_application_credentials accordingly.@config.yaml#19-39
  • Use Railway's cron add-on or a process supervisor (PM2) to schedule python main_report.py run --all.
  • Ensure persistent volumes are configured for data/, logs/, and reports/ if you want to preserve state across deploys.

Local development

  • Run both bots in separate terminals: python main.py and python main_report.py run --all.
  • Use a .env file or export variables in your shell.
  • Test with python main_report.py preview --subreddits longevity to inspect Reddit data without generating reports.

Production checklist

  1. Set all required environment variables (see Configuration sections).
  2. Create two Discord applications with MESSAGE CONTENT INTENT enabled.
  3. Populate docs/ with your organization's markdown documentation.
  4. Test locally before deploying.
  5. Configure approval role and channels for report bot.
  6. Schedule report bot runs (cron, PM2, or Railway cron add-on).
  7. Monitor logs and data/last_report_run.json for health checks.
  8. Tune filters and relevance scores based on real usage.

Additional Resources

  • README_BOTS.md – Detailed setup guide with environment variable examples, Discord bot creation steps, and troubleshooting.
  • ASSESSMENT.md – Documents known Reddit configuration issues and debugging steps.
  • REPORTING_GUIDE.md – Covers report generation workflows and approval process.
  • BUG_FIX_REPORT.md, FINAL_FIX_SUMMARY.md – Historical bug fixes and resolution notes.
  • config-example.yaml – Example configuration template for knowledge bot.
