DAOcord combines two cooperating Discord automations that share an LLM-first workflow: a knowledge bot that answers questions from your organization's documentation, and a report bot that monitors external sources to produce research updates. This README explains how the repository is structured, how to configure and run each bot, and which follow-up tasks remain before production deployment.
If you're picking up this codebase:
- Read the Overview (below) to understand the two-bot architecture.
- Check Prerequisites and install dependencies via `requirements.txt`.
- Review existing configs (`config.yaml`, `config_report.yaml`) to see what environment variables are expected.
- Set up Discord bots at discord.com/developers/applications. You need two separate applications (one for knowledge, one for reports), each with MESSAGE CONTENT INTENT enabled.
- Populate environment variables for your deployment platform (Railway, local `.env`, etc.); see the Configuration sections below.
- Test locally before deploying: run `python main.py` and `python main_report.py run --all` to verify connectivity.
- Consult supporting docs: `README_BOTS.md` has detailed setup steps, `ASSESSMENT.md` documents known Reddit config issues, and `REPORTING_GUIDE.md` covers report workflows.
| Script | Role | Primary configuration | Highlights |
|---|---|---|---|
| `main.py` | Knowledge bot that serves DAO documentation to Discord users | `config.yaml` | Document chunking via `tools/dao_docs`, `/model` switching, optional Google Docs sync, relevance gating |
| `main_report.py` | Research report bot that monitors X/Twitter & Reddit and posts curated updates | `config_report.yaml` | Cron-style scheduling, approval workflow, Reddit & X data collectors, LLM-powered summaries |
- `main.py` – Discord bot answering questions with DAO docs.
- `main_report.py` – Discord bot generating scheduled research reports.
- `config.yaml` / `config_report.yaml` – Runtime configuration for the two bots.
- `tools/` – Support utilities (DAO doc ingestion, Reddit/X monitors, report generation helpers).
- `diagnose_*.py` – Utilities for debugging report pipelines.
- `docs/` – Knowledge base consumed by the main bot (ignored in git; populated locally).
- `data/`, `logs/`, `reports/` – Persistent state for monitors and generated output.
- Python 3.10+ (3.11 recommended).
- Discord applications for each bot with “MESSAGE CONTENT INTENT”.
- API keys for whichever LLM providers you intend to use (OpenAI, Anthropic, Google, etc.).
- Reddit and X/Twitter API credentials for the report bot if those sources are enabled.
- Clone the project and create a virtual environment (recommended).
- Install dependencies: `python -m pip install -r requirements.txt`
- Copy `config-example.yaml` to `config.yaml` and tailor it, or start from the provided `config.yaml` / `config_report.yaml` templates and fill in environment variables.
Both bots expand environment variables referenced in their YAML config files, so secrets stay outside of version control.
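As a rough illustration of that expansion step, here is a minimal sketch that walks an already-parsed config tree and substitutes placeholders with the standard library's `os.path.expandvars`. The helper names `expand_env` and `unresolved` and the config keys in the usage note are hypothetical; the bots' actual loaders may work differently.

```python
import os
import re
from typing import Any

# Matches $NAME or ${NAME}; used only to detect placeholders that survived expansion.
_PLACEHOLDER = re.compile(r"\$\{?[A-Za-z_][A-Za-z0-9_]*\}?")


def expand_env(value: Any) -> Any:
    """Recursively expand $VAR / ${VAR} placeholders in a parsed config tree."""
    if isinstance(value, str):
        # os.path.expandvars leaves unknown variables untouched, so an unset
        # variable stays in the config as the literal "$VAR" string.
        return os.path.expandvars(value)
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item) for item in value]
    return value


def unresolved(value: Any, path: str = "") -> list[str]:
    """Return dotted paths of string values that still contain a placeholder."""
    if isinstance(value, str):
        return [path] if _PLACEHOLDER.search(value) else []
    found: list[str] = []
    if isinstance(value, dict):
        for key, item in value.items():
            child = f"{path}.{key}" if path else str(key)
            found.extend(unresolved(item, child))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            found.extend(unresolved(item, f"{path}[{index}]"))
    return found
```

Calling `unresolved(expand_env(config))` on a dictionary such as `{"bot_token": "$KNOWLEDGE_BOT_DISCORD_TOKEN"}` (an illustrative key, not necessarily the real one) lists every setting whose variable was never exported, which is a quick way to catch literal `$VAR` strings before a bot starts.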
- Core settings live in `config.yaml`, which is loaded with environment variable expansion.@config.yaml#1-128
- DAO documentation is sourced from the `docs/` directory. If the folder is empty, `tools/dao_docs.py` can clone the remote repository defined by `REPO_URL` and chunk markdown into searchable pieces.@tools/dao_docs.py#7-144
- Supports hybrid corpora: GitHub-hosted markdown (via the default repo clone) and optional Google Docs exports synchronized through `GDocsCache` (JSON service-account credentials referenced in `config.yaml`). Google sync has regressed recently and may require debugging before production use; the GitHub flow is the current, primarily tested path.@tools/gdocs_cache.py#1-192 @main.py#261-295
- Railway deployment notes: configs lean on environment-variable expansion so you can inject secrets via the Railway dashboard. For Google credentials, either supply individual `GOOGLE_*` vars or mount the JSON blob and point `google_application_credentials` accordingly.@config.yaml#19-39
- Important environment variables:
| Variable | Purpose |
|---|---|
| `KNOWLEDGE_BOT_DISCORD_TOKEN`, `KNOWLEDGE_BOT_CLIENT_ID`, `KNOWLEDGE_BOT_STATUS_MESSAGE` | Discord authentication and status text |
| `SYSTEM_PROMPT_DAOCORD` or `SYSTEM_PROMPT_DAOCORD_B64` | Required persona for the bot; no fallback is bundled, so one must be provided.@main.py#156-199 |
| `LLM_PRIMARY_PROVIDER`, `LLM_PRIMARY_MODEL` | Selects the default provider/model pair used by the bot.@config.yaml#88-127 |
| Provider API keys (e.g. `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, `OPENROUTER_API_KEY`) | Enables each configured LLM endpoint.@config.yaml#56-86 |
| Optional Google service account fields (e.g. `USE_GOOGLE_DOCS`, `GOOGLE_*`) | Enable Google Docs synchronization via `tools/gdocs_cache.py` when set.@config.yaml#19-39 |
- Adjust `min_relevance_score` in `config.yaml` to control how strict document lookup should be for DAO answers.@main.py#489-510
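To make the chunk-then-gate idea concrete, the sketch below splits text into fixed-size word chunks and drops any chunk that scores below a threshold. It deliberately uses the standard library's `difflib.SequenceMatcher` as a stand-in for RapidFuzz so it runs without extra dependencies. The function names, the 0 to 100 scale, and the default threshold are assumptions for illustration, not the bot's actual values.

```python
from difflib import SequenceMatcher


def chunk_words(text: str, words_per_chunk: int = 1500) -> list[str]:
    """Split text into consecutive chunks of at most `words_per_chunk` words."""
    words = text.split()
    return [
        " ".join(words[start:start + words_per_chunk])
        for start in range(0, len(words), words_per_chunk)
    ]


def score(query: str, chunk: str) -> float:
    """Similarity on a 0-100 scale (difflib stand-in, not RapidFuzz itself)."""
    return 100.0 * SequenceMatcher(None, query.lower(), chunk.lower()).ratio()


def retrieve(
    query: str,
    chunks: list[str],
    min_relevance_score: float = 60.0,  # assumed default, tune against real queries
    limit: int = 3,
) -> list[tuple[float, str]]:
    """Return up to `limit` (score, chunk) pairs at or above the threshold."""
    scored = [(score(query, chunk), chunk) for chunk in chunks]
    kept = [pair for pair in scored if pair[0] >= min_relevance_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:limit]
```

An empty result from `retrieve` corresponds to the case where the bot has nothing relevant to cite. Raising the threshold trades coverage for fewer weakly related snippets reaching the LLM.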
- Operates from `config_report.yaml`, which also resolves `$VARNAME` placeholders via environment variables.@main_report.py#41-150
- Data source coverage: Reddit monitoring is the most exercised path today; X/Twitter ingestion follows the same interface but remains largely untested, so sandbox it before turning on production posting.
- Key environment variables:
| Variable | Purpose |
|---|---|
| `REPORT_BOT_DISCORD_TOKEN`, `REPORT_BOT_CLIENT_ID`, `REPORT_BOT_STATUS_MESSAGE` | Discord credentials for the report bot |
| `REPORT_CHANNEL_IDS` | Comma-separated channel IDs where reports will be posted.@config_report.yaml#15-44 |
| `REPORT_CONTENT_INTENTION`, `REPORTGEN_SYSTEM_PROMPT` | Direct the LLM summarization pipeline.@config_report.yaml#20-21 |
| `REPORT_BOT_APPROVERS_ROLE_ID` | Grants the approval workflow access to a specific Discord role.@config_report.yaml#26-37 |
| `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, `REDDIT_REPORTGEN_SUBREDDITS`, `REDDIT_REPORTGEN_KEYWORDS`, `REDDIT_REPORTGEN_EXCLUDE_KEYWORDS`, `REDDIT_REPORTS_ENABLED` | Reddit API authentication and monitoring scope.@config_report.yaml#78-107 |
| `X_BEARER_TOKEN`, `X_REPORTGEN_KEYWORDS`, `X_REPORTGEN_USERS`, `X_REPORTGEN_EXCLUDE_KEYWORDS`, `TWITTER_REPORTS_ENABLED` | Enables and configures X/Twitter monitoring.@config_report.yaml#52-75 |
| LLM provider keys (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, etc.) | Used for report summarization.@config_report.yaml#107-143 |
- The approval workflow stores pending reports in `data/quarantine` and approved ones in `data/approved`. Ensure those directories exist and are writable.@config_report.yaml#26-45
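Two of the settings above lend themselves to a quick startup check: the comma-separated channel list and the approval directories. The following is a minimal sketch under the assumption that channel IDs are plain integers; `parse_channel_ids` and `ensure_writable_dirs` are hypothetical helper names, not functions from `main_report.py`.

```python
import os
from pathlib import Path


def parse_channel_ids(raw: str) -> list[int]:
    """Turn a comma-separated string such as '123, 456' into a list of ints."""
    # Empty segments (trailing commas, doubled commas) are skipped.
    return [int(part) for part in raw.split(",") if part.strip()]


def ensure_writable_dirs(paths) -> list[str]:
    """Create each directory if missing and return the ones that are not writable."""
    problems = []
    for raw in paths:
        directory = Path(raw)
        directory.mkdir(parents=True, exist_ok=True)
        if not os.access(directory, os.W_OK):
            problems.append(str(directory))
    return problems
```

For example, `ensure_writable_dirs(["data/quarantine", "data/approved"])` returns an empty list when both directories are usable, and `parse_channel_ids(os.environ.get("REPORT_CHANNEL_IDS", ""))` raises `ValueError` early if the variable holds something other than numeric IDs.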
- Ensure the DAO documentation exists in `docs/` (or allow `tools/dao_docs` to clone it on first run).
- Verify the system prompt and provider settings are set via environment variables.
- Launch the bot: `python main.py`
- Mention the bot in Discord or DM it. It will:
  - Retrieve relevant markdown snippets with RapidFuzz scoring before issuing an LLM request.@main.py#469-539
  - Reply with LLM output plus cited file sources.@main.py#555-561
  - Support `/model` for swapping among configured models.@main.py#297-324
- Optional: enable the Google Docs cache to sync additional knowledge sources on startup (expect to verify the integration when changing credentials or deploying to new environments).@main.py#261-295
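The checklist above asks you to verify the system prompt before launch. A minimal sketch of what such a loader could look like follows. The precedence (plain text first, then Base64) and the `RuntimeError` type are assumptions; only the two variable names and the fail-if-missing behavior come from this README.

```python
import base64
import os


def load_system_prompt(env=None) -> str:
    """Return the system prompt from the environment or raise if none is set."""
    env = os.environ if env is None else env

    # Assumed precedence: a plain-text prompt wins over the Base64 variant.
    plain = env.get("SYSTEM_PROMPT_DAOCORD", "").strip()
    if plain:
        return plain

    encoded = env.get("SYSTEM_PROMPT_DAOCORD_B64", "").strip()
    if encoded:
        return base64.b64decode(encoded).decode("utf-8")

    # No bundled fallback: refuse to start without a persona.
    raise RuntimeError(
        "Set SYSTEM_PROMPT_DAOCORD or SYSTEM_PROMPT_DAOCORD_B64 before starting the bot"
    )
```

The Base64 variant is convenient on platforms where multi-line values are awkward to paste; `base64.b64encode(prompt.encode("utf-8")).decode("ascii")` produces a value suitable for `SYSTEM_PROMPT_DAOCORD_B64`.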
- Populate the Reddit and/or X/Twitter environment variables noted above.
- Confirm the approval role and target channels exist in Discord.
- Run the scheduled worker: `python main_report.py run --all`
  - The command loads every configured monitor, fetches new events, stores bundles under `data/reddit/events` / `data/x/events`, and posts summaries when approvals permit.@main_report.py#151-571
- Use `python main_report.py preview --subreddits longevity` to inspect recent Reddit posts without generating bundles.@tools/reddit_monitor.py#398-506
- Schedule recurring runs by honoring `report_interval_cron`; Railway users typically pair this with `python main_report.py run --all` under a process supervisor (e.g., PM2) or leverage Railway’s cron add-on.
- Trigger ad-hoc collection with the built-in slash commands: `/run_reddit_now`, `/run_x_now`, or `/run_all_now`. Access is limited to approvers or user IDs listed under `approval.manual_run_allowlist` in `config_report.yaml`.@main_report.py#759-913
- The bot records run history in `data/last_report_run.json` so you can audit schedules across restarts.@main_report.py#151-269
- Deployments often run both bots in separate processes or via a supervisor (PM2, systemd, etc.).
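To illustrate how a run-history file lets a schedule survive restarts, here is a minimal sketch. It uses a fixed `timedelta` in place of a cron expression, because parsing cron syntax needs a third-party library. The JSON field names `last_run` and `next_run` are hypothetical; inspect the real `data/last_report_run.json` for the actual schema.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path


def record_run(path, interval: timedelta, now=None) -> dict:
    """Persist the time of this run and the next scheduled run."""
    now = now or datetime.now(timezone.utc)
    state = {
        "last_run": now.isoformat(),               # hypothetical field name
        "next_run": (now + interval).isoformat(),  # hypothetical field name
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so a crash never leaves a half-written state file.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)
    return state


def is_due(path, now=None) -> bool:
    """Return True when no history exists or the stored next run time has passed."""
    now = now or datetime.now(timezone.utc)
    path = Path(path)
    if not path.exists():
        return True
    state = json.loads(path.read_text(encoding="utf-8"))
    return now >= datetime.fromisoformat(state["next_run"])
```

After a restart, a worker built this way would call `is_due(...)` before collecting, so a redeploy in the middle of an interval does not trigger an extra run against the API quotas.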
- `tools/dao_docs.py` – Manages cloning/pulling the external DAO documentation repository, chunking markdown for search, and serving retrieval requests.@tools/dao_docs.py#7-145
- `tools/reddit_monitor.py` – Provides CLI helpers for running and testing Reddit monitors outside the bot context (supports `run` and `preview` modes, caching, filtering, and JSON exports). Tune subreddit lists, keyword filters, and exclusion rules iteratively to avoid empty datasets or high-noise pulls.@tools/reddit_monitor.py#151-569
- `tools/event_report.py`, `tools/x_monitor.py`, `tools/reddit_search.py` – Shared logic used by the report bot to build event bundles and request LLM summaries.
- `diagnose_reddit.py`, `diagnose_report_generation.py` – Stand-alone diagnostics for checking API credentials and summarization pipelines before enabling automation.
- `data/` – Cached API responses, event bundles, and approval state (e.g. `data/reddit/events`, `data/x/events`, `data/quarantine`).
- `reports/` – JSON artifacts emitted by monitoring runs when not posted directly to Discord.@tools/reddit_monitor.py#547-566
- `logs/` – Runtime logs for both bots (e.g. `logs/report_bot.log`).@main_report.py#75-101
- `docs/` – Markdown corpus for the knowledge bot (ignored by git).
- Knowledge bot won’t start: verify `SYSTEM_PROMPT_DAOCORD` (or `SYSTEM_PROMPT_DAOCORD_B64`) is set; the loader raises if no prompt is found.@main.py#156-199
- Responses lack citations: ensure `docs/` contains markdown files; empty corpora will yield “No relevant documentation found.” replies.@main.py#469-487
- Report bot posts nothing: confirm Reddit and X environment variables resolve to lists (no literal `$VAR` strings) and that `REDDIT_REPORTS_ENABLED` / `TWITTER_REPORTS_ENABLED` are truthy; if results remain empty, relax filters or expand keyword/user lists incrementally.@config_report.yaml#52-107
- Approval queue stalls: check that the approver role ID is set and that users have permissions to react with the configured emojis.@config_report.yaml#26-45
- `main.py` (632 lines) – Knowledge bot entry point. Loads `config.yaml`, initializes `DAODocsTool` and optional `GDocsCache`, registers Discord event handlers, and serves LLM-powered answers with document citations.
- `main_report.py` (1388 lines) – Report bot entry point. Loads `config_report.yaml`, runs self-tests for Reddit/X APIs, schedules cron-based report generation, registers slash commands for manual triggers, and manages approval workflows.
- `llm_config.py` (13473 chars) – Unified LLM provider abstraction. Supports OpenAI-compatible APIs (OpenAI, Anthropic, Google, Groq, Mistral, OpenRouter, xAI, local Ollama/LMStudio/vLLM). Handles provider selection, model switching, and fallback logic.
- `config.yaml` – Knowledge bot configuration (Discord tokens, LLM providers, system prompt, permissions, Google Docs settings).
- `config_report.yaml` – Report bot configuration (Discord tokens, cron schedule, approval workflow, Reddit/X credentials, LLM providers for summarization).
- `tools/dao_docs.py` – Clones/pulls a GitHub repo into `docs/`, chunks markdown files (1500 words per chunk), and provides fuzzy search via RapidFuzz. Used by the knowledge bot for document retrieval.
- `tools/gdocs_cache.py` – Google Drive integration for syncing Google Docs as markdown exports. Requires service account credentials. Currently regressed and needs debugging.
- `tools/reddit_monitor.py` – CLI and library for Reddit monitoring. Supports `run` (fetch + bundle events) and `preview` (inspect recent posts) modes. Configurable filters, caching, and JSON export.
- `tools/reddit_search.py` – Low-level Reddit API wrapper using `praw`. Handles subreddit scanning, keyword searches, and caching.
- `tools/x_monitor.py` – X/Twitter monitoring (largely untested). Parallel structure to the Reddit monitor.
- `tools/x_search.py` – Low-level X API v2 wrapper. Requires a bearer token.
- `tools/event_report.py` – Bundles events from Reddit/X monitors and generates LLM summaries for Discord posting.
- `diagnose_reddit.py` – Tests Reddit API credentials and searches. Run before enabling Reddit monitoring.
- `diagnose_report_generation.py` – End-to-end test of the report generation pipeline (fetch events, summarize, format for Discord).
- `data/reddit/events/` – JSON bundles of Reddit posts + comments + metadata.
- `data/x/events/` – JSON bundles of X/Twitter posts + threads.
- `data/quarantine/` – Reports pending approval.
- `data/approved/` – Approved reports (can be ingested by the knowledge bot if desired).
- `data/last_report_run.json` – Tracks last successful run time and next scheduled run.
- `logs/report_bot.log` – Report bot runtime logs.
- `reports/` – JSON artifacts from CLI monitor runs.
- `docs/` – Markdown corpus for the knowledge bot (gitignored; populated via `tools/dao_docs.py` or manual copy).
- System prompt – Provide a production-ready system prompt via `SYSTEM_PROMPT_DAOCORD` (mandatory on startup). No fallback is bundled.
- Documentation source – Populate or point `docs/` to the DAO knowledge source the organization wants to expose, and review `REPO_URL` in `tools/dao_docs.py` if a different repository should be cloned.
- Relevance tuning – Tune `min_relevance_score` in `config.yaml` once you observe real queries to balance coverage and hallucination risk.@main.py#489-510
- Model selection – Review LLM model lists in `config.yaml` and set `primary_provider` / `primary_model` appropriately.@config.yaml#88-127
- Google Docs sync – If enabling Google Docs integration, debug `tools/gdocs_cache.py` and verify service account credentials are correctly mounted.
- API credentials – Fill in the Reddit and X/Twitter environment variables (`REDDIT_*`, `X_*`) so monitors can authenticate; the current defaults leave them unset, to be provided at deployment.@config_report.yaml#52-107
- Twitter validation – Flip `twitter_reports_enabled` to `true` only after validating Twitter credentials. X/Twitter monitoring is largely untested.@config_report.yaml#52-75
- Approval workflow – Finalize the approval workflow by creating the approver role in Discord and verifying that the `data/quarantine` / `data/approved` directories are monitored as part of operations.@config_report.yaml#26-45
- Scheduling – Schedule the bot (cron, PM2, or other supervisor) so `report_interval_cron` actually triggers regular runs; calibrate the cron frequency against API quotas because each run re-collects from scratch.@config_report.yaml#15-45 @main_report.py#151-571
- Filter tuning – Iterate on monitor definitions (subreddits, keyword allow/exclude lists, tracked accounts) to balance recall vs noise. Reddit in particular can oscillate between zero hits and low-signal floods without tuning.@tools/reddit_monitor.py#151-569
- Testing – Run `python diagnose_reddit.py` and `python diagnose_report_generation.py` to validate the pipeline before production deployment.
- Prompt hygiene – `REPORTGEN_SYSTEM_PROMPT` sets the base formatting and tone, `REPORT_CONTENT_INTENTION` appends topical focus instructions, and the `custom_instructions` block in `config_report.yaml` is currently unused. Merge important guidance into the env-driven strings until the code explicitly consumes `custom_instructions`.
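For the operational monitoring of the approval directories mentioned above, a small helper can list what is waiting and move a report once it is signed off. This is only a sketch of the file-level mechanics, assuming one file per report; the function names and the report file naming are hypothetical, and the bot's real approval flow (driven by Discord reactions) is not reproduced here.

```python
import shutil
from pathlib import Path


def pending_reports(quarantine="data/quarantine") -> list[str]:
    """Return the sorted file names currently waiting for approval."""
    directory = Path(quarantine)
    if not directory.is_dir():
        return []
    return sorted(entry.name for entry in directory.iterdir() if entry.is_file())


def approve_report(name, quarantine="data/quarantine", approved="data/approved") -> Path:
    """Move one report file from the quarantine directory to the approved directory."""
    source = Path(quarantine) / name
    if not source.is_file():
        raise FileNotFoundError(f"No pending report named {name!r} in {quarantine}")
    target_dir = Path(approved)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    shutil.move(str(source), str(target))
    return target
```

A periodic check that alerts when `pending_reports()` stays non-empty for too long is one simple way to notice a stalled approval queue.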
- Bot won't start: Verify `SYSTEM_PROMPT_DAOCORD` (or `SYSTEM_PROMPT_DAOCORD_B64`) is set; the loader raises if no prompt is found.@main.py#156-199
- No responses: Ensure `docs/` contains markdown files; empty corpora will yield "No relevant documentation found." replies.@main.py#469-487
- Google Docs sync fails: Check service account credentials and folder permissions. This integration is currently regressed and may need debugging.
- No reports generated: Confirm Reddit and X environment variables resolve to lists (no literal `$VAR` strings) and that `REDDIT_REPORTS_ENABLED` / `TWITTER_REPORTS_ENABLED` are true.@config_report.yaml#52-107
- Empty results: If results remain empty, relax filters or expand keyword/user lists incrementally. Check `data/reddit/events/` and `data/x/events/` for event bundles.
- Approval queue stalls: Check that the approver role ID is set and that users have permissions to react with the configured emojis.@config_report.yaml#26-45
- Cron not triggering: Verify the `report_interval_cron` syntax and keep the bot process running under PM2, systemd, or Railway's process manager. A lightweight custom supervisor may be worth writing.
- Railway: Set variables in the Railway dashboard. Multi-line values (like system prompts) are supported.
- Local: Use a `.env` file (not tracked in git) or export variables in your shell.
- Validation: Run `python -c "import os; print(os.getenv('REDDIT_CLIENT_ID'))"` to verify variables are accessible.
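The validation step above checks one variable at a time. A slightly broader preflight can report every missing or unexpanded variable in one pass. The sketch below is an assumption-laden illustration: the set of accepted truthy strings and the helper names `is_truthy` and `preflight` are hypothetical, and the bots may interpret flags differently.

```python
import os

# Assumed spellings of "enabled"; adjust to match how the bots actually parse flags.
TRUTHY = {"1", "true", "yes", "on"}


def is_truthy(raw: str) -> bool:
    """Interpret a flag such as REDDIT_REPORTS_ENABLED."""
    return raw.strip().lower() in TRUTHY


def preflight(env, required) -> list[str]:
    """Return one human-readable problem per missing or unexpanded variable."""
    problems = []
    for name in required:
        value = env.get(name, "").strip()
        if not value:
            problems.append(f"{name} is not set")
        elif value.startswith("$"):
            # A value like "$REDDIT_CLIENT_ID" means a placeholder was never expanded.
            problems.append(f"{name} still holds a literal placeholder: {value}")
    return problems
```

Running `preflight(os.environ, ["REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET"])` before starting the report bot prints nothing to fix when the list comes back empty.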
- Knowledge bot: Logs to stdout/stderr. Set `debug_prompt: true` in `config.yaml` to see full prompts.
- Report bot: Logs to `logs/report_bot.log`. Check `data/last_report_run.json` for scheduling info.
- Reddit API: Check `data/logs/reddit_calls.jsonl` for API call history.
- X API: Check `data/logs/x_calls.jsonl` for API call history.
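The `.jsonl` call logs hold one JSON object per line, so the most recent entries can be inspected with a few lines of Python. This sketch makes no assumption about the fields inside each record; `tail_jsonl` is a hypothetical helper name.

```python
import json
from collections import deque
from pathlib import Path


def tail_jsonl(path, count: int = 5) -> list:
    """Return the last `count` parseable records from a JSON Lines file."""
    recent = deque(maxlen=count)  # keeps only the newest `count` records
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                recent.append(json.loads(line))
            except json.JSONDecodeError:
                # Skip partially written or corrupted lines instead of failing.
                continue
    return list(recent)
```

For instance, `tail_jsonl("data/logs/reddit_calls.jsonl", 10)` returns the ten most recent call records as dictionaries, ready to print or filter.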
- Both bots are designed for Railway deployment with environment-variable expansion.
- For Google credentials, either supply individual `GOOGLE_*` vars or mount the JSON blob and point `google_application_credentials` accordingly.@config.yaml#19-39
- Use Railway's cron add-on or a process supervisor (PM2) to schedule `python main_report.py run --all`.
- Ensure persistent volumes are configured for `data/`, `logs/`, and `reports/` if you want to preserve state across deploys.
- Run both bots in separate terminals: `python main.py` and `python main_report.py run --all`.
- Use a `.env` file or export variables in your shell.
- Test with `python main_report.py preview --subreddits longevity` to inspect Reddit data without generating reports.
- Set all required environment variables (see Configuration sections).
- Create two Discord applications with MESSAGE CONTENT INTENT enabled.
- Populate `docs/` with your organization's markdown documentation.
- Test locally before deploying.
- Configure approval role and channels for report bot.
- Schedule report bot runs (cron, PM2, or Railway cron add-on).
- Monitor logs and `data/last_report_run.json` for health checks.
- Tune filters and relevance scores based on real usage.
- `README_BOTS.md` – Detailed setup guide with environment variable examples, Discord bot creation steps, and troubleshooting.
- `ASSESSMENT.md` – Documents known Reddit configuration issues and debugging steps.
- `REPORTING_GUIDE.md` – Covers report generation workflows and approval process.
- `BUG_FIX_REPORT.md`, `FINAL_FIX_SUMMARY.md` – Historical bug fixes and resolution notes.
- `config-example.yaml` – Example configuration template for knowledge bot.