DAOcord Knowledge & Research Bots

DAOcord, a fork of LLMCord adapted for DeSci DAOs, combines two cooperating Discord automations that share an LLM-first workflow: a knowledge bot that answers questions from your organization's documentation, and a report bot that monitors external sources to produce research updates. This README explains how the repository is structured, how to configure and run each bot, and which follow-up tasks remain before production deployment.

Quick Start for New Maintainers

If you're picking up this codebase:

  1. Read the Overview (below) to understand the two-bot architecture.
  2. Check Prerequisites and install dependencies via requirements.txt.
  3. Review existing configs (config.yaml, config_report.yaml) to see what environment variables are expected.
  4. Set up Discord bots at discord.com/developers/applications — you need two separate applications (one for knowledge, one for reports), each with MESSAGE CONTENT INTENT enabled.
  5. Populate environment variables for your deployment platform (Railway, local .env, etc.) — see Configuration sections below.
  6. Test locally before deploying: run python main.py and python main_report.py run --all to verify connectivity.
  7. Consult supporting docs: README_BOTS.md has detailed setup steps, ASSESSMENT.md documents known Reddit config issues, REPORTING_GUIDE.md covers report workflows.

Overview

  • main.py – Knowledge bot that serves DAO documentation to Discord users. Primary configuration: config.yaml. Highlights: document chunking via tools/dao_docs, /model switching, optional Google Docs sync, relevance gating.
  • main_report.py – Research report bot that monitors X/Twitter & Reddit and posts curated updates. Primary configuration: config_report.yaml. Highlights: cron-style scheduling, approval workflow, Reddit & X data collectors, LLM-powered summaries.

Repository layout

  • main.py – Discord bot answering questions with DAO docs.
  • main_report.py – Discord bot generating scheduled research reports.
  • config.yaml / config_report.yaml – Runtime configuration for the two bots.
  • tools/ – Support utilities (DAO doc ingestion, Reddit/X monitors, report generation helpers).
  • diagnose_*.py – Utilities for debugging report pipelines.
  • docs/ – Knowledge base consumed by the main bot (ignored in git; populated locally).
  • data/, logs/, reports/ – Persistent state for monitors and generated output.

Prerequisites

  • Python 3.10+ (3.11 recommended).
  • Discord applications for each bot with the MESSAGE CONTENT INTENT enabled.
  • API keys for whichever LLM providers you intend to use (OpenAI, Anthropic, Google, etc.).
  • Reddit and X/Twitter API credentials for the report bot if those sources are enabled.

Installation

  1. Clone the project and create a virtual environment (recommended).
  2. Install dependencies:
    python -m pip install -r requirements.txt
  3. Copy config-example.yaml to config.yaml and tailor it, or start from the provided config.yaml/config_report.yaml templates and fill in environment variables.

Configuration

Both bots expand environment variables referenced in their YAML config files, so secrets stay outside of version control.
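
For reference, a minimal sketch of that pattern — not the exact loader used in main.py or main_report.py, which may differ — is to expand $VAR / ${VAR} placeholders in the raw YAML text before parsing it:

    import os
    import yaml  # PyYAML

    def load_config(path: str) -> dict:
        """Read a YAML config and expand $VAR / ${VAR} references from the environment."""
        with open(path, "r", encoding="utf-8") as fh:
            raw = fh.read()
        # os.path.expandvars leaves unknown variables untouched (as literal "$VAR"),
        # which is why the troubleshooting steps below check for leftover "$" strings.
        return yaml.safe_load(os.path.expandvars(raw))

    config = load_config("config.yaml")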

Knowledge bot (main.py)

  • Core settings live in config.yaml, which is loaded with environment variable expansion.@config.yaml#1-128
  • DAO documentation is sourced from the docs/ directory. If the folder is empty, tools/dao_docs.py can clone the remote repository defined by REPO_URL and chunk markdown into searchable pieces.@tools/dao_docs.py#7-144
  • Supports hybrid corpora: GitHub-hosted markdown (via the default repo clone) and optional Google Docs exports synchronized through GDocsCache (JSON service-account credentials referenced in config.yaml). Google sync has regressed recently and may require debugging before production use; the GitHub flow is the current, primarily tested path.@tools/gdocs_cache.py#1-192 @main.py#261-295
  • Railway deployment notes: configs lean on environment-variable expansion so you can inject secrets via the Railway dashboard. For Google credentials, either supply individual GOOGLE_* vars or mount the JSON blob and point google_application_credentials accordingly.@config.yaml#19-39
  • Important environment variables:
    • KNOWLEDGE_BOT_DISCORD_TOKEN, KNOWLEDGE_BOT_CLIENT_ID, KNOWLEDGE_BOT_STATUS_MESSAGE – Discord authentication and status text.
    • SYSTEM_PROMPT_DAOCORD or SYSTEM_PROMPT_DAOCORD_B64 – Required persona for the bot; no fallback is bundled, so one must be provided.@main.py#156-199
    • LLM_PRIMARY_PROVIDER, LLM_PRIMARY_MODEL – Selects the default provider/model pair used by the bot.@config.yaml#88-127
    • Provider API keys (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, OPENROUTER_API_KEY) – Enables each configured LLM endpoint.@config.yaml#56-86
    • Optional Google service account fields (e.g. USE_GOOGLE_DOCS, GOOGLE_*) – Enable Google Docs synchronization via tools/gdocs_cache.py when set.@config.yaml#19-39
  • Adjust min_relevance_score in config.yaml to control how strict document lookup should be for DAO answers.@main.py#489-510
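
Because no fallback prompt ships with the repo, the bot needs SYSTEM_PROMPT_DAOCORD (plain text) or SYSTEM_PROMPT_DAOCORD_B64 (base64-encoded, which is convenient when a dashboard mangles multi-line values). A hedged sketch of how such a loader typically resolves the pair — the actual logic in main.py may differ:

    import base64
    import os

    def resolve_system_prompt() -> str:
        # Prefer the plain-text variable; fall back to the base64-encoded one.
        prompt = os.getenv("SYSTEM_PROMPT_DAOCORD")
        if not prompt:
            encoded = os.getenv("SYSTEM_PROMPT_DAOCORD_B64")
            if encoded:
                prompt = base64.b64decode(encoded).decode("utf-8")
        if not prompt:
            raise RuntimeError("Set SYSTEM_PROMPT_DAOCORD or SYSTEM_PROMPT_DAOCORD_B64")
        return prompt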

Report bot (main_report.py)

  • Operates from config_report.yaml, which also resolves $VARNAME placeholders via environment variables.@main_report.py#41-150
  • Data source coverage: Reddit monitoring is the most exercised path today; X/Twitter ingestion follows the same interface but remains largely untested, so sandbox it before turning on production posting.
  • Key environment variables:
    • REPORT_BOT_DISCORD_TOKEN, REPORT_BOT_CLIENT_ID, REPORT_BOT_STATUS_MESSAGE – Discord credentials for the report bot.
    • REPORT_CHANNEL_IDS – Comma-separated channel IDs where reports will be posted.@config_report.yaml#15-44
    • REPORT_CONTENT_INTENTION, REPORTGEN_SYSTEM_PROMPT – Direct the LLM summarization pipeline.@config_report.yaml#20-21
    • REPORT_BOT_APPROVERS_ROLE_ID – Grants the approval workflow access to a specific Discord role.@config_report.yaml#26-37
    • REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, REDDIT_REPORTGEN_SUBREDDITS, REDDIT_REPORTGEN_KEYWORDS, REDDIT_REPORTGEN_EXCLUDE_KEYWORDS, REDDIT_REPORTS_ENABLED – Reddit API authentication and monitoring scope.@config_report.yaml#78-107
    • X_BEARER_TOKEN, X_REPORTGEN_KEYWORDS, X_REPORTGEN_USERS, X_REPORTGEN_EXCLUDE_KEYWORDS, TWITTER_REPORTS_ENABLED – Enables and configures X/Twitter monitoring.@config_report.yaml#52-75
    • LLM provider keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, etc.) – Used for report summarization.@config_report.yaml#107-143
  • The approval workflow stores pending reports in data/quarantine and approved ones in data/approved. Ensure those directories exist and are writable.@config_report.yaml#26-45
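
A minimal sketch of keeping those directories ready and promoting a report on approval — illustrative only, since the bot's own file handling lives in main_report.py and may differ:

    from pathlib import Path
    import shutil

    QUARANTINE = Path("data/quarantine")
    APPROVED = Path("data/approved")

    def ensure_report_dirs() -> None:
        # Create the approval-workflow directories on a fresh deploy.
        for directory in (QUARANTINE, APPROVED):
            directory.mkdir(parents=True, exist_ok=True)

    def approve_report(filename: str) -> Path:
        # Move a pending report out of quarantine once an approver signs off.
        ensure_report_dirs()
        destination = APPROVED / filename
        shutil.move(str(QUARANTINE / filename), str(destination))
        return destination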

Running the bots

Knowledge bot (main.py)

  1. Ensure the DAO documentation exists in docs/ (or allow tools/dao_docs to clone it on first run).
  2. Verify the system prompt and provider settings are set via environment variables.
  3. Launch the bot:
    python main.py
  4. Mention the bot in Discord or DM it. It will:
    • Retrieve relevant markdown snippets with RapidFuzz scoring before issuing an LLM request (a sketch follows this list).@main.py#469-539
    • Reply with LLM output plus cited file sources.@main.py#555-561
    • Support /model for swapping among configured models.@main.py#297-324
  5. Optional: enable the Google Docs cache to sync additional knowledge sources on startup (expect to verify the integration when changing credentials or deploying to new environments).@main.py#261-295
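
The retrieval step in item 4 can be pictured roughly like this — a sketch using RapidFuzz, not the exact code in main.py or tools/dao_docs.py, whose chunking and scoring details differ:

    from rapidfuzz import fuzz

    MIN_RELEVANCE_SCORE = 60  # mirrors min_relevance_score in config.yaml

    def search_chunks(question: str, chunks: list[dict]) -> list[dict]:
        """Score pre-chunked markdown against the question and keep the best matches."""
        scored = []
        for chunk in chunks:  # each chunk: {"source": "docs/file.md", "text": "..."}
            score = fuzz.token_set_ratio(question, chunk["text"])
            if score >= MIN_RELEVANCE_SCORE:
                scored.append({**chunk, "score": score})
        # The highest-scoring chunks are passed to the LLM and cited in the reply.
        return sorted(scored, key=lambda c: c["score"], reverse=True)[:5]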

Report bot (main_report.py)

  1. Populate the Reddit and/or X/Twitter environment variables noted above.
  2. Confirm the approval role and target channels exist in Discord.
  3. Run the scheduled worker:
    python main_report.py run --all
    • The command loads every configured monitor, fetches new events, stores bundles under data/reddit/events / data/x/events, and posts summaries when approvals permit.@main_report.py#151-571
  4. Use python main_report.py preview --subreddits longevity to inspect recent Reddit posts without generating bundles.@tools/reddit_monitor.py#398-506
  5. Schedule recurring runs according to report_interval_cron; Railway users typically pair python main_report.py run --all with a process supervisor (e.g., PM2) or Railway’s cron add-on (a scheduling sketch follows this list).
  6. Trigger ad-hoc collection with the built-in slash commands: /run_reddit_now, /run_x_now, or /run_all_now. Access is limited to approvers or user IDs listed under approval.manual_run_allowlist in config_report.yaml.@main_report.py#759-913
  7. The bot records run history in data/last_report_run.json so you can audit schedules across restarts.@main_report.py#151-269
  8. Deployments often run both bots in separate processes or via a supervisor (PM2, systemd, etc.).
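
If you prefer an in-process scheduler over an external cron for item 5, a sketch using the croniter package is below. This is an assumption for illustration — croniter is not necessarily a dependency of this repo, and the cron expression shown is a placeholder for whatever report_interval_cron resolves to:

    import subprocess
    import time
    from datetime import datetime, timezone

    from croniter import croniter

    CRON_EXPRESSION = "0 */6 * * *"  # stand-in for report_interval_cron

    def run_forever() -> None:
        # Compute each next fire time from the cron expression, sleep until then,
        # and invoke the report bot's collection run.
        schedule = croniter(CRON_EXPRESSION, datetime.now(timezone.utc))
        while True:
            next_run = schedule.get_next(datetime)
            time.sleep(max(0.0, (next_run - datetime.now(timezone.utc)).total_seconds()))
            subprocess.run(["python", "main_report.py", "run", "--all"], check=False)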

Supporting tools

  • tools/dao_docs.py – Manages cloning/pulling the external DAO documentation repository, chunking markdown for search, and serving retrieval requests.@tools/dao_docs.py#7-145
  • tools/reddit_monitor.py – Provides CLI helpers for running and testing Reddit monitors outside the bot context (supports run and preview modes, caching, filtering, and JSON exports). Tune subreddit lists, keyword filters, and exclusion rules iteratively to avoid empty datasets or high-noise pulls.@tools/reddit_monitor.py#151-569
  • tools/event_report.py, tools/x_monitor.py, tools/reddit_search.py – Shared logic used by the report bot to build event bundles and request LLM summaries.
  • diagnose_reddit.py, diagnose_report_generation.py – Stand-alone diagnostics for checking API credentials and summarization pipelines before enabling automation.

Data directories & logs

  • data/ – Cached API responses, event bundles, and approval state (e.g. data/reddit/events, data/x/events, data/quarantine).
  • reports/ – JSON artifacts emitted by monitoring runs when not posted directly to Discord.@tools/reddit_monitor.py#547-566
  • logs/ – Runtime logs for both bots (e.g. logs/report_bot.log).@main_report.py#75-101
  • docs/ – Markdown corpus for the knowledge bot (ignored by git).

Troubleshooting checklist

  • Knowledge bot won’t start: verify SYSTEM_PROMPT_DAOCORD (or SYSTEM_PROMPT_DAOCORD_B64) is set; the loader raises if no prompt is found.@main.py#156-199
  • Responses lack citations: ensure docs/ contains markdown files; empty corpora will yield “No relevant documentation found.” replies.@main.py#469-487
  • Report bot posts nothing: confirm Reddit and X environment variables resolve to lists (no literal $VAR strings) and that REDDIT_REPORTS_ENABLED/TWITTER_REPORTS_ENABLED are truthy; if results remain empty, relax filters or expand keyword/user lists incrementally.@config_report.yaml#52-107
  • Approval queue stalls: check that the approver role ID is set and that users have permissions to react with the configured emojis.@config_report.yaml#26-45

Architecture & Code Organization

Core files

  • main.py (632 lines) – Knowledge bot entry point. Loads config.yaml, initializes DAODocsTool and optional GDocsCache, registers Discord event handlers, and serves LLM-powered answers with document citations.
  • main_report.py (1388 lines) – Report bot entry point. Loads config_report.yaml, runs self-tests for Reddit/X APIs, schedules cron-based report generation, registers slash commands for manual triggers, and manages approval workflows.
  • llm_config.py (13473 chars) – Unified LLM provider abstraction. Supports OpenAI-compatible APIs (OpenAI, Anthropic, Google, Groq, Mistral, OpenRouter, xAI, local Ollama/LMStudio/vLLM). Handles provider selection, model switching, and fallback logic (an illustrative sketch of this pattern follows the list).
  • config.yaml – Knowledge bot configuration (Discord tokens, LLM providers, system prompt, permissions, Google Docs settings).
  • config_report.yaml – Report bot configuration (Discord tokens, cron schedule, approval workflow, Reddit/X credentials, LLM providers for summarization).
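
The provider pattern behind llm_config.py can be illustrated with the openai client pointed at different OpenAI-compatible base URLs. This is a sketch under the assumption that the chosen providers expose such endpoints; the real abstraction, including fallback handling, lives in llm_config.py:

    import os
    from openai import OpenAI

    # Illustrative OpenAI-compatible endpoints; extend per provider as needed.
    PROVIDERS = {
        "openai": {"base_url": "https://api.openai.com/v1", "key_env": "OPENAI_API_KEY"},
        "openrouter": {"base_url": "https://openrouter.ai/api/v1", "key_env": "OPENROUTER_API_KEY"},
        "ollama": {"base_url": "http://localhost:11434/v1", "key_env": "OLLAMA_API_KEY"},
    }

    def get_client(provider: str) -> OpenAI:
        cfg = PROVIDERS[provider]
        return OpenAI(base_url=cfg["base_url"], api_key=os.getenv(cfg["key_env"], "not-needed"))

    client = get_client(os.getenv("LLM_PRIMARY_PROVIDER", "openai"))
    reply = client.chat.completions.create(
        model=os.getenv("LLM_PRIMARY_MODEL", "gpt-4o-mini"),
        messages=[{"role": "user", "content": "ping"}],
    )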

Tools directory

  • tools/dao_docs.py – Clones/pulls a GitHub repo into docs/, chunks markdown files (1500 words per chunk), and provides fuzzy search via RapidFuzz. Used by knowledge bot for document retrieval.
  • tools/gdocs_cache.py – Google Drive integration for syncing Google Docs as markdown exports. Requires service account credentials. Currently regressed and needs debugging.
  • tools/reddit_monitor.py – CLI and library for Reddit monitoring. Supports run (fetch + bundle events) and preview (inspect recent posts) modes. Configurable filters, caching, and JSON export.
  • tools/reddit_search.py – Low-level Reddit API wrapper using praw. Handles subreddit scanning, keyword searches, and caching (see the praw sketch after this list).
  • tools/x_monitor.py – X/Twitter monitoring (largely untested). Parallel structure to Reddit monitor.
  • tools/x_search.py – Low-level X API v2 wrapper. Requires bearer token.
  • tools/event_report.py – Bundles events from Reddit/X monitors and generates LLM summaries for Discord posting.
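
The praw layer underneath tools/reddit_search.py can be exercised on its own to sanity-check credentials. A minimal sketch — the credential names match the env vars above, but the wrapper's actual interface may differ:

    import os
    import praw

    reddit = praw.Reddit(
        client_id=os.environ["REDDIT_CLIENT_ID"],
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],
        user_agent="daocord-report-bot/0.1",  # any descriptive user agent
    )

    # Pull recent posts matching a keyword from one monitored subreddit.
    for submission in reddit.subreddit("longevity").search("clinical trial", sort="new", limit=10):
        print(submission.created_utc, submission.title, submission.url)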

Diagnostic scripts

  • diagnose_reddit.py – Tests Reddit API credentials and searches. Run before enabling Reddit monitoring.
  • diagnose_report_generation.py – End-to-end test of report generation pipeline (fetch events, summarize, format for Discord).

Data persistence

  • data/reddit/events/ – JSON bundles of Reddit posts + comments + metadata.
  • data/x/events/ – JSON bundles of X/Twitter posts + threads.
  • data/quarantine/ – Reports pending approval.
  • data/approved/ – Approved reports (can be ingested by knowledge bot if desired).
  • data/last_report_run.json – Tracks last successful run time and next scheduled run.
  • logs/report_bot.log – Report bot runtime logs.
  • reports/ – JSON artifacts from CLI monitor runs.
  • docs/ – Markdown corpus for knowledge bot (gitignored; populated via tools/dao_docs.py or manual copy).

Outstanding work

Knowledge bot (main.py)

  • System prompt – Provide a production-ready system prompt via SYSTEM_PROMPT_DAOCORD (mandatory on startup). No fallback is bundled.
  • Documentation source – Populate or point docs/ to the DAO knowledge source the organization wants to expose, and review REPO_URL in tools/dao_docs.py if a different repository should be cloned.
  • Relevance tuning – Tune min_relevance_score in config.yaml once you observe real queries to balance coverage and hallucination risk.@main.py#489-510
  • Model selection – Review LLM model lists in config.yaml and set primary_provider/primary_model appropriately.@config.yaml#88-127
  • Google Docs sync – If enabling Google Docs integration, debug tools/gdocs_cache.py and verify service account credentials are correctly mounted.

Report bot (main_report.py)

  • API credentials – Fill Reddit and X/Twitter environment variables (REDDIT_*, X_*) so monitors can authenticate; current defaults assume they remain to be provided.@config_report.yaml#52-107
  • Twitter validation – Decide when to flip twitter_reports_enabled to true after validating Twitter credentials. X/Twitter monitoring is largely untested.@config_report.yaml#52-75
  • Approval workflow – Finalize the approval workflow by creating the approver role in Discord and verifying the data/quarantine / data/approved directories are monitored as part of operations.@config_report.yaml#26-45
  • Scheduling – Schedule the bot (cron, PM2, or other supervisor) so report_interval_cron actually triggers regular runs; calibrate the cron frequency against API quotas because each run re-collects from scratch.@config_report.yaml#15-45 @main_report.py#151-571
  • Filter tuning – Iterate on monitor definitions (subreddits, keyword allow/exclude lists, tracked accounts) to balance recall vs noise. Reddit in particular can oscillate between zero hits and low-signal floods without tuning.@tools/reddit_monitor.py#151-569
  • Testing – Run python diagnose_reddit.py and python diagnose_report_generation.py to validate the pipeline before production deployment.
  • Prompt hygiene – REPORTGEN_SYSTEM_PROMPT sets the base formatting + tone, REPORT_CONTENT_INTENTION appends topical focus instructions, and the custom_instructions block in config_report.yaml is currently unused. Merge important guidance into the env-driven strings until code explicitly consumes custom_instructions.
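
Until custom_instructions is wired up, the effective summarization prompt can be thought of as a simple concatenation of the two env-driven strings. An illustrative sketch, not the exact assembly in the report pipeline:

    import os

    def build_reportgen_prompt() -> str:
        base = os.getenv("REPORTGEN_SYSTEM_PROMPT", "")
        focus = os.getenv("REPORT_CONTENT_INTENTION", "")
        # Base formatting/tone first, topical focus appended, as described above.
        return "\n\n".join(part for part in (base, focus) if part)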

Common Pitfalls & Debugging

Knowledge bot issues

  • Bot won't start: Verify SYSTEM_PROMPT_DAOCORD (or SYSTEM_PROMPT_DAOCORD_B64) is set; the loader raises if no prompt is found.@main.py#156-199
  • No responses: Ensure docs/ contains markdown files; empty corpora will yield "No relevant documentation found." replies.@main.py#469-487
  • Google Docs sync fails: Check service account credentials and folder permissions. This integration is currently regressed and may need debugging.

Report bot issues

  • No reports generated: Confirm Reddit and X environment variables resolve to lists (no literal $VAR strings) and that REDDIT_REPORTS_ENABLED/TWITTER_REPORTS_ENABLED are true.@config_report.yaml#52-107
  • Empty results: If results remain empty, relax filters or expand keyword/user lists incrementally. Check data/reddit/events/ and data/x/events/ for event bundles.
  • Approval queue stalls: Check that the approver role ID is set and that users have permissions to react with the configured emojis.@config_report.yaml#26-45
  • Cron not triggering: Verify report_interval_cron syntax and ensure the bot process stays running (use PM2, systemd, or Railway's process manager; a lightweight supervisor may be worth writing).

Environment variable debugging

  • Railway: Set variables in the Railway dashboard. Multi-line values (like system prompts) are supported.
  • Local: Use a .env file (not tracked in git) or export variables in your shell.
  • Validation: Run python -c "import os; print(os.getenv('REDDIT_CLIENT_ID'))" to verify variables are accessible.
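
A slightly more thorough check than the one-liner above — verify that the variables a bot needs are set and that none still contain unexpanded $VAR text. The variable list here is illustrative; adjust it to your deployment:

    import os

    REQUIRED = [
        "REPORT_BOT_DISCORD_TOKEN",
        "REDDIT_CLIENT_ID",
        "REDDIT_CLIENT_SECRET",
        "REPORT_CHANNEL_IDS",
    ]

    def check_env() -> None:
        missing = [name for name in REQUIRED if not os.getenv(name)]
        stale = [name for name in REQUIRED if "$" in (os.getenv(name) or "")]
        if missing:
            print("Missing:", ", ".join(missing))
        if stale:
            print("Contains a literal '$' (unexpanded?):", ", ".join(stale))
        if not missing and not stale:
            print("Environment looks sane.")

    if __name__ == "__main__":
        check_env()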

Logs and diagnostics

  • Knowledge bot: Logs to stdout/stderr. Set debug_prompt: true in config.yaml to see full prompts.
  • Report bot: Logs to logs/report_bot.log. Check data/last_report_run.json for scheduling info.
  • Reddit API: Check data/logs/reddit_calls.jsonl for API call history.
  • X API: Check data/logs/x_calls.jsonl for API call history.

Deployment Notes

Railway-specific

  • Both bots are designed for Railway deployment with environment-variable expansion.
  • For Google credentials, either supply individual GOOGLE_* vars or mount the JSON blob and point google_application_credentials accordingly.@config.yaml#19-39
  • Use Railway's cron add-on or a process supervisor (PM2) to schedule python main_report.py run --all.
  • Ensure persistent volumes are configured for data/, logs/, and reports/ if you want to preserve state across deploys.

Local development

  • Run both bots in separate terminals: python main.py and python main_report.py run --all.
  • Use a .env file or export variables in your shell.
  • Test with python main_report.py preview --subreddits longevity to inspect Reddit data without generating reports.

Production checklist

  1. Set all required environment variables (see Configuration sections).
  2. Create two Discord applications with MESSAGE CONTENT INTENT enabled.
  3. Populate docs/ with your organization's markdown documentation.
  4. Test locally before deploying.
  5. Configure approval role and channels for report bot.
  6. Schedule report bot runs (cron, PM2, or Railway cron add-on).
  7. Monitor logs and data/last_report_run.json for health checks.
  8. Tune filters and relevance scores based on real usage.

Additional Resources

  • README_BOTS.md – Detailed setup guide with environment variable examples, Discord bot creation steps, and troubleshooting.
  • ASSESSMENT.md – Documents known Reddit configuration issues and debugging steps.
  • REPORTING_GUIDE.md – Covers report generation workflows and approval process.
  • BUG_FIX_REPORT.md, FINAL_FIX_SUMMARY.md – Historical bug fixes and resolution notes.
  • config-example.yaml – Example configuration template for knowledge bot.
