```
╭──────────────────────────────────────────────────╮
│      RECURSIVE KNOWLEDGE SYNTHESIS PIPELINE      │
╰─────────────────────────┬────────────────────────╯
                          │
┌─────────────────────────┴────────────────────────┐
│                Repository Analysis               │
│            Crawl → Analyze → Generate            │
└─────────────────────────┬────────────────────────┘
                          │
╭─────────────────────────┴────────────────────────╮
│         RECURSIVE SUMMARIZATION STRATEGY         │
│                                                  │
│     Files → Modules → Packages → Architecture    │
│       │        │          │           │          │
│  Each summary enriches the embeddings database,  │
│    creating a self-reinforcing knowledge web     │
╰──────────────────────────────────────────────────╯
```
An AI-powered documentation generator that creates living, breathing documentation through recursive knowledge synthesis.
ConductDoc doesn't just analyze code; it builds understanding recursively, creating a self-enriching knowledge ecosystem:
```mermaid
graph TD
    A[Individual Files] --> B[File Summaries]
    B --> C[Embeddings Database]
    C --> D[Enhanced Context]
    D --> E[Module Summaries]
    E --> C
    C --> F[Architecture Understanding]
    F --> C
    C --> G[Final Documentation]

    style C fill:#e1f5fe
    style G fill:#c8e6c9
```
- Bottom-Up Understanding: Starts with individual files and builds toward architectural comprehension
- Recursive Feedback: Each generated summary feeds back into the embeddings database
- Context Amplification: Later analyses benefit from all previous insights
- Dual Summarization: Generates both concise abstracts and detailed explanations
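The recursive feedback loop above can be sketched in a few lines. This is an illustrative toy, not ConductDoc's actual retriever: `KeywordRetriever` and its word-overlap scoring stand in for the real vector store, to show how a summary fed back into retrieval changes what later phases see.

```python
# Toy sketch of the recursive feedback idea: each generated summary is
# added back to the retriever, so later queries retrieve it as context.
# KeywordRetriever is a simplified stand-in for a vector store.
class KeywordRetriever:
    def __init__(self):
        self.chunks = []

    def add_chunks(self, chunks):
        self.chunks.extend(chunks)

    def retrieve(self, query, k=2):
        # Rank chunks by word overlap with the query (toy scoring only).
        q = set(query.lower().split())
        scored = sorted(self.chunks,
                        key=lambda c: len(q & set(c.lower().split())),
                        reverse=True)
        return scored[:k]

retriever = KeywordRetriever()
retriever.add_chunks(["def parse_ast(path): ...  # raw code chunk"])

# Phase 1: a file-level summary is generated, then fed back in.
file_summary = "AI Summary: parse_ast builds an AST map for each source file"
retriever.add_chunks([file_summary])

# Phase 2: the module-level pass now retrieves the enriched context.
context = retriever.retrieve("how does the parser module build the AST map")
assert context[0] == file_summary  # the fed-back summary ranks first
```

The point is the second `add_chunks` call: the knowledge base a module-level pass queries is strictly richer than the one the file-level pass started from.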
```python
# Discovers source structure automatically
src_path = find_source_directory(repo_path)
docs_path = find_docs_directory(repo_path)

# Builds comprehensive code understanding
elements, import_graph, ast_map = crawl_source_code(src_path)
```

```python
# The secret sauce: recursive summarization that enriches itself
def generate_recursive_summary(elements, retriever, readme_content):
    # 1. Build file-level summaries
    for file_path, elements_in_file in grouped_elements:
        abstractive_summary = ai_summarize_file(file_path, context)
        detailed_summary = ai_detailed_analysis(file_path, context)
        # KEY: feed each summary back into the retrieval system
        retriever.add_chunks([f"AI Summary: {abstractive_summary}"])

    # 2. Build module-level understanding (now enriched with file summaries)
    for module_path in modules:
        module_summary = ai_summarize_module(module_path, enhanced_context)
        retriever.add_chunks([f"Module Summary: {module_summary}"])

    # 3. Build architectural understanding (enriched with everything)
    final_summary = ai_architectural_analysis(all_enriched_context)
```

```python
# Creates interactive documentation
create_documentation_file(analysis_result, all_elements_with_docs)
```

- D3.js-powered tree visualizations
- Hover tooltips with AI-generated summaries
- Collapsible nodes for exploration
- Zoom and pan for large codebases
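The tree visualization hinges on feeding D3 the right shape of data. As an illustrative sketch (simplified; the real generator also attaches summaries for the tooltips), here is how file paths can be folded into the nested `{name, children}` structure that `d3.hierarchy()` consumes:

```python
# Illustrative sketch: turn file paths into the nested {name, children}
# shape that d3.hierarchy() consumes for the tree visualization.
# (Simplified; the real generator also attaches per-node summaries.)
def build_tree(paths):
    root = {"name": "repo", "children": []}
    for path in paths:
        node = root
        for part in path.split("/"):
            # Reuse an existing child node for this path segment if present.
            child = next((c for c in node["children"] if c["name"] == part), None)
            if child is None:
                child = {"name": part, "children": []}
                node["children"].append(child)
            node = child
    return root

tree = build_tree(["src/main.py", "src/utils.py", "docs/index.md"])
assert [c["name"] for c in tree["children"]] == ["src", "docs"]
```

Serialized to JSON and embedded in the HTML, this structure drives the collapsible, zoomable tree directly.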
- Executive Summary: High-level overview for stakeholders
- Interactive Architecture: Visual code exploration
- Code Examples: AI-curated, ready-to-run snippets
- API Reference: Complete documentation with context
- Intelligent Caching: LLM responses cached by content hash
- Deterministic Operations: Consistent results across runs
- Incremental Updates: Only re-analyzes changed components
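The content-hash caching idea can be sketched as follows. This is illustrative only: the in-memory `store` stands in for the on-disk `.cache/` directory, and `"gpt-x"` is a placeholder model name, not a real provider setting.

```python
import hashlib
import json

# Illustrative sketch of content-hash LLM caching. The in-memory dict
# stands in for ConductDoc's on-disk .cache/ directory.
class LLMCache:
    def __init__(self):
        self.store = {}

    def key(self, prompt, model):
        # Deterministic key: identical (prompt, model) always hashes the same.
        payload = json.dumps({"prompt": prompt, "model": model}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compute(self, prompt, model, call_llm):
        k = self.key(prompt, model)
        if k not in self.store:          # only call the LLM on a cache miss
            self.store[k] = call_llm(prompt)
        return self.store[k]

calls = []
def fake_llm(prompt):
    calls.append(prompt)
    return f"summary of: {prompt}"

cache = LLMCache()
first = cache.get_or_compute("summarize main.py", "gpt-x", fake_llm)
second = cache.get_or_compute("summarize main.py", "gpt-x", fake_llm)
assert first == second and len(calls) == 1  # second call served from cache
```

Because the key is a hash of the content rather than a timestamp or counter, re-running the pipeline on an unchanged repository hits the cache on every request, which is what makes runs deterministic and incremental.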
```shell
# Clone and install
git clone <repository-url>
cd ConductAI
pip install -r requirements.txt

# Configure your LLM (OpenRouter recommended)
echo "OPENROUTER_API_KEY=your_key_here" > .env
```

```shell
# Analyze any Python repository
python main.py --repo-url https://github.com/ManimCommunity/manim.git --llm-mode openrouter

# Or use local Ollama
python main.py --repo-url https://github.com/your/repo.git --llm-mode local
```

```shell
# Open the beautiful documentation
open output/documentation.html
```

```
┌──────────────────────────────────────────────────┐
│              RECURSIVE RAG PIPELINE              │
├──────────────────────────────────────────────────┤
│                                                  │
│  Repository                                      │
│  ├── AST Analysis                                │
│  ├── Import Graph                                │
│  └── Existing Docs                               │
│        │                                         │
│        ▼                                         │
│  Embeddings Database ◄─────────────────────┐     │
│  ├── Original documentation                │     │
│  ├── Code structure                        │     │
│  ├── README context                        │     │
│  └── Generated summaries (recursive!) ─────┘     │
│        │                                         │
│        ▼                                         │
│  Recursive Summarization                         │
│  ├── File Analysis → Enhanced Context            │
│  ├── Module Synthesis → More Enhanced Context    │
│  ├── Architecture → Fully Enhanced Context       │
│  └── Final Documentation                         │
│                                                  │
└──────────────────────────────────────────────────┘
```
| Component | Purpose | Magic |
|---|---|---|
| Crawler | Repository analysis | AST parsing, import tracking, doc ingestion |
| Analyzer | Recursive RAG engine | Self-enriching embeddings, dual summarization |
| Generator | Beautiful HTML creation | D3.js diagrams, responsive design, copy buttons |
| Retriever | Semantic search | Vector embeddings, context enhancement |
| Cache | Performance optimization | Content-based caching, deterministic keys |
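The Crawler's import tracking can be sketched with the standard-library `ast` module. This is a minimal illustration of the idea, not ConductDoc's actual crawler, which also maps definitions, docstrings, and module structure:

```python
import ast

# Illustrative sketch of the crawler's import tracking: walk a file's
# AST and record which modules it imports.
def extract_imports(source: str) -> set[str]:
    tree = ast.parse(source)
    imports = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.add(node.module)
    return imports

code = "import os\nfrom collections import Counter\n"
assert extract_imports(code) == {"os", "collections"}
```

Running this over every file and recording edges between modules yields the import graph that later phases use to decide which files belong to which module.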
Each analysis phase enriches the knowledge base for subsequent phases:
- File summaries help understand modules
- Module summaries help understand architecture
- Architectural insights help generate examples
- Everything together creates coherent, contextual documentation
Unlike traditional documentation generators, ConductDoc understands:
- Intent: What the code is trying to achieve
- Relationships: How components work together
- Patterns: Common usage scenarios
- Evolution: How the codebase is structured and why
The caching system ensures:
- Consistency: Identical inputs always produce identical outputs
- Speed: Subsequent runs are lightning fast
- Efficiency: Only changed content is re-analyzed
```shell
# Use different LLM providers
python main.py --repo-url <repo> --llm-mode openrouter
python main.py --repo-url <repo> --llm-mode local

# Override auto-detection
python main.py --repo-url <repo> --src-dir custom/source --docs-dir docs/

# Debug and optimize
python main.py --repo-url <repo> --save-debug-data --clear-cache
```

```shell
# Saves intermediate data for analysis
python main.py --repo-url <repo> --save-debug-data

# Generates:
#   .temp/debug_import_graph.json  - Code dependencies
#   .temp/debug_module_map.json    - Module structure
#   .temp/debug_docs_context.html  - Processed documentation
```

```shell
# Clear LLM cache for fresh analysis
python main.py --repo-url <repo> --clear-cache

# Cache directory: .cache/
# Each response cached by content hash for consistency
```

- Multi-language support: beyond Python to JavaScript, TypeScript, Go
- Cross-repository analysis: Understanding dependencies and relationships
- Evolutionary documentation: Tracking how codebases change over time
- Interactive Q&A: Natural language queries about the codebase
- Code quality insights: Automated suggestions for improvements
- Documentation validation: Detecting outdated or incorrect documentation
- IDE extensions: Real-time documentation in your editor
- CI/CD integration: Automated documentation updates
- Team collaboration: Shared understanding across development teams
The generated documentation includes:
A beautiful D3.js visualization showing:
- Directory structure with collapsible nodes
- Python modules with type indicators
- Configuration files with distinct styling
- Hover tooltips with AI-generated summaries
- Executive Overview: Perfect for stakeholders and new team members
- Code Examples: Curated, runnable examples for common tasks
- API Reference: Complete documentation with source code
- Navigation: Smooth scrolling, sticky navigation, responsive design
This project demonstrates the power of recursive AI analysis for code understanding. The architecture is designed for:
- Extensibility: easy to add new analysis types
- Testability: clear separation of concerns
- Scalability: efficient caching and incremental updates
- Beauty: modern, responsive, interactive output
MIT License - Feel free to use this as inspiration for your own documentation automation projects!
Built with ❤️ and recursive intelligence: each analysis makes the next one smarter.