johnkord/agents

Agents Research & Knowledge Base

A comprehensive research repository on AI agent design, engineering, and implementation.

Started: 2026-03-07 | Last updated: 2026-04-10


📖 Research Documents

Fundamentals

| # | Document | Description |
|---|---|---|
| 01 | What Are AI Agents? | First principles: definitions, the agent loop, workflows vs agents distinction (Anthropic), autonomy spectrum, failure modes, and the three eras of agent design |

Techniques

| # | Document | Description |
|---|---|---|
| 02 | Context Engineering | The core discipline: "the art of providing the right information in the right format at the right time" (Schmid). Context as a system, the seven principles, token budgets, pre-fetching, error compaction |
| 04 | Tool Use & Function Calling | ACI (Agent-Computer Interface), tool design principles, poka-yoke, tools as structured outputs, MCP protocol, and the tool explosion problem |
| 05 | Memory Systems | CoALA memory taxonomy (working, episodic, semantic, procedural), storage backends, retrieval strategies, consolidation patterns, and the stateless-reducer tension |
| 08 | Planning & Reasoning | Chain-of-thought, tree-of-thought, reflexion, hierarchical decomposition, adaptive re-planning, metacognition, and knowing when to stop |
| 10 | Language Selection for Agents | Language analysis for agent-generated code: static vs dynamic typing, GC vs manual memory, Rust, Go, C#, Java, TypeScript, compile times, and the context engineering framing |
| 11 | Self-Reflection & Verification | Self-correction, verification strategies, reflection loops, and when self-repair helps vs hurts |

Architecture Patterns

| # | Document | Description |
|---|---|---|
| 03 | Architecture Patterns | Workflows (prompt chaining, routing, parallelization) and Agents (ReAct, Plan-Execute, Autonomous), with Anthropic's simplicity-first principle and anti-patterns |
| 06 | Multi-Agent Systems | Communication topologies, human-as-tool-call, the framework landscape (2026), and why most teams move away from frameworks |

Evaluation

| # | Document | Description |
|---|---|---|
| 07 | Evaluation & Reliability | Benchmarking, LLM-as-judge, compounding error math, sandboxed testing, guardrails, and the reliability ladder |

🏗️ Blueprints

Implementation guides for building specific agent types:

| Blueprint | Description | Status |
|---|---|---|
| Generic Agent | Full implementation blueprint with agent loop, context assembler, tool executor, guardrails, and tools library | ✅ Complete |
| Research Agent | General-purpose research agent (.NET 10 + Microsoft Agent Framework) | ✅ Complete |
| Coding Agent (Forge) | SWE-bench-style coding agent with verification, guardrails, and session management | 🔧 In Progress |
| Life Agent | Audio lifelogging and life-augmentation agent | 🔧 In Progress |
| MCP Server | Model Context Protocol server implementation | 🔧 In Progress |

📚 Knowledge Base

Reference materials and taxonomies:

| Document | Description |
|---|---|
| Agent & Copilot Taxonomy | Comprehensive classification by autonomy level, domain, architecture, and design decision matrix |
| Research Paper Catalogue | 126 papers organized by theme with intellectual landscape analysis, core tensions, schools of thought, and timeline view |

Additional Knowledge Base

| Document | Description |
|---|---|
| Audio Lifelogging Research | Continuous recording, transcription, and memory augmentation |
| Human Wellness Research | What a life agent should track: health, habits, longevity |
| Long-Running Life Augmentation Agents | Persistent, cloud-hosted, proactive AI agents for daily life |

Guides

| Document | Description |
|---|---|
| GitHub Copilot Customization Guide | Custom instructions, agent skills, prompt files, MCP servers, and hooks |

📄 Research Papers (126 papers)

126 academic papers downloaded from arXiv and converted to Markdown via Docling. Full conversions are in papers/docling/. See the Research Paper Catalogue for the complete organized listing with intellectual landscape analysis.

Below are highlights from the foundational papers:

Foundational Agent Frameworks

| Paper | Year | Key Contribution |
|---|---|---|
| Chain-of-Thought Prompting | 2022 | Foundational reasoning technique; emergent ability at scale; CoT as context engineering |
| ReAct: Synergizing Reasoning and Acting | 2022 | Interleaved Thought→Action→Observation loop; foundation for most modern agents |
| Reflexion: Verbal Reinforcement Learning | 2023 | Self-reflection as episodic memory; 91% HumanEval without weight updates |
| Tree of Thoughts | 2023 | Tree search over reasoning paths; 4%→74% on Game of 24 |
| CoALA: Cognitive Architectures for Language Agents | 2023 | Formal framework unifying all agent architectures; memory taxonomy |
| LATS: Language Agent Tree Search | 2023 | MCTS + LM agents; 92.7% HumanEval |
| Generative Agents: Stanford Smallville | 2023 | 25 agents in a sandbox; memory streams, reflection, emergent social behavior |
| Voyager: Lifelong Learning Agent | 2023 | First LLM-powered embodied lifelong learner; skill library as code; 3.3x more items discovered |
| DSPy: Compiling Declarative LM Pipelines | 2023 | Programming model replacing prompt engineering with modules + compiler; automated context engineering |
| MemGPT: LLMs as Operating Systems | 2023 | OS-inspired virtual context management; self-directed memory paging |
| SELF-DISCOVER: Self-Composed Reasoning | 2024 | LLMs self-compose task-specific reasoning structures; +32% over CoT; 10-40x fewer inference calls |
| Brittle Foundations of ReAct | 2024 | Critical analysis: ReAct gains come from exemplar similarity, not interleaved reasoning |

Tool Use & Interface Design

| Paper | Year | Key Contribution |
|---|---|---|
| Toolformer | 2023 | Self-supervised tool learning; 6.7B model matches 175B GPT-3 with tools |
| Gorilla: LLM Connected with Massive APIs | 2023 | Retrieval-augmented API calling; smaller model + docs beats larger model |
| SWE-agent: Agent-Computer Interface | 2024 | ACI design principles; interface design > prompt optimization |
| DynaSaur: Dynamic Action Creation | 2024 | Agents write new tools as code; growing action library |
| OpenHands: Open Platform for AI Software Developers | 2024 | CodeAct paradigm (code as universal action); sandboxed execution; 53%+ SWE-bench |

Multi-Agent Systems

| Paper | Year | Key Contribution |
|---|---|---|
| MetaGPT: Multi-Agent with SOPs | 2023 | SOPs from human software engineering; document-centric communication; ~60% reduced hallucination cascading |
| AutoGen: Multi-Agent Conversation | 2023 | ConversableAgent abstraction; conversation patterns (two-agent, sequential, group, nested); human-in-the-loop native |
| Mixture-of-Agents (MoA) | 2024 | Layered multi-model architecture; open-source ensemble beats GPT-4 |
| ADAS: Automated Design of Agentic Systems | 2024 | Meta agent discovers novel agent architectures; 7-9% improvement over hand-designed |
| Agent Protocols Survey | 2025 | First comprehensive survey of agent communication protocols; MCP + A2A convergence |

Memory & Context

| Paper | Year | Key Contribution |
|---|---|---|
| Pensieve: Stateful Context Management | 2026 | StateLM for self-directed memory management; 83.9% on needle-in-a-haystack at 2M tokens |
| Active Context Compression | 2026 | 22.7% token savings with 0% accuracy loss via information-density-preserving compression |
| SWE-Pruner: Context Pruning for Code Agents | 2026 | Goal-driven pruning cuts 23-54% of tokens while improving accuracy |

Evaluation & Benchmarks

| Paper | Year | Key Contribution |
|---|---|---|
| SWE-bench: Real GitHub Issues | 2023 | 2,294 real software engineering tasks; best model went from 1.96% → 53%+ with agentic tools |
| AgentBench: Evaluating LLMs as Agents | 2023 | 8-environment benchmark; massive gap between commercial and open-source LLMs as agents |
| AgentBoard | 2024 | Progress rate metric; fine-grained multi-turn evaluation across 9 environments |
| OSWorld: Real Computer Benchmarks | 2024 | Real OS environments; best model 12.24% vs human 72.36%; GUI grounding is the bottleneck |
| τ-bench: Tool-Agent-User Interaction | 2024 | pass^k reliability metric; even GPT-4o <50%; by ReAct/Reflexion authors |
| Agent-as-a-Judge | 2024 | Evaluate agents with agents; ~0.85 human correlation vs 0.65 for LLM-as-a-Judge |
| Agentic RAG Survey | 2025 | Agentic design patterns (reflection, planning, tool use) applied to RAG pipelines |
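
To make the pass^k entry above concrete, here is a minimal sketch. It assumes pass^k means "the probability that all k independent trials of the same task succeed", estimated from n recorded trials with c successes as C(c, k) / C(n, k). The function name `pass_hat_k` and its parameters are hypothetical and are not taken from any benchmark harness.

```python
from math import comb


def pass_hat_k(successes: int, trials: int, k: int) -> float:
    """Estimate the chance that k trials drawn from the recorded ones all succeed.

    Assumed estimator: C(successes, k) / C(trials, k).
    """
    if not 0 < k <= trials:
        raise ValueError("k must satisfy 0 < k <= trials")
    if not 0 <= successes <= trials:
        raise ValueError("successes must satisfy 0 <= successes <= trials")
    # math.comb returns 0 when k > successes, so the estimate drops to 0.0
    # as soon as k exceeds the number of observed successes.
    return comb(successes, k) / comb(trials, k)


if __name__ == "__main__":
    # A task solved in 6 of 8 recorded trials.
    for k in (1, 2, 4, 8):
        print(k, round(pass_hat_k(6, 8, k), 3))
```

With k = 1 this reduces to the plain success rate; larger k penalizes inconsistency, which is why an agent that looks adequate on a single attempt can still score poorly on reliability.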

Surveys & Taxonomies

| Paper | Year | Key Contribution |
|---|---|---|
| LLM-Based Agents Survey (Xi et al.) | 2023 | Comprehensive 86-page survey; Brain-Perception-Action framework |
| Autonomous Agents Survey (Wang et al.) | 2023 | Unified framework: Profiling + Memory + Planning + Action modules; applications survey |

Deep Research & Verification

| Paper | Year | Key Contribution |
|---|---|---|
| Deep Research Survey | 2025 | Comprehensive survey of deep research systems |
| DeepVerifier: Self-Evolving Verification | 2026 | Decompose verification into sub-questions per failure taxonomy |

📝 Research Notes & Design Documents

Internal design documents and experiment observations from building agents:

| Document | Description |
|---|---|
| MCP Server Transport Modes | Critical comparison of stdio, HTTP, and Streamable HTTP transport modes |
| Improving Task Tracking | Research and design for Forge's task tracking system |
| PDF-to-Markdown Tools Comparison | Evaluation of tools for converting academic papers |
| Operational Observations Design | Design for tracking agent operational patterns |
| Phase 5 Experiment Observations | Findings from Forge coding agent experiments |
| Phase 5A: Hypothesis-Driven Debugging | Design for structured debugging approach |
| Phase 5B: Proactive Clarification | Design for agents that ask before acting |
| Phase 5C: Episodic Consolidation | Design for session memory consolidation |
| Phase 6D: Trajectory Analysis | Design for progress metrics and trajectory analysis |

🗺️ Reading Order

If you're new to agent building, read in this order:

  1. What Are Agents? - Foundations and the workflows-vs-agents distinction
  2. Context Engineering - The most important skill ("agent failures are context failures")
  3. Architecture Patterns - Know your options; start with the simplest
  4. Tool Use - How agents take action; invest more here than in prompts
  5. Memory Systems - How agents remember
  6. Planning & Reasoning - How agents think
  7. Multi-Agent Systems - Scaling with teams (but try single-agent first)
  8. Evaluation - Making agents reliable
  9. Implementation Blueprint - Build one
  10. Language Selection - Choosing the right language for agent-generated code

🔑 Key Insights (Quick Reference)

Core Mental Models

  • Agents = LLM + Tools + Loop + Memory + Goal, but most production "agents" are actually workflows (predefined code paths with LLM steps)
  • Context engineering > prompt engineering: context is a system, not a string (Schmid)
  • Agent failures are context failures, not model failures: Anthropic spent more time optimizing tools than prompts for SWE-bench
  • ACI (Agent-Computer Interface) > UI: tool descriptions, error messages, and output formats are the agent's interface
  • The model is the engine, context is the fuel: garbage in, garbage out, regardless of model capability
  • Own your prompts, own your context window, own your control flow (12-Factor Agents)
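
As a rough illustration of "Agents = LLM + Tools + Loop + Memory + Goal", the sketch below wires those five parts together in a plain loop. It is not the blueprint's implementation: `scripted_model` is a hypothetical stand-in for a real LLM call, and the tuple protocol (`"call"` / `"finish"`) is an assumption made only for this example.

```python
from typing import Callable

# Assumed protocol: the model returns (kind, target, arg), where kind is
# "call" (target = tool name) or "finish" (target = final answer).
Decision = tuple[str, str, str]


def scripted_model(goal: str, memory: list[str]) -> Decision:
    """Hypothetical stand-in for an LLM: call one tool, then finish."""
    if not memory:
        return ("call", "add", "2,3")
    return ("finish", f"{goal} -> {memory[-1]}", "")


def add(arg: str) -> str:
    """Single-purpose tool: add two comma-separated integers."""
    a, b = (int(part) for part in arg.split(","))
    return str(a + b)


def run_agent(
    goal: str,
    model: Callable[[str, list[str]], Decision],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    memory: list[str] = []  # working memory: observations so far
    for _ in range(max_steps):  # the loop, with an explicit step budget
        kind, target, arg = model(goal, memory)
        if kind == "finish":
            return target
        tool = tools.get(target)
        if tool is None:
            # Feed the error back as an observation instead of crashing.
            observation = f"error: unknown tool {target!r}; available: {sorted(tools)}"
        else:
            observation = tool(arg)
        memory.append(observation)
    return "stopped: step budget exhausted"


if __name__ == "__main__":
    print(run_agent("sum", scripted_model, {"add": add}))
```

The control flow stays in ordinary code that the caller owns; swapping `scripted_model` for a real model call would not change the loop itself.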

The One-Liners

  • Build for failure first, success second: 95% per-step accuracy over 10 steps gives $0.95^{10} \approx 60\%$ end-to-end
  • Start as a copilot, graduate to autonomous: trust must be earned through evidence from evals
  • Use the simplest architecture that achieves your reliability goals: don't use a framework where a while-loop suffices (Anthropic)
  • Tool descriptions are a form of context engineering: they're the primary way you communicate intent to the model
  • Memory is the scaffolding that turns a stateless function into a stateful agent
  • The evaluator's rubric IS the specification
  • Working memory is the most underappreciated memory type
  • The handoff summary is the most important piece of multi-agent communication
  • Metacognition turns a reactive system into a self-monitoring system: the best tool is sometimes request_clarification
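
The compounding-error figure in the first one-liner can be checked with a few lines of arithmetic. This sketch assumes every step succeeds independently with the same probability, which is a simplification; the function names are illustrative.

```python
import math


def end_to_end_success(per_step: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds."""
    return per_step ** steps


def max_steps_for_target(per_step: float, target: float) -> int:
    """Largest step count whose end-to-end success rate stays at or above `target`."""
    if not (0.0 < per_step < 1.0 and 0.0 < target <= 1.0):
        raise ValueError("expected 0 < per_step < 1 and 0 < target <= 1")
    return math.floor(math.log(target) / math.log(per_step))


if __name__ == "__main__":
    print(round(end_to_end_success(0.95, 10), 4))  # 0.5987
    print(max_steps_for_target(0.95, 0.5))         # 13
```

Under these assumptions, a 95%-reliable step can be chained about 13 times before the whole run becomes less likely to succeed than to fail.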

The Design Principles

  1. Relevance over recency - Not everything recent is relevant
  2. Compression without loss - Fit more signal into fewer tokens
  3. Structure signals intent - How you format context changes how the model uses it
  4. Examples > instructions - Show, don't tell
  5. Single responsibility tools - One tool, one job
  6. Rich error messages - Compact errors into context; the error IS the model's feedback
  7. Verify after modify - Always check your work
  8. Graceful degradation - Agents that say "I can't" are better than agents that hallucinate
  9. Human contact is a tool call - Not a failure state, but a smart decision (12-Factor)
  10. Poka-yoke your tools - Make incorrect usage impossible (absolute paths > relative paths)
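
Principles 5, 6, and 10 can be sketched together in one small tool. The example is hypothetical: `read_file_tool` and its in-memory `files` store are invented for illustration and are not part of the repository's tools library.

```python
from pathlib import PurePosixPath


def read_file_tool(path: str, files: dict[str, str]) -> dict:
    """Hypothetical single-purpose tool: read one file from an in-memory store."""
    # Poka-yoke: reject relative paths up front so the model cannot depend
    # on an implicit working directory.
    if not PurePosixPath(path).is_absolute():
        return {
            "ok": False,
            "error": f"path must be absolute, got {path!r}; example: '/repo/README.md'",
        }
    # Rich error: say what went wrong and what the valid options are.
    if path not in files:
        return {
            "ok": False,
            "error": f"no such file: {path!r}; known files: {sorted(files)}",
        }
    return {"ok": True, "content": files[path]}


if __name__ == "__main__":
    store = {"/repo/README.md": "# agents"}
    print(read_file_tool("README.md", store))
    print(read_file_tool("/repo/README.md", store))
```

Each error names both the problem and a way forward, so the message itself becomes usable context for the model's next step.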

Directory Structure

agents/
├── README.md                          ← You are here
├── LICENSE                            ← MIT License
├── .github/
│   ├── copilot-instructions.md        ← Copilot custom instructions
│   └── skills/
│       ├── convert-paper/SKILL.md     ← Paper conversion skill
│       ├── forge-improve/SKILL.md     ← Forge session analysis skill
│       └── research-agent-investigation/SKILL.md
├── research/
│   ├── fundamentals/
│   │   └── 01-what-are-agents.md
│   ├── techniques/
│   │   ├── 02-context-engineering.md
│   │   ├── 04-tool-use-function-calling.md
│   │   ├── 05-memory-systems.md
│   │   ├── 08-planning-reasoning.md
│   │   ├── 10-language-selection-for-agents.md
│   │   └── 11-self-reflection-verification.md
│   ├── patterns/
│   │   ├── 03-architecture-patterns.md
│   │   └── 06-multi-agent-systems.md
│   ├── evaluation/
│   │   └── 07-evaluation-reliability.md
│   └── copilot-customization-guide.md
├── knowledge-base/
│   ├── agent-taxonomy.md
│   ├── paper-catalogue.md             ← 126 papers with intellectual landscape
│   ├── audio-lifelogging-research.md
│   ├── human-wellness-research.md
│   └── long-running-life-augmentation-agents.md
├── papers/
│   ├── pdfs/                          ← Source PDFs from arXiv
│   └── docling/                       ← Markdown conversions (126 papers)
├── blueprints/
│   ├── generic-agent/
│   │   └── 09-implementation-blueprint.md
│   ├── research-agent/               ← .NET 10 + Microsoft Agent Framework
│   ├── coding-agent/                 ← Forge coding agent (in progress)
│   ├── life-agent/                   ← Audio lifelogging agent (in progress)
│   └── mcp-server/                   ← MCP server implementation (in progress)
└── scripts/
    └── convert_papers.py              ← arXiv download + Docling pipeline

🔮 Roadmap

Phase 1: Research Foundation ✅

  • 11 research documents covering fundamentals, techniques, patterns, and evaluation
  • Agent & Copilot taxonomy
  • Generic agent implementation blueprint (Python)

Phase 2: Academic Research Integration ✅

  • 126 research papers collected from arXiv (2022–2026)
  • Automated PDF → Markdown pipeline via Docling
  • Research paper catalogue with intellectual landscape analysis
  • Research findings integrated throughout knowledge base documents

Phase 3: Implementation Blueprints (In Progress)

  • Research Agent - .NET 10 + Microsoft Agent Framework (complete, clean build)
  • Language selection analysis for agent-generated code
  • Copilot customization guide
  • Coding agent (Forge) - verification, guardrails, session management
  • Life agent - audio lifelogging and life-augmentation
  • MCP server - Model Context Protocol implementation
  • Multi-agent orchestrator

Phase 4: Future

  • Evaluation harness and benchmark suite
  • MCP server implementations for common tools
  • Self-improving agents (learning from trajectories)
  • Cost optimization strategies
