Secure autograder for LLM agent development and evaluation
Warning: This project is currently in alpha development and not ready for production use. Core features are being actively developed and the API may change significantly.
Gowlin is transitioning to a framework-agnostic evaluation platform for LLM agents. This alpha release lays the foundation for integrating open-source tools such as LLM-Sandbox and DeepEval, starting with an initial framework-adapter architecture.
- Framework Adapter Architecture: Base classes for framework-specific evaluation
- LangChain Detection: Basic pattern matching for LangChain usage
- Adapter Structure: Foundation for LLM-Sandbox and DeepEval integration
- Modular Design: Separated concerns between frameworks, sandboxing, and evaluation
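The adapter architecture above could be sketched roughly as follows. This is a minimal illustration only: the class names `FrameworkAdapter` and `FrameworkEvaluationResult` come from this README, but the method names (`detect`, `evaluate`) and fields shown here are assumptions, not the actual API in `src/gowlin/frameworks/base.py`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class FrameworkEvaluationResult:
    """Data model for one framework-specific evaluation run (fields assumed)."""
    framework: str
    detected: bool
    score: float = 0.0
    details: dict = field(default_factory=dict)


class FrameworkAdapter(ABC):
    """Abstract base class each framework adapter would subclass."""

    name: str = "unknown"

    @abstractmethod
    def detect(self, source: str) -> bool:
        """Return True if the submission appears to use this framework."""

    @abstractmethod
    def evaluate(self, source: str) -> FrameworkEvaluationResult:
        """Run framework-specific checks against the submission source."""
```

A concrete adapter (e.g. for LangChain) would subclass `FrameworkAdapter` and implement both methods; the evaluation pipeline can then treat all frameworks uniformly.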
- Full LLM-Sandbox integration for secure execution
- DeepEval metrics implementation
- Additional framework adapters (AutoGen, CrewAI)
- Production readiness scoring
Gowlin is not yet published to PyPI. Install from source for development:
git clone https://github.com/aidevelopertraining/gowlin.git
cd gowlin
./scripts/setup-dev.sh
source .venv/bin/activate
# Submit a solution
gowlin submit solution.py --mission hello-agent
# Check evaluation status
gowlin status
# View detailed results
gowlin results --detailed
Current implementation structure:
src/gowlin/
├── frameworks/                  # Framework adapter system
│   ├── base.py                  # Abstract base classes
│   └── langchain_adapter.py     # Basic LangChain detection
├── integrations/                # Placeholder adapters
│   ├── llm_sandbox_adapter.py   # LLM-Sandbox stub
│   └── deepeval_adapter.py      # DeepEval stub
└── [existing modules]
- FrameworkAdapter: Abstract base class for framework-specific adapters
- FrameworkEvaluationResult: Data model for evaluation results
- LangChainAdapter: Basic framework detection (patterns only)
- Integration Stubs: Placeholder classes for LLM-Sandbox and DeepEval
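To illustrate what "detection only, patterns only" means in practice, a LangChain check can be as simple as scanning submitted source for known import patterns. The specific regexes below are assumptions for illustration, not the patterns Gowlin actually ships:

```python
import re

# Hypothetical import patterns that would indicate LangChain usage.
LANGCHAIN_PATTERNS = [
    re.compile(r"^\s*import\s+langchain\b", re.MULTILINE),
    re.compile(r"^\s*from\s+langchain(?:_\w+)?\s+import\b", re.MULTILINE),
]


def detect_langchain(source: str) -> bool:
    """Return True if any known LangChain import pattern matches the source."""
    return any(pattern.search(source) for pattern in LANGCHAIN_PATTERNS)
```

Note that this only answers "does the submission use LangChain?"; it performs no evaluation of agent behavior, which is why the status list marks the adapter as detection-only.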
Current Phase: Transitioning to open-source integrations
- ✓ Framework adapter architecture (base classes)
- ✓ Basic project structure reorganization
- ✓ Placeholder integration adapters
- ⚠ LangChain adapter (detection only, no evaluation)
- ⚠ Integration implementations (stubs only)
- ✗ LLM-Sandbox integration
- ✗ DeepEval integration
- ✗ AutoGen, CrewAI adapters
- ✗ Evaluation logic
- Python 3.12+ (required for consistency with the CI security checks)
- Docker (for future sandbox testing)
- Firecracker (planned for production sandboxing)
The new dependencies (llm-sandbox, deepeval) are commented out in requirements.txt due to version conflicts with existing packages; they will be re-enabled once the integrations are implemented.
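In requirements.txt, the deferred entries might look something like this (illustrative only; the actual file and any version pins may differ):

```
# Deferred until the integrations are implemented (version conflicts):
# llm-sandbox
# deepeval
```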
git clone https://github.com/aidevelopertraining/gowlin.git
cd gowlin
./scripts/setup-dev.sh
source .venv/bin/activate
pytest
We welcome contributions. Please read our Contributing Guide for details on our development process and code of conduct.
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Submit a pull request
Security features are planned through LLM-Sandbox integration (not yet implemented).
For security vulnerabilities, please email security@aidevelopertraining.com.
Licensed under the Apache License 2.0. See LICENSE for details.
Built by the AI Developer Training community