An advanced prompt engineering laboratory powered by Claude
Prompt Forge Studio is a desktop/web application that transforms prompt engineering from trial-and-error into a systematic, data-driven process. Using Claude's meta-cognitive capabilities, it analyzes, optimizes, tests, and refines system prompts through intelligent feedback loops.
- Intelligent Prompt Editor: Full-featured editor with version control and history tracking
- Multi-Dimensional Quality Analysis: Automated analysis across clarity, completeness, efficiency, and safety dimensions
- Automated Testing Suite: Create test cases and run A/B comparisons between prompt versions
- Version Management: Git-like versioning system with diff visualization and rollback capabilities
- Variant Generation: AI-powered generation of optimized prompt variants
- Performance Metrics: Track quality scores and test performance across iterations
The system analyzes prompts across multiple dimensions:
- Clarity: Identifies ambiguities and unclear instructions
- Completeness: Detects missing edge cases and logical gaps
- Efficiency: Finds redundancies and optimization opportunities
- Safety: Evaluates ethical considerations and potential risks
- Comprehensive: Overall quality assessment with prioritized recommendations
- Python 3.8 or higher
- Anthropic API key (Get one here)
- Clone the repository:

```bash
git clone https://github.com/yourusername/PromptForge.git
cd PromptForge
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Configure your API key:

```bash
cp .env.example .env
# Edit .env and add your Anthropic API key
```

- Run the application:

```bash
python run.py
```

Or directly with Streamlit:

```bash
streamlit run src/ui/app.py
```

The application will open in your default browser at http://localhost:8501
- Start the Application: Launch Prompt Forge Studio
- Configure API: Enter your Anthropic API key in the sidebar
- Create New Prompt: Click "New Prompt" in the sidebar
- Write Your Prompt: Use the editor to craft your system prompt
- Analyze Quality: Click "Analyze Quality" to get comprehensive feedback
- Iterate and Improve: Use the insights to refine your prompt
The Analysis page offers multiple analysis types:
- Comprehensive: Runs all analysis dimensions and shows a quality radar chart
- Clarity: Focuses on ambiguity and precision
- Completeness: Identifies missing edge cases
- Efficiency: Suggests token optimizations
- Safety: Evaluates ethical and safety concerns
Each analysis provides:
- Numeric quality score (0-100)
- Specific issues found
- Concrete improvement suggestions
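A result with these three parts can be modeled as a small data class. The sketch below is illustrative only (the app's actual schema lives in `src/core/prompt.py` and may differ); note how the score is clamped to the documented 0-100 range:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AnalysisResult:
    """One analysis run for a single dimension (illustrative; not the app's actual schema)."""
    dimension: str            # e.g. "clarity", "safety", "comprehensive"
    score: int                # numeric quality score, 0-100
    issues: List[str] = field(default_factory=list)       # specific problems found
    suggestions: List[str] = field(default_factory=list)  # concrete improvements

    def __post_init__(self) -> None:
        # Keep the score inside the documented 0-100 range.
        self.score = max(0, min(100, self.score))
```

A clarity analysis might then be stored as `AnalysisResult("clarity", 72, issues=[...], suggestions=[...])`.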
- Create Test Cases: Define inputs and evaluation criteria
- Run Tests: Execute all test cases against your prompt
- Review Results: See scores, outputs, and AI evaluations
- Compare Versions: Test multiple prompt versions side-by-side
- Save Versions: Create new versions with descriptive notes
- View History: Browse all previous versions
- Load Versions: Restore any previous version
- Compare: See what changed between versions
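The "Compare" step can be sketched with Python's standard `difflib`, which produces the familiar git-style unified diff (a minimal sketch; the app's actual diff visualization may work differently):

```python
import difflib

def diff_versions(old: str, new: str, old_label: str = "v1", new_label: str = "v2") -> str:
    """Return a unified diff between two prompt versions (sketch of the compare feature)."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=old_label,
        tofile=new_label,
    )
    return "".join(lines)
```

Removed lines are prefixed with `-` and added lines with `+`, just as in `git diff`.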
```
PromptForge/
├── src/
│   ├── api/
│   │   └── anthropic_client.py   # API client with retry logic
│   ├── core/
│   │   ├── analyzer.py           # Quality analysis engine
│   │   ├── prompt.py             # Data models
│   │   └── tester.py             # Testing system
│   ├── db/
│   │   └── database.py           # SQLite database manager
│   ├── ui/
│   │   └── app.py                # Streamlit interface
│   └── config.py                 # Configuration management
├── requirements.txt
├── .env.example
├── run.py
└── README.md
```
- Backend: Python 3.8+, SQLAlchemy, Pydantic
- Frontend: Streamlit, Plotly
- AI: Anthropic Claude API (Sonnet 4.5)
- Database: SQLite
Create a `.env` file with the following:

```bash
# Anthropic API Configuration
ANTHROPIC_API_KEY=your_api_key_here

# Default Model Settings
DEFAULT_MODEL=claude-sonnet-4-5-20250929
ANALYSIS_MODEL=claude-sonnet-4-5-20250929

# API Limits
MAX_TOKENS=4096
TEMPERATURE=1.0

# Database
DATABASE_PATH=./promptforge.db

# App Settings
DEBUG=false
```

Edit src/config.py to customize:
- Retry logic parameters
- Model selections
- Token limits
- Database location
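A settings loader for the `.env` values above might look like the following sketch, which falls back to the documented defaults (the function name and dict keys are hypothetical; the real `src/config.py` may differ):

```python
import os

def load_settings() -> dict:
    """Read PromptForge settings from the environment, with the documented defaults."""
    return {
        "model": os.environ.get("DEFAULT_MODEL", "claude-sonnet-4-5-20250929"),
        "max_tokens": int(os.environ.get("MAX_TOKENS", "4096")),
        "temperature": float(os.environ.get("TEMPERATURE", "1.0")),
        "database_path": os.environ.get("DATABASE_PATH", "./promptforge.db"),
        "debug": os.environ.get("DEBUG", "false").lower() == "true",
    }
```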
- Production development: Create and refine system prompts for production AI applications with confidence.
- Optimization: Systematically improve existing prompts by identifying and fixing weaknesses.
- Quality assurance: Ensure prompts meet quality standards before deployment.
- A/B comparison: Compare different prompt approaches with quantitative metrics.
- Learning: Learn prompt engineering best practices through AI-powered feedback.
Scenario: Creating a medical report analyzer

1. Initial Creation
   - Create new prompt "Medical Report Analyzer v1"
   - Write initial system prompt with basic instructions
   - Add medical ethics component from library
2. Quality Analysis
   - Run comprehensive analysis
   - Discover missing edge-case handling
   - Get suggestions for improving clarity
3. Generate Variants
   - Request robustness-focused variants
   - Review 3 AI-generated alternatives
   - Select the most promising variant
4. Testing
   - Create test cases with sample reports
   - Include edge cases (incomplete data, ambiguous results)
   - Run tests across all variants
5. Selection and Refinement
   - Compare test results
   - Select the best-performing variant (23% better on edge cases)
   - Make final tweaks
   - Save as v2
6. Export
   - Export the finalized prompt for production use
   - Document test results and decisions
The analyzer uses specialized meta-prompts to evaluate your prompts:
- Automated Scoring: Each dimension receives a 0-100 score
- Issue Detection: Specific problems are identified with examples
- Actionable Suggestions: Concrete recommendations for improvement
- Historical Tracking: All analyses are saved for longitudinal comparison
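A meta-prompt of this kind could be assembled as in the sketch below. The wording and function name are hypothetical (the app's actual meta-prompts live in `src/core/analyzer.py`); the point is the shape: the reviewer role, the requested output format, and the prompt under evaluation wrapped in delimiters:

```python
def build_meta_prompt(dimension: str, prompt_text: str) -> str:
    """Compose an analysis meta-prompt asking the model to grade one dimension."""
    return (
        f"You are a prompt-engineering reviewer. Evaluate the system prompt below "
        f"for {dimension}. Respond with:\n"
        f"1. Score: <0-100>\n"
        f"2. Issues: a bulleted list of specific problems, with examples\n"
        f"3. Suggestions: concrete improvements\n\n"
        f"<prompt>\n{prompt_text}\n</prompt>"
    )
```

The resulting string would be sent to the Anthropic API via the client in `src/api/anthropic_client.py`.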
- Flexible Test Cases: Define inputs with expected outputs or evaluation criteria
- AI-as-Judge: Claude evaluates outputs based on your criteria
- Batch Testing: Run all tests with one click
- Comparative Analysis: Side-by-side comparison of different prompts
- Score Tracking: Historical performance metrics
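Since the judge model replies in free text, the numeric score has to be parsed out before it can be tracked. A minimal, assumed parser (the real extraction logic in `src/core/tester.py` may differ):

```python
import re
from typing import Optional

def extract_score(judge_reply: str) -> Optional[int]:
    """Pull a 0-100 score out of a judge model's free-text reply (illustrative parser)."""
    match = re.search(r"Score:\s*(\d{1,3})", judge_reply, re.IGNORECASE)
    if not match:
        return None
    # Clamp to the valid range in case the model over- or under-shoots.
    return max(0, min(100, int(match.group(1))))
```

Returning `None` on a missing score lets the caller flag the test run for manual review instead of recording a bogus zero.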
Generate optimized variants focusing on:
- Clarity: Maximum precision and explicitness
- Conciseness: Token-optimized versions
- Robustness: Edge case handling
- Balanced: Overall quality improvement
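Each focus maps naturally to a different rewriting instruction for the model. The wording below is a hypothetical sketch, not the app's actual variant prompts:

```python
VARIANT_FOCUS = {
    "clarity": "Rewrite for maximum precision and explicitness; remove every ambiguity.",
    "conciseness": "Rewrite to minimize tokens while preserving all behavior.",
    "robustness": "Rewrite to handle edge cases, malformed input, and adversarial use.",
    "balanced": "Rewrite for the best overall quality across all dimensions.",
}

def build_variant_request(focus: str, prompt_text: str) -> str:
    """Compose the instruction sent to the model when generating a variant."""
    if focus not in VARIANT_FOCUS:
        raise ValueError(f"unknown focus: {focus}")
    return f"{VARIANT_FOCUS[focus]}\n\nOriginal prompt:\n{prompt_text}"
```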
- API keys are stored securely in environment variables
- Local SQLite database keeps all data on your machine
- No data is sent to third parties except Anthropic API
- Prompts may contain sensitive information; keep your API key and local database secure
- Component library with reusable prompt blocks
- Evolutionary optimization mode (genetic algorithms)
- Multi-model testing (GPT-4, Gemini comparison)
- Export to multiple formats (Python, TypeScript, etc.)
- Collaborative features and prompt sharing
- Advanced visualization and analytics
- CI/CD integration for automated testing
- Production monitoring integration
Contributions are welcome! Areas of interest:
- Additional analysis dimensions
- New testing capabilities
- UI/UX improvements
- Documentation
- Example prompts and use cases
MIT License - see LICENSE file for details
- Built with Anthropic Claude
- UI powered by Streamlit
- Inspired by the prompt engineering community
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Anthropic Prompt Engineering Guide
- Best Practices for System Prompts
- Testing and Evaluation Strategies
Built with ❤️ using Claude to improve Claude
Meta-recursion at its finest