A customer engineering application that automatically triages and responds to GitHub issues on public repositories (defaulting to LangChain). The application classifies issues, assigns severity, and provides relevant documentation-based responses. It also supports comprehensive evaluation using LangSmith.
The application processes support queries through a series of chains (a minimal code sketch follows the list):
- Issue Type Classification: Identifies if the query is a bug report, feature request, or support question
- Severity Assessment: Assigns a severity score (1-4) based on impact
- Category Classification: Categorizes the query (setup, chains, agents, memory, retrieval, other)
- Documentation Retrieval: Finds relevant documentation from LangChain docs
- Response Generation: Provides a helpful response with documentation links
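
As a rough illustration of the first three steps, the classification stages can be expressed as a single structured-output chain. This is a minimal sketch assuming `langchain-openai` is installed; the schema fields, model name, and prompt wording are illustrative, not the exact definitions in `chains.py`:

```python
# Minimal sketch of the triage stages as one structured-output chain.
# Field names, model, and prompt are assumptions, not the real chains.py.
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

class Triage(BaseModel):
    issue_type: str = Field(description="bug report | feature request | support question")
    severity: int = Field(description="severity score from 1 to 4, based on impact")
    category: str = Field(description="setup | chains | agents | memory | retrieval | other")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You triage GitHub issues for the LangChain repository."),
    ("human", "Classify this issue:\n\n{issue_text}"),
])
triage_chain = prompt | llm.with_structured_output(Triage)

result = triage_chain.invoke({"issue_text": "pip install langchain fails on Python 3.12"})
print(result.issue_type, result.severity, result.category)
```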
The application processes issues from a CSV file (`langchain_issues_dataset.csv`) built by pulling issues from the GitHub API. Each issue is analyzed and classified, with support questions receiving documentation-based responses.
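
The per-issue loop might look like the following, reusing `triage_chain` from the sketch above; the `description` and `url` column names are assumptions based on the dataset notes at the end of this README:

```python
# Illustrative loop over the dataset; column names are assumed.
import pandas as pd

issues = pd.read_csv("langchain_issues_dataset.csv")
for _, row in issues.iterrows():
    result = triage_chain.invoke({"issue_text": row["description"]})
    print(row["url"], "->", result.issue_type, result.severity)
```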
- Create a `.env` file with your API keys:

```
OPENAI_API_KEY=your_openai_key
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT=your_project_name
GITHUB_TOKEN=your_github_token  # Optional, for higher rate limits
```
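
If `python-dotenv` is among the dependencies (an assumption; check `requirements.txt`), the scripts can load these variables with:

```python
from dotenv import load_dotenv
load_dotenv()  # reads .env from the working directory into os.environ
```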
- Install dependencies:

```
pip install -r requirements.txt
```

- Run `python chains.py` to process issues from the dataset and print out the triage and response results for each issue.
- Run `python eval.py` to execute the comprehensive evaluation workflow using LangSmith, applying multiple LLM-as-a-judge evaluators to all outputs.
- After completion, a link to the experiment will be printed. Open this link to view and compare results in the LangSmith UI.
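
In outline, `eval.py` likely follows the LangSmith `evaluate()` pattern sketched below. This is a hedged sketch, not the actual script: the dataset name, experiment prefix, and output keys are assumptions, and only one evaluator is shown:

```python
# Sketch of the LangSmith evaluation pattern; names are illustrative.
from langsmith import evaluate

def issue_type_accuracy(run, example):
    # Compare the predicted issue type against the dataset label.
    predicted = run.outputs.get("issue_type")
    expected = example.outputs.get("issue_type")
    return {"key": "issue_type_accuracy", "score": int(predicted == expected)}

def target(inputs: dict) -> dict:
    result = triage_chain.invoke(inputs)  # triage_chain from the sketch above
    return result.model_dump()            # expose fields as output keys

results = evaluate(
    target,
    data="langchain-issues",  # LangSmith dataset name (assumed)
    evaluators=[issue_type_accuracy],
    experiment_prefix="issue-triage",
)
```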
- Run `python vectorize_docs.py` to build or update the vector store of LangChain documentation used for retrieval in the main chain.
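
One plausible shape for that script, assuming FAISS and OpenAI embeddings (the actual loader, splitter, and store in `vectorize_docs.py` may differ):

```python
# Hypothetical vector-store build; source URL and parameters are assumed.
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = WebBaseLoader("https://python.langchain.com/docs/introduction/").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)
FAISS.from_documents(chunks, OpenAIEmbeddings()).save_local("langchain_docs_index")
```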
The application includes a comprehensive evaluation system with multiple evaluators:
- Issue Type Accuracy: Evaluates the accuracy of issue type classification
- Severity Accuracy: Assesses the correctness of severity assignments
- Response Action Accuracy: Evaluates if the response correctly addresses the issue
- Tone Appropriateness: Assesses professionalism, empathy, clarity, and positivity
- Response Completeness: Evaluates technical details, explanation quality, and references
- Technical Accuracy: Assesses code references, documentation usage, and terminology
- Relevance Score: Evaluates how relevant retrieved documents are to the issue
- Coverage Score: Assesses if retrieved documents cover all necessary information
All evaluators use LLM-as-a-judge for robust, context-aware scoring and provide detailed explanations for their assessments.
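
A single LLM-as-a-judge evaluator in this style might look like the sketch below. GPT-4 as the judge model comes from the notes at the end of this README; the prompt wording, score scale, and output key names are assumptions:

```python
# Illustrative LLM-as-a-judge evaluator; prompt and scale are assumed.
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class ToneGrade(BaseModel):
    score: float = Field(description="0-1: professionalism, empathy, clarity, positivity")
    explanation: str = Field(description="brief justification for the score")

judge = ChatOpenAI(model="gpt-4", temperature=0).with_structured_output(ToneGrade)

def tone_appropriateness(run, example):
    grade = judge.invoke(
        "Rate the tone of this support response for professionalism, "
        "empathy, clarity, and positivity:\n\n"
        f"{run.outputs.get('response')}"
    )
    return {"key": "tone_appropriateness", "score": grade.score, "comment": grade.explanation}
```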
- `chains.py`: Main application logic and chain definitions
- `eval.py`: Comprehensive evaluation pipeline using LangSmith
- `vectorize_docs.py`: Vector store setup for LangChain documentation
- `get_github_issues.py`: GitHub issue fetching and dataset creation
- `langchain_issues_dataset.csv`: Sample issues for testing
- `.env`: Environment variables for API keys
- `requirements.txt`: Python dependencies
- Make sure your dataset (`langchain_issues_dataset.csv`) includes both the issue description and URL for each example.
- All evaluation results and experiment comparisons are available in the LangSmith UI.
- You can add or modify evaluators in `eval.py` as needed for your use case.
- The system uses GPT-4 for evaluation to ensure high-quality assessments.