Enhance Agents and Refine Prompt Templates for Improved Output Quality #4
ibadurrehmandg wants to merge 3 commits into main
Pull request overview
This PR aims to improve LLM agent output quality and robustness by refining prompt templates and adding chunking/retry/logging behavior to the tech/legal/orchestrator service layers.
Changes:
- Added chunking, basic logging, and retry loops to the Tech Gap and Cross-check orchestrator services.
- Reworked the Legal leverage agent prompt and added output “schema guard” logic.
- Refined council prompt templates for more structured JSON-only outputs.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| specgap/specgap_audits.db | Adds a SQLite DB file to the repo (appears to be an audit persistence artifact). |
| specgap/app/services/tech_engine.py | Adds helper utilities, chunking, and retry logic around tech gap analysis. |
| specgap/app/services/cross_check.py | Adds chunking, retries, JSON parsing helper, and a patch-pack extraction helper. |
| specgap/app/services/biz_engine.py | Updates the legal agent prompt, introduces schema normalization, and adds retries/chunking. |
| specgap/app/core/prompts.py | Refines prompt templates to enforce JSON-only structured outputs. |
Comments suppressed due to low confidence (5)
specgap/app/services/tech_engine.py:100
validate_json() returns an {error: ...} dict on JSON decode failure, but the retry loop treats that as success (it logs "successful" and returns immediately). As a result, retries never run for invalid JSON responses. Consider making JSON validation failures raise or trigger another attempt, and log success only after validation passes.
NOTE: The input may contain MULTIPLE documents (e.g., Requirements and Proposals),
separated by '=== SOURCE DOCUMENT: [Name] ==='.
Instructions:
specgap/app/services/cross_check.py:3
asyncio is imported but not used in this module. Please remove the unused import to avoid lint failures and keep dependencies minimal.
"""
Cross-Check Engine - Orchestrator Agent
Synthesizes findings from Tech and Legal agents into actionable outputs.
specgap/app/services/cross_check.py:90
Chunk pairing uses zip(tech_chunks, proposal_chunks), which silently drops the remaining chunks if the tech and proposal lists differ in length. This can truncate input text and lead to incorrect synthesis. Iterate over the longer list (e.g., by index with bounds checks) so all chunks are included.
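One way to keep every chunk is `itertools.zip_longest` from the standard library. The following is a minimal sketch, assuming the chunks are plain strings and that an empty string is an acceptable stand-in for a missing counterpart; `pair_chunks` is a hypothetical helper, not code from this PR.

```python
from itertools import zip_longest
from typing import List, Tuple


def pair_chunks(tech_chunks: List[str], proposal_chunks: List[str]) -> List[Tuple[str, str]]:
    """Pair chunks by index, padding the shorter list with empty strings."""
    return list(zip_longest(tech_chunks, proposal_chunks, fillvalue=""))


tech = ["t0", "t1", "t2"]
proposal = ["p0"]

# Plain zip() stops at the shorter list, so two tech chunks would be lost.
truncated = list(zip(tech, proposal))

# zip_longest keeps all three tech chunks.
pairs = pair_chunks(tech, proposal)
```

Whether an empty string is the right filler depends on how the orchestrator prompt is built; skipping the missing side of the prompt for unpaired chunks would be an equally reasonable choice.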
}
],
"strategic_synthesis": "Executive summary (2-3 paragraphs) explaining overall deal quality",
"reality_diagram_mermaid": "graph TD\\n A[Start] --> B[Process]\\n B --> C[End]",
specgap/app/services/cross_check.py:109
Similar to tech_engine: validate_json() returns an error dict on JSON decode failure, but the retry loop logs success and returns it immediately. That means retries won't run when the model returns invalid JSON. Treat JSON validation failures as retryable errors and log "successful" only when the parsed output is valid.
def _clean_json_response(text: str) -> str:
    """Clean AI response to extract valid JSON"""
    cleaned = text.strip()
specgap/app/services/cross_check.py:2
Import of 'asyncio' is not used.
Cross-Check Engine - Orchestrator Agent
    # -----------------------------
    # Helper Functions
    # -----------------------------

    def chunk_text(text: str, max_len: int = 40000) -> List[str]:
        """Split large text into manageable chunks."""
        return [text[i:i+max_len] for i in range(0, len(text), max_len)]

    def validate_json(raw_text: str) -> Dict[str, Any]:
        """Safely parse JSON from model output."""
        try:
            if raw_text.startswith("```json"):
                raw_text = raw_text[7:]
            if raw_text.endswith("```"):
                raw_text = raw_text[:-3]
            return json.loads(raw_text)
        except json.JSONDecodeError:
Helper functions like chunk_text / validate_json / log_step are duplicated across multiple service modules (tech_engine, cross_check, biz_engine), and they already differ in behavior (e.g., code-fence stripping). Consider centralizing them in a shared utility module to avoid drift and bugs.
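A shared module along these lines could hold the single copy of each helper. This is only a sketch under stated assumptions: the logger name, the `strip_code_fence` helper, and the choice to let `validate_json` raise instead of returning an error dict are all hypothetical and not taken from the PR.

```python
import json
import logging
from typing import Any, Dict, List

FENCE = "`" * 3  # Markdown code fence, built here to keep the literal out of the source

logger = logging.getLogger("specgap.llm_utils")  # hypothetical logger name


def log_step(message: str) -> None:
    """Single logging entry point shared by all service modules."""
    logger.info(message)


def chunk_text(text: str, max_len: int = 40000) -> List[str]:
    """Split large text into chunks of at most max_len characters."""
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]


def strip_code_fence(raw_text: str) -> str:
    """Remove one surrounding code fence, with or without a 'json' tag."""
    cleaned = raw_text.strip()
    if cleaned.startswith(FENCE):
        cleaned = cleaned[len(FENCE):]
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]
    if cleaned.endswith(FENCE):
        cleaned = cleaned[:-len(FENCE)]
    return cleaned.strip()


def validate_json(raw_text: str) -> Dict[str, Any]:
    """Parse model output as JSON; json.JSONDecodeError propagates to the caller."""
    return json.loads(strip_code_fence(raw_text))
```

With one definition, tech_engine, cross_check, and biz_engine would import the same fence-stripping behavior, and callers can decide how to react to a parse failure.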
    cleaned = response.text.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.split("```")[1]
Code-fence stripping is incorrect: for a typical response like ```json\n{...}\n```, cleaned.split("```")[1] yields json\n{...}\n, which is not valid JSON and will reliably cause json.loads to fail. Use a more robust fence remover (strip the leading ```json or ``` and the surrounding whitespace) before parsing.
Suggested change:

    - cleaned = cleaned.split("```")[1]
    + # Remove opening fence and optional language identifier (e.g. ```json)
    + if cleaned.startswith("```json"):
    +     cleaned = cleaned[len("```json"):].strip()
    + else:
    +     cleaned = cleaned[3:].strip()
    + # Remove trailing fence if present
    + if cleaned.endswith("```"):
    +     cleaned = cleaned[:-3].strip()
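The failure can be reproduced in a few lines. The sketch below assumes a model reply wrapped in a fence tagged `json`; the sample payload and the regex-based `strip_fence` helper are illustrative alternatives, not the code proposed in the PR.

```python
import json
import re

FENCE = "`" * 3  # built this way so the example contains no literal fence
response_text = FENCE + 'json\n{"risk": "high"}\n' + FENCE

# Reviewed behavior: take the text between the first two fences.
# The language tag stays attached, so the result starts with "json".
naive = response_text.strip().split(FENCE)[1]
try:
    json.loads(naive)
    naive_ok = True
except json.JSONDecodeError:
    naive_ok = False


def strip_fence(text: str) -> str:
    """Drop a leading fence (optionally tagged json) and a trailing fence."""
    cleaned = re.sub(r"^`{3}(?:json)?\s*", "", text.strip(), flags=re.IGNORECASE)
    return re.sub(r"\s*`{3}$", "", cleaned)


parsed = json.loads(strip_fence(response_text))
```

Unfenced input passes through `strip_fence` unchanged apart from whitespace trimming, so the same helper can be applied to every reply.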
    log_step("JSON parse failed, returning raw output snippet")
    return {
        "error": "Model output was not valid JSON",
        "raw_output": response.text[:1500]
    }
The retry loop doesn't actually retry on JSON parse errors: except json.JSONDecodeError: returns immediately on the first failure, so retries is ignored for the most common failure mode (model emits non-JSON). Consider incrementing attempt and continuing (optionally with a reprompt) instead of returning immediately.
Suggested change:

    - log_step("JSON parse failed, returning raw output snippet")
    - return {
    -     "error": "Model output was not valid JSON",
    -     "raw_output": response.text[:1500]
    - }
    + log_step(f"JSON parse failed on attempt {attempt+1}")
    + attempt += 1
    + if attempt > retries:
    +     log_step("Max retries reached after JSON parse failures, returning raw output snippet")
    +     return {
    +         "error": "Model output was not valid JSON",
    +         "raw_output": response.text[:1500]
    +     }
No description provided.