Enhance Agents and Refine Prompt Templates for Improved Output Quality#4

Open
ibadurrehmandg wants to merge 3 commits into main from IR

Conversation

@ibadurrehmandg
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings February 8, 2026 19:58

Copilot AI left a comment

Pull request overview

This PR aims to improve LLM agent output quality and robustness by refining prompt templates and adding chunking/retry/logging behavior to the tech/legal/orchestrator service layers.

Changes:

  • Added chunking, basic logging, and retry loops to the Tech Gap and Cross-check orchestrator services.
  • Reworked the Legal leverage agent prompt and added output “schema guard” logic.
  • Refined council prompt templates for more structured JSON-only outputs.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| specgap/specgap_audits.db | Adds a SQLite DB file to the repo (appears to be an audit persistence artifact). |
| specgap/app/services/tech_engine.py | Adds helper utilities, chunking, and retry logic around tech gap analysis. |
| specgap/app/services/cross_check.py | Adds chunking, retries, JSON parsing helper, and a patch-pack extraction helper. |
| specgap/app/services/biz_engine.py | Updates the legal agent prompt, introduces schema normalization, and adds retries/chunking. |
| specgap/app/core/prompts.py | Refines prompt templates to enforce JSON-only structured outputs. |
Comments suppressed due to low confidence (5)

specgap/app/services/tech_engine.py:100

  • validate_json() returns an {error: ...} dict on JSON decode failure, but the retry loop treats that as success (logs "successful" and returns immediately). This prevents retries from ever running for invalid JSON responses; consider making JSON validation failures raise/trigger another attempt and only log success after validation passes.

NOTE: The input may contain MULTIPLE documents (e.g., Requirements and Proposals), 
separated by '=== SOURCE DOCUMENT: [Name] ==='.

Instructions:

specgap/app/services/cross_check.py:3

  • asyncio is imported but not used in this module. Please remove the unused import to avoid lint failures and keep dependencies minimal.
"""
Cross-Check Engine - Orchestrator Agent
Synthesizes findings from Tech and Legal agents into actionable outputs.

specgap/app/services/cross_check.py:90

  • Chunk pairing uses zip(tech_chunks, proposal_chunks), which silently drops remaining chunks if the tech/proposal lengths differ. This can truncate input text and lead to incorrect synthesis. Iterate over the longer list (e.g., by index with bounds checks) so all chunks are included.
        }
    ],
    "strategic_synthesis": "Executive summary (2-3 paragraphs) explaining overall deal quality",
    "reality_diagram_mermaid": "graph TD\\n    A[Start] --> B[Process]\\n    B --> C[End]",
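
One way to avoid the truncation described for cross_check.py:90 is to pair chunks with `itertools.zip_longest`, which pads the shorter list instead of stopping early. This is a minimal sketch, not the PR's code: `pair_chunks` is a hypothetical helper name, and padding with an empty string is an assumption about what the synthesis prompt can tolerate.

```python
from itertools import zip_longest
from typing import List, Tuple


def pair_chunks(tech_chunks: List[str], proposal_chunks: List[str]) -> List[Tuple[str, str]]:
    """Pair chunks by position, padding the shorter list with empty strings."""
    return list(zip_longest(tech_chunks, proposal_chunks, fillvalue=""))


tech = ["t1", "t2", "t3"]
proposal = ["p1"]

# Plain zip stops at the shorter list, so "t2" and "t3" are silently dropped.
print(list(zip(tech, proposal)))

# zip_longest keeps every chunk; missing partners become "".
print(pair_chunks(tech, proposal))
```

Whether an empty partner chunk is acceptable depends on how the orchestrator prompt is built; skipping the empty side when formatting the prompt is an equally reasonable choice.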

specgap/app/services/cross_check.py:109

  • Similar to tech_engine: validate_json() returns an error dict on JSON decode failure, but the retry loop logs success and returns it immediately. That means retries won't run when the model returns invalid JSON. Treat JSON validation failures as retryable errors and only log "successful" when parsed output is valid.


def _clean_json_response(text: str) -> str:
    """Clean AI response to extract valid JSON"""
    cleaned = text.strip()

specgap/app/services/cross_check.py:2

  • Import of 'asyncio' is not used.
Cross-Check Engine - Orchestrator Agent

Comment on lines +10 to +27

# -----------------------------
# Helper Functions
# -----------------------------

def chunk_text(text: str, max_len: int = 40000) -> List[str]:
    """Split large text into manageable chunks."""
    return [text[i:i+max_len] for i in range(0, len(text), max_len)]

def validate_json(raw_text: str) -> Dict[str, Any]:
    """Safely parse JSON from model output."""
    try:
        if raw_text.startswith("```json"):
            raw_text = raw_text[7:]
        if raw_text.endswith("```"):
            raw_text = raw_text[:-3]
        return json.loads(raw_text)
    except json.JSONDecodeError:

Copilot AI Feb 8, 2026

Helper functions like chunk_text / validate_json / log_step are duplicated across multiple service modules (tech_engine, cross_check, biz_engine), and they already differ in behavior (e.g., code-fence stripping). Consider centralizing them in a shared utility module to avoid drift and bugs.
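
A shared module along the following lines could hold a single definition of each helper. This is only a sketch under stated assumptions: the module location, `strip_code_fence`, and `parse_model_json` are hypothetical names that do not appear in the PR, and raising on invalid JSON (rather than returning an error dict) is a design assumption that lets callers decide whether to retry.

```python
"""Hypothetical shared helpers, e.g. a new module under specgap/app/services/."""
import json
from typing import Any, List

# A Markdown code fence, built this way so this example does not nest fences.
FENCE = "`" * 3


def chunk_text(text: str, max_len: int = 40000) -> List[str]:
    """Split large text into chunks of at most max_len characters."""
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]


def strip_code_fence(raw_text: str) -> str:
    """Remove one surrounding Markdown fence, with or without a 'json' tag."""
    cleaned = raw_text.strip()
    if cleaned.startswith(FENCE):
        cleaned = cleaned[len(FENCE):]
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]
    if cleaned.endswith(FENCE):
        cleaned = cleaned[:-len(FENCE)]
    return cleaned.strip()


def parse_model_json(raw_text: str) -> Any:
    """Parse model output as JSON; raises json.JSONDecodeError on failure."""
    return json.loads(strip_code_fence(raw_text))
```

With one copy of the fence handling, tech_engine, cross_check, and biz_engine would import the same functions, so a fix to the stripping logic lands everywhere at once.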

cleaned = response.text.strip()
if cleaned.startswith("```"):
    cleaned = cleaned.split("```")[1]

Copilot AI Feb 8, 2026

Code-fence stripping is incorrect: for a typical response like ```json\n{...}\n```, cleaned.split("```")[1] yields json\n{...}\n, which is not valid JSON because the language tag is still attached, so json.loads will reliably fail. Use a more robust fence remover (strip the leading ```json or ``` and any surrounding whitespace) before parsing.

Suggested change
-     cleaned = cleaned.split("```")[1]
+     # Remove opening fence and optional language identifier (e.g. ```json)
+     if cleaned.startswith("```json"):
+         cleaned = cleaned[len("```json"):].strip()
+     else:
+         cleaned = cleaned[3:].strip()
+     # Remove trailing fence if present
+     if cleaned.endswith("```"):
+         cleaned = cleaned[:-3].strip()
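
The failure mode can be reproduced in isolation. The snippet below is an illustrative sketch: `response_text` is a made-up sample reply, not output from the PR's model, and the slicing mirrors the suggested change rather than any merged code.

```python
import json

# A Markdown code fence, built this way so this example does not nest fences.
FENCE = "`" * 3
response_text = FENCE + 'json\n{"ok": true}\n' + FENCE

# What the original line produces: the "json" language tag stays attached.
naive = response_text.split(FENCE)[1]
print(repr(naive))

try:
    json.loads(naive)
except json.JSONDecodeError as exc:
    print("naive parse failed:", exc.msg)

# Slicing off the fence and optional tag, in the spirit of the suggested change.
cleaned = response_text.strip()
if cleaned.startswith(FENCE + "json"):
    cleaned = cleaned[len(FENCE + "json"):].strip()
elif cleaned.startswith(FENCE):
    cleaned = cleaned[len(FENCE):].strip()
if cleaned.endswith(FENCE):
    cleaned = cleaned[:-len(FENCE)].strip()

print(json.loads(cleaned))
```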

Comment on lines +114 to +118
log_step("JSON parse failed, returning raw output snippet")
return {
    "error": "Model output was not valid JSON",
    "raw_output": response.text[:1500]
}

Copilot AI Feb 8, 2026

The retry loop doesn't actually retry on JSON parse errors: except json.JSONDecodeError: returns immediately on the first failure, so retries is ignored for the most common failure mode (model emits non-JSON). Consider incrementing attempt and continuing (optionally with a reprompt) instead of returning immediately.

Suggested change
- log_step("JSON parse failed, returning raw output snippet")
- return {
-     "error": "Model output was not valid JSON",
-     "raw_output": response.text[:1500]
- }
+ log_step(f"JSON parse failed on attempt {attempt+1}")
+ attempt += 1
+ if attempt > retries:
+     log_step("Max retries reached after JSON parse failures, returning raw output snippet")
+     return {
+         "error": "Model output was not valid JSON",
+         "raw_output": response.text[:1500]
+     }
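
For context, a complete loop in which invalid JSON counts as a retryable failure might look like the sketch below. It is not the PR's implementation: `call_with_json_retries` and the `generate` callable are hypothetical stand-ins for the service's model call, and `print` replaces `log_step` to keep the example self-contained.

```python
import json
from typing import Any, Callable, Dict


def call_with_json_retries(generate: Callable[[], str], retries: int = 2) -> Dict[str, Any]:
    """Call generate() until its output parses as JSON or attempts run out."""
    last_text = ""
    for attempt in range(retries + 1):
        last_text = generate()
        try:
            parsed = json.loads(last_text)
        except json.JSONDecodeError:
            # Invalid JSON is a failed attempt, so move on to the next one.
            print(f"JSON parse failed on attempt {attempt + 1}")
            continue
        # Success is logged only after the output has actually parsed.
        print(f"Attempt {attempt + 1} successful")
        return parsed
    return {
        "error": "Model output was not valid JSON",
        "raw_output": last_text[:1500],
    }


# Stand-in for a model that answers with prose first, then valid JSON.
replies = iter(["Sure! Here is the analysis.", '{"gaps": []}'])
result = call_with_json_retries(lambda: next(replies))
print(result)
```

The error dict is returned only once every attempt has failed, which also addresses the suppressed comments about success being logged for unparsed output.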
