Enhance Agents and Refine Prompt Templates for Improved Output Quality by ibadurrehmandg · Pull Request #4 · aunraza19/SpecGap

ibadurrehmandg · 2026-02-08T19:58:32Z

No description provided.

Copilot

Pull request overview

This PR aims to improve LLM agent output quality and robustness by refining prompt templates and adding chunking/retry/logging behavior to the tech/legal/orchestrator service layers.

Changes:

Added chunking, basic logging, and retry loops to the Tech Gap and Cross-check orchestrator services.
Reworked the Legal leverage agent prompt and added output “schema guard” logic.
Refined council prompt templates for more structured JSON-only outputs.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
specgap/specgap_audits.db	Adds a SQLite DB file to the repo (appears to be an audit persistence artifact).
specgap/app/services/tech_engine.py	Adds helper utilities, chunking, and retry logic around tech gap analysis.
specgap/app/services/cross_check.py	Adds chunking, retries, JSON parsing helper, and a patch-pack extraction helper.
specgap/app/services/biz_engine.py	Updates the legal agent prompt, introduces schema normalization, and adds retries/chunking.
specgap/app/core/prompts.py	Refines prompt templates to enforce JSON-only structured outputs.

Comments suppressed due to low confidence (5)

specgap/app/services/tech_engine.py:100

validate_json() returns an {error: ...} dict on JSON decode failure, but the retry loop treats that as success (logs "successful" and returns immediately). This prevents retries from ever running for invalid JSON responses; consider making JSON validation failures raise/trigger another attempt and only log success after validation passes.


NOTE: The input may contain MULTIPLE documents (e.g., Requirements and Proposals), 
separated by '=== SOURCE DOCUMENT: [Name] ==='.

Instructions:

specgap/app/services/cross_check.py:3

asyncio is imported but not used in this module. Please remove the unused import to avoid lint failures and keep dependencies minimal.

"""
Cross-Check Engine - Orchestrator Agent
Synthesizes findings from Tech and Legal agents into actionable outputs.

specgap/app/services/cross_check.py:90

Chunk pairing uses zip(tech_chunks, proposal_chunks), which silently drops remaining chunks if the tech/proposal lengths differ. This can truncate input text and lead to incorrect synthesis. Iterate over the longer list (e.g., by index with bounds checks) so all chunks are included.
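One way to keep every chunk is `itertools.zip_longest`, sketched below. The helper name `pair_chunks` and the choice of an empty string as padding are assumptions made for illustration; they are not taken from the PR.

```python
from itertools import zip_longest
from typing import List, Tuple


def pair_chunks(tech_chunks: List[str], proposal_chunks: List[str]) -> List[Tuple[str, str]]:
    """Pair chunks by position, padding the shorter list with empty strings."""
    # zip() stops at the shorter list; zip_longest keeps every chunk.
    return list(zip_longest(tech_chunks, proposal_chunks, fillvalue=""))


# Three tech chunks against one proposal chunk: nothing is dropped.
pairs = pair_chunks(["t1", "t2", "t3"], ["p1"])
```

Whether an empty counterpart chunk is acceptable to the prompt is a separate design question; the point here is only that no input text is silently discarded.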

        }
    ],
    "strategic_synthesis": "Executive summary (2-3 paragraphs) explaining overall deal quality",
    "reality_diagram_mermaid": "graph TD\\n    A[Start] --> B[Process]\\n    B --> C[End]",

specgap/app/services/cross_check.py:109

Similar to tech_engine: validate_json() returns an error dict on JSON decode failure, but the retry loop logs success and returns it immediately. That means retries won't run when the model returns invalid JSON. Treat JSON validation failures as retryable errors and only log "successful" when parsed output is valid.



def _clean_json_response(text: str) -> str:
    """Clean AI response to extract valid JSON"""
    cleaned = text.strip()

specgap/app/services/cross_check.py:2

Import of 'asyncio' is not used.

Cross-Check Engine - Orchestrator Agent


Copilot · 2026-02-08T20:02:52Z

+
+# -----------------------------
+# Helper Functions
+# -----------------------------
+
+def chunk_text(text: str, max_len: int = 40000) -> List[str]:
+    """Split large text into manageable chunks."""
+    return [text[i:i+max_len] for i in range(0, len(text), max_len)]
+
+def validate_json(raw_text: str) -> Dict[str, Any]:
+    """Safely parse JSON from model output."""
+    try:
+        if raw_text.startswith("```json"):
+            raw_text = raw_text[7:]
+        if raw_text.endswith("```"):
+            raw_text = raw_text[:-3]
+        return json.loads(raw_text)
+    except json.JSONDecodeError:


Helper functions like chunk_text / validate_json / log_step are duplicated across multiple service modules (tech_engine, cross_check, biz_engine), and they already differ in behavior (e.g., code-fence stripping). Consider centralizing them in a shared utility module to avoid drift and bugs.
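As a rough sketch of that centralization, the shared helpers could live in a single module that each service imports. The logger name and the `max_len` guard are assumptions, and `log_step` here is a hypothetical stand-in, since its real body is not shown in this review.

```python
import logging
from typing import List

logger = logging.getLogger("specgap.llm_utils")  # hypothetical logger name


def log_step(message: str) -> None:
    """Single logging entry point so every service logs the same way."""
    logger.info(message)


def chunk_text(text: str, max_len: int = 40000) -> List[str]:
    """Split text into consecutive slices of at most max_len characters."""
    if max_len <= 0:
        # Assumed guard: a non-positive step would make range() fail or loop oddly.
        raise ValueError("max_len must be positive")
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```

With one definition, tech_engine, cross_check, and biz_engine would import the same functions, so a fix to any helper applies everywhere at once.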

Copilot · 2026-02-08T20:02:52Z

+
+            cleaned = response.text.strip()
+            if cleaned.startswith("```"):
+                cleaned = cleaned.split("```")[1]


Code-fence stripping is incorrect: for a typical response like ```json\n{...}\n```, cleaned.split("```")[1] yields json\n{...}\n, which is not valid JSON and will reliably cause json.loads to fail. Use a more robust fence remover (strip the leading ```json or ``` and surrounding whitespace) before parsing.

Suggested change

-                cleaned = cleaned.split("```")[1]
+                # Remove opening fence and optional language identifier (e.g. ```json)
+                if cleaned.startswith("```json"):
+                    cleaned = cleaned[len("```json"):].strip()
+                else:
+                    cleaned = cleaned[3:].strip()
+                # Remove trailing fence if present
+                if cleaned.endswith("```"):
+                    cleaned = cleaned[:-3].strip()
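An alternative to prefix checks is a single regular expression that tolerates any language tag and a missing closing fence. This is a minimal sketch, not the PR's code: `strip_code_fence` is a hypothetical name, and the pattern is aimed at object or array payloads rather than every possible fenced value.

```python
import json
import re

FENCE = "`" * 3  # built this way to avoid a literal fence inside the example

# Opening fence, optional language tag followed by whitespace, the payload,
# then an optional closing fence.
_FENCE_RE = re.compile(
    r"^\s*" + FENCE + r"(?:[A-Za-z0-9_+-]*\s+)?(.*?)\s*(?:" + FENCE + r")?\s*$",
    re.DOTALL,
)


def strip_code_fence(text: str) -> str:
    """Return the payload inside a Markdown code fence, or the stripped text."""
    match = _FENCE_RE.match(text)
    return match.group(1).strip() if match else text.strip()


fenced = FENCE + 'json\n{"a": 1}\n' + FENCE
parsed = json.loads(strip_code_fence(fenced))
```

For comparison, `fenced.split(FENCE)[1]` on the same input still starts with the `json` tag, which is why the original line fails in `json.loads`.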

Copilot · 2026-02-08T20:02:53Z

+            log_step("JSON parse failed, returning raw output snippet")
+            return {
+                "error": "Model output was not valid JSON",
+                "raw_output": response.text[:1500]
+            }


The retry loop doesn't actually retry on JSON parse errors: except json.JSONDecodeError: returns immediately on the first failure, so retries is ignored for the most common failure mode (model emits non-JSON). Consider incrementing attempt and continuing (optionally with a reprompt) instead of returning immediately.

Suggested change

-            log_step("JSON parse failed, returning raw output snippet")
-            return {
-                "error": "Model output was not valid JSON",
-                "raw_output": response.text[:1500]
-            }
+            log_step(f"JSON parse failed on attempt {attempt+1}")
+            attempt += 1
+            if attempt > retries:
+                log_step("Max retries reached after JSON parse failures, returning raw output snippet")
+                return {
+                    "error": "Model output was not valid JSON",
+                    "raw_output": response.text[:1500]
+                }
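The same idea can be sketched as a standalone loop in which a parse failure counts as a failed attempt and success is reported only after parsing passes. The function `call_with_json_retries` and its `generate` callable are hypothetical stand-ins for the service's model call; only the error payload shape is taken from the diff above.

```python
import json
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def call_with_json_retries(generate: Callable[[], str], retries: int = 2) -> Any:
    """Call generate() up to retries + 1 times until it returns valid JSON."""
    last_text = ""
    for attempt in range(retries + 1):
        last_text = generate()
        try:
            return json.loads(last_text)  # success only after parsing passes
        except json.JSONDecodeError:
            # Invalid JSON is a retryable failure, not a result to return.
            logger.warning("JSON parse failed on attempt %d", attempt + 1)
    # Every attempt failed: fall back to the error payload used in the PR.
    return {
        "error": "Model output was not valid JSON",
        "raw_output": last_text[:1500],
    }


# Usage with a fake model that fails once and then returns valid JSON.
responses = iter(["not json", '{"ok": true}'])
result = call_with_json_retries(lambda: next(responses))
```

A reprompt that tells the model its previous output was not valid JSON could be added inside the `except` branch, as the comment above suggests.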

enhanced agents and prompt templates

bf86cad

Copilot AI review requested due to automatic review settings February 8, 2026 19:58

Copilot started reviewing on behalf of ibadurrehmandg February 8, 2026 19:58

Merge branch 'main' into IR

0ee32c0

Copilot AI reviewed Feb 8, 2026


Add files via upload

9aea73b

ibadurrehmandg wants to merge 3 commits into main from IR

3 participants

