Add Prompt Rewriter Agent system for autonomous prompt improvement #55
@@ -0,0 +1,37 @@

```python
"""
Prompt Rewriter Agent System

An autonomous system for improving prompts based on user feedback.
This module provides agents that analyze feedback, research claims,
propose prompt modifications, test changes, and deploy improvements.

Usage:
    from prompt_rewriter import PromptRewriterOrchestrator

    orchestrator = PromptRewriterOrchestrator()
    result = await orchestrator.process_feedback(feedback_event)
"""
```
Contributor, on lines +8 to +13:

The usage example in the docstring refers to `PromptRewriterOrchestrator`, which is not exported from this module. Either export the class or update the usage example.
```python
from .models import (
    FeedbackEvent,
    FeedbackIntent,
    ProposalType,
    ProposalStatus,
    ResearchResult,
    PromptProposal,
    ExperimentResult,
    EvaluationResult,
)

__all__ = [
    "FeedbackEvent",
    "FeedbackIntent",
    "ProposalType",
    "ProposalStatus",
    "ResearchResult",
    "PromptProposal",
    "ExperimentResult",
    "EvaluationResult",
]
```
Comment on lines +26 to +35:

Sort `__all__`. Minor lint issue; sorting avoids CI noise.

💡 Suggested fix:

```diff
 __all__ = [
-    "FeedbackEvent",
-    "FeedbackIntent",
-    "ProposalType",
-    "ProposalStatus",
-    "ResearchResult",
-    "PromptProposal",
-    "ExperimentResult",
-    "EvaluationResult",
+    "EvaluationResult",
+    "ExperimentResult",
+    "FeedbackEvent",
+    "FeedbackIntent",
+    "PromptProposal",
+    "ProposalStatus",
+    "ProposalType",
+    "ResearchResult",
 ]
```

🧰 Tools: 🪛 Ruff (0.14.13) — RUF022, lines 26-35: apply an isort-style sorting to `__all__`.
```python
__version__ = "0.1.0"
```
@@ -0,0 +1,17 @@

```python
"""
Prompt Rewriter Agents

Each agent is responsible for a specific step in the prompt rewriting pipeline:

1. FeedbackIntakeAgent - Analyzes user feedback to classify intent
2. ResearchAgent - Conducts web research on disputed claims
3. ProposalWriterAgent - Generates structured prompt modification proposals
4. ExperimentRunnerAgent - Tests proposals against real snippets
5. EvaluationAgent - Decides whether to accept, refine, or reject proposals
6. SemanticSearchAgent - Finds similar snippets for broader testing
7. DeploymentAgent - Applies approved changes to production
"""

from .base import BaseAgent

__all__ = ["BaseAgent"]
```
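The seven-step pipeline above can be sketched as a chain of agents sharing the `BaseAgent` interface. The agent behaviors and the `run` method name below are illustrative stand-ins, not the PR's actual implementation:

```python
import asyncio
from abc import ABC, abstractmethod


class BaseAgent(ABC):
    """Minimal stand-in for the PR's BaseAgent."""

    @abstractmethod
    async def run(self, data: dict) -> dict: ...


class FeedbackIntakeAgent(BaseAgent):
    async def run(self, data: dict) -> dict:
        # Step 1: classify the feedback intent (hard-coded for illustration).
        return {**data, "intent": "dispute_claim"}


class ProposalWriterAgent(BaseAgent):
    async def run(self, data: dict) -> dict:
        # Step 3: turn classified feedback into a proposal stub.
        return {**data, "proposal": f"revise prompt for {data['intent']}"}


async def run_pipeline(feedback: dict) -> dict:
    # Each agent's output feeds the next, mirroring the numbered steps.
    result = feedback
    for agent in (FeedbackIntakeAgent(), ProposalWriterAgent()):
        result = await agent.run(result)
    return result


result = asyncio.run(run_pipeline({"text": "this claim is wrong"}))
```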
@@ -0,0 +1,250 @@

```python
"""
Base agent class for the Prompt Rewriter system.

All agents inherit from BaseAgent, which provides:
- Logging infrastructure
- LLM calling utilities
- Error handling and retry logic
- Database access
"""

import logging
import time
from abc import ABC, abstractmethod
from datetime import datetime
```

Suggested change:

```diff
-from datetime import datetime
+from datetime import datetime, timezone
```
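The "error handling and retry logic" the docstring promises could take the shape of a small retry-with-backoff helper. This is a hedged sketch of the pattern, not the PR's actual code; the function name and parameters are invented for illustration:

```python
import time


def call_with_retry(fn, max_attempts=3, base_delay=0.01):
    """Retry fn with exponential backoff; re-raise after the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            # Back off: base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))


# Demo: fails twice, then succeeds on the third attempt.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")
    return "ok"


result = call_with_retry(flaky)
```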
When `proposal_id` is `None`, it is defaulted to a nil UUID. This will cause a foreign key violation when `_save_log_entry` is called, as a proposal with a nil UUID won't exist in the `prompt_rewrite_proposals` table. The `proposal_id` should be passed as `None` to the `AgentLogEntry` constructor. I've added a separate comment on `src/prompt_rewriter/models.py` to make the `proposal_id` field optional in the `AgentLogEntry` model to support this change.

```diff
 log_entry = AgentLogEntry(
     agent_name=self.name,
-    proposal_id=proposal_id or UUID("00000000-0000-0000-0000-000000000000"),
+    proposal_id=proposal_id,
     input_summary=self._summarize_input(input_data),
 )
```
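Making `proposal_id` optional so that `None` flows through (rather than a nil UUID) would look roughly like this. A dataclass stand-in is used here for illustration; in the PR the model is Pydantic, where `proposal_id: UUID | None = None` has the same effect:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
from uuid import UUID


@dataclass
class AgentLogEntry:
    agent_name: str
    input_summary: str
    # None means "not tied to a proposal yet", avoiding a nil-UUID
    # foreign key that has no row in prompt_rewrite_proposals.
    proposal_id: Optional[UUID] = None
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


entry = AgentLogEntry(agent_name="FeedbackIntakeAgent", input_summary="...")
```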
Copilot AI (Jan 24, 2026):

Using `datetime.utcnow()` is deprecated as of Python 3.12. The existing codebase uses `datetime.now(timezone.utc)` (see `src/processing_pipeline/stage_3.py:379` and `src/processing_pipeline/supabase_utils.py:290`). Please update to `datetime.now(timezone.utc)` for consistency and to avoid deprecation warnings.
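The replacement is mechanical; the key difference is that `datetime.now(timezone.utc)` returns a timezone-aware value where `utcnow()` returned a naive one:

```python
from datetime import datetime, timezone

# Deprecated since Python 3.12: naive datetime carrying UTC wall-clock time.
naive = datetime.utcnow()

# Preferred: timezone-aware UTC datetime.
aware = datetime.now(timezone.utc)
```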
There's an inconsistency between the Pydantic model and the database schema. The `AgentLogEntry` model uses `llm_tokens_used`, but the database column is `llm_total_tokens`. The key used here for insertion is `llm_total_tokens`. To improve clarity and maintainability, it's best to use the same name in all places. I'd suggest renaming `llm_tokens_used` to `llm_total_tokens` in `src/prompt_rewriter/models.py`.
Copilot AI (Jan 24, 2026):

The Supabase insert operation is not awaited in this async function. The supabase-py client's `execute()` method is synchronous and will block the event loop. Consider using `asyncio.to_thread()` to run it in a thread pool, or verify whether an async version of the Supabase client is available.
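A sketch of the suggested workaround: run the synchronous call in a worker thread so the event loop stays free. The `blocking_insert` function below is a stand-in for the real `supabase.table(...).insert(row).execute()` chain:

```python
import asyncio
import time


def blocking_insert(row: dict) -> dict:
    # Stand-in for supabase.table(...).insert(row).execute(),
    # which is synchronous and would otherwise block the event loop.
    time.sleep(0.05)
    return {"status": "inserted", **row}


async def save_log_entry(row: dict) -> dict:
    # Offload the blocking call to a thread; other coroutines keep running.
    return await asyncio.to_thread(blocking_insert, row)


result = asyncio.run(save_log_entry({"agent_name": "ResearchAgent"}))
```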
Copilot AI (Jan 24, 2026):

The code imports `google.generativeai` (the older SDK), but the existing codebase uses `from google import genai` (the newer Google Genai SDK). This creates an inconsistency and may cause compatibility issues. Consider updating to the newer SDK's `genai.Client` API pattern as shown in `src/processing_pipeline/stage_1.py:570` and `src/processing_pipeline/stage_3.py:373`, which provides better async support and consistency with the rest of the codebase.
Copilot AI (Jan 24, 2026):

The `model_instance.generate_content()` call in `call_llm()` is not awaited, but this is an async function. The `google.generativeai` library may not support async operations, which could cause this async function to block. Consider either making this a synchronous function or using `asyncio.to_thread()` to run the blocking call in a thread pool, avoiding blocking the event loop.
The implementation of `call_llm_with_search` has a couple of issues that will likely cause a runtime error:

1. `GenerateContentConfig` is passed to the `generation_config` parameter of `generate_content`, which expects a `genai.GenerationConfig` instance.
2. `system_instruction` is passed to `GenerateContentConfig`, but it should be passed to the `GenerativeModel` constructor. The model is currently instantiated without it.

To fix this, pass `system_instruction` to the model constructor and `tools` directly to `generate_content`:

```diff
-config = GenerateContentConfig(
-    tools=tools,
-    system_instruction=system_instruction,
-)
-model_instance = genai.GenerativeModel(model_name=model_name)
-response = model_instance.generate_content(prompt, generation_config=config)
+model_instance = genai.GenerativeModel(
+    model_name=model_name,
+    system_instruction=system_instruction,
+)
+response = model_instance.generate_content(prompt, tools=tools)
```
Copilot AI (Jan 24, 2026):

The `model_instance.generate_content()` call in `call_llm_with_search()` is not awaited, but this is an async function. The `google.generativeai` library may not support async operations, which could cause this async function to block. Consider either making this a synchronous function or using `asyncio.to_thread()` to run the blocking call in a thread pool, avoiding blocking the event loop.
Avoid blocking the event loop in async LLM calls; handle empty response candidates.

The `generate_content()` method is synchronous and will block the event loop when called directly in async methods. Additionally, `response.candidates[0]` is accessed without ensuring the candidates list is non-empty, which can raise `IndexError`.

For the blocking I/O, use `asyncio.to_thread()` as a practical workaround for the legacy SDK. The long-term solution is to migrate to the newer google-genai SDK (imported as `from google import genai`), which provides proper async methods via `client.aio.models.generate_content()`.

For response handling, check that `response.candidates` is non-empty before accessing `response.candidates[0]`.

💡 Suggested fixes:

```diff
+import asyncio
 ...
-response = model_instance.generate_content(prompt)
+response = await asyncio.to_thread(model_instance.generate_content, prompt)
 return response.text
 ...
-response = model_instance.generate_content(prompt, generation_config=config)
+response = await asyncio.to_thread(
+    model_instance.generate_content,
+    prompt,
+    generation_config=config,
+)
 # Extract grounding metadata
 sources = []
-if hasattr(response.candidates[0], "grounding_metadata"):
-    metadata = response.candidates[0].grounding_metadata
+if response.candidates and hasattr(response.candidates[0], "grounding_metadata"):
+    metadata = response.candidates[0].grounding_metadata
```

🤖 Prompt for AI Agents: In `src/prompt_rewriter/agents/base.py` around lines 160-235, `call_llm_with_search` currently calls the synchronous `generate_content` directly (blocking the event loop) and assumes `response.candidates[0]` exists. Run the blocking call inside `asyncio.to_thread(...)`, wrapping the `model_instance.generate_content` call, and after getting the response validate that `response.candidates` is non-empty before accessing `[0]`. If empty, return a sensible fallback (e.g., empty string and empty grounding list); otherwise extract the candidate's text and grounding sources as before.
Guard against empty candidates before indexing.

`response.candidates[0]` will raise `IndexError` if the SDK returns no candidates (which occurs when the prompt is blocked). The defensive check prevents pipeline crashes.

Suggested fix:

```diff
 sources = []
-if hasattr(response.candidates[0], "grounding_metadata"):
-    metadata = response.candidates[0].grounding_metadata
+candidates = getattr(response, "candidates", None) or []
+if candidates and hasattr(candidates[0], "grounding_metadata"):
+    metadata = candidates[0].grounding_metadata
     if hasattr(metadata, "grounding_chunks"):
         for chunk in metadata.grounding_chunks:
             if hasattr(chunk, "web"):
                 sources.append(
                     {
                         "url": chunk.web.uri,
                         "title": chunk.web.title,
                     }
                 )
```

🤖 Prompt for AI Agents: In `src/prompt_rewriter/agents/base.py` around lines 236-249, the code accesses `response.candidates[0]` without ensuring `candidates` is non-empty, causing `IndexError` when the SDK returns no candidates. Update the grounding-metadata extraction to first check that the response has a non-empty `candidates` list (e.g., `getattr(response, "candidates", None)`) before referencing `response.candidates[0]`, and only then inspect `grounding_metadata` and `grounding_chunks`. If there are no candidates, skip extraction and leave `sources` empty to avoid crashing the pipeline.
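The guard can be exercised with stand-in response objects (no SDK required) to show that an empty `candidates` list no longer raises `IndexError`. The `extract_sources` helper name and the `SimpleNamespace` objects are illustrative, mirroring the attribute shape of the SDK response:

```python
from types import SimpleNamespace


def extract_sources(response) -> list:
    """Collect web sources from grounding metadata, tolerating empty candidates."""
    sources = []
    candidates = getattr(response, "candidates", None) or []
    if candidates and hasattr(candidates[0], "grounding_metadata"):
        metadata = candidates[0].grounding_metadata
        for chunk in getattr(metadata, "grounding_chunks", []):
            if hasattr(chunk, "web"):
                sources.append({"url": chunk.web.uri, "title": chunk.web.title})
    return sources


# Blocked prompt: the SDK returns no candidates at all.
empty = SimpleNamespace(candidates=[])

# Grounded answer: one candidate with a single web chunk.
chunk = SimpleNamespace(
    web=SimpleNamespace(uri="https://example.com", title="Example")
)
grounded = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            grounding_metadata=SimpleNamespace(grounding_chunks=[chunk])
        )
    ]
)
```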
The usage example references `PromptRewriterOrchestrator`, which is not exported. Update the docs or export the class if intended.