Add Prompt Rewriter Agent system for autonomous prompt improvement#55
rajivsinclair wants to merge 1 commit into main
Conversation
This adds a comprehensive system for dynamically improving prompts based on user feedback. The system includes:

- Technical proposal document explaining the architecture and workflow
- Implementation roadmap with detailed task breakdowns
- Database migration for tracking proposals, experiments, and knowledge base
- Initial module structure with models, config, and base agent class
- Feedback Intake Agent implementation for classifying user feedback
- Orchestrator Prefect flow coordinating the multi-agent pipeline

The Prompt Rewriter Agent will:

1. Capture user feedback (thumbs down, comments, label disputes)
2. Research disputed claims via web search
3. Generate structured prompt modification proposals
4. Test proposals experimentally against real snippets
5. Evaluate results and decide whether to deploy
6. Update prompts and reprocess affected historical snippets
Walkthrough

This PR introduces a comprehensive Prompt Rewriter Agent system comprising architecture documentation, foundational models, agent framework, configuration infrastructure, database schema, and an orchestration pipeline to autonomously improve system prompts based on user feedback through iterative research, proposal generation, experimentation, and deployment.

Changes
Sequence Diagram(s)

sequenceDiagram
actor User
participant Feedback as Feedback System
participant Pipeline as Orchestrator<br/>(Prefect)
participant FI as Feedback Intake<br/>Agent
participant LLM as LLM Service
participant Research as Research Agent<br/>(Future)
participant Writer as Proposal Writer<br/>Agent (Future)
participant Experiment as Experiment<br/>Runner (Future)
participant Evaluator as Evaluation<br/>Agent (Future)
participant Deploy as Deployment<br/>Agent (Future)
participant DB as Supabase
User->>Feedback: Submit feedback on snippet
Feedback->>DB: Create user_feedback_event
Pipeline->>DB: Fetch next pending proposal
Pipeline->>FI: run(FeedbackEvent)
FI->>DB: Retrieve snippet context
FI->>FI: Build classification prompt
FI->>LLM: call_llm(classification prompt)
LLM-->>FI: LLM response
FI->>FI: Parse structured analysis
FI-->>Pipeline: FeedbackAnalysis
Pipeline->>DB: Update status → researching
Pipeline->>Research: research_claim(analysis)
Research-->>Pipeline: ResearchResult (mock)
Pipeline->>DB: Update status → writing_proposal
Pipeline->>Writer: write_proposal(analysis, research)
Writer-->>Pipeline: PromptProposal (mock)
Pipeline->>DB: Update status → experimenting
Pipeline->>Experiment: run_experiments(proposal)
Experiment-->>Pipeline: ExperimentResult (mock)
Pipeline->>DB: Update status → evaluating
Pipeline->>Evaluator: evaluate_results(experiments)
Evaluator-->>Pipeline: EvaluationResult (accept/refine/reject)
alt Decision = ACCEPT
Pipeline->>DB: Update status → deploying
Pipeline->>Deploy: deploy_changes(proposal)
Deploy-->>Pipeline: DeploymentResult
Pipeline->>DB: Update status → deployed
else Decision = REFINE
Pipeline->>DB: Update status → refining
else Decision = REJECT
Pipeline->>DB: Update status → rejected
end
Pipeline->>DB: Save AgentLogEntry
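The status transitions in the diagram above can be sketched as a small state machine. The status names and the accept/refine/reject branches come from the diagram; the `ALLOWED` transition table and `advance` helper are illustrative assumptions, not code from this PR.

```python
# Illustrative sketch of the proposal status pipeline from the diagram;
# the transition table itself is an assumption, not PR code.
ALLOWED = {
    "pending": {"researching"},
    "researching": {"writing_proposal"},
    "writing_proposal": {"experimenting"},
    "experimenting": {"evaluating"},
    "evaluating": {"deploying", "refining", "rejected"},  # accept / refine / reject
    "deploying": {"deployed"},
}

def advance(status: str, new_status: str) -> str:
    """Move a proposal to new_status, enforcing the pipeline order."""
    if new_status not in ALLOWED.get(status, set()):
        raise ValueError(f"illegal transition {status} -> {new_status}")
    return new_status
```

For example, `advance("evaluating", "refining")` is legal, while jumping straight from `pending` to `deployed` raises.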
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes. Substantive architectural foundation with dense logic across multiple interconnected systems (agent framework, LLM integration, Prefect orchestration, comprehensive database schema with RLS/triggers). Requires careful validation of data model consistency (Python ↔ SQL), error handling patterns, async execution safety, and security policies, even though the changes cohere as a single feature.
🚥 Pre-merge checks: ✅ 3 passed
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Pylint (4.0.4) reported "Module .pylintrc" errors for src/prompt_rewriter/config.py, src/prompt_rewriter/agents/feedback_intake.py, and src/prompt_rewriter/agents/base.py (full output truncated).
Summary of Changes

Hello @rajivsinclair, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a sophisticated, autonomous Prompt Rewriter Agent system designed to significantly enhance the VERDAD platform's ability to combat misinformation. By creating a closed-loop learning system, it transforms user feedback into actionable prompt improvements, ensuring that the system continuously adapts and becomes more accurate over time. The system orchestrates multiple specialized AI agents to analyze feedback, conduct research, propose and test prompt modifications, and deploy validated changes, ultimately leading to a more robust and self-improving fact-checking mechanism.
Caution
Changes requested ❌
Reviewed everything up to bf22842 in 2 minutes and 23 seconds.
- Reviewed 2758 lines of code in 10 files
- Skipped 0 files when reviewing
- Skipped posting 8 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. docs/PROMPT_REWRITER_AGENT_PROPOSAL.md:1
- Draft comment:
Consider adding language identifiers (e.g. `python` or `markdown`) to code blocks for improved syntax highlighting.
- Reason this comment was not posted:
Comment did not seem useful. Confidence it is useful = 20% <= threshold 50%. This comment is purely informative and does not provide a specific suggestion or highlight a potential issue with the code. It suggests an improvement for readability but does not address a functional aspect of the code.
2. docs/PROMPT_REWRITER_IMPLEMENTATION_ROADMAP.md:1
- Draft comment:
Consider linking roadmap tasks to external issue trackers for better traceability and clarity in effort estimates.
- Reason this comment was not posted:
Comment did not seem useful. Confidence it is useful = 0% <= threshold 50%. This comment is purely informative and suggests a process improvement rather than addressing a specific code issue or providing a code suggestion. It doesn't align with the rules for useful comments in a pull request review.
3. src/prompt_rewriter/agents/base.py:184
- Draft comment:
Consider adding more robust error handling in the call_llm function to deal with potential API errors or empty responses.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 15% vs. threshold = 50%. The comment is suggesting a code quality improvement ("Consider adding more robust error handling"). While this might be valid advice, it falls into the category of speculative suggestions rather than identifying a clear bug. The comment uses "Consider" which is a soft suggestion, not a definitive issue. The docstring claims the function "handles model selection and basic error handling" but I don't see explicit error handling - however, the errors might be allowed to propagate up to the execute() method which does have error handling. The comment doesn't specify what specific errors need to be handled or why the current approach is insufficient. This seems like a general code quality suggestion rather than identifying a specific problem. The function might intentionally let exceptions propagate to the caller (the execute() method has comprehensive error handling). Without knowing the specific API errors that could occur or seeing evidence that the current approach is problematic, this might be an unnecessary suggestion. The comment is vague about what "more robust error handling" means. While the critique is valid, the comment is still speculative and not actionable. It doesn't identify a specific bug or provide clear guidance on what needs to be changed. The rules state "Do NOT make speculative comments" and "Comments that suggest code quality refactors are good! But only if they are actionable and clear." This comment is neither specific nor actionable - it just says "consider adding" without explaining what specific errors need handling. This comment should be deleted. It's a vague, speculative suggestion that isn't actionable. It doesn't identify a specific bug or provide clear guidance on what needs to be fixed.
The comment uses soft language ("Consider") and doesn't explain what specific errors need to be handled or why the current approach is insufficient.
4. src/prompt_rewriter/agents/feedback_intake.py:133
- Draft comment:
Ensure that raw LLM responses are logged at DEBUG level to aid troubleshooting when JSON parsing fails.
- Reason this comment was not posted:
Comment looked like it was already resolved.
5. src/prompt_rewriter/config.py:5
- Draft comment:
Good use of environment variables with the walrus operator; verify that default values (especially for sensitive settings) are secure for production.
- Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%. None
6. src/prompt_rewriter/main.py:239
- Draft comment:
Consider adding explicit error handling and transaction control around proposal status updates to ensure consistency in failure scenarios.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%. This comment is suggesting a general improvement about error handling and transaction control. While this might be good advice, it's not pointing to a specific bug or clear issue in the code. The comment doesn't specify what exactly should be done - it's more of a "consider" suggestion. According to the rules, I should not keep comments that are speculative or not clearly actionable. The comment doesn't show strong evidence of a definite problem - it's more of a best practice suggestion. The code is using Prefect tasks which have their own retry mechanisms (as seen in the `load_proposal` task with retries=2). Without seeing actual failures or understanding the full error handling strategy, this is speculative advice. The comment might be valid - if intermediate steps fail, the proposal status could be left in an inconsistent state. This could be a real issue in production. The suggestion about transaction control could be important for data consistency. While the concern about consistency is valid in principle, the comment is too vague and doesn't provide specific actionable guidance. It says "consider" which is a weak suggestion. The code is using Prefect's task system which has built-in error handling and retry mechanisms. Without evidence of a specific failure scenario or concrete implementation guidance, this is speculative advice rather than a clear code issue. This comment should be deleted. It's a vague, speculative suggestion ("consider adding") without specific actionable guidance. It doesn't point to a clear bug or definite issue, and the rules state not to make speculative comments or suggestions that aren't clearly actionable.
7. src/prompt_rewriter/models.py:16
- Draft comment:
Verify that enum values (e.g. in FeedbackIntent and ProposalType) match exactly with the expected values in the database schema to avoid inconsistencies.
- Reason this comment was not posted:
Comment did not seem useful. Confidence it is useful = 0% <= threshold 50%. The comment is asking the PR author to verify that enum values match the database schema, which is a form of asking for confirmation or verification. This violates the rule against asking the author to confirm or ensure things. The comment does not provide a specific suggestion or point out a specific issue, making it unapproved.
8. supabase/migrations/20260124000000_prompt_rewriter_agent_schema.sql:323
- Draft comment:
Ensure that the RLS policies and timestamp triggers are thoroughly tested under concurrent load to avoid unexpected access issues in production.
- Reason this comment was not posted:
Comment did not seem useful. Confidence it is useful = 0% <= threshold 50%. This comment is asking the PR author to ensure that certain aspects are tested, which violates the rule against asking for confirmation or testing. It doesn't provide a specific suggestion or point out a specific issue in the code.
Workflow ID: wflow_3PL6o5B9Eboy9vBg
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
```python
propose prompt modifications, test changes, and deploy improvements.

Usage:
    from prompt_rewriter import PromptRewriterOrchestrator
```
The usage example references PromptRewriterOrchestrator which is not exported. Update the docs or export the class if intended.
Code Review
This pull request introduces a comprehensive Prompt Rewriter Agent system, a significant feature for autonomously improving prompts based on user feedback. The changes include detailed design documents, a full database schema, and the initial implementation of the agent framework, including the base agent, models, configuration, and the feedback intake agent. The overall architecture is well-thought-out and robust.
My review focuses on ensuring data integrity, correcting potential runtime errors in the agent base class, and improving code clarity and consistency. I've identified a few high-severity issues in src/prompt_rewriter/agents/base.py related to handling of proposal_id and an incorrect LLM API call that could lead to runtime failures. I've also made some medium-severity suggestions regarding database schema integrity, documentation, and naming consistency between the data models and the database schema.
Overall, this is a great foundation for the new system. Addressing these points will improve its stability and maintainability.
```python
log_entry = AgentLogEntry(
    agent_name=self.name,
    proposal_id=proposal_id or UUID("00000000-0000-0000-0000-000000000000"),
    input_summary=self._summarize_input(input_data),
)
```
When proposal_id is None, it's defaulted to a nil UUID. This will cause a foreign key violation when _save_log_entry is called, as a proposal with a nil UUID won't exist in the prompt_rewrite_proposals table. The proposal_id should be passed as None to the AgentLogEntry constructor. I've added a separate comment on src/prompt_rewriter/models.py to make the proposal_id field optional in the AgentLogEntry model to support this change.
Suggested change:

```python
log_entry = AgentLogEntry(
    agent_name=self.name,
    proposal_id=proposal_id,
    input_summary=self._summarize_input(input_data),
)
```
```python
config = GenerateContentConfig(
    tools=tools,
    system_instruction=system_instruction,
)

model_instance = genai.GenerativeModel(model_name=model_name)

response = model_instance.generate_content(prompt, generation_config=config)
```
The implementation of call_llm_with_search has a couple of issues that will likely cause a runtime error:

1. `GenerateContentConfig` is passed to the `generation_config` parameter of `generate_content`, which expects a `genai.GenerationConfig` instance.
2. The `system_instruction` is passed to `GenerateContentConfig`, but it should be passed to the `GenerativeModel` constructor. The model is currently instantiated without it.

To fix this, you should pass `system_instruction` to the model constructor and `tools` directly to `generate_content`.
Suggested change:

```python
model_instance = genai.GenerativeModel(
    model_name=model_name,
    system_instruction=system_instruction,
)
response = model_instance.generate_content(prompt, tools=tools)
```
| """Log entry for agent execution.""" | ||
|
|
||
| agent_name: str | ||
| proposal_id: UUID |
The proposal_id should be optional to support agent logs that are not associated with a specific proposal. Currently, BaseAgent.execute in src/prompt_rewriter/agents/base.py attempts to use a nil UUID when no proposal_id is provided, which will cause a foreign key violation when saving the log to the database. Making this field optional will fix the issue.
Suggested change:

```python
proposal_id: Optional[UUID] = None
```
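A minimal sketch of the effect of making the field optional, using a stdlib dataclass as a stand-in for the Pydantic `AgentLogEntry` (the `to_row` helper and field subset are hypothetical, for illustration only):

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID

@dataclass
class AgentLogEntry:  # stand-in for the Pydantic model; illustrative only
    agent_name: str
    proposal_id: Optional[UUID] = None  # logs need not belong to a proposal

def to_row(entry: AgentLogEntry) -> dict:
    # A NULL proposal_id is valid for a nullable FK column, whereas a
    # nil UUID would violate the foreign key constraint.
    return {
        "agent_name": entry.agent_name,
        "proposal_id": str(entry.proposal_id) if entry.proposal_id else None,
    }
```

With this shape, logs created outside a proposal context serialize `proposal_id` as `None` and insert cleanly.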
```python
Usage:
    from prompt_rewriter import PromptRewriterOrchestrator

    orchestrator = PromptRewriterOrchestrator()
    result = await orchestrator.process_feedback(feedback_event)
"""
```
The usage example in the docstring refers to PromptRewriterOrchestrator, which is not implemented in this PR. The main entry point appears to be the prompt_rewriter_flow Prefect flow. To avoid confusion, I suggest updating the example to reflect the current implementation.
Suggested change:

```python
Usage:
    from prompt_rewriter.main import prompt_rewriter_flow

    # The main entry point is the `prompt_rewriter_flow` Prefect flow.
    # It is triggered with a proposal_id.
    # See `src/prompt_rewriter/main.py` for usage details.
"""
```
| "input_data": {"summary": log_entry.input_summary}, | ||
| "output_data": {"summary": log_entry.output_summary}, | ||
| "error_message": log_entry.error_message, | ||
| "llm_total_tokens": log_entry.llm_tokens_used, |
There's an inconsistency between the Pydantic model and the database schema. The AgentLogEntry model uses llm_tokens_used, but the database column is llm_total_tokens. The key used here for insertion is llm_total_tokens. To improve clarity and maintainability, it's best to use the same name in all places. I'd suggest renaming llm_tokens_used to llm_total_tokens in src/prompt_rewriter/models.py.
```sql
-- Deployment
deployed_at TIMESTAMPTZ,
deployed_prompt_version_id UUID, -- Reference to prompt_versions if it exists
```
For better data integrity, consider adding a foreign key constraint to deployed_prompt_version_id. The proposal document mentions a reference to prompt_versions(id). If the prompt_versions table exists or is planned, adding a foreign key constraint (e.g., REFERENCES prompt_versions(id)) would prevent orphaned records and ensure referential integrity.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bf22842aac
```python
title=snippet.get("title", "Unknown"),
summary=snippet.get("summary", "No summary available"),
categories=", ".join(snippet.get("disinformation_categories", [])),
transcription=snippet.get("transcription", "")[:1000],  # Truncate
```
Handle non-string disinformation categories before join
The prompt formatting assumes snippet["disinformation_categories"] is a list of strings, but the Stage 4 output schema defines each category as an object with english/spanish fields (and it can also be null in existing tests). In those common cases, ", ".join(...) raises a TypeError, so the feedback intake agent will crash before calling the LLM, breaking the entire rewriter flow for normal snippets. Consider normalizing to a list of strings (e.g., [c["english"] for c in categories]) and handling None explicitly.
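A hedged sketch of the normalization Codex suggests, assuming categories may arrive as None, plain strings, or objects with english/spanish fields (the exact Stage 4 schema is not shown in this PR):

```python
def normalize_categories(categories) -> list[str]:
    """Coerce disinformation_categories into strings safe for ', '.join()."""
    if not categories:  # handles None and []
        return []
    result = []
    for c in categories:
        if isinstance(c, dict):
            # Assumed object shape: {"english": ..., "spanish": ...}
            result.append(c.get("english") or c.get("spanish") or "")
        else:
            result.append(str(c))
    return [r for r in result if r]
```

The prompt builder could then use `", ".join(normalize_categories(snippet.get("disinformation_categories")))` without risking a TypeError.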
Pull request overview
This PR introduces a comprehensive Prompt Rewriter Agent system designed to autonomously improve misinformation detection prompts based on user feedback. The system implements a multi-agent pipeline that analyzes feedback, researches claims, proposes prompt modifications, tests changes experimentally, and deploys improvements.
Changes:
- Database schema migration adding 6 new tables (proposals, experiment runs, reprocess queue, knowledge facts, feedback events, agent logs) with supporting indexes, triggers, and RLS policies
- Python module structure with Pydantic models, configuration management, base agent class, and orchestrator flow using Prefect
- Feedback Intake Agent implementation for classifying and extracting actionable information from user feedback
- Technical documentation including architecture proposal and detailed implementation roadmap
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 34 comments.
Show a summary per file
| File | Description |
|---|---|
| supabase/migrations/20260124000000_prompt_rewriter_agent_schema.sql | Database schema for tracking proposals, experiments, knowledge base, and feedback events |
| src/prompt_rewriter/models.py | Pydantic models defining data structures for feedback, proposals, experiments, and results |
| src/prompt_rewriter/config.py | Configuration dataclasses with environment variable support and default values |
| src/prompt_rewriter/main.py | Prefect orchestrator flow coordinating the multi-agent pipeline from feedback to deployment |
| src/prompt_rewriter/agents/base.py | Base agent class providing LLM utilities, logging, error handling, and database access |
| src/prompt_rewriter/agents/feedback_intake.py | Feedback Intake Agent for analyzing user feedback and extracting claims |
| src/prompt_rewriter/agents/init.py | Package initialization for agent modules |
| src/prompt_rewriter/init.py | Main package initialization exposing key models and types |
| docs/PROMPT_REWRITER_AGENT_PROPOSAL.md | Technical proposal explaining architecture, workflow, and design decisions |
| docs/PROMPT_REWRITER_IMPLEMENTATION_ROADMAP.md | Detailed implementation roadmap with task breakdown and dependencies |
```sql
proposal_id UUID NOT NULL REFERENCES prompt_rewrite_proposals(id) ON DELETE CASCADE,

-- Why this snippet was queued
reason TEXT NOT NULL, -- 'semantic_similarity', 'same_category', 'manual'
```
The reason field is defined as TEXT with comment indicating values 'semantic_similarity', 'same_category', 'manual', but there's no database-level constraint. Consider creating a reprocess_reason enum type to enforce valid values and maintain consistency with the schema design pattern.
```python
    Returns:
        The LLM's response text
    """
    import google.generativeai as genai
```
The code imports google.generativeai (the older SDK), but the existing codebase uses from google import genai (the newer Google Genai SDK). This creates an inconsistency and may cause compatibility issues. Consider updating to use the newer SDK's genai.Client API pattern as shown in src/processing_pipeline/stage_1.py:570 and src/processing_pipeline/stage_3.py:373, which provides better async support and consistency with the rest of the codebase.
```python
self.supabase.table("prompt_rewriter_agent_logs").insert(
    {
        "agent_name": log_entry.agent_name,
        "proposal_id": str(log_entry.proposal_id),
        "started_at": log_entry.started_at.isoformat(),
        "completed_at": (
            log_entry.completed_at.isoformat()
            if log_entry.completed_at
            else None
        ),
        "duration_ms": log_entry.duration_ms,
        "status": log_entry.status,
        "input_data": {"summary": log_entry.input_summary},
        "output_data": {"summary": log_entry.output_summary},
        "error_message": log_entry.error_message,
        "llm_total_tokens": log_entry.llm_tokens_used,
    }
).execute()
```
The Supabase insert operation is not awaited in this async function. The supabase-py client's execute() method is synchronous and will block the event loop. Consider using asyncio.to_thread() to run this in a thread pool, or verify if there's an async version of the Supabase client available.
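One way to keep the synchronous client off the event loop is `asyncio.to_thread`, sketched below; `insert_log` is a hypothetical helper, not code from this PR, and it assumes the standard supabase-py `table().insert().execute()` call chain:

```python
import asyncio

async def insert_log(supabase, row: dict):
    """Run the blocking supabase-py insert in a worker thread."""
    def run_insert():
        return supabase.table("prompt_rewriter_agent_logs").insert(row).execute()
    # to_thread keeps the event loop free while execute() blocks on I/O.
    return await asyncio.to_thread(run_insert)
```

The same pattern applies to the synchronous `select(...).execute()` call flagged later in this review.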
```sql
extracted_claim TEXT,
user_correction TEXT,
affected_prompt_stages INTEGER[], -- e.g., [1, 3] for Stage 1 and Stage 3
priority TEXT DEFAULT 'medium', -- 'low', 'medium', 'high', 'critical'
```
The priority field is defined as TEXT with a comment indicating values 'low', 'medium', 'high', 'critical', but this is not enforced at the database level. Consider creating a priority enum type similar to the other enums (prompt_rewrite_status, feedback_intent, proposal_type) to enforce valid values and improve data integrity.
```sql
-- Processing
priority INTEGER DEFAULT 0, -- Higher = process first
status TEXT DEFAULT 'queued', -- 'queued', 'processing', 'completed', 'failed', 'skipped'
```
The status field is defined as TEXT with comment indicating values 'queued', 'processing', 'completed', 'failed', 'skipped', but there's no database-level constraint. Consider creating a queue_status enum type to enforce valid values and maintain consistency with the schema design pattern used for other enums.
```sql
consistency_score FLOAT,

-- Evaluation
evaluation_decision TEXT, -- 'accept', 'refine', 'reject'
```
The evaluation_decision field is defined as TEXT with a comment indicating values 'accept', 'refine', 'reject', but there's no database-level constraint. This could lead to inconsistent data. Since there's already an EvaluationDecision enum in the Python models, consider creating a corresponding PostgreSQL enum type and using it here for consistency and data integrity.
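One way to keep the Python enum and a database-level constraint in sync is to generate the CHECK clause from the enum itself. The `EvaluationDecision` values below mirror the ones named in this comment; the `check_clause` helper is an illustrative assumption, not code from this PR:

```python
from enum import Enum

class EvaluationDecision(str, Enum):  # assumed to mirror the model's enum
    ACCEPT = "accept"
    REFINE = "refine"
    REJECT = "reject"

def check_clause(column: str, enum_cls) -> str:
    """Build a SQL CHECK constraint from a Python enum's values."""
    values = ", ".join(f"'{m.value}'" for m in enum_cls)
    return f"CHECK ({column} IN ({values}))"
```

Generating the clause this way (e.g. during migration authoring) avoids the drift that hand-copied value lists invite. The same approach would work for the `priority`, `status`, and `reason` columns flagged above.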
```sql
-- Version control
version INTEGER DEFAULT 1,
previous_version_id UUID REFERENCES knowledge_facts(id),
```
The previous_version_id creates a self-referential foreign key that could lead to circular references or orphaned chains if not handled carefully. There's no constraint preventing cycles (e.g., fact A references fact B as previous version, and fact B references fact A). Consider adding a CHECK constraint to ensure version numbers are monotonically increasing, or implementing versioning through a separate versions table to avoid this complexity.
Suggested change:

```sql
previous_version_id UUID,
```
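If the self-referential column is kept, a cycle check at the application layer is one option. A sketch, assuming the chain has been loaded into a dict mapping each fact id to its `previous_version_id` (the helper and data shape are hypothetical):

```python
def has_cycle(fact_id: str, prev: dict) -> bool:
    """Detect a loop in a previous_version_id chain by walking it."""
    seen = set()
    current = fact_id
    while current is not None:
        if current in seen:  # revisiting a node means the chain loops
            return True
        seen.add(current)
        current = prev.get(current)
    return False
```

Running this before accepting a new `previous_version_id` would catch the A → B → A case described above.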
```python
intent_confidence: float = Field(ge=0, le=1)
extracted_claim: Optional[str] = None
user_correction: Optional[str] = None
affected_prompt_stages: list[int] = Field(default_factory=lambda: [1, 3])
```
The affected_prompt_stages field defaults to [1, 3] in the lambda, but this default doesn't align with the comment on line 58 of the migration file which indicates the values should represent stage numbers (e.g., Stage 1 and Stage 3). The Python model uses integers while the database uses an INTEGER[] type, which is good. However, ensure that the values 1 and 3 are meaningful in context - if they represent PromptStage values, consider documenting this or using an enum for clarity.
```python
content: Optional[str] = None  # Comment text or feedback details
comment_id: Optional[str] = None  # Liveblocks comment ID
label_id: Optional[UUID] = None
created_at: datetime = Field(default_factory=datetime.utcnow)
```
Using datetime.utcnow() is deprecated as of Python 3.12. The existing codebase uses datetime.now(timezone.utc) (see src/processing_pipeline/stage_3.py:379 and src/processing_pipeline/supabase_utils.py:290). Please update to use datetime.now(timezone.utc) for consistency and to avoid deprecation warnings.
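A small sketch of the timezone-aware replacement (the `utc_now` wrapper is illustrative; in the Pydantic field it would appear as `Field(default_factory=lambda: datetime.now(timezone.utc))`):

```python
from datetime import datetime, timezone

# datetime.utcnow() returns a naive timestamp and is deprecated in Python 3.12;
# datetime.now(timezone.utc) returns an aware one.
def utc_now() -> datetime:
    return datetime.now(timezone.utc)
```

Aware timestamps also serialize with an explicit offset via `isoformat()`, which avoids ambiguity when the rows are read back from Supabase.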
```python
result = (
    client.table("prompt_rewrite_proposals")
    .select("*")
    .eq("id", str(proposal_id))
    .single()
    .execute()
)
```
The Supabase select operation is not awaited in this async function. The supabase-py client's execute() method is synchronous and will block the event loop. Consider using asyncio.to_thread() to run this in a thread pool for better async performance.
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In `@docs/PROMPT_REWRITER_AGENT_PROPOSAL.md`:
- Line 35: The markdown has lint issues: add a fenced-code language label to the
unlabeled code block (use ```text) and replace bold-as-heading lines for the
options with proper markdown headings (e.g., change **Option A: Inline in Prompt
Files (Recommended for MVP)** to a heading like #### Option A: Inline in Prompt
Files (Recommended for MVP)); apply the same fixes for the later occurrences
referenced (the block at 141-151) so all fenced blocks are labeled and bolded
option lines are converted to proper heading levels.
In `@docs/PROMPT_REWRITER_IMPLEMENTATION_ROADMAP.md`:
- Line 12: The bold "Priority" lines in the roadmap should be converted into
proper Markdown headings or list items and all unlabeled code fences in the
roadmap sections (the triple-backtick blocks shown in the diff) must include a
language tag (e.g., ```text) to satisfy markdownlint MD036/MD040; update each
instance that currently uses **Priority: Critical | Estimate: 1 day** (and other
bolded "Priority" occurrences) to a heading like "### Priority: Critical |
Estimate: 1 day" or an item in a list, and add the `text` language tag to each
unlabeled code fence so they become ```text ... ```, applying these changes
consistently to all roadmap sections that contain those bold lines and unlabeled
fences.
In `@src/prompt_rewriter/__init__.py`:
- Around line 26-35: The exported names in the __all__ list are unsorted causing
Ruff RUF022; open the module-level __all__ variable in __init__.py and reorder
the entries alphabetically (e.g., arrange "EvaluationResult",
"ExperimentResult", "FeedbackEvent", "FeedbackIntent", "ProposalStatus",
"ProposalType", "PromptProposal", "ResearchResult" into lexicographic order) so
the list is sorted to satisfy the linter.
In `@src/prompt_rewriter/agents/base.py`:
- Around line 160-235: call_llm_with_search currently calls the synchronous
genai.generate_content directly (blocking the event loop) and assumes
response.candidates[0] exists; change it to run the blocking call inside
asyncio.to_thread(...) (wrapping the model_instance.generate_content call) and
after getting the response validate that response.candidates is non-empty before
accessing [0]; if empty, return a sensible fallback (e.g., empty string and
empty grounding list) and otherwise extract response.candidates[0].content/text
and grounding sources to return the tuple (ensure you keep using
model_instance.generate_content, generation config construction with
GenerateContentConfig/Tool/GoogleSearch, and check response.candidates to avoid
IndexError).
- Around line 236-249: The code accesses response.candidates[0] without ensuring
candidates is non-empty, causing IndexError when the SDK returns no candidates;
update the grounding metadata extraction in src/prompt_rewriter/agents/base.py
to first check that response has a non-empty candidates list (e.g., if
getattr(response, "candidates", None) and len(response.candidates) > 0) before
referencing response.candidates[0], and only then proceed to inspect
grounding_metadata and grounding_chunks (symbols: response, candidates,
grounding_metadata, grounding_chunks, sources); if there are no candidates, skip
extraction or leave sources empty to avoid crashing the pipeline.
In `@src/prompt_rewriter/agents/feedback_intake.py`:
- Around line 99-106: The code builds `prompt` using
FEEDBACK_CLASSIFICATION_PROMPT and calls ",
".join(snippet.get("disinformation_categories", [])) which will raise if
`snippet.get("disinformation_categories")` returns None; update the `categories`
expression to defensively coerce to an iterable (e.g., use
`snippet.get("disinformation_categories") or []` or wrap with `list(...)`)
before joining so `", ".join(...)` always receives a list/tuple of strings; keep
the same variable `categories` in the FEEDBACK_CLASSIFICATION_PROMPT.format call
and ensure no other fields rely on nullable values.
In `@src/prompt_rewriter/main.py`:
- Around line 83-94: The FeedbackEvent construction is incorrectly using the
proposal ID (UUID(proposal_data["id"])) for FeedbackEvent.id; change it to use
the actual feedback event id from the payload (e.g.,
UUID(proposal_data["triggered_by_feedback_id"]) or
proposal_data.get("triggered_by_feedback_id")) and fall back to None if that
field is missing, so update the id assignment in the FeedbackEvent instantiation
in main.py accordingly (refer to FeedbackEvent and proposal_data in that block).
In `@supabase/migrations/20260124000000_prompt_rewriter_agent_schema.sql`:
- Around line 374-425: The RLS is too permissive: update grants and policies for
user-scoped data and decide ownership for internal tables. For
prompt_rewrite_proposals and user_feedback_events change GRANT ALL TO
authenticated to GRANT SELECT TO authenticated and replace their policies
("Allow authenticated read on proposals" and "Allow authenticated read on
feedback_events") from USING (true) to USING (triggered_by_user_id = auth.uid())
and USING (user_id = auth.uid()) respectively; for prompt_rewriter_agent_logs
and prompt_experiment_runs either remove the authenticated FOR SELECT policies
and make them service_role-only (keep only the "Allow service_role all ..."
policies) or tighten them to ownership-based checks (e.g., link to proposal
ownership) if users must see related rows; for knowledge_facts and
snippet_reprocess_queue confirm intended scope and either keep shared read
(USING (true)) or add ownership predicates before granting authenticated access.
🧹 Nitpick comments (4)
src/prompt_rewriter/models.py (1)
16-87: Prefer enums for feedback_type and priority to avoid invalid values.

These are persisted fields; aligning them with enums (like the other status fields) improves data integrity and avoids typos.
♻️ Suggested refactor
```diff
+class FeedbackType(str, Enum):
+    THUMBS_DOWN = "thumbs_down"
+    COMMENT = "comment"
+    LABEL_DISPUTE = "label_dispute"
+    MANUAL = "manual"
+
+class FeedbackPriority(str, Enum):
+    LOW = "low"
+    MEDIUM = "medium"
+    HIGH = "high"
+    CRITICAL = "critical"
+
 class FeedbackEvent(BaseModel):
     """A user feedback event that may trigger the rewriter."""

     id: Optional[UUID] = None
-    feedback_type: str  # 'thumbs_down', 'comment', 'label_dispute', 'manual'
+    feedback_type: FeedbackType
     snippet_id: UUID
     user_id: Optional[UUID] = None
     content: Optional[str] = None  # Comment text or feedback details
     comment_id: Optional[str] = None  # Liveblocks comment ID
     label_id: Optional[UUID] = None
     created_at: datetime = Field(default_factory=datetime.utcnow)

 class FeedbackAnalysis(BaseModel):
     """Output from the Feedback Intake Agent."""

     intent: FeedbackIntent
     intent_confidence: float = Field(ge=0, le=1)
     extracted_claim: Optional[str] = None
     user_correction: Optional[str] = None
     affected_prompt_stages: list[int] = Field(default_factory=lambda: [1, 3])
-    priority: str = "medium"  # 'low', 'medium', 'high', 'critical'
+    priority: FeedbackPriority = FeedbackPriority.MEDIUM
     reasoning: Optional[str] = None
```

src/prompt_rewriter/agents/feedback_intake.py (1)
118-131: Avoid blocking I/O inside async.
Supabase client calls are sync; in an async flow this can block the event loop. Considerasyncio.to_thread(...)(or a sync method) and uselogger.exceptionto preserve stack traces.🧵 Example non-blocking pattern
```diff
+import asyncio
 ...
     async def _get_snippet(self, snippet_id: UUID) -> dict | None:
         """Fetch snippet details from database."""
         try:
-            result = (
-                self.supabase.table("snippets")
-                .select("id, title, summary, transcription, disinformation_categories")
-                .eq("id", str(snippet_id))
-                .single()
-                .execute()
-            )
+            def _fetch():
+                return (
+                    self.supabase.table("snippets")
+                    .select("id, title, summary, transcription, disinformation_categories")
+                    .eq("id", str(snippet_id))
+                    .single()
+                    .execute()
+                )
+            result = await asyncio.to_thread(_fetch)
             return result.data
         except Exception as e:
-            logger.error(f"Error fetching snippet {snippet_id}: {e}")
+            if isinstance(e, asyncio.CancelledError):
+                raise
+            logger.exception(f"Error fetching snippet {snippet_id}: {e}")
             return None
```

src/prompt_rewriter/main.py (2)
36-72: Sync Supabase calls inside async tasks can block the event loop.
Consider running these calls in a thread (`asyncio.to_thread`) or making the tasks sync to avoid blocking Prefect's async flow.

Also applies to: 380-386
106-108: Placeholders are clear; let me know if you want help implementing them.
If you want, I can help draft the Research/Proposal/Experiment/Deployment agents or add guards to prevent accidental production use.

Also applies to: 128-129, 157-158, 200-201
````text
### Architecture Overview


```
````
Fix markdownlint issues (fenced language + heading levels).
MD040 flags the unlabeled fenced block, and MD036 flags bold text used as headings for the options list.
💡 Suggested fix
````diff
-```
+```text
 ┌─────────────────────────────────────────────────────────────────────────────┐
 ...
-```
+```

-**Option A: Inline in Prompt Files (Recommended for MVP)**
+#### Option A: Inline in Prompt Files (Recommended for MVP)

-**Option B: Separate Knowledge Base File**
+#### Option B: Separate Knowledge Base File

-**Option C: Database-Stored (Future)**
+#### Option C: Database-Stored (Future)
````

Also applies to: 141-151
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
35-35: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In `@docs/PROMPT_REWRITER_AGENT_PROPOSAL.md` at line 35, The markdown has lint
issues: add a fenced-code language label to the unlabeled code block (use
```text) and replace bold-as-heading lines for the options with proper markdown
headings (e.g., change **Option A: Inline in Prompt Files (Recommended for
MVP)** to a heading like #### Option A: Inline in Prompt Files (Recommended for
MVP)); apply the same fixes for the later occurrences referenced (the block at
141-151) so all fenced blocks are labeled and bolded option lines are converted
to proper heading levels.
```markdown
## Phase 1: Foundation

### 1.1 Database Schema Deployment
**Priority: Critical | Estimate: 1 day**
```
Address markdownlint MD036/MD040 in roadmap sections.
Convert bold “Priority” lines into headings or list items, and add language tags to unlabeled code fences (Lines 127, 310, 459).
💡 Suggested fix (apply consistently)

````diff
-**Priority: Critical | Estimate: 1 day**
+#### Priority: Critical | Estimate: 1 day
-```
+```text
 Given this user feedback on a misinformation analysis:
 ...
-```
+```
-```
+```text
 New prompt improvement deployed!
 ...
-```
+```
-```
+```text
 Phase 1 (Foundation)
 ...
-```
+```
````

Also applies to: 127-127, 310-310, 459-459
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
12-12: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🤖 Prompt for AI Agents
In `@docs/PROMPT_REWRITER_IMPLEMENTATION_ROADMAP.md` at line 12, The bold
"Priority" lines in the roadmap should be converted into proper Markdown
headings or list items and all unlabeled code fences in the roadmap sections
(the triple-backtick blocks shown in the diff) must include a language tag
(e.g., ```text) to satisfy markdownlint MD036/MD040; update each instance that
currently uses **Priority: Critical | Estimate: 1 day** (and other bolded
"Priority" occurrences) to a heading like "### Priority: Critical | Estimate: 1
day" or an item in a list, and add the `text` language tag to each unlabeled
code fence so they become ```text ... ```, applying these changes consistently
to all roadmap sections that contain those bold lines and unlabeled fences.
```python
__all__ = [
    "FeedbackEvent",
    "FeedbackIntent",
    "ProposalType",
    "ProposalStatus",
    "ResearchResult",
    "PromptProposal",
    "ExperimentResult",
    "EvaluationResult",
]
```
Sort __all__ to satisfy Ruff (RUF022).
Minor lint issue; sorting avoids CI noise.
💡 Suggested fix
```diff
 __all__ = [
-    "FeedbackEvent",
-    "FeedbackIntent",
-    "ProposalType",
-    "ProposalStatus",
-    "ResearchResult",
-    "PromptProposal",
-    "ExperimentResult",
-    "EvaluationResult",
+    "EvaluationResult",
+    "ExperimentResult",
+    "FeedbackEvent",
+    "FeedbackIntent",
+    "PromptProposal",
+    "ProposalStatus",
+    "ProposalType",
+    "ResearchResult",
 ]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
__all__ = [
    "EvaluationResult",
    "ExperimentResult",
    "FeedbackEvent",
    "FeedbackIntent",
    "PromptProposal",
    "ProposalStatus",
    "ProposalType",
    "ResearchResult",
]
```
🧰 Tools
🪛 Ruff (0.14.13)
26-35: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
🤖 Prompt for AI Agents
In `@src/prompt_rewriter/__init__.py` around lines 26 - 35, The exported names in
the __all__ list are unsorted causing Ruff RUF022; open the module-level __all__
variable in __init__.py and reorder the entries alphabetically (e.g., arrange
"EvaluationResult", "ExperimentResult", "FeedbackEvent", "FeedbackIntent",
"ProposalStatus", "ProposalType", "PromptProposal", "ResearchResult" into
lexicographic order) so the list is sorted to satisfy the linter.
```python
async def call_llm(
    self,
    prompt: str,
    system_instruction: str | None = None,
    model: str | None = None,
    temperature: float = 0.7,
    max_tokens: int = 4096,
) -> str:
    """
    Call an LLM with the given prompt.

    This is a utility method for subclasses to use when they need
    to call an LLM. It handles model selection and basic error handling.

    Args:
        prompt: The user prompt
        system_instruction: Optional system instruction
        model: Model to use (defaults to config.default_llm_model)
        temperature: Sampling temperature
        max_tokens: Maximum tokens to generate

    Returns:
        The LLM's response text
    """
    import google.generativeai as genai

    model_name = model or self.config.default_llm_model

    generation_config = genai.GenerationConfig(
        temperature=temperature,
        max_output_tokens=max_tokens,
    )

    model_instance = genai.GenerativeModel(
        model_name=model_name,
        system_instruction=system_instruction,
        generation_config=generation_config,
    )

    response = model_instance.generate_content(prompt)
    return response.text


async def call_llm_with_search(
    self,
    prompt: str,
    system_instruction: str | None = None,
    model: str | None = None,
) -> tuple[str, list[dict]]:
    """
    Call an LLM with Google Search grounding enabled.

    Args:
        prompt: The user prompt
        system_instruction: Optional system instruction
        model: Model to use

    Returns:
        Tuple of (response_text, grounding_sources)
    """
    import google.generativeai as genai
    from google.generativeai.types import GenerateContentConfig, Tool
    from google.generativeai.types.content_types import GoogleSearch

    model_name = model or self.config.research_model

    tools = [Tool(google_search=GoogleSearch())]

    config = GenerateContentConfig(
        tools=tools,
        system_instruction=system_instruction,
    )

    model_instance = genai.GenerativeModel(model_name=model_name)

    response = model_instance.generate_content(prompt, generation_config=config)
```
🧩 Analysis chain
🌐 Web query:
google-generativeai Python SDK async methods generate_content_async
💡 Result:
In the legacy google-generativeai Python SDK (the one you import as google.generativeai as genai), there is a method named GenerativeModel.generate_content_async(...), but Google’s own migration guide shows it being called without await (it returns an object directly), and there are open bug reports where trying to await it raises TypeError: object GenerateContentResponse can't be used in 'await' expression (notably in Colab / Python 3.11). [1], [2]
In the current SDK (google-genai, imported as from google import genai), you don’t use generate_content_async. Instead, every async method lives under client.aio and is awaited:
```python
from google import genai

client = genai.Client()
response = await client.aio.models.generate_content(
    model="gemini-2.0-flash",
    contents="Tell me a story in 300 words."
)
print(response.text)
```

This is the recommended replacement for generate_content_async, and streaming also has an async version under client.aio. [1], [3]
Sources: [1] Google migration guide, [2] legacy SDK issue #732, [3] googleapis/python-genai (Google Gen AI Python SDK) README.
🏁 Script executed:
```shell
# Search for any usage of response.candidates in the codebase
rg "response\.candidates" --type py
```

Repository: PublicDataWorks/verdad
Length of output: 1314
🏁 Script executed:
```shell
# Check the actual implementation around line 160-235
cd src && head -250 prompt_rewriter/agents/base.py | tail -100
```

Repository: PublicDataWorks/verdad
Length of output: 3482
🏁 Script executed:
```shell
# Look for any imports or existing async patterns in the file
rg "asyncio|async def" src/prompt_rewriter/agents/base.py
```

Repository: PublicDataWorks/verdad
Length of output: 275
Avoid blocking the event loop in async LLM calls; handle empty response candidates.
The generate_content() method is synchronous and will block the event loop when called directly in async methods. Additionally, response.candidates[0] is accessed without ensuring the candidates list is non-empty, which can raise IndexError.
For the blocking I/O, use asyncio.to_thread() as a practical workaround for the legacy SDK. The long-term solution is to migrate to the newer google-genai SDK (imported as from google import genai), which provides proper async methods via client.aio.models.generate_content().
For response handling, check that response.candidates is non-empty before accessing response.candidates[0]:
💡 Suggested fixes

```diff
+import asyncio
 ...
-        response = model_instance.generate_content(prompt)
+        response = await asyncio.to_thread(model_instance.generate_content, prompt)
         return response.text
 ...
-        response = model_instance.generate_content(prompt, generation_config=config)
+        response = await asyncio.to_thread(
+            model_instance.generate_content,
+            prompt,
+            generation_config=config,
+        )
         # Extract grounding metadata
         sources = []
-        if hasattr(response.candidates[0], "grounding_metadata"):
-            metadata = response.candidates[0].grounding_metadata
+        if response.candidates and hasattr(response.candidates[0], "grounding_metadata"):
+            metadata = response.candidates[0].grounding_metadata
```

🤖 Prompt for AI Agents
In `@src/prompt_rewriter/agents/base.py` around lines 160 - 235,
call_llm_with_search currently calls the synchronous genai.generate_content
directly (blocking the event loop) and assumes response.candidates[0] exists;
change it to run the blocking call inside asyncio.to_thread(...) (wrapping the
model_instance.generate_content call) and after getting the response validate
that response.candidates is non-empty before accessing [0]; if empty, return a
sensible fallback (e.g., empty string and empty grounding list) and otherwise
extract response.candidates[0].content/text and grounding sources to return the
tuple (ensure you keep using model_instance.generate_content, generation config
construction with GenerateContentConfig/Tool/GoogleSearch, and check
response.candidates to avoid IndexError).
```python
# Extract grounding metadata
sources = []
if hasattr(response.candidates[0], "grounding_metadata"):
    metadata = response.candidates[0].grounding_metadata
    if hasattr(metadata, "grounding_chunks"):
        for chunk in metadata.grounding_chunks:
            if hasattr(chunk, "web"):
                sources.append(
                    {
                        "url": chunk.web.uri,
                        "title": chunk.web.title,
                    }
                )
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, find and examine the base.py file
find . -name "base.py" -path "*/prompt_rewriter/agents/*" -type f
```

Repository: PublicDataWorks/verdad
Length of output: 102
🏁 Script executed:
```shell
# Check the repository structure
git ls-files | grep -E "(base\.py|requirements|setup\.py|pyproject\.toml)" | head -20
```

Repository: PublicDataWorks/verdad
Length of output: 191
🌐 Web query:
google-generativeai SDK generate_content empty candidates list behavior
💡 Result:
In the Gemini API (which the google-generativeai SDK wraps), an empty response.candidates list is expected only when the prompt is blocked / rejected, not when only the model output is blocked.

- The API "returns either all requested candidates or none of them" and "returns no candidates at all only if there was something wrong with the prompt (check `promptFeedback`)." If `promptFeedback.blockReason` is set, "the prompt was blocked and no candidates are returned." [1]
- Separately, you can get a candidate object whose `content` has no parts / no text when the candidate is blocked by safety filters, in which case you should inspect `candidate.finish_reason` and `candidate.safety_ratings` instead of assuming `content.parts[0].text` exists. (This commonly shows up as "Response candidate content has no parts … likely blocked by the safety filters.") [2]
- If you're seeing "sometimes empty text" in `output.candidates[0].content.parts[0].text` even for benign prompts, that has been reported as intermittent in Google's SDK issue trackers; you should treat it as a case your code must handle (and log the full response incl. `promptFeedback`, `finish_reason`, and safety ratings). [3]

Practical handling:

- If `not response.candidates`: check `response.prompt_feedback.block_reason` (+ `safety_ratings`) and treat as "prompt blocked/invalid".
- Else if a candidate exists but has no text/parts: treat as "output blocked" and use `candidate.finish_reason` / `candidate.safety_ratings`.
- Only read `response.text` / `candidate.content.parts[0].text` after those checks.
Sources: [1] Gemini API GenerateContentResponse / PromptFeedback reference, [2] example of blocked candidate with no parts in Vertex AI issue, [3] intermittent empty text report in Google’s SDK issue tracker.
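The handling rules above can be sketched with stand-in objects (the real SDK response types differ; all names here are illustrative):

```python
from types import SimpleNamespace


def safe_text(response):
    """Return generated text, or None when the prompt or output was blocked."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        # No candidates at all: prompt blocked/invalid; inspect response.prompt_feedback
        return None
    parts = getattr(candidates[0].content, "parts", None) or []
    if not parts:
        # Candidate without parts: output blocked; inspect finish_reason/safety_ratings
        return None
    return parts[0].text


blocked = SimpleNamespace(candidates=[])
ok = SimpleNamespace(
    candidates=[
        SimpleNamespace(content=SimpleNamespace(parts=[SimpleNamespace(text="hello")]))
    ]
)
print(safe_text(blocked), safe_text(ok))  # None hello
```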
🌐 Web query:
google-generativeai SDK response structure grounding_metadata documentation
💡 Result:
In the google-generativeai Python SDK, grounding information is returned per candidate. After a generate_content(...) call, you access it like:

- `response.candidates[0].grounding_metadata`
- then typically:
  - `response.candidates[0].grounding_metadata.grounding_chunks`
  - `response.candidates[0].grounding_metadata.grounding_supports`

This access pattern is shown in Google's Gemini API grounding guide (Python example). [1]

Where this is documented (authoritative schemas):

- Gemini API (Generative Language) – `Candidate` schema: `Candidate` includes `groundingMetadata` ("Grounding metadata for the candidate … populated for GenerateContent calls"). [2]
- Vertex AI – `GroundingMetadata` schema (field meanings): documents the main fields you'll see inside `grounding_metadata`, including `webSearchQueries`, `retrievalQueries`, `groundingChunks`, `groundingSupports`, and `retrievalMetadata` (e.g., `googleSearchDynamicRetrievalScore`). [3]

Quick mental model of the response structure:

- `response.candidates[]` → each candidate may have its own `grounding_metadata`. [1][2]
- `grounding_chunks` → the retrieved sources (URLs / docs / maps places, depending on tool). [1][3]
- `grounding_supports` → spans/segments of the generated text mapped to chunk indices (for inline citations). [1][3]
Sources:
[1] Gemini API “Grounding with Google Search” (example accessing response.candidates[0].grounding_metadata...)
[2] Gemini API “Candidate” reference (includes groundingMetadata, populated for GenerateContent)
[3] Vertex AI REST reference: GroundingMetadata fields and definitions
🏁 Script executed:
```shell
# Read the relevant section of base.py
sed -n '230,260p' src/prompt_rewriter/agents/base.py
```

Repository: PublicDataWorks/verdad
Length of output: 855
🏁 Script executed:
```shell
# Get more context around this code - look at the function definition
sed -n '200,260p' src/prompt_rewriter/agents/base.py
```

Repository: PublicDataWorks/verdad
Length of output: 1789
🏁 Script executed:
```shell
# Check imports to understand how response object is created
head -40 src/prompt_rewriter/agents/base.py
```

Repository: PublicDataWorks/verdad
Length of output: 1067
🏁 Script executed:
```shell
# Search for other uses of response.candidates in the codebase
rg "response\.candidates" --type python -B 2 -A 2
```

Repository: PublicDataWorks/verdad
Length of output: 96
🏁 Script executed:
```shell
# Check if there's any existing error handling for empty candidates
rg "candidates" src/prompt_rewriter/agents/base.py -B 2 -A 2
```

Repository: PublicDataWorks/verdad
Length of output: 364
🏁 Script executed:
```shell
# Verify there's no surrounding try-catch that might handle this
sed -n '220,270p' src/prompt_rewriter/agents/base.py
```

Repository: PublicDataWorks/verdad
Length of output: 1231
🏁 Script executed:
```shell
# Check if there are any tests for this error case
find . -name "*.py" -path "*/test*" -type f | xargs rg "call_llm_with_search|candidates\[0\]" 2>/dev/null | head -20
```

Repository: PublicDataWorks/verdad
Length of output: 48
Guard against empty candidates before indexing.
response.candidates[0] will raise IndexError if the SDK returns no candidates (which occurs when the prompt is blocked). The defensive check prevents pipeline crashes.
Suggested fix

```diff
     sources = []
-    if hasattr(response.candidates[0], "grounding_metadata"):
-        metadata = response.candidates[0].grounding_metadata
+    candidates = getattr(response, "candidates", None) or []
+    if candidates and hasattr(candidates[0], "grounding_metadata"):
+        metadata = candidates[0].grounding_metadata
         if hasattr(metadata, "grounding_chunks"):
             for chunk in metadata.grounding_chunks:
                 if hasattr(chunk, "web"):
                     sources.append(
                         {
                             "url": chunk.web.uri,
                             "title": chunk.web.title,
                         }
                     )
```

🤖 Prompt for AI Agents
In `@src/prompt_rewriter/agents/base.py` around lines 236 - 249, The code accesses
response.candidates[0] without ensuring candidates is non-empty, causing
IndexError when the SDK returns no candidates; update the grounding metadata
extraction in src/prompt_rewriter/agents/base.py to first check that response
has a non-empty candidates list (e.g., if getattr(response, "candidates", None)
and len(response.candidates) > 0) before referencing response.candidates[0], and
only then proceed to inspect grounding_metadata and grounding_chunks (symbols:
response, candidates, grounding_metadata, grounding_chunks, sources); if there
are no candidates, skip extraction or leave sources empty to avoid crashing the
pipeline.
```python
prompt = FEEDBACK_CLASSIFICATION_PROMPT.format(
    title=snippet.get("title", "Unknown"),
    summary=snippet.get("summary", "No summary available"),
    categories=", ".join(snippet.get("disinformation_categories", [])),
    transcription=snippet.get("transcription", "")[:1000],  # Truncate
    feedback_type=feedback.feedback_type,
    feedback_content=feedback.content or "No content provided",
)
```
Guard against NULL categories.
If disinformation_categories is NULL, ", ".join(None) raises TypeError and the agent fails.
🛠️ Suggested fix

```diff
-        prompt = FEEDBACK_CLASSIFICATION_PROMPT.format(
+        categories = snippet.get("disinformation_categories") or []
+        prompt = FEEDBACK_CLASSIFICATION_PROMPT.format(
             title=snippet.get("title", "Unknown"),
             summary=snippet.get("summary", "No summary available"),
-            categories=", ".join(snippet.get("disinformation_categories", [])),
+            categories=", ".join(categories),
             transcription=snippet.get("transcription", "")[:1000],  # Truncate
             feedback_type=feedback.feedback_type,
             feedback_content=feedback.content or "No content provided",
         )
```

🤖 Prompt for AI Agents
In `@src/prompt_rewriter/agents/feedback_intake.py` around lines 99 - 106, The
code builds `prompt` using FEEDBACK_CLASSIFICATION_PROMPT and calls ",
".join(snippet.get("disinformation_categories", [])) which will raise if
`snippet.get("disinformation_categories")` returns None; update the `categories`
expression to defensively coerce to an iterable (e.g., use
`snippet.get("disinformation_categories") or []` or wrap with `list(...)`)
before joining so `", ".join(...)` always receives a list/tuple of strings; keep
the same variable `categories` in the FEEDBACK_CLASSIFICATION_PROMPT.format call
and ensure no other fields rely on nullable values.
```python
feedback = FeedbackEvent(
    id=UUID(proposal_data["id"]),
    feedback_type=proposal_data["triggered_by_feedback_type"],
    snippet_id=UUID(proposal_data["triggered_by_snippet_id"]),
    user_id=(
        UUID(proposal_data["triggered_by_user_id"])
        if proposal_data.get("triggered_by_user_id")
        else None
    ),
    content=proposal_data.get("trigger_content"),
    comment_id=proposal_data.get("triggered_by_comment_id"),
)
```
Don’t reuse proposal ID as FeedbackEvent ID.
FeedbackEvent.id should reference the feedback event (if available); using the proposal ID will mix entity identities and complicate traceability.
🛠️ Suggested fix

```diff
-        feedback = FeedbackEvent(
-            id=UUID(proposal_data["id"]),
+        feedback = FeedbackEvent(
             feedback_type=proposal_data["triggered_by_feedback_type"],
             snippet_id=UUID(proposal_data["triggered_by_snippet_id"]),
             user_id=(
                 UUID(proposal_data["triggered_by_user_id"])
                 if proposal_data.get("triggered_by_user_id")
                 else None
             ),
             content=proposal_data.get("trigger_content"),
             comment_id=proposal_data.get("triggered_by_comment_id"),
         )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
feedback = FeedbackEvent(
    feedback_type=proposal_data["triggered_by_feedback_type"],
    snippet_id=UUID(proposal_data["triggered_by_snippet_id"]),
    user_id=(
        UUID(proposal_data["triggered_by_user_id"])
        if proposal_data.get("triggered_by_user_id")
        else None
    ),
    content=proposal_data.get("trigger_content"),
    comment_id=proposal_data.get("triggered_by_comment_id"),
)
```
🤖 Prompt for AI Agents
In `@src/prompt_rewriter/main.py` around lines 83 - 94, The FeedbackEvent
construction is incorrectly using the proposal ID (UUID(proposal_data["id"]))
for FeedbackEvent.id; change it to use the actual feedback event id from the
payload (e.g., UUID(proposal_data["triggered_by_feedback_id"]) or
proposal_data.get("triggered_by_feedback_id")) and fall back to None if that
field is missing, so update the id assignment in the FeedbackEvent instantiation
in main.py accordingly (refer to FeedbackEvent and proposal_data in that block).
```sql
-- Grant permissions
GRANT ALL ON prompt_rewrite_proposals TO authenticated;
GRANT ALL ON prompt_rewrite_proposals TO service_role;
GRANT ALL ON prompt_experiment_runs TO authenticated;
GRANT ALL ON prompt_experiment_runs TO service_role;
GRANT ALL ON snippet_reprocess_queue TO authenticated;
GRANT ALL ON snippet_reprocess_queue TO service_role;
GRANT ALL ON knowledge_facts TO authenticated;
GRANT ALL ON knowledge_facts TO service_role;
GRANT ALL ON user_feedback_events TO authenticated;
GRANT ALL ON user_feedback_events TO service_role;
GRANT ALL ON prompt_rewriter_agent_logs TO authenticated;
GRANT ALL ON prompt_rewriter_agent_logs TO service_role;

-- Enable RLS
ALTER TABLE prompt_rewrite_proposals ENABLE ROW LEVEL SECURITY;
ALTER TABLE prompt_experiment_runs ENABLE ROW LEVEL SECURITY;
ALTER TABLE snippet_reprocess_queue ENABLE ROW LEVEL SECURITY;
ALTER TABLE knowledge_facts ENABLE ROW LEVEL SECURITY;
ALTER TABLE user_feedback_events ENABLE ROW LEVEL SECURITY;
ALTER TABLE prompt_rewriter_agent_logs ENABLE ROW LEVEL SECURITY;

-- RLS Policies - allow authenticated users to read, service_role for write
CREATE POLICY "Allow authenticated read on proposals" ON prompt_rewrite_proposals
    FOR SELECT TO authenticated USING (true);
CREATE POLICY "Allow service_role all on proposals" ON prompt_rewrite_proposals
    FOR ALL TO service_role USING (true);

CREATE POLICY "Allow authenticated read on experiment_runs" ON prompt_experiment_runs
    FOR SELECT TO authenticated USING (true);
CREATE POLICY "Allow service_role all on experiment_runs" ON prompt_experiment_runs
    FOR ALL TO service_role USING (true);

CREATE POLICY "Allow authenticated read on reprocess_queue" ON snippet_reprocess_queue
    FOR SELECT TO authenticated USING (true);
CREATE POLICY "Allow service_role all on reprocess_queue" ON snippet_reprocess_queue
    FOR ALL TO service_role USING (true);

CREATE POLICY "Allow authenticated read on knowledge_facts" ON knowledge_facts
    FOR SELECT TO authenticated USING (true);
CREATE POLICY "Allow service_role all on knowledge_facts" ON knowledge_facts
    FOR ALL TO service_role USING (true);

CREATE POLICY "Allow authenticated read on feedback_events" ON user_feedback_events
    FOR SELECT TO authenticated USING (true);
CREATE POLICY "Allow service_role all on feedback_events" ON user_feedback_events
    FOR ALL TO service_role USING (true);

CREATE POLICY "Allow authenticated read on agent_logs" ON prompt_rewriter_agent_logs
    FOR SELECT TO authenticated USING (true);
CREATE POLICY "Allow service_role all on agent_logs" ON prompt_rewriter_agent_logs
    FOR ALL TO service_role USING (true);
```
🧩 Analysis chain

🏁 Script executed (repository: PublicDataWorks/verdad):

```shell
find . -name "*20260124000000*" -o -name "*prompt_rewriter_agent_schema*"
```

Length of output: 135

🏁 Script executed:

```shell
wc -l ./supabase/migrations/20260124000000_prompt_rewriter_agent_schema.sql
```

Length of output: 139

🏁 Script executed:

```shell
cat -n ./supabase/migrations/20260124000000_prompt_rewriter_agent_schema.sql
```

Length of output: 18031
Tighten RLS: authenticated users can read all data across most tables.

The current RLS policies use USING (true), allowing authenticated users unrestricted read access to proposals, feedback events, experiment runs, reprocess queues, agent logs, and knowledge facts. This is a significant privacy/compliance risk if these tables contain sensitive or user-specific data.

Required changes:

- prompt_rewrite_proposals and user_feedback_events have user ownership columns and should restrict reads to the originating user:
  - Change GRANT ALL to GRANT SELECT for authenticated users
  - Replace USING (true) with USING (triggered_by_user_id = auth.uid()) for proposals
  - Replace USING (true) with USING (user_id = auth.uid()) for feedback events
- prompt_rewriter_agent_logs and prompt_experiment_runs should either be service_role-only (if intended for internal system use) or require ownership-based filtering through their linked proposal if users should see their own data.
- knowledge_facts and snippet_reprocess_queue need architectural clarification: are these shared system tables, or should they be user-scoped?
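For the proposal-linked tables, the ownership check can be expressed with an EXISTS subquery against the parent proposal. A minimal sketch, assuming prompt_experiment_runs carries a proposal_id foreign key (an assumed column name; triggered_by_user_id comes from this review but is likewise unverified against the migration):

```sql
-- Hypothetical ownership-scoped read policy for experiment runs,
-- derived through the parent proposal (column names are assumptions)
DROP POLICY IF EXISTS "Allow authenticated read on experiment_runs" ON prompt_experiment_runs;

CREATE POLICY "Allow authenticated read own experiment_runs" ON prompt_experiment_runs
  FOR SELECT TO authenticated
  USING (
    EXISTS (
      SELECT 1
      FROM prompt_rewrite_proposals p
      WHERE p.id = prompt_experiment_runs.proposal_id
        AND p.triggered_by_user_id = auth.uid()
    )
  );
```

One caveat: Postgres applies RLS inside policy subqueries as well, so the proposals table's own policy must let the same user read the parent row, or the lookup should go through a SECURITY DEFINER helper function.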
🔐 Example tightening for user-scoped tables

```diff
-GRANT ALL ON prompt_rewrite_proposals TO authenticated;
+GRANT SELECT ON prompt_rewrite_proposals TO authenticated;

-CREATE POLICY "Allow authenticated read on proposals" ON prompt_rewrite_proposals
-  FOR SELECT TO authenticated USING (true);
+CREATE POLICY "Allow authenticated read own proposals" ON prompt_rewrite_proposals
+  FOR SELECT TO authenticated USING (triggered_by_user_id = auth.uid());

-GRANT ALL ON user_feedback_events TO authenticated;
+GRANT SELECT ON user_feedback_events TO authenticated;

-CREATE POLICY "Allow authenticated read on feedback_events" ON user_feedback_events
-  FOR SELECT TO authenticated USING (true);
+CREATE POLICY "Allow authenticated read own feedback_events" ON user_feedback_events
+  FOR SELECT TO authenticated USING (user_id = auth.uid());
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```sql
-- Grant permissions
GRANT SELECT ON prompt_rewrite_proposals TO authenticated;
GRANT ALL ON prompt_rewrite_proposals TO service_role;
GRANT ALL ON prompt_experiment_runs TO authenticated;
GRANT ALL ON prompt_experiment_runs TO service_role;
GRANT ALL ON snippet_reprocess_queue TO authenticated;
GRANT ALL ON snippet_reprocess_queue TO service_role;
GRANT ALL ON knowledge_facts TO authenticated;
GRANT ALL ON knowledge_facts TO service_role;
GRANT SELECT ON user_feedback_events TO authenticated;
GRANT ALL ON user_feedback_events TO service_role;
GRANT ALL ON prompt_rewriter_agent_logs TO authenticated;
GRANT ALL ON prompt_rewriter_agent_logs TO service_role;

-- Enable RLS
ALTER TABLE prompt_rewrite_proposals ENABLE ROW LEVEL SECURITY;
ALTER TABLE prompt_experiment_runs ENABLE ROW LEVEL SECURITY;
ALTER TABLE snippet_reprocess_queue ENABLE ROW LEVEL SECURITY;
ALTER TABLE knowledge_facts ENABLE ROW LEVEL SECURITY;
ALTER TABLE user_feedback_events ENABLE ROW LEVEL SECURITY;
ALTER TABLE prompt_rewriter_agent_logs ENABLE ROW LEVEL SECURITY;

-- RLS Policies - allow authenticated users to read, service_role for write
CREATE POLICY "Allow authenticated read own proposals" ON prompt_rewrite_proposals
  FOR SELECT TO authenticated USING (triggered_by_user_id = auth.uid());
CREATE POLICY "Allow service_role all on proposals" ON prompt_rewrite_proposals
  FOR ALL TO service_role USING (true);

CREATE POLICY "Allow authenticated read on experiment_runs" ON prompt_experiment_runs
  FOR SELECT TO authenticated USING (true);
CREATE POLICY "Allow service_role all on experiment_runs" ON prompt_experiment_runs
  FOR ALL TO service_role USING (true);

CREATE POLICY "Allow authenticated read on reprocess_queue" ON snippet_reprocess_queue
  FOR SELECT TO authenticated USING (true);
CREATE POLICY "Allow service_role all on reprocess_queue" ON snippet_reprocess_queue
  FOR ALL TO service_role USING (true);

CREATE POLICY "Allow authenticated read on knowledge_facts" ON knowledge_facts
  FOR SELECT TO authenticated USING (true);
CREATE POLICY "Allow service_role all on knowledge_facts" ON knowledge_facts
  FOR ALL TO service_role USING (true);

CREATE POLICY "Allow authenticated read own feedback_events" ON user_feedback_events
  FOR SELECT TO authenticated USING (user_id = auth.uid());
CREATE POLICY "Allow service_role all on feedback_events" ON user_feedback_events
  FOR ALL TO service_role USING (true);

CREATE POLICY "Allow authenticated read on agent_logs" ON prompt_rewriter_agent_logs
  FOR SELECT TO authenticated USING (true);
CREATE POLICY "Allow service_role all on agent_logs" ON prompt_rewriter_agent_logs
  FOR ALL TO service_role USING (true);
```
🤖 Prompt for AI Agents
In `@supabase/migrations/20260124000000_prompt_rewriter_agent_schema.sql` around
lines 374-425, the RLS is too permissive: update grants and policies for
user-scoped data and decide ownership for internal tables. For
prompt_rewrite_proposals and user_feedback_events change GRANT ALL TO
authenticated to GRANT SELECT TO authenticated and replace their policies
("Allow authenticated read on proposals" and "Allow authenticated read on
feedback_events") from USING (true) to USING (triggered_by_user_id = auth.uid())
and USING (user_id = auth.uid()) respectively; for prompt_rewriter_agent_logs
and prompt_experiment_runs either remove the authenticated FOR SELECT policies
and make them service_role-only (keep only the "Allow service_role all ..."
policies) or tighten them to ownership-based checks (e.g., link to proposal
ownership) if users must see related rows; for knowledge_facts and
snippet_reprocess_queue confirm intended scope and either keep shared read
(USING (true)) or add ownership predicates before granting authenticated access.
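Before committing the tightened migration, the policies can be exercised locally by impersonating an authenticated user. In Supabase, auth.uid() resolves the sub claim from the request.jwt.claims session setting, so a psql session can simulate a request (the UUID below is a placeholder for a seeded row owner, not a value from this PR):

```sql
-- Simulate an authenticated Supabase request in a local psql session
BEGIN;
SET LOCAL ROLE authenticated;
SELECT set_config('request.jwt.claims',
                  '{"sub": "00000000-0000-0000-0000-000000000001"}', true);
-- With ownership-scoped policies in place, this should return only
-- rows whose user_id matches the impersonated uid
SELECT id FROM user_feedback_events;
ROLLBACK;
```

Running the same query without the role switch (as the table owner or service_role) shows all rows, which makes the before/after effect of the policy easy to confirm.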
Important

Introduces a Prompt Rewriter Agent system for autonomous prompt improvement based on user feedback, with a detailed architecture, database schema, and implementation roadmap.

- Adds a Prompt Rewriter Agent system to improve prompts based on user feedback; main.py coordinates the agent pipeline.
- New tables prompt_rewrite_proposals, prompt_experiment_runs, snippet_reprocess_queue, knowledge_facts, and user_feedback_events in 20260124000000_prompt_rewriter_agent_schema.sql.
- New enums prompt_rewrite_status, feedback_intent, and proposal_type.
- models.py defines FeedbackEvent, FeedbackAnalysis, ResearchResult, PromptProposal, ExperimentResult, EvaluationResult, and others.
- config.py configures the research, experimentation, evaluation, and deployment agents.
- Proposal and roadmap documents live in docs/.
- base.py provides logging, LLM utilities, and error handling.

This description was created by for bf22842. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
New Features
Documentation