
Add Prompt Rewriter Agent system for autonomous prompt improvement#55

Open
rajivsinclair wants to merge 1 commit into main from claude/prompt-rewriter-agent-ZVOs0

Conversation

@rajivsinclair
Contributor

@rajivsinclair rajivsinclair commented Jan 24, 2026

This adds a comprehensive system for dynamically improving prompts based on
user feedback. The system includes:

  • Technical proposal document explaining the architecture and workflow
  • Implementation roadmap with detailed task breakdowns
  • Database migration for tracking proposals, experiments, and knowledge base
  • Initial module structure with models, config, and base agent class
  • Feedback Intake Agent implementation for classifying user feedback
  • Orchestrator Prefect flow coordinating the multi-agent pipeline

The Prompt Rewriter Agent will:

  1. Capture user feedback (thumbs down, comments, label disputes)
  2. Research disputed claims via web search
  3. Generate structured prompt modification proposals
  4. Test proposals experimentally against real snippets
  5. Evaluate results and decide whether to deploy
  6. Update prompts and reprocess affected historical snippets
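
For orientation, here is a minimal sketch of how such a Prefect pipeline can be wired together. The task names and payloads below are illustrative only; the actual flow lives in src/prompt_rewriter/main.py and differs in detail.

```python
from prefect import flow, task


@task(retries=2)
async def analyze_feedback(feedback: dict) -> dict:
    # Feedback Intake Agent: classify intent and extract the disputed claim.
    return {"intent": "label_dispute", "claim": feedback.get("content", "")}


@task
async def research_claim(analysis: dict) -> dict:
    # Research Agent: verify the extracted claim (e.g., via web search).
    return {"claim": analysis["claim"], "verdict": "unverified"}


@flow
async def prompt_rewriter_flow(feedback: dict) -> None:
    analysis = await analyze_feedback(feedback)
    research = await research_claim(analysis)
    # ...draft a proposal, run experiments, evaluate, and deploy if accepted.
```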

Important

Introduces a Prompt Rewriter Agent system for autonomous prompt improvement based on user feedback, with a detailed architecture, database schema, and implementation roadmap.

  • Behavior:
    • Introduces Prompt Rewriter Agent system to improve prompts based on user feedback.
    • Implements agents for feedback intake, research, proposal writing, experimentation, evaluation, semantic search, and deployment.
    • Orchestrator flow in main.py coordinates the agent pipeline.
  • Database:
    • Adds tables prompt_rewrite_proposals, prompt_experiment_runs, snippet_reprocess_queue, knowledge_facts, and user_feedback_events in 20260124000000_prompt_rewriter_agent_schema.sql.
    • Includes enums for prompt_rewrite_status, feedback_intent, and proposal_type.
  • Models:
    • Defines Pydantic models in models.py for FeedbackEvent, FeedbackAnalysis, ResearchResult, PromptProposal, ExperimentResult, EvaluationResult, and others.
  • Configuration:
    • Adds configuration settings in config.py for the research, experimentation, evaluation, and deployment agents.
  • Misc:
    • Adds technical proposal and implementation roadmap in docs/.
    • Base agent class in base.py provides logging, LLM utilities, and error handling.

This description was created by Ellipsis for bf22842. It will automatically update as commits are pushed.

Summary by CodeRabbit

  • New Features

    • Introduced an autonomous system that processes user feedback to research claims, generate prompt improvements, run controlled experiments, evaluate results, and deploy verified changes.
    • Added proposal versioning, rollback capabilities, reprocessing queues, and knowledge base for continuous improvement.
  • Documentation

    • Added comprehensive system architecture, design proposal, and phased implementation roadmap.


Copilot AI review requested due to automatic review settings January 24, 2026 21:39
@coderabbitai

coderabbitai Bot commented Jan 24, 2026

Walkthrough

This PR introduces a comprehensive Prompt Rewriter Agent system comprising architecture documentation, foundational models, agent framework, configuration infrastructure, database schema, and an orchestration pipeline to autonomously improve system prompts based on user feedback through iterative research, proposal generation, experimentation, and deployment.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Documentation & Design**<br>`docs/PROMPT_REWRITER_AGENT_PROPOSAL.md`, `docs/PROMPT_REWRITER_IMPLEMENTATION_ROADMAP.md` | Introduces a comprehensive architectural proposal outlining the multi-agent pipeline, knowledge base design, structured data formats, database schema extensions, phased implementation roadmap, success metrics, and deployment considerations with extensibility scaffolding. |
| **Data Models & Types**<br>`src/prompt_rewriter/models.py` | Adds Pydantic data structures and enums standardizing data exchange: FeedbackEvent, FeedbackAnalysis, ResearchResult, PromptProposal, ExperimentResult, EvaluationResult, AgentLogEntry, plus classification enums for intent, proposal type, status, and decisions. |
| **Configuration**<br>`src/prompt_rewriter/config.py` | Introduces nested configuration dataclasses (ResearchConfig, ExperimentConfig, EvaluationConfig, DeploymentConfig) aggregated in PromptRewriterConfig, with a `from_env` classmethod for environment-variable population and default instance initialization. |
| **Agent Framework**<br>`src/prompt_rewriter/agents/base.py`, `src/prompt_rewriter/agents/__init__.py` | Defines an abstract BaseAgent generic class with an async execution wrapper, logging infrastructure, Supabase integration, LLM utilities (`call_llm` with optional search grounding), input/output summarization, and agent log persistence. |
| **Feedback Intake Agent**<br>`src/prompt_rewriter/agents/feedback_intake.py` | Implements FeedbackIntakeAgent to transform raw user feedback into structured analysis via LLM-powered classification, extracting intent, claims, and prompt-stage impacts, with snippet context retrieval and robust JSON parsing. |
| **Orchestration Pipeline**<br>`src/prompt_rewriter/main.py` | Adds Prefect-based orchestration with task functions for proposal loading, status updates, feedback analysis, claim research, proposal drafting, experiment execution, result evaluation, and deployment; defines the main workflow and a worker flow for continuous pending-proposal processing. |
| **Package Exports**<br>`src/prompt_rewriter/__init__.py` | Exposes the public API through `__all__` with model types and a version string (0.1.0). |
| **Database Schema**<br>`supabase/migrations/20260124000000_prompt_rewriter_agent_schema.sql` | Establishes six core tables (prompt_rewrite_proposals, prompt_experiment_runs, snippet_reprocess_queue, knowledge_facts, user_feedback_events, prompt_rewriter_agent_logs) with ENUM types, RLS policies, triggers for timestamp updates, and utility functions (get_next_pending_proposal, get_next_reprocess_snippet); includes comprehensive indexing and access controls. |
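
As a rough illustration of the configuration pattern described in the table above (the field and environment-variable names here are assumptions, not the PR's actual settings):

```python
import os
from dataclasses import dataclass, field


@dataclass
class ResearchConfig:
    # Hypothetical fields; the real ResearchConfig may differ.
    max_search_results: int = 5
    model_name: str = "gemini-2.0-flash"

    @classmethod
    def from_env(cls) -> "ResearchConfig":
        return cls(
            max_search_results=int(os.environ.get("RESEARCH_MAX_RESULTS", "5")),
            model_name=os.environ.get("RESEARCH_MODEL", cls.model_name),
        )


@dataclass
class PromptRewriterConfig:
    research: ResearchConfig = field(default_factory=ResearchConfig)

    @classmethod
    def from_env(cls) -> "PromptRewriterConfig":
        # Aggregate nested configs, each populated from environment variables.
        return cls(research=ResearchConfig.from_env())


DEFAULT_CONFIG = PromptRewriterConfig.from_env()
```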

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Feedback as Feedback System
    participant Pipeline as Orchestrator<br/>(Prefect)
    participant FI as Feedback Intake<br/>Agent
    participant LLM as LLM Service
    participant Research as Research Agent<br/>(Future)
    participant Writer as Proposal Writer<br/>Agent (Future)
    participant Experiment as Experiment<br/>Runner (Future)
    participant Evaluator as Evaluation<br/>Agent (Future)
    participant Deploy as Deployment<br/>Agent (Future)
    participant DB as Supabase

    User->>Feedback: Submit feedback on snippet
    Feedback->>DB: Create user_feedback_event
    
    Pipeline->>DB: Fetch next pending proposal
    Pipeline->>FI: run(FeedbackEvent)
    
    FI->>DB: Retrieve snippet context
    FI->>FI: Build classification prompt
    FI->>LLM: call_llm(classification prompt)
    LLM-->>FI: LLM response
    FI->>FI: Parse structured analysis
    FI-->>Pipeline: FeedbackAnalysis
    
    Pipeline->>DB: Update status → researching
    Pipeline->>Research: research_claim(analysis)
    Research-->>Pipeline: ResearchResult (mock)
    
    Pipeline->>DB: Update status → writing_proposal
    Pipeline->>Writer: write_proposal(analysis, research)
    Writer-->>Pipeline: PromptProposal (mock)
    
    Pipeline->>DB: Update status → experimenting
    Pipeline->>Experiment: run_experiments(proposal)
    Experiment-->>Pipeline: ExperimentResult (mock)
    
    Pipeline->>DB: Update status → evaluating
    Pipeline->>Evaluator: evaluate_results(experiments)
    Evaluator-->>Pipeline: EvaluationResult (accept/refine/reject)
    
    alt Decision = ACCEPT
        Pipeline->>DB: Update status → deploying
        Pipeline->>Deploy: deploy_changes(proposal)
        Deploy-->>Pipeline: DeploymentResult
        Pipeline->>DB: Update status → deployed
    else Decision = REFINE
        Pipeline->>DB: Update status → refining
    else Decision = REJECT
        Pipeline->>DB: Update status → rejected
    end
    
    Pipeline->>DB: Save AgentLogEntry
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Substantive architectural foundation with dense logic across multiple interconnected systems (agent framework, LLM integration, Prefect orchestration, comprehensive database schema with RLS/triggers). Requires careful validation of data model consistency (Python ↔ SQL), error handling patterns, async execution safety, and security policies, even though the changes cohere as a single feature.

Poem

🐰 A rewriter born from feedback's call,
Seven agents dancing in the hall,
Facts researched, proposals tested true,
The prompt improves—one bounce, then two!
With schemas deep and flows precise,
We learn from you, we make things nice. 🌟

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and specifically describes the main addition: a Prompt Rewriter Agent system for autonomous prompt improvement, which aligns with the substantial new codebase and architecture introduced across models, agents, configuration, orchestration, and database schema. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 91.67%, which is sufficient; the required threshold is 80.00%. |



Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Pylint (4.0.4)
src/prompt_rewriter/config.py

```text
************* Module .pylintrc
.pylintrc:1:0: F0011: error while parsing the configuration: File contains no section headers.
file: '.pylintrc', line: 1
'disable=C0116\n' (config-parse-error)
[
{
"type": "refactor",
"module": "src.prompt_rewriter.config",
"obj": "PromptRewriterConfig",
"line": 66,
"column": 0,
"endLine": 66,
"endColumn": 26,
"path": "src/prompt_rewriter/config.py",
"symbol": "too-many-instance-attributes",
"message": "Too many instance attributes (17/7)",
"message-id": "R0902"
}
]
```

src/prompt_rewriter/agents/feedback_intake.py

```text
************* Module .pylintrc
.pylintrc:1:0: F0011: error while parsing the configuration: File contains no section headers.
file: '.pylintrc', line: 1
'disable=C0116\n' (config-parse-error)
[
{
"type": "convention",
"module": "src.prompt_rewriter.agents.feedback_intake",
"obj": "",
"line": 19,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "src/prompt_rewriter/agents/feedback_intake.py",
"symbol": "line-too-long",
"message": "Line too long (105/100)",
"message-id": "C0301"
},
{
"type": "warning",
"module": "src.prompt_rewriter.agents.feedback_intake",
"obj": "FeedbackIntakeAgent.init",
"line": 80,
"column": 4,
"endLine": 80,
"endColumn": 16,
"path": "src/prompt_rewriter/agents/feedback_intake.py",
"symbol": "useless-parent-delegation",
"message": "Useless parent or super() delegation in method

... [truncated 2019 characters] ...

ntakeAgent._parse_response",
"line": 156,
"column": 12,
"endLine": 156,
"endColumn": 62,
"path": "src/prompt_rewriter/agents/feedback_intake.py",
"symbol": "logging-fstring-interpolation",
"message": "Use lazy % formatting in logging functions",
"message-id": "W1203"
},
{
"type": "warning",
"module": "src.prompt_rewriter.agents.feedback_intake",
"obj": "FeedbackIntakeAgent._parse_response",
"line": 157,
"column": 12,
"endLine": 157,
"endColumn": 53,
"path": "src/prompt_rewriter/agents/feedback_intake.py",
"symbol": "logging-fstring-interpolation",
"message": "Use lazy % formatting in logging functions",
"message-id": "W1203"
}
]
```

src/prompt_rewriter/agents/base.py

```text
************* Module .pylintrc
.pylintrc:1:0: F0011: error while parsing the configuration: File contains no section headers.
file: '.pylintrc', line: 1
'disable=C0116\n' (config-parse-error)
[
{
"type": "convention",
"module": "src.prompt_rewriter.agents.base",
"obj": "BaseAgent.supabase",
"line": 46,
"column": 12,
"endLine": 46,
"endColumn": 46,
"path": "src/prompt_rewriter/agents/base.py",
"symbol": "import-outside-toplevel",
"message": "Import outside toplevel (supabase.create_client)",
"message-id": "C0415"
},
{
"type": "error",
"module": "src.prompt_rewriter.agents.base",
"obj": "BaseAgent.supabase",
"line": 46,
"column": 12,
"endLine": 46,
"endColumn": 46,
"path": "src/prompt_rewriter/agents/base.py",
"symbol": "no-name-in-module",
"message": "No name 'create_client' in module 'supabase'",
"mes

... [truncated 8064 characters] ...

se",
"obj": "BaseAgent.call_llm_with_search",
"line": 221,
"column": 8,
"endLine": 221,
"endColumn": 72,
"path": "src/prompt_rewriter/agents/base.py",
"symbol": "import-outside-toplevel",
"message": "Import outside toplevel (google.generativeai.types.content_types.GoogleSearch)",
"message-id": "C0415"
},
{
"type": "error",
"module": "src.prompt_rewriter.agents.base",
"obj": "BaseAgent.call_llm_with_search",
"line": 221,
"column": 8,
"endLine": 221,
"endColumn": 72,
"path": "src/prompt_rewriter/agents/base.py",
"symbol": "no-name-in-module",
"message": "No name 'generativeai' in module 'google'",
"message-id": "E0611"
}
]
```

  • 1 other


Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Contributor

Summary of Changes

Hello @rajivsinclair, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a sophisticated, autonomous Prompt Rewriter Agent system designed to significantly enhance the VERDAD platform's ability to combat misinformation. By creating a closed-loop learning system, it transforms user feedback into actionable prompt improvements, ensuring that the system continuously adapts and becomes more accurate over time. The system orchestrates multiple specialized AI agents to analyze feedback, conduct research, propose and test prompt modifications, and deploy validated changes, ultimately leading to a more robust and self-improving fact-checking mechanism.

Highlights

  • Autonomous Prompt Improvement System: Introduces a "Prompt Rewriter Agent" to dynamically enhance prompts based on user feedback, transforming the platform into a continuously learning system.
  • Multi-Agent Architecture: Implements a pipeline of specialized agents (Feedback Intake, Research, Proposal Writer, Experiment Runner, Evaluation, Semantic Search, Deployment) to automate the improvement process.
  • Comprehensive Documentation: Includes a detailed technical proposal outlining the system's architecture, problem statement, and solution, alongside an implementation roadmap with task breakdowns and estimated efforts.
  • Database Schema Extensions: Adds new tables and enums to the database for tracking proposals, experiment runs, reprocessing queues, a verified knowledge base, user feedback events, and agent logs.
  • Initial Codebase Structure: Establishes the foundational Python module structure, including base agent classes, configuration, Pydantic models, and the Prefect orchestrator flow for pipeline coordination.
  • Feedback-Driven Learning Loop: Enables the system to learn from various user feedback types (e.g., thumbs down, comments) to continuously refine prompt effectiveness and reprocess affected historical data.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@ellipsis-dev ellipsis-dev Bot left a comment


Caution

Changes requested ❌

Reviewed everything up to bf22842 in 2 minutes and 23 seconds.
  • Reviewed 2758 lines of code in 10 files
  • Skipped 0 files when reviewing.
  • Skipped posting 8 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. docs/PROMPT_REWRITER_AGENT_PROPOSAL.md:1
  • Draft comment:
    Consider adding language identifiers (e.g. python or markdown) to code blocks for improved syntax highlighting.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 20% <= threshold 50% This comment is purely informative and does not provide a specific suggestion or highlight a potential issue with the code. It suggests an improvement for readability but does not address a functional aspect of the code.
2. docs/PROMPT_REWRITER_IMPLEMENTATION_ROADMAP.md:1
  • Draft comment:
    Consider linking roadmap tasks to external issue trackers for better traceability and clarity in effort estimates.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative and suggests a process improvement rather than addressing a specific code issue or providing a code suggestion. It doesn't align with the rules for useful comments in a pull request review.
3. src/prompt_rewriter/agents/base.py:184
  • Draft comment:
    Consider adding more robust error handling in the call_llm function to deal with potential API errors or empty responses.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 15% vs. threshold = 50% The comment is suggesting a code quality improvement ("Consider adding more robust error handling"). While this might be valid advice, it falls into the category of speculative suggestions rather than identifying a clear bug. The comment uses "Consider" which is a soft suggestion, not a definitive issue. The docstring claims the function "handles model selection and basic error handling" but I don't see explicit error handling - however, the errors might be allowed to propagate up to the execute() method which does have error handling. The comment doesn't specify what specific errors need to be handled or why the current approach is insufficient. This seems like a general code quality suggestion rather than identifying a specific problem. The function might intentionally let exceptions propagate to the caller (the execute() method has comprehensive error handling). Without knowing the specific API errors that could occur or seeing evidence that the current approach is problematic, this might be an unnecessary suggestion. The comment is vague about what "more robust error handling" means. While the critique is valid, the comment is still speculative and not actionable. It doesn't identify a specific bug or provide clear guidance on what needs to be changed. The rules state "Do NOT make speculative comments" and "Comments that suggest code quality refactors are good! But only if they are actionable and clear." This comment is neither specific nor actionable - it just says "consider adding" without explaining what specific errors need handling. This comment should be deleted. It's a vague, speculative suggestion that isn't actionable. It doesn't identify a specific bug or provide clear guidance on what needs to be fixed. The comment uses soft language ("Consider") and doesn't explain what specific errors need to be handled or why the current approach is insufficient.
4. src/prompt_rewriter/agents/feedback_intake.py:133
  • Draft comment:
    Ensure that raw LLM responses are logged at DEBUG level to aid troubleshooting when JSON parsing fails.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
5. src/prompt_rewriter/config.py:5
  • Draft comment:
    Good use of environment variables with the walrus operator; verify that default values (especially for sensitive settings) are secure for production.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
6. src/prompt_rewriter/main.py:239
  • Draft comment:
    Consider adding explicit error handling and transaction control around proposal status updates to ensure consistency in failure scenarios.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% This comment is suggesting a general improvement about error handling and transaction control. While this might be good advice, it's not pointing to a specific bug or clear issue in the code. The comment doesn't specify what exactly should be done - it's more of a "consider" suggestion. According to the rules, I should not keep comments that are speculative or not clearly actionable. The comment doesn't show strong evidence of a definite problem - it's more of a best practice suggestion. The code is using Prefect tasks which have their own retry mechanisms (as seen in the load_proposal task with retries=2). Without seeing actual failures or understanding the full error handling strategy, this is speculative advice. The comment might be valid - if intermediate steps fail, the proposal status could be left in an inconsistent state. This could be a real issue in production. The suggestion about transaction control could be important for data consistency. While the concern about consistency is valid in principle, the comment is too vague and doesn't provide specific actionable guidance. It says "consider" which is a weak suggestion. The code is using Prefect's task system which has built-in error handling and retry mechanisms. Without evidence of a specific failure scenario or concrete implementation guidance, this is speculative advice rather than a clear code issue. This comment should be deleted. It's a vague, speculative suggestion ("consider adding") without specific actionable guidance. It doesn't point to a clear bug or definite issue, and the rules state not to make speculative comments or suggestions that aren't clearly actionable.
7. src/prompt_rewriter/models.py:16
  • Draft comment:
    Verify that enum values (e.g. in FeedbackIntent and ProposalType) match exactly with the expected values in the database schema to avoid inconsistencies.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to verify that enum values match the database schema, which is a form of asking for confirmation or verification. This violates the rule against asking the author to confirm or ensure things. The comment does not provide a specific suggestion or point out a specific issue, making it unapproved.
8. supabase/migrations/20260124000000_prompt_rewriter_agent_schema.sql:323
  • Draft comment:
    Ensure that the RLS policies and timestamp triggers are thoroughly tested under concurrent load to avoid unexpected access issues in production.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is asking the PR author to ensure that certain aspects are tested, which violates the rule against asking for confirmation or testing. It doesn't provide a specific suggestion or point out a specific issue in the code.

Workflow ID: wflow_3PL6o5B9Eboy9vBg

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

```python
propose prompt modifications, test changes, and deploy improvements.

Usage:
    from prompt_rewriter import PromptRewriterOrchestrator
```
Contributor


The usage example references PromptRewriterOrchestrator which is not exported. Update the docs or export the class if intended.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a comprehensive Prompt Rewriter Agent system, a significant feature for autonomously improving prompts based on user feedback. The changes include detailed design documents, a full database schema, and the initial implementation of the agent framework, including the base agent, models, configuration, and the feedback intake agent. The overall architecture is well-thought-out and robust.

My review focuses on ensuring data integrity, correcting potential runtime errors in the agent base class, and improving code clarity and consistency. I've identified a few high-severity issues in src/prompt_rewriter/agents/base.py related to handling of proposal_id and an incorrect LLM API call that could lead to runtime failures. I've also made some medium-severity suggestions regarding database schema integrity, documentation, and naming consistency between the data models and the database schema.

Overall, this is a great foundation for the new system. Addressing these points will improve its stability and maintainability.

Comment on lines +83 to +87
```python
log_entry = AgentLogEntry(
    agent_name=self.name,
    proposal_id=proposal_id or UUID("00000000-0000-0000-0000-000000000000"),
    input_summary=self._summarize_input(input_data),
)
```
Contributor


high

When proposal_id is None, it's defaulted to a nil UUID. This will cause a foreign key violation when _save_log_entry is called, as a proposal with a nil UUID won't exist in the prompt_rewrite_proposals table. The proposal_id should be passed as None to the AgentLogEntry constructor. I've added a separate comment on src/prompt_rewriter/models.py to make the proposal_id field optional in the AgentLogEntry model to support this change.

Suggested change

```diff
-log_entry = AgentLogEntry(
-    agent_name=self.name,
-    proposal_id=proposal_id or UUID("00000000-0000-0000-0000-000000000000"),
-    input_summary=self._summarize_input(input_data),
-)
+log_entry = AgentLogEntry(
+    agent_name=self.name,
+    proposal_id=proposal_id,
+    input_summary=self._summarize_input(input_data),
+)
```

Comment on lines +227 to +234
```python
config = GenerateContentConfig(
    tools=tools,
    system_instruction=system_instruction,
)

model_instance = genai.GenerativeModel(model_name=model_name)

response = model_instance.generate_content(prompt, generation_config=config)
```
Contributor


high

The implementation of call_llm_with_search has a couple of issues that will likely cause a runtime error:

  1. GenerateContentConfig is passed to the generation_config parameter of generate_content, which expects a genai.GenerationConfig instance.
  2. The system_instruction is passed to GenerateContentConfig but it should be passed to the GenerativeModel constructor. The model is currently instantiated without it.

To fix this, you should pass system_instruction to the model constructor and tools directly to generate_content.

Suggested change

```diff
-config = GenerateContentConfig(
-    tools=tools,
-    system_instruction=system_instruction,
-)
-model_instance = genai.GenerativeModel(model_name=model_name)
-response = model_instance.generate_content(prompt, generation_config=config)
+model_instance = genai.GenerativeModel(
+    model_name=model_name,
+    system_instruction=system_instruction,
+)
+response = model_instance.generate_content(prompt, tools=tools)
```

```python
"""Log entry for agent execution."""

agent_name: str
proposal_id: UUID
```
Contributor


high

The proposal_id should be optional to support agent logs that are not associated with a specific proposal. Currently, BaseAgent.execute in src/prompt_rewriter/agents/base.py attempts to use a nil UUID when no proposal_id is provided, which will cause a foreign key violation when saving the log to the database. Making this field optional will fix the issue.

Suggested change

```diff
-proposal_id: UUID
+proposal_id: Optional[UUID] = None
```

Comment on lines +8 to +13
```python
Usage:
    from prompt_rewriter import PromptRewriterOrchestrator

    orchestrator = PromptRewriterOrchestrator()
    result = await orchestrator.process_feedback(feedback_event)
"""
```
Contributor


medium

The usage example in the docstring refers to PromptRewriterOrchestrator, which is not implemented in this PR. The main entry point appears to be the prompt_rewriter_flow Prefect flow. To avoid confusion, I suggest updating the example to reflect the current implementation.

Suggested change

```diff
 Usage:
-    from prompt_rewriter import PromptRewriterOrchestrator
-
-    orchestrator = PromptRewriterOrchestrator()
-    result = await orchestrator.process_feedback(feedback_event)
+    from prompt_rewriter.main import prompt_rewriter_flow
+
+    # The main entry point is the `prompt_rewriter_flow` Prefect flow.
+    # It is triggered with a proposal_id.
+    # See `src/prompt_rewriter/main.py` for usage details.
 """
```

```python
"input_data": {"summary": log_entry.input_summary},
"output_data": {"summary": log_entry.output_summary},
"error_message": log_entry.error_message,
"llm_total_tokens": log_entry.llm_tokens_used,
```
Contributor


medium

There's an inconsistency between the Pydantic model and the database schema. The AgentLogEntry model uses llm_tokens_used, but the database column is llm_total_tokens. The key used here for insertion is llm_total_tokens. To improve clarity and maintainability, it's best to use the same name in all places. I'd suggest renaming llm_tokens_used to llm_total_tokens in src/prompt_rewriter/models.py.
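
For illustration, the suggested rename would be a small change in the Pydantic model (the field type below is an assumption):

```python
from typing import Optional

from pydantic import BaseModel


class AgentLogEntry(BaseModel):
    # Renamed to match the llm_total_tokens DB column (was: llm_tokens_used).
    llm_total_tokens: Optional[int] = None
```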


```sql
-- Deployment
deployed_at TIMESTAMPTZ,
deployed_prompt_version_id UUID, -- Reference to prompt_versions if it exists
```
Contributor


medium

For better data integrity, consider adding a foreign key constraint to deployed_prompt_version_id. The proposal document mentions a reference to prompt_versions(id). If the prompt_versions table exists or is planned, adding a foreign key constraint (e.g., REFERENCES prompt_versions(id)) would prevent orphaned records and ensure referential integrity.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bf22842aac

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +100 to +103
```python
title=snippet.get("title", "Unknown"),
summary=snippet.get("summary", "No summary available"),
categories=", ".join(snippet.get("disinformation_categories", [])),
transcription=snippet.get("transcription", "")[:1000],  # Truncate
```


P1: Handle non-string disinformation categories before join

The prompt formatting assumes snippet["disinformation_categories"] is a list of strings, but the Stage 4 output schema defines each category as an object with english/spanish fields (and it can also be null in existing tests). In those common cases, ", ".join(...) raises a TypeError, so the feedback intake agent will crash before calling the LLM, breaking the entire rewriter flow for normal snippets. Consider normalizing to a list of strings (e.g., [c["english"] for c in categories]) and handling None explicitly.
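
A defensive normalization along these lines might look like the following sketch (it assumes categories arrive either as plain strings or as objects with an english key, per the Stage 4 schema mentioned above):

```python
snippet = {"disinformation_categories": [{"english": "Health", "spanish": "Salud"}, None]}

raw_categories = snippet.get("disinformation_categories") or []  # tolerate a missing/None field
labels = [
    c.get("english", "") if isinstance(c, dict) else str(c)
    for c in raw_categories
    if c is not None  # tolerate null entries in the list
]
categories = ", ".join(label for label in labels if label)  # -> "Health"
```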



Copilot AI left a comment


Pull request overview

This PR introduces a comprehensive Prompt Rewriter Agent system designed to autonomously improve misinformation detection prompts based on user feedback. The system implements a multi-agent pipeline that analyzes feedback, researches claims, proposes prompt modifications, tests changes experimentally, and deploys improvements.

Changes:

  • Database schema migration adding 6 new tables (proposals, experiment runs, reprocess queue, knowledge facts, feedback events, agent logs) with supporting indexes, triggers, and RLS policies
  • Python module structure with Pydantic models, configuration management, base agent class, and orchestrator flow using Prefect
  • Feedback Intake Agent implementation for classifying and extracting actionable information from user feedback
  • Technical documentation including architecture proposal and detailed implementation roadmap

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 34 comments.

| File | Description |
|---|---|
| `supabase/migrations/20260124000000_prompt_rewriter_agent_schema.sql` | Database schema for tracking proposals, experiments, knowledge base, and feedback events |
| `src/prompt_rewriter/models.py` | Pydantic models defining data structures for feedback, proposals, experiments, and results |
| `src/prompt_rewriter/config.py` | Configuration dataclasses with environment variable support and default values |
| `src/prompt_rewriter/main.py` | Prefect orchestrator flow coordinating the multi-agent pipeline from feedback to deployment |
| `src/prompt_rewriter/agents/base.py` | Base agent class providing LLM utilities, logging, error handling, and database access |
| `src/prompt_rewriter/agents/feedback_intake.py` | Feedback Intake Agent for analyzing user feedback and extracting claims |
| `src/prompt_rewriter/agents/__init__.py` | Package initialization for agent modules |
| `src/prompt_rewriter/__init__.py` | Main package initialization exposing key models and types |
| `docs/PROMPT_REWRITER_AGENT_PROPOSAL.md` | Technical proposal explaining architecture, workflow, and design decisions |
| `docs/PROMPT_REWRITER_IMPLEMENTATION_ROADMAP.md` | Detailed implementation roadmap with task breakdown and dependencies |


```sql
proposal_id UUID NOT NULL REFERENCES prompt_rewrite_proposals(id) ON DELETE CASCADE,

-- Why this snippet was queued
reason TEXT NOT NULL, -- 'semantic_similarity', 'same_category', 'manual'
```

Copilot AI Jan 24, 2026


The reason field is defined as TEXT with comment indicating values 'semantic_similarity', 'same_category', 'manual', but there's no database-level constraint. Consider creating a reprocess_reason enum type to enforce valid values and maintain consistency with the schema design pattern.

```python
    Returns:
        The LLM's response text
    """
    import google.generativeai as genai
```

Copilot AI Jan 24, 2026


The code imports google.generativeai (the older SDK), but the existing codebase uses from google import genai (the newer Google Genai SDK). This creates an inconsistency and may cause compatibility issues. Consider updating to use the newer SDK's genai.Client API pattern as shown in src/processing_pipeline/stage_1.py:570 and src/processing_pipeline/stage_3.py:373, which provides better async support and consistency with the rest of the codebase.

Comment on lines +139 to +156
```python
self.supabase.table("prompt_rewriter_agent_logs").insert(
    {
        "agent_name": log_entry.agent_name,
        "proposal_id": str(log_entry.proposal_id),
        "started_at": log_entry.started_at.isoformat(),
        "completed_at": (
            log_entry.completed_at.isoformat()
            if log_entry.completed_at
            else None
        ),
        "duration_ms": log_entry.duration_ms,
        "status": log_entry.status,
        "input_data": {"summary": log_entry.input_summary},
        "output_data": {"summary": log_entry.output_summary},
        "error_message": log_entry.error_message,
        "llm_total_tokens": log_entry.llm_tokens_used,
    }
).execute()
```

Copilot AI Jan 24, 2026


The Supabase insert operation is not awaited in this async function. The supabase-py client's execute() method is synchronous and will block the event loop. Consider using asyncio.to_thread() to run this in a thread pool, or verify if there's an async version of the Supabase client available.
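
A minimal sketch of the to_thread approach (assuming the synchronous supabase-py client; building the row dict is elided):

```python
import asyncio


async def save_log_entry(supabase_client, row: dict) -> None:
    # Run the blocking supabase-py insert in a worker thread so the
    # event loop is not stalled while the request round-trips.
    await asyncio.to_thread(
        lambda: supabase_client.table("prompt_rewriter_agent_logs").insert(row).execute()
    )
```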

```sql
extracted_claim TEXT,
user_correction TEXT,
affected_prompt_stages INTEGER[], -- e.g., [1, 3] for Stage 1 and Stage 3
priority TEXT DEFAULT 'medium', -- 'low', 'medium', 'high', 'critical'
```

Copilot AI Jan 24, 2026


The priority field is defined as TEXT with a comment indicating values 'low', 'medium', 'high', 'critical', but this is not enforced at the database level. Consider creating a priority enum type similar to the other enums (prompt_rewrite_status, feedback_intent, proposal_type) to enforce valid values and improve data integrity.


```sql
-- Processing
priority INTEGER DEFAULT 0, -- Higher = process first
status TEXT DEFAULT 'queued', -- 'queued', 'processing', 'completed', 'failed', 'skipped'
```

Copilot AI Jan 24, 2026


The status field is defined as TEXT with comment indicating values 'queued', 'processing', 'completed', 'failed', 'skipped', but there's no database-level constraint. Consider creating a queue_status enum type to enforce valid values and maintain consistency with the schema design pattern used for other enums.

```sql
consistency_score FLOAT,

-- Evaluation
evaluation_decision TEXT, -- 'accept', 'refine', 'reject'
```

Copilot AI Jan 24, 2026


The evaluation_decision field is defined as TEXT with a comment indicating values 'accept', 'refine', 'reject', but there's no database-level constraint. This could lead to inconsistent data. Since there's already an EvaluationDecision enum in the Python models, consider creating a corresponding PostgreSQL enum type and using it here for consistency and data integrity.


```sql
-- Version control
version INTEGER DEFAULT 1,
previous_version_id UUID REFERENCES knowledge_facts(id),
```

Copilot AI Jan 24, 2026


The previous_version_id creates a self-referential foreign key that could lead to circular references or orphaned chains if not handled carefully. There's no constraint preventing cycles (e.g., fact A references fact B as previous version, and fact B references fact A). Consider adding a CHECK constraint to ensure version numbers are monotonically increasing, or implementing versioning through a separate versions table to avoid this complexity.

Suggested change

```diff
-previous_version_id UUID REFERENCES knowledge_facts(id),
+previous_version_id UUID,
```

```python
intent_confidence: float = Field(ge=0, le=1)
extracted_claim: Optional[str] = None
user_correction: Optional[str] = None
affected_prompt_stages: list[int] = Field(default_factory=lambda: [1, 3])
```

Copilot AI Jan 24, 2026


The affected_prompt_stages field defaults to [1, 3] in the lambda, but this default doesn't align with the comment on line 58 of the migration file which indicates the values should represent stage numbers (e.g., Stage 1 and Stage 3). The Python model uses integers while the database uses an INTEGER[] type, which is good. However, ensure that the values 1 and 3 are meaningful in context - if they represent PromptStage values, consider documenting this or using an enum for clarity.
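
One hypothetical way to make those magic numbers self-documenting (the stage names below are guesses, not the project's actual terminology):

```python
from enum import IntEnum

from pydantic import BaseModel, Field


class PromptStage(IntEnum):
    """Hypothetical labels for the numbered pipeline stages."""

    INITIAL_DETECTION = 1  # assumed role of Stage 1
    IN_DEPTH_ANALYSIS = 3  # assumed role of Stage 3


class FeedbackAnalysis(BaseModel):
    affected_prompt_stages: list[int] = Field(
        default_factory=lambda: [int(PromptStage.INITIAL_DETECTION), int(PromptStage.IN_DEPTH_ANALYSIS)]
    )
```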

```python
content: Optional[str] = None  # Comment text or feedback details
comment_id: Optional[str] = None  # Liveblocks comment ID
label_id: Optional[UUID] = None
created_at: datetime = Field(default_factory=datetime.utcnow)
```

Copilot AI Jan 24, 2026


Using datetime.utcnow() is deprecated as of Python 3.12. The existing codebase uses datetime.now(timezone.utc) (see src/processing_pipeline/stage_3.py:379 and src/processing_pipeline/supabase_utils.py:290). Please update to use datetime.now(timezone.utc) for consistency and to avoid deprecation warnings.
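
The drop-in replacement is small; for example:

```python
from datetime import datetime, timezone

from pydantic import BaseModel, Field


class FeedbackEvent(BaseModel):
    # Timezone-aware "now", replacing the deprecated datetime.utcnow()
    created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
```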

Comment on lines +43 to +49
```python
result = (
    client.table("prompt_rewrite_proposals")
    .select("*")
    .eq("id", str(proposal_id))
    .single()
    .execute()
)
```

Copilot AI Jan 24, 2026


The Supabase select operation is not awaited in this async function. The supabase-py client's execute() method is synchronous and will block the event loop. Consider using asyncio.to_thread() to run this in a thread pool for better async performance.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

🤖 Fix all issues with AI agents
In `@docs/PROMPT_REWRITER_AGENT_PROPOSAL.md`:
- Line 35: The markdown has lint issues: add a fenced-code language label to the
unlabeled code block (use ```text) and replace bold-as-heading lines for the
options with proper markdown headings (e.g., change **Option A: Inline in Prompt
Files (Recommended for MVP)** to a heading like #### Option A: Inline in Prompt
Files (Recommended for MVP)); apply the same fixes for the later occurrences
referenced (the block at 141-151) so all fenced blocks are labeled and bolded
option lines are converted to proper heading levels.

In `@docs/PROMPT_REWRITER_IMPLEMENTATION_ROADMAP.md`:
- Line 12: The bold "Priority" lines in the roadmap should be converted into
proper Markdown headings or list items and all unlabeled code fences in the
roadmap sections (the triple-backtick blocks shown in the diff) must include a
language tag (e.g., ```text) to satisfy markdownlint MD036/MD040; update each
instance that currently uses **Priority: Critical | Estimate: 1 day** (and other
bolded "Priority" occurrences) to a heading like "### Priority: Critical |
Estimate: 1 day" or an item in a list, and add the `text` language tag to each
unlabeled code fence so they become ```text ... ```, applying these changes
consistently to all roadmap sections that contain those bold lines and unlabeled
fences.

In `@src/prompt_rewriter/__init__.py`:
- Around line 26-35: The exported names in the __all__ list are unsorted causing
Ruff RUF022; open the module-level __all__ variable in __init__.py and reorder
the entries alphabetically (e.g., arrange "EvaluationResult",
"ExperimentResult", "FeedbackEvent", "FeedbackIntent", "ProposalStatus",
"ProposalType", "PromptProposal", "ResearchResult" into lexicographic order) so
the list is sorted to satisfy the linter.

In `@src/prompt_rewriter/agents/base.py`:
- Around line 160-235: call_llm_with_search currently calls the synchronous
genai.generate_content directly (blocking the event loop) and assumes
response.candidates[0] exists; change it to run the blocking call inside
asyncio.to_thread(...) (wrapping the model_instance.generate_content call) and
after getting the response validate that response.candidates is non-empty before
accessing [0]; if empty, return a sensible fallback (e.g., empty string and
empty grounding list) and otherwise extract response.candidates[0].content/text
and grounding sources to return the tuple (ensure you keep using
model_instance.generate_content, generation config construction with
GenerateContentConfig/Tool/GoogleSearch, and check response.candidates to avoid
IndexError).
- Around line 236-249: The code accesses response.candidates[0] without ensuring
candidates is non-empty, causing IndexError when the SDK returns no candidates;
update the grounding metadata extraction in src/prompt_rewriter/agents/base.py
to first check that response has a non-empty candidates list (e.g., if
getattr(response, "candidates", None) and len(response.candidates) > 0) before
referencing response.candidates[0], and only then proceed to inspect
grounding_metadata and grounding_chunks (symbols: response, candidates,
grounding_metadata, grounding_chunks, sources); if there are no candidates, skip
extraction or leave sources empty to avoid crashing the pipeline.

In `@src/prompt_rewriter/agents/feedback_intake.py`:
- Around line 99-106: The code builds `prompt` using
FEEDBACK_CLASSIFICATION_PROMPT and calls ",
".join(snippet.get("disinformation_categories", [])) which will raise if
`snippet.get("disinformation_categories")` returns None; update the `categories`
expression to defensively coerce to an iterable (e.g., use
`snippet.get("disinformation_categories") or []` or wrap with `list(...)`)
before joining so `", ".join(...)` always receives a list/tuple of strings; keep
the same variable `categories` in the FEEDBACK_CLASSIFICATION_PROMPT.format call
and ensure no other fields rely on nullable values.

In `@src/prompt_rewriter/main.py`:
- Around line 83-94: The FeedbackEvent construction is incorrectly using the
proposal ID (UUID(proposal_data["id"])) for FeedbackEvent.id; change it to use
the actual feedback event id from the payload (e.g.,
UUID(proposal_data["triggered_by_feedback_id"]) or
proposal_data.get("triggered_by_feedback_id")) and fall back to None if that
field is missing, so update the id assignment in the FeedbackEvent instantiation
in main.py accordingly (refer to FeedbackEvent and proposal_data in that block).

In `@supabase/migrations/20260124000000_prompt_rewriter_agent_schema.sql`:
- Around line 374-425: The RLS is too permissive: update grants and policies for
user-scoped data and decide ownership for internal tables. For
prompt_rewrite_proposals and user_feedback_events change GRANT ALL TO
authenticated to GRANT SELECT TO authenticated and replace their policies
("Allow authenticated read on proposals" and "Allow authenticated read on
feedback_events") from USING (true) to USING (triggered_by_user_id = auth.uid())
and USING (user_id = auth.uid()) respectively; for prompt_rewriter_agent_logs
and prompt_experiment_runs either remove the authenticated FOR SELECT policies
and make them service_role-only (keep only the "Allow service_role all ..."
policies) or tighten them to ownership-based checks (e.g., link to proposal
ownership) if users must see related rows; for knowledge_facts and
snippet_reprocess_queue confirm intended scope and either keep shared read
(USING (true)) or add ownership predicates before granting authenticated access.
🧹 Nitpick comments (4)
src/prompt_rewriter/models.py (1)

16-87: Prefer enums for feedback_type and priority to avoid invalid values.

These are persisted fields; aligning them with enums (like the other status fields) improves data integrity and avoids typos.

♻️ Suggested refactor
```diff
+class FeedbackType(str, Enum):
+    THUMBS_DOWN = "thumbs_down"
+    COMMENT = "comment"
+    LABEL_DISPUTE = "label_dispute"
+    MANUAL = "manual"
+
+class FeedbackPriority(str, Enum):
+    LOW = "low"
+    MEDIUM = "medium"
+    HIGH = "high"
+    CRITICAL = "critical"
+
 class FeedbackEvent(BaseModel):
     """A user feedback event that may trigger the rewriter."""
 
     id: Optional[UUID] = None
-    feedback_type: str  # 'thumbs_down', 'comment', 'label_dispute', 'manual'
+    feedback_type: FeedbackType
     snippet_id: UUID
     user_id: Optional[UUID] = None
     content: Optional[str] = None  # Comment text or feedback details
     comment_id: Optional[str] = None  # Liveblocks comment ID
     label_id: Optional[UUID] = None
     created_at: datetime = Field(default_factory=datetime.utcnow)
 
 class FeedbackAnalysis(BaseModel):
     """Output from the Feedback Intake Agent."""
 
     intent: FeedbackIntent
     intent_confidence: float = Field(ge=0, le=1)
     extracted_claim: Optional[str] = None
     user_correction: Optional[str] = None
     affected_prompt_stages: list[int] = Field(default_factory=lambda: [1, 3])
-    priority: str = "medium"  # 'low', 'medium', 'high', 'critical'
+    priority: FeedbackPriority = FeedbackPriority.MEDIUM
     reasoning: Optional[str] = None
```
src/prompt_rewriter/agents/feedback_intake.py (1)

118-131: Avoid blocking I/O inside async.
Supabase client calls are sync; in an async flow this can block the event loop. Consider asyncio.to_thread(...) (or a sync method) and use logger.exception to preserve stack traces.

🧵 Example non-blocking pattern
```diff
+import asyncio
 ...
     async def _get_snippet(self, snippet_id: UUID) -> dict | None:
         """Fetch snippet details from database."""
         try:
-            result = (
-                self.supabase.table("snippets")
-                .select("id, title, summary, transcription, disinformation_categories")
-                .eq("id", str(snippet_id))
-                .single()
-                .execute()
-            )
+            def _fetch():
+                return (
+                    self.supabase.table("snippets")
+                    .select("id, title, summary, transcription, disinformation_categories")
+                    .eq("id", str(snippet_id))
+                    .single()
+                    .execute()
+                )
+            result = await asyncio.to_thread(_fetch)
             return result.data
         except Exception as e:
-            logger.error(f"Error fetching snippet {snippet_id}: {e}")
+            if isinstance(e, asyncio.CancelledError):
+                raise
+            logger.exception(f"Error fetching snippet {snippet_id}: {e}")
             return None
```
src/prompt_rewriter/main.py (2)

36-72: Sync Supabase calls inside async tasks can block the event loop.
Consider running these calls in a thread (asyncio.to_thread) or making the tasks sync to avoid blocking Prefect’s async flow.

Also applies to: 380-386


106-108: Placeholders are clear; let me know if you want help implementing them.
If you want, I can help draft the Research/Proposal/Experiment/Deployment agents or add guards to prevent accidental production use.

Also applies to: 128-129, 157-158, 200-201


````markdown
### Architecture Overview

```
````


⚠️ Potential issue | 🟡 Minor

Fix markdownlint issues (fenced language + heading levels).

MD040 flags the unlabeled fenced block, and MD036 flags bold text used as headings for the options list.

💡 Suggested fix
````diff
-```
+```text
 ┌─────────────────────────────────────────────────────────────────────────────┐
 ...
-```
+```

-**Option A: Inline in Prompt Files (Recommended for MVP)**
+#### Option A: Inline in Prompt Files (Recommended for MVP)

-**Option B: Separate Knowledge Base File**
+#### Option B: Separate Knowledge Base File

-**Option C: Database-Stored (Future)**
+#### Option C: Database-Stored (Future)
````

Also applies to: 141-151

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

35-35: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@docs/PROMPT_REWRITER_AGENT_PROPOSAL.md` at line 35, The markdown has lint
issues: add a fenced-code language label to the unlabeled code block (use
```text) and replace bold-as-heading lines for the options with proper markdown
headings (e.g., change **Option A: Inline in Prompt Files (Recommended for
MVP)** to a heading like #### Option A: Inline in Prompt Files (Recommended for
MVP)); apply the same fixes for the later occurrences referenced (the block at
141-151) so all fenced blocks are labeled and bolded option lines are converted
to proper heading levels.

```markdown
## Phase 1: Foundation

### 1.1 Database Schema Deployment
**Priority: Critical | Estimate: 1 day**
```


⚠️ Potential issue | 🟡 Minor

Address markdownlint MD036/MD040 in roadmap sections.

Convert bold “Priority” lines into headings or list items, and add language tags to unlabeled code fences (Lines 127, 310, 459).

💡 Suggested fix (apply consistently)
-**Priority: Critical | Estimate: 1 day**
+#### Priority: Critical | Estimate: 1 day

-```
+```text
 Given this user feedback on a misinformation analysis:
 ...
-```
+```

-```
+```text
 New prompt improvement deployed!
 ...
-```
+```

-```
+```text
 Phase 1 (Foundation)
 ...
-```
+```

Also applies to: 127-127, 310-310, 459-459

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

12-12: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🤖 Prompt for AI Agents
In `@docs/PROMPT_REWRITER_IMPLEMENTATION_ROADMAP.md` at line 12, The bold
"Priority" lines in the roadmap should be converted into proper Markdown
headings or list items and all unlabeled code fences in the roadmap sections
(the triple-backtick blocks shown in the diff) must include a language tag
(e.g., ```text) to satisfy markdownlint MD036/MD040; update each instance that
currently uses **Priority: Critical | Estimate: 1 day** (and other bolded
"Priority" occurrences) to a heading like "### Priority: Critical | Estimate: 1
day" or an item in a list, and add the `text` language tag to each unlabeled
code fence so they become ```text ... ```, applying these changes consistently
to all roadmap sections that contain those bold lines and unlabeled fences.

Comment on lines +26 to +35
__all__ = [
    "FeedbackEvent",
    "FeedbackIntent",
    "ProposalType",
    "ProposalStatus",
    "ResearchResult",
    "PromptProposal",
    "ExperimentResult",
    "EvaluationResult",
]

⚠️ Potential issue | 🟡 Minor

Sort __all__ to satisfy Ruff (RUF022).

Minor lint issue; sorting avoids CI noise.

💡 Suggested fix
 __all__ = [
-    "FeedbackEvent",
-    "FeedbackIntent",
-    "ProposalType",
-    "ProposalStatus",
-    "ResearchResult",
-    "PromptProposal",
-    "ExperimentResult",
-    "EvaluationResult",
+    "EvaluationResult",
+    "ExperimentResult",
+    "FeedbackEvent",
+    "FeedbackIntent",
+    "PromptProposal",
+    "ProposalStatus",
+    "ProposalType",
+    "ResearchResult",
 ]
🧰 Tools
🪛 Ruff (0.14.13)

26-35: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

🤖 Prompt for AI Agents
In `@src/prompt_rewriter/__init__.py` around lines 26 - 35, The exported names in
the __all__ list are unsorted causing Ruff RUF022; open the module-level __all__
variable in __init__.py and reorder the entries alphabetically (e.g., arrange
"EvaluationResult", "ExperimentResult", "FeedbackEvent", "FeedbackIntent",
"ProposalStatus", "ProposalType", "PromptProposal", "ResearchResult" into
lexicographic order) so the list is sorted to satisfy the linter.

Comment on lines +160 to +235
    async def call_llm(
        self,
        prompt: str,
        system_instruction: str | None = None,
        model: str | None = None,
        temperature: float = 0.7,
        max_tokens: int = 4096,
    ) -> str:
        """
        Call an LLM with the given prompt.

        This is a utility method for subclasses to use when they need
        to call an LLM. It handles model selection and basic error handling.

        Args:
            prompt: The user prompt
            system_instruction: Optional system instruction
            model: Model to use (defaults to config.default_llm_model)
            temperature: Sampling temperature
            max_tokens: Maximum tokens to generate

        Returns:
            The LLM's response text
        """
        import google.generativeai as genai

        model_name = model or self.config.default_llm_model

        generation_config = genai.GenerationConfig(
            temperature=temperature,
            max_output_tokens=max_tokens,
        )

        model_instance = genai.GenerativeModel(
            model_name=model_name,
            system_instruction=system_instruction,
            generation_config=generation_config,
        )

        response = model_instance.generate_content(prompt)
        return response.text

    async def call_llm_with_search(
        self,
        prompt: str,
        system_instruction: str | None = None,
        model: str | None = None,
    ) -> tuple[str, list[dict]]:
        """
        Call an LLM with Google Search grounding enabled.

        Args:
            prompt: The user prompt
            system_instruction: Optional system instruction
            model: Model to use

        Returns:
            Tuple of (response_text, grounding_sources)
        """
        import google.generativeai as genai
        from google.generativeai.types import GenerateContentConfig, Tool
        from google.generativeai.types.content_types import GoogleSearch

        model_name = model or self.config.research_model

        tools = [Tool(google_search=GoogleSearch())]

        config = GenerateContentConfig(
            tools=tools,
            system_instruction=system_instruction,
        )

        model_instance = genai.GenerativeModel(model_name=model_name)

        response = model_instance.generate_content(prompt, generation_config=config)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

google-generativeai Python SDK async methods generate_content_async

💡 Result:

In the legacy google-generativeai Python SDK (the one you import as google.generativeai as genai), there is a method named GenerativeModel.generate_content_async(...), but Google’s own migration guide shows it being called without await (it returns an object directly), and there are open bug reports where trying to await it raises TypeError: object GenerateContentResponse can't be used in 'await' expression (notably in Colab / Python 3.11). [1], [2]

In the current SDK (google-genai, imported as from google import genai), you don’t use generate_content_async. Instead, every async method lives under client.aio and is awaited:

from google import genai

client = genai.Client()

response = await client.aio.models.generate_content(
    model="gemini-2.0-flash",
    contents="Tell me a story in 300 words."
)
print(response.text)

This is the recommended replacement for generate_content_async, and streaming also has an async version under client.aio. [1], [3]

Sources: [1] Google migration guide, [2] legacy SDK issue #732, [3] googleapis/python-genai (Google Gen AI Python SDK) README.


🏁 Script executed:

# Search for any usage of response.candidates in the codebase
rg "response\.candidates" --type py

Repository: PublicDataWorks/verdad

Length of output: 1314


🏁 Script executed:

# Check the actual implementation around line 160-235
cd src && head -250 prompt_rewriter/agents/base.py | tail -100

Repository: PublicDataWorks/verdad

Length of output: 3482


🏁 Script executed:

# Look for any imports or existing async patterns in the file
rg "asyncio|async def" src/prompt_rewriter/agents/base.py

Repository: PublicDataWorks/verdad

Length of output: 275


Avoid blocking the event loop in async LLM calls; handle empty response candidates.

The generate_content() method is synchronous and will block the event loop when called directly in async methods. Additionally, response.candidates[0] is accessed without ensuring the candidates list is non-empty, which can raise IndexError.

For the blocking I/O, use asyncio.to_thread() as a practical workaround for the legacy SDK. The long-term solution is to migrate to the newer google-genai SDK (imported as from google import genai), which provides proper async methods via client.aio.models.generate_content().

For response handling, check that response.candidates is non-empty before accessing response.candidates[0]:

💡 Suggested fixes
+import asyncio
 ...
-        response = model_instance.generate_content(prompt)
+        response = await asyncio.to_thread(model_instance.generate_content, prompt)
         return response.text
 ...
-        response = model_instance.generate_content(prompt, generation_config=config)
+        response = await asyncio.to_thread(
+            model_instance.generate_content,
+            prompt,
+            generation_config=config,
+        )

         # Extract grounding metadata
         sources = []
-        if hasattr(response.candidates[0], "grounding_metadata"):
-            metadata = response.candidates[0].grounding_metadata
+        if response.candidates and hasattr(response.candidates[0], "grounding_metadata"):
+            metadata = response.candidates[0].grounding_metadata
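
For the longer-term migration noted above, a sketch of what this helper could look like on the google-genai SDK (the model name and config fields here are illustrative defaults, not values from this PR):

from google import genai
from google.genai import types

class MigratedAgent:
    def __init__(self) -> None:
        # Client() reads GOOGLE_API_KEY / GEMINI_API_KEY from the environment.
        self.client = genai.Client()

    async def call_llm(
        self,
        prompt: str,
        system_instruction: str | None = None,
        model: str = "gemini-2.0-flash",
        temperature: float = 0.7,
    ) -> str:
        # client.aio exposes real coroutines, so no asyncio.to_thread is needed.
        response = await self.client.aio.models.generate_content(
            model=model,
            contents=prompt,
            config=types.GenerateContentConfig(
                temperature=temperature,
                system_instruction=system_instruction,
            ),
        )
        return response.text or ""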
🤖 Prompt for AI Agents
In `@src/prompt_rewriter/agents/base.py` around lines 160 - 235,
call_llm_with_search currently calls the synchronous genai.generate_content
directly (blocking the event loop) and assumes response.candidates[0] exists;
change it to run the blocking call inside asyncio.to_thread(...) (wrapping the
model_instance.generate_content call) and after getting the response validate
that response.candidates is non-empty before accessing [0]; if empty, return a
sensible fallback (e.g., empty string and empty grounding list) and otherwise
extract response.candidates[0].content/text and grounding sources to return the
tuple (ensure you keep using model_instance.generate_content, generation config
construction with GenerateContentConfig/Tool/GoogleSearch, and check
response.candidates to avoid IndexError).

Comment on lines +236 to +249
        # Extract grounding metadata
        sources = []
        if hasattr(response.candidates[0], "grounding_metadata"):
            metadata = response.candidates[0].grounding_metadata
            if hasattr(metadata, "grounding_chunks"):
                for chunk in metadata.grounding_chunks:
                    if hasattr(chunk, "web"):
                        sources.append(
                            {
                                "url": chunk.web.uri,
                                "title": chunk.web.title,
                            }
                        )


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find and examine the base.py file
find . -name "base.py" -path "*/prompt_rewriter/agents/*" -type f

Repository: PublicDataWorks/verdad

Length of output: 102


🏁 Script executed:

# Check the repository structure
git ls-files | grep -E "(base\.py|requirements|setup\.py|pyproject\.toml)" | head -20

Repository: PublicDataWorks/verdad

Length of output: 191


🌐 Web query:

google-generativeai SDK generate_content empty candidates list behavior

💡 Result:

In the Gemini API (which the google-generativeai SDK wraps), an empty response.candidates list is expected only when the prompt is blocked / rejected, not when only the model output is blocked.

  • The API “returns either all requested candidates or none of them” and “returns no candidates at all only if there was something wrong with the prompt (check promptFeedback).” If promptFeedback.blockReason is set, “the prompt was blocked and no candidates are returned.” [1]
  • Separately, you can get a candidate object whose content has no parts / no text when the candidate is blocked by safety filters, in which case you should inspect candidate.finish_reason and candidate.safety_ratings instead of assuming content.parts[0].text exists. (This commonly shows up as “Response candidate content has no parts … likely blocked by the safety filters.”) [2]
  • If you’re seeing “sometimes empty text” in output.candidates[0].content.parts[0].text even for benign prompts, that has been reported as intermittent in Google’s SDK issue trackers; you should treat it as a case your code must handle (and log full response incl. promptFeedback, finish_reason, and safety ratings). [3]

Practical handling (a minimal code sketch follows the sources below):

  1. If not response.candidates: check response.prompt_feedback.block_reason (+ safety_ratings) and treat as “prompt blocked/invalid”.
  2. Else if candidate exists but has no text/parts: treat as “output blocked” and use candidate.finish_reason / candidate.safety_ratings.
  3. Only read response.text / candidate.content.parts[0].text after those checks.

Sources: [1] Gemini API GenerateContentResponse / PromptFeedback reference, [2] example of blocked candidate with no parts in Vertex AI issue, [3] intermittent empty text report in Google’s SDK issue tracker.
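
Those three steps as a minimal sketch against the legacy SDK's response shape (the helper name and logger are illustrative):

import logging

logger = logging.getLogger(__name__)

def extract_text(response) -> str | None:
    # 1. No candidates: the prompt itself was blocked or invalid.
    if not response.candidates:
        feedback = getattr(response, "prompt_feedback", None)
        logger.warning("Prompt blocked: %s", getattr(feedback, "block_reason", None))
        return None
    candidate = response.candidates[0]
    # 2. Candidate present but no parts: the output was blocked.
    if not candidate.content.parts:
        logger.warning(
            "Output blocked: finish_reason=%s, safety_ratings=%s",
            candidate.finish_reason,
            candidate.safety_ratings,
        )
        return None
    # 3. Only now is reading the text safe.
    return candidate.content.parts[0].text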


🌐 Web query:

google-generativeai SDK response structure grounding_metadata documentation

💡 Result:

In the google-generativeai Python SDK, grounding information is returned per candidate. After a generate_content(...) call, you access it like:

  • response.candidates[0].grounding_metadata
  • then typically:
    • response.candidates[0].grounding_metadata.grounding_chunks
    • response.candidates[0].grounding_metadata.grounding_supports

This access pattern is shown in Google’s Gemini API grounding guide (Python example). [1]

Where this is documented (authoritative schemas)

  1. Gemini API (Generative Language) – Candidate schema
    Candidate includes groundingMetadata (“Grounding metadata for the candidate … populated for GenerateContent calls”). [2]

  2. Vertex AI – GroundingMetadata schema (field meanings)
    Documents the main fields you’ll see inside grounding_metadata, including:

  • webSearchQueries
  • retrievalQueries
  • groundingChunks
  • groundingSupports
  • retrievalMetadata (e.g., googleSearchDynamicRetrievalScore) [3]

Quick mental model of the response structure

  • response.candidates[] → each candidate may have its own grounding_metadata. [1][2]
  • grounding_chunks → the retrieved sources (URLs / docs / maps places, depending on tool). [1][3]
  • grounding_supports → spans/segments of the generated text mapped to chunk indices (for inline citations). [1][3]

Sources:
[1] Gemini API “Grounding with Google Search” (example accessing response.candidates[0].grounding_metadata...)
[2] Gemini API “Candidate” reference (includes groundingMetadata, populated for GenerateContent)
[3] Vertex AI REST reference: GroundingMetadata fields and definitions


🏁 Script executed:

# Read the relevant section of base.py
sed -n '230,260p' src/prompt_rewriter/agents/base.py

Repository: PublicDataWorks/verdad

Length of output: 855


🏁 Script executed:

# Get more context around this code - look at the function definition
sed -n '200,260p' src/prompt_rewriter/agents/base.py

Repository: PublicDataWorks/verdad

Length of output: 1789


🏁 Script executed:

# Check imports to understand how response object is created
head -40 src/prompt_rewriter/agents/base.py

Repository: PublicDataWorks/verdad

Length of output: 1067


🏁 Script executed:

# Search for other uses of response.candidates in the codebase
rg "response\.candidates" --type python -B 2 -A 2

Repository: PublicDataWorks/verdad

Length of output: 96


🏁 Script executed:

# Check if there's any existing error handling for empty candidates
rg "candidates" src/prompt_rewriter/agents/base.py -B 2 -A 2

Repository: PublicDataWorks/verdad

Length of output: 364


🏁 Script executed:

# Verify there's no surrounding try-catch that might handle this
sed -n '220,270p' src/prompt_rewriter/agents/base.py

Repository: PublicDataWorks/verdad

Length of output: 1231


🏁 Script executed:

# Check if there are any tests for this error case
find . -name "*.py" -path "*/test*" -type f | xargs rg "call_llm_with_search|candidates\[0\]" 2>/dev/null | head -20

Repository: PublicDataWorks/verdad

Length of output: 48


Guard against empty candidates before indexing.

response.candidates[0] will raise IndexError if the SDK returns no candidates (which occurs when the prompt is blocked). The defensive check prevents pipeline crashes.

Suggested fix
         sources = []
-        if hasattr(response.candidates[0], "grounding_metadata"):
-            metadata = response.candidates[0].grounding_metadata
+        candidates = getattr(response, "candidates", None) or []
+        if candidates and hasattr(candidates[0], "grounding_metadata"):
+            metadata = candidates[0].grounding_metadata
             if hasattr(metadata, "grounding_chunks"):
                 for chunk in metadata.grounding_chunks:
                     if hasattr(chunk, "web"):
                         sources.append(
                             {
                                 "url": chunk.web.uri,
                                 "title": chunk.web.title,
                             }
                         )
🤖 Prompt for AI Agents
In `@src/prompt_rewriter/agents/base.py` around lines 236 - 249, The code accesses
response.candidates[0] without ensuring candidates is non-empty, causing
IndexError when the SDK returns no candidates; update the grounding metadata
extraction in src/prompt_rewriter/agents/base.py to first check that response
has a non-empty candidates list (e.g., if getattr(response, "candidates", None)
and len(response.candidates) > 0) before referencing response.candidates[0], and
only then proceed to inspect grounding_metadata and grounding_chunks (symbols:
response, candidates, grounding_metadata, grounding_chunks, sources); if there
are no candidates, skip extraction or leave sources empty to avoid crashing the
pipeline.

Comment on lines +99 to +106
        prompt = FEEDBACK_CLASSIFICATION_PROMPT.format(
            title=snippet.get("title", "Unknown"),
            summary=snippet.get("summary", "No summary available"),
            categories=", ".join(snippet.get("disinformation_categories", [])),
            transcription=snippet.get("transcription", "")[:1000],  # Truncate
            feedback_type=feedback.feedback_type,
            feedback_content=feedback.content or "No content provided",
        )

⚠️ Potential issue | 🟠 Major

Guard against NULL categories.
If disinformation_categories is NULL, ", ".join(None) raises TypeError and the agent fails.

🛠️ Suggested fix
-        prompt = FEEDBACK_CLASSIFICATION_PROMPT.format(
+        categories = snippet.get("disinformation_categories") or []
+        prompt = FEEDBACK_CLASSIFICATION_PROMPT.format(
             title=snippet.get("title", "Unknown"),
             summary=snippet.get("summary", "No summary available"),
-            categories=", ".join(snippet.get("disinformation_categories", [])),
+            categories=", ".join(categories),
             transcription=snippet.get("transcription", "")[:1000],  # Truncate
             feedback_type=feedback.feedback_type,
             feedback_content=feedback.content or "No content provided",
         )
🤖 Prompt for AI Agents
In `@src/prompt_rewriter/agents/feedback_intake.py` around lines 99 - 106, The
code builds `prompt` using FEEDBACK_CLASSIFICATION_PROMPT and calls ",
".join(snippet.get("disinformation_categories", [])) which will raise if
`snippet.get("disinformation_categories")` returns None; update the `categories`
expression to defensively coerce to an iterable (e.g., use
`snippet.get("disinformation_categories") or []` or wrap with `list(...)`)
before joining so `", ".join(...)` always receives a list/tuple of strings; keep
the same variable `categories` in the FEEDBACK_CLASSIFICATION_PROMPT.format call
and ensure no other fields rely on nullable values.

Comment on lines +83 to +94
    feedback = FeedbackEvent(
        id=UUID(proposal_data["id"]),
        feedback_type=proposal_data["triggered_by_feedback_type"],
        snippet_id=UUID(proposal_data["triggered_by_snippet_id"]),
        user_id=(
            UUID(proposal_data["triggered_by_user_id"])
            if proposal_data.get("triggered_by_user_id")
            else None
        ),
        content=proposal_data.get("trigger_content"),
        comment_id=proposal_data.get("triggered_by_comment_id"),
    )

⚠️ Potential issue | 🟡 Minor

Don’t reuse proposal ID as FeedbackEvent ID.
FeedbackEvent.id should reference the feedback event (if available); using the proposal ID will mix entity identities and complicate traceability.

🛠️ Suggested fix
-    feedback = FeedbackEvent(
-        id=UUID(proposal_data["id"]),
+    feedback = FeedbackEvent(
         feedback_type=proposal_data["triggered_by_feedback_type"],
         snippet_id=UUID(proposal_data["triggered_by_snippet_id"]),
         user_id=(
             UUID(proposal_data["triggered_by_user_id"])
             if proposal_data.get("triggered_by_user_id")
             else None
         ),
         content=proposal_data.get("trigger_content"),
         comment_id=proposal_data.get("triggered_by_comment_id"),
     )
🤖 Prompt for AI Agents
In `@src/prompt_rewriter/main.py` around lines 83 - 94, The FeedbackEvent
construction is incorrectly using the proposal ID (UUID(proposal_data["id"]))
for FeedbackEvent.id; change it to use the actual feedback event id from the
payload (e.g., UUID(proposal_data["triggered_by_feedback_id"]) or
proposal_data.get("triggered_by_feedback_id")) and fall back to None if that
field is missing, so update the id assignment in the FeedbackEvent instantiation
in main.py accordingly (refer to FeedbackEvent and proposal_data in that block).

Comment on lines +374 to +425
-- Grant permissions
GRANT ALL ON prompt_rewrite_proposals TO authenticated;
GRANT ALL ON prompt_rewrite_proposals TO service_role;
GRANT ALL ON prompt_experiment_runs TO authenticated;
GRANT ALL ON prompt_experiment_runs TO service_role;
GRANT ALL ON snippet_reprocess_queue TO authenticated;
GRANT ALL ON snippet_reprocess_queue TO service_role;
GRANT ALL ON knowledge_facts TO authenticated;
GRANT ALL ON knowledge_facts TO service_role;
GRANT ALL ON user_feedback_events TO authenticated;
GRANT ALL ON user_feedback_events TO service_role;
GRANT ALL ON prompt_rewriter_agent_logs TO authenticated;
GRANT ALL ON prompt_rewriter_agent_logs TO service_role;

-- Enable RLS
ALTER TABLE prompt_rewrite_proposals ENABLE ROW LEVEL SECURITY;
ALTER TABLE prompt_experiment_runs ENABLE ROW LEVEL SECURITY;
ALTER TABLE snippet_reprocess_queue ENABLE ROW LEVEL SECURITY;
ALTER TABLE knowledge_facts ENABLE ROW LEVEL SECURITY;
ALTER TABLE user_feedback_events ENABLE ROW LEVEL SECURITY;
ALTER TABLE prompt_rewriter_agent_logs ENABLE ROW LEVEL SECURITY;

-- RLS Policies - allow authenticated users to read, service_role for write
CREATE POLICY "Allow authenticated read on proposals" ON prompt_rewrite_proposals
FOR SELECT TO authenticated USING (true);
CREATE POLICY "Allow service_role all on proposals" ON prompt_rewrite_proposals
FOR ALL TO service_role USING (true);

CREATE POLICY "Allow authenticated read on experiment_runs" ON prompt_experiment_runs
FOR SELECT TO authenticated USING (true);
CREATE POLICY "Allow service_role all on experiment_runs" ON prompt_experiment_runs
FOR ALL TO service_role USING (true);

CREATE POLICY "Allow authenticated read on reprocess_queue" ON snippet_reprocess_queue
FOR SELECT TO authenticated USING (true);
CREATE POLICY "Allow service_role all on reprocess_queue" ON snippet_reprocess_queue
FOR ALL TO service_role USING (true);

CREATE POLICY "Allow authenticated read on knowledge_facts" ON knowledge_facts
FOR SELECT TO authenticated USING (true);
CREATE POLICY "Allow service_role all on knowledge_facts" ON knowledge_facts
FOR ALL TO service_role USING (true);

CREATE POLICY "Allow authenticated read on feedback_events" ON user_feedback_events
FOR SELECT TO authenticated USING (true);
CREATE POLICY "Allow service_role all on feedback_events" ON user_feedback_events
FOR ALL TO service_role USING (true);

CREATE POLICY "Allow authenticated read on agent_logs" ON prompt_rewriter_agent_logs
FOR SELECT TO authenticated USING (true);
CREATE POLICY "Allow service_role all on agent_logs" ON prompt_rewriter_agent_logs
FOR ALL TO service_role USING (true);

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "*20260124000000*" -o -name "*prompt_rewriter_agent_schema*"

Repository: PublicDataWorks/verdad

Length of output: 135


🏁 Script executed:

wc -l ./supabase/migrations/20260124000000_prompt_rewriter_agent_schema.sql

Repository: PublicDataWorks/verdad

Length of output: 139


🏁 Script executed:

cat -n ./supabase/migrations/20260124000000_prompt_rewriter_agent_schema.sql

Repository: PublicDataWorks/verdad

Length of output: 18031


Tighten RLS: authenticated users can read all data across most tables.
The current RLS policies use USING (true), allowing authenticated users unrestricted read access to proposals, feedback events, experiment runs, reprocess queues, agent logs, and knowledge facts. This is a significant privacy/compliance risk if these tables contain sensitive or user-specific data.

Required changes:

  • prompt_rewrite_proposals and user_feedback_events have user ownership columns and should restrict reads to the originating user:

    • Change GRANT ALL to GRANT SELECT for authenticated users
    • Replace USING (true) with USING (triggered_by_user_id = auth.uid()) for proposals
    • Replace USING (true) with USING (user_id = auth.uid()) for feedback events
  • prompt_rewriter_agent_logs and prompt_experiment_runs should either be service_role-only (if intended for internal system use) or require ownership-based filtering through their linked proposal if users should see their own data.

  • knowledge_facts and snippet_reprocess_queue need architectural clarification: are these shared system tables, or should they be user-scoped?

🔐 Example tightening for user-scoped tables
- GRANT ALL ON prompt_rewrite_proposals TO authenticated;
+ GRANT SELECT ON prompt_rewrite_proposals TO authenticated;

- CREATE POLICY "Allow authenticated read on proposals" ON prompt_rewrite_proposals
-     FOR SELECT TO authenticated USING (true);
+ CREATE POLICY "Allow authenticated read own proposals" ON prompt_rewrite_proposals
+     FOR SELECT TO authenticated USING (triggered_by_user_id = auth.uid());

- GRANT ALL ON user_feedback_events TO authenticated;
+ GRANT SELECT ON user_feedback_events TO authenticated;

- CREATE POLICY "Allow authenticated read on feedback_events" ON user_feedback_events
-     FOR SELECT TO authenticated USING (true);
+ CREATE POLICY "Allow authenticated read own feedback_events" ON user_feedback_events
+     FOR SELECT TO authenticated USING (user_id = auth.uid());
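
For the experiment-runs bullet (ownership via the linked proposal), one possible shape; this assumes prompt_experiment_runs has a proposal_id column referencing prompt_rewrite_proposals(id), which is not confirmed in this excerpt:

-- Sketch only: restrict reads to experiment runs tied to the reader's proposals.
CREATE POLICY "Allow authenticated read own experiment_runs" ON prompt_experiment_runs
    FOR SELECT TO authenticated USING (
        EXISTS (
            SELECT 1
            FROM prompt_rewrite_proposals p
            WHERE p.id = prompt_experiment_runs.proposal_id
              AND p.triggered_by_user_id = auth.uid()
        )
    );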
🤖 Prompt for AI Agents
In `@supabase/migrations/20260124000000_prompt_rewriter_agent_schema.sql` around
lines 374 - 425, The RLS is too permissive: update grants and policies for
user-scoped data and decide ownership for internal tables. For
prompt_rewrite_proposals and user_feedback_events change GRANT ALL TO
authenticated to GRANT SELECT TO authenticated and replace their policies
("Allow authenticated read on proposals" and "Allow authenticated read on
feedback_events") from USING (true) to USING (triggered_by_user_id = auth.uid())
and USING (user_id = auth.uid()) respectively; for prompt_rewriter_agent_logs
and prompt_experiment_runs either remove the authenticated FOR SELECT policies
and make them service_role-only (keep only the "Allow service_role all ..."
policies) or tighten them to ownership-based checks (e.g., link to proposal
ownership) if users must see related rows; for knowledge_facts and
snippet_reprocess_queue confirm intended scope and either keep shared read
(USING (true)) or add ownership predicates before granting authenticated access.
