## Current System

Example full system prompt:
https://gist.github.com/sprice/a915695ade4ebe7c05a4d9cfb25e9957

```bash
# copy bill to clipboard
npm run prompt -- C-244
```

System prompt in codebase:
https://github.com/BuildCanada/BillsTracker/blob/main/src/prompt/summary-and-vote-prompt.ts
## Goals

- **Quality & correctness** - Produce LLM judgements and assessments that meet or exceed our standard of quality and correctness
- **Separation of concerns** - Decouple application UI/business logic from the LLM system
- **Improved prompt engineering** - Establish a foundation for objective, systematic prompt improvements
- **Enhanced metadata** - Provide valuable contextual information for each bill and judgement
## Phase 0

### Tighten up code-formatting/CI

### Move Judgement Logic to LLM

- Migrate all voting decision logic to the system prompt, so the LLM itself outputs `yes`, `no`, or `abstain` (see the schema sketch below)
- Rename `neutral` to `abstain` to align with parliamentary voting terminology
- Remove the application logic that currently switches votes to `neutral`/`abstain`
- Related: Change `neutral` to `abstain` and update prompt to judge `yes`, `no`, or `abstain` #61
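A minimal sketch of what the structured judgement output could look like once all three outcomes come from the model, assuming `zod` for validation; the schema and names here are illustrative, not the repo's actual types:

```ts
import { z } from "zod";

// Hypothetical schema: the LLM returns one of the three parliamentary
// outcomes directly, so no application code rewrites votes after the fact.
export const JudgementSchema = z.object({
  vote: z.enum(["yes", "no", "abstain"]),
  analysis: z.string(),
});

export type Judgement = z.infer<typeof JudgementSchema>;

// Parse and validate the model's JSON output; anything outside the enum
// (e.g. the old "neutral" label) fails loudly instead of being silently mapped.
export function parseJudgement(raw: string): Judgement {
  return JudgementSchema.parse(JSON.parse(raw));
}
```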
### Infrastructure & Observability

- **Development environment** - Create a dedicated environment for judgement/metadata system improvements (not a general dev environment)
  - Related: Development Database #59
- **Prompt tracing** - Implement trace tracking (recommend Langfuse)
- **Logging improvements** (see the sketch below)
  - Replace `console.*` statements with the `debug` package
  - Log prompts and prompt arguments in production; log everything in development
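A minimal sketch of the combined logging-and-tracing setup, assuming the `debug` and `langfuse` npm packages; `callModel` is a hypothetical stand-in for the real model client:

```ts
import createDebug from "debug";
import { Langfuse } from "langfuse";

// Namespaced logger instead of console.*: enable everything in development
// with DEBUG=billstracker:*, only this namespace in production.
const logPrompt = createDebug("billstracker:prompt");

// Reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_BASEURL from env.
const langfuse = new Langfuse();

// Hypothetical wrapper around whichever model client the judgement call uses.
async function callModel(prompt: string): Promise<string> {
  throw new Error("not implemented");
}

export async function tracedJudgement(billId: string, prompt: string): Promise<string> {
  logPrompt("judging %s, prompt length %d", billId, prompt.length);

  const trace = langfuse.trace({ name: "bill-judgement", metadata: { billId } });
  const generation = trace.generation({ name: "summary-and-vote", input: prompt });

  const output = await callModel(prompt);
  generation.end({ output });
  return output;
}
```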
### Metadata Requirements

Define the initial required metadata fields (a schema sketch follows the list):

- Builder Friendliness (Add KPI: Builder Friendliness Score #57)
- Relevance (Add Sort for "Relevance" #56)
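One way the two fields could be typed, again assuming `zod`; the names and ranges are placeholders, with the real definitions belonging to #56 and #57:

```ts
import { z } from "zod";

// Hypothetical field shapes: #57 defines the actual Builder Friendliness KPI
// and #56 the Relevance sort key.
export const BillMetadataSchema = z.object({
  builderFriendliness: z.number().min(0).max(10), // KPI score, scale TBD
  relevance: z.number().min(0).max(1),            // sort key, scale TBD
});

export type BillMetadata = z.infer<typeof BillMetadataSchema>;
```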
### Separate LLM Calls

- **Judgement call** - system prompt + bill → vote decision + analysis
- **Metadata call(s)** - system prompt + bill + judgement/analysis → metadata extraction (the combined flow is sketched below)
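A sketch of the combined flow; `judgeBill` and `extractMetadata` are hypothetical wrappers around the two prompts, and the types mirror the schemas sketched above:

```ts
type Judgement = { vote: "yes" | "no" | "abstain"; analysis: string };
type BillMetadata = { builderFriendliness: number; relevance: number };

// Hypothetical prompt wrappers; each owns its own system prompt.
async function judgeBill(billText: string): Promise<Judgement> {
  throw new Error("not implemented: judgement prompt call");
}

async function extractMetadata(billText: string, judgement: Judgement): Promise<BillMetadata> {
  throw new Error("not implemented: metadata prompt call");
}

export async function analyzeBill(billText: string) {
  // Call 1: system prompt + bill -> vote decision + analysis
  const judgement = await judgeBill(billText);

  // Call 2: system prompt + bill + judgement/analysis -> metadata extraction.
  // Keeping the calls separate lets us add more metadata calls later without
  // touching the judgement prompt.
  const metadata = await extractMetadata(billText, judgement);

  return { judgement, metadata };
}
```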
## Future Phases

**Problem: Are we collecting as much data as we want for each bill?**

- **Additional LLM calls** - Once a bill has been analyzed, we may want to run additional LLM calls to collect more metadata. See Add Sort for "Relevance" #56 and Add KPI: Builder Friendliness Score #57.

**Problem: Are our prompts as good as they can be?**

- **Prompt evaluations** - Create an eval framework for testing prompt performance (a harness sketch follows this list)
- **Manual prompt refinement** - Iteratively improve system prompts
- **Automated prompt optimization** - Explore programmatic prompt training/tuning (e.g. DSPy)
- Related: [DRAFT] Improve System Prompt #44
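A minimal sketch of what the eval harness could start as, assuming a hand-labelled set of bills with expected votes; every name here is illustrative:

```ts
// A labelled example: a bill plus the vote we believe is correct.
interface LabelledBill {
  id: string;
  text: string;
  expectedVote: "yes" | "no" | "abstain";
}

interface EvalResult {
  accuracy: number;
  misses: { id: string; expected: string; got: string }[];
}

// Run one judgement function over the labelled set and report exact-match
// accuracy on the vote. Rubric-based or LLM-graded scoring of the analysis
// text could be layered on later.
export async function runEval(
  judge: (billText: string) => Promise<{ vote: string }>,
  cases: LabelledBill[],
): Promise<EvalResult> {
  const misses: EvalResult["misses"] = [];
  for (const c of cases) {
    const { vote } = await judge(c.text);
    if (vote !== c.expectedVote) {
      misses.push({ id: c.id, expected: c.expectedVote, got: vote });
    }
  }
  return { accuracy: 1 - misses.length / cases.length, misses };
}
```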
**Problem: Many bills are amendments to existing legislation. We need the context of the bills they change.**

- **Get other legislation** (one possible tool shape is sketched below)
  - Build a database of other bills?
  - Create an LLM tool to fetch the contents of other bills?
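One possible shape for that tool, using the JSON Schema tool-definition style common to the major LLM APIs; the name, parameters, and backing loader are all assumptions:

```ts
// Tool the model can call when a bill under analysis references other
// legislation it needs to read.
export const fetchBillTool = {
  name: "fetch_bill",
  description:
    "Fetch the full text of another bill referenced by the bill under analysis.",
  input_schema: {
    type: "object",
    properties: {
      billNumber: { type: "string", description: 'Bill number, e.g. "C-244"' },
      parliament: { type: "number", description: "Parliament number, e.g. 44" },
    },
    required: ["billNumber"],
  },
} as const;

// Hypothetical resolver: check our own bill database first, then fall back to
// an external source if we don't have the text yet.
export async function fetchBillText(billNumber: string, parliament?: number): Promise<string> {
  throw new Error("not implemented");
}
```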
**Problem: The context needed to understand a bill's issues is often found in Parliamentary debate and in commentary/criticism from topic experts.**

- **Hansard context database** (a search-tool sketch follows this list)
  - Build a database of Hansard content?
  - Create an LLM tool to fetch specific Hansard content?
- **Topic expert context** - Build LLM tools to search for and fetch related commentary/criticism of bills from topic experts.
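The same tool pattern extends to Hansard: rather than stuffing debate transcripts into the prompt, the model could ask for excerpts on demand. A sketch, with all names and parameters again hypothetical:

```ts
// Search tool over a future Hansard database, returning debate excerpts
// relevant to a bill or topic.
export const searchHansardTool = {
  name: "search_hansard",
  description:
    "Search Parliamentary debate (Hansard) for discussion of a bill or topic and return relevant excerpts.",
  input_schema: {
    type: "object",
    properties: {
      query: { type: "string", description: 'e.g. "C-244 right to repair"' },
      limit: { type: "number", description: "Maximum number of excerpts to return" },
    },
    required: ["query"],
  },
} as const;
```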