## Current System

Example full system prompt:
https://gist.github.com/sprice/a915695ade4ebe7c05a4d9cfb25e9957

```bash
# copy bill to clipboard
npm run prompt -- C-244
```

System prompt in codebase:
https://github.com/BuildCanada/BillsTracker/blob/main/src/prompt/summary-and-vote-prompt.ts
## Goals

- **Quality & correctness** - Produce LLM judgements and assessments that meet or exceed our standard of quality and correctness
- **Separation of concerns** - Decouple application UI/business logic from the LLM system
- **Improved prompt engineering** - Establish a foundation for objective, systematic prompt improvements
- **Enhanced metadata** - Provide valuable contextual information for each bill and judgement
## Phase 0

### Tighten up code-formatting/CI

### Move Judgement Logic to LLM

- Migrate all voting decision logic to the system prompt, so the LLM itself outputs `yes`, `no`, or `abstain` (see the schema sketch below)
- Rename `neutral` to `abstain` to align with parliamentary voting terminology
- Remove the application logic that currently switches votes to `neutral`/`abstain`
- Related: Change `neutral` to `abstain` and update prompt to judge `yes`, `no`, or `abstain` #61
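A minimal sketch of what the structured judgement output could look like once all three outcomes come from the model, assuming `zod` for validation; the schema and names here are illustrative, not the repo's actual types:

```ts
import { z } from "zod";

// Hypothetical schema: the LLM returns one of the three parliamentary
// outcomes directly, so no application code rewrites votes after the fact.
export const JudgementSchema = z.object({
  vote: z.enum(["yes", "no", "abstain"]),
  analysis: z.string(),
});

export type Judgement = z.infer<typeof JudgementSchema>;

// Parse and validate the model's JSON output; anything outside the enum
// (e.g. the old "neutral" label) fails loudly instead of being silently mapped.
export function parseJudgement(raw: string): Judgement {
  return JudgementSchema.parse(JSON.parse(raw));
}
```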
### Infrastructure & Observability

- **Development environment** - Create a dedicated environment for judgement/metadata system improvements (not a general dev environment)
  - Related: Development Database #59
- **Prompt tracing** - Implement trace tracking (recommend Langfuse)
- **Logging improvements** (see the sketch below)
  - Replace `console.*` statements with the `debug` package
  - Log prompts and prompt arguments in production; log everything in development
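A minimal sketch of the combined logging-and-tracing setup, assuming the `debug` and `langfuse` npm packages; `callModel` is a hypothetical stand-in for the real model client:

```ts
import createDebug from "debug";
import { Langfuse } from "langfuse";

// Namespaced logger instead of console.*: enable everything in development
// with DEBUG=billstracker:*, only this namespace in production.
const logPrompt = createDebug("billstracker:prompt");

// Reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_BASEURL from env.
const langfuse = new Langfuse();

// Hypothetical wrapper around whichever model client the judgement call uses.
async function callModel(prompt: string): Promise<string> {
  throw new Error("not implemented");
}

export async function tracedJudgement(billId: string, prompt: string): Promise<string> {
  logPrompt("judging %s, prompt length %d", billId, prompt.length);

  const trace = langfuse.trace({ name: "bill-judgement", metadata: { billId } });
  const generation = trace.generation({ name: "summary-and-vote", input: prompt });

  const output = await callModel(prompt);
  generation.end({ output });
  return output;
}
```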
### Metadata Requirements

Define the initial required metadata fields (a schema sketch follows the list):

- Builder Friendliness (Add KPI: Builder Friendliness Score #57)
- Relevance (Add Sort for "Relevance" #56)
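One way the two fields could be typed, again assuming `zod`; the names and ranges are placeholders, with the real definitions belonging to #56 and #57:

```ts
import { z } from "zod";

// Hypothetical field shapes: #57 defines the actual Builder Friendliness KPI
// and #56 the Relevance sort key.
export const BillMetadataSchema = z.object({
  builderFriendliness: z.number().min(0).max(10), // KPI score, scale TBD
  relevance: z.number().min(0).max(1),            // sort key, scale TBD
});

export type BillMetadata = z.infer<typeof BillMetadataSchema>;
```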
### Separate LLM Calls

- **Judgement call** - system prompt + bill → vote decision + analysis
- **Metadata call(s)** - system prompt + bill + judgement/analysis → metadata extraction (the combined flow is sketched below)
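A sketch of the combined flow; `judgeBill` and `extractMetadata` are hypothetical wrappers around the two prompts, and the types mirror the schemas sketched above:

```ts
type Judgement = { vote: "yes" | "no" | "abstain"; analysis: string };
type BillMetadata = { builderFriendliness: number; relevance: number };

// Hypothetical prompt wrappers; each owns its own system prompt.
async function judgeBill(billText: string): Promise<Judgement> {
  throw new Error("not implemented: judgement prompt call");
}

async function extractMetadata(billText: string, judgement: Judgement): Promise<BillMetadata> {
  throw new Error("not implemented: metadata prompt call");
}

export async function analyzeBill(billText: string) {
  // Call 1: system prompt + bill -> vote decision + analysis
  const judgement = await judgeBill(billText);

  // Call 2: system prompt + bill + judgement/analysis -> metadata extraction.
  // Keeping the calls separate lets us add more metadata calls later without
  // touching the judgement prompt.
  const metadata = await extractMetadata(billText, judgement);

  return { judgement, metadata };
}
```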
## Future Phases

**Problem: Are we collecting as much data as we want for each bill?**

- **Additional LLM calls** - Once a bill has been analyzed, we may want to run additional LLM calls to collect more metadata. See Add Sort for "Relevance" #56 and Add KPI: Builder Friendliness Score #57.

**Problem: Are our prompts as good as they can be?**

- **Prompt evaluations** - Create an eval framework for testing prompt performance (a harness sketch follows this list)
- **Manual prompt refinement** - Iteratively improve system prompts
- **Automated prompt optimization** - Explore programmatic prompt training/tuning (e.g. DSPy)
- Related: [DRAFT] Improve System Prompt #44
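A minimal sketch of what the eval harness could start as, assuming a hand-labelled set of bills with expected votes; every name here is illustrative:

```ts
// A labelled example: a bill plus the vote we believe is correct.
interface LabelledBill {
  id: string;
  text: string;
  expectedVote: "yes" | "no" | "abstain";
}

interface EvalResult {
  accuracy: number;
  misses: { id: string; expected: string; got: string }[];
}

// Run one judgement function over the labelled set and report exact-match
// accuracy on the vote. Rubric-based or LLM-graded scoring of the analysis
// text could be layered on later.
export async function runEval(
  judge: (billText: string) => Promise<{ vote: string }>,
  cases: LabelledBill[],
): Promise<EvalResult> {
  const misses: EvalResult["misses"] = [];
  for (const c of cases) {
    const { vote } = await judge(c.text);
    if (vote !== c.expectedVote) {
      misses.push({ id: c.id, expected: c.expectedVote, got: vote });
    }
  }
  return { accuracy: 1 - misses.length / cases.length, misses };
}
```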
**Problem: Many bills are amendments to existing legislation. We need the context of the bills they change.**

- **Get other legislation** (one possible tool shape is sketched below)
  - Build a database of other bills?
  - Create an LLM tool to fetch the contents of other bills?
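One possible shape for that tool, using the JSON Schema tool-definition style common to the major LLM APIs; the name, parameters, and backing loader are all assumptions:

```ts
// Tool the model can call when a bill under analysis references other
// legislation it needs to read.
export const fetchBillTool = {
  name: "fetch_bill",
  description:
    "Fetch the full text of another bill referenced by the bill under analysis.",
  input_schema: {
    type: "object",
    properties: {
      billNumber: { type: "string", description: 'Bill number, e.g. "C-244"' },
      parliament: { type: "number", description: "Parliament number, e.g. 44" },
    },
    required: ["billNumber"],
  },
} as const;

// Hypothetical resolver: check our own bill database first, then fall back to
// an external source if we don't have the text yet.
export async function fetchBillText(billNumber: string, parliament?: number): Promise<string> {
  throw new Error("not implemented");
}
```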
**Problem: The context needed to understand a bill's issues is often found in Parliamentary debate and in commentary/criticism from topic experts.**

- **Hansard context database** (a search-tool sketch follows this list)
  - Build a database of Hansard content?
  - Create an LLM tool to fetch specific Hansard content?
- **Topic expert context** - Build LLM tools to search for and fetch related commentary/criticism of bills from topic experts.
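The same tool pattern extends to Hansard: rather than stuffing debate transcripts into the prompt, the model could ask for excerpts on demand. A sketch, with all names and parameters again hypothetical:

```ts
// Search tool over a future Hansard database, returning debate excerpts
// relevant to a bill or topic.
export const searchHansardTool = {
  name: "search_hansard",
  description:
    "Search Parliamentary debate (Hansard) for discussion of a bill or topic and return relevant excerpts.",
  input_schema: {
    type: "object",
    properties: {
      query: { type: "string", description: 'e.g. "C-244 right to repair"' },
      limit: { type: "number", description: "Maximum number of excerpts to return" },
    },
    required: ["query"],
  },
} as const;
```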