Add evidence spans to extraction output #53

## Feature Request

### Summary
Include evidence spans in extraction output to link each extracted field to its source location in the input text. This provides character offsets and text snippets showing where data was extracted from.

### Motivation
- **Verification**: Users can verify extracted data by reviewing the source text
- **Debugging**: Easier to identify why extraction succeeded or failed for specific fields
- **Transparency**: Clear provenance showing which text led to which extracted values
- **Auditing**: Critical for applications requiring traceability (legal, medical, financial documents)

### Proposed Output Format

```json
{
  "data": {
    "invoice_number": "INV-2024-001",
    "total": 1250.00,
    "date": "2024-01-15"
  },
  "evidence": {
    "invoice_number": {
      "text": "Invoice Number: INV-2024-001",
      "start": 45,
      "end": 73
    },
    "total": {
      "text": "Total Amount: $1,250.00",
      "start": 234,
      "end": 257
    },
    "date": {
      "text": "Date: January 15, 2024",
      "start": 12,
      "end": 34
    }
  }
}
```
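The offsets above are easiest to check under an explicit convention. The sketch below assumes 0-based character offsets with an exclusive `end`, so that `end - start` equals the snippet length. The `EvidenceSpan` class, the `make_span` helper, and the sample text are hypothetical illustrations, not part of the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceSpan:
    """One evidence entry: the quoted snippet plus its character offsets."""
    text: str
    start: int  # 0-based offset of the first character (assumed convention)
    end: int    # exclusive end offset (assumed convention)


def make_span(source: str, snippet: str) -> EvidenceSpan:
    """Locate the first exact occurrence of `snippet` in `source`."""
    start = source.find(snippet)
    if start == -1:
        raise ValueError(f"snippet not found in source: {snippet!r}")
    return EvidenceSpan(text=snippet, start=start, end=start + len(snippet))


# Hypothetical input text, used only to demonstrate the convention.
source = "ACME Corp\nDate: January 15, 2024\nInvoice Number: INV-2024-001\n"
span = make_span(source, "Invoice Number: INV-2024-001")

# Slicing the source with the offsets reproduces the snippet exactly.
assert source[span.start:span.end] == span.text
assert span.end - span.start == len(span.text)
```

With this convention, any consumer can recover the snippet with a plain slice, which also makes the `text` field checkable against the offsets.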

### Implementation Considerations

1. **LLM prompt changes**: Instruct the model to return both the extracted value and its source span
2. **Validation**: Verify that evidence spans are valid (within input bounds, non-overlapping)
3. **Schema extension**: Add an optional `includeEvidence` flag to schema definitions
4. **Performance**: Overhead should be small, since the LLM can return evidence in the same response
5. **Backward compatibility**: Keep evidence optional so existing users are not affected
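The validation step (item 2) could look roughly like the sketch below. It assumes the evidence shape from the proposed output format, 0-based offsets with an exclusive `end`, and that every entry carries `text`, `start`, and `end`. The function name `validate_evidence` and the extra check that `text` matches the source slice are assumptions for illustration, not an existing API.

```python
def validate_evidence(source: str, evidence: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    spans = []
    for field, ev in evidence.items():
        start, end = ev["start"], ev["end"]
        # Bounds check: the span must be non-empty and lie inside the source.
        if not (0 <= start < end <= len(source)):
            problems.append(f"{field}: span [{start}, {end}) is out of bounds")
            continue
        # Consistency check (assumed): the snippet must equal the source slice.
        if source[start:end] != ev["text"]:
            problems.append(f"{field}: text does not match source[{start}:{end}]")
        spans.append((start, end, field))

    # Overlap check: walk spans in start order, tracking the furthest end seen.
    spans.sort()
    max_end, max_field = 0, None
    for start, end, field in spans:
        if max_field is not None and start < max_end:
            problems.append(f"{field}: overlaps with {max_field}")
        if max_field is None or end > max_end:
            max_end, max_field = end, field
    return problems


# Hypothetical input and evidence, used only for demonstration.
source = "Date: January 15, 2024 | Invoice Number: INV-2024-001"
evidence = {
    "date": {"text": "Date: January 15, 2024", "start": 0, "end": 22},
    "invoice_number": {"text": "Invoice Number: INV-2024-001", "start": 25, "end": 53},
}
assert validate_evidence(source, evidence) == []
```

Returning a list of problems rather than raising on the first one lets a caller decide whether to drop only the invalid evidence entries or reject the whole response.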

### Example Use Cases

- **Legal contracts**: Link extracted clauses to exact paragraph locations
- **Medical records**: Trace diagnosis codes to supporting text
- **Financial documents**: Verify amounts and dates from source
- **Quality assurance**: Automated review of extraction accuracy

### Technical Approach

- **Option 1**: Extend LLM prompt to request evidence with each field
- **Option 2**: Post-processing fuzzy match to find source spans (less reliable)
- **Option 3**: Structured output format where each field includes `value` and `evidence`

**Recommendation**: Option 1 or 3 for deterministic results aligned with project principles.
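As a rough illustration of Option 3, the sketch below takes a per-field structure in which each field carries `value` and `evidence`, and converts it into the `data` / `evidence` layout from the proposed output format. The function name `split_structured_output` and the `include_evidence` parameter (standing in for the proposed `includeEvidence` flag) are hypothetical.

```python
def split_structured_output(fields: dict, include_evidence: bool = False) -> dict:
    """Convert per-field {"value", "evidence"} entries into the proposed output shape."""
    result = {"data": {name: entry["value"] for name, entry in fields.items()}}
    if include_evidence:
        # Fields for which no evidence was returned are left out of "evidence".
        result["evidence"] = {
            name: entry["evidence"]
            for name, entry in fields.items()
            if entry.get("evidence") is not None
        }
    return result


# Hypothetical structured response, used only for demonstration.
fields = {
    "invoice_number": {
        "value": "INV-2024-001",
        "evidence": {"text": "Invoice Number: INV-2024-001", "start": 45, "end": 73},
    },
    "total": {"value": 1250.00, "evidence": None},
}

without_evidence = split_structured_output(fields)
with_evidence = split_structured_output(fields, include_evidence=True)
```

Defaulting the flag to off keeps the existing output shape unchanged for current users, which matches the backward-compatibility consideration above.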

### Related Work
- spaCy's entity spans with character offsets
- Information extraction systems with provenance tracking
- PDF extraction tools with bounding boxes
