Add type coercion for LLM output before validation errors #28

## Summary
Attempt to parse/coerce field values to the expected type before throwing validation errors, making the system more forgiving of minor LLM formatting mistakes.

## Problem
Models sometimes return correct data in the wrong JSON type. Currently this fails validation immediately:

**Example from benchmarks:**
```
mistral:7b on 02-receipt-medium:
  ⚠️  [subtotal] Field 'subtotal' must be a number, got string
  ⚠️  [tax] Field 'tax' must be a number, got string  
  ⚠️  [total] Field 'total' must be a number, got string
```

The model likely returned:
```json
{
  "subtotal": "8.75",
  "tax": "0.70",
  "total": "9.45"
}
```

Instead of:
```json
{
  "subtotal": 8.75,
  "tax": 0.70,
  "total": 9.45
}
```

The data is valid; only the JSON type is wrong (strings instead of numbers).

## Proposed Solution
Add type coercion in `src/core/validator.ts` before validation:

### String → Number
```typescript
case 'number':
    let numValue = value;
    
    // Try to coerce string to number
    if (typeof value === 'string') {
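        // Number() yields NaN for partial matches like "12abc", which parseFloat() would read as 12.
        // Number('') is 0, so treat an empty string as not a number.
        const trimmed = value.trim();
        const parsed = trimmed === '' ? NaN : Number(trimmed);
        if (Number.isFinite(parsed)) {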
            numValue = parsed;
            // Optionally update the data object with coerced value
        }
    }
    
    if (typeof numValue !== 'number') {
        errors.push(...);
    }
```
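
One caveat worth testing for: `parseFloat` parses a leading numeric prefix and ignores whatever follows, so `"12abc"` becomes `12`. The sketch below is a stricter standalone helper. The name `coerceToNumber` is hypothetical and not part of the existing validator; it only illustrates which inputs would and would not be coerced.

```typescript
// Hypothetical helper (not in the existing codebase): strict string-to-number coercion.
// Returns undefined when the value cannot be safely treated as a number.
function coerceToNumber(value: unknown): number | undefined {
    if (typeof value === 'number') {
        return Number.isFinite(value) ? value : undefined;
    }
    if (typeof value !== 'string') {
        return undefined;
    }
    const trimmed = value.trim();
    // Number('') is 0, so reject empty strings explicitly.
    if (trimmed === '') {
        return undefined;
    }
    // Number() returns NaN for partial matches such as "12abc",
    // whereas parseFloat("12abc") returns 12.
    const parsed = Number(trimmed);
    return Number.isFinite(parsed) ? parsed : undefined;
}
```

With this helper, `"8.75"` and `" 0.70 "` coerce cleanly, while `"12abc"`, `"$8.75"`, and `""` are left for the validator to reject. Currency symbols and thousands separators are deliberately not handled here.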

### String → Boolean
```typescript
case 'boolean':
    let boolValue = value;
    
    // Try to coerce string to boolean
    if (typeof value === 'string') {
        const lower = value.toLowerCase().trim();
        if (lower === 'true' || lower === '1' || lower === 'yes') {
            boolValue = true;
        } else if (lower === 'false' || lower === '0' || lower === 'no') {
            boolValue = false;
        }
    }
    
    if (typeof boolValue !== 'boolean') {
        errors.push(...);
    }
```

### String → Date
The validator is already somewhat lenient here (it accepts both strings and Date objects), but it could be more forgiving:
```typescript
case 'date':
    // Already accepts both string and Date object
    // Could add more date format parsing (e.g., "12/15/2024" → ISO format)
```
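
A minimal sketch of what the extra date parsing could look like. The function name `coerceToIsoDate` is hypothetical, and the sketch assumes slash-separated dates are US-style `MM/DD/YYYY`, which is itself a judgment call: `03/04/2024` is ambiguous between March 4 and April 3.

```typescript
// Hypothetical helper: normalizes a date string to ISO 8601 (YYYY-MM-DD).
// Assumption: slash-separated input is MM/DD/YYYY (US ordering).
function coerceToIsoDate(value: string): string | undefined {
    const trimmed = value.trim();
    const iso = /^(\d{4})-(\d{2})-(\d{2})$/.exec(trimmed);
    const us = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(trimmed);

    let year: number;
    let month: number;
    let day: number;
    if (iso) {
        year = Number(iso[1]);
        month = Number(iso[2]);
        day = Number(iso[3]);
    } else if (us) {
        month = Number(us[1]);
        day = Number(us[2]);
        year = Number(us[3]);
    } else {
        return undefined;
    }

    // Round-trip through Date.UTC to reject impossible dates such as 02/30/2024,
    // which would otherwise roll over into March.
    const date = new Date(Date.UTC(year, month - 1, day));
    if (
        date.getUTCFullYear() !== year ||
        date.getUTCMonth() !== month - 1 ||
        date.getUTCDate() !== day
    ) {
        return undefined;
    }
    return date.toISOString().slice(0, 10);
}
```

Because of the day/month ambiguity, this kind of coercion may be better placed behind a per-schema setting than enabled by default.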

### Number/Boolean → String
```typescript
case 'string':
    let strValue = value;
    
    // Coerce number/boolean to string
    if (typeof value === 'number' || typeof value === 'boolean') {
        strValue = String(value);
    }
    
    if (typeof strValue !== 'string') {
        errors.push(...);
    }
```

## Benefits
1. **Better model compatibility**: Different models have different JSON serialization behaviors
2. **More forgiving extraction**: Focus on data accuracy rather than formatting
3. **Reduced false negatives**: Valid data won't be rejected for minor formatting issues
4. **Still validates**: Type coercion only happens when safe/unambiguous

## Considerations
- **Data mutation**: Should coerced values update the original data object?
- **Logging**: Should we log when coercion happens for debugging?
- **Opt-in/out**: Should this be configurable per schema?
- **Confidence scoring**: Should coerced fields have lower confidence?
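
One possible way to address these questions together is to have coercion return a result object instead of mutating the input, so the caller decides whether to write the value back, log it, or lower a confidence score. This is only a sketch: `coerceField`, `CoercionOptions`, `CoercionResult`, and the `note` text are hypothetical names, and `strictTypes` is the option floated in this issue, not an existing setting.

```typescript
// All names below are hypothetical and for illustration only.
type FieldType = 'number' | 'boolean' | 'string';

interface CoercionOptions {
    strictTypes?: boolean; // when true, coercion is skipped entirely
}

interface CoercionResult {
    value: unknown;   // coerced value, or the original if nothing changed
    coerced: boolean; // true when the type was changed
    note?: string;    // human-readable record, suitable for debug logs
}

function coerceField(
    field: string,
    value: unknown,
    expected: FieldType,
    options: CoercionOptions = {},
): CoercionResult {
    const unchanged: CoercionResult = { value, coerced: false };
    if (options.strictTypes) {
        return unchanged;
    }

    let next: unknown = value;
    if (expected === 'number' && typeof value === 'string') {
        const trimmed = value.trim();
        const parsed = trimmed === '' ? NaN : Number(trimmed);
        if (Number.isFinite(parsed)) next = parsed;
    } else if (expected === 'boolean' && typeof value === 'string') {
        const lower = value.trim().toLowerCase();
        if (lower === 'true' || lower === '1' || lower === 'yes') next = true;
        else if (lower === 'false' || lower === '0' || lower === 'no') next = false;
    } else if (
        expected === 'string' &&
        (typeof value === 'number' || typeof value === 'boolean')
    ) {
        next = String(value);
    }

    // A successful coercion always changes the type, so identity means no change.
    if (next === value) {
        return unchanged;
    }
    return {
        value: next,
        coerced: true,
        note: `Coerced '${field}' from ${typeof value} to ${expected}`,
    };
}
```

The input object is never modified, the `coerced` flag gives confidence scoring something to key on, and `note` can be logged or dropped as needed.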

## Implementation
1. Add coercion logic to `validateFieldValue()` in `src/core/validator.ts`
2. Add tests for each coercion case
3. Update documentation to explain coercion behavior
4. Consider adding `strictTypes` schema option to disable coercion

## Priority
**Medium** - Would improve model compatibility and reduce false negatives, but current strict validation is also valuable

## Related
- Seen in model comparison benchmarks (mistral:7b returning string numbers)
- Related to #26 (additional data types)

