Description
Summary
hash_tokenizer_config() in openverifiablellm/tokenizer.py is hardcoded to look for BPE artifacts (vocab.json and merges.txt). Calling it after training a SentencePiece tokenizer immediately crashes with FileNotFoundError, because SentencePiece produces spm.model and spm.vocab instead.
The function claims to hash the tokenizer configuration, but it silently works for only one of the two supported tokenizer types.
Root Cause
In openverifiablellm/tokenizer.py, hash_tokenizer_config() is hardcoded for BPE artifacts only.
SentencePiece training produces completely different artifacts:
- spm.model ← binary model file
- spm.vocab ← human-readable vocab
Neither vocab.json nor merges.txt is ever produced by SentencePiece, so the function always crashes when called after SentencePiece training.
Proposed Fix
openverifiablellm/tokenizer.py
Detect tokenizer type from artifacts present on disk
tests/test_tokenizer.py
Add missing tests
Files to Change
- openverifiablellm/tokenizer.py — fix hash_tokenizer_config()
- tests/test_tokenizer.py — add missing test cases
Related
- PR #17 (feat: deterministic tokenizer training and config hashing), which introduced hash_tokenizer_config() with BPE-only support
- CodeRabbit review on PR #17 flagged the missing test_hash_changes_when_merges_change test
- Follows from: [FEATURE] Complete SentencePiece tokenizer implementation
- Follows from: [FEATURE] Complete BPETokenizer and BaseTokenizer contract
Impact
Critical - Application is unusable
Code of Conduct
- I have joined the Discord server and will post updates there
- I have searched existing issues to avoid duplicates