Fix: Added check the type of tokenizer.json's model_merges #987
N-E-W-T-O-N wants to merge 9 commits into microsoft:main from
| Mistral | GPT-2 |
Awesome, thanks for this! Could you add a unit test here as well to ensure the expected functionality: https://github.com/microsoft/onnxruntime-extensions/tree/main/test?
/azp run onnxruntime-extensions.CI
Azure Pipelines successfully started running 1 pipeline(s).
Using phi-4 as an example
The following functions are only called when
sayanshaw24
left a comment
Please use test/data/phi-4-base or test/data/phi-4-mini-reasoning; we'd like to minimize the tokenizer files checked into the repo. (Also, FYI, onnxruntime-extensions only needs the tokenizer.json and tokenizer_config.json files to load tokenizers.)
Commenter does not have sufficient privileges for PR 987 in repo microsoft/onnxruntime-extensions
Hi @sayanshaw24
/azp run onnxruntime-extensions.CI
Azure Pipelines successfully started running 1 pipeline(s).
N-E-W-T-O-N
left a comment
Merged the latest code. Please rerun the pipeline.
/azp where
Azure DevOps orgs getting events for this repository:
Modern tokenizers store the merges value as a list of lists of strings, but models like GPT-2 or gemma-2 store it as a list of strings.
Ex: Mistral
This fix adds a check on the type of the merges entries.
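For illustration only, the two layouts described above can be handled with a type check along these lines. This is a minimal Python sketch, not the code in this PR: the helper name `normalize_merges` is hypothetical, and it assumes that in the list-of-strings form each entry holds two tokens separated by the first space.

```python
import json


def normalize_merges(merges):
    """Return merges as (left, right) tuples for either storage layout.

    Hypothetical helper for illustration; it is not part of
    onnxruntime-extensions.
    """
    pairs = []
    for entry in merges:
        if isinstance(entry, str):
            # List-of-strings layout: "left right" in a single string.
            # Assumption: the first space separates the two tokens.
            left, sep, right = entry.partition(" ")
            if not sep or not left or not right:
                raise ValueError(f"malformed merge string: {entry!r}")
        elif (
            isinstance(entry, (list, tuple))
            and len(entry) == 2
            and all(isinstance(tok, str) for tok in entry)
        ):
            # List-of-lists layout: ["left", "right"].
            left, right = entry
        else:
            raise TypeError(f"unsupported merge entry: {entry!r}")
        pairs.append((left, right))
    return pairs


# Two tiny stand-ins for the "model" section of a tokenizer.json file.
as_strings = json.loads('{"model": {"merges": ["a b", "ab c"]}}')
as_lists = json.loads('{"model": {"merges": [["a", "b"], ["ab", "c"]]}}')

# Both layouts normalize to the same list of pairs.
same = normalize_merges(as_strings["model"]["merges"]) == normalize_merges(
    as_lists["model"]["merges"]
)
```

Checking the type per entry, rather than once for the whole list, also rejects a file that mixes the two layouts or contains a non-string token.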