Issue
Model Config and tokenizer config mismatch
In the HF model repo's config.json, llm_config section:
https://huggingface.co/inclusionAI/Ming-flash-omni-2.0/blob/main/config.json#L96-L99
"image_patch_token": 157157,
"video_patch_token": 157175,
"image_start_token": 157158,
"video_start_token": 157159,
The video_start_token is 157159.
However, in the tokenizer_config.json and tokenizer.json files, that id points to:
"157159": {
  "content": "</image>",
  "lstrip": false,
  "normalized": false,
  "rstrip": false,
  "single_word": false,
  "special": true
},
which seems to be the image end token id.
In the tokenizer config file, the video start token has id 157160:
"157160": {
  "content": "<video>",
  "lstrip": false,
  "normalized": false,
  "rstrip": false,
  "single_word": false,
  "special": true
},
Should we update the video_start_token to 157160 in the HF repo's config.json?
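For anyone who wants to reproduce the check, here is a minimal sketch of how the mismatch could be detected. It only uses the values quoted in this issue; the names llm_config, added_tokens, expected and find_mismatches are hypothetical and are not part of the Ming codebase. The assumption that video_start_token should resolve to "<video>" is mine, based on the field name.

```python
# Token ids copied from the llm_config section of config.json quoted in this issue.
llm_config = {
    "image_patch_token": 157157,
    "video_patch_token": 157175,
    "image_start_token": 157158,
    "video_start_token": 157159,
}

# The two tokenizer entries quoted in this issue (only "content" is needed here).
added_tokens = {
    "157159": {"content": "</image>"},
    "157160": {"content": "<video>"},
}

# Assumption: the token string each config field is meant to resolve to.
expected = {"video_start_token": "<video>"}


def find_mismatches(config_ids, tokens, expected_content):
    """Return (field, id, actual, expected) for every field whose id
    does not resolve to the expected token string."""
    mismatches = []
    for field, want in expected_content.items():
        token_id = config_ids[field]
        entry = tokens.get(str(token_id))
        actual = entry["content"] if entry else None
        if actual != want:
            mismatches.append((field, token_id, actual, want))
    return mismatches


mismatches = find_mismatches(llm_config, added_tokens, expected)
for field, token_id, actual, want in mismatches:
    print(f"{field}: id {token_id} resolves to {actual!r}, expected {want!r}")

# With the proposed fix applied, the check should come back clean.
fixed_config = dict(llm_config, video_start_token=157160)
fixed_mismatches = find_mismatches(fixed_config, added_tokens, expected)
```

To run this against the real files, the two dictionaries could be loaded with json.load from config.json and tokenizer_config.json. I believe the id-to-token entries sit under an added_tokens_decoder key in tokenizer_config.json, but please treat that as an assumption and confirm it against the file.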
The tokenizer entries quoted above are from Ming/tokenizer_config.json at commit 2a0c02a: lines 2149 to 2156 (id 157159) and lines 2157 to 2164 (id 157160).