WIP: Training TASTE w/o vq using llama tokenizer (text) #4
base: main
make_v_proj_identity: bool = False,
is_word_level: bool = False,
skip_prefix_idx: Optional[int] = None,
vocab_size: int = None,
Add a new argument, `new_vocab_size=None`, and only do the new work when it is set, i.e. under `if new_vocab_size is not None:`.
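A minimal sketch of the opt-in pattern suggested in this comment, assuming the new argument only changes the size of a text embedding table. `build_text_embedding` and the list-based table are hypothetical stand-ins for the real module, which the diff does not show. The point is that callers who pass nothing keep the old behavior. Because the default is `None`, the sketch annotates the argument as `Optional[int]` rather than `int`.

```python
from typing import List, Optional


def build_text_embedding(
    vocab_size: int,
    hidden_dim: int,
    new_vocab_size: Optional[int] = None,
) -> List[List[float]]:
    """Hypothetical stand-in for the module under review.

    Returns a zero-initialised embedding table with one row per token id.
    """
    num_rows = vocab_size
    # Opt-in path from the review: behavior only changes when the caller
    # passes new_vocab_size explicitly; the default (None) keeps the old path.
    if new_vocab_size is not None:
        if new_vocab_size <= 0:
            raise ValueError(f"new_vocab_size must be positive, got {new_vocab_size}")
        num_rows = new_vocab_size  # e.g. sized for a swapped-in text tokenizer
    return [[0.0] * hidden_dim for _ in range(num_rows)]


# Default call: unchanged behavior.
old_table = build_text_embedding(vocab_size=8, hidden_dim=4)
# Opt-in call: table sized for a different tokenizer.
new_table = build_text_embedding(vocab_size=8, hidden_dim=4, new_vocab_size=16)
```

Keeping the existing argument untouched and adding a separate `None`-defaulted one means existing configs and call sites need no changes.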
postfix_token_to_wrap = [tokenizer.eos_token_id] if add_eos else []
_skip_prefix_idx = len(prefix_token_to_wrap)
logging.info(f"Tokenizer is from transformers `WhisperTokenizerFast` of transformers. Decoder prefix ids: {forced_decoder_ids}.")
if whisper_tokenizer_name_or_fpath.endswith("Llama-3.2-1B"):
Anything that isn't Whisper should take this code path, not just names ending in `Llama-3.2-1B`.
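A sketch of branching on "is this Whisper?" instead of on one specific model name. `is_whisper_tokenizer`, `build_wrap_tokens`, and the substring heuristic are all hypothetical; a real check might instead inspect the tokenizer class (e.g. `WhisperTokenizerFast`). The diff does not show how the Whisper prefix ids are built, so the sketch simply takes them as an input.

```python
from typing import List, Optional, Tuple


def is_whisper_tokenizer(name_or_fpath: str) -> bool:
    # Hypothetical heuristic: any name or path mentioning "whisper" is Whisper.
    return "whisper" in name_or_fpath.lower()


def build_wrap_tokens(
    name_or_fpath: str,
    bos_token_id: Optional[int],
    eos_token_id: Optional[int],
    add_bos: bool,
    add_eos: bool,
    whisper_prefix_ids: List[int],
) -> Tuple[List[int], List[int], int]:
    """Return (prefix ids, postfix ids, skip_prefix_idx) for one tokenizer."""
    if is_whisper_tokenizer(name_or_fpath):
        # Whisper branch: prefix ids are assumed to be prepared by the caller,
        # e.g. from the forced decoder ids.
        prefix = list(whisper_prefix_ids)
    else:
        # Every other tokenizer (Llama-3.2-1B or anything else) lands here.
        prefix = [bos_token_id] if add_bos else []
    postfix = [eos_token_id] if add_eos else []
    return prefix, postfix, len(prefix)
```

With this shape, a hypothetical path such as `/ckpt/Llama-3.2-1B-Instruct` still takes the BOS/EOS branch, whereas an `endswith("Llama-3.2-1B")` check would not match it.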
prefix_token_to_wrap = [tokenizer.bos_token_id] if add_bos else []
postfix_token_to_wrap = [tokenizer.eos_token_id] if add_eos else []
_skip_prefix_idx = len(prefix_token_to_wrap)
logging.info(f"Using Llama tokenizer from {whisper_tokenizer_name_or_fpath}")
Remember to fix this at the same time.
$RTSLM_WORK_DIR/CosyVoice/cosyvoice/bin/train.py \
--train_engine $train_engine \
--config $conf_fpath \
--train_data ./data/train.data.list \
Does this also work with the LibriTTS data format?
Training TASTE w/o VQ, using the Llama tokenizer (Llama-3.2-1B)
(`taste_no_vq_llama.yaml`)