Predictions are all garbled text #3

@F-JH

Description

@F-JH

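This is probably a tokenizer problem. Currently I'm using the GPT tokenizer directly: I found a pre-trained model called "chat-DialoGPT-small-zh" on Hugging Face, cloned it, and am using its tokenizer as-is. I'll keep training for now. If the output is still garbled later on, I'll consider switching to a different tokenizer (jieba, maybe?), although that would mean re-preprocessing the data. Also, I won't upload the first version of the preprocessed data yet; I'll add it once training produces acceptable results.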
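One way a tokenizer can produce garbled Chinese, sketched here under an assumption: if the GPT tokenizer in question is a byte-level BPE like GPT-2's, each Chinese character is spread over several UTF-8 bytes, and a model that emits an incomplete byte sequence decodes to the replacement character U+FFFD. The snippet below only uses the Python standard library to show that effect; `decode_bytes` is a hypothetical helper for illustration, not part of this repo or of any tokenizer library.

```python
def decode_bytes(byte_ids):
    """Decode a list of byte values as UTF-8, replacing invalid sequences with U+FFFD."""
    return bytes(byte_ids).decode("utf-8", errors="replace")


text = "你好"
byte_ids = list(text.encode("utf-8"))  # each of these characters takes 3 bytes

# Decoding the complete byte sequence recovers the original text.
full = decode_bytes(byte_ids)

# Assumption for illustration: generation stopped one byte into the second
# character, so the trailing byte cannot be decoded on its own.
broken = decode_bytes(byte_ids[:4])

print(len(byte_ids), repr(full), repr(broken))
```

If the garbled predictions look like this (valid characters mixed with replacement characters), the cause may be incomplete byte sequences rather than the training data, and a tokenizer with a Chinese vocabulary would avoid the problem. If they look different, this sketch does not apply.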
