
How many examples do the training, validation, and test sets of your processed DeepCom dataset contain? #3


Open
walt676 opened this issue Nov 4, 2021 · 2 comments

Comments

@walt676

walt676 commented Nov 4, 2021

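Thank you for sharing your code!
I noticed that the DeepCom test set has 20,000 examples, but your preprocessed test set has 13,238. Were some examples filtered out when the dataset was generated?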

@walt676 walt676 closed this as completed Nov 4, 2021
@walt676 walt676 reopened this Nov 4, 2021
@ZhichaoOuyang
Collaborator


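Please see the paper: it explains that test examples duplicating examples in the training set were filtered out, and that examples for which small ASTs could not be generated were filtered out as well.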
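For readers reproducing the split, the two filters described above can be sketched roughly as follows. This is a minimal illustration, not the BASTS preprocessing code: it assumes each example is a `(code, comment)` pair, treats "duplicate" as an exact match after whitespace normalization (the paper's actual criterion may differ), and takes a hypothetical `can_build_small_ast` predicate as a stand-in for the real small-AST construction step.

```python
from typing import Callable, Iterable, List, Tuple

# Assumed example layout: (source code, reference comment).
Example = Tuple[str, str]


def normalize(code: str) -> str:
    """Collapse all runs of whitespace so formatting-only differences
    do not hide duplicates (an assumed notion of 'duplicate')."""
    return " ".join(code.split())


def filter_test_set(
    train: Iterable[Example],
    test: Iterable[Example],
    can_build_small_ast: Callable[[str], bool],  # hypothetical predicate
) -> List[Example]:
    """Keep only test examples that (1) do not duplicate a training
    example and (2) pass the small-AST check."""
    seen = {normalize(code) for code, _ in train}
    kept: List[Example] = []
    for code, comment in test:
        if normalize(code) in seen:
            continue  # overlaps with the training set
        if not can_build_small_ast(code):
            continue  # no small AST could be generated
        kept.append((code, comment))
    return kept


if __name__ == "__main__":
    train = [("int add(int a, int b) { return a + b; }", "adds two ints")]
    test = [
        ("int add(int a,  int b) {\n  return a + b;\n}", "adds two ints"),
        ("void run() { }", "runs"),
        ("broken(", "unparseable"),
    ]

    # Toy stand-in for a real AST builder: accept only code with
    # at least one brace pair and balanced braces.
    def toy_ast_check(code: str) -> bool:
        return "{" in code and code.count("{") == code.count("}")

    print(filter_test_set(train, test, toy_ast_check))
```

Deduplicating against a set of normalized training keys keeps the check linear in the size of both splits; a real pipeline would replace `toy_ast_check` with whatever Java parser and small-AST splitting the project actually uses.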

@walt676
Author

walt676 commented Nov 11, 2021

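@ZhichaoOuyang Thanks for the reply! I just realized that I apparently did not use the contents of 'Tree-LSTM_pretrain' for pretraining; instead, I ran preprocessing and training directly in the 'BASTS' subfolder. Are the pretrained files already included in the BASTS subfolder, or did doing it this way reduce performance?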


2 participants