Finetuning process #2

@jeffe107

Hello,

Thank you for your great work. I am attempting to finetune ProteinBERT for another task, and I am trying to understand some of the strategies you followed.

First, why did you implement a separate tokenization strategy? I see that you somehow included the mechanism labels during tokenization, but if these were the "target labels", why not simply provide them as the "second column" of the input file? (I am following the tutorial provided by the ProteinBERT developers, which is why I refer to the labels as the second column.)

Related to the previous question, I see that the example input file has many columns, which differs from the native way of finetuning ProteinBERT, where only the sequence and the label can be provided as input. So I would like to know how you handle the labels, and whether it is possible to use multiple features at the same time when finetuning the model.

Following on from this, I see that finetuning was performed six times. Does this mean that ProteinBERT was finetuned six times, once for each feature?
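To check that I am reading the setup correctly, here is a minimal sketch (plain Python, not your code) of what I imagine "finetuned six times" would mean: the multi-column file is split into one two-column (sequence, label) dataset per feature, and each of those datasets gets its own finetuning run. The column names `seq`, `feature_a`, etc. and the helper `split_per_feature` are placeholders I made up, not the headers or functions used in your repository.

```python
import csv
import io

# Hypothetical multi-column layout: one sequence column plus several label columns.
# Column names are placeholders, not the repository's actual headers.
EXAMPLE_CSV = """seq,feature_a,feature_b,feature_c
MKTAYIAK,1,0,0
MSLLTEVE,0,1,1
"""


def split_per_feature(csv_text, seq_col="seq"):
    """Turn one multi-label table into one (seq, label) dataset per label column,
    i.e. the two-column format used in the ProteinBERT finetuning tutorial."""
    reader = csv.DictReader(io.StringIO(csv_text))
    label_cols = [c for c in reader.fieldnames if c != seq_col]
    rows = list(reader)
    return {
        col: [(row[seq_col], int(row[col])) for row in rows]
        for col in label_cols
    }


datasets = split_per_feature(EXAMPLE_CSV)
for feature, pairs in datasets.items():
    # Under this reading, each feature would get its own finetuning run.
    print(feature, pairs)
```

If this is roughly right, then six features would mean six separately finetuned models, which is what my next question about combining the outputs depends on.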

Next, for prediction, how is the output of the model(s) handled? (I will understand this better once I know the answer to the previous question.) I would expect the output to be six probability vectors, one for each feature included; how is this information combined into the final prediction?

Finally, for the final epoch (third stage), you mentioned that you used sequences longer than 1024 aa. Does this mean that only shorter sequences were used in the previous stages? My understanding is that ProteinBERT randomly splits sequences longer than 1024, so how did you handle these longer sequences?
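To be concrete about the two readings I have in mind, here is a small sketch. `random_crop` and `tile` are hypothetical helpers I wrote for illustration, not ProteinBERT functions, and I am assuming that "split randomly" means keeping one random contiguous window. `random_crop` keeps a single random 1024-residue window, while `tile` splits the sequence into consecutive chunks so that no residue is discarded.

```python
import random


def random_crop(seq, max_len=1024, rng=random):
    """Return seq unchanged if it fits; otherwise one random contiguous window of max_len residues."""
    if len(seq) <= max_len:
        return seq
    # randint bounds are inclusive, so every valid start position can be chosen.
    start = rng.randint(0, len(seq) - max_len)
    return seq[start:start + max_len]


def tile(seq, max_len=1024):
    """Alternative: cover the whole sequence with consecutive, non-overlapping chunks."""
    return [seq[i:i + max_len] for i in range(0, len(seq), max_len)]


long_seq = "M" + "A" * 2999  # toy 3000-residue sequence
crop = random_crop(long_seq, 1024, random.Random(0))
chunks = tile(long_seq)
print(len(crop), [len(c) for c in chunks])
```

With cropping, most of a 3000-residue protein is never seen in a given epoch; with tiling, every residue is covered, but each chunk loses the context of its neighbours. That trade-off is why I am curious which approach you used.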

I appreciate your help and any insight about these topics.

Best regards,

Jeferyd
