Closed
Description
Until now, we have only used our framework for ChEBI, but in principle it should also be applicable to other datasets and prediction tasks. One such task is the prediction of protein functions, as specified by the Gene Ontology (GO), from protein data in UniProtKB. For orientation, we can use the DeepGO paper, which proposes a solution for this exact task. The goal is to apply our model to the GO / UniProtKB datasets and compare the results to those of DeepGO.
Tasks
- Minimal dataset implementation: Build a dataset class that extracts proteins and their labels from UniProtKB / GO and processes them into a dataset that can be used to train Electra (one-hot encoding of amino acid trigrams)
- Model training and evaluation: Train the model and evaluate it with the same metrics as DeepGO so that the two models can be compared
- Additional features and fine-tuning: Pretraining with additional unlabeled protein data, trained input embeddings, hyperparameters
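
To make the trigram encoding from the first task concrete, here is a minimal sketch of how a protein sequence could be turned into trigram token indices. It assumes overlapping trigrams over the 20 standard amino acids and reserves index 0 for trigrams containing any other character; these choices, as well as the names `AMINO_ACIDS`, `TRIGRAM_TO_INDEX` and `encode_trigrams`, are illustrative assumptions and not part of the existing framework. Each index identifies one position of a one-hot vector, so the indices could be fed to an embedding layer directly.

```python
from itertools import product

# Assumption: only the 20 standard amino acids form known trigrams.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Index 0 is reserved for trigrams that contain a non-standard character.
UNKNOWN_INDEX = 0

# Every possible trigram (20^3 = 8000) gets an index starting at 1.
TRIGRAM_TO_INDEX = {
    "".join(chars): i + 1
    for i, chars in enumerate(product(AMINO_ACIDS, repeat=3))
}


def encode_trigrams(sequence):
    """Map a protein sequence to the indices of its overlapping trigrams."""
    sequence = sequence.upper()
    return [
        TRIGRAM_TO_INDEX.get(sequence[i:i + 3], UNKNOWN_INDEX)
        for i in range(len(sequence) - 2)
    ]
```

A sequence of length n yields n - 2 tokens, and sequences shorter than three residues yield an empty list. Whether trigrams should overlap, and how rare or ambiguous residues are handled, would need to be checked against the DeepGO setup.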