Protein function prediction with GO #36

@sfluegel05

Until now, we have only used our framework for ChEBI, but in principle it should also be applicable to other datasets and prediction tasks. One such task is predicting protein functions, as defined by the Gene Ontology (GO), from protein data in UniProtKB. For orientation, we can use the DeepGO paper, which proposes a solution for this exact task. The goal is to apply our model to the GO / UniProtKB datasets and compare the results with those of DeepGO.

Tasks

  1. Minimal dataset implementation: Build a dataset class that extracts proteins and their labels from UniProtKB / GO and processes them into a dataset that can be used to train Electra (one-hot encoding of amino acid trigrams)
  2. Model training and evaluation: Evaluate with the same metrics as DeepGO so that the models can be compared directly
  3. Additional features and fine-tuning: Pretraining with additional unlabeled protein data, trained input embeddings, hyperparameters
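
The trigram encoding mentioned in task 1 could look roughly like the sketch below. It is only an illustration, not the planned implementation: the names `encode_trigrams`, `one_hot`, `TRIGRAM_TO_INDEX` and `UNKNOWN_INDEX` are hypothetical, and it assumes overlapping trigrams over the 20 standard amino acids, with index 0 reserved for trigrams that contain any other character. Whether the final dataset class uses overlapping or non-overlapping trigrams is left open here.

```python
from itertools import product

# Assumption: only the 20 standard amino acids form valid trigrams.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: index 0 is reserved for trigrams with non-standard characters.
UNKNOWN_INDEX = 0

# Map each of the 20^3 = 8000 possible trigrams to an index starting at 1.
TRIGRAM_TO_INDEX = {
    "".join(trigram): index + 1
    for index, trigram in enumerate(product(AMINO_ACIDS, repeat=3))
}


def encode_trigrams(sequence):
    """Turn a protein sequence into a list of overlapping trigram indices."""
    return [
        TRIGRAM_TO_INDEX.get(sequence[i:i + 3], UNKNOWN_INDEX)
        for i in range(len(sequence) - 2)
    ]


def one_hot(index, size=len(TRIGRAM_TO_INDEX) + 1):
    """Expand a single trigram index into a one-hot vector of length `size`."""
    vector = [0] * size
    vector[index] = 1
    return vector


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from UniProtKB.
    indices = encode_trigrams("MKTAY")
    print(indices)
    print(len(one_hot(indices[0])))
```

In practice the index list is usually what gets passed to the model, since an embedding lookup on trigram indices is equivalent to multiplying a one-hot vector with the embedding matrix.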
