ESM-2 is a protein language model (pLM) trained with unsupervised masked language modelling on 250 million protein sequences by researchers at Facebook AI Research (FAIR).
It is available in several sizes, ranging from 8 million to 15 billion parameters. The smaller models are suitable for a variety of sequence- and token-classification tasks. The FAIR team also adapted the 3 billion parameter version into the ESMFold protein structure prediction algorithm.
In this repository, I have used the ESM-2 model for the following tasks:
- Masked Language Modeling: the mutation_effect_enzyme notebook calculates the mutation score introduced in the paper "Language models enable zero-shot prediction of the effects of mutations on protein function". It ranks the functional impact of mutated sequences relative to the original (wild-type) sequence (see the first sketch after this list).
- Sequence Classification
  - Training a small model (model_name = "facebook/esm2_t6_8M_UR50D"): the sequence_classification_small notebook predicts whether a protein is (1) an enzyme, (2) a receptor protein, or (3) a structural protein. The dataset was taken from Amelie-Schreiber. This model can be trained on Google Colab (see the second sketch after this list).
  - Training a larger model (model_name = "facebook/esm2_t33_650M_UR50D"): running this model requires more compute than Google Colab offers, so I ran the sequence_classification_large notebook on AWS SageMaker; the GPU instances used are mentioned in the notebook (see the third sketch after this list). The specific problem addressed here is subcellular localization: given a protein sequence, can we build a model that predicts whether the protein lives on the outside (cell membrane) or the inside of a cell? This information helps us understand a protein's function and whether it would make a good drug target. The notebook is adapted from aws-healthcare-lifescience-ai-ml-sample-notebooks.
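Below is a minimal sketch of the masked-marginal scoring the paper describes, assuming the Hugging Face transformers API and the small 8M checkpoint; the sequence and the mutation in the example are toy placeholders, and the actual implementation lives in the mutation_effect_enzyme notebook.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "facebook/esm2_t6_8M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

def mutation_score(wt_seq: str, pos: int, wt_aa: str, mut_aa: str) -> float:
    """Masked-marginal score: log p(mutant aa) - log p(wild-type aa)
    at position `pos` (0-indexed), with that position masked out."""
    assert wt_seq[pos] == wt_aa, "wild-type residue mismatch"
    inputs = tokenizer(wt_seq, return_tensors="pt")
    tok_pos = pos + 1  # the ESM tokenizer prepends a CLS token
    inputs["input_ids"][0, tok_pos] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(**inputs).logits
    log_probs = torch.log_softmax(logits[0, tok_pos], dim=-1)
    wt_id = tokenizer.convert_tokens_to_ids(wt_aa)
    mut_id = tokenizer.convert_tokens_to_ids(mut_aa)
    # Positive score: the model prefers the mutant residue at this position.
    return (log_probs[mut_id] - log_probs[wt_id]).item()

# Toy example: score the substitution A4G in a made-up sequence.
print(mutation_score("MKTAYIAKQR", 3, "A", "G"))
```

For a multi-point mutant, the per-position scores are summed, which is how the paper ranks variants against the wild-type sequence.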
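The small-model classification task follows the standard transformers fine-tuning recipe. The sketch below shows its general shape, assuming AutoModelForSequenceClassification with three labels; the toy sequences and labels stand in for the Amelie-Schreiber dataset used in the notebook.

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = "facebook/esm2_t6_8M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# Toy stand-in for the real dataset: sequence -> class id
# (0 = enzyme, 1 = receptor protein, 2 = structural protein).
data = Dataset.from_dict({
    "sequence": ["MKTAYIAKQR", "GAVLIPFWMS", "MSTNPKPQRK"],
    "label": [0, 1, 2],
})

def tokenize(batch):
    return tokenizer(batch["sequence"], truncation=True, max_length=1024)

data = data.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="esm2_seq_cls",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=2e-5,
)
trainer = Trainer(model=model, args=args, train_dataset=data, tokenizer=tokenizer)
trainer.train()
```

Passing the tokenizer to the Trainer gives dynamic padding per batch, which matters for protein sequences of very uneven lengths.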
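For the larger 650M model, the same fine-tuning is submitted as a SageMaker training job rather than run in-process. Here is a rough sketch of launching such a job with the SageMaker Hugging Face estimator; the entry-point script, instance type, framework versions, and S3 path are illustrative assumptions, not the notebook's actual values.

```python
import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()  # IAM role of the SageMaker session

estimator = HuggingFace(
    entry_point="train.py",         # hypothetical training script
    source_dir="scripts",
    instance_type="ml.g5.2xlarge",  # example GPU instance; see the notebook
    instance_count=1,
    role=role,
    transformers_version="4.28",
    pytorch_version="2.0",
    py_version="py310",
    hyperparameters={
        "model_name": "facebook/esm2_t33_650M_UR50D",
        "epochs": 3,
    },
)

# Train on data previously uploaded to S3 (path is a placeholder).
estimator.fit({"train": "s3://my-bucket/subcellular-localization/train"})
```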