GitHub - abrarrahmanabir/LOCAS

LOCAS: Multilabel RNA Localization with Supervised Contrastive Learning

Traditional approaches to predicting mRNA subcellular localization often fail to address the complexity of multiple compartmentalization, limiting biological insights. While recent multi-label models have shown progress, challenges persist in accurately capturing intricate localization patterns. We introduce LOCAS (Localization with Supervised Contrastive Learning), a novel framework that incorporates an RNA language model to generate initial embeddings and supervised contrastive learning (SCL) to identify distinct RNA clusters based on sequence similarity. LOCAS also uses a multi-label classification head (ML-Decoder) with cross-attention, enabling accurate multi-compartment predictions. Our contributions include: (1) the first integration of RNA language models to create a nuanced embedding space for RNA sequences, (2) an SCL approach that detects overlapping localization patterns with a multi-label similarity threshold, and (3) a multi-label classification head tailored for RNA localization. Comprehensive experiments, including extensive ablation studies and optimized threshold tuning, confirm LOCAS achieves state-of-the-art accuracy across all metrics, setting a new standard in multi-compartment mRNA localization.
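To make the idea of a multi-label similarity threshold concrete, here is a minimal sketch of how positive pairs for a contrastive loss could be selected from label vectors. It is an illustration only: the Jaccard measure, the 0.5 threshold, and the names `jaccard` and `positive_mask` are assumptions, not taken from this repository (see 'multi_sup_con_loss.py' for the actual loss).

```python
from typing import List, Sequence


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard similarity between two binary label vectors (assumed measure)."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def positive_mask(labels: List[Sequence[int]], threshold: float = 0.5) -> List[List[bool]]:
    """mask[i][j] is True when samples i and j (i != j) share enough labels
    to be treated as a positive pair. The threshold value is hypothetical."""
    n = len(labels)
    return [
        [i != j and jaccard(labels[i], labels[j]) >= threshold for j in range(n)]
        for i in range(n)
    ]


# Toy batch with three compartments per sample.
labels = [
    [1, 1, 0],  # sample 0
    [1, 1, 1],  # sample 1: shares 2 of 3 labels with sample 0
    [0, 0, 1],  # sample 2: no overlap with sample 0
]
mask = positive_mask(labels, threshold=0.5)
```

With a mask like this, samples whose localization sets overlap only partially can still be pulled together in the embedding space, which an exact-match rule would miss.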

Model Architecture

Installation

Clone the repository:

git clone https://github.com/abrarrahmanabir/LOCAS.git
cd LOCAS

Install dependencies:

pip install torch torchvision tqdm matplotlib pandas numpy scikit-learn

Dataset Format

The dataset must have a 'Sequence' column containing the RNA sequence. The remaining columns are the localization labels: 'Chromatin', 'Cytoplasm', 'Cytosol', 'Exosome', 'Membrane', 'Nucleolus', 'Nucleoplasm', 'Nucleus', and 'Ribosome'. Each label column is a binary (0/1) indicator; because the task is multi-label, a sequence may have more than one column set to 1.
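As a quick sanity check before training, the format above can be validated with a short script. This is a sketch under stated assumptions: the helper name `validate_rows` is hypothetical, label values are assumed to be written literally as 0 or 1, and the two sample sequences are made up for illustration.

```python
import csv
import io

# Column names as listed in the dataset format description.
LOCATIONS = [
    "Chromatin", "Cytoplasm", "Cytosol", "Exosome", "Membrane",
    "Nucleolus", "Nucleoplasm", "Nucleus", "Ribosome",
]


def validate_rows(handle):
    """Read a CSV file object and return (sequence, labels) pairs.

    Raises ValueError if a required column is missing, a sequence is
    empty, or a label is not 0/1 (an assumed encoding).
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in ["Sequence"] + LOCATIONS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        seq = row["Sequence"].strip()
        if not seq:
            raise ValueError(f"line {line_no}: empty sequence")
        labels = []
        for name in LOCATIONS:
            value = row[name].strip()
            if value not in ("0", "1"):
                raise ValueError(f"line {line_no}: {name}={value!r} is not 0/1")
            labels.append(int(value))
        rows.append((seq, labels))
    return rows


# Made-up example rows; a real run would pass open("dataset.csv", newline="").
sample = io.StringIO(
    "Sequence," + ",".join(LOCATIONS) + "\n"
    "AUGGCUAGC,0,1,1,0,0,0,0,0,0\n"
    "GGCAUCGAU,0,0,0,0,0,0,1,1,1\n"
)
rows = validate_rows(sample)
```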

How to Train

We provide an example dataset, 'dataset.csv', and its corresponding language model embeddings in 'dataset.npy'. To start training, run the following command:

python train_final.py

How to Run Inference

Run the following command to perform inference on the test set of the RNALocate v2.0 dataset. The corresponding embeddings are saved in 'test_embeddings.npy'.

python inference_final.py

RiNALMo Embedding Generation

Run the following command to generate language model embeddings for a custom dataset. Replace 'dataset.csv' with your own file in the required format; the script produces a .npy file containing the embeddings, which you can use for training or inference. An example 'dataset.csv' and 'dataset.npy' are provided.

python emb_gen_final.py

