chenhcs/DIFFUSE

DIFFUSE

DIFFUSE is a deep learning-based method for predicting isoform functions by integrating isoform sequences, domains, and expression profiles. This document explains how to predict isoform functions with DIFFUSE.

Predicted Functions

  • Predicted functions for all 39,375 isoforms on 4,184 GO terms are saved in a text file. Redundancy in the GO predictions has been removed: among the functions predicted for an isoform, every GO term that has a child GO term assigned to the same isoform is discarded.
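
The redundancy rule above can be illustrated with a minimal Python sketch. It is not the code used to produce the released predictions; the function name `remove_redundant_terms`, the `parents_of` mapping (child term to its direct parents), and the placeholder term identifiers are assumptions made for this example.

```python
def remove_redundant_terms(assigned_terms, parents_of):
    """Drop every term that has a direct child in the same assignment set.

    assigned_terms: iterable of term IDs predicted for one isoform.
    parents_of: dict mapping a term ID to its direct parent term IDs
                (hypothetical structure, assumed for this sketch).
    """
    assigned = set(assigned_terms)
    # Collect assigned terms that are a direct parent of another assigned term.
    covered = set()
    for term in assigned:
        for parent in parents_of.get(term, ()):
            if parent in assigned:
                covered.add(parent)
    return sorted(assigned - covered)


# Toy hierarchy with placeholder identifiers (not real GO annotations):
# GO:C is a child of GO:B, and GO:B and GO:D are children of GO:A.
parents_of = {
    "GO:C": ["GO:B"],
    "GO:B": ["GO:A"],
    "GO:D": ["GO:A"],
}

print(remove_redundant_terms(["GO:A", "GO:B", "GO:C"], parents_of))
```

In this toy case only the most specific term, GO:C, is kept, because GO:B and GO:A each have an assigned child.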

Dependencies

Set the Keras backend to TensorFlow by modifying the Keras configuration file.
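
As a sketch of one way to do this, the helper below rewrites the "backend" field of a Keras JSON configuration file and leaves the other fields unchanged. The helper name `set_keras_backend` is hypothetical, and the file location is an assumption: multi-backend Keras normally reads `~/.keras/keras.json`, but check your own installation.

```python
import json
import os


def set_keras_backend(config_path, backend="tensorflow"):
    """Set the "backend" field in a keras.json file, keeping other fields."""
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as handle:
            config = json.load(handle)
    config["backend"] = backend
    # Create the containing directory if it does not exist yet.
    parent = os.path.dirname(config_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(config_path, "w") as handle:
        json.dump(config, handle, indent=4)
    return config


# Example call (assumed default location; uncomment to apply):
# set_keras_backend(os.path.join(os.path.expanduser("~"), ".keras", "keras.json"))
```

Multi-backend Keras also honors the `KERAS_BACKEND` environment variable, which may be simpler for a one-off run.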

Data Preparation

  • Download the complete data from the link and unzip data.zip into the data/ folder.
  • Preprocessing code for domain, sequence, and expression data is provided in the preprocessing directory; you can use it to process your own data.
  • Dataset#2 and Dataset#3, used in the performance comparison section of the paper, can be downloaded from the link.

Get Started

Test pre-trained models

  • Pre-trained models for several GO terms are provided in the saved_models directory.
  • Run the script ./codes/demo.sh to generate predictions for the test data. You can change the GO term in this script to another one with a pre-trained model.
  • Performance in terms of AUC and AUPRC is reported. The predictions are saved in the results directory. In the output file, the first column contains gene IDs, the second contains isoform IDs, and the third contains prediction scores indicating how likely each isoform is to have the GO term.
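
The three-column prediction file described above can be loaded with a few lines of Python. This is a minimal sketch that assumes tab-separated columns and no header line; neither is confirmed here, so adjust the `delimiter` argument to match the actual files. The function names and the sample rows are made up for illustration.

```python
import csv
import io


def read_predictions(handle, delimiter="\t"):
    """Parse (gene ID, isoform ID, score) rows from an open text file."""
    rows = []
    for fields in csv.reader(handle, delimiter=delimiter):
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        rows.append((fields[0], fields[1], float(fields[2])))
    return rows


def top_isoforms(rows, n=2):
    """Return the n rows with the highest prediction scores."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


# Made-up sample standing in for a file from the results directory.
sample = io.StringIO(
    "geneA\tisoA1\t0.91\n"
    "geneA\tisoA2\t0.12\n"
    "geneB\tisoB1\t0.57\n"
)
for gene_id, isoform_id, score in top_isoforms(read_predictions(sample)):
    print(gene_id, isoform_id, score)
```

For a real file, pass `open(path)` in place of the `io.StringIO` sample.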

Train new models

  • Run the script ./src/train_new_model.sh to train new models. You can change the GO term index in the script to train models for other GO terms in the GO term lists.

ParaDIFFUSE

  • You can train DIFFUSE models in parallel on multiple GPUs. Run ./para_src/para_train.sh to train models for multiple GO terms in parallel. To record the training time, run the script ./para_src/time.sh.
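
To illustrate the idea of spreading GO terms over several GPUs, the sketch below assigns GO term indices to GPUs round-robin. It does not describe how para_train.sh schedules its work; the function name `assign_terms_to_gpus` and the round-robin policy are assumptions chosen for this example.

```python
def assign_terms_to_gpus(term_indices, num_gpus):
    """Distribute GO term indices over GPU IDs 0..num_gpus-1 round-robin."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    assignment = {gpu: [] for gpu in range(num_gpus)}
    for position, term in enumerate(term_indices):
        assignment[position % num_gpus].append(term)
    return assignment


# Five GO term indices shared between two GPUs.
print(assign_terms_to_gpus(range(5), 2))
```

Each group could then be trained in its own process restricted to one device, for example by setting the `CUDA_VISIBLE_DEVICES` environment variable for that process.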

Citation

If you find DIFFUSE useful for your research, please consider citing the following paper:

@article{chen2019diffuse,
  title={DIFFUSE: predicting isoform functions from sequences and expression profiles via deep learning},
  author={Chen, Hao and Shaw, Dipan and Zeng, Jianyang and Bu, Dongbo and Jiang, Tao},
  journal={Bioinformatics},
  volume={35},
  number={14},
  pages={i284--i294},
  year={2019},
  publisher={Oxford University Press}
}

About

Deep learning based prediction of IsoForm FUnctions from Sequences and Expression
