NextVir

As of 9/30, this repository has been made public for the submission of our manuscript: NextVir: Enabling Classification of Tumor-Causing Viruses with Genomic Foundation Models for publication.

Requirements

Please use conda to install the main requirements. You may need to change the version of pytorch and/or HuggingFace depending on your hardware. All experiments were performed without triton, which may install automatically.

All experiments were performed using AMD Vega 20 GPUs.

The primary dataset for the experiments in the manuscript are available at https://zenodo.org/records/13922061. The 'disjoint' set refers to the context-supported dataset which seperated the reads in the train/test/val based on their location in the reference genome.

Main

main.py is the entry point for training and testing models. In it's default configuration, main.py will train a NextVir-D model on multiclass icav data. The dataset will be available on zenodo shortly, but is a constructed set of 150bp reads and modified labels from 7 known oncoviruses and the human reference genome.

While documentation and reporting is still in progress, please see the full set of command-line arguments can be seen in utils/train_utils.py, in the parse_args() function.

Training was performed using DataParalel. To use a single gpu, pass --device=0 to main, for multiple, use --device=0,1,2,3.

To train with the dataset provided, place the three '150bp_multiviral_XXX.fa' in the the /data directory and run "main.py" with the correct "--device" argument for your hardware. For the binary detection problem, pass --num_classes=1 as well.

Logging is enabled via weights & biases. If you do not have a wandb account, feel free to comment out all references to this package.

Trained versions of various models in the paper will be available soon, along with an inference script for detecting oncoviral DNA with an already trained model.

Acknowledgments

The LoRA implementation is based on Hayden Prarie's CLIP finetuning work here: https://github.com/Hprairie/finetune-clip.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
data		data
figures		figures
utils		utils
.gitignore		.gitignore
README.md		README.md
environment.yml		environment.yml
main.py		main.py
main_removal.py		main_removal.py
mutate_reads.py		mutate_reads.py
test.ipynb		test.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NextVir

Requirements

Main

Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

NextVir

Requirements

Main

Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages