Currently, only vpf-class is implemented, but we have plans to include more tools
in this framework.
vpf-class attempts to classify viruses using Viral Protein Families.
Usage example: Given a .fna file, obtain the proteins of each virus with
prodigal, then perform a hmmsearch against the given hmms (VPFs) file to
obtain a classification.
    stack exec -- vpf-class --data-index ../data/index.yaml -i ../data/test.fna -o test-classified
This will output a directory with a .tsv file for each specified classification
level in the index.yaml file. Using the provided files, one thus obtains:
    test-classified/baltimore.tsv
    test-classified/family.tsv
    test-classified/genus.tsv
    test-classified/host_domain.tsv
    test-classified/host_family.tsv
    test-classified/host_genus.tsv
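After a run, you may want to confirm that every expected table was written. The following is a minimal sketch, assuming the six classification levels listed above and the output directory test-classified; the helper name missing_outputs is hypothetical and not part of vpf-tools.

```shell
# Hypothetical helper (not part of vpf-tools): print the expected .tsv files
# that are missing from an output directory, one per line.
missing_outputs() {
    out_dir=$1
    # Level names taken from the file list above; adjust them to your index.yaml.
    for level in baltimore family genus host_domain host_family host_genus; do
        [ -f "$out_dir/$level.tsv" ] || echo "$out_dir/$level.tsv"
    done
}

# Report on the example run's output directory.
missing=$(missing_outputs test-classified)
if [ -z "$missing" ]; then
    echo "all classification tables present"
else
    echo "missing tables:"
    echo "$missing"
fi
```

If your index.yaml defines different classification levels, change the list in the for loop accordingly.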
Please read to the end to find out where to obtain all the required data files.
Concurrency options can be specified with --workers (number of
parallel workers running prodigal or hmmsearch) and --chunk-size (max
number of genomes for each prodigal/hmmsearch process).
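As an illustration, the usage example can be extended with both concurrency options. This is only a sketch: the values 4 and 50 are arbitrary assumptions, and it assumes each option takes its value as the next argument. The command is printed rather than executed so that it can be inspected first.

```shell
# Assumed values for illustration; tune them to your machine and input size.
workers=4       # parallel workers running prodigal or hmmsearch
chunk_size=50   # max number of genomes for each prodigal/hmmsearch process

# Build the command line as a string so it can be reviewed before running.
cmd="stack exec -- vpf-class --data-index ../data/index.yaml -i ../data/test.fna -o test-classified --workers $workers --chunk-size $chunk_size"

# Dry run: print the command. To execute it, run: eval "$cmd"
echo "$cmd"
```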
Since there are still no release binaries available, you will need to install
stack and compile vpf-tools yourself. The instructions
are the same for both Mac OS and Linux. The tool has not been tested on
Windows.
First, install stack using
    curl -sSL https://get.haskellstack.org/ | sh
Then run
    git clone https://github.com/biocom-uib/vpf-tools
    cd vpf-tools
    stack build
to clone the repository and compile all targets. The first time this can take a
while as stack also needs to install GHC and compile all the dependencies.
Once it has finished, you should be able to run any of the tools from this
directory by prefixing them with stack exec --, for instance,
    stack exec -- vpf-class --help
There is experimental support for OpenMPI. Add --flag vpf-class:+mpi when
building and then run the tool normally as any other program with mpirun.
You can find our classification of VPFs as a
compressed package (including index.yaml)
here.
Alternatively, you can download individual data files here, at the "VPF
classification" tab. The data files that vpf-class requires are in the rows
"Full data" (modelClassesFile) and "UViG Score samples" (scoreSamplesFile).
This VPF classification has been obtained as described in the paper, but the
tool is designed to work with any user-provided classification files.
The most recent hmms file containing the HMMER models of VPFs (vpfsFile in
index.yaml) can be downloaded from
IMG/VR. To use it with the
provided index.yaml, extract final_list.hmms into the data directory,
next to index.yaml.
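A quick way to confirm the layout described above is to check that both files sit in the same directory. This is a minimal sketch, assuming the data directory is ../data as in the usage example; the helper name check_data_dir is hypothetical. It does not check the modelClassesFile or scoreSamplesFile entries, since their file names depend on your index.yaml.

```shell
# Hypothetical helper (not part of vpf-tools): verify that index.yaml and
# final_list.hmms are both present in the given data directory.
check_data_dir() {
    data_dir=$1
    status=0
    for f in index.yaml final_list.hmms; do
        if [ ! -f "$data_dir/$f" ]; then
            echo "missing: $data_dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# ../data is the directory used in the usage example; change it if yours differs.
if check_data_dir ../data; then
    echo "data directory looks complete"
else
    echo "data directory is incomplete"
fi
```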
NOTE: To work around the installation issues listed below and for user convenience, we plan to provide a Dockerfile in future releases.
- The first step (curl -sSL https://get.haskellstack.org/ | sh) requires root access: The default configuration in the Stack installer uses /usr/local/ as the default prefix. Stack can also be installed in $HOME/.local/ following their manual installation method.
- Stack build reports errors either while installing GHC or downloading package indices: If you have any issues during the installation, please check out the Stack documentation to verify that all dependencies are satisfied.
- I have issues with conda: Some users have reported issues with Stack and Conda. Thus, installing it in a Conda-polluted environment is discouraged and unsupported.
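Before building, it can help to check the shell for two of the situations above. The following is a rough sketch, not an official diagnostic: it relies on Conda exporting the CONDA_PREFIX variable while an environment is active, and the helper name conda_active is hypothetical.

```shell
# Hypothetical helper: succeeds when a Conda environment appears to be active.
# Conda exports CONDA_PREFIX on activation.
conda_active() {
    [ -n "${CONDA_PREFIX:-}" ]
}

if conda_active; then
    echo "warning: Conda environment active at $CONDA_PREFIX; consider running 'conda deactivate' before building"
else
    echo "no active Conda environment detected"
fi

# Stack must be on PATH for the build commands in this README to work.
if command -v stack >/dev/null 2>&1; then
    echo "stack found at $(command -v stack)"
else
    echo "stack not found on PATH; install it first"
fi
```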