-
Notifications
You must be signed in to change notification settings - Fork 1
CASSIS - (C)luster (ASS)ignment by (I)slands of (S)ites
License
lu-p-us/cassis
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
Copyright (C) 2015 Leibniz Institute for Natural Product Research and
Infection Biology -- Hans-Knoell-Institute (HKI).
This program comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to redistribute it under certain conditions. See the
file COPYING, which you should have received along with this program,
for details.
###########################################################
# CASSIS - (C)luster (ASS)ignment by (I)slands of (S)ites #
###########################################################
CASSIS is a tool to precisely predict secondary metabolite (SM) gene
clusters around a given anchor (or backbone) gene. Genes encoding SMs
tend to be clustered. Gene clusters are small groups of normally up to
20 genes; tightly co-localized, co-regulated, and participating in the
same metabolic pathway. The predictions are based on transcription
factor binding sites shared by promoter sequences of the putative
cluster genes.
usage: cassis.pl <parameters>
required parameters:
--annotation, -a <file>
Genome annotation file. A simple text file in tabular format with
at least five columns. These are:
gene <string> | contig <string> | start position <int> | stop
position <int> | strand <+ or->.
The column separator must be tab (\t). The annotation file must be
sorted ascending by contig, start and stop. Start positions must
be smaller than stop positions. Contig names have to coincide with
the genome sequence file.
--genome, -g <file>
Genomic sequence file. A multiFASTA file containing the DNA
sequences of all contigs of the species. Contig names have to
coincide with the annotation file.
optional parameters:
--anchor, -b <ID>
Feature ID of the clusters anchor gene. The ID has to coincide
with the annotation file. This gene will be the starting point of
the cluster prediction.
--cluster, -c <name>
Name of the gene cluster to predict. Creates a sub-directory with
the given name to separate predictions for different clusters but
same species (in the same working directory). Will be set to the
cluster's anchor gene ID, if ommitted.
--dir, -d <directory>
Working directory. All directories and files generated by CASSIS
will go in here. Choosing different directories for e.g. different
species might be a good idea.
--fimo, -f [<p-value cut-off>]
Enables the motif search with FIMO. Setting a maximum p-value,
which will restrict the number of binding sites found by FIMO, is
optional. If no p-value cut-off is specied, the default value of
0.00006 will be used. To run the motiv search, the MEME suite must
be installed on your system. See http://meme-
suite.org/doc/download.html
--frequency, -fq <frequency cut-off between 0 and 100>
By default, CASSIS will skip the cluster prediction for a certain
motif, if this motif results in a number of binding sites found in
more than 14 % of all promoters in the genome. By changing this
paramater you may set a different frequency cut-off.
--gap-length, -gl <0-5>
Sets the maximum number of promoters in a row WITHOUT binding
sites, which are allowed inside a cluster prediction. Allowed are
0, 1, 2, 3, 4, and 5. The default value is 2. Only applies if the
cluster prediction has been enabled (see parameter --prediction).
--help, -h
Show the help page, which you are currently reading.
--meme, -m [<e-value cut-off>]
Enables the motif prediction around the anchor gene with MEME.
Setting a maximum e-value for found motifs is optional. MEME will
stop, if it reaches the e-value, regardless if any motif has been
found or not. If no e-value cut-off is specified, the default
value of 1.0e+005 will be used. The cut-off -1 has a special
meaning: This means NO cut-off and MEME will not stop until it
finds at least one motif, regardless of its e-value. To run the
motiv prediction, the MEME suite must be installed on your system.
See http://meme-suite.org/doc/download.html
--mismatches, -mm <allowed mismatches>
Setting a maximum number of allowed mismatches (0-3) per binding
site sequence is optional. The default is 0. See --sitar.
--num-cpus, -n <number>
Set number of CPUs (or CPU cores) to use. The motif prediction
(parameter --meme) and the motif search (parameter --fimo) steps
will then run with multiple forks in parallel. CASSIS uses only
one CPU by default.
--prediction, -p
Enables the cluster prediction, including the motif prediction via
paramter --meme and the motif search via parameter --fimo. By
default it's disabled.
--sitar, -s <file>
Alternative to MEME and FIMO: Enables the motif search with SiTaR.
You have to provide a file in (multi) FASTA format with binding
site sequences of at least one transcription factor.
--verbose, -v
By default, only the most important information is printed. Use
this parameter if you prefer a more verbose output.
runtime (Intel Xeon, running at 2.7 GHz):
Using 2 CPUs (via --num-cpus), predicting the cluster to a given
anchor gene takes about 40 min. Using 4 CPUs, it takes about 22 min.
And using 60 CPUs, about 3 min.
command-line example:
cassis.pl --dir fungi/fumigatus/ --annotation fungi/fumigatus/A_fumigatus_Af293_version_s03-m04-r22_features.csv --genome fungi/fumigatus/A_fumigatus_Af293_version_s03-m04-r22_chromosomes.fasta --anchor Afu6g09660 --cluster Gliotoxin --meme 1.0e+005 --fimo 0.00006 -frequency 14 --prediction --gap-length 2 --num-cpus 2 --verbose
Contact: gianni.panagiotou@leibniz-hki.de, thomas.wolf@leibniz-hki.de
Cite: If you use CASSIS, please cite https://doi.org/10.1093/bioinformatics/btv713
About
CASSIS - (C)luster (ASS)ignment by (I)slands of (S)ites
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published