The parameter file
RaptRanker reads its input parameters from a JSON file. A template file is here.
The basic rules of the JSON format are as follows. Be careful: braces { }, brackets [ ], commas ,, colons : and double quotes "" are required by the JSON format.
{
"key":value,
"key":"text",
"key_as_list":[
{
"key_1":value,
"key_1":"text"
},
{
"key_2":value,
"key_2":"text"
}
]
}
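A quick way to catch a missing comma, brace or quote before running RaptRanker is to parse the file with a JSON parser. The sketch below uses Python's standard json module. The helper name check_param_text is hypothetical and not part of RaptRanker.

```python
import json

def check_param_text(text):
    """Parse parameter-file text; return (params, None) or (None, error message)."""
    try:
        params = json.loads(text)
    except json.JSONDecodeError as err:
        # err.lineno / err.colno point at the place where parsing failed
        return None, f"line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(params, dict):
        return None, "top level must be a JSON object ({ ... })"
    return params, None

good = '{"key": 1, "key_as_list": [{"key_1": "text"}, {"key_2": "text"}]}'
bad = '{"key": 1 "key2": 2}'  # missing comma after 1

print(check_param_text(good)[1])  # None
print(check_param_text(bad)[1])
```

To check a real file, pass it the contents of the parameter file, e.g. `open(path).read()`.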
The file_type parameter sets the input format: FASTA (1) or FASTQ (2). For example, if you input .fastq files, write "file_type":2,.
Note that in both cases the input file must not contain any base other than A, C, G and T/U (e.g. "N" is not allowed). The input FASTA/FASTQ files are expected to be quality-filtered beforehand.
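The following plain-Python sketch (not part of RaptRanker) checks that every sequence uses only A, C, G and T/U before the file is handed to RaptRanker. It assumes the sequences have already been read into a list of strings; FASTA/FASTQ parsing is left out. Treating lowercase letters and mixed T/U within one sequence as invalid are assumptions.

```python
ALLOWED_DNA = set("ACGT")
ALLOWED_RNA = set("ACGU")

def invalid_sequences(seqs):
    """Return (index, sequence) pairs that contain a base outside ACG(T/U)."""
    bad = []
    for i, seq in enumerate(seqs):
        bases = set(seq)
        # each sequence must be pure DNA (ACGT) or pure RNA (ACGU); "N" etc. is rejected
        if not (bases <= ALLOWED_DNA or bases <= ALLOWED_RNA):
            bad.append((i, seq))
    return bad

print(invalid_sequences(["ACGT", "ACGU", "ACNT"]))  # [(2, 'ACNT')]
```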
input_file_nums is the number of input files.
input_file_list holds the information about each input file. round_id, round_name and file_path are required for each input file.
- round_id is the numeric ID of each input file. We recommend using the SELEX round number.
- round_name is the text label of each input file. This text is used in the score names (e.g. "round_name":"foo1R" -> Frequency_foo1R, Enrichment_foo1R, ...).
- file_path is the full path to the input file.
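A minimal consistency check for this section, written as a hypothetical helper that assumes the parsed parameters are a Python dict: it checks that input_file_nums matches the number of entries, that each entry has the three required keys, and that file_path is a full (POSIX-style) path. Rejecting duplicate round_id values is an assumption; this page does not say RaptRanker forbids them. score_names only reproduces the two score names given as examples above.

```python
import os

REQUIRED_KEYS = ("round_id", "round_name", "file_path")

def check_input_files(params):
    """Return a list of problems found in the input-file section (empty if none)."""
    problems = []
    entries = params.get("input_file_list", [])
    if params.get("input_file_nums") != len(entries):
        problems.append("input_file_nums does not match the length of input_file_list")
    seen_ids = set()
    for n, entry in enumerate(entries):
        missing = [k for k in REQUIRED_KEYS if k not in entry]
        if missing:
            problems.append(f"entry {n}: missing {', '.join(missing)}")
            continue
        if entry["round_id"] in seen_ids:  # uniqueness is an assumption, not documented
            problems.append(f"entry {n}: duplicate round_id {entry['round_id']}")
        seen_ids.add(entry["round_id"])
        if not os.path.isabs(entry["file_path"]):
            problems.append(f"entry {n}: file_path is not a full path")
    return problems

def score_names(round_name):
    """Two of the score names derived from round_name (more exist, see "...")."""
    return [f"Frequency_{round_name}", f"Enrichment_{round_name}"]
```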
The following is an example for three input FASTQ files (5R to 7R).
"file_type":2,
"input_file_nums":3,
"input_file_list":[
{
"round_id":5,
"round_name":"Round5",
"file_path":"/path/to/5R.fastq"
},
{
"round_id":6,
"round_name":"Round6",
"file_path":"/path/to/6R.fastq"
},
{
"round_id":7,
"round_name":"Round7",
"file_path":"/path/to/7R.fastq"
}
],
...
Please note that a "," is needed after "}" when another input file follows, and NO "," should appear between "}" and "]" (i.e. after the last input file).
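Because misplaced commas are a common mistake here, one option is to generate the input_file_list block instead of typing it by hand. The sketch below builds the three-round example with Python's json module, which always places commas correctly and keeps input_file_nums in sync. The rounds mapping and the /path/to/ locations are placeholders taken from the example above.

```python
import json

# round_id -> file path (placeholders from the example above)
rounds = {5: "/path/to/5R.fastq", 6: "/path/to/6R.fastq", 7: "/path/to/7R.fastq"}

input_file_list = [
    {"round_id": rid, "round_name": f"Round{rid}", "file_path": path}
    for rid, path in sorted(rounds.items())
]
fragment = {
    "file_type": 2,  # FASTQ
    "input_file_nums": len(input_file_list),  # always matches the list length
    "input_file_list": input_file_list,
}
# indent=1 puts one key per line; merge the result into the rest of the parameter file
print(json.dumps(fragment, indent=1))
```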
The experiment_dbfile is one of RaptRanker's outputs. It is a SQLite3 database file that records all unique sequences, their secondary structures, and their Frequency and Enrichment scores. These records stay the same unless the filtering parameters are changed, so it is effectively "the output for one input SELEX experiment".
Please set this parameter in the form /full/path/to/filename.sqlite3.
The analysis_dbfile is another of RaptRanker's outputs. It is a SQLite3 database file that records all subsequences, the clustering results, and so on. These records stay the same unless the clustering parameters are changed, so it is effectively "the output for one RaptRanker clustering". Please set analysis_dbfile in the form /full/path/to/output/filename.sqlite3.
The analysis_output_path is the output directory for intermediate files and the score CSV. Please set analysis_output_path as /full/path/to/output/. We recommend using the same directory for analysis_output_path and for analysis_dbfile. This parameter must end with "/".
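The path rules above can be checked mechanically. This hypothetical helper (not part of RaptRanker) flags database paths that are not full paths ending in .sqlite3, an analysis_output_path that is not a full path or does not end with "/", and, as a soft note only, an analysis_dbfile that is not inside analysis_output_path.

```python
import os

def check_output_paths(params):
    """Return a list of messages about the three output-path parameters."""
    msgs = []
    for key in ("experiment_dbfile", "analysis_dbfile"):
        path = params.get(key, "")
        if not (os.path.isabs(path) and path.endswith(".sqlite3")):
            msgs.append(f"{key} should look like /full/path/to/filename.sqlite3")
    out = params.get("analysis_output_path", "")
    if not os.path.isabs(out):
        msgs.append("analysis_output_path should be a full path")
    if not out.endswith("/"):
        msgs.append('analysis_output_path must end with "/"')
    # recommendation only: keep analysis_dbfile in the output directory
    if os.path.dirname(params.get("analysis_dbfile", "")) + "/" != out:
        msgs.append("note: analysis_dbfile is not in analysis_output_path (recommended, not required)")
    return msgs
```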
For example, once you have run an analysis in RaptRanker, if you want to see the results with different clustering parameters (window size, threshold, etc.), change only analysis_dbfile and analysis_output_path. In this case, RaptRanker reuses the previous secondary structure prediction results, which reduces the running time. Likewise, if you add a new round, you don't have to change experiment_dbfile: RaptRanker reads the new sequences and analyzes them additionally (this is determined by the input_file_list information). Within the same SELEX analysis, you need to change experiment_dbfile only when you change the filtering parameters.
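The rerun scenario above amounts to copying the parameters and changing only the two analysis outputs. A sketch, assuming the parameters are already loaded as a Python dict; the helper name params_for_new_clustering and the /data/ directories are illustrative only.

```python
import copy

def params_for_new_clustering(params, new_output_dir):
    """Copy params, pointing only the analysis outputs at a new directory.

    experiment_dbfile is left unchanged so that the earlier secondary
    structure predictions can be reused.
    """
    if not new_output_dir.endswith("/"):
        new_output_dir += "/"  # analysis_output_path must end with "/"
    new = copy.deepcopy(params)
    new["analysis_output_path"] = new_output_dir
    # keep the database file name, move it into the new directory
    db_name = new["analysis_dbfile"].rsplit("/", 1)[-1]
    new["analysis_dbfile"] = new_output_dir + db_name
    return new

old = {"experiment_dbfile": "/data/exp.sqlite3",
       "analysis_dbfile": "/data/run1/analysis.sqlite3",
       "analysis_output_path": "/data/run1/"}
new = params_for_new_clustering(old, "/data/run2")
print(new["analysis_dbfile"])  # /data/run2/analysis.sqlite3
```

After this, edit the clustering parameters in the copy and write it out as a new parameter file.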
forward_primer and reverse_primer are the nucleotide sequences used for filtering. RaptRanker extracts only the sequences whose two fixed regions are identical to these. In some cases, the forward fixed sequence includes a T7 promoter, barcode sequences, and so on; if so, include them in forward_primer. Please note that these are "primer-binding regions" (they appear in the sequenced reads), not "primer sequences".
add_forward_primer and add_reverse_primer are the nucleotide sequences used for secondary structure prediction. Again, they are "primer-binding regions" (they appear in the sequenced reads), not "primer sequences".
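To make the filtering rule concrete, here is a sketch that keeps a read only when it starts with forward_primer and ends with reverse_primer, and returns the random region in between. Exact prefix/suffix matching is an assumption; RaptRanker's own matching (e.g. mismatch tolerance or where it searches for the fixed regions) may differ.

```python
def extract_random_region(read, forward_primer, reverse_primer):
    """Return the random region, or None if either fixed region does not match."""
    if len(read) < len(forward_primer) + len(reverse_primer):
        return None
    if not (read.startswith(forward_primer) and read.endswith(reverse_primer)):
        return None
    # slice between the two fixed regions (avoids the read[n:-0] pitfall)
    return read[len(forward_primer):len(read) - len(reverse_primer)]

print(extract_random_region("AAACGCGTTT", "AAA", "TTT"))  # CGCG
```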
For example, if the template sequence is TAATACGACTCACTATA-GGGAGCAGGAGAGAGGTCAGATG-30N-CCTATGCGTGCTAGTGTGA (T7 promoter - forward primer-binding region - random region - reverse primer-binding region), the parameters should be set as follows.
"forward_primer":"TAATACGACTCACTATAGGGAGCAGGAGAGAGGTCAGATG",
"reverse_primer":"CCTATGCGTGCTAGTGTGA",
"add_forward_primer":"GGGAGCAGGAGAGAGGTCAGATG",
"add_reverse_primer":"CCTATGCGTGCTAGTGTGA",
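With these settings, the sequence used for secondary structure prediction would presumably be add_forward_primer + random region + add_reverse_primer, i.e. without the T7 promoter. The sketch below builds that string for the example values. The concatenation order is an assumption based on the template layout, not taken from RaptRanker's source, and the helper name folding_input is hypothetical.

```python
params = {
    "forward_primer": "TAATACGACTCACTATAGGGAGCAGGAGAGAGGTCAGATG",
    "reverse_primer": "CCTATGCGTGCTAGTGTGA",
    "add_forward_primer": "GGGAGCAGGAGAGAGGTCAGATG",
    "add_reverse_primer": "CCTATGCGTGCTAGTGTGA",
}

def folding_input(random_region, p):
    """Wrap the random region in the fixed regions used for structure prediction."""
    return p["add_forward_primer"] + random_region + p["add_reverse_primer"]

# In this template the filtering region is the T7 promoter plus the folding region.
assert params["forward_primer"] == "TAATACGACTCACTATA" + params["add_forward_primer"]

random_region = "ACGT" * 7 + "AC"  # an arbitrary 30-nt example
print(folding_input(random_region, params))
```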
sequence_maximum_length and sequence_minimum_length are the upper/lower limits of the random-region length. RaptRanker extracts only the random regions whose length L satisfies sequence_minimum_length ≤ L ≤ sequence_maximum_length. We usually use ±5 nt around the design (e.g. 30N -> "sequence_maximum_length":35, and "sequence_minimum_length":25,).
These parameters affect clustering and AME calculations. If you are not sure, we recommend leaving them at their defaults.
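The length rule and the ±5 nt convention can be written out as follows. This is an illustrative sketch; the helper names are hypothetical, and margin=5 is only the usual choice described above, not a RaptRanker default.

```python
def length_bounds(design_length, margin=5):
    """(sequence_minimum_length, sequence_maximum_length) for an N-nt design."""
    return design_length - margin, design_length + margin

def passes_length_filter(random_region, minimum, maximum):
    """Inclusive check: minimum <= L <= maximum."""
    return minimum <= len(random_region) <= maximum

lo, hi = length_bounds(30)
print(lo, hi)  # 25 35
```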
This is the length of subsequences. Default is 10.
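Assuming a subsequence here means every contiguous window of the given length taken with a step of one nucleotide (a common definition; RaptRanker's exact windowing, e.g. whether the fixed regions are included, is not stated on this page), the subsequences of one sequence could be enumerated like this:

```python
def subsequences(seq, length=10):
    """All contiguous windows of `length` nt, stepping by 1 (assumed definition)."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]

windows = subsequences("ACGTACGTACGT")
print(len(windows))  # 3
```

A sequence of L nt yields L - 10 + 1 windows under this definition, and none if it is shorter than 10 nt.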
This parameter represents the weight of sequence information in clustering. If this value is large, sequence information is emphasized over secondary structure information, so entries need more strictly matching nucleotide sequences to be clustered together. Default is 0.5.
This is a parameter for SketchSort; an upper limit of cosine distance between vectors. Default is 0.001.
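For intuition about this threshold: if cosine distance is taken as 1 minus cosine similarity (an assumption about SketchSort's definition), the default of 0.001 only admits vector pairs whose angle is below roughly 2.6 degrees. A small standalone sketch, not SketchSort itself:

```python
import math

def cosine_distance(u, v):
    """1 - cos(u, v) for two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def within_threshold(u, v, max_distance=0.001):
    """True if the pair would pass an upper limit of max_distance."""
    return cosine_distance(u, v) <= max_distance

print(within_threshold([1.0, 0.0], [1.0, 0.1]))  # False: distance is about 0.005
```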
This is a parameter for SketchSort; an upper limit of the expectation value for false negatives. Default is 0.00001.
These parameters are optional. Use them as needed.
This parameter is obsolete. Please use parameters listed below.
This parameter takes a bool value (true or false). If it is true, RaptRanker calculates each score. Default is false.
add_binding takes a bool value (true or false). If it is true, RaptRanker reads binding flags from binding_file_path. The binding file is a CSV file in which each row has a sequence (random region only) and a flag (TRUE: 1 or FALSE: 0), as follows:
ATCAGTCAGTTGCA,0
ACTGATCGCACACA,1
ACAGTCAAAACACC,1
...
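A sketch for reading and checking a binding file in this format with Python's csv module. It assumes there is no header row (the example above has none) and that each row has exactly two fields; the function name read_binding_flags is hypothetical.

```python
import csv
import io

def read_binding_flags(text):
    """Parse 'sequence,flag' rows into {sequence: bool}; raise ValueError on bad rows."""
    flags = {}
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2 or row[1].strip() not in ("0", "1"):
            raise ValueError(f"line {line_no}: expected 'sequence,0' or 'sequence,1'")
        flags[row[0].strip()] = row[1].strip() == "1"
    return flags

example = "ATCAGTCAGTTGCA,0\nACTGATCGCACACA,1\nACAGTCAAAACACC,1\n"
print(read_binding_flags(example))
```

For a real file, read it with `open(binding_file_path, newline="")` and pass the contents in.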