This pipeline aligns input query sequences of genomic features to input genomes and produces tables and a plot showing how many times each query sequence occurs in the genomes.
- BLAST output files produced with default filtering parameter values
- BED files filtered with all combinations of BLAST e-value (0.01, 1e-15) and alignment length (0.75, 0.90, 0.95) parameter values
- Summary TSV tables for each combination of parameter values, where rows are query sequences, columns are scaffolds/contigs, and each cell holds the number of occurrences of a specific query sequence in a specific scaffold/contig.
- Heat map plot in SVG format, where rows are input genomes, columns are input query sequences, and each cell holds the number of occurrences of a specific query sequence in the whole genome.
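
The filtering and counting described in the list above can be sketched roughly as follows. This is an illustration only, not the pipeline's own code: the `Hit` record, its field names, and the `count_hits` helper are hypothetical, and the sketch assumes that the alignment length values are fractions of the query length and that a hit passes when its e-value is at or below the threshold and its aligned fraction is at or above it.

```python
from collections import Counter
from itertools import product
from typing import NamedTuple

# Threshold values listed above.
EVALUES = (0.01, 1e-15)
ALIGNMENT_LENGTHS = (0.75, 0.90, 0.95)


class Hit(NamedTuple):
    """Hypothetical record for one BLAST hit (field names are assumptions)."""
    query: str
    scaffold: str
    evalue: float
    aligned_fraction: float  # assumed: alignment length / query length


def count_hits(hits, max_evalue, min_fraction):
    """Count hits per (query, scaffold) pair that pass both thresholds."""
    return Counter(
        (h.query, h.scaffold)
        for h in hits
        if h.evalue <= max_evalue and h.aligned_fraction >= min_fraction
    )


# Made-up hits, for illustration only.
hits = [
    Hit("q1", "scaffold_1", 1e-30, 0.98),
    Hit("q1", "scaffold_1", 1e-20, 0.80),
    Hit("q1", "scaffold_2", 1e-5, 0.92),
    Hit("q2", "scaffold_1", 0.005, 0.76),
]

# One count table per combination of thresholds (2 x 3 = 6 combinations).
tables = {
    (e, a): count_hits(hits, e, a)
    for e, a in product(EVALUES, ALIGNMENT_LENGTHS)
}
```

Each entry of `tables` corresponds to one summary table: the key is the pair of thresholds, and the counter maps a (query, scaffold) pair to the number of hits that survived filtering.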
- Unix OS
  This pipeline was tested with Ubuntu 20.04.3 LTS.
- NCBI BLAST command line tool
  This pipeline was tested with version 2.13.0+. NCBI BLAST must be added to the PATH variable.
- Python
  This pipeline was tested with version 3.10.10.
- Biopython Python module
  This pipeline was tested with version 1.79.
- Matplotlib Python module
  This pipeline was tested with version 3.7.1.
- NumPy Python module
  This pipeline was tested with version 1.24.3.
- pandas Python module
  This pipeline was tested with version 1.5.3.
- seaborn Python module
  This pipeline was tested with version 0.12.2.
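
A quick way to compare a local setup against the requirements above is a small check script. The sketch below is not part of the pipeline: `check_environment` is a hypothetical helper, and it assumes that `blastn` is the BLAST executable to look for on PATH (the pipeline may call a different BLAST+ program). It only reports what it finds and does not install or change anything.

```python
import shutil
from importlib import metadata

# Versions the pipeline was tested with, as listed above.
TESTED_VERSIONS = {
    "biopython": "1.79",
    "matplotlib": "3.7.1",
    "numpy": "1.24.3",
    "pandas": "1.5.3",
    "seaborn": "0.12.2",
}


def check_environment(executable="blastn"):
    """Report the BLAST executable location and installed module versions."""
    report = {"blast_path": shutil.which(executable), "modules": {}}
    for name, tested in TESTED_VERSIONS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # module is not installed
        report["modules"][name] = {"installed": installed, "tested": tested}
    return report


if __name__ == "__main__":
    report = check_environment()
    print("blastn on PATH:", report["blast_path"] or "not found")
    for name, info in report["modules"].items():
        print(f"{name}: installed={info['installed']}, tested={info['tested']}")
```

A version that differs from the tested one is not necessarily a problem; the report is only meant to make such differences visible.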
Download and unpack the pipeline archive via the GitHub web interface or, if you have Git installed, run the following command in a shell:
git clone https://github.com/sofya-d/QBLAST

The 1_run_blast.sh script takes five arguments, in this order:
- Path to directory with genome FASTA files. Files must have the '.fasta' extension
- Path to directory where BLAST databases will be written
- Path to directory with query sequence FASTA files. Files must have '.fasta' extension
- Path to directory with script files of this pipeline
- Path to output directory
./1_run_blast.sh ./genomes ./databases ./queries ./ ./output
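
Because genome and query files are required to have the '.fasta' extension, it can help to inspect the input directories before running the command above. The following is a minimal sketch, not part of the pipeline; `fasta_files` and `other_files` are hypothetical helpers, and how the pipeline itself treats files with other extensions (for example '.fa') is not stated here.

```python
from pathlib import Path


def fasta_files(directory):
    """Return the sorted '.fasta' files found directly inside `directory`."""
    return sorted(p for p in Path(directory).glob("*.fasta") if p.is_file())


def other_files(directory):
    """Return files in `directory` that do not have the '.fasta' extension."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix != ".fasta"
    )


if __name__ == "__main__":
    # Directory names taken from the example command above.
    for directory in ("./genomes", "./queries"):
        if not Path(directory).is_dir():
            print(f"{directory}: directory not found")
            continue
        print(f"{directory}: {len(fasta_files(directory))} '.fasta' file(s)")
        for path in other_files(directory):
            print(f"  without '.fasta' extension: {path.name}")
```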