Fishing For Plasmids
This repository contains a tool to fish plasmid contigs out of an Escherichia coli assembly. Fishing For Plasmids tries to separate plasmid contigs from chromosome contigs and, when multiple plasmid types are present in the genome, to separate the contigs of different plasmids from one another. The tool enables investigators to study plasmids based on E. coli whole-genome sequencing (WGS) data.
Databases for the tool are hosted on Figshare: https://figshare.com/articles/dataset/FishingForPlasmids_databases/12735692
Installation
- Clone the repository with git to the desired location
- Obtain the databases from Figshare and place them in the “blast_db” directory
- Create a directory named “data” in the repository directory
- Create the following directories in the “data” directory: “assemblyDir”, “blast_EcPlGe”, “blast_pFinder”, “FFP_output”, “pmlst_out”
- Create a directory named “pMLST” in the “scripts” directory
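The directory-creation steps above can also be scripted. The following is a minimal sketch, not part of the repository: the function name `create_layout` is hypothetical, and it assumes that “data” and “scripts” both sit directly in the repository root.

```python
from pathlib import Path

# Sub-directories of "data" named in the installation steps.
DATA_SUBDIRS = [
    "assemblyDir",
    "blast_EcPlGe",
    "blast_pFinder",
    "FFP_output",
    "pmlst_out",
]


def create_layout(repo_root):
    """Create data/<subdirs> and scripts/pMLST under repo_root.

    Hypothetical helper; assumes "data" and "scripts" live in the
    repository root. Existing directories are left untouched.
    """
    root = Path(repo_root)
    created = []
    for sub in DATA_SUBDIRS:
        path = root / "data" / sub
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    pmlst_dir = root / "scripts" / "pMLST"
    pmlst_dir.mkdir(parents=True, exist_ok=True)
    created.append(pmlst_dir)
    return created
```

Calling `create_layout(".")` from the repository root would create all six directories in one go. The databases still have to be downloaded from Figshare by hand.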
Dependencies
- Python 3.6 (or newer)
- blast (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs)
- pmlst
- pmlst_db
Install pmlst and pmlst_db in the “pMLST” directory that you created during installation, following the pmlst installation protocol (https://bitbucket.org/genomicepidemiology/pmlst/src/master/)
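Before running the tool, it can help to check that the dependencies are reachable. The sketch below is illustrative only: `check_dependencies` is a hypothetical helper, and the assumption that the tool needs the `blastn` executable from BLAST+ on the PATH is ours, since this README only lists "blast".

```python
import shutil
import sys


def check_dependencies(executables=("blastn",), min_python=(3, 6)):
    """Return a list of problems found; an empty list means all checks passed.

    Hypothetical helper. The default executable name "blastn" is an
    assumption; adjust it to whatever BLAST program the tool calls.
    """
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)
    for exe in executables:
        # shutil.which returns None when the program is not on the PATH.
        if shutil.which(exe) is None:
            problems.append("%s not found on PATH" % exe)
    return problems
```

An empty return value means the Python version is sufficient and every listed executable was found. This does not verify that pmlst and pmlst_db are installed in the “pMLST” directory.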
Usage
- Go to your Fishing For Plasmids directory in the terminal.
- Copy your assemblies of interest into the data/assemblyDir/ directory. Note: make sure all assemblies have the same file extension (.fna, .fa, or .fasta).
- First create a config.YAML file by running:
  python script/make_config_file.py
- Then run the FishingForPlasmids script:
  python FFP.py
Note: The Snakefile is not functional yet, so you have to use FFP.py.
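The usage steps require all assemblies to share one file extension. A quick pre-flight check could look like the sketch below. It is not part of the repository, and `check_assemblies` is a hypothetical name. It only inspects file names in the given directory and ignores hidden files.

```python
from pathlib import Path

# Extensions named in the usage note.
ALLOWED_EXTENSIONS = {".fna", ".fa", ".fasta"}


def check_assemblies(assembly_dir):
    """Return the single extension shared by all assemblies.

    Hypothetical helper. Raises ValueError if the directory holds no
    files, mixes extensions, or uses an extension not in the usage note.
    """
    extensions = {
        p.suffix
        for p in Path(assembly_dir).iterdir()
        if p.is_file() and not p.name.startswith(".")
    }
    if not extensions:
        raise ValueError("no assemblies found in %s" % assembly_dir)
    if len(extensions) > 1:
        raise ValueError("mixed extensions: %s" % sorted(extensions))
    (extension,) = extensions
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError("unsupported extension: %r" % extension)
    return extension
```

For example, `check_assemblies("data/assemblyDir")` could be run from the Fishing For Plasmids directory before creating the config file.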