Overview

This pipeline finds the most abundant genes containing motifs that may be associated with gamma-ray resistance. The motifs are listed in motif_list.txt, and the relevant clinical data is in clinical_data.txt.

The pipeline identifies the most abundant genes, selects a suitable CRISPR site in each one, and then simulates the genes before and after a CRISPR knockout at that site. The results are then compiled into a report.

TO RUN:

pre-requisites

Make sure each script is executable (chmod +x filename).
Make sure the following files are in the same directory you run the scripts from:
- clinical_data.txt
- motif_list.txt
- exomes/*.fasta
Make sure Python is installed.

Run in this order:

copyExomes.sh

Reads the clinical data file and identifies the samples that have a diameter between 20 and 30 mm (inclusive) and have had their genomes sequenced. The exomes of the identified samples are then copied, by sample code name, to a new directory called exomesCohort.

This is the only script that requires a parameter. Run it like this:

./copyExomes.sh clinical_data.txt
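
The selection rule above can be sketched in Python as follows. This is only an illustration, not the contents of copyExomes.sh: the layout of clinical_data.txt is not documented here, so the tab-separated format and the column names code_name, diameter_mm and sequenced are assumptions, and select_cohort is a hypothetical helper.

```python
import csv
import io

def select_cohort(clinical_text, lo=20, hi=30):
    """Return code names of sequenced samples whose diameter lies in [lo, hi]."""
    reader = csv.DictReader(io.StringIO(clinical_text), delimiter="\t")
    selected = []
    for row in reader:
        # Column names are assumptions; adjust them to the real file's header.
        diameter = float(row["diameter_mm"])
        sequenced = row["sequenced"].strip().lower() == "yes"
        if sequenced and lo <= diameter <= hi:
            selected.append(row["code_name"])
    return selected

# Made-up rows, used only to exercise the inclusive bounds.
sample = (
    "code_name\tdiameter_mm\tsequenced\n"
    "alpha\t20\tyes\n"
    "beta\t31\tyes\n"
    "gamma\t25\tno\n"
    "delta\t30\tyes\n"
)
cohort = select_cohort(sample)
print(cohort)
```

Each selected code name would then correspond to one exomes/{codename}.fasta file to copy into exomesCohort, assuming the exome files are named after the code names.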

createCrisprReady.sh

Using the motif_list.txt file, identifies the 3 most frequently occurring motifs in each exome in the exomesCohort folder. The headers and corresponding sequences of the genes containing those motifs are written to a new file called {exomename}_topmotifs.fasta.

./createCrisprReady.sh
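
A minimal Python sketch of this step is shown below. It assumes that "highest occurring" means the highest total number of occurrences across all sequences in an exome, and that a gene is kept when it contains at least one of the top motifs; the shell script may interpret either point differently. The helper names and the sample data are hypothetical.

```python
from collections import Counter

def parse_fasta(text):
    """Split FASTA text into (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def top_motifs(records, motifs, n=3):
    """Return up to n motifs with the highest total occurrence count."""
    counts = Counter()
    for _, seq in records:
        for motif in motifs:
            # str.count counts non-overlapping occurrences only.
            counts[motif] += seq.count(motif)
    return [m for m, c in counts.most_common(n) if c > 0]

def genes_with_motifs(records, motifs):
    """Keep the records whose sequence contains at least one of the motifs."""
    return [(h, s) for h, s in records if any(m in s for m in motifs)]

# Made-up exome and motif list for illustration.
exome = ">gene1\nAAACCCGGG\n>gene2\nTTTTAAAC\n>gene3\nCGGGTTTT\n>gene4\nACACACAC\n"
motifs = ["AAAC", "CCCG", "TTTT", "GGGG", "ACGT"]
records = parse_fasta(exome)
top = top_motifs(records, motifs)
kept = genes_with_motifs(records, top)
for header, seq in kept:
    print(f">{header}\n{seq}")
```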

identifyCrisprSite.sh

For each gene in the {exomename}_topmotifs.fasta files, this script identifies a suitable CRISPR site: a sequence that contains “NGG”, where “N” can be any base, with at least 20 base pairs upstream of it. For example, in ATGAACGTCTGTAAGAACTGCGGATCTGTCA, everything to the left of CGG is upstream (exactly 20 bases here). Suitable candidates (headers and sequences) are written to a new file called {exomename}_precrispr.fasta.

./identifyCrisprSite.sh
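
The site rule can be illustrated with a short Python sketch. It is not the script's actual implementation; find_crispr_sites is a hypothetical helper, and it assumes sequences use only the letters A, C, G and T.

```python
import re

# Zero-width lookahead so that overlapping NGG sites are all reported.
NGG = re.compile(r"(?=[ACGT]GG)")

def find_crispr_sites(seq, min_upstream=20):
    """Return 0-based start indices of NGG sites with enough upstream bases."""
    return [m.start() for m in NGG.finditer(seq.upper())
            if m.start() >= min_upstream]

example = "ATGAACGTCTGTAAGAACTGCGGATCTGTCA"
sites = find_crispr_sites(example)
print(sites)
```

For the example sequence above, the only qualifying site is the CGG that starts at index 20, so a gene like this one would be kept as a candidate.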

editGenome.sh

Using the {exomename}_precrispr.fasta files, this script inserts the letter A right before the NGG site and writes the result to a new file called {exomename}_postcrispr.fasta. This simulates a single successful CRISPR edit.

./editGenome.sh
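
As a rough illustration of the edit, the Python sketch below inserts an A immediately before the first NGG site that has at least 20 bases upstream. Which site is edited when a sequence has several candidates is not specified here, so choosing the first one is an assumption, and insert_a_before_site is a hypothetical helper.

```python
import re

def insert_a_before_site(seq, min_upstream=20):
    """Insert 'A' before the first NGG site with enough upstream bases."""
    for m in re.finditer(r"(?=[ACGT]GG)", seq):
        if m.start() >= min_upstream:
            i = m.start()
            return seq[:i] + "A" + seq[i:]
    return seq  # no suitable site: leave the sequence unchanged

before = "ATGAACGTCTGTAAGAACTGCGGATCTGTCA"
after = insert_a_before_site(before)
print(after)
```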

exomeReport.py

This Python script generates a single text report that summarizes the findings. For each organism, it lists the code name, the name of the discoverer, the diameter, and the environment it came from. It then states where the post-CRISPR file is located on the server and prints the first FASTA block of that file (i.e. just the first header and sequence). Each entry has the following form:

Organism CODENAME, discovered by DISCOVERER, has a diameter of DIAMETER, and from the environment ENVIRONMENT.

The list of genes can be found in: ./some_path_crispr/codename_postcrispr.fasta

The first sequence of CODENAME is:

Gene0123

ATACGTACGGATCTATTT

python exomeReport.py
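
The report layout can be sketched as a small formatting function. This is an assumption-laden illustration rather than the code in exomeReport.py: format_report_entry is a hypothetical helper, the arguments are the placeholders from the template, and the exact blank-line layout of the real report may differ.

```python
def format_report_entry(code_name, discoverer, diameter, environment,
                        fasta_path, header, sequence):
    """Build one report entry following the template in this README."""
    return (
        f"Organism {code_name}, discovered by {discoverer}, "
        f"has a diameter of {diameter}, "
        f"and from the environment {environment}.\n\n"
        f"The list of genes can be found in: {fasta_path}\n\n"
        f"The first sequence of {code_name} is:\n\n"
        f"{header}\n{sequence}\n"
    )

# Placeholder values taken from the template, not real data.
entry = format_report_entry(
    "CODENAME", "DISCOVERER", "DIAMETER", "ENVIRONMENT",
    "./some_path_crispr/codename_postcrispr.fasta",
    "Gene0123", "ATACGTACGGATCTATTT",
)
print(entry)
```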
