Companion code for:
Zou, R.S., Marin-Gonzalez, A., Liu, Y., Liu, H.B., Shen, L., Dveirin, R., Luo, J.X., Kalhor, R. and Ha, T. Massively parallel genomic perturbations with multi-target CRISPR reveal new insights on Cas9 activity and DNA damage responses at endogenous sites. bioRxiv (2022). https://www.biorxiv.org/content/10.1101/2022.01.18.476836v1
- Anaconda Python 3.7 (the Anaconda distribution includes the required numpy and scipy libraries)
- pysam
- bowtie2
- samtools
- Ensure that both samtools and bowtie2 are added to PATH and can be called directly from bash
- Download sequencing reads in FASTQ format from SRA
- Download the prebuilt bowtie2 indices for human hg19 and hg38 genome assemblies
- Human hg38
- Human hg19
- Extract from the archive and move to the corresponding folders named hg38_bowtie2/ and hg19_bowtie2/
- Download the human hg19 and hg38 genome assemblies in FASTA format
- Generate FASTA file indices
- samtools faidx hg38_bowtie2/hg38.fa
- samtools faidx hg19_bowtie2/hg19.fa
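The PATH check and the indexing step can also be scripted. The following is a minimal Python sketch, not part of the companion code: the helper names (missing_tools, faidx_command, index_genomes) are hypothetical, and it assumes the FASTA files sit at the paths used in the commands above.

```python
import shutil
import subprocess
from pathlib import Path

# Command-line tools that must be callable from the shell.
REQUIRED_TOOLS = ("samtools", "bowtie2")

# FASTA locations taken from the samtools faidx commands in the setup steps.
FASTA_FILES = ("hg38_bowtie2/hg38.fa", "hg19_bowtie2/hg19.fa")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def faidx_command(fasta_path):
    """Build the 'samtools faidx' command for one FASTA file."""
    return ["samtools", "faidx", str(fasta_path)]


def index_genomes(fasta_files=FASTA_FILES):
    """Run 'samtools faidx' on each FASTA file that exists and has no .fai index yet."""
    absent = missing_tools()
    if absent:
        raise RuntimeError("Not found on PATH: " + ", ".join(absent))
    for fasta in map(Path, fasta_files):
        if not fasta.is_file():
            print(f"Skipping {fasta}: file not found")
            continue
        # samtools faidx writes its index next to the FASTA file as <name>.fai
        if Path(str(fasta) + ".fai").is_file():
            print(f"Skipping {fasta}: index already exists")
            continue
        subprocess.run(faidx_command(fasta), check=True)
```

Calling index_genomes() from the folder that contains hg38_bowtie2/ and hg19_bowtie2/ should produce the same .fai files as running the two commands by hand, while failing early with a readable message if either tool is missing from PATH.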
Bash scripts are used to automate the processing of sequencing data.
Python scripts are used to perform analysis of various data featured in the manuscript.
They are labeled script_*_*.py, such as script_1_putative.py
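To see which analysis scripts are present, the naming pattern can be matched programmatically. This is a small sketch only: it assumes the first wildcard in script_*_*.py is always an integer (as in script_1_putative.py), and the helper names are hypothetical rather than part of the companion code.

```python
import re
from pathlib import Path

# Assumed naming scheme: script_<number>_<name>.py
SCRIPT_NAME = re.compile(r"^script_(\d+)_(.+)\.py$")


def parse_script_name(filename):
    """Return (number, name) for a matching filename, otherwise None."""
    match = SCRIPT_NAME.match(filename)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def list_analysis_scripts(directory="."):
    """Return matching script filenames in `directory`, sorted by number, then name."""
    parsed = []
    for path in Path(directory).glob("script_*_*.py"):
        key = parse_script_name(path.name)
        if key is not None:
            parsed.append((key, path.name))
    return [name for _, name in sorted(parsed)]
```

Sorting on the parsed integer keeps the scripts in numeric order, which plain alphabetical sorting would not do once the numbers reach two digits.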
The list of target sites for the 'GG', 'CT', and 'TA' mgRNA sequences, along with their
hg38 genomic context, is included in mgRNA_target_sites.xlsx.