Usage Instructions
Follow these instructions to run the ATAC-seq QC analysis pipeline on your own data on your own computer.
Note for Windows users: install Git for Windows, and wherever these instructions say to run a command in your "command terminal", run that command in your "Git Bash" shell instead.
- Install docker and/or verify that it's working
- Carefully follow all the instructions here for your operating system: https://docs.docker.com/installation/
- Do not proceed past this step until you have verified that docker is working for you. To test it, run the following command in your terminal and continue to the next step only if it completes without any errors:
docker run hello-world
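If you want to script this check, here is a minimal sketch; the `require_cmd` helper name is my own, not part of the pipeline:

```shell
# Sketch: fail fast when a required command is missing, instead of letting
# later steps error out mysteriously. require_cmd is a hypothetical helper.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1 -- install it before continuing" >&2
    return 1
  fi
}
```

With that defined, `require_cmd docker && docker run hello-world` should print `found: docker` and then run the hello-world test only if docker is actually installed.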
- Clone this repository to your computer
In your command terminal, run:
git clone https://github.com/greysAcademicCode/docker-pipelines.git
You'll now have a docker-pipelines folder on your computer. You should see a few files and folders in there, including README.md.
- Later on, if changes are made to this repository, you can update the files on your computer by running git pull in your docker-pipelines folder.
- Get the bowtie2 indices
In your command terminal run:
cd docker-pipelines
bash miscScripts/getBT2Index.sh
This will download ~15GB of bowtie2 index files for mm10, mm9, hg19, and hg38; it only needs to be done once.
- Alternatively, you can trade this large download for another large download and a bunch of CPU time by re-building the indices yourself: run miscScripts/makeBT2Index.sh. This requires that you have bowtie2-build in your PATH and, optionally, udr if you want super-fast downloads. If you don't want to bother with getting udr, edit that script so that SUPER_FAST_DOWNLOAD=false.
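Once the download finishes, you can sanity-check what arrived. A sketch, assuming the index files carry the usual .bt2 extension; `check_indices` is my own helper name, and the directory you pass it is wherever getBT2Index.sh placed the files on your machine:

```shell
# Sketch: count the bowtie2 index files present for each genome build.
# check_indices is a hypothetical helper, not part of the repo; point it
# at the directory that getBT2Index.sh populated.
check_indices() {
  local dir="$1" genome n
  for genome in mm9 mm10 hg19 hg38; do
    n=$(find "$dir" -name "${genome}*.bt2*" 2>/dev/null | grep -c . || true)
    echo "$genome: $n index file(s)"
  done
}
```

For example, `check_indices someIndexFolder` prints one line per genome build so you can spot a build whose download failed (it would show 0 files).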
- Get the latest Docker image
In your command terminal, run:
docker pull greyson/pipelines
If you get errors here, note the following: Access to this container is restricted on the request of Anshul Kundaje (it contains some of his group's code). To get access, you must do the following in order:
- Create a Docker user account
- Email grey@christoforo.net a message containing your Docker user name and that you'd like access to greyson/pipelines
- Sign in as your user on the command line by running docker login
- This login is remembered for the future, so it must be done only once per computer you wish to authorize
- (Optional) Test the pipeline
In your terminal, make sure you're in the docker-pipelines directory from above and run:
bash runATACPipeline.sh
This will run the pipeline on some (very small) mouse and human example data I've included in the support file package. After waiting a bit, the result files should appear in a newly created folder called ATACPipeOutput; all of the pipeline analysis files will end up there. Look for the summary report in a file whose name ends with report.pdf
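If you'd rather not dig through the output folder by hand, one way to locate every report file (a sketch; ATACPipeOutput is the folder the test run creates):

```shell
# Sketch: list every summary report under the pipeline's output folder.
if [ -d ATACPipeOutput ]; then
  find ATACPipeOutput -name '*report.pdf'
else
  echo "ATACPipeOutput not found -- run the pipeline first" >&2
fi
```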
- Analyze your data
If the above test worked out alright, you're ready to do the analysis on your own fastq data files. This is the same process as running the test above except that you must first name your data files in a special way and put them in the proper folder structure for the pipeline to find them:
In the docker-pipelines directory from above, a folder called inputData was created when you cloned the repo. This folder contains mouse and human subfolders which hold the data to be processed for that species. Inside these species folders are data folders (named anything) that each contain two fastq input data files. You can put as many data folders as you like into the species folders; they will all be processed. Each of the two fastq files you put into a data folder must uniquely match one of the naming patterns *R1*fastq* and *R2*fastq*. So an example would be putting your two fastq input data files in a folder structure like this:
inputData/mouse/trialA/billyTheMouse_R1_brain.fastq.gz
inputData/mouse/trialA/billyTheMouse_R2_brain.fastq.gz
Then you simply run the wrapper script:
bash runATACPipeline.sh
and all of the data (for both mouse and human) you put into inputData will be processed sequentially, with outputs appearing in ATACPipeOutput for each data folder you added. Look for the QC report files ending in report.pdf in a newly created folder called ATACPipeOutput/reports.
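Before kicking off a long run, it can be worth checking that each data folder really satisfies the *R1*fastq* / *R2*fastq* matching rules. A minimal sketch; `check_data_folder` is my own name, not part of the pipeline:

```shell
# Sketch: confirm a data folder holds exactly one *R1*fastq* file and one
# *R2*fastq* file, as the pipeline's matching rules require.
# check_data_folder is a hypothetical helper, not part of the repo.
check_data_folder() {
  local dir="$1" r1 r2
  r1=$(find "$dir" -maxdepth 1 -name '*R1*fastq*' | grep -c . || true)
  r2=$(find "$dir" -maxdepth 1 -name '*R2*fastq*' | grep -c . || true)
  if [ "$r1" -eq 1 ] && [ "$r2" -eq 1 ]; then
    echo "OK: $dir"
  else
    echo "BAD: $dir (R1 matches: $r1, R2 matches: $r2)"
  fi
}
```

For the example layout above, `check_data_folder inputData/mouse/trialA` would report OK, while a folder with a missing or duplicated R1/R2 file would be flagged BAD.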