Current Version: 0.0.0
The YSEQ Y-DNA Analysis Pipeline is designed to process paired-end fastq.gz files and extract the Y-DNA and mtDNA haplogroups.
- Python 3.6+
- Snakemake
- conda
- pandas
- Path
- bcftools
- samtools
- tabix
- Clone the repository:
  git clone https://github.com/morm24/yseq-pipeline.git
  cd yseq-pipeline
- Install Python:
  apt install python3
- Install snakemake:
  apt install snakemake
- Install conda & mamba:
  mkdir -p ~/miniconda3
  wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
  bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
  conda config --set channel_priority strict
  conda install -n base -c conda-forge mamba
- Install pandas & Path:
  pip install pandas path
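Once the installation steps are done, it can help to confirm that the command-line tools from the requirements list are actually reachable. The following is a minimal sketch, not part of the pipeline: the helper name `find_missing_tools` and the exact tool list are assumptions made for illustration, based on the requirements named above.

```python
import shutil

# Command-line tools named in the requirements and installation steps.
# This list is an assumption; adjust it to match your own setup.
REQUIRED_TOOLS = ["python3", "snakemake", "conda", "mamba",
                  "bcftools", "samtools", "tabix"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

The check only looks for executables on `PATH`; it does not verify versions or the Python packages.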
- Prepare the input files:
  - Place your samples into the resources/sample/{SampleID} folder. The file must be named {SampleID}_R1.fastq.gz; for paired-end reads, the second file must be named {SampleID}_R2.fastq.gz.
  - Place the chosen reference sequence into resources/refseq/{ref}/{ref}.fa. The folder and the FASTA file must have the same name.
  - Add all SampleIDs to the file config/samples.csv. After each SampleID, separated by a ",", state the name of the reference sequence (file name without extension) that the sample should be mapped to and analyzed against. Example:
    ID,REF
    63819,hs1
  - Open config/config.yaml. Set the sample, reference, and results folders, and choose the read type and the mapping software.
- Run the pipeline: To just run the pipeline, type the following command, replacing {cores} with the number of cores it should use:
  snakemake -pfc {cores}
- Check the output: TBD
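Before starting a run, the input layout described in the preparation step can be checked against config/samples.csv. The sketch below is illustrative only: the function `check_inputs` and its `paired` parameter are hypothetical, and it assumes the default folder names `resources/sample` and `resources/refseq` (these may differ if they were changed in config/config.yaml).

```python
import csv
from pathlib import Path


def check_inputs(samples_csv="config/samples.csv", resources="resources", paired=True):
    """Return the expected input files that do not exist.

    Assumes a header line "ID,REF" in the samples file and the default
    resources layout; both are assumptions taken from the setup steps.
    """
    resources = Path(resources)
    missing = []
    with open(samples_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            sample_id = row["ID"].strip()
            ref = row["REF"].strip()
            sample_dir = resources / "sample" / sample_id
            expected = [
                sample_dir / f"{sample_id}_R1.fastq.gz",
                resources / "refseq" / ref / f"{ref}.fa",
            ]
            if paired:
                # Paired-end data needs the second read file as well.
                expected.insert(1, sample_dir / f"{sample_id}_R2.fastq.gz")
            missing.extend(str(path) for path in expected if not path.is_file())
    return missing


if __name__ == "__main__" and Path("config/samples.csv").is_file():
    for path in check_inputs():
        print("missing:", path)
```

An empty result means every sample listed in the CSV has its read files and reference in place.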
Common reference sequences and their download links are:
- hg38:
  wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
- hg19:
  wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
- hs1:
  wget https://hgdownload.soe.ucsc.edu/goldenPath/hs1/bigZips/hs1.fa.gz
To download the latest YFull YTree updates, follow these steps:
- Visit the YFull YTree website.
- Download the latest JSON file from the resources section.
- Place the downloaded JSON file in the resources/tree directory and rename it to latest_YFull_YTree.json.
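The last step above can also be scripted. The following sketch copies a downloaded tree file to the expected location after confirming that it parses as JSON. The function name `install_ytree` is hypothetical, and nothing is assumed about the internal structure of the YFull file beyond it being valid JSON.

```python
import json
import shutil
from pathlib import Path


def install_ytree(downloaded_json, tree_dir="resources/tree"):
    """Copy a downloaded YFull tree to resources/tree/latest_YFull_YTree.json."""
    source = Path(downloaded_json)
    with open(source, encoding="utf-8") as handle:
        # Raises json.JSONDecodeError (a ValueError) if the download is not valid JSON.
        json.load(handle)
    target = Path(tree_dir) / "latest_YFull_YTree.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(str(source), str(target))
    return target
```

Parsing the file first catches truncated or HTML error-page downloads before the pipeline sees them.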
When the pipeline is set up correctly and the hs1 sequence has been placed in resources/refseq/hs1/hs1.fa, the following example workflow should run without errors and take about 5 minutes to finish (faster with more cores):
snakemake -pc 1
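The example expects an uncompressed hs1.fa, whereas the UCSC download for hs1 is the gzip-compressed hs1.fa.gz. One way to unpack it into the expected location is sketched below. The helper `unpack_reference` is hypothetical, and the target layout `resources/refseq/{ref}/{ref}.fa` is taken from the setup instructions.

```python
import gzip
import shutil
from pathlib import Path


def unpack_reference(fa_gz, ref, refseq_dir="resources/refseq"):
    """Decompress a downloaded {ref}.fa.gz into {refseq_dir}/{ref}/{ref}.fa."""
    target = Path(refseq_dir) / ref / f"{ref}.fa"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stream the data so a multi-gigabyte genome is never held in memory.
    with gzip.open(str(fa_gz), "rb") as src, open(str(target), "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


if __name__ == "__main__" and Path("hs1.fa.gz").is_file():
    print(unpack_reference("hs1.fa.gz", "hs1"))
```

The same call works for the other references, for example `unpack_reference("hg38.fa.gz", "hg38")`.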
This project is licensed under the TBD License. See the LICENSE file for details.
For any questions or issues, please open an issue on the GitHub repository.