YSEQ Y-DNA Analysis Pipeline

Version

Current Version: 0.0.0

Overview

The YSEQ Y-DNA Analysis Pipeline processes paired-end fastq.gz files and determines the Y-DNA and mtDNA haplogroups.

Dependencies

Python 3.6+
Snakemake
conda
pandas
Path
bcftools
samtools
tabix

Installation

Clone the repository:

```
git clone https://github.com/morm24/yseq-pipeline.git
cd yseq-pipeline
```

Install Python:
```
apt install python3
```
Install snakemake:
```
apt install snakemake
```

Install conda & mamba:

Install and set up miniconda

```
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
conda config --set channel_priority strict
```

Install mamba:

```
conda install -n base -c conda-forge mamba
```

Install pandas & Path:
```sh
pip install pandas Path
```

Usage

Prepare the input files:

First Method

Add your sample

Place your samples into the resources/sample/{SampleID} folder. The file name must be {SampleID}_R1.fastq.gz. For paired-end data, the second file must be named {SampleID}_R2.fastq.gz.

Add the reference sequence

Place the chosen reference sequence into resources/refseq/{ref}/{ref}.fa. The folder name and the FASTA file name (without extension) must be the same.

Add both to samples.csv

Add each SampleID to the file config/samples.csv, followed by a "," and the name of the reference (file name without extension). Example:
```
ID,REF
63819,hs1
```
Configure the settings

Choose the read type, the mapping software, and the results directory in config/config.yaml.
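
The folder layout described for the first method can be checked before starting a run. The following is a minimal sketch, not part of the pipeline itself: the helper names `expected_inputs` and `missing_inputs` are hypothetical, and the paths are assumed to follow the resources/sample/{SampleID} and resources/refseq/{ref}/{ref}.fa conventions given above.

```python
from pathlib import Path


def expected_inputs(sample_id, ref, paired=True, root="resources"):
    """Return the files the first method expects for one sample/reference pair."""
    sample_dir = Path(root) / "sample" / sample_id
    files = [sample_dir / f"{sample_id}_R1.fastq.gz"]
    if paired:
        # Paired-end data needs the second read file as well.
        files.append(sample_dir / f"{sample_id}_R2.fastq.gz")
    # Folder name and FASTA file name must both equal the reference name.
    files.append(Path(root) / "refseq" / ref / f"{ref}.fa")
    return files


def missing_inputs(sample_id, ref, paired=True, root="resources"):
    """List the expected files that do not exist yet."""
    return [p for p in expected_inputs(sample_id, ref, paired, root) if not p.is_file()]


if __name__ == "__main__":
    # Sample ID and reference taken from the samples.csv example.
    for path in missing_inputs("63819", "hs1"):
        print(f"missing: {path}")
```

Run from the repository root, the script prints one line per file that still has to be put in place and prints nothing when the layout is complete.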

Second Method

Add your resource folders

Open config/config.yaml. Change the sample, reference, and results folders. Choose the read type and the mapping software.

Add your sample

Add each SampleID to the file config/samples.csv, followed by a "," and the name of the reference sequence you want to map and analyze the sample against. Example:
```
ID,REF
63819,hs1
```
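
The samples.csv format shown in the example can be read back with Python's standard csv module, for instance to see which reference sequences a run will need. This is a sketch under the assumption that the file has exactly the ID and REF columns from the example. The function names `load_samples` and `samples_by_reference` are hypothetical and not part of the pipeline.

```python
import csv
from collections import defaultdict


def load_samples(csv_path="config/samples.csv"):
    """Read (ID, REF) pairs from samples.csv, skipping rows without an ID."""
    with open(csv_path, newline="") as handle:
        pairs = []
        for row in csv.DictReader(handle):
            sample_id = (row.get("ID") or "").strip()
            ref = (row.get("REF") or "").strip()
            if sample_id:
                pairs.append((sample_id, ref))
        return pairs


def samples_by_reference(pairs):
    """Group sample IDs by the reference they are mapped against."""
    grouped = defaultdict(list)
    for sample_id, ref in pairs:
        grouped[ref].append(sample_id)
    return dict(grouped)
```

For the example file above, `samples_by_reference(load_samples())` groups sample 63819 under the reference hs1.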
Configure the settings

Run the pipeline: To run the pipeline, enter the following command, replacing {cores} with the number of cores it should use:
```
snakemake -pfc {cores}
```
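
When the pipeline is started from a script, the command above can be assembled programmatically. The sketch below is an illustration only: `snakemake_command` and `run_pipeline` are hypothetical helpers, the core count is assumed to default to the number of CPUs reported by the operating system, and the optional `-n` flag is Snakemake's dry-run switch, which lists the planned jobs without executing them.

```python
import os
import subprocess


def snakemake_command(cores=None, dry_run=False):
    """Build the snakemake call from the README, optionally as a dry run."""
    # Fall back to the detected CPU count, or 1 if it cannot be determined.
    cores = cores or os.cpu_count() or 1
    cmd = ["snakemake", "-pfc", str(cores)]
    if dry_run:
        cmd.insert(1, "-n")  # -n: show what would be done, run nothing
    return cmd


def run_pipeline(cores=None, dry_run=False):
    """Run snakemake in the current directory and raise if it fails."""
    subprocess.run(snakemake_command(cores, dry_run), check=True)
```

Calling `run_pipeline(dry_run=True)` first is a cheap way to confirm that the samples and references are found before committing to a full run.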
Check the output: TBD

Reference Sequences

Common reference sequences and their download links are:

hg38:
```
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
```
hg19:
```
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
```
hs1:
```
wget https://hgdownload.soe.ucsc.edu/goldenPath/hs1/bigZips/hs1.fa.gz
```
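
The downloads above are gzip-compressed (.fa.gz), while the usage section expects an uncompressed file at resources/refseq/{ref}/{ref}.fa. One possible way to bridge that gap is sketched below. The function name `install_reference` is hypothetical, and the default target folder is assumed to be the resources/refseq directory of the first method.

```python
import gzip
import shutil
from pathlib import Path


def install_reference(gz_path, ref, refseq_dir="resources/refseq"):
    """Decompress a downloaded {ref}.fa.gz into {refseq_dir}/{ref}/{ref}.fa."""
    target = Path(refseq_dir) / ref / f"{ref}.fa"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stream the data so that a multi-gigabyte genome is never held in memory.
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

After downloading hs1.fa.gz into the current directory, `install_reference("hs1.fa.gz", "hs1")` would create resources/refseq/hs1/hs1.fa.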

YFull YTree Updates TO DO: change to free version

To download the latest YFull YTree updates, follow these steps:

Visit the YFull YTree website.
Download the latest JSON file from the resources section.
Place the downloaded JSON file in the resources/tree directory and rename it to latest_YFull_YTree.json.
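
The last step can also be scripted. The sketch below copies a downloaded tree file to the expected location and first checks that it parses as JSON, so that a truncated download is caught early. The function name `install_ytree` is hypothetical, and nothing is assumed about the internal structure of the YFull file beyond it being valid JSON.

```python
import json
import shutil
from pathlib import Path


def install_ytree(downloaded_json, tree_dir="resources/tree"):
    """Copy a downloaded tree file to {tree_dir}/latest_YFull_YTree.json."""
    with open(downloaded_json, encoding="utf-8") as handle:
        json.load(handle)  # raises ValueError if the file is not valid JSON
    target = Path(tree_dir) / "latest_YFull_YTree.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(downloaded_json, target)
    return target
```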

Example

When everything is set up correctly and the hs1 sequence is placed at resources/refseq/hs1/hs1.fa, the following example workflow should run without errors and take about 5 minutes to finish (faster with more cores): snakemake -pc 1

License

This project is licensed under the TBD License. See the LICENSE file for details.

Contact

For any questions or issues, please open an issue on the GitHub repository.
