richardslab/EXWAS_pipeline

EXWAS_pipeline

This documentation is split into multiple sections:

Getting started

nextflow run \
  ./main.nf \
  -c ./nextflow_template.config \
  -profile conda

Warning

Please make sure the information in both configuration files (nextflow_template.config and proj_config_template.yml) is correct, and that the path to proj_config_template.yml is specified in nextflow_template.config.

If running with the conda profile, make sure the conda environment file obtained from our GitHub repo is specified in nextflow_template.config.

Please ensure the VEP Apptainer definition file obtained from our GitHub repo is specified in nextflow_template.config.
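A minimal pre-flight sketch for the checks above, assuming bash and grep. The function name check_ready is hypothetical and not part of the pipeline; it only verifies that each file exists and that its file name appears somewhere in nextflow_template.config, which is a heuristic rather than a full validation of the configuration.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of EXWAS_pipeline itself.
set -euo pipefail

# check_ready CONFIG FILE...
#   Returns non-zero if CONFIG is missing, if any FILE does not exist, or if
#   a FILE's base name does not appear anywhere in CONFIG (a heuristic for
#   "this path is referenced in the config").
check_ready() {
  local config=$1
  shift
  if [[ ! -f $config ]]; then
    echo "missing config: $config" >&2
    return 1
  fi
  local f status=0
  for f in "$@"; do
    if [[ ! -f $f ]]; then
      echo "missing file: $f" >&2
      status=1
    elif ! grep -qF -- "$(basename -- "$f")" "$config"; then
      echo "not referenced in $config: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example: check_ready ./nextflow_template.config ./proj_config_template.yml <conda_env_file> <vep_definition_file>. Every problem found is reported before the function returns, so one call lists all missing or unreferenced files at once.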

File output descriptions

  • Output files are 'published' to the output directory specified in nextflow_template.config as hard links (Nextflow publishDir mode 'link'), not as copies.

    • This can be verified by comparing inode numbers; both paths report the same inode:
        ls -i <output_dir>/output_file
        ls -i <launch_dir>/work/<dir_hash>/<dir_hash>/output_file
    • The work-directory hash of each task is recorded in the execution trace in <launch_dir>/pipeline_execution_information/
  • Consequently, to completely delete a file and free its disk space, both references must be deleted (the one in the output directory and the one in the 'work' directory).
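To see the hard-link behaviour described above without touching real pipeline output, the sketch below recreates it in a scratch directory. It assumes bash and a filesystem that supports hard links; the directory names only mimic the work/<dir_hash>/<dir_hash> layout and are made up.

```shell
#!/usr/bin/env bash
# Scratch-directory demo of hard-link publishing; all paths are made up.
set -euo pipefail

demo=$(mktemp -d)
work_file="$demo/work/ab/cdef1234/output_file"
out_file="$demo/output/output_file"
mkdir -p "$(dirname "$work_file")" "$(dirname "$out_file")"
echo "placeholder result" > "$work_file"

# A hard link is a second directory entry for the same inode (no data copied).
ln "$work_file" "$out_file"

inode_of() { ls -i "$1" | awk '{print $1}'; }
work_inode=$(inode_of "$work_file")
out_inode=$(inode_of "$out_file")
echo "inode in work/: $work_inode, inode in output/: $out_inode"

# Deleting only the published name leaves the data reachable through work/.
rm "$out_file"
if [[ -f $work_file ]]; then
  echo "data still on disk via work/"
fi

# The space is freed only once the last remaining name is removed.
rm "$work_file"
```

On a real run, GNU find can list every name that points at a published file, e.g. find <launch_dir>/work -samefile <output_dir>/output_file.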

Pipeline description

This is a set of Python/Nextflow scripts used to run gene-based burden testing with Regenie. The pipeline is divided into two parts.

Part 1: Annotates all variants using VEP with the plugins specified by the user, and generates the input files required to run Regenie: the annotation files, mask files, and set-list files.
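For orientation, Regenie's gene-based tests read three plain-text inputs: an annotation file (variant ID, gene, annotation category per line), a set-list file (gene, chromosome, position, comma-separated variant IDs) and a mask-definition file (mask name, comma-separated categories). The awk sketch below derives a set list from an annotation file whose variant IDs have the form chr:pos:ref:alt, using the smallest variant position as the gene position. It is an illustration under those assumptions, not how the pipeline's own Python scripts build these files, and make_setlist is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical set-list builder; not part of the pipeline.
set -euo pipefail

# make_setlist ANNOT_FILE
#   ANNOT_FILE lines: <variant_id> <gene> <category>, IDs as chr:pos:ref:alt.
#   Prints one line per gene, in order of first appearance:
#   <gene> <chrom> <min_pos> <comma-separated variant IDs>
make_setlist() {
  awk '
    {
      split($1, f, ":")             # f[1]=chrom, f[2]=pos, f[3]=ref, f[4]=alt
      gene = $2
      if (!(gene in vars)) {
        order[++n] = gene           # remember first-seen order of genes
        chrom[gene] = f[1]
        pos[gene] = f[2] + 0
        vars[gene] = $1
      } else {
        vars[gene] = vars[gene] "," $1
        if (f[2] + 0 < pos[gene]) pos[gene] = f[2] + 0
      }
    }
    END {
      for (i = 1; i <= n; i++) {
        g = order[i]
        print g, chrom[g], pos[g], vars[g]
      }
    }
  ' "$1"
}
```

A matching mask-definition line such as "M1 LoF" would then group only the variants annotated as LoF in each gene's burden mask.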

The VEP Apptainer image can be built from the provided definition files. The default VEP version is 105.
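Building the image is a single apptainer build call. The wrapper below is a hypothetical convenience sketch (the function name, the file names and the DRY_RUN switch are assumptions); depending on how Apptainer is configured on your system, building from a definition file may require root or the --fakeroot option.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around "apptainer build"; not part of the pipeline.
set -euo pipefail

# build_vep_image DEF_FILE SIF_FILE
#   Builds SIF_FILE from DEF_FILE; with DRY_RUN=1 only prints the command.
build_vep_image() {
  local def=$1 sif=$2
  local cmd=(apptainer build "$sif" "$def")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, build_vep_image <vep_definition_file> vep_105.sif, followed by apptainer exec vep_105.sif vep --help to confirm that VEP starts inside the image.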

Part 2: Runs Regenie step 1 and step 2 with user-defined parameters. Step 1 expects one set of PLINK files of genotyped variants and is performed only once. Step 2 is run once per analysis, as specified by the user in the configuration files.
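The two Regenie stages map onto two command lines. The dry-run sketch below only assembles and prints them; every file name and prefix (genotyped, exome, pheno.txt, anno.txt, setlist.txt, masks.txt, fit_out, burden_out) and both block sizes are placeholders, not values used by the pipeline, which takes its parameters from the configuration files. The flags are standard Regenie options, and fit_out_pred.list is the prediction list that step 1 writes for an --out prefix of fit_out.

```shell
#!/usr/bin/env bash
# Dry-run sketch of Regenie step 1 and step 2; all paths are placeholders.
set -euo pipefail

GENO=genotyped      # PLINK .bed/.bim/.fam prefix of genotyped variants (step 1)
EXOME=exome         # PLINK prefix of the exome variants to test (step 2)
PHENO=pheno.txt
ANNO=anno.txt
SETLIST=setlist.txt
MASKS=masks.txt

# Step 1: whole-genome regression on genotyped variants, run once.
step1=(regenie --step 1 --bed "$GENO" --phenoFile "$PHENO"
       --bsize 1000 --out fit_out)

# Step 2: gene-based burden tests, run per analysis; reuses step 1 predictions.
step2=(regenie --step 2 --bed "$EXOME" --phenoFile "$PHENO"
       --pred fit_out_pred.list
       --anno-file "$ANNO" --set-list "$SETLIST" --mask-def "$MASKS"
       --bsize 200 --out burden_out)

# Default is to print the commands; set DRY_RUN=0 to actually execute them.
if [[ ${DRY_RUN:-1} == 1 ]]; then
  printf '%s\n' "${step1[*]}" "${step2[*]}"
else
  "${step1[@]}"
  "${step2[@]}"
fi
```

Keeping each command in a bash array preserves arguments that contain spaces when the commands are finally executed.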

Program requirements (paths to be specified in proj_config_template.yml):

  • Nextflow >= 23.10.0
  • Python 3.10.9
  • SQLite3 3.40.1
  • Other required programs (plink, tabix, etc.) are listed in proj_config_template.yml
  • Required Python packages are specified in the conda environment
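A small sketch for checking the Nextflow minimum version before a run. It assumes bash and a sort that supports -V (GNU coreutils does); version_ge is a hypothetical helper, and the version is parsed from the "version X.Y.Z" line that nextflow -version prints.

```shell
#!/usr/bin/env bash
# Hypothetical version check; not part of the pipeline.
set -euo pipefail

# version_ge A B: succeeds if version A >= version B (version-aware sort).
version_ge() {
  [[ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" == "$2" ]]
}

required_nf=23.10.0
if command -v nextflow >/dev/null 2>&1; then
  nf_ver=$(nextflow -version 2>/dev/null \
    | grep -oE 'version [0-9]+\.[0-9]+\.[0-9]+' | head -n 1 | cut -d' ' -f2 || true)
  if [[ -n $nf_ver ]] && version_ge "$nf_ver" "$required_nf"; then
    echo "nextflow $nf_ver is OK (>= $required_nf)"
  else
    echo "warning: nextflow ${nf_ver:-unknown} is older than $required_nf or unreadable" >&2
  fi
else
  echo "warning: nextflow not found on PATH" >&2
fi
```

The script only warns and never aborts, so it can be run safely on a machine where Nextflow is not installed yet.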

General Notes

  • Creating the conda environment on the first run takes time. As long as the conda cache is not deleted, the environment will not be rebuilt.
  • Each pipeline step is skipped on subsequent runs as long as its log file has not been deleted.
    • Note that this also means that if the log files are moved from their default location or renamed, the steps will be re-run and their output files will be overwritten.
  • Remember to set the flag bcftools_param_set_id to 0 to use the original IDs in the VCF file.
    • If it is set to 1, the annotation files will use a modified SNP ID of the form chr:pos:ref:alt (after left alignment). In that case, take extra steps to ensure this modified ID matches the variant IDs in your ExWAS input data.
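If bcftools_param_set_id is set to 1, one way to check your ExWAS input against the modified IDs is to build a lookup from original ID to chr:pos:ref:alt. The sketch below assumes the records have already been left-aligned and split into biallelic sites (for example with bcftools norm -m -any -f <reference.fa>), and that its input is the tab-separated output of bcftools query shown in the comment. The chromosome part is taken verbatim from the VCF, so it may or may not carry a "chr" prefix, and id_map is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the pipeline.
set -euo pipefail

# id_map: reads tab-separated CHROM POS ID REF ALT lines on stdin and prints
#   <original_id> TAB <chrom:pos:ref:alt> for each line.
# Example input source (normalized, biallelic VCF):
#   bcftools query -f '%CHROM\t%POS\t%ID\t%REF\t%ALT\n' normalized.vcf.gz | id_map > id_map.tsv
id_map() {
  awk -F'\t' -v OFS='\t' '{ print $3, $1 ":" $2 ":" $4 ":" $5 }'
}
```

Alternatively, bcftools annotate --set-id '%CHROM:%POS:%REF:%ALT' rewrites the IDs of your ExWAS input VCF into the same form, after which the two ID sets can be compared directly.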

Citations

If you use EXWAS_pipeline for your analysis, please cite it using the following DOI: doi

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
