csglab/RCADEEM

RCADEEM: Recognition Code-Assisted Discovery of regulatory Elements by Expectation-Maximization

RCADEEM uses a hidden Markov model (HMM) to represent multiple, alternative DNA motifs, each corresponding to the binding preference of a zinc finger array.

Requirements

Installation

git clone https://github.com/csglab/RCADEEM.git

After cloning:

  1. Optionally, add the line export PATH=${cloning_directory}/RCADEEM:$PATH to your .bashrc file so that RCADEEM can be run from any directory.
  2. cd into the repository directory and run make.
  3. Set line 91 of the RCADEEM script to the directory containing the MEME executables fasta-center and fasta-dinucleotide-shuffle. Alternatively, pass that directory with the --meme_lib_exec_meme_dir argument.
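The installation steps above can be scripted roughly as follows. This is a minimal sketch, not part of RCADEEM: the helper names add_to_path_rc and install_rcadeem are hypothetical, and the script assumes git, make and a writable ~/.bashrc. Step 3 is left out; the MEME directory can instead be passed at run time with --meme_lib_exec_meme_dir.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching the installation steps; not shipped with RCADEEM.
set -euo pipefail

# add_to_path_rc DIR RCFILE
# Appends "export PATH=DIR:$PATH" to RCFILE unless that exact line is already
# there, so running the installer twice does not duplicate the entry.
add_to_path_rc() {
  local dir="$1" rc="$2"
  local line="export PATH=${dir}:\$PATH"
  touch "$rc"
  if ! grep -qxF -- "$line" "$rc"; then
    printf '%s\n' "$line" >> "$rc"
  fi
}

# install_rcadeem CLONING_DIRECTORY
# Clones the repository into CLONING_DIRECTORY/RCADEEM, builds it with make,
# and adds the clone to PATH via ~/.bashrc (step 1 of the list above).
install_rcadeem() {
  local cloning_directory="$1"
  git clone https://github.com/csglab/RCADEEM.git "${cloning_directory}/RCADEEM"
  (cd "${cloning_directory}/RCADEEM" && make)
  add_to_path_rc "${cloning_directory}/RCADEEM" "${HOME}/.bashrc"
}
```

For example, install_rcadeem "$HOME/tools" followed by source ~/.bashrc makes RCADEEM available in the current shell. The grep -qxF check matches the whole line literally, so the $PATH text in the line is not interpreted as a pattern.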

Demo

The demo script is RCADEEM_demo.sh and the input files are in ./data/demo_CTCF/IN/. The demo takes ~2 hours for the RCADEEM task and ~10 minutes for the HEATMAP task.

## Using FASTA sequences directly.
RCADEEM --task RCADEEM,HEATMAP \
          --job_ID CTCF_demo_from_fasta \
          --out_dir ./data/demo_CTCF/OUT \
          --C2H2_ZFP_fasta ./data/demo_CTCF/IN/CTCF_protein_sequence.fa \
          --target_fasta ./data/demo_CTCF/IN/GSM1407629.top500summits.500bp.fasta
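
## Using the BED file and the genome FASTA (requires --chr_sizes).
## ${GENOME} and ${CHR_SIZES} must point to the reference genome FASTA and its chromosome sizes file.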
RCADEEM --task RCADEEM,HEATMAP \
          --job_ID CTCF_demo_from_bed \
          --out_dir ./data/demo_CTCF/OUT \
          --C2H2_ZFP_fasta ./data/demo_CTCF/IN/CTCF_protein_sequence.fa \
          --target_bed ./data/demo_CTCF/IN/target_CTCF_coef_br_top_100.bed \
          --genome_fasta ${GENOME} \
          --chr_sizes ${CHR_SIZES}
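The BED-based run needs a chromosome sizes file in ${CHR_SIZES}. Assuming --chr_sizes expects the common two-column, tab-separated layout (sequence name, length in bp, as in UCSC chrom.sizes files), which this README does not specify, such a file can be derived from the genome FASTA with awk. The function name fasta_to_chr_sizes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "<sequence name><TAB><length>" for every record
# in a FASTA file. Assumes --chr_sizes wants this two-column layout.
fasta_to_chr_sizes() {
  awk '
    /^>/ {
      # Emit the previous record before starting a new one.
      if (name != "") print name "\t" len
      name = substr($1, 2)   # header up to the first whitespace, without the leading >
      len = 0
      next
    }
    {
      gsub(/[ \t\r]/, "")    # ignore stray whitespace and Windows line endings
      len += length($0)
    }
    END { if (name != "") print name "\t" len }
  ' "$1"
}
```

For example, fasta_to_chr_sizes "${GENOME}" > "${CHR_SIZES}". If samtools is available, the first two columns of the .fai index written by samtools faidx (cut -f1,2 genome.fa.fai) give the same name and length pairs.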

You can download the demo output from: https://usegalaxy.org/u/ahcorcha/h/rcadeemdemoout

Input files

  • --C2H2_ZFP_fasta FASTA file with the sequence of the C2H2 zinc finger (C2H2-ZF) protein.
  • --target_fasta FASTA file with the sequences of interest; mutually exclusive with --target_bed and --genome_fasta.
  • --target_bed BED file specifying the regions of interest; requires --genome_fasta.
  • --genome_fasta FASTA file of the reference genome; requires --chr_sizes (the chromosome sizes of the same genome).
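The dependencies between the input options above can be expressed as a pre-flight check. This is a hypothetical wrapper (check_rcadeem_inputs), not RCADEEM's own argument parser. It additionally assumes that --C2H2_ZFP_fasta is always required and that one of --target_fasta or --target_bed must be given; both demo commands follow this, but the README does not state it explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for RCADEEM input options; not part of RCADEEM.
# Returns 0 if the combination looks valid, 1 otherwise (reasons go to stderr).
check_rcadeem_inputs() {
  local zfp="" tfa="" tbed="" gfa="" sizes="" err=0
  while [ $# -gt 0 ]; do
    case "$1" in
      --C2H2_ZFP_fasta|--target_fasta|--target_bed|--genome_fasta|--chr_sizes)
        # Each of these options takes a value.
        [ $# -ge 2 ] || { echo "error: $1 needs a value" >&2; return 1; }
        case "$1" in
          --C2H2_ZFP_fasta) zfp="$2" ;;
          --target_fasta)   tfa="$2" ;;
          --target_bed)     tbed="$2" ;;
          --genome_fasta)   gfa="$2" ;;
          --chr_sizes)      sizes="$2" ;;
        esac
        shift 2 ;;
      *) shift ;;  # other options (--task, --job_ID, --out_dir, ...) are not checked
    esac
  done

  # Assumed: the zinc finger protein sequence is always needed.
  if [ -z "$zfp" ]; then echo "error: --C2H2_ZFP_fasta is required" >&2; err=1; fi
  # --target_fasta is mutually exclusive with --target_bed and --genome_fasta.
  if [ -n "$tfa" ] && { [ -n "$tbed" ] || [ -n "$gfa" ]; }; then
    echo "error: --target_fasta cannot be combined with --target_bed or --genome_fasta" >&2; err=1
  fi
  # Assumed: one of the two target inputs must be given.
  if [ -z "$tfa" ] && [ -z "$tbed" ]; then
    echo "error: provide --target_fasta or --target_bed" >&2; err=1
  fi
  # --target_bed requires --genome_fasta, which in turn requires --chr_sizes.
  if [ -n "$tbed" ] && [ -z "$gfa" ]; then echo "error: --target_bed requires --genome_fasta" >&2; err=1; fi
  if [ -n "$gfa" ] && [ -z "$sizes" ]; then echo "error: --genome_fasta requires --chr_sizes" >&2; err=1; fi
  return "$err"
}
```

In a wrapper script, check_rcadeem_inputs "$@" && RCADEEM "$@" would stop before a long run if an option combination is inconsistent. All errors are reported before returning, rather than only the first one.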

Citation

Jolma, A., Hernandez-Corchado, A., Yang, A. W. H., Fathi, A., Laverty, K. U., Brechalov, A., Razavi, R., Albu, M., Zheng, H., Kulakovskiy, I. V., Najafabadi, H. S., & Hughes, T. R. (2024). GHT-SELEX demonstrates unexpectedly high intrinsic sequence specificity and complex DNA binding of many human transcription factors. bioRxiv, 2024.11.11.618478. https://doi.org/10.1101/2024.11.11.618478

Heatmaps are generated with the ComplexHeatmap and circlize R packages. If you use them in published research, please cite one of the ComplexHeatmap papers:

  • Gu, Z. Complex Heatmap Visualization. iMeta 2022.
  • Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 2016.

and the circlize paper:

  • Gu, Z. circlize implements and enhances circular visualization in R. Bioinformatics 2014.
