RCADEEM uses a hidden Markov model (HMM) to represent multiple, alternative DNA motifs, each corresponding to the binding preference of a zinc finger array.
- Unix-compatible OS.
- R libraries: assertr, cowplot, data.table, ggh4x, ggplot2, ggseqlogo, glmnet, gplots, gridExtra, matrixStats, optparse, patchwork, pROC, randomForest, readr, reshape, stringdist, stringi, stringr, circlize, and ComplexHeatmap.
    git clone https://github.com/csglab/RCADEEM.git

After cloning:
- You can add the line `export PATH=${cloning_directory}/RCADEEM:$PATH` to your `.bashrc` file.
- `cd` to the repository directory and run the `make` command.
- Change the value of line 91 of the `RCADEEM` script to the path to the executable MEME files (`fasta-center` and `fasta-dinucleotide-shuffle`). Alternatively, you can provide the path via the `--meme_lib_exec_meme_dir` argument.
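
Before pointing RCADEEM at a MEME directory, it can help to confirm that the directory really contains both executables named above. The function below is a minimal sketch of such a check; `check_meme_dir` is a hypothetical helper and not part of RCADEEM, and the directory in the usage comment is a placeholder.

```shell
# Hypothetical helper (not part of RCADEEM): succeed only if the given
# directory contains both MEME executables mentioned in the setup steps.
check_meme_dir() {
    meme_dir_to_check=$1
    for meme_exe in fasta-center fasta-dinucleotide-shuffle; do
        if [ ! -x "${meme_dir_to_check}/${meme_exe}" ]; then
            echo "missing or not executable: ${meme_dir_to_check}/${meme_exe}" >&2
            return 1
        fi
    done
    return 0
}

# Usage (placeholder path, adjust to your MEME installation):
#   MEME_DIR=/path/to/meme/executables
#   check_meme_dir "$MEME_DIR" && echo "pass $MEME_DIR via --meme_lib_exec_meme_dir"
```

If the check fails, the message on standard error names the first file that is missing or not executable.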
The demo script is `RCADEEM_demo.sh`, and the input files are in `./data/demo_CTCF/IN/`. The demo run time is ~2 hours for task RCADEEM and ~10 minutes for task HEATMAP.
    ## Using fasta sequences directly.
    RCADEEM --task RCADEEM,HEATMAP \
            --job_ID CTCF_demo_from_fasta \
            --out_dir ./data/demo_CTCF/OUT \
            --C2H2_ZFP_fasta ./data/demo_CTCF/IN/CTCF_protein_sequence.fa \
            --target_fasta ./data/demo_CTCF/IN/GSM1407629.top500summits.500bp.fasta

    ## Using the bed file and the genome fasta (requires --chr_sizes).
    RCADEEM --task RCADEEM,HEATMAP \
            --job_ID CTCF_demo_from_bed \
            --out_dir ./data/demo_CTCF/OUT \
            --C2H2_ZFP_fasta ./data/demo_CTCF/IN/CTCF_protein_sequence.fa \
            --target_bed ./data/demo_CTCF/IN/target_CTCF_coef_br_top_100.bed \
            --genome_fasta ${GENOME} \
            --chr_sizes ${CHR_SIZES}

You can download the demo output from: https://usegalaxy.org/u/ahcorcha/h/rcadeemdemoout
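
The BED-based run takes `${GENOME}` and `${CHR_SIZES}` as shell variables that you define yourself. If you have the genome FASTA but no chromosome sizes file, the sketch below derives one with `awk`. It assumes that `--chr_sizes` expects the common two-column, tab-separated layout (sequence name, length in bases). The source does not state the format, so check it against your RCADEEM version. `fasta_chr_sizes` is a hypothetical helper, and the paths in the usage comment are placeholders.

```shell
# Hypothetical helper (not part of RCADEEM): print "<name><TAB><length>"
# for every sequence in a FASTA file.
fasta_chr_sizes() {
    awk '
        /^>/ {
            # A new header: emit the previous record, then reset the counter.
            if (name != "") print name "\t" len
            name = substr($1, 2)   # first word of the header, without ">"
            len = 0
            next
        }
        { len += length($0) }      # sequence line: add its length
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Usage (placeholder paths):
#   GENOME=/path/to/genome.fa
#   CHR_SIZES=/path/to/genome.chrom.sizes
#   fasta_chr_sizes "$GENOME" > "$CHR_SIZES"
```

The helper reads the whole FASTA once and needs no index file, which keeps it free of extra dependencies.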
- `--C2H2_ZFP_fasta`: FASTA file of the C2H2-ZF protein sequence.
- `--target_fasta`: FASTA file with the sequences of interest; mutually exclusive with `--target_bed` and `--genome_fasta`.
- `--target_bed`: BED file specifying the sequences of interest; requires the `--genome_fasta` argument.
- `--genome_fasta`: FASTA file of the reference genome; requires `--chr_sizes`.
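
The argument rules above can be checked before launching a long run. The function below is a minimal sketch of such a pre-flight check; `rcadeem_input_mode` is a hypothetical wrapper-side helper, not something RCADEEM provides. It also assumes that exactly one of `--target_fasta` or `--target_bed` must be given, which the descriptions imply but do not state.

```shell
# Hypothetical pre-flight check (not part of RCADEEM).
# Arguments, in order: target_fasta target_bed genome_fasta chr_sizes.
# Pass "" for any argument you do not intend to use.
# Prints "fasta" or "bed" on success; returns 1 with a message otherwise.
rcadeem_input_mode() {
    t_fasta=$1; t_bed=$2; g_fasta=$3; c_sizes=$4
    if [ -n "$t_fasta" ]; then
        # --target_fasta excludes --target_bed and --genome_fasta
        if [ -n "$t_bed" ] || [ -n "$g_fasta" ]; then
            echo "--target_fasta cannot be combined with --target_bed or --genome_fasta" >&2
            return 1
        fi
        echo fasta
    elif [ -n "$t_bed" ]; then
        # --target_bed needs --genome_fasta, which in turn needs --chr_sizes
        if [ -z "$g_fasta" ] || [ -z "$c_sizes" ]; then
            echo "--target_bed requires --genome_fasta, which requires --chr_sizes" >&2
            return 1
        fi
        echo bed
    else
        echo "provide either --target_fasta or --target_bed" >&2
        return 1
    fi
}
```

For example, calling `rcadeem_input_mode "" peaks.bed genome.fa ""` fails because the chromosome sizes file is missing (the file names here are placeholders).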
Jolma, A., Hernandez-Corchado, A., Yang, A. W. H., Fathi, A., Laverty, K. U., Brechalov, A., Razavi, R., Albu, M., Zheng, H., Kulakovskiy, I. V., Najafabadi, H. S., & Hughes, T. R. (2024). GHT-SELEX demonstrates unexpectedly high intrinsic sequence specificity and complex DNA binding of many human transcription factors. BioRxiv, 2024.11.11.618478. https://doi.org/10.1101/2024.11.11.618478
Heatmaps are generated with the ComplexHeatmap and circlize packages. If you use them in published research, please cite:
- Gu, Z. Complex Heatmap Visualization. iMeta 2022. or
- Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 2016. and
- Gu, Z. circlize implements and enhances circular visualization in R. Bioinformatics 2014.