scCompReg (Single-Cell Comparative Regulatory analysis) is an R package that performs coupled clustering and joint embedding of scRNA-seq and scATAC-seq data from the same sample, as well as comparative gene regulatory analysis between two conditions.
For a detailed description of the inputs and outputs of each function, please check its man page via `?function` (for example, `?sc_compreg`).
Requirements:
- Operating system: Linux or macOS
- R (>= 3.6.0)
- Bedtools (Linux)
- Homer (Linux)
Release notes:
- First release of scCompReg.
Use the following commands to install the scCompReg R package from source:
```r
# install.packages("devtools")  # run this first if devtools is not installed
library(devtools)
devtools::install_github("SUwonglab/sc-compReg", ref = "master", subdir = "R_package")
```
For a full example of using the scCompReg method, please refer to `example.R`. The necessary data have been uploaded to the `data` folder in this repository.
To download the data, make sure you have Git LFS installed. Installation instructions can be found here: https://github.com/git-lfs/git-lfs/wiki/Installation

Next, run the following line in a shell:
```bash
git lfs clone https://github.com/SUwonglab/sc-compReg.git
```
The downloaded data directory will be in `sc-compReg/data/`. Then set the following paths in R, adjusting them so that they point to the example data and prior data directories on your machine:
```r
path = './example_data/'
prior_data_path = './prior_data/'
```
To run scCompReg, run the following lines in R:
```r
library(scCompReg)

# Load the example data for the two conditions
sample1 = readRDS(paste0(path, 'sample1.rds'))
sample2 = readRDS(paste0(path, 'sample2.rds'))

# Paths to the step 3 preprocessing outputs
peak.name.intersect.dir = paste0(path, 'PeakName_intersect.txt')
peak.gene.prior.dir = paste0(path, 'peak_gene_prior_intersect.bed')

# Human motif prior and the motif target object (see step 4 of the workflow)
motif = readRDS(paste0(prior_data_path, 'motif_human.rds'))
motif.file = readRDS(paste0(path, 'motif_file.rds'))

# Run the comparative analysis: sample 1 inputs, sample 2 inputs, then motif and prior files
compreg.output = sc_compreg(sample1$O1,
                            sample1$E1,
                            sample1$O1.idx,
                            sample1$E1.idx,
                            sample1$symbol1,
                            sample1$peak.name1,
                            sample2$O2,
                            sample2$E2,
                            sample2$O2.idx,
                            sample2$E2.idx,
                            sample2$symbol2,
                            sample2$peak.name2,
                            motif$motif.name,
                            motif$motif.weight,
                            motif$match2,
                            motif.file,
                            peak.name.intersect.dir,
                            peak.gene.prior.dir,
                            sep.char = ' ')
```
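Before saving, it can help to see how many hub TFs and differential-network rows were found for each population. The sketch below uses a hypothetical helper, `summarize_compreg_output`, which is not part of scCompReg; it only assumes that the output contains `n.pops`, `hub.tf`, and `diff.net` (the fields used by the saving code that follows) and that each `hub.tf[[i]]` and `diff.net[[i]]` is a data frame or matrix.

```r
# Hypothetical helper, not part of the scCompReg package.
# Counts the rows of the hub-TF table and of the differential-network table for
# each population. Assumes each element is a data frame or a matrix.
summarize_compreg_output <- function(out) {
  pops <- seq_len(out$n.pops)
  data.frame(
    population      = pops,
    n_hub_tf        = vapply(pops, function(i) NROW(out$hub.tf[[i]]), integer(1)),
    n_diff_net_rows = vapply(pops, function(i) NROW(out$diff.net[[i]]), integer(1))
  )
}

# On the real output:
# summarize_compreg_output(compreg.output)

# On a small mock object (made-up column names, for illustration only):
mock <- list(
  n.pops   = 2,
  hub.tf   = list(data.frame(tf = c("TF_A", "TF_B")), data.frame(tf = "TF_C")),
  diff.net = list(data.frame(tf = "TF_A", target = "GENE_1"),
                  data.frame(tf = c("TF_C", "TF_C"), target = c("GENE_2", "GENE_3")))
)
print(summarize_compreg_output(mock))
```

Using `seq_len(out$n.pops)` rather than `1:out$n.pops` keeps the helper well behaved when there are no populations.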
To save the obtained output, run the lines below in R:
```r
# Write the hub TFs and the differential network of each population to tab-separated files
for (i in seq_len(compreg.output$n.pops)) {
  write.table(compreg.output$hub.tf[[i]],
              paste0(path, 'tf_', i, '.txt'),
              row.names = FALSE,
              quote = FALSE,
              sep = '\t')
  write.table(compreg.output$diff.net[[i]],
              paste0(path, 'diff_net_', i, '.txt'),
              row.names = FALSE,
              quote = FALSE,
              sep = '\t')
}
```
The entire scCompReg workflow consists of three mandatory steps and one optional step:
1. Download the `prior_data` directory from GitHub via `git clone git@github.com:SUwonglab/sc-compReg.git`.
2. Optional: obtain cluster assignments from coupled nonnegative matrix factorization.
   - Preprocess data for `cnmf`:
     - Obtain the `peak.bed` file.
     - In `sc-compReg/preprocess_data/`, run the following script:
       ```bash
       bash cnmf_process_data.sh path/to/peak.bed genome_version path/to/prior_data
       ```
       where `genome_version` is one of {`hg19`, `hg38`, `mm9`, `mm10`}, and `prior_data` is the folder downloaded in step 1.
     - Output:
       - `peak_gene_coupling_matrix.txt`
   - After loading `peak.name` and `symbol`, convert `peak_gene_coupling_matrix.txt` to `D`, the coupling matrix, using the following code in R:
     ```r
     D <- cnmf_load_coupling_matrix('peak_gene_coupling_matrix.txt', peak.name, symbol)
     ```
   - Run `cnmf` to get the cluster labels for sample 1 and sample 2. The cluster labels should be passed to `sc_compreg` as `O1.idx`, `E1.idx`, `O2.idx`, and `E2.idx`. For an example of how to run `cnmf`, please refer to `cnmf_example.R`.
   - Note: it is not required to obtain cluster assignments with the coupled nonnegative matrix factorization workflow. What `scCompReg` needs is a consistent set of cluster assignments for scRNA-seq and scATAC-seq.
3. Process data for `scCompReg`:
   - Obtain `peak_name1.txt` and `peak_name2.txt`, files containing the peak names of sample 1 and sample 2, respectively, in BED format (three tab-separated columns: chr, start, end).
   - In `sc-compReg/preprocess_data/`, run the following script:
     ```bash
     bash sc_compreg_process_data_.sh path/to/peak_name1.txt path/to/peak_name2.txt genome_version path/to/prior_data
     ```
     where `genome_version` is one of {`hg19`, `hg38`, `mm9`, `mm10`}, and `prior_data` is the folder downloaded in step 1.
   - Output:
     - `PeakName_intersect.txt`
     - `peak_gene_prior_intersect.bed`
     - `MotifTarget.txt`
4. Follow the tutorial on the `sc_compreg` function.
   - The necessary inputs to `sc_compreg` are:
     - consistent cluster assignments in scRNA-seq and scATAC-seq (obtained from coupled nonnegative matrix factorization or elsewhere)
     - log2-transformed gene expression matrices of samples 1 and 2
     - log2-transformed chromatin accessibility matrices of samples 1 and 2
     - gene symbols and peak names of samples 1 and 2
   - Input `peak.name.intersect.dir` is the path to the `PeakName_intersect.txt` file generated in step 3.
   - Input `peak.gene.prior.dir` is the path to the `peak_gene_prior_intersect.bed` file generated in step 3.
   - Load the corresponding motif file in R, for human via
     ```r
     motif = readRDS('prior_data/motif_human.rds')
     ```
     or for mouse via
     ```r
     motif = readRDS('prior_data/motif_mouse.rds')
     ```
   - Load `motif.file` in R via
     ```r
     motif.file = mfbs_load(motif.target.dir)
     ```
     where `motif.target.dir` is the path to the `MotifTarget.txt` file generated in step 3.
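Step 3 expects the peak names of each sample as a BED file. If your peak names encode coordinates in strings such as `chr1:100-200` or `chr1_100_200` (an assumption; check the format of your own peak names, for example `sample1$peak.name1`), a minimal base-R sketch for writing `peak_name1.txt` could look like the following. `write_peak_bed` is a hypothetical helper, not part of scCompReg, and it writes the coordinates exactly as they appear in the names, with no coordinate-system conversion.

```r
# Hypothetical helper, not part of the scCompReg package.
# Assumes peak names look like "chr:start-end" or "chr_start_end"; the greedy
# first group keeps underscores inside contig names (e.g. "chrUn_gl000220").
write_peak_bed <- function(peak.names, out.file) {
  pattern <- "^(.+)[:_-]([0-9]+)[_-]([0-9]+)$"
  ok <- grepl(pattern, peak.names)
  if (!all(ok)) {
    stop("Unrecognized peak name(s): ",
         paste(head(peak.names[!ok], 3), collapse = ", "))
  }
  bed <- data.frame(
    chr   = sub(pattern, "\\1", peak.names),
    start = as.integer(sub(pattern, "\\2", peak.names)),
    end   = as.integer(sub(pattern, "\\3", peak.names)),
    stringsAsFactors = FALSE  # R < 4.0 would otherwise turn chr into a factor
  )
  # Three tab-separated columns (chr, start, end), no header, no quotes
  write.table(bed, out.file, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(bed)
}

# Example usage (assuming the peak-name format above):
# write_peak_bed(sample1$peak.name1, 'peak_name1.txt')
# write_peak_bed(sample2$peak.name2, 'peak_name2.txt')
```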
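The step 4 inputs can be prepared and sanity-checked with a few lines of base R. The sketch below defines three hypothetical helpers, none of which is part of scCompReg. `log2_transform` applies log2(x + 1); the pseudo-count of 1 is an assumption, used so that zeros map to 0 rather than `-Inf`. `check_cluster_labels` verifies that there is one label per cell and that both modalities use the same set of cluster labels, assuming cells are the matrix columns, `O` is the scATAC-seq matrix, and `E` is the scRNA-seq matrix; matching label sets are necessary, but not sufficient, for consistent assignments. `motif_rds_for_genome` picks `motif_human.rds` for hg19/hg38 and `motif_mouse.rds` for mm9/mm10.

```r
# Hypothetical helpers, not part of the scCompReg package.

# log2(x + 1) transform of a numeric matrix; the pseudo-count is an assumption
# so that zero counts map to 0 instead of -Inf.
log2_transform <- function(counts, pseudo.count = 1) {
  log2(counts + pseudo.count)
}

# Sanity check for cluster labels. Assumes cells are columns, O is the
# scATAC-seq matrix and E is the scRNA-seq matrix. Returns TRUE if both
# modalities use the same set of labels (necessary, not sufficient).
check_cluster_labels <- function(O, E, O.idx, E.idx) {
  if (length(O.idx) != ncol(O) || length(E.idx) != ncol(E)) {
    stop("Each cell (column) needs exactly one cluster label.")
  }
  same.sets <- setequal(O.idx, E.idx)
  if (!same.sets) {
    only.one <- setdiff(union(O.idx, E.idx), intersect(O.idx, E.idx))
    warning("Cluster labels found in only one modality: ",
            paste(only.one, collapse = ", "))
  }
  same.sets
}

# Choose the prior motif file matching the genome version used in step 3.
motif_rds_for_genome <- function(genome_version, prior_data_path = 'prior_data/') {
  species <- switch(genome_version,
                    hg19 = , hg38 = 'human',
                    mm9 = , mm10 = 'mouse',
                    stop("genome_version must be one of hg19, hg38, mm9, mm10"))
  paste0(prior_data_path, 'motif_', species, '.rds')
}

# Example usage with the objects from the sc_compreg example above:
# check_cluster_labels(sample1$O1, sample1$E1, sample1$O1.idx, sample1$E1.idx)
# motif = readRDS(motif_rds_for_genome('hg38'))
```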
scCompReg provides access to the following functions:
| Function | Description |
|---|---|
| `sc_compreg` | Performs single-cell comparative regulatory analysis based on scRNA-seq and scATAC-seq data from two different conditions. |
| `mfbs_load` | Efficiently loads the motif target file (`MotifTarget.txt`) and returns an R list of the loaded objects. |
[1] Zhana Duren, Wenhui Sophia Lu, Joseph G. Arthur, Preyas Shah, Jingxue Xin, Francesca Meschi, Miranda Lin Li, Corey M. Nemec, Yifeng Yin, and Wing Hung Wong. Sc-compReg enables the comparison of gene regulatory networks between conditions using single-cell data.

