TERate is a computational pipeline to measure transcription elongation rates (TERs) with 4sUDRB-Seq.
Measure transcription elongation rates (TERs) with 4sUDRB-Seq.
Different time points should be calculated separately.
BAM file was originally mapped form TopHat, Bowtie, Bowtie2 or BWA.
##Prerequisites
###Software / Package
To obtain average reads (Hits) distribution of 4sUDRB-Seq, BAM format file was converted to bedgraph format file firstly. Following BAM format file was illustrated by the case of TopHat (v2.0.9) results of 4sUDRB-Seq 10 minute sample. All C++ and shell scripts were marked 'bold italic'.
-
- Please add the TERate directory to your $PATH first or copy all scripts to your current work directory ('TERate_output/').
export PATH="~/TERate-master/:$PATH";
or copy all scripts to your current work directory ('TERate_output/')
mkdir TERate_output
cd TERate_output/
cp ~/TERate-master/bam2bedgraph ../TERate_output/
cp ~/TERate-master/gene_to_window ../TERate_output/
cp ~/TERate-master/split_bedgraph.sh ../TERate_output/
cp ~/TERate-master/split_refFlat.sh ../TERate_output/
cp ~/TERate-master/bedgraph_to_hits ../TERate_output/
cp ~/TERate-master/TER_calculate ../TERate_output/-
- BAM to bedgraph with 'bam2bedgraph' script from bedtools. And select longest isoform of gene, further to split gene annotation file refFlat_hg38.txt (Download form UCSC Genome Browser) into 300 bp bins/windows with 'gee_to_window' script.
./bam2bedgraph accepted_hits.bam > accepted_hits.bedgraph
perl -alne '$,="\t";print (@F[0,1],$F[5]-$F[4])' refFlat_hg38.txt |sort -k1,1 -k3,3gr |sort -k1,1 -u |cut -f 2 > refFlat_hg38_longiso.eid
perl select_ID.pl refFlat_hg38.txt refFlat_hg38_longiso.eid 2 > refFlat.txt
./gene_to_window refFlat.txt 300 > refFlat_bins.txt-
- To reduce time consumption of TERate, proposal for split 'bedgraph file' (accepted_hits.bedgraph) and 'refFlat file' (refFlat_bins.txt) into each chromosome with 'split_bedgraph.sh' and 'split_refFlat.sh' scripts. Create split work directory 'split/' and split bedgraph and refFlat into 300 bp bins/windows with 'nohup' for backstage running.
mkdir split
cd split/
sh ../split_bedgraph.sh ../accepted_hits.bedgraph
sh ../split_refFlat.sh ../refFlat_bins.txt-
- After 'split_refFlat.sh' and 'split_bedgraph.sh' finished, then using 'bedgraph_to_hits' to calculate Hits for each bins/windows (~ 3-4 hr time consumption). Calculate each bin reads number (Hits) with 'nohup' for backstage running.
ls |grep "bin" |awk -F"_" '{print "nohup ../bedgraph_to_hits "$1"_bedgraph.txt "$1"_bin.txt > "$1"_hits.txt &"}' |sh-
- When script 'bedgraph_to_hits' finished, return to 'TERate_output/' directory to combine all hit results and sort with gene name.
cd ../
cat split/chr1_hits.txt split/chr2_hits.txt split/chr3_hits.txt split/chr4_hits.txt split/chr5_hits.txt split/chr6_hits.txt split/chr7_hits.txt split/chr8_hits.txt split/chr9_hits.txt split/chr10_hits.txt split/chr11_hits.txt split/chr12_hits.txt split/chr13_hits.txt split/chr14_hits.txt split/chr15_hits.txt split/chr16_hits.txt split/chr17_hits.txt split/chr18_hits.txt split/chr19_hits.txt split/chr20_hits.txt split/chr21_hits.txt split/chr22_hits.txt split/chrX_hits.txt split/chrY_hits.txt split/chrM_hits.txt > combine_hits.txt
sort -k4,4 -k1,1 -k2,2n -k3,3nr combine_hits.txt > sorted_hits.txt-
- Calculate transcription elongation rate for each gene with 'calculate_TER' script.
./calculate_TER sorted_hits.txt 10 300 |sort -k1,1 -k4,4nr |awk '{a[$1,++b[$1]]=$0}END{for(i in b)print a[i,1]}' > TERate_output.txt'TERate_output.txt' is the result of TERate pipeline.
Gene annotation file refFlat.txt is in the format (Gene Predictions and RefSeq Genes with Gene Names) below (see details in the example file).
| Field | Description |
|---|---|
| geneName | Name of gene |
| isoformName | Name of isoform |
| chrom | Reference sequence |
| strand | + or - for strand |
| txStart | Transcription start position |
| txEnd | Transcription end position |
| cdsStart | Coding region start |
| cdsEnd | Coding region end |
| exonCount | Number of exons |
| exonStarts | Exon start positions |
| exonEnds | Exon end positions |
See details in the example file.
| Field | Description |
|---|---|
| geneName | Name of gene |
| isoformName | Name of isoform |
| strand | + or - for strand |
| TER | Transcription elongation rate (bp/m) |
- [GCC] gcc version 4.6.1
- [nohup] GNU GPL version 3
- [bedtools] (https://github.com/arq5x/bedtools2) v2.19.0
Copyright (C) 2016 YangLab. See the LICENSE file for license rights and limitations (MIT).