Running RUM
This page covers some common use cases for running RUM. Note that you
can get detailed help by running rum_runner help. All these examples
assume you have installed RUM system-wide or have added the bin
directory that contains rum_runner to your PATH.
RUM will create several output files and many intermediate temporary
files. When you run rum_runner, you will need to specify an output
directory for these files. Once you've started a job running in an
output directory, you can run other RUM operations like checking the
job status or killing the job just by specifying the output directory
(more details below). You can only use an output directory to contain
the results of mapping one set of input reads. If you have multiple
samples and you want to map them separately, please use a different
output directory for each one, or else the output file names will
clash.
Let's assume your data is in a directory called "data/Lane1" and your indexes are in "$RUM_INDEXES". For single-end data, you would run the command shown below. The reads files can be FASTA or FASTQ.
rum_runner align \
--index $RUM_INDEXES/ORGANISM \
--output data/Lane1 \
--name Lane1 \
--chunks 1 \
data/Lane1/reads.txt
For paired-end data, run it as follows:
rum_runner align \
--index $RUM_INDEXES/ORGANISM \
--output data/Lane1 \
--name Lane1 \
--chunks 1 \
data/Lane1/forwardreads.txt data/Lane1/reversereads.txt
Change "ORGANISM" to the name of your organism. Note that you'll need an index corresponding to that organism. You can find more information about installing indexes here.
This will run the entire alignment job in one piece. To parallelize,
you must be on a machine with multiple cores or have access to
multiple machines. To run in parallel on a single machine, change
--chunks 1 to --chunks N, where N is the number of chunks to split the
input into. If you are on a cluster, please take a look at these
instructions for running a job on a
cluster.
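As a rough illustration of choosing N on a single machine, the sketch below sets the chunk count to the number of online cores. One chunk per core is only an assumed starting point, not a RUM recommendation, and pick_chunks is a hypothetical helper name. The final line only prints the command it would build; it does not run RUM.

```shell
# Hypothetical helper: pick a chunk count from the number of online cores.
# getconf _NPROCESSORS_ONLN is available on most Linux and macOS systems;
# if it is missing or returns something unexpected, fall back to 1.
pick_chunks() {
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || cores=1
    case "$cores" in
        ''|*[!0-9]*) cores=1 ;;   # empty or non-numeric output
    esac
    [ "$cores" -ge 1 ] || cores=1
    echo "$cores"
}

CHUNKS=$(pick_chunks)
# Print (do not run) the single-end command with the chosen chunk count.
echo "rum_runner align --index \$RUM_INDEXES/ORGANISM --output data/Lane1 --name Lane1 --chunks $CHUNKS data/Lane1/reads.txt"
```

You may want fewer chunks than cores if other jobs share the machine or memory is limited.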
Unless you have processors or nodes to spare, wait until one lane has finished completely before starting another. Each lane can take many hours.
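Mapping lanes one at a time, each into its own output directory, could be scripted roughly as follows. The lane names, the data/LANE/reads.txt layout, the /path/to/indexes placeholder and the map_lanes helper are all assumptions for illustration. The sketch also assumes that rum_runner align does not return until the job has finished, which you should confirm with rum_runner help align. By default it only prints the commands.

```shell
# Sketch: map several lanes sequentially, one output directory per lane.
# RUN defaults to "echo", so commands are printed rather than executed;
# set RUN= (empty) in the environment to really run them.
RUN=${RUN-echo}
# Placeholder index location; ORGANISM must be replaced with a real index name.
INDEX="${RUM_INDEXES:-/path/to/indexes}/ORGANISM"

map_lanes() {
    for lane in "$@"; do
        # A separate --output directory per lane keeps output file names from clashing.
        $RUN rum_runner align \
            --index "$INDEX" \
            --output "data/$lane" \
            --name "$lane" \
            --chunks 1 \
            "data/$lane/reads.txt" || return 1   # stop at the first failure
    done
}

map_lanes Lane1 Lane2
```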
The --name Lane1 argument sets the job name, which you might want to
change to something more descriptive. The name may contain only
letters, numbers, dashes, underscores, and periods.
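If you generate job names in a script, a quick check like the one below can catch disallowed characters before a job is submitted. valid_job_name is a hypothetical helper, not part of RUM, and rejecting the empty string is an assumption, since the rule above does not mention it.

```shell
# Hypothetical helper: succeed only if the name consists of letters,
# numbers, dashes, underscores, and periods.
valid_job_name() {
    case "$1" in
        '') return 1 ;;                  # empty name rejected (assumption)
        *[!A-Za-z0-9._-]*) return 1 ;;   # contains some other character
        *) return 0 ;;
    esac
}

# Try a few candidate names.
for name in Lane1 liver_rep-2.v1 "Lane 1" "lane/1"; do
    if valid_job_name "$name"; then
        echo "ok:  $name"
    else
        echo "bad: $name"
    fi
done
```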
After you start a job, you can check on its status by running
rum_runner status -o OUTPUT_DIR
This will print a short report showing which steps of the pipeline
have been completed. Please see rum_runner help status for more
information.
If you need to stop a job and restart it later, use
rum_runner stop -o OUTPUT_DIR
Please see rum_runner help stop for more information.
If you have a job that crashed or that you had to stop for some reason, you should be able to resume it from where it stopped with:
rum_runner resume -o OUTPUT_DIR
If you want to restart it from an earlier step, you can use the --from-step option:
rum_runner resume -o OUTPUT_DIR --from-step 17
You can find the step numbers for your job using rum_runner status. Note that unless you ran RUM with the --no-clean option, it
may have to back up to an earlier step if intermediate files
have been cleaned up.
If you want to completely abort a job, you can use
rum_runner kill -o OUTPUT_DIR
This will stop the job and remove all the output files associated with
it. DO NOT run this if you want to restart the job later from where it
left off; use rum_runner stop instead in that case.
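Because kill is destructive, you might wrap it in a small confirmation prompt. The sketch below is one way to do that. confirm_kill is a hypothetical helper, not a RUM command, and by default it only prints the kill command instead of running it.

```shell
# Hypothetical wrapper: ask for confirmation before "rum_runner kill",
# since kill removes the job's output files and the job cannot be resumed.
# RUN defaults to "echo", so the command is only printed; set RUN= to execute it.
RUN=${RUN-echo}

confirm_kill() {
    dir=$1
    printf 'Kill job in %s and remove its output? Type "yes" to continue: ' "$dir" >&2
    read -r answer || answer=
    if [ "$answer" = yes ]; then
        $RUN rum_runner kill -o "$dir"
    else
        # Point the user at the non-destructive alternative.
        echo "Not killed. To pause instead: rum_runner stop -o $dir" >&2
        return 1
    fi
}
```

Call it as confirm_kill OUTPUT_DIR. Any answer other than yes leaves the job untouched.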
By default, RUM should remove all of the intermediate and temporary files it creates. However, if it encounters any errors during the run, it may leave some files around in order to help with debugging. To clean those files up, you can run
rum_runner clean -o OUTPUT_DIR
Please see rum_runner help clean for more information.
Once you have multiple lanes mapped, there is a script called 'featurequant2geneprofiles.pl' in the scripts directory that will create one spreadsheet of the normalized intensities with rows=genes and columns=samples. Run it without parameters to get the usage:
perl rum/bin/featurequant2geneprofiles.pl
Next: RUM output files