
Release Notes 2.0.3

Mike DeLaurentis edited this page Sep 28, 2012 · 12 revisions

Note: If you are a RUM 1.x user and you haven't read the release notes for RUM 2.0.1, please read them first. RUM 2.0.1 introduced many changes relative to RUM 1.x.


The major changes between RUM 2.0.2 and 2.0.3 fix several bugs, improve usability, and improve performance for some jobs. Details are given below.

Bug fixes

The most notable bug fixed in this release is #93. In the junction *.bed files, known and novel junctions were drawn in the same color, when they should have been shown in two different colors. This is fixed in 2.0.3.

Usability changes

We have added some actions and changed the meaning of some existing actions to improve usability.

  • rum_runner stop now stops a running job but leaves it intact
  • rum_runner kill stops a running job and removes all of its output files
  • rum_runner init initializes a new job but does not run it
  • rum_runner align now starts a new job from scratch but does not restart an existing job
  • rum_runner resume will resume a previously run or initialized job from wherever it left off

Now if you run rum_runner align on a directory that already has a RUM job initialized in it, it will refuse to run the job. If you want to resume an existing job that crashed or was stopped, use rum_runner resume. If you want to abort the job and start from scratch, use rum_runner kill and then rum_runner align again.
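Taken together, these rules form a small state machine. The Python sketch below models them only to make the allowed transitions explicit; it is not RUM's code. The state names, the hypothetical apply_action function, and the decision to refuse init in a directory that already holds a job are assumptions made for illustration.

```python
from enum import Enum


class JobState(Enum):
    """Hypothetical states of a RUM job directory (illustration only)."""
    NONE = "none"                # no job in the directory
    INITIALIZED = "initialized"  # created by init, not yet run
    RUNNING = "running"
    STOPPED = "stopped"          # stopped or crashed; output left intact


def apply_action(state: JobState, action: str) -> JobState:
    """Return the state after a rum_runner action, or raise ValueError
    if the action would be refused under the rules described above."""
    if action == "init":
        # Assumption: init, like align, only works on an empty directory.
        if state is not JobState.NONE:
            raise ValueError("a job is already initialized here")
        return JobState.INITIALIZED
    if action == "align":
        # align starts a brand-new job; it never restarts an existing one.
        if state is not JobState.NONE:
            raise ValueError("job exists: use 'resume', or 'kill' then 'align'")
        return JobState.RUNNING
    if action == "resume":
        # resume continues an initialized, stopped, or crashed job.
        if state in (JobState.INITIALIZED, JobState.STOPPED):
            return JobState.RUNNING
        raise ValueError("nothing to resume")
    if action == "stop":
        # stop halts a running job but keeps its files.
        return JobState.STOPPED if state is JobState.RUNNING else state
    if action == "kill":
        # kill stops the job and removes all of its output.
        return JobState.NONE
    raise ValueError(f"unknown action: {action}")
```

For example, a stopped job can be resumed directly, but it must be killed before align will accept the directory again.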

Please see Running RUM for a full description of these commands.

Logging changes

We have attempted to tune the logging output so that the error log files only contain information about actual error conditions, and not false alarms.

rum_runner now starts a new set of log files every time it is run. Any old log files are archived into a tar.gz file and kept.
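A minimal Python sketch of this kind of log rotation, assuming the logs are *.log files in a single directory. The file layout, the logs-<timestamp>.tar.gz naming, and the archive_old_logs helper are hypothetical and do not reflect RUM's actual file names.

```python
import tarfile
import time
from pathlib import Path
from typing import Optional


def archive_old_logs(log_dir: Path) -> Optional[Path]:
    """Bundle existing *.log files into a tar.gz and delete the originals,
    so the next run starts with fresh log files.

    Returns the archive path, or None if there were no logs to archive.
    """
    logs = sorted(log_dir.glob("*.log"))
    if not logs:
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = log_dir / f"logs-{stamp}.tar.gz"
    n = 1
    while archive.exists():  # avoid overwriting an archive from the same second
        archive = log_dir / f"logs-{stamp}-{n}.tar.gz"
        n += 1
    with tarfile.open(str(archive), "w:gz") as tar:
        for log in logs:
            tar.add(str(log), arcname=log.name)
    for log in logs:
        log.unlink()
    return archive
```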

Performance improvements

RUM 2.0.3 includes some changes that should improve performance for many jobs.

We have added a default limit of 100 non-unique Bowtie alignments per read. This can make processing Bowtie's output much faster, and it may result in fewer non-unique mappings, which should improve performance somewhat further.
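The effect of a per-read cap can be sketched in Python as follows. This only illustrates the idea and is not how RUM applies the limit (Bowtie may enforce it itself). It assumes the input is (read_id, alignment) pairs in which all alignments for a read appear consecutively; cap_alignments is a hypothetical helper.

```python
from itertools import groupby, islice
from typing import Iterable, Iterator, List, Tuple, TypeVar

A = TypeVar("A")

MAX_ALIGNMENTS_PER_READ = 100  # the default limit described above


def cap_alignments(
    records: Iterable[Tuple[str, A]], limit: int = MAX_ALIGNMENTS_PER_READ
) -> Iterator[Tuple[str, List[A]]]:
    """Yield (read_id, alignments), keeping at most `limit` alignments per read.

    Assumes records for the same read are adjacent in the input.
    """
    for read_id, group in groupby(records, key=lambda r: r[0]):
        # islice stops after `limit` items; groupby skips the rest of the group.
        yield read_id, [aln for _, aln in islice(group, limit)]
```

Extra alignments beyond the cap are never materialized, which is where the processing-time savings come from.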

Prior to RUM 2.0.3, RUM would wait for Bowtie and Blat to finish before starting to process their output. As of RUM 2.0.3 it now reads the output of those programs as it is being produced. In many situations this will reduce the amount of I/O and result in better CPU utilization.
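The difference is between writing a program's output to a file and parsing it afterwards, and consuming that output through a pipe while the program is still running. The Python sketch below shows the streaming pattern in general terms; it is not RUM's code, and stream_lines is a hypothetical helper. In RUM's case, the command would be the Bowtie or Blat invocation, and each line would go to the output parser.

```python
import subprocess
import sys
from typing import Iterator, List


def stream_lines(cmd: List[str]) -> Iterator[str]:
    """Yield each line of a command's stdout as soon as it is produced,
    instead of waiting for the command to finish and re-reading a file."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:  # blocks only until the next line arrives
            yield line.rstrip("\n")
        returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)
```

Because the consumer runs concurrently with the producer, parsing overlaps with alignment, and the intermediate file never has to be written and read back.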

We have attempted to reduce the running time of the "Quantify novel exons" step by changing the data structure that the program uses to accumulate quantification counts.
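The notes do not say which data structure RUM switched to. As a generic illustration only, the Python sketch below contrasts a list of [feature, count] pairs, which must be scanned on every update, with a hash map keyed by feature, which makes each update constant time on average. The function names and feature strings are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def count_with_list(hits: Iterable[str]) -> List[list]:
    """Linear scan per update: slow when there are many distinct features."""
    counts: List[list] = []
    for feature in hits:
        for entry in counts:
            if entry[0] == feature:
                entry[1] += 1
                break
        else:  # feature not seen yet
            counts.append([feature, 1])
    return counts


def count_with_dict(hits: Iterable[str]) -> Counter:
    """Hash-map accumulation: average O(1) work per update."""
    return Counter(hits)
```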

Using typical input files, we have seen performance improvements ranging from 17% to 24% in total CPU time (summed across all chunks) and from 14% to 24% in wall-clock time.

Alignment changes

Several of the changes have affected the alignment algorithm slightly. By default, we now cap the number of alignments produced by Bowtie at 100 per read. The purpose of this change is to avoid situations where we have to consider thousands of combinations of Bowtie mappings, which leads to poor performance. As a result, RUM may now suppress some ambiguous Bowtie alignments that used to be included in the results.

We also fixed a bug in the code that processes the Bowtie genome alignments. Prior to 2.0.3, we were accidentally suppressing many non-unique alignments for paired reads.

The net effect of both of these changes is that RUM 2.0.3 may produce more alignments for some reads and fewer alignments for others. We analyzed the changes in the number of alignments per read for a typical job, consisting of approximately 87.9 million reads of length 101 mapped against the hg19 index. We placed each read in a class according to the number of alignments that RUM produced for it: "none", "unique", or "non-unique". For this data set, 99.985% of the reads were put in the same class by both versions of RUM.
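The classification and the old-versus-new comparison can be sketched in Python as below. This is an illustration, not the script used for the analysis. It assumes each version's result is available as a map from read ID to number of alignments, with reads missing from a version counted as "none"; classify and transition_counts are hypothetical names.

```python
from collections import Counter
from typing import Dict


def classify(num_alignments: int) -> str:
    """Place a read in one of the three classes used in the comparison."""
    if num_alignments == 0:
        return "none"
    if num_alignments == 1:
        return "unique"
    return "non-unique"


def transition_counts(old: Dict[str, int], new: Dict[str, int]) -> Counter:
    """Count reads by (old class, new class).

    Assumption: a read absent from one version's results had 0 alignments there.
    """
    reads = old.keys() | new.keys()
    return Counter((classify(old.get(r, 0)), classify(new.get(r, 0))) for r in reads)
```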

The following table shows, for each combination of old and new class, how many reads fell into it, including the reads whose class did not change:

Old mapping   New mapping   Number of reads   Percent of all reads for job
unique        unique               68655126   78.08057%
none          none                 10011229   11.38564%
non-unique    non-unique            9249449   10.51928%
unique        non-unique               9641    0.01096%
non-unique    unique                   2514    0.00286%
unique        none                      410    0.00047%
non-unique    none                      173    0.00020%
none          non-unique                 18    0.00002%
none          unique                      3    0.00000%
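As a consistency check, the percentages can be recomputed from the read counts alone. This short Python snippet uses only the counts in the table; the total it computes is the sum of all rows, which the percentages are relative to.

```python
# Read counts copied from the table above: (old class, new class) -> reads
COUNTS = {
    ("unique", "unique"): 68655126,
    ("none", "none"): 10011229,
    ("non-unique", "non-unique"): 9249449,
    ("unique", "non-unique"): 9641,
    ("non-unique", "unique"): 2514,
    ("unique", "none"): 410,
    ("non-unique", "none"): 173,
    ("none", "non-unique"): 18,
    ("none", "unique"): 3,
}

total = sum(COUNTS.values())
pct = {pair: 100.0 * n / total for pair, n in COUNTS.items()}

# Share of reads whose class is the same in both versions
same = sum(p for (old, new), p in pct.items() if old == new)

print(f"total reads: {total}")     # total reads: 87928563
print(f"same class: {same:.3f}%")  # same class: 99.985%
```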

Other changes

  • Make rum_runner status indicate whether rum is currently running.
  • Make --chunks a required option.
  • Documentation and error message improvements.
  • Remove the .rum directory and move the job configuration to "rum_job_config" in the output directory.

The full list of issues addressed in this release is here.
