Release Notes 2.0.3
Note: If you are a RUM 1.x user and you haven't read the release notes for RUM 2.0.1, please do so. That version contains lots of changes from RUM 1.x.
The major changes between RUM 2.0.2 and 2.0.3 address some bugs, improve usability, and improve performance for some jobs. Please see the details below.
The most notable bug that was addressed is #93. In the junction *.bed files, we were coloring known and novel junctions with the same colors, when they should have had two different colors. This has been fixed in 2.0.3.
We have added some actions and changed the meaning of some existing actions to improve usability:
- `rum_runner stop` now stops a running job but leaves it intact.
- `rum_runner kill` stops a running job and removes all of its output files.
- `rum_runner init` initializes a new job but does not run it.
- `rum_runner align` now starts a new job from scratch but does not restart an existing job.
- `rum_runner resume` resumes a previously run or initialized job from wherever it left off.
Now if you run `rum_runner align` on a directory that already has a RUM job initialized in it, it will refuse to run the job. If you want to resume an existing job that crashed or was stopped, use `rum_runner resume`. If you want to abort the job and start from scratch, use `rum_runner kill` and then `rum_runner align` again.
Please see Running RUM for a full description of these commands.
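To make the relationship between these actions concrete, here is a toy Python model of the job states they imply. It is a sketch, not RUM's implementation: the state names, the `ToyJob` class, and the choice that `init` also refuses to overwrite an existing job are assumptions made for illustration. Only the behavior of `align`, `resume`, `stop`, and `kill` follows the descriptions above.

```python
from enum import Enum


class State(Enum):
    EMPTY = "empty"              # no job in the output directory
    INITIALIZED = "initialized"  # configured but never run
    RUNNING = "running"
    STOPPED = "stopped"          # stopped or crashed; output left intact


class JobExistsError(RuntimeError):
    pass


class ToyJob:
    """Toy state model of the rum_runner actions (a sketch, not RUM's code)."""

    def __init__(self):
        self.state = State.EMPTY
        self.outputs = []  # stand-in for the job's output files

    def init(self):
        # Assumption: init also refuses to overwrite an existing job.
        if self.state is not State.EMPTY:
            raise JobExistsError("a job already exists in this directory")
        self.state = State.INITIALIZED

    def align(self):
        # align starts from scratch and refuses to touch an existing job.
        if self.state is not State.EMPTY:
            raise JobExistsError("use 'resume', or 'kill' and then 'align'")
        self.init()
        self.resume()

    def resume(self):
        # resume continues an initialized or previously run job.
        if self.state not in (State.INITIALIZED, State.STOPPED):
            raise RuntimeError(f"cannot resume a job in state {self.state.value}")
        self.state = State.RUNNING
        self.outputs.append("partial output")

    def stop(self):
        # stop halts the job but leaves its output intact.
        if self.state is State.RUNNING:
            self.state = State.STOPPED

    def kill(self):
        # kill halts the job and removes all of its output.
        self.state = State.EMPTY
        self.outputs.clear()


job = ToyJob()
job.align()
job.stop()
try:
    job.align()  # refused: a job already exists here
except JobExistsError as err:
    print("refused:", err)
job.resume()     # picks up where the stopped job left off
job.kill()
job.align()      # allowed again once the old job is gone
```

The point of the model is that `align` is no longer overloaded: starting fresh and continuing are separate actions, so an accidental `align` cannot clobber a half-finished job.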
We have attempted to tune the logging output so that the error log files only contain information about actual error conditions, and not false alarms.
We now start a new set of log files every time `rum_runner` is run, and archive any old log files as a tar.gz file.
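A minimal Python sketch of this archive-then-start-fresh pattern follows. The `archive_old_logs` function, the `*.log` naming, and the `old-logs-<timestamp>.tar.gz` archive name are hypothetical; RUM's actual log file names and archive layout are not described here.

```python
import tarfile
import time
from pathlib import Path
from typing import Optional


def archive_old_logs(log_dir: Path) -> Optional[Path]:
    """Pack existing *.log files into a .tar.gz, then delete the originals.

    Hypothetical layout: the log names and archive naming scheme are
    assumptions, not RUM's actual conventions.
    """
    old_logs = sorted(log_dir.glob("*.log"))
    if not old_logs:
        return None  # first run: nothing to archive

    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = log_dir / f"old-logs-{stamp}.tar.gz"
    n = 0
    while archive.exists():  # never overwrite an earlier archive
        n += 1
        archive = log_dir / f"old-logs-{stamp}-{n}.tar.gz"

    with tarfile.open(str(archive), "w:gz") as tar:
        for path in old_logs:
            tar.add(str(path), arcname=path.name)

    # Remove the originals so the new run starts with a fresh set of logs.
    for path in old_logs:
        path.unlink()
    return archive
```

Archiving rather than deleting keeps the history of earlier runs available for debugging, while the error log for the current run contains only that run's messages.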
RUM 2.0.3 includes some changes that should improve performance for many jobs.
We have added a default limit of 100 non-unique Bowtie alignments per read. This can make processing Bowtie's output much faster, and it may also result in fewer non-unique mappings, which would improve performance somewhat.
Prior to RUM 2.0.3, RUM would wait for Bowtie and Blat to finish before starting to process their output. As of RUM 2.0.3 it now reads the output of those programs as it is being produced. In many situations this will reduce the amount of I/O and result in better CPU utilization.
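The difference between waiting for a program to finish and reading its output as it is produced can be sketched in Python with `subprocess.Popen`. The child process here is a hypothetical stand-in that prints fake `read_id<TAB>position` lines; it does not reproduce Bowtie's or Blat's output format, and RUM's own parsing code is not shown.

```python
import subprocess
import sys
from collections import Counter


def stream_records(cmd):
    """Yield each output line of `cmd` as soon as the child writes it."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Leaving the `with` block waits for the child, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)


# Hypothetical stand-in for an aligner: prints "read_id<TAB>position" lines.
FAKE_ALIGNER = [
    sys.executable,
    "-c",
    "for i in range(6): print(f'seq.{i % 3}\\t{1000 + i}', flush=True)",
]

# Alignments are tallied while the child is still running, instead of
# after it has written everything to an intermediate file.
counts = Counter(line.split("\t")[0] for line in stream_records(FAKE_ALIGNER))
print(dict(counts))
```

Because the consumer reads from a pipe, the aligner's output never has to be written to and re-read from disk, and parsing overlaps with alignment instead of running after it.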
We have attempted to reduce the running time of the "Quantify novel exons" step by changing the data structure that the program uses to accumulate quantification counts.
Using typical input files, we have seen performance improvements ranging from 17% to 24% in terms of total CPU time (across all chunks) and 14% to 24% in terms of wall clock time.
Several of the changes have affected the alignment algorithm slightly. By default, we now cap the number of Bowtie alignments per read at 100. The purpose of this change is to avoid situations where we have to consider thousands of combinations of Bowtie mappings, resulting in poor performance. The effect of this change is that RUM may now suppress some ambiguous Bowtie alignments that used to be included in the results.
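The sketch below illustrates why a per-read cap helps. It assumes, for illustration only, that each forward-mate alignment of a paired read is considered against each reverse-mate alignment, so the work grows with the product of the two counts. `cap_alignments` and `candidate_pairs` are hypothetical helpers, not RUM functions, and where RUM actually applies the cap is not described here.

```python
from collections import defaultdict


def cap_alignments(records, limit=100):
    """Yield at most `limit` (read_id, alignment) records per read, in input order.

    `seen` holds one counter per read ID; if all records for a read arrive
    together, a streaming version could track only the current read.
    """
    seen = defaultdict(int)
    for read_id, alignment in records:
        if seen[read_id] < limit:
            seen[read_id] += 1
            yield read_id, alignment


def candidate_pairs(n_forward, n_reverse, limit=None):
    """Forward x reverse mate combinations to examine (illustrative model)."""
    if limit is not None:
        n_forward = min(n_forward, limit)
        n_reverse = min(n_reverse, limit)
    return n_forward * n_reverse


# Hypothetical repetitive read pair with 3,000 alignments per mate:
print(candidate_pairs(3000, 3000))             # 9000000 combinations
print(candidate_pairs(3000, 3000, limit=100))  # 10000 combinations
```

Under this model the cap bounds the work per read pair at 100 × 100 combinations, at the cost of discarding alignments beyond the 100th, which matches the trade-off described above.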
We also fixed a bug in the code that processes the Bowtie genome alignments. Prior to 2.0.3, we were accidentally suppressing many non-unique alignments for paired reads.
The net effect of both of these changes is that RUM 2.0.3 may produce more alignments for some reads and fewer alignments for other reads. We analyzed the changes in the number of alignments for reads for a typical job, consisting of approximately 87 million reads of length 101 mapped against the hg19 index. We place each read in a class according to the number of alignments that RUM produced for it: "none", "unique", and "non-unique". For this data set, 99.985% of the reads were put in the same class by both versions of RUM.
The following table shows the number of reads that moved from one class to another, for each combination of classes:
| Old mapping | New mapping | Number of reads | Percent of all reads for job |
|---|---|---|---|
| unique | unique | 68655126 | 78.08057% |
| none | none | 10011229 | 11.38564% |
| non-unique | non-unique | 9249449 | 10.51928% |
| unique | non-unique | 9641 | 0.01096% |
| non-unique | unique | 2514 | 0.00286% |
| unique | none | 410 | 0.00047% |
| non-unique | none | 173 | 0.00020% |
| none | non-unique | 18 | 0.00002% |
| none | unique | 3 | 0.00000% |
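The read classes used in the table can be computed with a few lines of Python. `classify`, `transition_table`, and the dict-of-counts input format are illustrative assumptions, not RUM code. The counts in `TABLE` are copied from the table above; they total 87,928,563 reads, and summing the diagonal reproduces the 99.985% figure quoted earlier.

```python
from collections import Counter


def classify(n_alignments):
    """Map a read's number of alignments to the table's classes."""
    if n_alignments == 0:
        return "none"
    return "unique" if n_alignments == 1 else "non-unique"


def transition_table(all_read_ids, old_counts, new_counts):
    """Count (old class, new class) pairs for every input read.

    old_counts / new_counts map read ID -> number of alignments; a read
    absent from a mapping is treated as having 0 alignments.
    """
    return Counter(
        (classify(old_counts.get(r, 0)), classify(new_counts.get(r, 0)))
        for r in all_read_ids
    )


def percent_same_class(table):
    """Percentage of reads whose class did not change between versions."""
    total = sum(table.values())
    same = sum(n for (old, new), n in table.items() if old == new)
    return 100.0 * same / total


# Counts copied from the table above: (old class, new class) -> reads.
TABLE = {
    ("unique", "unique"): 68655126,
    ("none", "none"): 10011229,
    ("non-unique", "non-unique"): 9249449,
    ("unique", "non-unique"): 9641,
    ("non-unique", "unique"): 2514,
    ("unique", "none"): 410,
    ("non-unique", "none"): 173,
    ("none", "non-unique"): 18,
    ("none", "unique"): 3,
}

print(f"{percent_same_class(TABLE):.3f}%")  # 99.985%
```

Reads that neither version aligned must still be listed in `all_read_ids`; otherwise the none/none class would be undercounted.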
- Make `rum_runner status` indicate whether RUM is currently running.
- Make `--chunks` a required option.
- Documentation and error message improvements.
- Get rid of the `.rum` directory and move the job configuration to `rum_job_config` in the output directory.
The full list of issues addressed in this release is here.