Improved memory handling by exploiting lazy memory allocation #129
Inherent to the way this method works, the unrolled sequence is no longer in memory after the sequence has been run. This clashes with how averaging is done in the console, which operates at the console level rather than the sequence level. Since Pulseq supports flags for averages and it is relatively trivial to add them to the built-in sequences, I propose removing console-level averaging in favour of averaging defined in the .seq file, particularly since the motivation behind console-level averaging was the inability to handle long sequences due to memory consumption, which is (for the most part) addressed in this PR.
The maximum sequence duration is limited by the available system memory: a sequence consumes around 10 GB of memory per minute, limiting sequence duration to a maximum of around 20 minutes (systems typically max out at 256 GB of RAM). This PR changes how the sequence is stored in memory so that memory consumption for typical sequences is substantially reduced (around 95% for the TSE example script) by exploiting Python/NumPy's lazy memory allocation.
Currently the sequence is stored in memory as one array; for a 10 minute sequence, that array will be around 100 GB. When the array is created, Python checks whether an array of that size could fit in memory and reserves a memory block for it, but the array does not occupy all 100 GB, because NumPy arrays are lazily allocated: memory is only physically allocated once it is accessed. Accessing a sub-section of the array allocates only that sub-section, not the rest of the array. However, once a sub-section has been accessed, it remains allocated until the entire array is deleted from memory.
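The lazy-allocation behaviour described above can be seen in a minimal sketch (the sizes here are illustrative, scaled down from the real sequence dimensions so it runs anywhere):

```python
import numpy as np

# ~400 MB of float64 virtual address space, standing in for the "100 GB"
# sequence array. np.zeros reserves the block, but the OS only commits
# physical pages when they are first touched.
n = 50_000_000
seq = np.zeros(n)

# Writing to a small sub-section commits only the few pages backing
# those samples; the rest of the reservation stays unallocated.
seq[1_000:2_000] = 0.5

print(seq.nbytes, float(seq[1_500]))
```

Note that the reservation (`seq.nbytes`, 400 MB here) is the full logical size regardless of how many pages have actually been committed.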
The outputs of most MR sequences are sparsely populated: the RF channel is typically transmitting less than 1% of the time, and if a sequence has a long repetition time the gradient outputs are also mostly zero (shim offsets are handled separately). The sequence provider only accesses the sequence array where the output is non-zero, so memory is only allocated in a few places; while 100 GB is reserved for the sequence array, the memory actually occupied is a few GB. However, as the console plays out the sequence, the entire array is read (accessed), so by the time the sequence has finished playing out, the whole array has been touched and the 100 GB array is fully allocated in memory, despite initially occupying only a few GB.
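The effect of a full read can be demonstrated with a rough sketch (assumes a Unix-like system; `ru_maxrss` is the process's resident-memory high-water mark, reported in kB on Linux and bytes on macOS, so only its relative growth matters here):

```python
import resource
import numpy as np

n = 50_000_000            # ~400 MB, stands in for the "100 GB" array
seq = np.zeros(n)         # reserved lazily; almost nothing committed yet

def rss_hwm() -> int:
    # Resident set size high-water mark (monotonically non-decreasing)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = rss_hwm()
seq[:1_000] = 1.0         # sparse access: only a few pages committed
sparse = rss_hwm()
total = seq.sum()         # full read: every page is now touched
after = rss_hwm()

print(total, before <= sparse <= after)
```

On a typical Linux machine the jump from `sparse` to `after` is roughly the full 400 MB, mirroring how playing out the sequence commits the entire reservation.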
In this PR the sequence array is segmented into a list of 'notify sized' arrays in the sequence provider. The tx_device now iterates through this list, playing out each array subsection and deleting it from the list as the sequence plays out, so that once memory has been allocated for a 'notify sized' array and it has been passed to the card, that array is released from memory. This means memory consumption peaks immediately after the sequence has been unrolled and is essentially determined by how sparse the Tx card's output is in a given sequence. For sequences whose outputs are always on, this update does not improve memory performance; for sequences with a very long TR, it can reduce memory consumption immensely.
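The segmentation-and-release pattern can be sketched as follows; `segment_sequence`, `play_out`, and `NOTIFY_SIZE` are hypothetical names for illustration, not the actual API in this PR:

```python
from collections import deque

import numpy as np

NOTIFY_SIZE = 4096  # hypothetical notify size, in samples

def segment_sequence(seq: np.ndarray, notify_size: int) -> deque:
    """Split the unrolled sequence into notify-sized chunks.

    The chunks are copies so the big backing array can be dropped;
    in the real provider the original array would be deleted after
    segmentation so only the chunks hold memory.
    """
    return deque(
        seq[i:i + notify_size].copy()
        for i in range(0, len(seq), notify_size)
    )

def play_out(segments: deque) -> int:
    """Simulate the tx_device consuming segments.

    Each chunk is removed from the list as soon as it has been
    'passed to the card', making it eligible for garbage collection,
    so at most one chunk plus the remaining queue is resident.
    """
    samples_sent = 0
    while segments:
        chunk = segments.popleft()   # release from the list
        samples_sent += len(chunk)   # stand-in for the card transfer
    return samples_sent

seq = np.zeros(10_000)
segments = segment_sequence(seq, NOTIFY_SIZE)
n_chunks = len(segments)
samples = play_out(segments)
print(n_chunks, samples)
```

The design choice here is that memory is reclaimed incrementally during playback rather than all at once when the sequence ends, which is what caps peak consumption at the post-unroll level.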