Description
We are seeing excessive memory usage when storing and calculating new quantities for very large sets of events. This is related to the event loop approach, which currently requires creating and maintaining several large numpy arrays before values can be written to temporary subtree files.
Because Python often does not return freed memory to the operating system while a process is still running, each job looks like it requires a lot of resources, and particularly large jobs cannot be run interactively without overloading the LXPLUS node. Deleting variables and forcing garbage collection do not effectively free resources at runtime.
One solution that has worked in the past is to submit each calculation to a worker process and rely on multiprocessing to release the resources each worker used once that worker exits. But this is arguably messy in other ways, and I don't want to implement it here. Scaling is also a concern, as we will be adding numerous additional calculations and interfacing with MELA.
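
For reference, the worker-process approach described above might look roughly like the sketch below. It is only an illustration under stated assumptions, not code from this repository: `make_chunks`, `compute_chunk`, and `run_in_workers` are hypothetical names, and a plain Python list stands in for the large numpy arrays. The relevant piece is `maxtasksperchild=1`, which makes `multiprocessing.Pool` replace each worker after a single task, so whatever that worker allocated goes away when its process exits.

```python
import multiprocessing as mp


def make_chunks(n_events, chunk_size):
    """Split the range [0, n_events) into (start, stop) pairs."""
    return [(lo, min(lo + chunk_size, n_events))
            for lo in range(0, n_events, chunk_size)]


def compute_chunk(bounds):
    """Hypothetical per-chunk calculation; returns only a small summary."""
    start, stop = bounds
    # Stand-in for the large temporary arrays built inside the event loop.
    values = [float(i) ** 2 for i in range(start, stop)]
    return start, sum(values)


def run_in_workers(n_events, chunk_size, processes=2):
    """Run each chunk in a short-lived worker process."""
    chunks = make_chunks(n_events, chunk_size)
    # maxtasksperchild=1: each worker handles one chunk and is then replaced,
    # so its memory is released when the process exits.
    with mp.Pool(processes=processes, maxtasksperchild=1) as pool:
        return pool.map(compute_chunk, chunks, chunksize=1)


if __name__ == "__main__":
    results = run_in_workers(n_events=1000, chunk_size=250)
    print(sum(total for _, total in results))
```

Only the small return value of each chunk crosses back to the parent process, so the parent's footprint stays modest. The costs are process start-up overhead per chunk and the need for picklable inputs and outputs, which is part of why this can get messy.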