Memory leak when running MCMC in parallel #230

@bocklund

Description

Due to a known memory leak when instantiating subclasses of SymEngine's Symbol objects (SymEngine is one of our upstream dependencies; see symengine/symengine.py#379), running ESPEI with parallelization will cause memory usage to grow in each worker.

Only parallel runs trigger significant memory growth. Running in parallel uses the pickle library to serialize and deserialize symbol objects, which creates new objects that can't be freed. When running without parallelization (mcmc.scheduler: null), no new symbols are created.
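To illustrate why only parallel runs create new symbols, here is a minimal sketch that uses a hypothetical stand-in class, TrackedSymbol, in place of a SymEngine Symbol subclass such as v.SiteFraction, and counts how many instances get constructed. It assumes pickle's default protocol (2 or higher), under which unpickling calls the class's __new__ again, so every round trip to a worker builds a fresh object rather than reusing the original. The actual leak lives in SymEngine's C++ layer and is not reproduced here; the sketch only shows where the new instances come from.

```python
import pickle


class TrackedSymbol:
    """Hypothetical stand-in for a SymEngine Symbol subclass (e.g. v.SiteFraction)."""

    created = 0  # number of instances constructed so far

    def __new__(cls, *args, **kwargs):
        # Unpickling with protocol >= 2 calls __new__ again, so this counter
        # also sees every copy that gets rebuilt on a worker.
        TrackedSymbol.created += 1
        return super().__new__(cls)

    def __init__(self, name):
        self.name = name


def simulate_worker_transfers(obj, n_transfers):
    """Round-trip `obj` through pickle `n_transfers` times, as a parallel
    scheduler does when it ships arguments to workers; return the last copy."""
    clone = obj
    for _ in range(n_transfers):
        clone = pickle.loads(pickle.dumps(obj))
    return clone
```

Creating one TrackedSymbol("Y_CU") and passing it through five simulated transfers raises TrackedSymbol.created by 6: one original plus one new object per round trip, and the returned copy is not the original object. In a serial run the same object is used directly, so no extra instances appear.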

Until symengine/symengine.py#379 is fixed, some mitigation strategies to avoid running out of memory are:

  • Run ESPEI without parallelization by setting scheduler: null
  • (Under consideration): when parallelization is active, add an option to restart the workers every N iterations so that the memory they leak is released.
  • (Under consideration): remove Model objects from the keyword arguments of ESPEI's likelihood functions. Model objects contribute many symbol instances in the form of v.SiteFraction objects. We should be able to get away with using only PhaseRecord objects, but a few places use Model.constituents to infer the sublattice model and internal degrees of freedom, and those would need to be rewritten.
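The worker-restart idea could look roughly like the following sketch. run_with_periodic_restart and make_executor are hypothetical names, not part of ESPEI; the sketch assumes each MCMC iteration can be expressed as mapping a likelihood function over a batch of parameter vectors, with workers managed by a concurrent.futures executor. Shutting the executor down and building a new one every restart_every iterations ends the old worker processes, so memory leaked inside them is returned to the operating system.

```python
from concurrent.futures import Executor
from typing import Callable, List, Optional, Sequence, Tuple


def run_with_periodic_restart(
    func: Callable[[float], float],
    batches: Sequence[Sequence[float]],
    make_executor: Callable[[], Executor],
    restart_every: int,
) -> Tuple[List[List[float]], int]:
    """Map `func` over each batch (one batch per iteration), recreating the
    executor every `restart_every` iterations.

    Returns the per-iteration results and how many executors were created.
    Hypothetical helper for illustration; not part of ESPEI.
    """
    if restart_every < 1:
        raise ValueError("restart_every must be a positive integer")
    results: List[List[float]] = []
    executors_created = 0
    executor: Optional[Executor] = None
    try:
        for iteration, batch in enumerate(batches):
            if iteration % restart_every == 0:
                # Tear down the old workers (releasing any memory they leaked)
                # before starting a fresh set.
                if executor is not None:
                    executor.shutdown(wait=True)
                executor = make_executor()
                executors_created += 1
            results.append(list(executor.map(func, batch)))
    finally:
        # Always release the last set of workers, even on error.
        if executor is not None:
            executor.shutdown(wait=True)
    return results, executors_created
```

In a real run, make_executor would return something like a ProcessPoolExecutor, with a picklable, module-level likelihood function; a ThreadPoolExecutor is enough to exercise the restart logic itself. The standard library's multiprocessing.Pool offers a related knob, maxtasksperchild, which replaces a worker after a fixed number of tasks rather than after a fixed number of iterations.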
