chrisjonesBSU commented Aug 14, 2025

This PR begins the first efforts toward developing mBuild 2.0 (i.e., we can break things) and focuses mostly on improving the polymer-building experience in mBuild, as discussed in #1205 and #1212. There are also quite a few changes that introduce more energy minimization methods that utilize HOOMD-blue and can run on GPUs for larger systems and/or HPC workflows. I didn't mean for so many changes to be included in this PR, but as I worked on it, more and more needs kept creeping up. This PR isn't necessarily the completed effort on initializing polymer systems and adding more feature-rich energy minimization approaches, but it's close to a good stopping place for now. I'll try to summarize the most notable changes. This won't be merged into the main branch; instead it will be the first commit on the develop branch.

Addition of the mbuild.path.Path class

[Images: all-paths, mult_rw_volume]

This introduces a new data structure for creating "paths" (i.e., molecular configurations). The primary purpose of this class is to generate better starting polymer-chain configurations than the wonky ones you often get by default from the Polymer.build() method.
The basic idea is that a path represents a coarse-grained configuration that atomistic polymers can be mapped onto during the build step. Most of the included classes generate deterministic, non-random paths (3D lamellar structures, ring polymers, etc.), but there is a HardSphereRandomWalk path as well.

Here are some examples of the idea and API behind this:

Build a 3D lamellar polymer by specifying parameters such as layer length, layer separation, and the number of layers and stacks:

import mbuild as mb
from mbuild import Polymer
from mbuild.path import Lamellar

lamellar_path = Lamellar(
    num_layers=4,
    layer_length=2,
    layer_separation=0.7,
    bond_length=0.25,
    num_stacks=4,
    stack_separation=0.5
)
pe_chain = Polymer()
pe_chain.add_monomer(compound=mb.load("CC", smiles=True), indices=[2, 6], replace=True, separation=0.154)
pe_chain.build_from_path(lamellar_path)

Build a polymer from a random walk:

import networkx as nx
import numpy as np

import mbuild as mb
from mbuild import Polymer
from mbuild.path import HardSphereRandomWalk

random_walk = HardSphereRandomWalk(
    N=30,
    bond_length=0.25,
    radius=0.23,
    min_angle=np.pi/3,
    max_angle=np.pi,
    max_attempts=1e4,
    seed=42,
    bond_graph=nx.path_graph(30)
)
pe_chain = Polymer()
pe_chain.add_monomer(compound=mb.load("CC", smiles=True), indices=[2, 6], replace=True, separation=0.154)
pe_chain.build_from_path(path=random_walk, energy_minimize=True)

HardSphereRandomWalk and mbuild.utils.volumes.Constraint:

Some highlights of this Path class specifically:

  • Uses a simple "forcefield" with hard spheres, fixed bond lengths, and angles sampled uniformly between a min and max.
    • Dihedrals aren't considered right now, and no energies are computed or used in the Monte Carlo accept/reject step.
    • The choice of min and max angles allows some control over the shape of the random walk.
    • Values for the radius and bond length can be estimated from the monomer compounds (mbuild.Compound.volume is added in this PR, using RDKit); see the sketch after this list.
  • Uses numba for the most expensive tasks. As a result, random walks with relatively loose constraints can achieve 200,000 steps in ~20 seconds; for comparison, a pure NumPy implementation takes multiple minutes at this size.
  • You can add volume constraints via mbuild.utils.volumes.Constraint. Right now this PR adds CuboidConstraint, SphereConstraint, and CylinderConstraint. These objects can be passed into HardSphereRandomWalk, making it easy to add and extend constraints separately from the random-walk logic.
  • You can start the random walk from another mbuild.path.Path instance.
    • For example, create a Lamellar path, then run a random walk from its last site. The resulting path includes all the coordinates of the first, as well as the results of the random walk. This can be used to start constructing semi-crystalline morphologies.
    • Run a long random walk, then iteratively choose sites from this backbone to run smaller random walks from, creating a branch-like structure (see the branching sketch after the examples below).
    • In a for loop or while loop, run multiple random walks where each starting point is chosen randomly. This creates a system of multiple random walks, each of which a polymer can be mapped onto.
  • Account for the coordinates of other mBuild Compounds in the random walk.
    • For example, run a random walk around a system of carbon nanotubes.
  • For more constrained random walks (higher density of points, other compounds included, volume constraints, etc.), several moves can be attempted in one "batch" and the first successful one accepted (the trial_batch_size parameter). This should help constrained random walks finish, but possibly at a performance cost. I haven't had a chance to profile this yet.
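
As a sketch of the radius/bond-length estimate mentioned in the list above, an effective hard-sphere radius can be backed out by treating a monomer as a sphere of equal volume. This assumes Compound.volume (added in this PR via RDKit) is exposed as a property returning nm³; the final API may differ:

import numpy as np

import mbuild as mb

monomer = mb.load("CC", smiles=True)
# Assumption: Compound.volume (new in this PR, computed with RDKit) returns nm^3.
monomer_volume = monomer.volume
# Effective hard-sphere radius from V = (4/3) * pi * r^3
radius = (3 * monomer_volume / (4 * np.pi)) ** (1 / 3)
# radius (and a comparable bond_length) can then be passed to HardSphereRandomWalk.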

Here are some examples:

Stitching together a lamellar path and random walk

import numpy as np

from mbuild.path import HardSphereRandomWalk, Lamellar

lamellar_path = Lamellar(
    num_layers=5,
    layer_length=3,
    layer_separation=0.7,
    bond_length=0.25,
    num_stacks=4,
    stack_separation=0.7,
)

lamellar_plus_rw = HardSphereRandomWalk(
    N=100,
    min_angle=np.pi/1.3,
    max_angle=np.pi,
    bond_length=0.25,
    radius=0.23,
    start_from_path=lamellar_path,
    start_from_path_index=-1,
)
[Image: rw_and_lamellar]

Including a volume constraint in the random walk

import numpy as np

from mbuild.path import HardSphereRandomWalk
from mbuild.utils.volumes import CuboidConstraint, SphereConstraint, CylinderConstraint

sphere = SphereConstraint(radius=3, center=(0,0,0))
rw_path = HardSphereRandomWalk(
    N=300,
    bond_length=0.25,
    radius=0.23,
    volume_constraint=sphere,
    min_angle=np.pi/4,
    max_angle=np.pi,
    trial_batch_size=30,
)
[Image: constraints]
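
Branching, as mentioned above, can be sketched by starting a shorter walk from an interior site of a longer backbone walk. This is a minimal sketch: the start_from_path and start_from_path_index parameters come from the lamellar example above, but starting from an interior index (rather than the last site) and the bookkeeping of the combined bond graph are assumptions here.

import numpy as np

from mbuild.path import HardSphereRandomWalk

# A long backbone walk
backbone = HardSphereRandomWalk(
    N=100,
    bond_length=0.25,
    radius=0.23,
    min_angle=np.pi/3,
    max_angle=np.pi,
    seed=10,
)

# A shorter walk started from an interior backbone site to form a branch.
# Assumes start_from_path_index can point at any existing site, not just -1.
branch = HardSphereRandomWalk(
    N=25,
    bond_length=0.25,
    radius=0.23,
    min_angle=np.pi/3,
    max_angle=np.pi,
    start_from_path=backbone,
    start_from_path_index=50,
    seed=11,
)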

mbuild.simulation module

We already had a pretty great set of simulation methods for performing energy minimization. This PR pulls all of that code out of compound.py and moves it into a new simulation.py. Now, you pass the compound into these methods rather than calling them as instance methods (e.g., energy_minimize(compound) instead of compound.energy_minimize()). This trims down the Compound class quite a bit (~700 lines of code).
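
For example, a minimal sketch of the new calling convention (the exact import location of energy_minimize inside mbuild.simulation is assumed):

import mbuild as mb
from mbuild.simulation import energy_minimize  # assumed import path

compound = mb.load("CCCCCC", smiles=True)
# mBuild 1.x style: compound.energy_minimize()
# mBuild 2.0 style: pass the compound into the function
energy_minimize(compound)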

This PR also adds a few things that allow us to use HOOMD in addition to OpenBabel and OpenMM. The main new addition here is a wrapper class around HOOMD's own simulation class. This makes it easy to design more specific workflows that go beyond energy minimization. For example, a quick compress simulation is planned for a future PR, which can be run after the packing.py methods. Hopefully, this approach makes it easy for users to create their own quick simulation protocols when needed, and contribute them upstream when possible.

This PR adds 2 methods that use this HOOMD base class:

hoomd_cap_displacement: This uses the capped displacement method in HOOMD

hoomd_fire: This uses HOOMD's implementation of the FIRE algorithm.

These can both be better options for the energy minimization demands of building larger polymers from paths. They can use the GPU if available, and even on the CPU they are much faster than OpenBabel beyond certain system sizes. hoomd_cap_displacement was able to optimize the geometry of a polyethylene 200mer in about the same amount of time it took OpenBabel to optimize an 80mer (around 12 seconds). They were nearly equally efficient at smaller system sizes.
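
For completeness, here is a hedged sketch of calling hoomd_fire, assuming it takes the compound and forcefield the same way hoomd_cap_displacement does in the example further below; FIRE-specific options (e.g., tolerances or timestep) are omitted and the real signature may differ:

import foyer

import mbuild as mb
from mbuild.simulation import hoomd_fire  # assumed import path

compound = mb.load("CCCCCCCC", smiles=True)
oplsaa = foyer.Forcefield(name="oplsaa")
# Assumed keyword arguments, mirroring hoomd_cap_displacement; check the
# actual signature for FIRE-specific parameters.
hoomd_fire(compound=compound, n_steps=5000, forcefield=oplsaa)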

The biggest bottleneck here is the lack of support for universal forcefields (e.g., UFF) that aren't hampered by missing parameters. For now, we are keeping the OpenBabel methods in, even though there are some upstream issues with using it as a dependency. We are also planning to keep and expand the OpenMM energy minimization. I might look at making it more modular in the future, similar to the HOOMD implementation.

One important feature of the HOOMD implementation is that all the simulation information is saved, making it faster to call these methods multiple times (e.g., in a for loop or while loop) while skipping repeated calls to the under-the-hood steps of atom typing, applying the forcefield, writing out HOOMD data structures, etc.

For example, run hoomd_cap_displacement in small increments, stopping once all overlaps are removed:

import foyer

from mbuild.simulation import hoomd_cap_displacement

oplsaa = foyer.Forcefield(name="oplsaa")

while len(compound.check_for_overlaps(excluded_bond_depth=1)) != 0:
    hoomd_cap_displacement(compound=compound, n_steps=500, max_displacement=1e-4, forcefield=oplsaa)

Combining Everything:

Here is the result of combining some of the things discussed above. In a while loop, I ran 50 random walks of 25 steps each in a cubic box with 5 nm edges. From each random walk, a polyethylene chain was built, and hoomd_cap_displacement was run on the system of 50 polymers for 2,000 steps to relax bonds and angles and remove any overlaps. The result is a distribution of initial chain configurations packed at a density of $\approx 0.5 \frac{g}{cm^3}$. This is significantly better than using PACKMOL to pack a huge box with long, straight 25mers. This took ~3-4 minutes total.
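
A rough sketch of that workflow under stated assumptions: CuboidConstraint's constructor arguments, the import location of hoomd_cap_displacement, and simply adding each chain to a parent Compound are guesses here, and avoiding overlaps between separate walks is not handled.

import foyer
import numpy as np

import mbuild as mb
from mbuild import Polymer
from mbuild.path import HardSphereRandomWalk
from mbuild.simulation import hoomd_cap_displacement  # assumed import path
from mbuild.utils.volumes import CuboidConstraint

# Assumed constructor: edge lengths in nm plus a center point.
box = CuboidConstraint(Lx=5, Ly=5, Lz=5, center=(0, 0, 0))

system = mb.Compound()
for i in range(50):
    walk = HardSphereRandomWalk(
        N=25,
        bond_length=0.25,
        radius=0.23,
        min_angle=np.pi/3,
        max_angle=np.pi,
        volume_constraint=box,
        seed=i,
    )
    chain = Polymer()
    chain.add_monomer(
        compound=mb.load("CC", smiles=True),
        indices=[2, 6],
        replace=True,
        separation=0.154,
    )
    chain.build_from_path(walk)
    system.add(chain)

# Relax bonds and angles and remove overlaps across the full 50-chain system.
oplsaa = foyer.Forcefield(name="oplsaa")
hoomd_cap_displacement(
    compound=system,
    n_steps=2000,
    max_displacement=1e-4,
    forcefield=oplsaa,
)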

[Image: Screenshot From 2025-08-14 13-52-58]

Remaining To-dos:

Some of these may or may not be included in this PR.

General:

  • Add more unit tests
  • Add more examples and notes to doc strings
  • Create tutorials

Path and Polymer.build_from_path():

  • Orient atomistic monomers along the direction of the path. This should greatly improve the initial bonding when building from, and mapping to, a path.
  • If you set the bond graph for a path, then we can add a method that gets all pairs of bonded path sites and stores the orientation vector. This should be pretty quick, and all the information can be made available via a dictionary or something. This can be used in Polymer.build_from_path or Path.apply_backmapping().
  • The current implementation of these paths is mostly suited for linear polymers (nx.path_graph()), but everything we need to support things like branching polymers is there; it just needs to be implemented. Cross-linking will require some extra thinking and design. As of 45a6eed, bond graphs are built on the fly and include the ability to track branching paths.
  • Add a helper function that can sort and filter coordinates by local density. This can help with running multiple random walks in highly dense systems. This could be a numba function as well. It can be a stand-alone function that a user can call and implement on their own for stringing together multiple random walks. (Done in aae02f8)
  • Add a from_box() class method for CuboidConstraint so one can be created from the box of another compound. This would be nice when using an existing mBuild compound in a random walk that also has volume constraints.
  • More thorough performance profiling. It will be interesting to see how performance scales with number density in constrained walks.

simulation.py:

  • Smart (i.e., automatic) box setting for the hoomd methods in simulation.py
  • Add a quick compress method to simulation.py that uses HOOMD's box updater.
  • Add support for some kind of universal forcefield in foyer and gmso (UFF, Dreiding, make up our own). This is sort of in-progress elsewhere, but is facing some challenges. Done-ish in dreiding-foyer
  • Make sure some of the features previously added in the OpenBabel and OpenMM methods can be added to the HOOMD methods (anchors, shift CoM, etc.)

Some discussion points

  • Pretty much everything in path.py, density.py and volumes.py is completely separate from the Compound and Polymer data structures. It kind of makes me wonder if there would be some utility in path.py, density.py and volumes.py being in a separate MoSDeF package? It creates some overhead with handling an additional package, but it also keeps mBuild's code-base cleaner and more concise, and opens up more flexibility for further development around Path and Constraint that isn't necessarily limited by mBuild's existing design and data structures. I'm not opposed to it, but also not opposed to keeping all of this in mBuild. Just something to think about.

  • With these additions, there are a lot of workflows that become possible by essentially creating another Monte Carlo-like layer of choices (e.g., stringing together multiple random walks, random branching from other random walks, random walks of different lengths to achieve a certain polydispersity, etc.). Once the new features in this PR, and subsequent PRs, become more fleshed out, it might be worth thinking about adding some recipes/wrappers that implement these at a higher level.

Code scanning / CodeQL annotations on the diff:

  • Note: Cyclic import — the import of mbuild.utils.geometry begins an import cycle.
  • Note: Explicit returns mixed with implicit (fall-through) returns in _initial_points; implicit returns always return None.
  • Warning (×3): Signature mismatch in overriding method — is_inside overrides the base method with a different signature in the constraint classes.
  • Error: Missing call to superclass __init__ — Monomer.__init__ may be missing a call to Compound.__init__.

Labels: 2.0 — Additions for mBuild 2.0

Merging this pull request may close: Expand features of Polymer() to improve accessible polymer chain configurations.