chrisjonesBSU commented Aug 14, 2025

This PR begins the first efforts toward developing mBuild 2.0 (i.e., we can break things) and focuses mostly on improving the polymer-building experience in mBuild, as discussed in #1205 and #1212. There are also quite a few changes that introduce more energy minimization methods that utilize HOOMD-blue and can run on GPUs for larger systems and/or HPC workflows. I didn't mean for so many changes to be included in this PR, but as I worked on it, more and more needs kept creeping up. This PR isn't necessarily the completed effort on initializing polymer systems and adding more feature-rich energy minimization approaches, but it's close to a good stopping place for now. I'll try to summarize the most notable changes. This won't be merged into the main branch; instead it will be the first commit on the develop branch.

Addition of the mbuild.path.Path class

[Images: all-paths, mult_rw_volume]

This introduces a new data structure for creating "paths" (i.e., molecular configurations). The primary purpose of this class is to generate better starting polymer-chain configurations than the wonky ones you often get by default from the Polymer.build() method.
The basic idea is that a path represents a coarse-grained configuration that atomistic polymers can be mapped onto during the build step. Most of the included classes generate deterministic, non-random paths (3D lamellar structures, ring polymers, etc.), but there is a HardSphereRandomWalk path as well.

Here are some examples of the idea and API behind this:

Build a 3D lamellar polymer by specifying parameters such as layer length, layer separation, and the number of layers and stacks:

import mbuild as mb
from mbuild import Polymer
from mbuild.path import Lamellar

lamellar_path = Lamellar(
    num_layers=4,
    layer_length=2,
    layer_separation=0.7,
    bond_length=0.25,
    num_stacks=4,
    stack_separation=0.5
)
pe_chain = Polymer()
pe_chain.add_monomer(compound=mb.load("CC", smiles=True), indices=[2, 6], replace=True, separation=0.154)
pe_chain.build_from_path(lamellar_path)

Build a polymer from a random walk:

import networkx as nx
import numpy as np

import mbuild as mb
from mbuild import Polymer
from mbuild.path import HardSphereRandomWalk

random_walk = HardSphereRandomWalk(
    N=30,
    bond_length=0.25,
    radius=0.23,
    min_angle=np.pi/3,
    max_angle=np.pi,
    max_attempts=1e4,
    seed=42,
    bond_graph=nx.path_graph(30)
)
pe_chain = Polymer()
pe_chain.add_monomer(compound=mb.load("CC", smiles=True), indices=[2, 6], replace=True, separation=0.154)
pe_chain.build_from_path(path=random_walk, energy_minimize=True)

HardSphereRandomWalk and mbuild.utils.volumes.Constraint:

Some highlights of this Path class specifically:

  • Uses a simple "forcefield" with hard spheres, fixed bond lengths, and angles sampled uniformly between a min and max.
    • Dihedrals aren't considered right now, and no energies are computed or used in the Monte Carlo accept/reject step.
    • The choice of min and max angles allows some control over the shape of the random walk.
    • Values for the radius and bond length can be estimated from the monomer compounds (mbuild.Compound.volume is added in this PR, using RDKit); see the sketch after this list.
  • Uses numba for the most expensive tasks. As a result, random walks with relatively loose constraints can achieve 200,000 steps in ~20 seconds; for comparison, a pure NumPy implementation takes multiple minutes at this size.
  • You can add volume constraints via mbuild.utils.volumes.Constraint. Right now this PR adds CuboidConstraint, SphereConstraint, and CylinderConstraint. These objects can be passed into HardSphereRandomWalk, making it easy to add and extend constraints separately from the random-walk logic.
  • You can start the random walk from another mbuild.path.Path instance.
    • For example, create a Lamellar path, then run a random walk from its last site. The resulting path includes all the coordinates of the first, as well as the results of the random walk. This can be used to start constructing semi-crystalline morphologies.
    • Run a long random walk, then iteratively choose sites from this backbone to run smaller random walks from, creating a branch-like structure (see the branching sketch after the examples below).
    • In a for loop or while loop, run multiple random walks where each starting point is chosen randomly. This creates a system of multiple random walks, each of which a polymer can be mapped onto.
  • Account for the coordinates of other mBuild Compounds in the random walk.
    • For example, run a random walk around a system of carbon nanotubes.
  • For more constrained random walks (higher density of points, other compounds included, volume constraints, etc.), several moves can be attempted in one "batch" and the first successful one accepted (the trial_batch_size parameter). This should help constrained random walks finish, but possibly at a performance cost. I haven't had a chance to profile this yet.
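
As a sketch of the radius/bond-length estimate mentioned in the list above, an effective hard-sphere radius can be backed out by treating a monomer as a sphere of equal volume. This assumes Compound.volume (added in this PR via RDKit) is exposed as a property returning nm³; the final API may differ:

import numpy as np

import mbuild as mb

monomer = mb.load("CC", smiles=True)
# Assumption: Compound.volume (new in this PR, computed with RDKit) returns nm^3.
monomer_volume = monomer.volume
# Effective hard-sphere radius from V = (4/3) * pi * r^3
radius = (3 * monomer_volume / (4 * np.pi)) ** (1 / 3)
# radius (and a comparable bond_length) can then be passed to HardSphereRandomWalk.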

Here are some examples:

Stitching together a lamellar path and random walk

import numpy as np

from mbuild.path import HardSphereRandomWalk, Lamellar

lamellar_path = Lamellar(
    num_layers=5,
    layer_length=3,
    layer_separation=0.7,
    bond_length=0.25,
    num_stacks=4,
    stack_separation=0.7,
)

lamellar_plus_rw = HardSphereRandomWalk(
    N=100,
    min_angle=np.pi/1.3,
    max_angle=np.pi,
    bond_length=0.25,
    radius=0.23,
    start_from_path=lamellar_path,
    start_from_path_index=-1,
)
[Image: rw_and_lamellar]

Including a volume constraint in the random walk

import numpy as np

from mbuild.path import HardSphereRandomWalk
from mbuild.utils.volumes import CuboidConstraint, SphereConstraint, CylinderConstraint

sphere = SphereConstraint(radius=3, center=(0,0,0))
rw_path = HardSphereRandomWalk(
    N=300,
    bond_length=0.25,
    radius=0.23,
    volume_constraint=sphere,
    min_angle=np.pi/4,
    max_angle=np.pi,
    trial_batch_size=30,
)
[Image: constraints]
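
Branching, as mentioned above, can be sketched by starting a shorter walk from an interior site of a longer backbone walk. This is a minimal sketch: the start_from_path and start_from_path_index parameters come from the lamellar example above, but starting from an interior index (rather than the last site) and the bookkeeping of the combined bond graph are assumptions here.

import numpy as np

from mbuild.path import HardSphereRandomWalk

# A long backbone walk
backbone = HardSphereRandomWalk(
    N=100,
    bond_length=0.25,
    radius=0.23,
    min_angle=np.pi/3,
    max_angle=np.pi,
    seed=10,
)

# A shorter walk started from an interior backbone site to form a branch.
# Assumes start_from_path_index can point at any existing site, not just -1.
branch = HardSphereRandomWalk(
    N=25,
    bond_length=0.25,
    radius=0.23,
    min_angle=np.pi/3,
    max_angle=np.pi,
    start_from_path=backbone,
    start_from_path_index=50,
    seed=11,
)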

mbuild.simulation module

We already had a pretty great set of simulation methods for performing energy minimization. This PR pulls all of that code out of compound.py and moves it into a new simulation.py. Now, you pass the compound into these methods rather than calling them as instance methods (e.g., energy_minimize(compound) instead of compound.energy_minimize()). This trims down the Compound class quite a bit (~700 lines of code).
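
For example, a minimal sketch of the new calling convention (the exact import location of energy_minimize inside mbuild.simulation is assumed):

import mbuild as mb
from mbuild.simulation import energy_minimize  # assumed import path

compound = mb.load("CCCCCC", smiles=True)
# mBuild 1.x style: compound.energy_minimize()
# mBuild 2.0 style: pass the compound into the function
energy_minimize(compound)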

This PR also adds a few things that allow us to use HOOMD in addition to OpenBabel and OpenMM. The main new addition here is a wrapper class around HOOMD's own simulation class. This makes it easy to design more specific workflows that go beyond energy minimization. For example, a quick compress simulation is planned for a future PR, which can be run after the packing.py methods. Hopefully, this approach makes it easy for users to create their own quick simulation protocols when needed, and contribute them upstream when possible.

This PR adds 2 methods that use this HOOMD base class:

hoomd_cap_displacement: This uses the capped displacement method in HOOMD

hoomd_fire: This uses HOOMD's implementation of the FIRE algorithm.

These can both be better options for the energy minimization demands of building larger polymers from paths. They can use the GPU if available, and even on the CPU they are much faster than OpenBabel beyond certain system sizes. hoomd_cap_displacement was able to optimize the geometry of a polyethylene 200mer in about the same amount of time it took OpenBabel to optimize an 80mer (around 12 seconds). They were nearly equally efficient at smaller system sizes.
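
For completeness, here is a hedged sketch of calling hoomd_fire, assuming it takes the compound and forcefield the same way hoomd_cap_displacement does in the example further below; FIRE-specific options (e.g., tolerances or timestep) are omitted and the real signature may differ:

import foyer

import mbuild as mb
from mbuild.simulation import hoomd_fire  # assumed import path

compound = mb.load("CCCCCCCC", smiles=True)
oplsaa = foyer.Forcefield(name="oplsaa")
# Assumed keyword arguments, mirroring hoomd_cap_displacement; check the
# actual signature for FIRE-specific parameters.
hoomd_fire(compound=compound, n_steps=5000, forcefield=oplsaa)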

The biggest bottleneck here is the lack of support for universal forcefields (e.g., UFF) that aren't hampered by missing parameters. For now, we are keeping the OpenBabel methods in, even though there are some upstream issues with using it as a dependency. We are also planning to keep and expand the OpenMM energy minimization. I might look at making it more modular in the future, similar to the HOOMD implementation.

One important feature of the HOOMD implementation is that all the simulation information is saved, making it faster to call these methods multiple times (e.g., in a for loop or while loop) while skipping repeated calls to the under-the-hood steps of atom typing, applying the forcefield, writing out HOOMD data structures, etc.

For example, run hoomd_cap_displacement in small increments, stopping once all overlaps are removed:

import foyer

from mbuild.simulation import hoomd_cap_displacement

oplsaa = foyer.Forcefield(name="oplsaa")

while len(compound.check_for_overlaps(excluded_bond_depth=1)) != 0:
    hoomd_cap_displacement(compound=compound, n_steps=500, max_displacement=1e-4, forcefield=oplsaa)

Combining Everything:

Here is the result of combining some of the things discussed above. In a while loop, I ran 50 random walks of 25 steps each in a cubic box with 5 nm edges. From each random walk, a polyethylene chain was built, and hoomd_cap_displacement was run on the system of 50 polymers for 2,000 steps to relax bonds and angles and remove any overlaps. The result is a distribution of initial chain configurations packed at a density of $\approx 0.5 \frac{g}{cm^3}$. This is significantly better than using PACKMOL to pack a huge box with long, straight 25mers. This took ~3-4 minutes total.
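
A rough sketch of that workflow under stated assumptions: CuboidConstraint's constructor arguments, the import location of hoomd_cap_displacement, and simply adding each chain to a parent Compound are guesses here, and avoiding overlaps between separate walks is not handled.

import foyer
import numpy as np

import mbuild as mb
from mbuild import Polymer
from mbuild.path import HardSphereRandomWalk
from mbuild.simulation import hoomd_cap_displacement  # assumed import path
from mbuild.utils.volumes import CuboidConstraint

# Assumed constructor: edge lengths in nm plus a center point.
box = CuboidConstraint(Lx=5, Ly=5, Lz=5, center=(0, 0, 0))

system = mb.Compound()
for i in range(50):
    walk = HardSphereRandomWalk(
        N=25,
        bond_length=0.25,
        radius=0.23,
        min_angle=np.pi/3,
        max_angle=np.pi,
        volume_constraint=box,
        seed=i,
    )
    chain = Polymer()
    chain.add_monomer(
        compound=mb.load("CC", smiles=True),
        indices=[2, 6],
        replace=True,
        separation=0.154,
    )
    chain.build_from_path(walk)
    system.add(chain)

# Relax bonds and angles and remove overlaps across the full 50-chain system.
oplsaa = foyer.Forcefield(name="oplsaa")
hoomd_cap_displacement(
    compound=system,
    n_steps=2000,
    max_displacement=1e-4,
    forcefield=oplsaa,
)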

[Image: Screenshot From 2025-08-14 13-52-58]

Remaining To-dos:

Some of these may or may not be included in this PR.

General:

  • Add more unit tests
  • Add more examples and notes to doc strings
  • Create tutorials

Path and Polymer.build_from_path():

  • Orient atomistic monomers along the direction of the path. This should greatly improve the initial bonding when building from, and mapping to, a path.
  • If you set the bond graph for a path, then we can add a method that gets all pairs of bonded path sites and stores the orientation vector. This should be pretty quick, and all the information can be made available via a dictionary or something. This can be used in Polymer.build_from_path or Path.apply_backmapping().
  • The current implementation of these paths is mostly suited for linear polymers (nx.path_graph()), but everything we need to support things like branching polymers is there; it just needs to be implemented. Cross-linking will require some extra thinking and design. As of 45a6eed, bond graphs are built on the fly and include the ability to track branching paths.
  • Add a helper function that can sort and filter coordinates by local density. This can help with running multiple random walks in highly dense systems. This could be a numba function as well. It can be a stand-alone function that a user can call and implement on their own for stringing together multiple random walks. (Done in aae02f8)
  • Add a from_box() class method for CuboidConstraint so one can be created from the box of another compound. This would be nice when using an existing mBuild compound in a random walk that also has volume constraints.
  • More thorough performance profiling. It will be interesting to see how performance scales with number density in constrained walks.

simulation.py:

  • Smart (i.e., automatic) box setting for the hoomd methods in simulation.py
  • Add a quick compress method to simulation.py that uses HOOMD's box updater.
  • Add support for some kind of universal forcefield in foyer and gmso (UFF, Dreiding, make up our own). This is sort of in-progress elsewhere, but is facing some challenges. Done-ish in dreiding-foyer
  • Make sure some of the features previously added in the OpenBabel and OpenMM methods can be added to the HOOMD methods (anchors, shift CoM, etc.)

Some discussion points

  • Pretty much everything in path.py, density.py and volumes.py is completely separate from the Compound and Polymer data structures. It kind of makes me wonder if there would be some utility in path.py, density.py and volumes.py being in a separate MoSDeF package? It creates some overhead with handling an additional package, but it also keeps mBuild's code-base cleaner and more concise, and opens up more flexibility for further development around Path and Constraint that isn't necessarily limited by mBuild's existing design and data structures. I'm not opposed to it, but also not opposed to keeping all of this in mBuild. Just something to think about.

  • With these additions, there are a lot of workflows that become possible by essentially creating another Monte Carlo-like layer of choices (e.g., stringing together multiple random walks, random branching from other random walks, random walks of different lengths to achieve a certain polydispersity, etc.). Once the new features in this PR, and subsequent PRs, become more fleshed out, it might be worth thinking about adding some recipes/wrappers that implement these at a higher level.

Code scanning / CodeQL annotations on the diff:

  • Note: Cyclic import — the import of mbuild.utils.geometry begins an import cycle.
  • Note: Explicit returns mixed with implicit (fall-through) returns in _initial_points; implicit returns always return None.
  • Warning (×3): Signature mismatch in overriding method — is_inside overrides the base method with a different signature in the constraint classes.
  • Error: Missing call to superclass __init__ — Monomer.__init__ may be missing a call to Compound.__init__.

Labels: 2.0 — Additions for mBuild 2.0

Merging this pull request may close: Expand features of Polymer() to improve accessible polymer chain configurations.