diff --git a/doc/src/Commands_pair.rst b/doc/src/Commands_pair.rst
index 4f65dca6eb0..90f2110dfd8 100644
--- a/doc/src/Commands_pair.rst
+++ b/doc/src/Commands_pair.rst
@@ -112,6 +112,7 @@ OPT.
    * :doc:`gran/hooke (o) <pair_gran>`
    * :doc:`gran/hooke/history (ko) <pair_gran>`
    * :doc:`granular <pair_granular>`
+   * :doc:`granular/superellipsoid <pair_granular_superellipsoid>`
    * :doc:`gw <pair_gw>`
    * :doc:`gw/zbl <pair_gw>`
    * :doc:`harmonic/cut (o) <pair_harmonic_cut>`
diff --git a/doc/src/Howto_granular.rst b/doc/src/Howto_granular.rst
index 57bd7ea2c17..924449274da 100644
--- a/doc/src/Howto_granular.rst
+++ b/doc/src/Howto_granular.rst
@@ -1,17 +1,21 @@
 Granular models
 ===============
 
-Granular systems are composed of spherical particles with a diameter, as
-opposed to point particles.  This means they have an angular velocity
-and torque can be imparted to them to cause them to rotate.
+Granular systems are typically composed of spherical particles with a diameter,
+as opposed to point particles.  This means they have an angular
+velocity and torque can be imparted to them to cause them to rotate.
 
-To run a simulation of a granular model, you will want to use
+To run a simulation of a granular model, you will typically want to use
 the following commands:
 
 * :doc:`atom_style sphere <atom_style>`
 * :doc:`fix nve/sphere <fix_nve_sphere>`
 * :doc:`fix gravity <fix_gravity>`
 
+Aspherical granular particles can be simulated by creating clusters of spherical
+particles using either the :doc:`rigid <fix_rigid>` or :doc:`BPM <Howto_bpm>`
+package or by using :doc:`superellipsoids <pair_granular_superellipsoid>`.
+
 This compute
 
 * :doc:`compute erotate/sphere <compute_erotate_sphere>`
diff --git a/doc/src/Howto_spherical.rst b/doc/src/Howto_spherical.rst
index d86ea12b4bf..bdbd7cc4de9 100644
--- a/doc/src/Howto_spherical.rst
+++ b/doc/src/Howto_spherical.rst
@@ -50,13 +50,16 @@ individual particles, after then are created.
 
 The ellipsoid style defines particles that are ellipsoids and thus can
 be aspherical.  Each particle has a shape, specified by 3 diameters,
-and mass (or density).  These particles store an angular momentum and
-their orientation (quaternion), and can be acted upon by torque.  They
-do not store an angular velocity (omega), which can be in a different
-direction than angular momentum, rather they compute it as needed.
-The "set" command can be used to modify the diameter, orientation, and
-mass of individual particles, after then are created.  It also has a
-brief explanation of what quaternions are.
+and mass (or density).  Superellipsoid particles can be defined by
+specifying 2 blockiness exponents (block) and adding the *superellipsoid*
+keyword to the *atom_style ellipsoid* command.  These particles store an angular
+momentum and their orientation (quaternion), and can be acted upon by
+torque.  They do not store an angular velocity (omega), which can be
+in a different direction than angular momentum, rather they compute it
+as needed.  The "set" command can be used to modify the diameter, orientation,
+and mass of individual particles, after they are created.
+The "set" command can also be used to modify the blockiness of superellipsoid
+particles. It also has a brief explanation of what quaternions are.
 
 The line style defines line segment particles with two end points and
 a mass (or density).  They can be used in 2d simulations, and they can
@@ -113,9 +116,11 @@ When a system with finite-size particles is defined, the particles
 will only rotate and experience torque if the force field computes
 such interactions.  These are the various :doc:`pair styles <pair_style>` that generate torque:
 
-* :doc:`pair_style gran/history <pair_gran>`
-* :doc:`pair_style gran/hertz <pair_gran>`
-* :doc:`pair_style gran/no_history <pair_gran>`
+* :doc:`pair_style granular <pair_granular>`
+* :doc:`pair_style gran/hooke <pair_gran>`
+* :doc:`pair_style gran/hooke/history <pair_gran>`
+* :doc:`pair_style gran/hertz/history <pair_gran>`
+* :doc:`pair_style granular/superellipsoid <pair_granular_superellipsoid>`
 * :doc:`pair_style dipole/cut <pair_dipole>`
 * :doc:`pair_style gayberne <pair_gayberne>`
 * :doc:`pair_style resquared <pair_resquared>`
@@ -126,7 +131,8 @@ such interactions.  These are the various :doc:`pair styles <pair_style>` that g
 * :doc:`pair_style body/nparticle <pair_body_nparticle>`
 
 The granular pair styles are used with spherical particles.  The
-dipole pair style is used with the dipole atom style, which could be
+*granular/superellipsoid* pair style is used with superellipsoid particles.
+The dipole pair style is used with the dipole atom style, which could be
 applied to spherical or ellipsoidal particles.  The GayBerne and
 REsquared potentials require ellipsoidal particles, though they will
 also work if the 3 shape parameters are the same (a sphere).  The
diff --git a/doc/src/Packages_details.rst b/doc/src/Packages_details.rst
index d0dcdbe4be6..6d6ef08c870 100644
--- a/doc/src/Packages_details.rst
+++ b/doc/src/Packages_details.rst
@@ -253,8 +253,8 @@ ASPHERE package
 
 **Contents:**
 
-Computes, time-integration fixes, and pair styles for aspherical
-particle models including ellipsoids, 2d lines, and 3d triangles.
+Computes, time-integration fixes, and pair styles for aspherical particle models
+including ellipsoids, granular superellipsoids, 2d lines, and 3d triangles.
 
 **Supporting info:**
 
@@ -265,6 +265,7 @@ particle models including ellipsoids, 2d lines, and 3d triangles.
 * :doc:`pair_style ylz <pair_ylz>`
 * :doc:`pair_style line/lj <pair_line_lj>`
 * :doc:`pair_style tri/lj <pair_tri_lj>`
+* :doc:`pair_style granular/superellipsoid <pair_granular_superellipsoid>`
 * `doc/PDF/pair_gayberne_extra.pdf <PDF/pair_gayberne_extra.pdf>`_
 * `doc/PDF/pair_resquared_extra.pdf <PDF/pair_resquared_extra.pdf>`_
 * ``examples/ASPHERE``
diff --git a/doc/src/atom_style.rst b/doc/src/atom_style.rst
index f8ef7cd1e08..ca367cc0d9d 100644
--- a/doc/src/atom_style.rst
+++ b/doc/src/atom_style.rst
@@ -28,6 +28,7 @@ Syntax
          *template* arg = template-ID
            template-ID = ID of molecule template specified in a separate :doc:`molecule <molecule>` command
          *hybrid* args = list of one or more sub-styles, each with their args
+         *ellipsoid* arg = superellipsoid (optional) for superellipsoids instead of ellipsoids
 
 * accelerated styles (with same args) = *angle/kk* or *atomic/kk* or *bond/kk* or *charge/kk* or *full/kk* or *molecular/kk* or *spin/kk*
 
@@ -354,6 +355,14 @@ quaternion 4-vector with its orientation.  Each particle stores a flag
 in the ellipsoid vector which indicates whether it is an ellipsoid (1)
 or a point particle (0).
 
+.. versionadded:: TBD
+
+By adding the flag *superellipsoid* to the *ellipsoid* atom_style
+command, the particles can be superellipsoids, which are a
+generalization of ellipsoids with two additional blockiness parameters
+that control the shape.  Superellipsoids also store the principal
+moments of inertia of the particle.
+
 For the *line* style, particles can be are idealized line segments
 which store a per-particle mass and length and orientation (i.e. the
 end points of the line segment).  Each particle stores a flag in the
diff --git a/doc/src/compute_property_atom.rst b/doc/src/compute_property_atom.rst
index 996ef2092ed..79d3932b6cc 100644
--- a/doc/src/compute_property_atom.rst
+++ b/doc/src/compute_property_atom.rst
@@ -26,6 +26,8 @@ Syntax
                              temperature, heatflow,
                              angmomx, angmomy, angmomz,
                              shapex, shapey, shapez,
+                             block1, block2,
+                             inertiax, inertiay, inertiaz,
                              quatw, quati, quatj, quatk, tqx, tqy, tqz,
                              end1x, end1y, end1z, end2x, end2y, end2z,
                              corner1x, corner1y, corner1z,
@@ -64,6 +66,8 @@ Syntax
            *heatflow* = internal heat flow of spherical particle
            *angmomx,angmomy,angmomz* = angular momentum of aspherical particle
            *shapex,shapey,shapez* = 3 diameters of aspherical particle
+           *block1,block2* = 2 blockiness exponents of aspherical (superellipsoid) particle
+           *inertiax,inertiay,inertiaz* = 3 principal moments of inertia of aspherical (superellipsoid) particle
            *quatw,quati,quatj,quatk* = quaternion components for aspherical or body particles
            *tqx,tqy,tqz* = torque on finite-size particles
            *end12x, end12y, end12z* = end points of line segment
@@ -163,6 +167,20 @@ If :doc:`newton bond off <newton>` is set, it will be tallied with both atom
 The quantities *shapex*, *shapey*, and *shapez* are defined for ellipsoidal
 particles and define the 3d shape of each particle.
 
+.. versionadded:: TBD
+
+The quantities *block1* and *block2* are defined for superellipsoidal
+particles and define the blockiness of each superellipsoid particle.
+See the :doc:`set <set>` command for an explanation of the blockiness.
+
+.. versionadded:: TBD
+
+The quantities *inertiax*, *inertiay*, and *inertiaz* are defined for
+superellipsoidal particles and define the 3 principal moments of inertia
+of each particle.  These are with respect to the particle's center of
+mass and in a reference system aligned with the particle's principal
+axes.
+
 The quantities *quatw*, *quati*, *quatj*, and *quatk* are defined for
 ellipsoidal particles and body particles and store the 4-vector quaternion
 representing the orientation of each particle.  See the :doc:`set <set>`
diff --git a/doc/src/pair_granular.rst b/doc/src/pair_granular.rst
index 3c6e6fcefba..82403fe5433 100644
--- a/doc/src/pair_granular.rst
+++ b/doc/src/pair_granular.rst
@@ -63,7 +63,7 @@ global, but can be set to different values for different combinations
 of particle types, as determined by the :doc:`pair_coeff <pair_coeff>`
 command.  If the contact model choice is the same for two particle
 types, the mixing for the cross-coefficients can be carried out
-automatically. This is shown in the last example, where model
+automatically. This is shown in one of the examples, where model
 choices are the same for type 1 - type 1 as for type 2 - type2
 interactions, but coefficients are different. In this case, the
 mixed coefficients for type 1 - type 2 interactions can be determined from
diff --git a/doc/src/pair_granular_superellipsoid.rst b/doc/src/pair_granular_superellipsoid.rst
new file mode 100644
index 00000000000..4ede2c07355
--- /dev/null
+++ b/doc/src/pair_granular_superellipsoid.rst
@@ -0,0 +1,561 @@
+.. index:: pair_style granular/superellipsoid
+
+pair_style granular/superellipsoid command
+==========================================
+
+Syntax
+""""""
+
+.. code-block:: LAMMPS
+
+   pair_style granular/superellipsoid cutoff no_bounding_box curvature_gaussian
+
+Optional settings, see discussion below:
+
+* cutoff = global cutoff value
+* no_bounding_box = skip oriented bounding box check
+* curvature_gaussian = use the Gaussian curvature approximation for the contact patch
+
+Examples
+""""""""
+
+.. code-block:: LAMMPS
+
+   pair_style granular/superellipsoid
+   pair_coeff * * hooke 1000.0 50.0 tangential linear_history 1000.0 1.0 0.5 damping mass_velocity
+
+   pair_style granular/superellipsoid 10.0 curvature_gaussian
+   pair_coeff 1 1 hertz 1000.0 50.0 tangential linear_history 500.0 1.0 0.4 damping viscoelastic
+   pair_coeff 2 2 hertz 500.0 50.0 tangential linear_history 250.0 1.0 0.1 damping viscoelastic
+
+Description
+"""""""""""
+
+.. versionadded:: TBD
+
+The *granular/superellipsoid* style calculates granular contact forces
+between superellipsoidal particles (see :doc:`atom style ellipsoid
+<atom_style>`). Similar to the :doc:`granular pair style <pair_granular>`,
+which is designed for spherical particles, various normal, damping, and
+tangential contact models are available (rolling and twisting may be
+added later). The total computed forces and torques are the sum of the
+contributions of the selected models.
+
+All model choices and parameters are entered in the
+:doc:`pair_coeff <pair_coeff>` command, as described below.  Coefficient
+values are not global, but can be set to different values for different
+combinations of particle types, as determined by the :doc:`pair_coeff
+<pair_coeff>` command.  If the contact model choice is the same for two
+particle types, the mixing for the cross-coefficients can be carried out
+automatically. This is shown in the last example, where model
+choices are the same for type 1 - type 1 as for type 2 - type 2
+interactions, but coefficients are different. In this case, the
+mixed coefficients for type 1 - type 2 interactions can be determined from
+mixing rules discussed below.  For additional flexibility,
+coefficients as well as model forms can vary between particle types.
+
+----------
+
+This pair_style allows granular contact between two superellipsoid particles
+whose surface is implicitly defined as:
+
+.. math::
+
+    f(\mathbf{x}) = \left(
+    \left|\frac{x}{a}\right|^{n_2} + \left|\frac{y}{b}\right|^{n_2}
+    \right)^{n_1 / n_2}
+    + \left|\frac{z}{c}\right|^{n_1} - 1 = 0
+
+for a point :math:`\mathbf{x} = (x, y, z)` where the coordinates are given
+in the reference frame aligned with the principal axes of inertia of the particle.
+The half-diameters :math:`a`, :math:`b`, and :math:`c` correspond to the *shape*
+property, and the exponents :math:`n_1` and :math:`n_2` to the *block* property
+of the ellipsoid atom. See the doc page for the :doc:`set <set>` command for
+more details.
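+
+As a quick consistency check on the surface definition above (an illustrative
+special case, not an additional model), setting :math:`n_1 = n_2 = 2` reduces
+the implicit equation to that of an ordinary ellipsoid:
+
+.. math::
+
+    \left(\frac{x}{a}\right)^2 + \left(\frac{y}{b}\right)^2
+    + \left(\frac{z}{c}\right)^2 = 1
+
+With :math:`a = b = c` this is a sphere of radius :math:`a`. Increasing
+:math:`n_1` and :math:`n_2` flattens the faces, and in the limit of large
+exponents the shape approaches a box with half-lengths :math:`a`, :math:`b`,
+and :math:`c`.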
+
+.. note::
+
+    The contact solver strictly requires convex particle shapes to ensure a mathematically
+    unique point of deepest penetration. Therefore, the blockiness parameters must be
+    :math:`n_1 \ge 2.0` and :math:`n_2 \ge 2.0`. Attempting to simulate concave or "pointy"
+    particles (:math:`n < 2.0`) will result in an error.
+
+.. note::
+
+    For particles with high blockiness exponents (:math:`n > 4.0`) involved in edge-to-edge
+    or corner-to-corner contacts, the surface normal vector varies rapidly over small
+    distances. The Newton solver may occasionally fail to converge to the strict gradient
+    alignment tolerance (typically :math:`10^{-10}`). You may see warning messages in the
+    log indicating that the solver returned a sub-optimal solution, but the simulation will
+    proceed using this best-effort contact point.
+
+Contact detection for these aspherical particles uses the so-called "midway"
+minimization approach from :ref:`(Houlsby) <Houlsby>`. Considering two
+particles with shape functions :math:`F_i` and :math:`F_j`,
+the contact point :math:`\mathbf{X}_0` in the global frame is obtained as:
+
+.. math::
+
+    \mathbf{X}_0 = \underset{\mathbf{X}}{\text{argmin}}
+                   \ F_i(\mathbf{X}) + F_j(\mathbf{X})
+                   \text{, subject to } F_i(\mathbf{X}) = F_j(\mathbf{X})
+
+where the shape function is given by
+:math:`F_i(\mathbf{X}) = f_i(\mathbf{R}_i^T (\mathbf{X} - \mathbf{X}_i))`
+and where :math:`\mathbf{X}_i` and :math:`\mathbf{R}_i` are the center of mass
+and rotation matrix of the particle, respectively. The constrained minimization
+problem is solved using Lagrange multipliers and Newton's method with a line
+search as described by :ref:`(Podlozhnyuk) <Podlozhnyuk>`.
+
+.. note::
+
+    The shape function :math:`F` is not a signed distance function and
+    does not have a unit gradient (:math:`\|\nabla F \| \neq 1`), so the
+    so-called "midway" point is not actually located at an equal distance from
+    the surface of both particles. For contact between non-identical particles,
+    the contact point tends to be closer to the surface of the smaller and
+    blockier particle.
+
+.. note::
+
+    This formulation leads to a 4x4 system of non-linear equations. Tikhonov
+    regularization and step clamping are used to ensure robustness of the direct
+    solver and a high convergence rate, even for blocky particles with nearly flat
+    faces.
+
+The particles overlap if both shape functions are negative at the contact point.
+The contact normal is obtained as: :math:`\mathbf{n}_{ij} = \nabla F_i(\mathbf{X}_0) / \| \nabla F_i(\mathbf{X}_0)\| = - \nabla F_j(\mathbf{X}_0) / \| \nabla F_j(\mathbf{X}_0)\|`
+and the overlap :math:`\delta = \|\mathbf{X}_j^{\mathrm{surf}} - \mathbf{X}_i^{\mathrm{surf}}\|`
+is computed as the distance between the points on the
+particles' surfaces that are closest to the contact point in the
+direction of the contact normal. These surface points are
+:math:`\mathbf{X}_i^{\mathrm{surf}} = \mathbf{X}_0 + \lambda_i \mathbf{n}_{ij}` with :math:`F_i(\mathbf{X}_i^{\mathrm{surf}}) = 0`
+and :math:`\mathbf{X}_j^{\mathrm{surf}} = \mathbf{X}_0 + \lambda_j \mathbf{n}_{ij}` with :math:`F_j(\mathbf{X}_j^{\mathrm{surf}}) = 0`.
+Newton's method is used to solve these equations for the scalars
+:math:`\lambda_i` and :math:`\lambda_j` and find the surface points
+:math:`\mathbf{X}_i^{\mathrm{surf}}` and :math:`\mathbf{X}_j^{\mathrm{surf}}`.
+
+.. note::
+    A modified representation of the particle surface is defined as
+    :math:`G(\mathbf{X}) = (F(\mathbf{X})+1)^{1/n_1}-1` which is a quasi-radial distance function formulation.
+    This formulation is used to compute the surface points once the midway contact point is found.
+    This formulation is also used when the *geometric* keyword is specified in the pair_style command and the following optimization problem is solved instead for the contact point:
+    :math:`\mathbf{X}_0 = \underset{\mathbf{X}}{\text{argmin}} \, \left( r_i G_i(\mathbf{X}) + r_j G_j(\mathbf{X}) \right) \text{, subject to } r_i G_i(\mathbf{X}) = r_j G_j(\mathbf{X})`,
+    where :math:`r_i` and :math:`r_j` are the average radii of the two particles.
+    The geometric formulation thus yields a better approximation of the contact point
+    for particles with different sizes, and it is slightly more robust for particles with high *block* exponents,
+    albeit more computationally expensive.
+
+A hierarchical approach is used to limit the cost of contact detection.
+First, intersection of the bounding spheres of the two particles, with bounding
+radii :math:`r_i` and :math:`r_j`, is checked. If the distance
+between the particle centers is more than the sum of the radii,
+:math:`\|\mathbf{X}_j - \mathbf{X}_i\| > r_i + r_j`, the particles do not intersect.
+Then, if the bounding spheres intersect, intersection of the oriented
+bounding boxes is checked. This is done following the equations of
+:ref:`(Eberly) <GeometricTools>`.
+This check is always performed, unless the *no_bounding_box* keyword is used.
+This is advantageous for all particle shapes except for superellipsoids with
+aspect ratio close to one and both blockiness exponents close to 2.
+
+.. warning::
+
+    The Newton-Raphson minimization used to find the midway contact point can
+    fail to converge if the initial starting guess is too far from the true
+    physical surface. This typically occurs if a user specifies a manual global
+    *cutoff* that is significantly larger than the particles **and** enables the
+    *no_bounding_box* keyword. Under these conditions, the solver attempts to
+    resolve contacts between widely separated particles, which may cause the
+    iteration to diverge and crash the simulation. It is strongly
+    recommended to keep bounding box checks enabled if a large cutoff is specified.
+
+----------
+
+This section provides an overview of the various normal, tangential,
+and damping contact models available. For additional context, see the
+discussion in the :doc:`granular pair style <pair_granular>` doc page
+which includes all of these options.
+
+The first required keyword for the *pair_coeff* command is the normal
+contact model. Currently supported options for normal contact models
+and their required arguments are:
+
+1. *hooke* : :math:`k_n`, :math:`\eta_{n0}` (or :math:`e`)
+2. *hertz* : :math:`k_n`, :math:`\eta_{n0}` (or :math:`e`)
+
+Here, :math:`k_n` is spring stiffness (with units that depend on model
+choice, see below); :math:`\eta_{n0}` is a damping prefactor (or, in its
+place a coefficient of restitution :math:`e`, depending on the choice of
+damping mode, see below).
+
+For the *hooke* model, the normal, elastic component of force acting
+on particle *i* due to contact with particle *j* is given by:
+
+.. math::
+
+   \mathbf{F}_{ne, Hooke} = k_n \delta_{ij} \mathbf{n}
+
+Here, :math:`\delta_{ij}` is the particle overlap (note the i-j ordering so
+that :math:`\mathbf{F}_{ne}` is positive for repulsion), and :math:`\mathbf{n}`
+is the contact normal vector at the contact point. Therefore, for *hooke*, the units
+of the spring constant :math:`k_n` are *force*\ /\ *distance*, or equivalently
+*mass*\ /\ *time*\ \^2.
+
+For the *hertz* model, the normal component of force is given by:
+
+.. math::
+
+   \mathbf{F}_{ne, Hertz} = k_n R_{eff}^{1/2}\delta_{ij}^{3/2} \mathbf{n}
+
+Here, :math:`R_{eff} = R = \frac{R_i R_j}{R_i + R_j}` is the effective radius,
+and :math:`R_i` is the equivalent radius of the i-th particle at the surface
+contact point with the j-th particle. This radius is computed either from the
+mean curvature, :math:`R_i = 2 / (\kappa_1 + \kappa_2)`, or, when the
+*curvature_gaussian* keyword is used, from the Gaussian curvature,
+:math:`R_i = 1 / \sqrt{\kappa_1 \kappa_2}`, where
+:math:`\kappa_{1,2}` are the principal curvatures of the particle surface at the
+contact point. For *hertz*, the units of the spring constant :math:`k_n` are
+*force*\ /\ *length*\ \^2, or equivalently *pressure*\ .
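+
+To illustrate the two curvature choices, consider hypothetical principal
+curvatures :math:`\kappa_1 = 1` and :math:`\kappa_2 = 4` (in units of
+1/\ *length*; the values are assumptions chosen only for the arithmetic):
+
+.. math::
+
+    R_i^{\mathrm{mean}} = \frac{2}{1 + 4} = 0.4, \qquad
+    R_i^{\mathrm{gauss}} = \frac{1}{\sqrt{1 \cdot 4}} = 0.5
+
+For two such particles in contact, the effective radius
+:math:`R_{eff} = R_i R_j / (R_i + R_j)` is then 0.2 with the mean curvature
+and 0.25 with the Gaussian curvature.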
+
+.. note::
+
+    To ensure numerical stability and preserve physical realism, the computed
+    contact radius is mathematically capped. For highly blocky particles
+    undergoing flat-on-flat contact, the theoretical curvature approaches zero,
+    which would yield an infinite contact radius and cause a force explosion. To
+    prevent this, the maximum contact radius is capped at the physical bounding
+    radius of the smallest interacting particle. Conversely, for sharp corner
+    contacts where curvature approaches infinity, the calculated radius would
+    drop to zero, eliminating the repulsive force entirely. The contact radius
+    is therefore lower-bounded by a minimum fraction of the physical radius
+    (:math:`10^{-4} \min(r_i, r_j)`) to prevent particles from unphysically interpenetrating.
+
+In addition, the normal force is augmented by a damping term of the
+following general form:
+
+.. math::
+
+   \mathbf{F}_{n,damp} = -\eta_n \mathbf{v}_{n,rel}
+
+Here, :math:`\mathbf{v}_{n,rel} = (\mathbf{v}_j - \mathbf{v}_i) \cdot
+\mathbf{n}\ \mathbf{n}` is the component of relative velocity along
+:math:`\mathbf{n}`.
+
+The optional *damping* keyword to the *pair_coeff* command followed by a keyword
+determines the model form of the damping factor :math:`\eta_n`, and the
+interpretation of the :math:`\eta_{n0}` or :math:`e` coefficients specified as
+part of the normal contact model settings. The *damping* keyword and
+corresponding model form selection may be appended anywhere in the *pair_coeff*
+command.  Note that the choice of damping model affects both the normal and
+tangential damping.  The options for the damping model currently supported are:
+
+1. *mass_velocity*
+2. *viscoelastic*
+
+If the *damping* keyword is not specified, the *viscoelastic* model is
+used by default.
+
+For *damping mass_velocity*, the normal damping is given by:
+
+.. math::
+
+   \eta_n = \eta_{n0} m_{eff}
+
+Here, :math:`\eta_{n0}` is the damping coefficient specified for the normal
+contact model, in units of 1/\ *time* and
+:math:`m_{eff} = m_i m_j/(m_i + m_j)` is the effective mass.
+Use *damping mass_velocity* to reproduce the damping behavior of
+*pair gran/hooke/\**.
+
+The *damping viscoelastic* model is based on the viscoelastic
+treatment of :ref:`(Brilliantov et al) <Brill1996>`, where the normal
+damping is given by:
+
+.. math::
+
+   \eta_n = \eta_{n0}\ a m_{eff}
+
+Here, *a* is the contact radius, given by :math:`a =\sqrt{R\delta}`
+for all models.  For *damping viscoelastic*,
+:math:`\eta_{n0}` is in units of 1/(\ *time*\ \*\ *distance*\ ).
+
+The total normal force is computed as the sum of the elastic and
+damping components:
+
+.. math::
+
+   \mathbf{F}_n = \mathbf{F}_{ne} + \mathbf{F}_{n,damp}
+
+----------
+
+The *pair_coeff* command also requires specification of the tangential
+contact model. The required keyword *tangential* is expected, followed
+by the model choice and associated parameters. Currently there is only
+one supported tangential model with expected parameters as follows:
+
+1. *linear_history* : :math:`k_t`, :math:`x_{\gamma,t}`, :math:`\mu_t`
+
+Here, :math:`x_{\gamma,t}` is a dimensionless multiplier for the normal
+damping :math:`\eta_n` that determines the magnitude of the tangential
+damping, :math:`\mu_t` is the tangential (or sliding) friction
+coefficient, and :math:`k_t` is the tangential stiffness coefficient.
+
+The tangential damping force :math:`\mathbf{F}_\mathrm{t,damp}` is given by:
+
+.. math::
+
+   \mathbf{F}_\mathrm{t,damp} = -\eta_t \mathbf{v}_{t,rel}
+
+The tangential damping prefactor :math:`\eta_t` is calculated by scaling
+the normal damping :math:`\eta_n` (see above):
+
+.. math::
+
+   \eta_t = x_{\gamma,t} \eta_n
+
+The normal damping prefactor :math:`\eta_n` is determined by the choice
+of the *damping* keyword, as discussed above.  Thus, the *damping*
+keyword also affects the tangential damping.  The parameter
+:math:`x_{\gamma,t}` is a scaling coefficient. Several works in the
+literature use :math:`x_{\gamma,t} = 1` (:ref:`Marshall <Marshall2009>`,
+:ref:`Tsuji et al <Tsuji1992>`, :ref:`Silbert et al <Silbert2001>`).  The relative
+tangential velocity at the point of contact is given by
+:math:`\mathbf{v}_{t, rel} = \mathbf{v}_{t} - (R_i\boldsymbol{\Omega}_i + R_j\boldsymbol{\Omega}_j) \times \mathbf{n}`, where :math:`\mathbf{v}_{t} = \mathbf{v}_r - \mathbf{v}_r\cdot\mathbf{n}\ \mathbf{n}`,
+:math:`\mathbf{v}_r = \mathbf{v}_j - \mathbf{v}_i` .
+The direction of the applied force is :math:`\mathbf{t} = \mathbf{v}_{t,rel}/\|\mathbf{v}_{t,rel}\|` .
+
+The normal force value :math:`F_{n0}` used to compute the critical force
+depends on the form of the contact model. It is given by the magnitude of
+the normal force:
+
+.. math::
+
+   F_{n0} = \|\mathbf{F}_n\|
+
+The *linear_history* option uses the accumulated tangential
+displacement (i.e. contact history), which is discussed in detail
+below. The same treatment of the accumulated displacement will apply
+to other (future) options as well.
+
+For *tangential linear_history*, the tangential force is given by:
+
+.. math::
+
+   \mathbf{F}_t =  -\min(\mu_t F_{n0}, \|-k_t\mathbf{\xi} + \mathbf{F}_\mathrm{t,damp}\|) \mathbf{t}
+
+Here, :math:`\mathbf{\xi}` is the tangential displacement accumulated
+during the entire duration of the contact:
+
+.. math::
+
+   \mathbf{\xi} = \int_{t_0}^t \mathbf{v}_{t,rel}(\tau) \mathrm{d}\tau
+
+This accumulated tangential displacement must be adjusted to account
+for changes in the frame of reference of the contacting pair of
+particles during contact. This occurs due to the overall motion of the
+contacting particles in a rigid-body-like fashion during the duration
+of the contact. There are two modes of motion that are relevant: the
+'tumbling' rotation of the contacting pair, which changes the
+orientation of the plane in which tangential displacement occurs; and
+'spinning' rotation of the contacting pair about the contact normal
+(:math:`\mathbf{n}`).  Corrections due to the
+former mode of motion are made by rotating the accumulated
+displacement into the plane that is tangential to the contact vector
+at each step, or equivalently removing any component of the tangential
+displacement that lies along :math:`\mathbf{n}`, and rescaling to
+preserve the magnitude.  This follows the discussion in
+:ref:`Luding <Luding2008>`, see equation 17 and relevant discussion in that
+work:
+
+.. math::
+
+   \mathbf{\xi} = \left(\mathbf{\xi'} - (\mathbf{n} \cdot \mathbf{\xi'})\mathbf{n}\right) \frac{\|\mathbf{\xi'}\|}{\|\mathbf{\xi'} - (\mathbf{n}\cdot\mathbf{\xi'})\mathbf{n}\|}
+
+Here, :math:`\mathbf{\xi'}` is the accumulated displacement prior to the
+current time step and :math:`\mathbf{\xi}` is the corrected
+displacement. Corrections to the displacement due to the second mode
+of motion described above (rotations about :math:`\mathbf{n}`) are not
+currently implemented, but are expected to be minor for most
+simulations.
+
+Furthermore, when the tangential force exceeds the critical force, the
+tangential displacement is re-scaled to match the value for the
+critical force (see :ref:`Luding <Luding2008>`, equation 20 and related
+discussion):
+
+.. math::
+
+   \mathbf{\xi} = -\frac{1}{k_t}\left(\mu_t F_{n0}\mathbf{t} - \mathbf{F}_{t,damp}\right)
+
+The tangential force is added to the total normal force (elastic plus
+damping) to produce the total force on the particle.
+
+Unlike perfect spheres, the surface normal at the contact point of a superellipsoid
+does not generally pass through the particle's center of mass. Therefore, both the
+normal and tangential forces act at the contact point to induce a torque on each
+particle.
+
+Using the exact contact point :math:`\mathbf{X}_0` determined by the geometric solver,
+the branch vectors from the particle centers of mass to the contact point are
+defined as :math:`\mathbf{r}_{ci} = \mathbf{X}_0 - \mathbf{X}_i` and
+:math:`\mathbf{r}_{cj} = \mathbf{X}_0 - \mathbf{X}_j`. With
+:math:`\mathbf{F}_{tot} = \mathbf{F}_n + \mathbf{F}_t` the total force on
+particle *i*, the resulting torques are calculated as:
+
+.. math::
+
+   \mathbf{\tau}_i = \mathbf{r}_{ci} \times \mathbf{F}_{tot}
+
+.. math::
+
+   \mathbf{\tau}_j = -\mathbf{r}_{cj} \times \mathbf{F}_{tot}
+
+----------
+
+If two particles are moving away from each other while in contact, there
+is a possibility that the particles could experience an effective attractive
+force due to damping. If the optional *limit_damping* keyword is used, this option
+will zero out the normal component of the force if there is an effective
+attractive force.
+
+----------
+
+LAMMPS automatically sets pairwise cutoff values for *pair_style
+granular/superellipsoid* based on particle radii. In the vast majority of situations,
+this is adequate. However, a cutoff value can optionally be appended
+to the *pair_style granular/superellipsoid* command to specify a global cutoff (i.e.
+a cutoff for all atom types). This option may be useful in some rare
+cases where the automatic cutoff determination is not sufficient.
+
+----------
+
+Mixing, shift, table, tail correction, restart, rRESPA info
+"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+
+The :doc:`pair_modify <pair_modify>` mix, shift, table, and tail options
+are not relevant for granular pair styles.
+
+Mixing of coefficients is carried out using geometric averaging for
+most quantities, e.g. if friction coefficient for type 1-type 1
+interactions is set to :math:`\mu_1`, and friction coefficient for type
+2-type 2 interactions is set to :math:`\mu_2`, the friction coefficient
+for type 1-type 2 interactions is computed as :math:`\sqrt{\mu_1\mu_2}`
+(unless explicitly specified to a different value by a *pair_coeff 1 2
+...* command).
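+
+As an illustration, applying the geometric rule to the second example above,
+and assuming every listed coefficient is mixed this way, gives
+:math:`\sqrt{0.4 \times 0.1} = 0.2` for the friction coefficient,
+:math:`\sqrt{1000 \times 500} \approx 707.1` for :math:`k_n`, and
+:math:`\sqrt{500 \times 250} \approx 353.6` for :math:`k_t`. The explicit
+equivalent, which could also be used to override the mixed values, would
+look like:
+
+.. code-block:: LAMMPS
+
+   # type 1 - type 2 coefficients written out by hand (illustrative values)
+   pair_coeff 1 2 hertz 707.1 50.0 tangential linear_history 353.6 1.0 0.2 damping viscoelastic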
+
+This pair style writes its information to :doc:`binary restart files <restart>`,
+so a pair_style command does not need to be specified in an input script that reads
+a restart file.
+
+This pair style can only be used via the *pair* keyword of the
+:doc:`run_style respa <run_style>` command.  It does not support the
+*inner*, *middle*, *outer* keywords.
+
+The single() function of this pair style returns 0.0 for the energy of a
+pairwise interaction, since energy is not conserved in this dissipative
+potential.  It also returns only the normal component of the pairwise
+interaction force.
+
+----------
+
+Restrictions
+""""""""""""
+
+The *atom_style* must be set to *ellipsoid superellipsoid* to enable the
+shape parameters of superellipsoid particles (three lengths and two
+blockiness parameters); see :doc:`atom_style <atom_style>` for more details.
+
+This pair style requires Newton's third law to be set to *off* for pair interactions.
+
+There are currently no versions of *fix wall/gran* or *fix wall/gran/region* that
+are compatible with the superellipsoid particles.
+
+This pair style is part of the ASPHERE package.  It is
+only enabled if LAMMPS was built with that package.
+See the :doc:`Build package <Build_package>` page for more info.
+
+This pair style requires that atoms store per-particle bounding radius, shapes, blockiness, inertia,
+torque, and angular momentum (omega) as defined by the
+:doc:`atom_style ellipsoid superellipsoid <atom_style>`.
+
+This pair style requires you to use the :doc:`comm_modify vel yes <comm_modify>`
+command so that velocities are stored by ghost atoms.
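+
+Taken together with the *atom_style* and Newton settings described
+above, a minimal preamble might look like the following sketch, which
+mirrors the example inputs in examples/ASPHERE/superellipsoid_gran:
+
+.. code-block:: LAMMPS
+
+   atom_style  ellipsoid superellipsoid   # enable shape and blockiness data
+   comm_modify vel yes                    # ghost atoms store velocities
+   newton      off                        # Newton's third law off for pair interactions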
+
+This pair style will not restart exactly when using the
+:doc:`read_restart <read_restart>` command, though it should provide
+statistically similar results.  This is because the forces it
+computes depend on atom velocities and the atom velocities have
+been propagated half a timestep between the force computation and
+when the restart is written, due to using Velocity Verlet time
+integration. See the :doc:`read_restart <read_restart>` command
+for more details.
+
+Accumulated values for individual contacts are saved to restart
+files but are not saved to data files. Therefore, forces may
+differ significantly when a system is reloaded using the
+:doc:`read_data <read_data>` command.
+
+Related commands
+""""""""""""""""
+
+:doc:`pair_coeff <pair_coeff>`
+:doc:`pair granular <pair_granular>`
+
+Default
+"""""""
+
+For the *pair_coeff* settings: *damping viscoelastic*
+
+References
+""""""""""
+
+.. _Brill1996:
+
+**(Brilliantov et al, 1996)** Brilliantov, N. V., Spahn, F., Hertzsch,
+J. M., & Poschel, T. (1996).  Model for collisions in granular
+gases. Physical review E, 53(5), 5382.
+
+.. _Luding2008:
+
+**(Luding, 2008)** Luding, S. (2008). Cohesive, frictional powders:
+contact models for tension. Granular matter, 10(4), 235.
+
+.. _Marshall2009:
+
+**(Marshall, 2009)** Marshall, J. S. (2009). Discrete-element modeling
+of particulate aerosol flows.  Journal of Computational Physics,
+228(5), 1541-1561.
+
+.. _Silbert2001:
+
+**(Silbert, 2001)** Silbert, L. E., Ertas, D., Grest, G. S., Halsey,
+T. C., Levine, D., & Plimpton, S. J. (2001).  Granular flow down an
+inclined plane: Bagnold scaling and rheology. Physical Review E,
+64(5), 051302.
+
+
+.. _Thornton1991:
+
+**(Thornton, 1991)** Thornton, C. (1991). Interparticle sliding in the
+presence of adhesion.  J. Phys. D: Appl. Phys. 24 1942
+
+.. _Thornton2013:
+
+**(Thornton et al, 2013)** Thornton, C., Cummins, S. J., & Cleary,
+P. W. (2013).  An investigation of the comparative behavior of
+alternative contact force models during inelastic collisions. Powder
+Technology, 233, 30-46.
+
+.. _WaltonPC:
+
+**(Otis R. Walton)** Walton, O.R., Personal Communication
+
+.. _Podlozhnyuk:
+
+**(Podlozhnyuk)** Podlozhnyuk, Pirker, Kloss, Comp. Part. Mech., 4:101-118 (2017).
+
+.. _Houlsby:
+
+**(Houlsby)** Houlsby, Computers and Geotechnics, 36, 953-959 (2009).
+
+.. _GeometricTools:
+
+**(Eberly)** Eberly, Geometric Tools: Dynamic Collision Detection Using Oriented Bounding Boxes (2008).
+
diff --git a/doc/src/pair_style.rst b/doc/src/pair_style.rst
index 4cbe1c7d965..fdc54304cea 100644
--- a/doc/src/pair_style.rst
+++ b/doc/src/pair_style.rst
@@ -201,9 +201,10 @@ accelerated styles exist.
 * :doc:`gauss/cut <pair_gauss>` - generalized Gaussian potential
 * :doc:`gayberne <pair_gayberne>` - Gay-Berne ellipsoidal potential
 * :doc:`granular <pair_granular>` - Generalized granular potential
+* :doc:`granular/superellipsoid <pair_granular_superellipsoid>` - Generalized granular potential for superellipsoids
 * :doc:`gran/hertz/history <pair_gran>` - granular potential with Hertzian interactions
-* :doc:`gran/hooke <pair_gran>` - granular potential with history effects
-* :doc:`gran/hooke/history <pair_gran>` - granular potential without history effects
+* :doc:`gran/hooke <pair_gran>` - granular potential without history effects
+* :doc:`gran/hooke/history <pair_gran>` - granular potential with history effects
 * :doc:`gw <pair_gw>` - Gao-Weber potential
 * :doc:`gw/zbl <pair_gw>` - Gao-Weber potential with a repulsive ZBL core
 * :doc:`harmonic/cut <pair_harmonic_cut>` - repulsive-only harmonic potential
diff --git a/doc/src/read_data.rst b/doc/src/read_data.rst
index 53de3a2a5c1..645d27a75f3 100644
--- a/doc/src/read_data.rst
+++ b/doc/src/read_data.rst
@@ -1328,18 +1328,21 @@ and a general discussion of how type labels can be used.
 
 * one line per ellipsoid
 * line syntax: atom-ID shapex shapey shapez quatw quati quatj quatk
+* line syntax (*superellipsoids*): atom-ID shapex shapey shapez quatw quati quatj quatk block1 block2
 
   .. parsed-literal::
 
        atom-ID = ID of atom which is an ellipsoid
        shapex,shapey,shapez = 3 diameters of ellipsoid (distance units)
        quatw,quati,quatj,quatk = quaternion components for orientation of atom
+       block1,block2 = 2 blockiness parameters for superellipsoids only
 
-* example:
+* examples:
 
   .. parsed-literal::
 
        12 1 2 1 1 0 0 0
+       12 1 2 1 1 0 0 0 2 2
 
 The *Ellipsoids* section must appear if :doc:`atom_style ellipsoid
 <atom_style>` is used and any atoms are listed in the *Atoms* section
@@ -1362,6 +1365,16 @@ the quaternion that represents its new orientation is given by
 LAMMPS normalizes each atom's quaternion in case (a,b,c) is not
 specified as a unit vector.
 
+.. versionadded:: TBD
+
+The blockiness values *block1*, *block2* generalize the geometry to a
+superellipsoid for use in granular simulations. Sections through the
+center and parallel to the z-axis are superellipses with squareness
+*block1* and sections in the x-y plane are superellipses with squareness
+*block2*.  These parameters are optional and default to a value of 2,
+recovering ellipsoid geometry.  When specified, both values must be
+greater than or equal to 2.
+
 If the data file defines a general triclinic box, then the quaternion
 for each ellipsoid should be specified for its orientation relative to
 the standard x,y,z coordinate axes.  When the system is converted to a
diff --git a/doc/src/set.rst b/doc/src/set.rst
index 5a6995ee09b..6af2b15a93d 100644
--- a/doc/src/set.rst
+++ b/doc/src/set.rst
@@ -23,7 +23,7 @@ Syntax
 
 * one or more keyword/value pairs may be appended
 
-* keyword = *angle* or *angmom* or *apip/lambda* or *bond* or *cc* or *charge*
+* keyword = *angle* or *angmom* or *apip/lambda* or *block* or *bond* or *cc* or *charge*
   or *density* or *density/disc* or *diameter* or *dihedral* or *dipole*
   or *dipole/random* or *dpd/theta* or *edpd/cv* or *edpd/temp* or
   *epsilon* or *image* or *improper* or *length* or *mass* or *mol* or
@@ -45,6 +45,8 @@ Syntax
          fast = switching parameter of fast potential (1)
          precise = switching parameter of fast potential (0)
          float = constant float or atom-style variable (between 0 and 1)
+       *block* value = block1, block2
+         block1,block2 = 2 blockiness parameters for superellipsoids
        *bond* value = numeric bond type or bond type label, for all bonds between selected atoms
        *cc* values = index cc
          index = index of a chemical species (1 to Nspecies)
@@ -182,6 +184,7 @@ Examples
    set atom * charge v_atomfile
    set atom 100*200 x 0.5 y 1.0
    set atom 100 vx 0.0 vy 0.0 vz -1.0
+   set atom 200 shape 1.5 2.0 4.0 block 2.0 4.0
    set atom 1492 type 3
    set atom 1492 type H
    set atom * i_myVal 5
@@ -538,6 +541,25 @@ other. Note that the SPH smoothing kernel diameter used for computing
 long range, nonlocal interactions, is set using the *diameter*
 keyword.
 
+.. versionadded:: TBD
+
+Keyword *block* sets the blockiness of the selected atoms.  The
+particles must be ellipsoids as defined by the :doc:`atom_style
+ellipsoid <atom_style>` command.  This command is used to define
+superellipsoid particle shapes for use in granular simulations.  The
+*block1*, *block2* settings are the 2 exponents of the superellipsoid in
+the vertical and horizontal directions.  Vertical sections through the
+center are superellipses with squareness *block1* and horizontal
+sections are superellipses with squareness *block2*.  If both parameters
+are set to a value of 2 (the default), the atom is a regular ellipsoid.
+The keyword *block* should be used together with the keyword *shape* to
+give the particle the desired shape.  If the keyword *block* is given
+alone and the *shape* has not been defined, e.g., in a previous *set*
+command, the 3 diameters are set to a value of 1 internally.  Note
+that this command does not adjust the particle mass, even if it was
+defined with a density, e.g. via the :doc:`read_data <read_data>`
+command.
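+
+Because *block* does not change the mass, one possible sequence (a
+sketch with values borrowed from the bowling example in
+examples/ASPHERE/superellipsoid_gran) sets the shape, then the
+blockiness, and then the mass explicitly:
+
+.. code-block:: LAMMPS
+
+   set type 1 shape 1.0 1.0 4.0   # 3 diameters
+   set type 1 block 8.0 2.0       # block1 (vertical), block2 (horizontal)
+   set type 1 mass 1.0            # mass is not adjusted by block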
+
 Keyword *smd/mass/density* sets the mass of all selected particles,
 but it is only applicable to the Smooth Mach Dynamics package MACHDYN.
 It assumes that the particle volume has already been correctly set and
diff --git a/examples/ASPHERE/superellipsoid_gran/in.bowling b/examples/ASPHERE/superellipsoid_gran/in.bowling
new file mode 100644
index 00000000000..83e4f42dc35
--- /dev/null
+++ b/examples/ASPHERE/superellipsoid_gran/in.bowling
@@ -0,0 +1,59 @@
+units           si
+atom_style      ellipsoid superellipsoid
+dimension       3
+boundary p p p
+comm_modify vel yes
+newton off
+
+region          box block 0 10 0 10 0 10
+create_box      2 box
+
+# Pins
+create_atoms 1 single 5 5 5
+
+create_atoms 1 single 4 6 5
+create_atoms 1 single 6 6 5
+
+create_atoms 1 single 3 7 5
+create_atoms 1 single 5 7 5
+create_atoms 1 single 7 7 5
+
+create_atoms 1 single 2 8 5
+create_atoms 1 single 4 8 5
+create_atoms 1 single 6 8 5
+create_atoms 1 single 8 8 5
+
+set             type 1 shape 1.0 1.0 4.0
+set             type 1 block 8.0 2.0
+set             type 1 mass 1.0
+
+# Bowling ellipsoids
+create_atoms 2 single 5.5 1.5 6
+set             type 2 shape 2.0 2.0 1.0
+set             type 2 block 2.0 2.0
+set             type 2 mass 10.0
+group ball type 2
+
+pair_style      granular/superellipsoid
+pair_coeff      * * hooke 1000.0 0.0 tangential linear_history 285 0.0 0.5 damping mass_velocity
+
+compute diameter all property/atom shapex shapey shapez
+compute orient all property/atom quatw quati quatj quatk
+compute block all property/atom block1 block2
+# Ovito uses the reciprocal exponents for the blockiness
+# https://docs.ovito.org/advanced_topics/aspherical_particles.html#howto-aspherical-particles-superquadrics
+# Define atom variables from block
+variable phi atom "2/c_block[2]"
+variable theta atom "2/c_block[1]"
+
+# dump mydump all custom 1 shapes.lammpstrj id type x y z fx fy fz tqx tqy tqz c_diameter[*] c_orient[*] v_phi v_theta
+# # Ovito maps c_orient[*] on its XYZW axes, which is not correct. Map components explicitly
+# dump_modify mydump colname c_orient[1] quatw colname c_orient[2] quati colname c_orient[3] quatj colname c_orient[4] quatk
+
+set group ball angmom 1.0 0.0 0.5 vy 7.0
+
+fix 3 all nve/asphere
+
+thermo 10
+timestep 0.01
+run             100
diff --git a/examples/ASPHERE/superellipsoid_gran/in.drop_test b/examples/ASPHERE/superellipsoid_gran/in.drop_test
new file mode 100644
index 00000000000..3d75b1479b9
--- /dev/null
+++ b/examples/ASPHERE/superellipsoid_gran/in.drop_test
@@ -0,0 +1,82 @@
+# Lattice wall drop test
+
+units           si
+atom_style      ellipsoid superellipsoid
+dimension       3
+boundary        p p f 
+comm_modify     vel yes
+newton          off
+processors      * * 1
+
+neighbor 0.5 bin
+neigh_modify   delay 0 every 1 check yes
+
+# Setup Simulation Box
+variable        box_length equal 9
+variable        box_height equal 15
+
+region          box block 0 ${box_length} 0 ${box_length} 0 ${box_height}
+create_box      2 box
+
+# Create Lattice Wall (Type 1)
+lattice         sc 1.0
+region          floor_reg block 0 ${box_length} 0 ${box_length} 0 0.1
+
+variable        insertion_length equal ${box_length}-1.0
+
+create_atoms    1 region floor_reg
+
+# Create Falling Particles (Type 2)
+region          drop_zone block 1 ${insertion_length} 1 ${insertion_length} 5 ${box_height}
+create_atoms    2 random 100 12345 drop_zone overlap 2.0 maxtry 100
+
+# Material Properties
+# Type 1: Wall particles
+set             type 1 mass 1.0
+set             type 1 shape 1.0 1.0 1.0
+
+# Type 2: Falling particles
+set             type 2 mass 1.0
+set             type 2 shape 2.0 1.0 1.0
+variable        blockiness equal 4.0
+set             type 1 block 4.0 4.0 
+set             type 2 block ${blockiness} ${blockiness} 
+set             type 2 quat/random 84729
+
+# Define Groups
+group           wall type 1
+group           mobile type 2
+
+# Interaction / Pair Style
+pair_style granular/superellipsoid curvature_gaussian
+pair_coeff * * hertz 10000.0 200.0 tangential classic 2850 0.0 0.5 damping viscoelastic
+# Computes and Output
+compute         diameter all property/atom shapex shapey shapez
+compute         orient all property/atom quatw quati quatj quatk
+compute         block all property/atom block1 block2
+
+# Define atom variables from block
+variable        phi atom "2/c_block[2]"
+variable        theta atom "2/c_block[1]"
+
+# dump            mydump all custom 1000 dump_drop_algebraic_block_${blockiness}_new_code.lammpstrj id type x y z fx fy fz tqx tqy tqz c_diameter[*] c_orient[*] v_phi v_theta
+# # Explicit mapping for Ovito
+# dump_modify     mydump colname c_orient[1] quatw colname c_orient[2] quati colname c_orient[3] quatj colname c_orient[4] quatk
+
+# Prevent the frozen wall particles (type 1) from interacting with each other
+neigh_modify exclude group wall wall
+
+# Apply gravity only to the mobile particles (Type 2)
+fix             2 mobile gravity 9.81 vector 0 0 -1
+
+# Integrate Equations of Motion
+fix             3 mobile nve/asphere
+
+compute rke all erotate/asphere
+
+# Run
+thermo          1000
+thermo_style  custom time step ke c_rke 
+
+timestep        0.0001
+run             20000
diff --git a/examples/ASPHERE/superellipsoid_gran/in.ellipsoid_gran b/examples/ASPHERE/superellipsoid_gran/in.ellipsoid_gran
new file mode 100644
index 00000000000..b8f6358e9ed
--- /dev/null
+++ b/examples/ASPHERE/superellipsoid_gran/in.ellipsoid_gran
@@ -0,0 +1,45 @@
+units           si
+atom_style      ellipsoid superellipsoid
+dimension       3
+boundary p p p
+comm_modify vel yes
+newton off
+# create big ellipsoidal particles
+
+region          box block 0 10 0 10 0 10
+create_box      2 box
+create_atoms 1 single 5 5 4.5
+create_atoms 1 single 5 5 6
+group bot id 1
+group top id 2
+#create_atoms    1 region box
+
+set             type 1 mass 1.0
+set             type 1 shape 2.0 1.0 1.0
+set             type 1 block 2.0 2.0
+
+pair_style      granular/superellipsoid
+pair_coeff      * * hooke 1000.0 0.0 tangential linear_history 285 0.0 0.5 damping mass_velocity
+# Hertz model instead
+# pair_coeff      * * hertz 1000.0 0.0 tangential linear_history 285.714 0.0 0.5 damping viscoelastic
+
+compute diameter all property/atom shapex shapey shapez
+compute orient all property/atom quatw quati quatj quatk
+compute block all property/atom block1 block2
+# Ovito uses the reciprocal exponents for the blockiness
+# https://docs.ovito.org/advanced_topics/aspherical_particles.html#howto-aspherical-particles-superquadrics
+# Define atom variables from block
+variable phi atom "2/c_block[2]"
+variable theta atom "2/c_block[1]"
+
+# dump mydump all custom 10 dump.lammpstrj id x y z fx fy fz tqx tqy tqz c_diameter[*] c_orient[*] v_phi v_theta
+# # Ovito maps c_orient[*] on its XYZW axes, which is not correct. Map components explicitly
+# dump_modify mydump colname c_orient[1] quatw colname c_orient[2] quati colname c_orient[3] quatj colname c_orient[4] quatk
+
+fix 1 bot freeze
+fix 2 top gravity 9.81 vector 0 0 -1
+fix 3 all nve/asphere
+
+thermo 10
+timestep 0.001
+run             3000
diff --git a/src/.gitignore b/src/.gitignore
index ef50bf2b835..7e7e7198c72 100644
--- a/src/.gitignore
+++ b/src/.gitignore
@@ -1200,6 +1200,8 @@
 /kissfft.h
 /lj_spica_common.h
 /math_complex.h
+/math_extra_superellipsoids.cpp
+/math_extra_superellipsoids.h
 /math_vector.h
 /message.cpp
 /message.h
@@ -1337,6 +1339,8 @@
 /pair_gayberne.h
 /pair_granular.cpp
 /pair_granular.h
+/pair_granular_superellipsoid.cpp
+/pair_granular_superellipsoid.h
 /pair_gran_easy.cpp
 /pair_gran_easy.h
 /pair_gran_hertz_history.cpp
diff --git a/src/ASPHERE/compute_erotate_asphere.cpp b/src/ASPHERE/compute_erotate_asphere.cpp
index 95a323b4e3f..4608b41319e 100644
--- a/src/ASPHERE/compute_erotate_asphere.cpp
+++ b/src/ASPHERE/compute_erotate_asphere.cpp
@@ -79,7 +79,11 @@ double ComputeERotateAsphere::compute_scalar()
   invoked_scalar = update->ntimestep;
 
   AtomVecEllipsoid::Bonus *ebonus = nullptr;
-  if (avec_ellipsoid) ebonus = avec_ellipsoid->bonus;
+  AtomVecEllipsoid::BonusSuper *ebonus_super = nullptr;
+  if (avec_ellipsoid) {
+    if (atom->superellipsoid_flag) ebonus_super = avec_ellipsoid->bonus_super;
+    else ebonus = avec_ellipsoid->bonus;
+  }
   AtomVecLine::Bonus *lbonus = nullptr;
   if (avec_line) lbonus = avec_line->bonus;
   AtomVecTri::Bonus *tbonus = nullptr;
@@ -98,22 +102,31 @@ double ComputeERotateAsphere::compute_scalar()
   // no point particles since divide by inertia
 
   double length;
-  double *shape, *quat;
+  double *shape, *quat, *block;
   double wbody[3], inertia[3];
   double rot[3][3];
   double erotate = 0.0;
 
   for (int i = 0; i < nlocal; i++)
     if (mask[i] & groupbit) {
-      if (ellipsoid && ebonus && (ellipsoid[i] >= 0)) {
-        shape = ebonus[ellipsoid[i]].shape;
-        quat = ebonus[ellipsoid[i]].quat;
-
-        // principal moments of inertia
-
-        inertia[0] = rmass[i] * (shape[1]*shape[1]+shape[2]*shape[2]) / 5.0;
-        inertia[1] = rmass[i] * (shape[0]*shape[0]+shape[2]*shape[2]) / 5.0;
-        inertia[2] = rmass[i] * (shape[0]*shape[0]+shape[1]*shape[1]) / 5.0;
+      if (ellipsoid && (ebonus || ebonus_super) && (ellipsoid[i] >= 0)) {
+
+        if (atom->superellipsoid_flag) {
+          shape = ebonus_super[ellipsoid[i]].shape;
+          quat = ebonus_super[ellipsoid[i]].quat;
+          block = ebonus_super[ellipsoid[i]].block;
+          // principal moments of inertia
+          inertia[0] = ebonus_super[ellipsoid[i]].inertia[0];
+          inertia[1] = ebonus_super[ellipsoid[i]].inertia[1];
+          inertia[2] = ebonus_super[ellipsoid[i]].inertia[2];
+        } else {
+          shape = ebonus[ellipsoid[i]].shape;
+          quat = ebonus[ellipsoid[i]].quat;
+          // principal moments of inertia
+          inertia[0] = rmass[i] * (shape[1]*shape[1]+shape[2]*shape[2]) / 5.0;
+          inertia[1] = rmass[i] * (shape[0]*shape[0]+shape[2]*shape[2]) / 5.0;
+          inertia[2] = rmass[i] * (shape[0]*shape[0]+shape[1]*shape[1]) / 5.0;
+        }
 
         // wbody = angular velocity in body frame
 
diff --git a/src/ASPHERE/compute_temp_asphere.cpp b/src/ASPHERE/compute_temp_asphere.cpp
index d99d9f30c8c..d55ab5dc341 100644
--- a/src/ASPHERE/compute_temp_asphere.cpp
+++ b/src/ASPHERE/compute_temp_asphere.cpp
@@ -35,6 +35,7 @@ using namespace LAMMPS_NS;
 enum { ROTATE, ALL };
 static constexpr double INERTIA = 0.2;    // moment of inertia prefactor for ellipsoid
 
+
 /* ---------------------------------------------------------------------- */
 
 ComputeTempAsphere::ComputeTempAsphere(LAMMPS *lmp, int narg, char **arg) :
@@ -185,17 +186,11 @@ void ComputeTempAsphere::dof_compute()
 }
 
 /* ---------------------------------------------------------------------- */
-
-double ComputeTempAsphere::compute_scalar()
+template<bool is_super>
+void ComputeTempAsphere::compute_scalar_templated(double &t)
 {
-  invoked_scalar = update->ntimestep;
-
-  if (tempbias) {
-    if (tbias->invoked_scalar != update->ntimestep) tbias->compute_scalar();
-    tbias->remove_bias_all();
-  }
-
   AtomVecEllipsoid::Bonus *bonus = avec->bonus;
+  AtomVecEllipsoid::BonusSuper *bonus_super = avec->bonus_super;
   double **v = atom->v;
   double **angmom = atom->angmom;
   double *rmass = atom->rmass;
@@ -203,67 +198,67 @@ double ComputeTempAsphere::compute_scalar()
   int *mask = atom->mask;
   int nlocal = atom->nlocal;
 
-  double *shape,*quat;
+  double *shape, *quat;
   double wbody[3],inertia[3];
   double rot[3][3];
 
   // sum translational and rotational energy for each particle
   // no point particles since divide by inertia
+  for (int i = 0; i < nlocal; i++) {
+    if (mask[i] & groupbit) {
 
-  double t = 0.0;
-
-  if (mode == ALL) {
-    for (int i = 0; i < nlocal; i++)
-      if (mask[i] & groupbit) {
+      if (mode == ALL) {
         t += (v[i][0]*v[i][0] + v[i][1]*v[i][1] + v[i][2]*v[i][2]) * rmass[i];
+      }
 
-        // principal moments of inertia
-
-        shape = bonus[ellipsoid[i]].shape;
-        quat = bonus[ellipsoid[i]].quat;
-
-        inertia[0] = INERTIA*rmass[i] * (shape[1]*shape[1]+shape[2]*shape[2]);
-        inertia[1] = INERTIA*rmass[i] * (shape[0]*shape[0]+shape[2]*shape[2]);
-        inertia[2] = INERTIA*rmass[i] * (shape[0]*shape[0]+shape[1]*shape[1]);
-
-        // wbody = angular velocity in body frame
-
-        MathExtra::quat_to_mat(quat,rot);
-        MathExtra::transpose_matvec(rot,angmom[i],wbody);
-        wbody[0] /= inertia[0];
-        wbody[1] /= inertia[1];
-        wbody[2] /= inertia[2];
+      int j = ellipsoid[i];
 
-        t += inertia[0]*wbody[0]*wbody[0] +
-          inertia[1]*wbody[1]*wbody[1] + inertia[2]*wbody[2]*wbody[2];
+      if (is_super) {
+        quat = bonus_super[j].quat;
+        // principal moments of inertia
+        inertia[0] = bonus_super[j].inertia[0];
+        inertia[1] = bonus_super[j].inertia[1];
+        inertia[2] = bonus_super[j].inertia[2];
+      } else {
+        quat = bonus[j].quat;
+        shape = bonus[j].shape;
+        // principal moments of inertia
+        inertia[0] = INERTIA*rmass[i] * (shape[1]*shape[1] + shape[2]*shape[2]);
+        inertia[1] = INERTIA*rmass[i] * (shape[0]*shape[0] + shape[2]*shape[2]);
+        inertia[2] = INERTIA*rmass[i] * (shape[0]*shape[0] + shape[1]*shape[1]);
       }
 
-  } else {
-    for (int i = 0; i < nlocal; i++)
-      if (mask[i] & groupbit) {
 
-        // principal moments of inertia
+      MathExtra::quat_to_mat(quat, rot);
+      MathExtra::transpose_matvec(rot, angmom[i], wbody);
+      // wbody = angular velocity in body frame
+      wbody[0] /= inertia[0];
+      wbody[1] /= inertia[1];
+      wbody[2] /= inertia[2];
 
-        shape = bonus[ellipsoid[i]].shape;
-        quat = bonus[ellipsoid[i]].quat;
+      t += inertia[0]*wbody[0]*wbody[0] +
+           inertia[1]*wbody[1]*wbody[1] +
+           inertia[2]*wbody[2]*wbody[2];
+    }
+  }
+}
 
-        inertia[0] = INERTIA*rmass[i] * (shape[1]*shape[1]+shape[2]*shape[2]);
-        inertia[1] = INERTIA*rmass[i] * (shape[0]*shape[0]+shape[2]*shape[2]);
-        inertia[2] = INERTIA*rmass[i] * (shape[0]*shape[0]+shape[1]*shape[1]);
 
-        // wbody = angular velocity in body frame
 
-        MathExtra::quat_to_mat(quat,rot);
-        MathExtra::transpose_matvec(rot,angmom[i],wbody);
-        wbody[0] /= inertia[0];
-        wbody[1] /= inertia[1];
-        wbody[2] /= inertia[2];
+double ComputeTempAsphere::compute_scalar()
+{
+  invoked_scalar = update->ntimestep;
 
-        t += inertia[0]*wbody[0]*wbody[0] +
-          inertia[1]*wbody[1]*wbody[1] + inertia[2]*wbody[2]*wbody[2];
-      }
+  if (tempbias) {
+    if (tbias->invoked_scalar != update->ntimestep) tbias->compute_scalar();
+    tbias->remove_bias_all();
   }
 
+  double t = 0.0;
+
+  if (atom->superellipsoid_flag) compute_scalar_templated<true>(t);
+  else compute_scalar_templated<false>(t);
+
   if (tempbias) tbias->restore_bias_all();
 
   MPI_Allreduce(&t,&scalar,1,MPI_DOUBLE,MPI_SUM,world);
@@ -275,19 +270,11 @@ double ComputeTempAsphere::compute_scalar()
 }
 
 /* ---------------------------------------------------------------------- */
-
-void ComputeTempAsphere::compute_vector()
+template<bool is_super>
+void ComputeTempAsphere::compute_vector_templated(double *t)
 {
-  int i;
-
-  invoked_vector = update->ntimestep;
-
-  if (tempbias) {
-    if (tbias->invoked_vector != update->ntimestep) tbias->compute_vector();
-    tbias->remove_bias_all();
-  }
-
   AtomVecEllipsoid::Bonus *bonus = avec->bonus;
+  AtomVecEllipsoid::BonusSuper *bonus_super = avec->bonus_super;
   double **v = atom->v;
   double **angmom = atom->angmom;
   double *rmass = atom->rmass;
@@ -295,37 +282,44 @@ void ComputeTempAsphere::compute_vector()
   int *mask = atom->mask;
   int nlocal = atom->nlocal;
 
-  double *shape,*quat;
-  double wbody[3],inertia[3],t[6];
+  double *shape, *quat;
+  double wbody[3],inertia[3];
   double rot[3][3];
   double massone;
 
-  // sum translational and rotational energy for each particle
-  // no point particles since divide by inertia
+    for (int i = 0; i < nlocal; i++) {
+      if (mask[i] & groupbit) {
+         massone = rmass[i];
 
-  for (i = 0; i < 6; i++) t[i] = 0.0;
+        if (mode == ALL) {
+          t[0] += massone * v[i][0]*v[i][0];
+          t[1] += massone * v[i][1]*v[i][1];
+          t[2] += massone * v[i][2]*v[i][2];
+          t[3] += massone * v[i][0]*v[i][1];
+          t[4] += massone * v[i][0]*v[i][2];
+          t[5] += massone * v[i][1]*v[i][2];
+        }
 
-  if (mode == ALL) {
-    for (i = 0; i < nlocal; i++)
-      if (mask[i] & groupbit) {
-        massone = rmass[i];
-        t[0] += massone * v[i][0]*v[i][0];
-        t[1] += massone * v[i][1]*v[i][1];
-        t[2] += massone * v[i][2]*v[i][2];
-        t[3] += massone * v[i][0]*v[i][1];
-        t[4] += massone * v[i][0]*v[i][2];
-        t[5] += massone * v[i][1]*v[i][2];
+        int j = ellipsoid[i];
 
         // principal moments of inertia
+        if (is_super) {
+          quat = bonus_super[j].quat;
 
-        shape = bonus[ellipsoid[i]].shape;
-        quat = bonus[ellipsoid[i]].quat;
+          inertia[0] = bonus_super[j].inertia[0];
+          inertia[1] = bonus_super[j].inertia[1];
+          inertia[2] = bonus_super[j].inertia[2];
 
-        inertia[0] = INERTIA*massone * (shape[1]*shape[1]+shape[2]*shape[2]);
-        inertia[1] = INERTIA*massone * (shape[0]*shape[0]+shape[2]*shape[2]);
-        inertia[2] = INERTIA*massone * (shape[0]*shape[0]+shape[1]*shape[1]);
+        } else {
+          quat = bonus[j].quat;
+          shape = bonus[j].shape;
 
-        // wbody = angular velocity in body frame
+          inertia[0] = INERTIA*massone * (shape[1]*shape[1] + shape[2]*shape[2]);
+          inertia[1] = INERTIA*massone * (shape[0]*shape[0] + shape[2]*shape[2]);
+          inertia[2] = INERTIA*massone * (shape[0]*shape[0] + shape[1]*shape[1]);
+        }
+
+         // wbody = angular velocity in body frame
 
         MathExtra::quat_to_mat(quat,rot);
         MathExtra::transpose_matvec(rot,angmom[i],wbody);
@@ -342,39 +336,27 @@ void ComputeTempAsphere::compute_vector()
         t[4] += inertia[1]*wbody[0]*wbody[2];
         t[5] += inertia[2]*wbody[1]*wbody[2];
       }
+    }
+}
 
-  } else {
-    for (i = 0; i < nlocal; i++)
-      if (mask[i] & groupbit) {
-
-        // principal moments of inertia
-
-        shape = bonus[ellipsoid[i]].shape;
-        quat = bonus[ellipsoid[i]].quat;
-        massone = rmass[i];
-
-        inertia[0] = INERTIA*massone * (shape[1]*shape[1]+shape[2]*shape[2]);
-        inertia[1] = INERTIA*massone * (shape[0]*shape[0]+shape[2]*shape[2]);
-        inertia[2] = INERTIA*massone * (shape[0]*shape[0]+shape[1]*shape[1]);
 
-        // wbody = angular velocity in body frame
+void ComputeTempAsphere::compute_vector()
+{
+  int i;
+  invoked_vector = update->ntimestep;
 
-        MathExtra::quat_to_mat(quat,rot);
-        MathExtra::transpose_matvec(rot,angmom[i],wbody);
-        wbody[0] /= inertia[0];
-        wbody[1] /= inertia[1];
-        wbody[2] /= inertia[2];
+  if (tempbias) {
+    if (tbias->invoked_vector != update->ntimestep) tbias->compute_vector();
+    tbias->remove_bias_all();
+  }
 
-        // rotational kinetic energy
+  // sum translational and rotational energy for each particle
+  // no point particles since divide by inertia
+  double t[6];
+  for (i = 0; i < 6; i++) t[i] = 0.0;
 
-        t[0] += inertia[0]*wbody[0]*wbody[0];
-        t[1] += inertia[1]*wbody[1]*wbody[1];
-        t[2] += inertia[2]*wbody[2]*wbody[2];
-        t[3] += inertia[0]*wbody[0]*wbody[1];
-        t[4] += inertia[1]*wbody[0]*wbody[2];
-        t[5] += inertia[2]*wbody[1]*wbody[2];
-      }
-  }
+  if (atom->superellipsoid_flag) compute_vector_templated<true>(t);
+  else compute_vector_templated<false>(t);
 
   if (tempbias) tbias->restore_bias_all();
 
diff --git a/src/ASPHERE/compute_temp_asphere.h b/src/ASPHERE/compute_temp_asphere.h
index c8c09b445b1..3c86330b163 100644
--- a/src/ASPHERE/compute_temp_asphere.h
+++ b/src/ASPHERE/compute_temp_asphere.h
@@ -46,6 +46,9 @@ class ComputeTempAsphere : public Compute {
   class AtomVecEllipsoid *avec;
 
   void dof_compute();
+
+  template <bool is_super> void compute_scalar_templated(double &t);
+  template <bool is_super> void compute_vector_templated(double *t);
 };
 
 }    // namespace LAMMPS_NS
diff --git a/src/ASPHERE/fix_nh_asphere.cpp b/src/ASPHERE/fix_nh_asphere.cpp
index 35d0e404be5..0491e47f292 100644
--- a/src/ASPHERE/fix_nh_asphere.cpp
+++ b/src/ASPHERE/fix_nh_asphere.cpp
@@ -43,6 +43,8 @@ void FixNHAsphere::init()
   int *mask = atom->mask;
   int nlocal = atom->nlocal;
 
+  if (atom->superellipsoid_flag) error->all(FLERR, "Fix {} does not support superellipsoids", style);
+
   for (int i = 0; i < nlocal; i++)
     if (mask[i] & groupbit)
       if (ellipsoid[i] < 0)
diff --git a/src/ASPHERE/fix_nve_asphere.cpp b/src/ASPHERE/fix_nve_asphere.cpp
index a5655b875cc..7c7f170df2d 100644
--- a/src/ASPHERE/fix_nve_asphere.cpp
+++ b/src/ASPHERE/fix_nve_asphere.cpp
@@ -58,13 +58,19 @@ void FixNVEAsphere::init()
 
 /* ---------------------------------------------------------------------- */
 
-void FixNVEAsphere::initial_integrate(int /*vflag*/)
+template <bool is_super>
+void FixNVEAsphere::initial_integrate_templated()
 {
   double dtfm;
-  double inertia[3],omega[3];
-  double *shape,*quat;
+  double omega[3];
+  double *inertia,*quat, *shape;
+  double inertia_to_compute[3];
+
+  AtomVecEllipsoid::Bonus *bonus = nullptr;
+  AtomVecEllipsoid::BonusSuper *bonus_super = nullptr;
+  if (is_super) bonus_super = avec->bonus_super;
+  else bonus = avec->bonus;
 
-  AtomVecEllipsoid::Bonus *bonus = avec->bonus;
   int *ellipsoid = atom->ellipsoid;
   double **x = atom->x;
   double **v = atom->v;
@@ -97,13 +103,19 @@ void FixNVEAsphere::initial_integrate(int /*vflag*/)
       angmom[i][2] += dtf * torque[i][2];
 
       // principal moments of inertia
-
-      shape = bonus[ellipsoid[i]].shape;
-      quat = bonus[ellipsoid[i]].quat;
-
-      inertia[0] = INERTIA*rmass[i] * (shape[1]*shape[1]+shape[2]*shape[2]);
-      inertia[1] = INERTIA*rmass[i] * (shape[0]*shape[0]+shape[2]*shape[2]);
-      inertia[2] = INERTIA*rmass[i] * (shape[0]*shape[0]+shape[1]*shape[1]);
+      int j = ellipsoid[i];
+      if (is_super) {
+        inertia = bonus_super[j].inertia;
+        quat = bonus_super[j].quat;
+      } else {
+        quat = bonus[j].quat;
+        shape = bonus[j].shape;
+        inertia = inertia_to_compute;
+
+        inertia[0] = INERTIA*rmass[i] * (shape[1]*shape[1]+shape[2]*shape[2]);
+        inertia[1] = INERTIA*rmass[i] * (shape[0]*shape[0]+shape[2]*shape[2]);
+        inertia[2] = INERTIA*rmass[i] * (shape[0]*shape[0]+shape[1]*shape[1]);
+      }
 
       // compute omega at 1/2 step from angmom at 1/2 step and current q
       // update quaternion a full step via Richardson iteration
@@ -114,6 +126,16 @@ void FixNVEAsphere::initial_integrate(int /*vflag*/)
     }
 }
 
+
+
+/* ---------------------------------------------------------------------- */
+
+void FixNVEAsphere::initial_integrate(int /*vflag*/)
+{
+  if (atom->superellipsoid_flag) initial_integrate_templated<true>();
+  else initial_integrate_templated<false>();
+}
+
 /* ---------------------------------------------------------------------- */
 
 void FixNVEAsphere::final_integrate()
diff --git a/src/ASPHERE/fix_nve_asphere.h b/src/ASPHERE/fix_nve_asphere.h
index b614f4083f6..097948c2770 100644
--- a/src/ASPHERE/fix_nve_asphere.h
+++ b/src/ASPHERE/fix_nve_asphere.h
@@ -34,6 +34,8 @@ class FixNVEAsphere : public FixNVE {
  private:
   double dtq;
   class AtomVecEllipsoid *avec;
+  template <bool is_super> void initial_integrate_templated();
+
 };
 
 }    // namespace LAMMPS_NS
diff --git a/src/ASPHERE/fix_nve_asphere_noforce.cpp b/src/ASPHERE/fix_nve_asphere_noforce.cpp
index aaa21d9550f..0d7b99e06d5 100644
--- a/src/ASPHERE/fix_nve_asphere_noforce.cpp
+++ b/src/ASPHERE/fix_nve_asphere_noforce.cpp
@@ -59,10 +59,15 @@ void FixNVEAsphereNoforce::init()
 
 /* ---------------------------------------------------------------------- */
 
-void FixNVEAsphereNoforce::initial_integrate(int /*vflag*/)
+template <bool is_super>
+void FixNVEAsphereNoforce::initial_integrate_templated()
 {
-  AtomVecEllipsoid::Bonus *bonus;
-  if (avec) bonus = avec->bonus;
+  AtomVecEllipsoid::Bonus *bonus = nullptr;
+  AtomVecEllipsoid::BonusSuper *bonus_super = nullptr;
+  if (avec) {
+    if (is_super) bonus_super = avec->bonus_super;
+    else bonus = avec->bonus;
+  }
   double **x = atom->x;
   double **v = atom->v;
   double **angmom = atom->angmom;
@@ -72,8 +77,8 @@ void FixNVEAsphereNoforce::initial_integrate(int /*vflag*/)
   int nlocal = atom->nlocal;
   if (igroup == atom->firstgroup) nlocal = atom->nfirst;
 
-  double *shape,*quat;
-  double inertia[3],omega[3];
+  double *shape,*quat, *inertia;
+  double inertia_to_compute[3],omega[3];
 
   // update positions and quaternions for all particles
 
@@ -86,13 +91,18 @@ void FixNVEAsphereNoforce::initial_integrate(int /*vflag*/)
 
       // principal moments of inertia
 
-      shape = bonus[ellipsoid[i]].shape;
-      quat = bonus[ellipsoid[i]].quat;
-
-      inertia[0] = rmass[i] * (shape[1]*shape[1]+shape[2]*shape[2]) / 5.0;
-      inertia[1] = rmass[i] * (shape[0]*shape[0]+shape[2]*shape[2]) / 5.0;
-      inertia[2] = rmass[i] * (shape[0]*shape[0]+shape[1]*shape[1]) / 5.0;
-
+      if (is_super) {
+        quat = bonus_super[ellipsoid[i]].quat;
+        inertia = bonus_super[ellipsoid[i]].inertia;
+      } else {
+        shape = bonus[ellipsoid[i]].shape;
+        quat = bonus[ellipsoid[i]].quat;
+        inertia = inertia_to_compute;
+
+        inertia[0] = rmass[i] * (shape[1]*shape[1]+shape[2]*shape[2]) / 5.0;
+        inertia[1] = rmass[i] * (shape[0]*shape[0]+shape[2]*shape[2]) / 5.0;
+        inertia[2] = rmass[i] * (shape[0]*shape[0]+shape[1]*shape[1]) / 5.0;
+      }
       // compute omega at 1/2 step from angmom at 1/2 step and current q
       // update quaternion a full step via Richardson iteration
       // returns new normalized quaternion
@@ -102,3 +112,10 @@ void FixNVEAsphereNoforce::initial_integrate(int /*vflag*/)
     }
   }
 }
+/* ---------------------------------------------------------------------- */
+
+void FixNVEAsphereNoforce::initial_integrate(int /*vflag*/)
+{
+  if (atom->superellipsoid_flag) initial_integrate_templated<true>();
+  else initial_integrate_templated<false>();
+}
diff --git a/src/ASPHERE/fix_nve_asphere_noforce.h b/src/ASPHERE/fix_nve_asphere_noforce.h
index 8f7548633c1..28a938a0bec 100644
--- a/src/ASPHERE/fix_nve_asphere_noforce.h
+++ b/src/ASPHERE/fix_nve_asphere_noforce.h
@@ -33,6 +33,7 @@ class FixNVEAsphereNoforce : public FixNVENoforce {
  private:
   double dtq;
   class AtomVecEllipsoid *avec;
+  template <bool is_super> void initial_integrate_templated();
 };
 
 }    // namespace LAMMPS_NS
diff --git a/src/ASPHERE/math_extra_superellipsoids.cpp b/src/ASPHERE/math_extra_superellipsoids.cpp
new file mode 100644
index 00000000000..d32a396fa2f
--- /dev/null
+++ b/src/ASPHERE/math_extra_superellipsoids.cpp
@@ -0,0 +1,820 @@
+// clang-format off
+/* ----------------------------------------------------------------------
+   LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
+   https://www.lammps.org/, Sandia National Laboratories
+   LAMMPS development team: developers@lammps.org
+
+   Copyright (2003) Sandia Corporation.  Under the terms of Contract
+   DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
+   certain rights in this software.  This software is distributed under
+   the GNU General Public License.
+
+   See the README file in the top-level LAMMPS directory.
+------------------------------------------------------------------------- */
+
+/* ----------------------------------------------------------------------
+   Contributing authors: Jacopo Bilotto (EPFL), Jibril B. Coulibaly
+------------------------------------------------------------------------- */
+
+#include "math_extra_superellipsoids.h"
+
+#include "math_const.h"
+#include "math_extra.h"
+
+#include <cmath>
+
+namespace MathExtraSuperellipsoids {
+
+inline constexpr double TIKHONOV_SCALE =
+    1e-14;
+
+static constexpr int ITERMAX_NR = 100;
+static constexpr double TOL_NR_RES = 1e-10 * 1e-10;
+static constexpr double TOL_NR_POS = 1e-6 * 1e-6;
+
+static constexpr int ITERMAX_LS = 10;
+static constexpr double PARAMETER_LS = 1e-4;
+static constexpr double CUTBACK_LS = 0.5;
+
+static constexpr double TOL_OVERLAP = 1e-8;
+static constexpr unsigned int ITERMAX_OVERLAP = 20;
+static constexpr double MINSLOPE_OVERLAP = 1e-12;
+
+static constexpr double REGULARIZATION_EPSILON = 1e-12;
+static constexpr double MAX_B_FAST = 1e30;
+
+
+/* ----------------------------------------------------------------------
+   curvature of superellipsoid
+   source https://en.wikipedia.org/wiki/Mean_curvature
+------------------------------------------------------------------------- */
+
+double mean_curvature_superellipsoid(const double *shape, const double *block, const int flag, const double R[3][3], const double *surf_global_point, const double *xc)
+{
+  // this code computes the mean curvature on the superellipsoid surface
+  // for the given global point
+  double hess[3][3], grad[3], normal[3];
+  double shapefunc, xlocal[3], tmp_v[3];
+  MathExtra::sub3(surf_global_point, xc, tmp_v); // here tmp_v is the vector from center to surface point
+  MathExtra::transpose_matvec(R, tmp_v, xlocal);
+  shapefunc = shape_and_derivatives_local(xlocal, shape, block, flag, grad, hess); // computation of curvature is independent of local or global frame
+  MathExtra::normalize3(grad, normal);
+  MathExtra::matvec(hess, normal, tmp_v); // here tmp_v is intermediate product
+  double F_mag = sqrt(MathExtra::dot3(grad, grad));
+  double curvature = fabs(MathExtra::dot3(normal, tmp_v) - (hess[0][0] + hess[1][1] + hess[2][2])) / (2.0 * F_mag);
+  return curvature;
+}
+
+double gaussian_curvature_superellipsoid(const double *shape, const double *block, const int flag, const double R[3][3], const double *surf_global_point, const double *xc)
+{
+  // this code computes sqrt(|K|), where K is the Gaussian curvature,
+  // for the given global point
+  double hess[3][3], grad[3], normal[3];
+  double shapefunc, xlocal[3], tmp_v[3];
+  MathExtra::sub3(surf_global_point, xc, tmp_v); // here tmp_v is the vector from center to surface point
+  MathExtra::transpose_matvec(R, tmp_v, xlocal);
+  shapefunc = shape_and_derivatives_local(xlocal, shape, block, flag, grad, hess); // computation of curvature is independent of local or global frame
+  MathExtra::normalize3(grad, normal);
+
+  double temp[3];
+  MathExtra::matvec(hess, normal, temp);
+  double F_mag = sqrt(MathExtra::dot3(grad, grad));
+
+  double fx = grad[0];
+  double fy = grad[1];
+  double fz = grad[2];
+
+  double fxx = hess[0][0];
+  double fxy = hess[0][1];
+  double fxz = hess[0][2];
+
+  double fyy = hess[1][1];
+  double fyz = hess[1][2];
+
+  double fzz = hess[2][2];
+
+  double mat[4][4] = {
+    {fxx, fxy, fxz, fx},
+    {fxy, fyy, fyz, fy},
+    {fxz, fyz, fzz, fz},
+    {fx,  fy,  fz, 0.0}
+  };
+
+  double K = -det4_M44_zero(mat) / (F_mag*F_mag*F_mag*F_mag);
+  double curvature =  sqrt(fabs(K));
+  return curvature;
+}
+
+
+/* ----------------------------------------------------------------------
+   express global (system level) to local (particle level) coordinates
+------------------------------------------------------------------------- */
+
+void global2local_vector(const double *v, const double *quat, double *local_v)
+{
+    double qc[4];
+    MathExtra::qconjugate(const_cast<double*>(quat), qc);
+    MathExtra::quatrotvec(qc, const_cast<double*>(v), local_v);
+}
+
+/* ----------------------------------------------------------------------
+   Possible regularization for the shape functions
+   Instead of F(x,y,z) = 0 we use (F(x,y,z)+1)^(1/n1) -1 = G(x,y,z) = 0
+   We also scale G by the average radius to have better "midway" points
+------------------------------------------------------------------------- */
+
+void apply_regularization_shape_function(double n1, const double avg_radius, double *value, double *grad, double hess[3][3])
+{
+  // value is the shape function (polynomial - 1), so base is the polynomial itself
+  double base = std::fmax(*value + 1.0, REGULARIZATION_EPSILON);
+  const double inv_F = 1.0 / base;
+  const double inv_n1 = 1.0 / n1;
+
+  // P = base^(1/n)
+  const double F_pow_inv_n1 = std::pow(base, inv_n1);
+
+  // Scale for Gradient: S1 = R * (1/n) * base^(1/n - 1)
+  const double scale_grad = avg_radius * inv_n1 * F_pow_inv_n1 * inv_F;
+
+  // Scale for Hessian addition: S2 = S1 * (1/n - 1) * base^-1
+  const double scale_hess_add = scale_grad * (inv_n1 - 1.0) * inv_F;
+
+  // H_new = scale_grad * H_old + scale_hess_add * (grad_old x grad_old^T)
+  for (int i = 0; i < 3; i++) {
+      for (int j = 0; j < 3; j++) {
+          double grad_outer_prod = grad[i] * grad[j];
+          hess[i][j] = (hess[i][j] * scale_grad) + (scale_hess_add * grad_outer_prod);
+      }
+  }
+
+  // grad_new = scale_grad * grad_old
+  for (int i = 0; i < 3; i++) {
+      grad[i] *= scale_grad;
+  }
+
+  // G = R * (base^(1/n) - 1)
+  *value = avg_radius * (F_pow_inv_n1 - 1.0);
+}
+
+/* ----------------------------------------------------------------------
+   shape function computations for superellipsoids
+------------------------------------------------------------------------- */
+
+double shape_and_derivatives_local(const double* xlocal, const double* shape, const double* block, const int flag, double* grad, double hess[3][3])
+{
+  double shapefunc;
+  // TODO: Not sure how to make flag values more clear
+  // Cannot forward declare the enum AtomVecEllipsoid::BlockType
+  // Could use scoped (enum class) but no implicit conversion:
+  //    must pass type `LAMMPS_NS::AtomVecEllipsoid::BlockType` instead of int,
+  //    and/or static_cast the enum class to int, which is similar to current
+  // Could define the enum in a dedicated header
+  //    seems overkill just for one enum
+  // I think the comment below making reference to the BlockType should be enough
+  // Feel free to change to a better design
+  switch (flag) { // LAMMPS_NS::AtomVecEllipsoid::BlockType
+    case 0: {
+      shapefunc = shape_and_derivatives_local_ellipsoid(xlocal, shape, grad, hess);
+      break;
+    }
+    case 1: {
+      shapefunc = shape_and_derivatives_local_n1equaln2(xlocal, shape, block[0], grad, hess);
+      break;
+    }
+    case 2: {
+      shapefunc = shape_and_derivatives_local_superquad(xlocal, shape, block, grad, hess);
+      break;
+    }
+  }
+
+  return shapefunc;
+}
+
+/* ----------------------------------------------------------------------
+   General case for n1 != n2 > 2
+------------------------------------------------------------------------- */
+
+double shape_and_derivatives_local_superquad(const double* xlocal, const double* shape, const double* block, double* grad, double hess[3][3])
+{
+  double a_inv = 1.0 / shape[0];
+  double b_inv = 1.0 / shape[1];
+  double c_inv = 1.0 / shape[2];
+  double x_a = std::fabs(xlocal[0] * a_inv);
+  double y_b = std::fabs(xlocal[1] * b_inv);
+  double z_c = std::fabs(xlocal[2] * c_inv);
+  double n1 = block[0];
+  double n2 = block[1];
+  double x_a_pow_n2_m2 = std::pow(x_a, n2 - 2.0);
+  double x_a_pow_n2_m1 = x_a_pow_n2_m2 * x_a;
+  double y_b_pow_n2_m2 = std::pow(y_b, n2 - 2.0);
+  double y_b_pow_n2_m1 = y_b_pow_n2_m2 * y_b;
+
+  double nu = (x_a_pow_n2_m1 * x_a) + (y_b_pow_n2_m1 * y_b);
+  double nu_pow_n1_n2_m2 = std::pow(nu, n1/n2 - 2.0);
+  double nu_pow_n1_n2_m1 = nu_pow_n1_n2_m2 * nu;
+
+  double z_c_pow_n1_m2 = std::pow(z_c, n1 - 2.0);
+  double z_c_pow_n1_m1 = z_c_pow_n1_m2 * z_c;
+
+  // Equation (14)
+  double signx = xlocal[0] > 0.0 ? 1.0 : -1.0;
+  double signy = xlocal[1] > 0.0 ? 1.0 : -1.0;
+  double signz = xlocal[2] > 0.0 ? 1.0 : -1.0;
+  grad[0] = n1 * a_inv * x_a_pow_n2_m1 * nu_pow_n1_n2_m1 * signx;
+  grad[1] = n1 * b_inv * y_b_pow_n2_m1 * nu_pow_n1_n2_m1 * signy;
+  grad[2] = n1 * c_inv * z_c_pow_n1_m1 * signz;
+
+  // Equation (15)
+  double signxy = signx * signy;
+  hess[0][0] = a_inv * a_inv * (n1 * (n2 - 1.0) * x_a_pow_n2_m2 * nu_pow_n1_n2_m1 +
+                                (n1 - n2) * n1 * (x_a_pow_n2_m1 * x_a_pow_n2_m1) * nu_pow_n1_n2_m2);
+  hess[1][1] = b_inv * b_inv * (n1 * (n2 - 1.0) * y_b_pow_n2_m2 * nu_pow_n1_n2_m1 +
+                                (n1 - n2) * n1 * (y_b_pow_n2_m1 * y_b_pow_n2_m1) * nu_pow_n1_n2_m2);
+  hess[0][1] = hess[1][0] = a_inv * b_inv * (n1 - n2) * n1 * x_a_pow_n2_m1 * y_b_pow_n2_m1 * nu_pow_n1_n2_m2 * signxy;
+  hess[2][2] = c_inv * c_inv * n1 * (n1 - 1.0) * z_c_pow_n1_m2;
+  hess[0][2] = hess[2][0] = hess[1][2] = hess[2][1] = 0.0;
+
+  return (nu_pow_n1_n2_m1 * nu) + (z_c_pow_n1_m1 * z_c) - 1.0;
+}
+
+/* ----------------------------------------------------------------------
+   Special case for n1 = n2 = n > 2
+------------------------------------------------------------------------- */
+
+double shape_and_derivatives_local_n1equaln2(const double* xlocal, const double* shape, const double n, double* grad, double hess[3][3])
+{
+  double a_inv = 1.0 / shape[0];
+  double b_inv = 1.0 / shape[1];
+  double c_inv = 1.0 / shape[2];
+  double x_a = std::fabs(xlocal[0] * a_inv);
+  double y_b = std::fabs(xlocal[1] * b_inv);
+  double z_c = std::fabs(xlocal[2] * c_inv);
+  double x_a_pow_n_m2 = std::pow(x_a, n - 2.0);
+  double x_a_pow_n_m1 = x_a_pow_n_m2 * x_a;
+  double y_b_pow_n_m2 = std::pow(y_b, n - 2.0);
+  double y_b_pow_n_m1 = y_b_pow_n_m2 * y_b;
+  double z_c_pow_n_m2 = std::pow(z_c, n - 2.0);
+  double z_c_pow_n_m1 = z_c_pow_n_m2 * z_c;
+
+  // Equation (14)
+  double signx = xlocal[0] > 0.0 ? 1.0 : -1.0;
+  double signy = xlocal[1] > 0.0 ? 1.0 : -1.0;
+  double signz = xlocal[2] > 0.0 ? 1.0 : -1.0;
+  grad[0] = n * a_inv * x_a_pow_n_m1 * signx;
+  grad[1] = n * b_inv * y_b_pow_n_m1 * signy;
+  grad[2] = n * c_inv * z_c_pow_n_m1 * signz;
+
+  // Equation (15)
+  double signxy = signx * signy;
+  hess[0][0] = a_inv * a_inv * n * (n - 1.0) * x_a_pow_n_m2;
+  hess[1][1] = b_inv * b_inv * n * (n - 1.0) * y_b_pow_n_m2;
+  hess[2][2] = c_inv * c_inv * n * (n - 1.0) * z_c_pow_n_m2;
+  hess[0][1] = hess[1][0] = hess[0][2] = hess[2][0] = hess[1][2] = hess[2][1] = 0.0;
+
+  return (x_a_pow_n_m1 * x_a) + (y_b_pow_n_m1 * y_b) + (z_c_pow_n_m1 * z_c) - 1.0;
+}
+
+/* ----------------------------------------------------------------------
+   Special case for n1 = n2 = 2
+------------------------------------------------------------------------- */
+
+double shape_and_derivatives_local_ellipsoid(const double* xlocal, const double* shape, double* grad, double hess[3][3])
+{
+  double a = 2.0 / (shape[0] * shape[0]);
+  double b = 2.0 / (shape[1] * shape[1]);
+  double c = 2.0 / (shape[2] * shape[2]);
+
+  // Equation (14) simplified for n1 = n2 = 2
+  grad[0] = a * xlocal[0];
+  grad[1] = b * xlocal[1];
+  grad[2] = c * xlocal[2];
+
+  // Equation (15)
+  hess[0][0] = a;
+  hess[1][1] = b;
+  hess[2][2] = c;
+  hess[0][1] = hess[1][0] = hess[0][2] = hess[2][0] = hess[1][2] = hess[2][1] = 0.0;
+
+  return 0.5 * (grad[0]*xlocal[0] + grad[1]*xlocal[1] + grad[2]*xlocal[2]) - 1.0;
+}
+
+/* ---------------------------------------------------------------------- */
+
+double shape_and_derivatives_global(const double* xc, const double R[3][3],
+    const double* shape, const double* block, const int flag,
+    const double* X0, double* grad, double hess[3][3],
+    const int formulation, const double avg_radius)
+{
+  double xlocal[3], tmp_v[3], tmp_m[3][3];
+  MathExtra::sub3(X0, xc, tmp_v);
+  MathExtra::transpose_matvec(R, tmp_v, xlocal);
+  double shapefunc = shape_and_derivatives_local(xlocal, shape, block, flag, tmp_v, hess);
+  if (formulation == FORMULATION_GEOMETRIC) {
+     apply_regularization_shape_function(block[0], avg_radius, &shapefunc, tmp_v, hess);
+  }
+  MathExtra::matvec(R, tmp_v, grad);
+  MathExtra::times3_transpose(hess, R, tmp_m);
+  MathExtra::times3(R, tmp_m, hess);
+
+  return shapefunc;
+}
+
+// double compute_residual(const double shapefunci, const double* gradi_global, const double shapefuncj, const double* gradj_global, const double mu2, double* residual) {
+//   // Equation (23)
+//   MathExtra::scaleadd3(mu2, gradj_global, gradi_global, residual);
+//   residual[3] = shapefunci - shapefuncj;
+//   // Normalize residual Equation (23)
+//   // shape functions and gradients dimensions are not homogeneous
+//   // Gradient equality F1' + mu2 * F2' evaluated relative to magnitude of gradient ||F1'|| = ||mu2 * F2'||
+//   // Shape function equality F1 - F2 evaluated relative to magnitude of shape function + 1
+//   //    the shift f = polynomial - 1 is not necessary and cancels out in F1 - F2
+//   // Last component homogeneous to shape function
+//   return MathExtra::lensq3(residual) / MathExtra::lensq3(gradi_global) +
+//          residual[3] * residual[3] / ((shapefunci + 1) * (shapefunci + 1));
+// }
+
+/* ---------------------------------------------------------------------- */
+
+double compute_residual(const double shapefunci, const double* gradi_global,
+                        const double shapefuncj, const double* gradj_global,
+                        const double mu2, double* residual,
+                        const int formulation, const double radius_scale)
+{
+  // Equation (23): Spatial residual (Gradient match)
+  MathExtra::scaleadd3(mu2, gradj_global, gradi_global, residual);
+  residual[3] = shapefunci - shapefuncj;
+
+  // --- Spatial Normalization ---
+  // Algebraic: Gradients are ~1/R. Dividing by lensq3 normalizes this.
+  // We take average of gradient for polydisperse case
+  // Geometric: Gradients are unit vectors. lensq3 is 1.0. This works for both.
+
+  double gradi_global_mag = 0.5 * (MathExtra::lensq3(gradi_global) + MathExtra::lensq3(gradj_global));
+  double spatial_norm = MathExtra::lensq3(residual) / gradi_global_mag;
+
+  // --- Scalar Normalization ---
+  double scalar_denom;
+
+  if (formulation == FORMULATION_GEOMETRIC) {
+      // GEOMETRIC: G is a distance (Length).
+      scalar_denom = radius_scale;
+  } else {
+      // ALGEBRAIC: F is dimensionless (approx 0 at surface).
+      scalar_denom = shapefunci + 1.0;
+  }
+
+  // Prevent division by zero in weird edge cases (e.g. very negative shape function)
+  if (fabs(scalar_denom) < 1e-12) scalar_denom = 1.0;
+
+  return spatial_norm + (residual[3] * residual[3]) / (scalar_denom * scalar_denom);
+}
+
+/* ---------------------------------------------------------------------- */
+
+void compute_jacobian(const double* gradi_global, const double hessi_global[3][3],
+                     const double* gradj_global, const double hessj_global[3][3], const double mu2, double* jacobian)
+{
+  // Jacobian (derivative of residual)
+  for (int row = 0 ; row < 3 ; row++) {
+    for (int col = 0 ; col < 3 ; col++) {
+      jacobian[row*4 + col] = hessi_global[row][col] + mu2 * hessj_global[row][col];
+    }
+    jacobian[row*4 + 3] = gradj_global[row];
+  }
+  for (int col = 0 ; col < 3 ; col++) {
+    jacobian[3*4 + col] = gradi_global[col] - gradj_global[col];
+  }
+  jacobian[15] = 0.0;
+}
+
+/* ---------------------------------------------------------------------- */
+
+double compute_residual_and_jacobian(const double* xci, const double Ri[3][3], const double* shapei, const double* blocki, const int flagi,
+                                     const double* xcj, const double Rj[3][3], const double* shapej, const double* blockj, const int flagj,
+                                     const double* X, double* shapefunc, double* residual, double* jacobian,
+                                     const int formulation, const double avg_radius_i, const double avg_radius_j)
+{
+  double gradi[3], hessi[3][3], gradj[3], hessj[3][3];
+  shapefunc[0] = shape_and_derivatives_global(xci, Ri, shapei, blocki, flagi, X, gradi, hessi, formulation, avg_radius_i);
+  shapefunc[1] = shape_and_derivatives_global(xcj, Rj, shapej, blockj, flagj, X, gradj, hessj, formulation, avg_radius_j);
+  compute_jacobian(gradi, hessi, gradj, hessj, X[3], jacobian);
+  return compute_residual(shapefunc[0], gradi, shapefunc[1], gradj, X[3], residual, formulation, (avg_radius_i + avg_radius_j) * 0.5);
+}
+
+/* ---------------------------------------------------------------------- */
+
+int determine_contact_point(const double* xci, const double Ri[3][3], const double* shapei, const double* blocki, const int flagi,
+                            const double* xcj, const double Rj[3][3], const double* shapej, const double* blockj, const int flagj,
+                            double* X0, double* nij, int formulation)
+{
+  double norm, norm_old, shapefunc[2], residual[4], jacobian[16];
+  double lsq = MathExtra::distsq3(xci, xcj);
+  bool converged(false);
+
+  double rhs_old[3];
+
+  // avg radii for regularization if GEOMETRIC formulation
+  double avg_radius_i = 1;
+  double avg_radius_j = 1;
+  double max_step = sqrt(lsq) * 0.2;
+  if (formulation == FORMULATION_GEOMETRIC) {
+    avg_radius_i = (shapei[0] + shapei[1] + shapei[2]) * LAMMPS_NS::MathConst::THIRD;
+    avg_radius_j = (shapej[0] + shapej[1] + shapej[2]) * LAMMPS_NS::MathConst::THIRD;
+  }
+
+  norm = compute_residual_and_jacobian(xci, Ri, shapei, blocki, flagi, xcj, Rj, shapej, blockj, flagj, X0, shapefunc, residual, jacobian, formulation, avg_radius_i, avg_radius_j);
+  // testing for convergence before attempting Newton's method.
+  // the initial guess is the old X0, so with temporal coherence, it might still pass tolerance if deformation is slow!
+  if (norm < TOL_NR_RES) {
+
+    // Must compute the normal vector nij before returning, since it is otherwise only set inside the Newton loop upon convergence.
+    double xilocal[3], tmp_v[3], gradi[3], val_dummy;
+
+    // Transform global X0 to local frame of particle I
+    MathExtra::sub3(X0, xci, tmp_v);
+    MathExtra::transpose_matvec(Ri, tmp_v, xilocal);
+
+    // Compute local gradient
+    // Algebraic gradient is fine for direction even if we used Geometric for solving
+    // TODO: might use a simpler function to simply compute the gradient, to
+    // avoid computing quantities already computed in compute_residual_and_jacobian
+    if (flagi <= 1)
+      val_dummy = shape_and_gradient_local_n1equaln2_surfacesearch(xilocal, shapei, blocki[0], tmp_v);
+    else
+      val_dummy = shape_and_gradient_local_superquad_surfacesearch(xilocal, shapei, blocki, tmp_v);
+
+    // Rotate gradient back to global frame to get normal
+    MathExtra::matvec(Ri, tmp_v, gradi);
+    MathExtra::normalize3(gradi, nij);
+
+    // Return status
+    if (shapefunc[0] > 0.0 || shapefunc[1] > 0.0)
+      return 1; // Converged, but no contact (separated)
+
+    return 0; // Converged and Contacting
+  }
+
+
+  for (int iter = 0 ; iter < ITERMAX_NR ; iter++) {
+    norm_old = norm;
+
+    double rhs[4];
+    bool gauss_elim_solved = false;
+    double A_fast[16];
+    double b_fast[4];
+
+    for(int i = 0; i < 16; ++i) {
+        A_fast[i] = jacobian[i];
+    }
+
+    b_fast[0] = -residual[0]; b_fast[1] = -residual[1];
+    b_fast[2] = -residual[2]; b_fast[3] = -residual[3];
+
+    // Try Fast Solver
+    gauss_elim_solved = MathExtraSuperellipsoids::solve_4x4_robust_unrolled(A_fast, b_fast);
+
+    // check for divergence or numerical issues in the fast solver
+    // and fall back to regularized solver if necessary
+    bool fail0 = !std::isfinite(b_fast[0]) | (std::abs(b_fast[0]) > MAX_B_FAST);
+    bool fail1 = !std::isfinite(b_fast[1]) | (std::abs(b_fast[1]) > MAX_B_FAST);
+    bool fail2 = !std::isfinite(b_fast[2]) | (std::abs(b_fast[2]) > MAX_B_FAST);
+    bool fail3 = !std::isfinite(b_fast[3]) | (std::abs(b_fast[3]) > MAX_B_FAST);
+    if (fail0 | fail1 | fail2 | fail3) {
+        gauss_elim_solved = false;
+    }
+
+    rhs[0] = b_fast[0]; rhs[1] = b_fast[1];
+    rhs[2] = b_fast[2]; rhs[3] = b_fast[3];
+
+    if (!gauss_elim_solved) {
+      // restore matrix
+      for(int i = 0; i < 16; ++i) {
+        A_fast[i] = jacobian[i];
+      }
+
+      b_fast[0] = -residual[0]; b_fast[1] = -residual[1];
+      b_fast[2] = -residual[2]; b_fast[3] = -residual[3];
+       // enforce a minimum regularization to avoid zero pivots in edge cases (flat on flat)
+      double trace = jacobian[0] + jacobian[5] + jacobian[10];
+      double diag_weight = std::fmax(TIKHONOV_SCALE * trace, TIKHONOV_SCALE);
+      A_fast[0]  += diag_weight;
+      A_fast[5]  += diag_weight;
+      A_fast[10] += diag_weight;
+
+      if (MathExtraSuperellipsoids::solve_4x4_robust_unrolled(A_fast, b_fast)) {
+          rhs[0] = b_fast[0]; rhs[1] = b_fast[1];
+          rhs[2] = b_fast[2]; rhs[3] = b_fast[3];
+          gauss_elim_solved = true;
+      }
+    }
+
+    MathExtra::copy3(rhs, rhs_old);
+
+    // Backtracking line search
+    double X_line[4];
+    int iter_ls;
+    double a = 1.0;
+
+    // Limit the max step size to avoid jumping too far
+    // normalize residual vector if step was limited
+    double spatial_residual_norm = std::sqrt(rhs[0]*rhs[0] + rhs[1]*rhs[1] + rhs[2]*rhs[2]);
+
+    if (spatial_residual_norm > max_step) {
+        double scale = max_step / spatial_residual_norm;
+        rhs[0] *= scale;
+        rhs[1] *= scale;
+        rhs[2] *= scale;
+    }
+
+    for (iter_ls = 0 ; iter_ls < ITERMAX_LS ; iter_ls++) {
+      X_line[0] = X0[0] + a * rhs[0];
+      X_line[1] = X0[1] + a * rhs[1];
+      X_line[2] = X0[2] + a * rhs[2];
+      X_line[3] = X0[3] + a * rhs[3];
+
+      // Line search iterates not selected for the next Newton iteration
+      // do not need to compute the expensive Jacobian, only the residual.
+      // We want to avoid calling `compute_residual_and_jacobian()` for each
+      // line search iterate.
+      // However, many intermediate variables that are costly to compute
+      // are shared by the local gradient and local hessian calculations.
+      // We want to avoid calling `compute_residual()` followed by `compute_jacobian()`
+      // for the iterates that satisfy the descent condition.
+      // To do so, we duplicate `compute_residual_and_jacobian()`, but only
+      // build the global hessians if the descent condition is satisfied and
+      // the iterate will be used in the next Newton step.
+      // This leads to some code duplication, and still computes
+      // the local hessians even when they are not necessary.
+      // This seems to be an acceptable compromise between performance and clean code,
+      // as most of the cost of the Hessian is in the 2 matrix products that
+      // compute the global matrix from the local one.
+
+      // One alternative would be to store the intermediate variables from
+      // the local gradient calculation when calling `shape_and_gradient_local()`,
+      // and re-use them during the local hessian calculation (function that
+      // calculates only the Hessian from these intermediate values would need
+      // to be implemented).
+      // This seems a bit clunky just to save the few multiplications of the
+      // local hessian calculation, that is why I did not do it. I am open to
+      // other ideas and solutions.
+      // Even then, we would have some code duplication with `compute_residual_and_jacobian()`
+      // So maybe I am overthinking this...
+
+      double xilocal[3], gradi[3], hessi[3][3], xjlocal[3], gradj[3], hessj[3][3], tmp_v[3];
+
+      MathExtra::sub3(X_line, xci, tmp_v);
+      MathExtra::transpose_matvec(Ri, tmp_v, xilocal);
+      shapefunc[0] = shape_and_derivatives_local(xilocal, shapei, blocki, flagi, tmp_v, hessi);
+      if (formulation == FORMULATION_GEOMETRIC) {
+          apply_regularization_shape_function(blocki[0], avg_radius_i, &shapefunc[0], tmp_v, hessi);
+      }
+      MathExtra::matvec(Ri, tmp_v, gradi);
+
+      MathExtra::sub3(X_line, xcj, tmp_v);
+      MathExtra::transpose_matvec(Rj, tmp_v, xjlocal);
+      shapefunc[1] = shape_and_derivatives_local(xjlocal, shapej, blockj, flagj, tmp_v, hessj);
+      if (formulation == FORMULATION_GEOMETRIC) {
+          apply_regularization_shape_function(blockj[0], avg_radius_j, &shapefunc[1], tmp_v, hessj);
+      }
+      MathExtra::matvec(Rj, tmp_v, gradj);
+
+      norm = compute_residual(shapefunc[0], gradi, shapefunc[1], gradj, X_line[3], residual, formulation, (avg_radius_i + avg_radius_j)/2.0);
+
+      if ((norm <= TOL_NR_RES) &&
+          (MathExtra::lensq3(rhs) * a * a <= TOL_NR_POS * lsq)) {
+        converged = true;
+
+        MathExtra::normalize3(gradi, nij);
+        break;
+      } else if (norm > norm_old - PARAMETER_LS * a * norm_old) { // Armijo - Goldstein condition not met
+        // Tested after convergence check because tiny values of norm and norm_old < TOL_NR_RES
+        // can still fail the Armijo - Goldstein condition
+        a *= CUTBACK_LS;
+      } else {
+        // Only compute the jacobian if there is another Newton iteration to come
+        double tmp_m[3][3];
+        MathExtra::times3_transpose(hessi, Ri, tmp_m);
+        MathExtra::times3(Ri, tmp_m, hessi);
+        MathExtra::times3_transpose(hessj, Rj, tmp_m);
+        MathExtra::times3(Rj, tmp_m, hessj);
+        compute_jacobian(gradi, hessi, gradj, hessj, X_line[3], jacobian);
+        break;
+      }
+    }
+    // Take full step if no descent at the end of line search
+    // Try to escape bad region
+    if (iter_ls == ITERMAX_LS) {
+      X0[0] += rhs[0];
+      X0[1] += rhs[1];
+      X0[2] += rhs[2];
+      X0[3] += rhs[3];
+      norm = compute_residual_and_jacobian(xci, Ri, shapei, blocki, flagi, xcj, Rj, shapej, blockj, flagj, X0, shapefunc, residual, jacobian, formulation, avg_radius_i, avg_radius_j);
+      if (norm < TOL_NR_RES) {
+        converged = true;
+        // must re-compute the normal 'nij' for this final point
+        double xilocal[3], tmp_v[3], gradi[3], hess_dummy[3][3];
+        MathExtra::sub3(X0, xci, tmp_v);
+        MathExtra::transpose_matvec(Ri, tmp_v, xilocal);
+
+        // We only need the gradient for the normal
+        shape_and_derivatives_local(xilocal, shapei, blocki, flagi, tmp_v, hess_dummy);
+        if (formulation == FORMULATION_GEOMETRIC) {
+            // If you use regularization, apply it here too for consistency
+            apply_regularization_shape_function(blocki[0], avg_radius_i, &shapefunc[0], tmp_v, hess_dummy);
+        }
+        MathExtra::matvec(Ri, tmp_v, gradi);
+        MathExtra::normalize3(gradi, nij);
+      }
+
+    } else {
+      X0[0] = X_line[0];
+      X0[1] = X_line[1];
+      X0[2] = X_line[2];
+      X0[3] = X_line[3];
+    }
+
+    if (converged) break;
+  }
+
+  // Not converged: do not fail if a shape function is positive (i.e., no contact). This may be
+  // risky: NR might have gone to a far-away point, so the absence of contact is not guaranteed.
+  if (!converged) {
+    if (shapefunc[0] > 0.0 || shapefunc[1] > 0.0) return 1;
+    return 2;
+  }
+  if (shapefunc[0] > 0.0 || shapefunc[1] > 0.0) return 1;
+  return 0;
+}
+
+/* ----------------------------------------------------------------------
+   Functions to compute only the shape function and its gradient, called by
+     the Newton method to avoid computing the Hessian when it is not needed and
+     to have a smoother landscape for the line search
+   General case for n1 != n2 > 2
+------------------------------------------------------------------------- */
+
+double shape_and_gradient_local_superquad_surfacesearch(const double* xlocal, const double* shape, const double* block, double* grad)
+{
+  double a_inv = 1.0 / shape[0];
+  double b_inv = 1.0 / shape[1];
+  double c_inv = 1.0 / shape[2];
+  double x_a = std::fabs(xlocal[0] * a_inv);
+  double y_b = std::fabs(xlocal[1] * b_inv);
+  double z_c = std::fabs(xlocal[2] * c_inv);
+  double n1 = block[0];
+  double n2 = block[1];
+  double x_a_pow_n2_m2 = std::pow(x_a, n2 - 2.0);
+  double x_a_pow_n2_m1 = x_a_pow_n2_m2 * x_a;
+  double y_b_pow_n2_m2 = std::pow(y_b, n2 - 2.0);
+  double y_b_pow_n2_m1 = y_b_pow_n2_m2 * y_b;
+
+  double nu = (x_a_pow_n2_m1 * x_a) + (y_b_pow_n2_m1 * y_b);
+  double nu_pow_n1_n2_m2 = std::pow(nu, n1/n2 - 2.0);
+  double nu_pow_n1_n2_m1 = nu_pow_n1_n2_m2 * nu;
+
+  double z_c_pow_n1_m2 = std::pow(z_c, n1 -2.0);
+  double z_c_pow_n1_m1 = z_c_pow_n1_m2 * z_c;
+
+  // Equation (14)
+  double signx = xlocal[0] > 0.0 ? 1.0 : -1.0;
+  double signy = xlocal[1] > 0.0 ? 1.0 : -1.0;
+  double signz = xlocal[2] > 0.0 ? 1.0 : -1.0;
+  grad[0] = n1 * a_inv * x_a_pow_n2_m1 * nu_pow_n1_n2_m1 * signx;
+  grad[1] = n1 * b_inv * y_b_pow_n2_m1 * nu_pow_n1_n2_m1 * signy;
+  grad[2] = n1 * c_inv * z_c_pow_n1_m1 * signz;
+
+  double F = (nu_pow_n1_n2_m1 * nu) + (z_c_pow_n1_m1 * z_c);
+
+  double scale_factor = std::pow(F, 1.0/n1 -1.0) / n1;
+
+  grad[0] *= scale_factor;
+  grad[1] *= scale_factor;
+  grad[2] *= scale_factor;
+
+  return std::pow(F, 1.0/n1) - 1.0;
+}
+
+/* ----------------------------------------------------------------------
+   Special case for n1 = n2 = n > 2
+------------------------------------------------------------------------- */
+
+double shape_and_gradient_local_n1equaln2_surfacesearch(const double* xlocal, const double* shape, const double n, double* grad)
+{
+  double a_inv = 1.0 / shape[0];
+  double b_inv = 1.0 / shape[1];
+  double c_inv = 1.0 / shape[2];
+  double x_a = std::fabs(xlocal[0] * a_inv);
+  double y_b = std::fabs(xlocal[1] * b_inv);
+  double z_c = std::fabs(xlocal[2] * c_inv);
+  double x_a_pow_n_m2 = std::pow(x_a, n - 2.0);
+  double x_a_pow_n_m1 = x_a_pow_n_m2 * x_a;
+  double y_b_pow_n_m2 = std::pow(y_b, n - 2.0);
+  double y_b_pow_n_m1 = y_b_pow_n_m2 * y_b;
+  double z_c_pow_n_m2 = std::pow(z_c, n - 2.0);
+  double z_c_pow_n_m1 = z_c_pow_n_m2 * z_c;
+
+  // Equation (14)
+  double signx = xlocal[0] > 0.0 ? 1.0 : -1.0;
+  double signy = xlocal[1] > 0.0 ? 1.0 : -1.0;
+  double signz = xlocal[2] > 0.0 ? 1.0 : -1.0;
+  grad[0] = n * a_inv * x_a_pow_n_m1 * signx;
+  grad[1] = n * b_inv * y_b_pow_n_m1 * signy;
+  grad[2] = n * c_inv * z_c_pow_n_m1 * signz;
+
+  double F = (x_a_pow_n_m1 * x_a) + (y_b_pow_n_m1 * y_b) + (z_c_pow_n_m1 * z_c);
+  double scale_factor = std::pow(F, 1.0/n -1.0) / n;
+
+  grad[0] *= scale_factor;
+  grad[1] *= scale_factor;
+  grad[2] *= scale_factor;
+
+  return std::pow(F, 1.0/n) - 1.0;
+}
+
+/* ----------------------------------------------------------------------
+   Newton-Raphson method to find the overlap distance from the contact point given the normal
+------------------------------------------------------------------------- */
+
+double compute_overlap_distance(
+  const double* shape, const double* block, const double Rot[3][3], const int flag,
+  const double* global_point, const double* global_normal,
+  const double* center)
+{
+  double local_point[3], local_normal[3];
+  double del[3];
+  double overlap;
+  MathExtra::sub3(global_point, center, del);  // shift origin to the particle center
+  MathExtra::transpose_matvec(Rot, del, local_point);
+  MathExtra::transpose_matvec(Rot, global_normal, local_normal);
+
+  double local_f;
+  double local_grad[3];
+
+  // ellipsoid analytical solution (the math might need double checking)
+  // there is an easy way to find this by parametrizing the straight line as
+  // X0 + t * n and then substituting into the ellipsoid equation for x, y, z
+  // this results in a quadratic equation and we take the positive solution since
+  // we are taking the outward facing normal for each grain
+
+  if (flag == 0) {
+
+    double a_inv2 = 1.0 / (shape[0] * shape[0]);
+    double b_inv2 = 1.0 / (shape[1] * shape[1]);
+    double c_inv2 = 1.0 / (shape[2] * shape[2]);
+
+    // Coefficients for At^2 + Bt + C = 0
+    double A = (local_normal[0] * local_normal[0] * a_inv2) +
+               (local_normal[1] * local_normal[1] * b_inv2) +
+               (local_normal[2] * local_normal[2] * c_inv2);
+
+    double B = 2.0 * ( (local_point[0] * local_normal[0] * a_inv2) +
+                     (local_point[1] * local_normal[1] * b_inv2) +
+                     (local_point[2] * local_normal[2] * c_inv2) );
+
+    double C = (local_point[0] * local_point[0] * a_inv2) +
+               (local_point[1] * local_point[1] * b_inv2) +
+               (local_point[2] * local_point[2] * c_inv2) - 1.0;
+
+    // Discriminant
+    double delta = B*B - 4.0*A*C;
+
+    // Clamp delta to zero just in case numerical noise makes it negative
+    if (delta < 0.0) delta = 0.0;
+    overlap = (-B + std::sqrt(delta)) / (2.0 * A);
+  } else {
+    // --- Superquadric Case (Newton-Raphson on Distance Estimator) ---
+
+    overlap = 0.0; // Distance along the normal
+    double current_p[3];
+    double val;
+    for (unsigned int iter = 0; iter < ITERMAX_OVERLAP; iter++) {
+      // Update current search position: P = Start + t * Normal
+      current_p[0] = local_point[0] + overlap * local_normal[0];
+      current_p[1] = local_point[1] + overlap * local_normal[1];
+      current_p[2] = local_point[2] + overlap * local_normal[2];
+
+      // Calculate Distance Estimator value and Gradient
+      if (flag == 1) {
+        val = shape_and_gradient_local_n1equaln2_surfacesearch(current_p, shape, block[0], local_grad);
+      } else {
+        val = shape_and_gradient_local_superquad_surfacesearch(current_p, shape, block, local_grad);
+      }
+
+      // Convergence Check
+      if (std::fabs(val) < TOL_OVERLAP) break;
+
+      // Newton Step
+      double slope = local_grad[0] * local_normal[0] +
+                     local_grad[1] * local_normal[1] +
+                     local_grad[2] * local_normal[2];
+
+      // Safety check to prevent divide-by-zero if ray grazes surface
+      if (std::fabs(slope) < MINSLOPE_OVERLAP) break;
+
+      overlap -= val / slope;
+    }
+  }
+  return overlap;
+}
+
+} // namespace MathExtraSuperellipsoids
diff --git a/src/ASPHERE/math_extra_superellipsoids.h b/src/ASPHERE/math_extra_superellipsoids.h
new file mode 100644
index 00000000000..375f7c40ccd
--- /dev/null
+++ b/src/ASPHERE/math_extra_superellipsoids.h
@@ -0,0 +1,834 @@
+/* -*- c++ -*- ----------------------------------------------------------
+   LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
+   https://www.lammps.org/, Sandia National Laboratories
+   LAMMPS development team: developers@lammps.org
+
+   Copyright (2003) Sandia Corporation.  Under the terms of Contract
+   DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
+   certain rights in this software.  This software is distributed under
+   the GNU General Public License.
+
+   See the README file in the top-level LAMMPS directory.
+------------------------------------------------------------------------- */
+
+/* ----------------------------------------------------------------------
+    Contributing authors: Jacopo Bilotto (EPFL), Jibril B. Coulibaly
+------------------------------------------------------------------------- */
+
+#ifndef LMP_MATH_EXTRA_SUPERELLIPSOIDS_H
+#define LMP_MATH_EXTRA_SUPERELLIPSOIDS_H
+
+#include "math_extra.h"
+
+#include <cmath>
+#include <iostream>
+#include <limits>
+
+namespace MathExtraSuperellipsoids {
+
+enum ContactFormulation { FORMULATION_ALGEBRAIC = 0, FORMULATION_GEOMETRIC = 1 };
+
+enum CurvatureModel { CURV_MEAN = 0, CURV_GAUSSIAN = 1 };
+
+// needed for shape functions grad and matrix
+void global2local_vector(const double v[3], const double *quat, double local_v[3]);
+
+inline double det4_M44_zero(const double m[4][4]);
+
+// 4x4 system solver: modifies A in place and overwrites b with the solution
+inline bool solve_4x4_robust_unrolled(double A[16], double b[4]);
+
+inline int check_oriented_bounding_boxes(const double *xc1, const double R1[3][3],
+                                          const double *shape1, const double *xc2,
+                                          const double R2[3][3], const double *shape2,
+                                          int cached_axis);
+
+inline bool check_intersection_axis(const int axis_id, const double C[3][3],
+                                    const double AbsC[3][3], const double *center_distance_box1,
+                                    const double *center_distance_box2, const double *a,
+                                    const double *b);
+
+inline bool check_intersection_axis_and_get_seed(const double *xc1, const double R1[3][3],
+                                                 const double *shape1, const double *xc2,
+                                                 const double R2[3][3], const double *shape2,
+                                                 double *cached_axis, double *contact_point);
+
+inline int determine_contact_point_wall(const double *xci, const double Ri[3][3],
+                                        const double *shapei, const double *blocki, const int flagi,
+                                        const double *x_wall, const double *n_wall, double *X0,
+                                        double *nij, double *overlap);
+
+// shape function computations, using flag to optimize for special cases (ellipsoid, superquadric with n1=n2)
+double shape_and_derivatives_local(const double *xlocal, const double *shape, const double *block,
+                                   const int flag, double *grad, double hess[3][3]);
+double shape_and_derivatives_local_superquad(const double *xlocal, const double *shape,
+                                             const double *block, double *grad, double hess[3][3]);
+double shape_and_derivatives_local_n1equaln2(const double *xlocal, const double *shape,
+                                             const double n, double *grad, double hess[3][3]);
+double shape_and_derivatives_local_ellipsoid(const double *xlocal, const double *shape,
+                                             double *grad, double hess[3][3]);
+double shape_and_derivatives_global(const double *xc, const double R[3][3], const double *shape,
+                                    const double *block, const int flag, const double *X0,
+                                    double *grad, double hess[3][3], const int formulation,
+                                    const double avg_radius);
+
+double compute_residual(const double shapefunci, const double *gradi_global,
+                        const double shapefuncj, const double *gradj_global, const double mu2,
+                        double *residual, const int formulation, const double radius_scale);
+void compute_jacobian(const double *gradi_global, const double hessi_global[3][3],
+                      const double *gradj_global, const double hessj_global[3][3], const double mu2,
+                      double *jacobian);
+double compute_residual_and_jacobian(const double *xci, const double Ri[3][3], const double *shapei,
+                                     const double *blocki, const int flagi, const double *xcj,
+                                     const double Rj[3][3], const double *shapej,
+                                     const double *blockj, const int flagj, const double *X,
+                                     double *shapefunc, double *residual, double *jacobian,
+                                     const int formulation, const double avg_radius_i,
+                                     const double avg_radius_j);
+int determine_contact_point(const double *xci, const double Ri[3][3], const double *shapei,
+                            const double *blocki, const int flagi, const double *xcj,
+                            const double Rj[3][3], const double *shapej, const double *blockj,
+                            const int flagj, double *X0, double *nij, const int formulation);
+
+void apply_regularization_shape_function(double n1, const double avg_radius, double *value,
+                                         double *grad, double hess[3][3]);
+// functions to compute shape function and gradient only when called for surface point calculation given contact point
+double shape_and_gradient_local_superquad_surfacesearch(const double *xlocal, const double *shape,
+                                                        const double *block, double *grad);
+double shape_and_gradient_local_n1equaln2_surfacesearch(const double *xlocal, const double *shape,
+                                                        const double n, double *grad);
+
+double compute_overlap_distance(const double *shape, const double *block, const double Rot[3][3],
+                                const int flag, const double *global_point,
+                                const double *global_normal, const double *center);
+
+double mean_curvature_superellipsoid(const double *shape, const double *block, const int flag,
+                                     const double R[3][3], const double *surf_global_point,
+                                     const double *xc);
+double gaussian_curvature_superellipsoid(const double *shape, const double *block, const int flag,
+                                         const double R[3][3], const double *surf_global_point,
+                                         const double *xc);
+
+}    // namespace MathExtraSuperellipsoids
+
+/* ----------------------------------------------------------------------
+   determinant of a 4x4 matrix M with M[3][3] assumed to be zero
+------------------------------------------------------------------------- */
+
+inline double MathExtraSuperellipsoids::det4_M44_zero(const double m[4][4])
+{
+  // Define the 3x3 submatrices (M_41, M_42, M_43)
+
+  // Submatrix M_41
+  double m41[3][3] = {
+      {m[0][1], m[0][2], m[0][3]}, {m[1][1], m[1][2], m[1][3]}, {m[2][1], m[2][2], m[2][3]}};
+
+  // Submatrix M_42
+  double m42[3][3] = {
+      {m[0][0], m[0][2], m[0][3]}, {m[1][0], m[1][2], m[1][3]}, {m[2][0], m[2][2], m[2][3]}};
+
+  // Submatrix M_43
+  double m43[3][3] = {
+      {m[0][0], m[0][1], m[0][3]}, {m[1][0], m[1][1], m[1][3]}, {m[2][0], m[2][1], m[2][3]}};
+
+  // Calculate the determinant using the simplified Laplace expansion (M_44=0)
+  // det(M) = -M[3][0]*det(M_41) + M[3][1]*det(M_42) - M[3][2]*det(M_43)
+
+  double ans = -m[3][0] * MathExtra::det3(m41) + m[3][1] * MathExtra::det3(m42) -
+      m[3][2] * MathExtra::det3(m43);
+
+  return ans;
+}
+
+inline bool MathExtraSuperellipsoids::solve_4x4_robust_unrolled(double A[16], double b[4])
+{
+  // --- COLUMN 0 ---
+  // 1. Find Pivot in Col 0
+  int p = 0;
+  double max_val = std::abs(A[0]);
+  double val;
+
+  val = std::abs(A[4]);
+  if (val > max_val) {
+    max_val = val;
+    p = 1;
+  }
+  val = std::abs(A[8]);
+  if (val > max_val) {
+    max_val = val;
+    p = 2;
+  }
+  val = std::abs(A[12]);
+  if (val > max_val) {
+    max_val = val;
+    p = 3;
+  }
+
+  if (max_val <= 0.0) return false;
+  // 2. Swap Row 0 with Row p
+  if (p != 0) {
+    int row_offset = p * 4;
+    std::swap(b[0], b[p]);
+    std::swap(A[0], A[row_offset]);
+    std::swap(A[1], A[row_offset + 1]);
+    std::swap(A[2], A[row_offset + 2]);
+    std::swap(A[3], A[row_offset + 3]);
+  }
+
+  // 3. Eliminate Col 0
+  {
+    double inv = 1.0 / A[0];
+    // Row 1
+    double f1 = A[4] * inv;
+    A[5] -= f1 * A[1];
+    A[6] -= f1 * A[2];
+    A[7] -= f1 * A[3];
+    b[1] -= f1 * b[0];
+    // Row 2
+    double f2 = A[8] * inv;
+    A[9] -= f2 * A[1];
+    A[10] -= f2 * A[2];
+    A[11] -= f2 * A[3];
+    b[2] -= f2 * b[0];
+    // Row 3
+    double f3 = A[12] * inv;
+    A[13] -= f3 * A[1];
+    A[14] -= f3 * A[2];
+    A[15] -= f3 * A[3];
+    b[3] -= f3 * b[0];
+  }
+
+  // --- COLUMN 1 ---
+  // 1. Find Pivot in Col 1 (starting from row 1)
+  p = 1;
+  max_val = std::abs(A[5]);
+
+  val = std::abs(A[9]);
+  if (val > max_val) {
+    max_val = val;
+    p = 2;
+  }
+  val = std::abs(A[13]);
+  if (val > max_val) {
+    max_val = val;
+    p = 3;
+  }
+
+  if (max_val <= 0.0) return false;
+
+  // 2. Swap Row 1 with Row p
+  if (p != 1) {
+    int row_offset = p * 4;
+    std::swap(b[1], b[p]);
+    // Optimization: Col 0 is already 0, so we only swap cols 1,2,3
+    std::swap(A[5], A[row_offset + 1]);
+    std::swap(A[6], A[row_offset + 2]);
+    std::swap(A[7], A[row_offset + 3]);
+  }
+
+  // 3. Eliminate Col 1
+  {
+    double inv = 1.0 / A[5];
+    // Row 2
+    double f2 = A[9] * inv;
+    A[10] -= f2 * A[6];
+    A[11] -= f2 * A[7];
+    b[2] -= f2 * b[1];
+    // Row 3
+    double f3 = A[13] * inv;
+    A[14] -= f3 * A[6];
+    A[15] -= f3 * A[7];
+    b[3] -= f3 * b[1];
+  }
+
+  // --- COLUMN 2 ---
+  // 1. Find Pivot in Col 2 (starting from row 2)
+  p = 2;
+  max_val = std::abs(A[10]);
+
+  val = std::abs(A[14]);
+  if (val > max_val) {
+    max_val = val;
+    p = 3;
+  }
+
+  if (max_val <= 0.0) return false;
+
+  // 2. Swap Row 2 with Row p
+  if (p != 2) {
+    std::swap(b[2], b[3]);
+    // Optimization: Only swap cols 2,3
+    std::swap(A[10], A[14]);
+    std::swap(A[11], A[15]);
+  }
+
+  // 3. Eliminate Col 2
+  {
+    double inv = 1.0 / A[10];
+    // Row 3
+    double f3 = A[14] * inv;
+    A[15] -= f3 * A[11];
+    b[3] -= f3 * b[2];
+  }
+
+  // --- BACKWARD SUBSTITUTION ---
+  // Check last pivot
+  if (std::abs(A[15]) <= 0.0) return false;
+
+  double inv3 = 1.0 / A[15];
+  b[3] *= inv3;
+
+  double inv2 = 1.0 / A[10];
+  b[2] = (b[2] - A[11] * b[3]) * inv2;
+
+  double inv1 = 1.0 / A[5];
+  b[1] = (b[1] - A[6] * b[2] - A[7] * b[3]) * inv1;
+
+  double inv0 = 1.0 / A[0];
+  b[0] = (b[0] - A[1] * b[1] - A[2] * b[2] - A[3] * b[3]) * inv0;
+
+  return true;
+}
+
+/* ----------------------------------------------------------------------
+   Oriented Bounding Box intersection test
+     Logic and optimization strategies adapted from LIGGGHTS (CFDEMproject)
+     See: src/math_extra_liggghts_nonspherical.cpp in LIGGGHTS distribution
+     This implementation uses the "cached separating axis" optimization for temporal coherence
+     Algorithm from https://www.geometrictools.com/Documentation/DynamicCollisionDetection.pdf
+------------------------------------------------------------------------- */
+
+inline int MathExtraSuperellipsoids::check_oriented_bounding_boxes(
+    const double *xc1, const double R1[3][3], const double *shape1, const double *xc2,
+    const double R2[3][3], const double *shape2, int axis)
+{
+  // returns the separating axis index (0-14) if the boxes are separated, -1 if they overlap
+
+  bool separated = false;
+
+  // for oriented bounding boxes we check the 15 separating axes
+  double C[3][3], AbsC[3][3];
+  MathExtra::transpose_times3(R1, R2, C);    // C = R1^T * R2
+  for (unsigned int i = 0; i < 3; i++) {
+    for (unsigned int j = 0; j < 3; j++) {
+      AbsC[i][j] = std::fabs(C[i][j]);    // for when absolute values are needed
+    }
+  }
+
+  double center_distance[3];
+  for (unsigned int i = 0; i < 3; i++) { center_distance[i] = xc2[i] - xc1[i]; }
+
+  // Project center distance into both local frames
+  double center_distance_box1[3], center_distance_box2[3];
+  MathExtra::transpose_matvec(R1, center_distance, center_distance_box1);
+  MathExtra::transpose_matvec(R2, center_distance, center_distance_box2);
+
+  // first check the cached axis, for temporal coherence
+  separated = check_intersection_axis(axis, C, AbsC, center_distance_box1, center_distance_box2,
+                                      shape1, shape2);
+
+  if (separated) return axis;
+  // then check all the other axes
+  for (int axis_id = 0; axis_id < 15; axis_id++) {
+    if (axis_id == axis) continue;    // already checked
+    separated = check_intersection_axis(axis_id, C, AbsC, center_distance_box1,
+                                        center_distance_box2, shape1, shape2);
+    if (separated)
+      return axis_id; // update cached axis
+  }
+  return -1;    // no separation found
+}
+
+/* ---------------------------------------------------------------------- */
+
+inline bool MathExtraSuperellipsoids::check_intersection_axis(const int axis_id,
+                                                              const double C[3][3],
+                                                              const double AbsC[3][3],
+                                                              const double *center_distance_box1,
+                                                              const double *center_distance_box2,
+                                                              const double *a, const double *b)
+{
+  // here axis_id goes from 0 to 14
+  // a and b are the half-sizes of the boxes along their local axes
+  // returns true if there is a separation along this axis
+  // the caller is responsible for updating the cached axis if a separation is found
+  double R1 = 0.0, R2 = 0.0, R = 0.0;    // an out-of-range axis_id reports no separation
+
+  switch (axis_id) {
+    case 0:    // A0
+      R1 = a[0];
+      R2 = b[0] * AbsC[0][0] + b[1] * AbsC[0][1] + b[2] * AbsC[0][2];
+      R = std::fabs(center_distance_box1[0]);
+      break;
+    case 1:    // A1
+      R1 = a[1];
+      R2 = b[0] * AbsC[1][0] + b[1] * AbsC[1][1] + b[2] * AbsC[1][2];
+      R = std::fabs(center_distance_box1[1]);
+      break;
+    case 2:    // A2
+      R1 = a[2];
+      R2 = b[0] * AbsC[2][0] + b[1] * AbsC[2][1] + b[2] * AbsC[2][2];
+      R = std::fabs(center_distance_box1[2]);
+      break;
+    case 3:    // B0
+      R1 = a[0] * AbsC[0][0] + a[1] * AbsC[1][0] + a[2] * AbsC[2][0];
+      R2 = b[0];
+      R = std::fabs(center_distance_box2[0]);
+      break;
+    case 4:    // B1
+      R1 = a[0] * AbsC[0][1] + a[1] * AbsC[1][1] + a[2] * AbsC[2][1];
+      R2 = b[1];
+      R = std::fabs(center_distance_box2[1]);
+      break;
+    case 5:    // B2
+      R1 = a[0] * AbsC[0][2] + a[1] * AbsC[1][2] + a[2] * AbsC[2][2];
+      R2 = b[2];
+      R = std::fabs(center_distance_box2[2]);
+      break;
+    case 6:    // A0 x B0
+      R1 = a[1] * AbsC[2][0] + a[2] * AbsC[1][0];
+      R2 = b[1] * AbsC[0][2] + b[2] * AbsC[0][1];
+      R = std::fabs(center_distance_box1[2] * C[1][0] - center_distance_box1[1] * C[2][0]);
+      break;
+    case 7:    // A0 x B1
+      R1 = a[1] * AbsC[2][1] + a[2] * AbsC[1][1];
+      R2 = b[0] * AbsC[0][2] + b[2] * AbsC[0][0];
+      R = std::fabs(center_distance_box1[2] * C[1][1] - center_distance_box1[1] * C[2][1]);
+      break;
+    case 8:    // A0 x B2
+      R1 = a[1] * AbsC[2][2] + a[2] * AbsC[1][2];
+      R2 = b[0] * AbsC[0][1] + b[1] * AbsC[0][0];
+      R = std::fabs(center_distance_box1[2] * C[1][2] - center_distance_box1[1] * C[2][2]);
+      break;
+    case 9:    // A1 x B0
+      R1 = a[0] * AbsC[2][0] + a[2] * AbsC[0][0];
+      R2 = b[1] * AbsC[1][2] + b[2] * AbsC[1][1];
+      R = std::fabs(center_distance_box1[0] * C[2][0] - center_distance_box1[2] * C[0][0]);
+      break;
+    case 10:    // A1 x B1
+      R1 = a[0] * AbsC[2][1] + a[2] * AbsC[0][1];
+      R2 = b[0] * AbsC[1][2] + b[2] * AbsC[1][0];
+      R = std::fabs(center_distance_box1[0] * C[2][1] - center_distance_box1[2] * C[0][1]);
+      break;
+    case 11:    // A1 x B2
+      R1 = a[0] * AbsC[2][2] + a[2] * AbsC[0][2];
+      R2 = b[0] * AbsC[1][1] + b[1] * AbsC[1][0];
+      R = std::fabs(center_distance_box1[0] * C[2][2] - center_distance_box1[2] * C[0][2]);
+      break;
+    case 12:    // A2 x B0
+      R1 = a[0] * AbsC[1][0] + a[1] * AbsC[0][0];
+      R2 = b[1] * AbsC[2][2] + b[2] * AbsC[2][1];
+      R = std::fabs(center_distance_box1[1] * C[0][0] - center_distance_box1[0] * C[1][0]);
+      break;
+    case 13:    // A2 x B1
+      R1 = a[0] * AbsC[1][1] + a[1] * AbsC[0][1];
+      R2 = b[0] * AbsC[2][2] + b[2] * AbsC[2][0];
+      R = std::fabs(center_distance_box1[1] * C[0][1] - center_distance_box1[0] * C[1][1]);
+      break;
+    case 14:    // A2 x B2
+      R1 = a[0] * AbsC[1][2] + a[1] * AbsC[0][2];
+      R2 = b[0] * AbsC[2][1] + b[1] * AbsC[2][0];
+      R = std::fabs(center_distance_box1[1] * C[0][2] - center_distance_box1[0] * C[1][2]);
+      break;
+  }
+
+  if (R > R1 + R2) {
+    return true;    // separation found
+  } else {
+    return false;    // no separation
+  }
+}
+
+/* ---------------------------------------------------------------------- */
+
+inline bool MathExtraSuperellipsoids::check_intersection_axis_and_get_seed(
+    const double *xc1, const double R1[3][3], const double *shape1, const double *xc2,
+    const double R2[3][3], const double *shape2, double *cached_axis, double *contact_point)
+{
+  // cached_axis is the axis that separated the boxes last time
+  // due to temporal coherence we check it first
+
+  double C[3][3], AbsC[3][3];
+  MathExtra::transpose_times3(R1, R2, C);    // C = R1^T * R2
+
+  // for oriented bounding boxes we check the 15 separating axes
+  const double eps = 1e-20;
+  for (unsigned int i = 0; i < 3; i++) {
+    for (unsigned int j = 0; j < 3; j++) {
+      // Add epsilon to prevent division by zero in edge cases
+      AbsC[i][j] = std::fabs(C[i][j]) + eps;
+    }
+  }
+
+  double center_distance[3];    // Center distance in Global Frame
+  for (unsigned int i = 0; i < 3; i++) { center_distance[i] = xc2[i] - xc1[i]; }
+
+  // Project center distance into both local frames
+  double center_distance_box1[3], center_distance_box2[3];
+  MathExtra::transpose_matvec(R1, center_distance, center_distance_box1);
+  MathExtra::transpose_matvec(R2, center_distance, center_distance_box2);
+
+  int best_axis = -1;
+  double min_overlap = std::numeric_limits<double>::max();
+  const double edge_bias = 1.05;    // Prefer face contacts over edge contacts
+
+  // Lambda to test an axis. Returns true if separated.
+  // A lambda is used because the compiler can usually inline it,
+  // which avoids the overhead of a function call.
+  auto test_axis_separated = [&](int i) -> bool {
+    double R1_rad, R2_rad, dist, overlap;
+
+    // A switch is efficient here; compilers typically generate a jump table.
+    switch (i) {
+      case 0:    // A0
+        R1_rad = shape1[0];
+        R2_rad = shape2[0] * AbsC[0][0] + shape2[1] * AbsC[0][1] + shape2[2] * AbsC[0][2];
+        dist = std::fabs(center_distance_box1[0]);
+        break;
+      case 1:    // A1
+        R1_rad = shape1[1];
+        R2_rad = shape2[0] * AbsC[1][0] + shape2[1] * AbsC[1][1] + shape2[2] * AbsC[1][2];
+        dist = std::fabs(center_distance_box1[1]);
+        break;
+      case 2:    // A2
+        R1_rad = shape1[2];
+        R2_rad = shape2[0] * AbsC[2][0] + shape2[1] * AbsC[2][1] + shape2[2] * AbsC[2][2];
+        dist = std::fabs(center_distance_box1[2]);
+        break;
+      case 3:    // B0
+        R1_rad = shape1[0] * AbsC[0][0] + shape1[1] * AbsC[1][0] + shape1[2] * AbsC[2][0];
+        R2_rad = shape2[0];
+        dist = std::fabs(center_distance_box2[0]);
+        break;
+      case 4:    // B1
+        R1_rad = shape1[0] * AbsC[0][1] + shape1[1] * AbsC[1][1] + shape1[2] * AbsC[2][1];
+        R2_rad = shape2[1];
+        dist = std::fabs(center_distance_box2[1]);
+        break;
+      case 5:    // B2
+        R1_rad = shape1[0] * AbsC[0][2] + shape1[1] * AbsC[1][2] + shape1[2] * AbsC[2][2];
+        R2_rad = shape2[2];
+        dist = std::fabs(center_distance_box2[2]);
+        break;
+      case 6:    // A0 x B0
+        R1_rad = shape1[1] * AbsC[2][0] + shape1[2] * AbsC[1][0];
+        R2_rad = shape2[1] * AbsC[0][2] + shape2[2] * AbsC[0][1];
+        dist = std::fabs(center_distance_box1[2] * C[1][0] - center_distance_box1[1] * C[2][0]);
+        break;
+      case 7:    // A0 x B1
+        R1_rad = shape1[1] * AbsC[2][1] + shape1[2] * AbsC[1][1];
+        R2_rad = shape2[0] * AbsC[0][2] + shape2[2] * AbsC[0][0];
+        dist = std::fabs(center_distance_box1[2] * C[1][1] - center_distance_box1[1] * C[2][1]);
+        break;
+      case 8:    // A0 x B2
+        R1_rad = shape1[1] * AbsC[2][2] + shape1[2] * AbsC[1][2];
+        R2_rad = shape2[0] * AbsC[0][1] + shape2[1] * AbsC[0][0];
+        dist = std::fabs(center_distance_box1[2] * C[1][2] - center_distance_box1[1] * C[2][2]);
+        break;
+      case 9:    // A1 x B0
+        R1_rad = shape1[0] * AbsC[2][0] + shape1[2] * AbsC[0][0];
+        R2_rad = shape2[1] * AbsC[1][2] + shape2[2] * AbsC[1][1];
+        dist = std::fabs(center_distance_box1[0] * C[2][0] - center_distance_box1[2] * C[0][0]);
+        break;
+      case 10:    // A1 x B1
+        R1_rad = shape1[0] * AbsC[2][1] + shape1[2] * AbsC[0][1];
+        R2_rad = shape2[0] * AbsC[1][2] + shape2[2] * AbsC[1][0];
+        dist = std::fabs(center_distance_box1[0] * C[2][1] - center_distance_box1[2] * C[0][1]);
+        break;
+      case 11:    // A1 x B2
+        R1_rad = shape1[0] * AbsC[2][2] + shape1[2] * AbsC[0][2];
+        R2_rad = shape2[0] * AbsC[1][1] + shape2[1] * AbsC[1][0];
+        dist = std::fabs(center_distance_box1[0] * C[2][2] - center_distance_box1[2] * C[0][2]);
+        break;
+      case 12:    // A2 x B0
+        R1_rad = shape1[0] * AbsC[1][0] + shape1[1] * AbsC[0][0];
+        R2_rad = shape2[1] * AbsC[2][2] + shape2[2] * AbsC[2][1];
+        dist = std::fabs(center_distance_box1[1] * C[0][0] - center_distance_box1[0] * C[1][0]);
+        break;
+      case 13:    // A2 x B1
+        R1_rad = shape1[0] * AbsC[1][1] + shape1[1] * AbsC[0][1];
+        R2_rad = shape2[0] * AbsC[2][2] + shape2[2] * AbsC[2][0];
+        dist = std::fabs(center_distance_box1[1] * C[0][1] - center_distance_box1[0] * C[1][1]);
+        break;
+      case 14:    // A2 x B2
+        R1_rad = shape1[0] * AbsC[1][2] + shape1[1] * AbsC[0][2];
+        R2_rad = shape2[0] * AbsC[2][1] + shape2[1] * AbsC[2][0];
+        dist = std::fabs(center_distance_box1[1] * C[0][2] - center_distance_box1[0] * C[1][2]);
+        break;
+      default:
+        return false;
+    }
+
+    if (dist > R1_rad + R2_rad) return true;    // Separated!
+
+    // If not separated, track the overlap depth
+    overlap = (R1_rad + R2_rad) - dist;
+
+    // Bias: Penalize edge axes slightly to prefer stable face contacts
+    if (i >= 6) overlap *= edge_bias;
+
+    if (overlap < min_overlap) {
+      min_overlap = overlap;
+      best_axis = i;
+    }
+    return false;    // Not separated
+  };
+
+  // Check Cached Axis First (Temporal Coherence)
+  int c_axis = (int) (*cached_axis);
+  if (test_axis_separated(c_axis)) return false;
+
+  // Check remaining axes
+  for (int i = 0; i < 15; i++) {
+    if (i == c_axis) continue;
+    if (test_axis_separated(i)) {
+      *cached_axis = (double) i;
+      return false;
+    }
+  }
+
+  // If we reached here, 'best_axis' holds the axis index where the overlap is minimal
+  if (best_axis < 6) {
+    // Face-to-Face contact logic: Project "Incident" box onto "Reference" face, clip to find overlap center.
+    // Pointers to define who is Reference (the face) and who is Incident
+    const double *posRef = xc1;
+    const double *posInc = xc2;
+    const double(*RRef)[3] = R1;
+    const double(*RInc)[3] = R2;
+    const double *shapeRef = shape1;
+    const double *shapeInc = shape2;
+    double *D_local_Ref = center_distance_box1;    // Center dist in Ref frame
+
+    int axis = best_axis;
+
+    // Swap if Reference is Box 2 (Indices 3, 4, 5)
+    if (best_axis >= 3) {
+      posRef = xc2;
+      posInc = xc1;
+      RRef = R2;
+      RInc = R1;
+      shapeRef = shape2;
+      shapeInc = shape1;
+      D_local_Ref = center_distance_box2;
+      axis -= 3;
+    }
+
+    double seed_local[3];
+
+    // Normal Component: Midway through the penetration depth
+    // Calculate projected radius of Incident block onto this axis
+
+    double dir = (D_local_Ref[axis] > 0) ? 1.0 : -1.0;
+    double radInc_proj = 0.0;
+    for (int k = 0; k < 3; k++) {
+      // If swapped (Box 2 is Ref), we need AbsC^T, so we swap AbsC indices
+      double val = (best_axis < 3) ? AbsC[axis][k] : AbsC[k][axis];
+      radInc_proj += shapeInc[k] * val;
+    }
+
+    double surfRef = dir * shapeRef[axis];
+    double surfInc = D_local_Ref[axis] - (dir * radInc_proj);
+    seed_local[axis] = 0.5 * (surfRef + surfInc);
+
+    // Lateral Components: 1D Interval Overlap
+    for (int k = 0; k < 3; k++) {
+      if (k == axis) continue;    // Skip the normal axis
+
+      double minRef = -shapeRef[k];
+      double maxRef = shapeRef[k];
+
+      double radInc = 0.0;
+      for (int j = 0; j < 3; j++) {
+        double val = (best_axis < 3) ? AbsC[k][j] : AbsC[j][k];
+        radInc += shapeInc[j] * val;
+      }
+      double centerInc = D_local_Ref[k];
+
+      double minInc = centerInc - radInc;
+      double maxInc = centerInc + radInc;
+
+      // Find intersection of intervals [minRef, maxRef] and [minInc, maxInc]
+      double start = (minRef > minInc) ? minRef : minInc;
+      double end = (maxRef < maxInc) ? maxRef : maxInc;
+      seed_local[k] = 0.5 * (start + end);    // Midpoint of overlap
+    }
+
+    // Transform Local Seed -> World Space
+    MathExtra::matvec(RRef, seed_local, contact_point);
+    for (int k = 0; k < 3; k++) contact_point[k] += posRef[k];
+  } else {
+    // Edge-to-edge contact logic: Midpoint of the closest points on the two skew edge lines.
+    // The logic is that index 6 corresponds to A_0 x B_0, 7 to A_0 x B_1, ..., 14 to A_2 x B_2
+    int edgeA_idx = (best_axis - 6) / 3;
+    int edgeB_idx = (best_axis - 6) % 3;
+
+    // Get World directions of the edges
+    double u[3] = {R1[0][edgeA_idx], R1[1][edgeA_idx], R1[2][edgeA_idx]};
+    double v[3] = {R2[0][edgeB_idx], R2[1][edgeB_idx], R2[2][edgeB_idx]};
+
+    // Identify the specific edges by checking the normal direction
+    // The normal N is roughly the distance vector center_distance for the closest edges
+    double N_loc1[3], N_loc2[3];
+    MathExtra::transpose_matvec(R1, center_distance, N_loc1);
+    MathExtra::transpose_matvec(R2, center_distance, N_loc2);
+
+    // Find Center of Edge A in World Space
+    double midA[3];
+    for (int k = 0; k < 3; k++) midA[k] = xc1[k];
+    for (int k = 0; k < 3; k++) {
+      if (k == edgeA_idx) continue;
+      // Move to the face pointing towards B
+      double sign = (N_loc1[k] > 0) ? 1.0 : -1.0;
+      double offset = sign * shape1[k];
+      midA[0] += R1[0][k] * offset;
+      midA[1] += R1[1][k] * offset;
+      midA[2] += R1[2][k] * offset;
+    }
+
+    // Find Center of Edge B in World Space
+    double midB[3];
+    for (int k = 0; k < 3; k++) midB[k] = xc2[k];
+    for (int k = 0; k < 3; k++) {
+      if (k == edgeB_idx) continue;
+      // Move to the face pointing towards A (center_distance is A->B, so we check -N_loc2)
+      double sign = (N_loc2[k] < 0) ? 1.0 : -1.0;
+      double offset = sign * shape2[k];
+      midB[0] += R2[0][k] * offset;
+      midB[1] += R2[1][k] * offset;
+      midB[2] += R2[2][k] * offset;
+    }
+
+    // Closest Points on Two Skew Lines
+    // Line1 parameterized by s: P_A = midA + s*u
+    // Line2 parameterized by t: P_B = midB + t*v
+    double r[3] = {midB[0] - midA[0], midB[1] - midA[1], midB[2] - midA[2]};
+    double u_dot_v = u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
+    double u_dot_r = u[0] * r[0] + u[1] * r[1] + u[2] * r[2];
+    double v_dot_r = v[0] * r[0] + v[1] * r[1] + v[2] * r[2];
+
+    // Denom is 1 - (u.v)^2 because u and v are unit vectors; eps guards against parallel edges
+    double denom = 1.0 - u_dot_v * u_dot_v + eps;
+    double s = (u_dot_r - u_dot_v * v_dot_r) / denom;
+    double t = (u_dot_v * u_dot_r - v_dot_r) / denom;    // follows from t = s * (u.v) - v.r
+
+    // Compute World Points
+    double PA[3] = {midA[0] + s * u[0], midA[1] + s * u[1], midA[2] + s * u[2]};
+    double PB[3] = {midB[0] + t * v[0], midB[1] + t * v[1], midB[2] + t * v[2]};
+
+    // Seed is the midpoint
+    for (int k = 0; k < 3; k++) contact_point[k] = 0.5 * (PA[k] + PB[k]);
+  }
+
+  return true;    // Collision confirmed
+}
+
+inline int MathExtraSuperellipsoids::determine_contact_point_wall(
+    const double *xci, const double Ri[3][3], const double *shapei, const double *blocki,
+    const int flagi, const double *x_wall, const double *n_wall, double *X0, double *nij,
+    double *overlap)
+{
+  // x_wall is a point on the wall. TODO: is this actually stored somewhere?
+  // n_wall is the wall normal pointing from wall to particle in the global frame
+  // We might have to change the fix wall gran files to achieve contact with the wall.
+  // This function is implemented, but it might not be used.
+  // Unlike for particle-particle contacts, here we directly get the overlap value.
+
+  double n_local[3];
+  // Transform wall normal into local frame
+  // If n_wall points from Wall->Particle, we want surface normal -n_wall.
+  double n_search[3] = {-n_wall[0], -n_wall[1], -n_wall[2]};
+  MathExtra::transpose_matvec(Ri, n_search, n_local);
+
+  double nx = n_local[0], ny = n_local[1], nz = n_local[2];
+  double a = shapei[0], b = shapei[1], c = shapei[2];
+  double X0_local[3];
+
+  // Calculate Deepest Point
+  if (flagi == 0) {
+    // Ellipsoid
+    double norm = std::sqrt(a * a * nx * nx + b * b * ny * ny + c * c * nz * nz);
+    double inv_norm = (norm > 1e-14) ? 1.0 / norm : 0.0;
+
+    X0_local[0] = a * a * nx * inv_norm;
+    X0_local[1] = b * b * ny * inv_norm;
+    X0_local[2] = c * c * nz * inv_norm;
+  } else {
+    // General Superellipsoid
+    double nx_abs = std::fabs(nx);
+    double ny_abs = std::fabs(ny);
+    double nz_abs = std::fabs(nz);
+    double n1 = blocki[0];
+    double n2 = blocki[1];
+
+    double x, y, z;
+
+    if (nx_abs < 1e-14 && ny_abs < 1e-14) {
+      x = 0.0;
+      y = 0.0;
+      z = c * ((nz > 0) ? 1.0 : -1.0);
+    } else {
+      double p2 = 1.0 / (n2 - 1.0);
+      double p1 = 1.0 / (n1 - 1.0);
+
+      if (nx_abs > ny_abs) {
+        double alpha = std::pow((b * ny_abs) / (a * nx_abs), p2);
+        double gamma = std::pow(1.0 + std::pow(alpha, n2), n1 / n2 - 1.0);
+        double beta = std::pow((c * nz_abs) / (a * nx_abs) * gamma, p1);
+
+        double den =
+            std::pow(std::pow(1.0 + std::pow(alpha, n2), n1 / n2) + std::pow(beta, n1), 1.0 / n1);
+        x = 1.0 / den;
+        y = alpha * x;
+        z = beta * x;
+      } else {
+        double alpha = std::pow((a * nx_abs) / (b * ny_abs), p2);
+        double gamma = std::pow(1.0 + std::pow(alpha, n2), n1 / n2 - 1.0);
+        double beta = std::pow((c * nz_abs) / (b * ny_abs) * gamma, p1);
+
+        double den =
+            std::pow(std::pow(1.0 + std::pow(alpha, n2), n1 / n2) + std::pow(beta, n1), 1.0 / n1);
+        y = 1.0 / den;
+        x = alpha * y;
+        z = beta * y;
+      }
+
+      x *= a;
+      y *= b;
+      z *= c;
+
+      if (n_local[0] < 0) x = -x;
+      if (n_local[1] < 0) y = -y;
+      if (n_local[2] < 0) z = -z;
+    }
+    X0_local[0] = x;
+    X0_local[1] = y;
+    X0_local[2] = z;
+  }
+
+  // Transform to Global Frame
+  MathExtra::matvec(Ri, X0_local, X0);
+  for (int k = 0; k < 3; k++) X0[k] += xci[k];    // Translate to Global Position
+
+  // Set Contact Normal (Always wall normal for plane contacts)
+  nij[0] = n_wall[0];
+  nij[1] = n_wall[1];
+  nij[2] = n_wall[2];
+
+  // Check Overlap
+  double dx = X0[0] - x_wall[0];
+  double dy = X0[1] - x_wall[1];
+  double dz = X0[2] - x_wall[2];
+
+  // Project onto Wall Normal, if dist < 0, the point is "behind" the wall face.
+  double dist = dx * n_wall[0] + dy * n_wall[1] + dz * n_wall[2];
+
+  if (dist < 0.0) {
+    *overlap = -dist;    // Store positive overlap value
+    return 0;            // contact
+  }
+
+  *overlap = 0.0;
+  return 1;    // no contact
+}
+
+#endif
diff --git a/src/ASPHERE/pair_granular_superellipsoid.cpp b/src/ASPHERE/pair_granular_superellipsoid.cpp
new file mode 100644
index 00000000000..15b2841e159
--- /dev/null
+++ b/src/ASPHERE/pair_granular_superellipsoid.cpp
@@ -0,0 +1,1325 @@
+/* ----------------------------------------------------------------------
+   LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
+   https://www.lammps.org/, Sandia National Laboratories
+   LAMMPS development team: developers@lammps.org
+
+   Copyright (2003) Sandia Corporation.  Under the terms of Contract
+   DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
+   certain rights in this software.  This software is distributed under
+   the GNU General Public License.
+
+   See the README file in the top-level LAMMPS directory.
+------------------------------------------------------------------------- */
+/* ----------------------------------------------------------------------
+   Contributing authors: Jacopo Bilotto (EPFL), Jibril B. Coulibaly
+------------------------------------------------------------------------- */
+
+#include "pair_granular_superellipsoid.h"
+
+#include "atom.h"
+#include "atom_vec_ellipsoid.h"
+#include "comm.h"
+#include "error.h"
+#include "fix.h"
+#include "fix_dummy.h"
+#include "fix_neigh_history.h"
+#include "force.h"
+#include "math_extra.h"
+#include "math_extra_superellipsoids.h"
+#include "memory.h"
+#include "modify.h"
+#include "neigh_list.h"
+#include "neighbor.h"
+#include "update.h"
+
+#include <cmath>
+#include <cstring>
+#include <iostream>
+
+using namespace LAMMPS_NS;
+using namespace MathExtra;
+
+enum { HOOKE, HERTZ };
+enum { MASS_VELOCITY, VISCOELASTIC };
+enum { CLASSIC, LINEAR_HISTORY };
+
+static constexpr int NUMSTEP_INITIAL_GUESS = 5;
+static constexpr double EPSILON = 1e-10;
+static constexpr double MIN_RADIUS_RATIO = 1e-4;
+static constexpr double MIN_CURVATURE = 1e-12;
+
+/* ---------------------------------------------------------------------- */
+
+PairGranularSuperellipsoid::PairGranularSuperellipsoid(LAMMPS *lmp) : Pair(lmp)
+{
+  single_enable = 1;
+  no_virial_fdotr_compute = 1;
+  centroidstressflag = CENTROID_NOTAVAIL;
+  finitecutflag = 1;
+
+  single_extra = 17;
+  svector = new double[single_extra];
+
+  // Currently only option, generalize if more added
+
+  neighprev = 0;
+  nmax = 0;
+  mass_rigid = nullptr;
+
+  onerad_dynamic = nullptr;
+  onerad_frozen = nullptr;
+  maxrad_dynamic = nullptr;
+  maxrad_frozen = nullptr;
+
+  cutoff_type = nullptr;
+
+  limit_damping = nullptr;
+  normal_model = nullptr;
+  damping_model = nullptr;
+  tangential_model = nullptr;
+
+  kn = nullptr;
+  gamman = nullptr;
+  kt = nullptr;
+  xt = nullptr;
+  xmu = nullptr;
+
+  // set comm size needed by this Pair if used with fix rigid
+
+  comm_forward = 1;
+
+  default_hist_size = 5;
+  size_history = default_hist_size;    // default of 5 values, x0[4] and separating axis
+
+  beyond_contact = 0;
+  nondefault_history_transfer = 1;
+  heat_flag = 0;
+
+  // create dummy fix as placeholder for FixNeighHistory
+  // this is so final order of Modify::fix will conform to input script
+
+  fix_history = nullptr;
+  fix_dummy =
+      dynamic_cast<FixDummy *>(modify->add_fix("NEIGH_HISTORY_GRANULAR_SE_DUMMY all DUMMY"));
+
+  contact_formulation = MathExtraSuperellipsoids::FORMULATION_ALGEBRAIC;
+}
+
+/* ---------------------------------------------------------------------- */
+
+PairGranularSuperellipsoid::~PairGranularSuperellipsoid()
+{
+  delete[] svector;
+
+  if (!fix_history)
+    modify->delete_fix("NEIGH_HISTORY_GRANULAR_SE_DUMMY");
+  else
+    modify->delete_fix("NEIGH_HISTORY_GRANULAR_SE");
+
+  if (allocated) {
+    memory->destroy(setflag);
+    memory->destroy(cutsq);
+    memory->destroy(cutoff_type);
+    memory->destroy(limit_damping);
+    memory->destroy(normal_model);
+    memory->destroy(damping_model);
+    memory->destroy(tangential_model);
+    memory->destroy(kn);
+    memory->destroy(gamman);
+    memory->destroy(kt);
+    memory->destroy(xt);
+    memory->destroy(xmu);
+
+    // model variables
+
+    delete[] onerad_dynamic;
+    delete[] onerad_frozen;
+    delete[] maxrad_dynamic;
+    delete[] maxrad_frozen;
+  }
+
+  memory->destroy(mass_rigid);
+}
+
+/* ---------------------------------------------------------------------- */
+
+void PairGranularSuperellipsoid::compute(int eflag, int vflag)
+{
+  int i, j, k, ii, jj, inum, jnum;
+  double factor_lj, mi, mj;
+
+  int *ilist, *jlist, *numneigh, **firstneigh;
+  int *touch, **firsttouch;
+  double *history, *allhistory, **firsthistory;
+
+  bool touchflag = false;
+  history_update = update->setupflag == 0;
+
+  ev_init(eflag, vflag);
+
+  // update rigid body info for owned & ghost atoms if using FixRigid masses
+  // body[i] = which body atom I is in, -1 if none
+  // mass_body = mass of each rigid body
+
+  if (fix_rigid && neighbor->ago == 0) {
+    int tmp;
+    int *body = (int *) fix_rigid->extract("body", tmp);
+    auto *mass_body = (double *) fix_rigid->extract("masstotal", tmp);
+    if (atom->nmax > nmax) {
+      memory->destroy(mass_rigid);
+      nmax = atom->nmax;
+      memory->create(mass_rigid, nmax, "pair:mass_rigid");
+    }
+    int nlocal = atom->nlocal;
+    for (i = 0; i < nlocal; i++)
+      if (body[i] >= 0)
+        mass_rigid[i] = mass_body[body[i]];
+      else
+        mass_rigid[i] = 0.0;
+    comm->forward_comm(this);
+  }
+
+  tagint *tag = atom->tag;
+  int *type = atom->type;
+  double **x = atom->x;
+  double **v = atom->v;
+  double **f = atom->f;
+  double **angmom = atom->angmom;
+  double **torque = atom->torque;
+  double *radius = atom->radius;
+  double *rmass = atom->rmass;
+
+  int *mask = atom->mask;
+  int nlocal = atom->nlocal;
+  int newton_pair = force->newton_pair;
+  double *special_lj = force->special_lj;
+
+  auto avec_ellipsoid = dynamic_cast<AtomVecEllipsoid *>(atom->style_match("ellipsoid"));
+  AtomVecEllipsoid::BonusSuper *bonus = avec_ellipsoid->bonus_super;
+  int *ellipsoid = atom->ellipsoid;
+
+  inum = list->inum;
+  ilist = list->ilist;
+  numneigh = list->numneigh;
+  firstneigh = list->firstneigh;
+  firsttouch = fix_history->firstflag;
+  firsthistory = fix_history->firstvalue;
+
+  // loop over neighbors of my atoms
+
+  for (ii = 0; ii < inum; ii++) {
+    i = ilist[ii];
+    itype = type[i];
+
+    touch = firsttouch[i];
+    allhistory = firsthistory[i];
+    jlist = firstneigh[i];
+    jnum = numneigh[i];
+
+    for (jj = 0; jj < jnum; jj++) {
+      j = jlist[jj];
+      factor_lj = special_lj[sbmask(j)];
+      j &= NEIGHMASK;
+
+      if (factor_lj == 0) continue;
+
+      jtype = type[j];
+
+      // Reset model and copy initial geometric data
+
+      xi = x[i];
+      xj = x[j];
+      radi = radius[i];
+      radj = radius[j];
+      history_data = &allhistory[size_history * jj];
+      xref = (tag[i] < tag[j]) ? xi : xj;
+      tagi = tag[i];
+      tagj = tag[j];
+      flagi = bonus[ellipsoid[i]].type;
+      flagj = bonus[ellipsoid[j]].type;
+
+      radsum = radi + radj;
+      sub3(xi, xj, dx);
+      rsq = dot3(dx, dx);
+
+      MathExtra::copy3(bonus[ellipsoid[i]].shape, shapei0);
+      MathExtra::copy3(bonus[ellipsoid[j]].shape, shapej0);
+      MathExtra::copy3(bonus[ellipsoid[i]].block, blocki0);
+      MathExtra::copy3(bonus[ellipsoid[j]].block, blockj0);
+      MathExtra::copy3(bonus[ellipsoid[i]].shape, shapei);
+      MathExtra::copy3(bonus[ellipsoid[j]].shape, shapej);
+      MathExtra::copy3(bonus[ellipsoid[i]].block, blocki);
+      MathExtra::copy3(bonus[ellipsoid[j]].block, blockj);
+      MathExtra::quat_to_mat(bonus[ellipsoid[i]].quat, Ri);
+      MathExtra::quat_to_mat(bonus[ellipsoid[j]].quat, Rj);
+
+      touchjj = touch[jj];
+
+      touchflag = check_contact();
+
+      if (!touchflag) {
+        // unset non-touching neighbors
+        touch[jj] = 0;
+        history = &allhistory[size_history * jj];
+        for (k = 0; k < size_history; k++) {
+          if (bounding_box && k == 4) continue;    // Do not delete cached axis information
+          history[k] = 0.0;
+        }
+        continue;
+      }
+
+      touch[jj] = 1;
+
+      // meff = effective mass of pair of particles
+      // if I or J part of rigid body, use body mass
+      // if I or J is frozen, meff is other particle
+      mi = rmass[i];
+      mj = rmass[j];
+      if (fix_rigid) {
+        if (mass_rigid[i] > 0.0) mi = mass_rigid[i];
+        if (mass_rigid[j] > 0.0) mj = mass_rigid[j];
+      }
+      meff = mi * mj / (mi + mj);
+      if (mask[i] & freeze_group_bit) meff = mj;
+      if (mask[j] & freeze_group_bit) meff = mi;
+
+      // Copy additional information and prepare force calculations
+
+      vi = v[i];
+      vj = v[j];
+      angmomi = angmom[i];
+      angmomj = angmom[j];
+      quati = bonus[ellipsoid[i]].quat;
+      quatj = bonus[ellipsoid[j]].quat;
+      inertiai = bonus[ellipsoid[i]].inertia;
+      inertiaj = bonus[ellipsoid[j]].inertia;
+
+      calculate_forces();
+
+      // apply forces & torques
+      scale3(factor_lj, forces);
+      add3(f[i], forces, f[i]);
+
+      scale3(factor_lj, torquesi);
+      add3(torque[i], torquesi, torque[i]);
+
+      if (force->newton_pair || j < nlocal) {
+        sub3(f[j], forces, f[j]);
+        scale3(factor_lj, torquesj);
+        add3(torque[j], torquesj, torque[j]);
+      }
+
+      if (evflag)
+        ev_tally_xyz(i, j, nlocal, force->newton_pair, 0.0, 0.0, forces[0], forces[1], forces[2],
+                     dx[0], dx[1], dx[2]);    // Correct even for non-spherical particles
+    }
+  }
+
+  if (vflag_fdotr) virial_fdotr_compute();
+}
+
+/* ----------------------------------------------------------------------
+   allocate all arrays
+------------------------------------------------------------------------- */
+
+void PairGranularSuperellipsoid::allocate()
+{
+  allocated = 1;
+  int n = atom->ntypes;
+
+  memory->create(setflag, n + 1, n + 1, "pair:setflag");
+  for (int i = 1; i <= n; i++)
+    for (int j = i; j <= n; j++) setflag[i][j] = 0;
+
+  memory->create(cutsq, n + 1, n + 1, "pair:cutsq");
+  memory->create(cutoff_type, n + 1, n + 1, "pair:cutoff_type");
+
+  memory->create(limit_damping, n + 1, n + 1, "pair:limit_damping");
+  memory->create(normal_model, n + 1, n + 1, "pair:normal_model");
+  memory->create(damping_model, n + 1, n + 1, "pair:damping_model");
+  memory->create(tangential_model, n + 1, n + 1, "pair:tangential_model");
+
+  memory->create(kn, n + 1, n + 1, "pair:kn");
+  memory->create(gamman, n + 1, n + 1, "pair:gamman");
+  memory->create(kt, n + 1, n + 1, "pair:kt");
+  memory->create(xt, n + 1, n + 1, "pair:xt");
+  memory->create(xmu, n + 1, n + 1, "pair:xmu");
+
+  onerad_dynamic = new double[n + 1];
+  onerad_frozen = new double[n + 1];
+  maxrad_dynamic = new double[n + 1];
+  maxrad_frozen = new double[n + 1];
+}
+
+/* ----------------------------------------------------------------------
+   global settings
+------------------------------------------------------------------------- */
+
+void PairGranularSuperellipsoid::settings(int narg, char **arg)
+{
+  cutoff_global = -1;    // default: will be set based on particle sizes, model choice
+  curvature_model = MathExtraSuperellipsoids::CURV_MEAN;
+  bounding_box = 1;
+
+  int iarg = 0;
+  while (iarg < narg) {
+    if (strcmp(arg[iarg], "no_bounding_box") == 0) {
+      bounding_box = 0;
+      iarg++;
+    } else if (strcmp(arg[iarg], "geometric") == 0) {
+      contact_formulation = MathExtraSuperellipsoids::FORMULATION_GEOMETRIC;
+      iarg++;
+    } else if (strcmp(arg[iarg], "curvature_gaussian") == 0) {
+      curvature_model = MathExtraSuperellipsoids::CURV_GAUSSIAN;
+      iarg++;
+    } else if (iarg == 0) {
+      // if it is the first argument and not a keyword, assume it is a cutoff
+      cutoff_global = utils::numeric(FLERR, arg[iarg], false, lmp);
+      iarg++;
+    } else
+      error->all(FLERR, "Illegal pair_style command");
+  }
+
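+  // history holds x0[4], plus the cached separating axis when bounding boxes are used;
+  // assign instead of decrementing so that repeated settings() calls stay consistent
+  default_hist_size = bounding_box ? 5 : 4;
+  size_history = default_hist_size;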
+}
+
+/* ----------------------------------------------------------------------
+   set coeffs for one or more type pairs
+------------------------------------------------------------------------- */
+
+void PairGranularSuperellipsoid::coeff(int narg, char **arg)
+{
+  double cutoff_one = -1;
+
+  if (narg < 3) error->all(FLERR, "Incorrect args for pair coefficients" + utils::errorurl(21));
+
+  if (!allocated) allocate();
+
+  int ilo, ihi, jlo, jhi;
+  utils::bounds(FLERR, arg[0], 1, atom->ntypes, ilo, ihi, error);
+  utils::bounds(FLERR, arg[1], 1, atom->ntypes, jlo, jhi, error);
+
+  int normal_one = -1, damping_one = -1, tangential_one = -1, limit_one = 0;
+  double kn_one = 0.0, gamman_one = 0.0, kt_one = 0.0, xt_one = 0.0, xmu_one = 0.0;
+
+  int iarg = 2;
+  if (strcmp(arg[iarg], "hooke") == 0) {
+    normal_one = HOOKE;
+    if (iarg + 3 > narg) utils::missing_cmd_args(FLERR, "pair granular/superellipsoid", error);
+    kn_one = utils::numeric(FLERR, arg[iarg + 1], false, lmp);
+    gamman_one = utils::numeric(FLERR, arg[iarg + 2], false, lmp);
+    if (kn_one < 0.0 || gamman_one < 0.0) error->all(FLERR, "Illegal linear normal model");
+    iarg += 3;
+  } else if (strcmp(arg[iarg], "hertz") == 0) {
+    normal_one = HERTZ;
+    if (iarg + 3 > narg) utils::missing_cmd_args(FLERR, "pair granular/superellipsoid", error);
+    kn_one = utils::numeric(FLERR, arg[iarg + 1], false, lmp);
+    gamman_one = utils::numeric(FLERR, arg[iarg + 2], false, lmp);
+    if (kn_one < 0.0 || gamman_one < 0.0) error->all(FLERR, "Illegal Hertz normal model");
+    iarg += 3;
+  } else {
+    error->all(FLERR, "Unknown normal model {}", arg[iarg]);
+  }
+
+  damping_one = -1;
+
+  // Parse optional arguments
+  while (iarg < narg) {
+    if (strcmp(arg[iarg], "tangential") == 0) {
+      if (++iarg >= narg) utils::missing_cmd_args(FLERR, "pair granular/superellipsoid", error);
+      if (strcmp(arg[iarg], "linear_history") == 0) {
+        tangential_one = LINEAR_HISTORY;
+        if (iarg + 4 > narg) utils::missing_cmd_args(FLERR, "pair granular/superellipsoid", error);
+        kt_one = utils::numeric(FLERR, arg[iarg + 1], false, lmp);
+        xt_one = utils::numeric(FLERR, arg[iarg + 2], false, lmp);
+        xmu_one = utils::numeric(FLERR, arg[iarg + 3], false, lmp);
+        if (kt_one < 0.0 || xt_one < 0.0 || xmu_one < 0.0)
+          error->all(FLERR, "Illegal linear tangential model");
+        iarg += 4;
+      } else if (strcmp(arg[iarg], "classic") == 0) {
+        tangential_one = CLASSIC;
+        if (iarg + 4 > narg) utils::missing_cmd_args(FLERR, "pair granular/superellipsoid", error);
+        kt_one = utils::numeric(FLERR, arg[iarg + 1], false, lmp);
+        xt_one = utils::numeric(FLERR, arg[iarg + 2], false, lmp);
+        xmu_one = utils::numeric(FLERR, arg[iarg + 3], false, lmp);
+        if (kt_one < 0.0 || xt_one < 0.0 || xmu_one < 0.0)
+          error->all(FLERR, "Illegal classic tangential model");
+        iarg += 4;
+      } else {
+        error->all(FLERR, "Unknown tangential model {}", arg[iarg]);
+      }
+    } else if (strcmp(arg[iarg], "damping") == 0) {
+      if (++iarg >= narg) utils::missing_cmd_args(FLERR, "pair granular/superellipsoid", error);
+      if (strcmp(arg[iarg], "mass_velocity") == 0) {
+        damping_one = MASS_VELOCITY;
+        iarg += 1;
+      } else if (strcmp(arg[iarg], "viscoelastic") == 0) {
+        damping_one = VISCOELASTIC;
+        iarg += 1;
+      } else {
+        error->all(FLERR, "Unknown damping model {}", arg[iarg]);
+      }
+    } else if (strcmp(arg[iarg], "rolling") == 0) {
+      error->all(FLERR, "Rolling models not yet implemented for superellipsoids");
+    } else if (strcmp(arg[iarg], "twisting") == 0) {
+      error->all(FLERR, "Twisting models not yet implemented for superellipsoids");
+    } else if (strcmp(arg[iarg], "heat") == 0) {
+      error->all(FLERR, "Heat models not yet implemented for superellipsoids");
+      heat_flag = 1;
+    } else if (strcmp(arg[iarg], "cutoff") == 0) {
+      if (iarg + 1 >= narg)
+        error->all(FLERR, "Illegal pair_coeff command, not enough parameters for cutoff keyword");
+      cutoff_one = utils::numeric(FLERR, arg[iarg + 1], false, lmp);
+      iarg += 2;
+    } else if (strcmp(arg[iarg], "limit_damping") == 0) {
+      limit_one = 1;
+      iarg += 1;
+    } else
+      error->all(FLERR, "Illegal pair_coeff command {}", arg[iarg]);
+  }
+
+  // Define default damping sub model if unspecified, has no coeffs
+  if (damping_one == -1) damping_one = VISCOELASTIC;
+
+  // granular model init
+  contact_radius_flag = 0;
+  if (normal_one == HERTZ || damping_one == VISCOELASTIC) contact_radius_flag = 1;
+
+  int count = 0;
+  for (int i = ilo; i <= ihi; i++) {
+    for (int j = MAX(jlo, i); j <= jhi; j++) {
+      cutoff_type[i][j] = cutoff_type[j][i] = cutoff_one;
+      limit_damping[i][j] = limit_damping[j][i] = limit_one;
+
+      normal_model[i][j] = normal_model[j][i] = normal_one;
+      damping_model[i][j] = damping_model[j][i] = damping_one;
+      tangential_model[i][j] = tangential_model[j][i] = tangential_one;
+
+      kn[i][j] = kn[j][i] = kn_one;
+      gamman[i][j] = gamman[j][i] = gamman_one;
+
+      kt[i][j] = kt[j][i] = kt_one;
+      xt[i][j] = xt[j][i] = xt_one;
+      xmu[i][j] = xmu[j][i] = xmu_one;
+
+      setflag[i][j] = 1;
+      count++;
+    }
+  }
+
+  if (count == 0) error->all(FLERR, "Incorrect args for pair coefficients" + utils::errorurl(21));
+}
+
+/* ----------------------------------------------------------------------
+   init specific to this pair style
+------------------------------------------------------------------------- */
+
+void PairGranularSuperellipsoid::init_style()
+{
+  int i;
+
+  // error and warning checks
+
+  if (!atom->radius_flag || !atom->rmass_flag || !atom->angmom_flag || !atom->superellipsoid_flag)
+    error->all(FLERR,
+               "Pair granular/superellipsoid requires atom attributes radius, rmass, "
+               "angmom and superellipsoid flag");
+  if (comm->ghost_velocity == 0)
+    error->all(FLERR, "Pair granular/superellipsoid requires ghost atoms store velocity");
+
+  if (heat_flag) {
+    if (!atom->temperature_flag)
+      error->all(FLERR,
+                 "Heat conduction in pair granular/superellipsoid requires atom style with "
+                 "temperature property");
+    if (!atom->heatflow_flag)
+      error->all(FLERR,
+                 "Heat conduction in pair granular/superellipsoid requires atom style with "
+                 "heatflow property");
+  }
+
+  for (i = 0; i < atom->nlocal; i++)
+    if (atom->ellipsoid[i] < 0)
+      error->one(FLERR, "Pair granular/superellipsoid requires all atoms are ellipsoids");
+
+  // need a granular neighbor list
+
+  neighbor->add_request(this, NeighConst::REQ_SIZE | NeighConst::REQ_HISTORY);
+
+  dt = update->dt;
+
+  // grow history for contact models, right now this is superfluous and is just a placeholder
+
+  int size_history_tangential = 0;
+  for (int itype = 1; itype <= atom->ntypes; itype++)
+    for (int jtype = 1; jtype <= atom->ntypes; jtype++)
+      if (tangential_model[itype][jtype] == CLASSIC ||
+          tangential_model[itype][jtype] == LINEAR_HISTORY)
+        size_history_tangential = 3;
+  size_history = default_hist_size + size_history_tangential;
+
+  // if history is stored and first init, create Fix to store history
+  // it replaces FixDummy, created in the constructor
+  // this is so its order in the fix list is preserved
+
+  if (fix_history == nullptr) {
+    fix_history =
+        dynamic_cast<FixNeighHistory *>(modify->replace_fix("NEIGH_HISTORY_GRANULAR_SE_DUMMY",
+                                                            "NEIGH_HISTORY_GRANULAR_SE"
+                                                            " all NEIGH_HISTORY " +
+                                                                std::to_string(size_history),
+                                                            1));
+    fix_history->pair = this;
+  } else {
+    fix_history =
+        dynamic_cast<FixNeighHistory *>(modify->get_fix_by_id("NEIGH_HISTORY_GRANULAR_SE"));
+    if (!fix_history) error->all(FLERR, "Could not find pair fix neigh history ID");
+  }
+
+  // check for FixFreeze and set freeze_group_bit
+
+  auto fixlist = modify->get_fix_by_style("^freeze");
+  if (fixlist.size() == 0)
+    freeze_group_bit = 0;
+  else if (fixlist.size() > 1)
+    error->all(FLERR, "Only one fix freeze command at a time allowed");
+  else
+    freeze_group_bit = fixlist.front()->groupbit;
+
+  // check for FixRigid so can extract rigid body masses
+
+  fix_rigid = nullptr;
+  for (const auto &ifix : modify->get_fix_list()) {
+    if (ifix->rigid_flag) {
+      if (fix_rigid)
+        error->all(FLERR, "Only one fix rigid command at a time allowed");
+      else
+        fix_rigid = ifix;
+    }
+  }
+
+  // check for FixPour and FixDeposit so can extract particle radii
+
+  auto pours = modify->get_fix_by_style("^pour");
+  auto deps = modify->get_fix_by_style("^deposit");
+
+  // set maxrad_dynamic and maxrad_frozen for each type
+  // include future FixPour and FixDeposit particles as dynamic
+
+  int itype;
+  for (int i = 1; i <= atom->ntypes; i++) {
+    onerad_dynamic[i] = onerad_frozen[i] = 0.0;
+    for (auto &ipour : pours) {
+      itype = i;
+      double maxrad = *((double *) ipour->extract("radius", itype));
+      if (maxrad > 0.0) onerad_dynamic[i] = maxrad;
+    }
+    for (auto &idep : deps) {
+      itype = i;
+      double maxrad = *((double *) idep->extract("radius", itype));
+      if (maxrad > 0.0) onerad_dynamic[i] = maxrad;
+    }
+  }
+
+  double *radius = atom->radius;
+  int *mask = atom->mask;
+  int *type = atom->type;
+  int nlocal = atom->nlocal;
+
+  for (int i = 0; i < nlocal; i++) {
+    if (mask[i] & freeze_group_bit)
+      onerad_frozen[type[i]] = MAX(onerad_frozen[type[i]], radius[i]);
+    else
+      onerad_dynamic[type[i]] = MAX(onerad_dynamic[type[i]], radius[i]);
+  }
+
+  MPI_Allreduce(&onerad_dynamic[1], &maxrad_dynamic[1], atom->ntypes, MPI_DOUBLE, MPI_MAX, world);
+  MPI_Allreduce(&onerad_frozen[1], &maxrad_frozen[1], atom->ntypes, MPI_DOUBLE, MPI_MAX, world);
+}
+
+/* ----------------------------------------------------------------------
+   init for one type pair i,j and corresponding j,i
+------------------------------------------------------------------------- */
+
+double PairGranularSuperellipsoid::init_one(int i, int j)
+{
+  double cutoff = 0.0;
+
+  if (setflag[i][j] == 0) {
+
+    limit_damping[i][j] = MAX(limit_damping[i][i], limit_damping[j][j]);
+
+    if (normal_model[i][i] != normal_model[j][j] ||
+        tangential_model[i][i] != tangential_model[j][j] ||
+        damping_model[i][i] != damping_model[j][j])
+      error->all(FLERR,
+                 "Granular pair style functional forms are different, "
+                 "cannot mix coefficients for types {} and {}.\n"
+                 "This combination must be set explicitly via a "
+                 "pair_coeff command",
+                 i, j);
+
+    kn[i][j] = mix_geom(kn[i][i], kn[j][j]);
+    gamman[i][j] = mix_geom(gamman[i][i], gamman[j][j]);
+    kt[i][j] = mix_geom(kt[i][i], kt[j][j]);
+    xt[i][j] = mix_geom(xt[i][i], xt[j][j]);
+    xmu[i][j] = mix_geom(xmu[i][i], xmu[j][j]);
+
+    cutoff_type[i][j] = cutoff_type[j][i] = MAX(cutoff_type[i][i], cutoff_type[j][j]);
+  }
+
+  // It is possible that cut[i][j] at this point is still 0.0.
+  // This can happen when there is a future fix_pour after the current run.
+  // A cut[i][j] = 0.0 creates problems because neighbor.cpp uses
+  // min(cut[i][j]) to decide on the bin size.
+  // To avoid this issue, for cases involving cut[i][j] = 0.0 (possible only
+  // if there is no current information about radius/cutoff of type i and j),
+  // we assign cutoff = max(cut[i][j]) for i,j such that cut[i][j] > 0.0.
+
+  if (cutoff_type[i][j] < 0 && cutoff_global < 0) {
+    if (((maxrad_dynamic[i] > 0.0) && (maxrad_dynamic[j] > 0.0)) ||
+        ((maxrad_dynamic[i] > 0.0) && (maxrad_frozen[j] > 0.0)) ||
+        // radius info about both i and j exist
+        ((maxrad_frozen[i] > 0.0) && (maxrad_dynamic[j] > 0.0))) {
+      cutoff = maxrad_dynamic[i] + maxrad_dynamic[j];
+      cutoff = MAX(cutoff, maxrad_dynamic[i] + maxrad_frozen[j]);
+      cutoff = MAX(cutoff, maxrad_frozen[i] + maxrad_dynamic[j]);
+    } else {
+      // radius info about either i or j does not exist
+      // (i.e. not present and not about to get poured;
+      // set to largest value to not interfere with neighbor list)
+
+      double cutmax = 0.0;
+      for (int k = 1; k <= atom->ntypes; k++) {
+        cutmax = MAX(cutmax, 2.0 * maxrad_dynamic[k]);
+        cutmax = MAX(cutmax, 2.0 * maxrad_frozen[k]);
+      }
+      cutoff = cutmax;
+    }
+  } else if (cutoff_type[i][j] > 0) {
+    cutoff = cutoff_type[i][j];
+  } else if (cutoff_global > 0) {
+    cutoff = cutoff_global;
+  }
+
+  dt = update->dt;
+  return cutoff;
+}
+
+/* ----------------------------------------------------------------------
+  proc 0 writes to restart file
+------------------------------------------------------------------------- */
+
+void PairGranularSuperellipsoid::write_restart(FILE *fp)
+{
+  int i, j;
+  for (i = 1; i <= atom->ntypes; i++) {
+    for (j = i; j <= atom->ntypes; j++) {
+      fwrite(&setflag[i][j], sizeof(int), 1, fp);
+      if (setflag[i][j]) {
+        fwrite(&cutoff_type[i][j], sizeof(double), 1, fp);
+        fwrite(&limit_damping[i][j], sizeof(int), 1, fp);
+        fwrite(&normal_model[i][j], sizeof(int), 1, fp);
+        fwrite(&tangential_model[i][j], sizeof(int), 1, fp);
+        fwrite(&damping_model[i][j], sizeof(int), 1, fp);
+
+        fwrite(&kn[i][j], sizeof(double), 1, fp);
+        fwrite(&gamman[i][j], sizeof(double), 1, fp);
+        fwrite(&kt[i][j], sizeof(double), 1, fp);
+        fwrite(&xt[i][j], sizeof(double), 1, fp);
+        fwrite(&xmu[i][j], sizeof(double), 1, fp);
+      }
+    }
+  }
+}
+
+/* ----------------------------------------------------------------------
+  proc 0 reads from restart file, bcasts
+------------------------------------------------------------------------- */
+
+void PairGranularSuperellipsoid::read_restart(FILE *fp)
+{
+  allocate();
+  int i, j;
+  int me = comm->me;
+  for (i = 1; i <= atom->ntypes; i++) {
+    for (j = i; j <= atom->ntypes; j++) {
+      if (me == 0) utils::sfread(FLERR, &setflag[i][j], sizeof(int), 1, fp, nullptr, error);
+      MPI_Bcast(&setflag[i][j], 1, MPI_INT, 0, world);
+      if (setflag[i][j]) {
+        if (me == 0) {
+          utils::sfread(FLERR, &cutoff_type[i][j], sizeof(double), 1, fp, nullptr, error);
+          utils::sfread(FLERR, &limit_damping[i][j], sizeof(int), 1, fp, nullptr, error);
+          utils::sfread(FLERR, &normal_model[i][j], sizeof(int), 1, fp, nullptr, error);
+          utils::sfread(FLERR, &tangential_model[i][j], sizeof(int), 1, fp, nullptr, error);
+          utils::sfread(FLERR, &damping_model[i][j], sizeof(int), 1, fp, nullptr, error);
+
+          utils::sfread(FLERR, &kn[i][j], sizeof(double), 1, fp, nullptr, error);
+          utils::sfread(FLERR, &gamman[i][j], sizeof(double), 1, fp, nullptr, error);
+          utils::sfread(FLERR, &kt[i][j], sizeof(double), 1, fp, nullptr, error);
+          utils::sfread(FLERR, &xt[i][j], sizeof(double), 1, fp, nullptr, error);
+          utils::sfread(FLERR, &xmu[i][j], sizeof(double), 1, fp, nullptr, error);
+        }
+        MPI_Bcast(&cutoff_type[i][j], 1, MPI_DOUBLE, 0, world);
+        MPI_Bcast(&limit_damping[i][j], 1, MPI_INT, 0, world);
+        MPI_Bcast(&normal_model[i][j], 1, MPI_INT, 0, world);
+        MPI_Bcast(&tangential_model[i][j], 1, MPI_INT, 0, world);
+        MPI_Bcast(&damping_model[i][j], 1, MPI_INT, 0, world);
+
+        MPI_Bcast(&kn[i][j], 1, MPI_DOUBLE, 0, world);
+        MPI_Bcast(&gamman[i][j], 1, MPI_DOUBLE, 0, world);
+        MPI_Bcast(&kt[i][j], 1, MPI_DOUBLE, 0, world);
+        MPI_Bcast(&xt[i][j], 1, MPI_DOUBLE, 0, world);
+        MPI_Bcast(&xmu[i][j], 1, MPI_DOUBLE, 0, world);
+      }
+    }
+  }
+}
+
+/* ---------------------------------------------------------------------- */
+
+void PairGranularSuperellipsoid::reset_dt()
+{
+  dt = update->dt;
+}
+
+/* ---------------------------------------------------------------------- */
+
+double PairGranularSuperellipsoid::single(int i, int j, int /*itype*/, int /*jtype*/, double rsq,
+                                          double /*factor_coul*/, double factor_lj, double &fforce)
+{
+  if (factor_lj == 0) {
+    fforce = 0.0;
+    for (int m = 0; m < single_extra; m++) svector[m] = 0.0;
+    return 0.0;
+  }
+
+  int nall = atom->nlocal + atom->nghost;
+  if ((i >= nall) || (j >= nall))
+    error->all(FLERR, "Not enough atoms for pair granular single function");
+
+  // Reset model and copy initial geometric data
+
+  double *allhistory;
+  int jnum = list->numneigh[i];
+  int *jlist = list->firstneigh[i];
+
+  if ((fix_history == nullptr) || (fix_history->firstvalue == nullptr))
+    error->one(FLERR, "Pair granular single computation needs history");
+  allhistory = fix_history->firstvalue[i];
+  for (int jj = 0; jj < jnum; jj++) {
+    neighprev++;
+    if (neighprev >= jnum) neighprev = 0;
+    if (jlist[neighprev] == j) break;
+  }
+  touchjj = fix_history->firstflag[i][neighprev];
+
+  xi = atom->x[i];
+  xj = atom->x[j];
+  radi = atom->radius[i];
+  radj = atom->radius[j];
+  itype = atom->type[i];
+  jtype = atom->type[j];
+  history_data = &allhistory[size_history * neighprev];
+  int indx_ref = (atom->tag[i] < atom->tag[j]) ? i : j;
+  xref = atom->x[indx_ref];
+  tagi = atom->tag[i];
+  tagj = atom->tag[j];
+  history_update = 0;    // Don't update history
+
+  auto avec_ellipsoid = dynamic_cast<AtomVecEllipsoid *>(atom->style_match("ellipsoid"));
+  AtomVecEllipsoid::BonusSuper *bonus = avec_ellipsoid->bonus_super;
+  int *ellipsoid = atom->ellipsoid;
+
+  flagi = bonus[ellipsoid[i]].type;
+  flagj = bonus[ellipsoid[j]].type;
+
+  MathExtra::copy3(bonus[ellipsoid[i]].shape, shapei0);
+  MathExtra::copy3(bonus[ellipsoid[j]].shape, shapej0);
+  MathExtra::copy3(bonus[ellipsoid[i]].block, blocki0);
+  MathExtra::copy3(bonus[ellipsoid[j]].block, blockj0);
+  MathExtra::copy3(bonus[ellipsoid[i]].shape, shapei);
+  MathExtra::copy3(bonus[ellipsoid[j]].shape, shapej);
+  MathExtra::copy3(bonus[ellipsoid[i]].block, blocki);
+  MathExtra::copy3(bonus[ellipsoid[j]].block, blockj);
+  MathExtra::quat_to_mat(bonus[ellipsoid[i]].quat, Ri);
+  MathExtra::quat_to_mat(bonus[ellipsoid[j]].quat, Rj);
+
+  int touchflag = check_contact();
+
+  if (!touchflag) {
+    fforce = 0.0;
+    for (int m = 0; m < single_extra; m++) svector[m] = 0.0;
+    return 0.0;
+  }
+
+  // meff = effective mass of pair of particles
+  // if I or J part of rigid body, use body mass
+  // if I or J is frozen, meff is other particle
+  double *rmass = atom->rmass;
+  int *mask = atom->mask;
+
+  double mi = rmass[i];
+  double mj = rmass[j];
+  if (fix_rigid) {
+    if (mass_rigid[i] > 0.0) mi = mass_rigid[i];
+    if (mass_rigid[j] > 0.0) mj = mass_rigid[j];
+  }
+  meff = mi * mj / (mi + mj);
+  if (mask[i] & freeze_group_bit) meff = mj;
+  if (mask[j] & freeze_group_bit) meff = mi;
+
+  // Copy additional information and calculate forces
+
+  vi = atom->v[i];
+  vj = atom->v[j];
+  angmomi = atom->angmom[i];
+  angmomj = atom->angmom[j];
+  quati = bonus[ellipsoid[i]].quat;
+  quatj = bonus[ellipsoid[j]].quat;
+  inertiai = bonus[ellipsoid[i]].inertia;
+  inertiaj = bonus[ellipsoid[j]].inertia;
+
+  calculate_forces();
+
+  // set single_extra quantities
+  svector[0] = fs[0];
+  svector[1] = fs[1];
+  svector[2] = fs[2];
+  svector[3] = MathExtra::len3(fs);
+  svector[4] = 0.0;
+  svector[5] = 0.0;
+  svector[6] = 0.0;
+  svector[7] = 0.0;
+  svector[8] = 0.0;
+  svector[9] = dx[0];
+  svector[10] = dx[1];
+  svector[11] = dx[2];
+
+  // Superellipsoid-specific values: not reported here, slots are zeroed
+
+  svector[12] = 0.0;    //contact_point_and_Lagrange_multiplier[0]
+  svector[13] = 0.0;    //contact_point_and_Lagrange_multiplier[1]
+  svector[14] = 0.0;    //contact_point_and_Lagrange_multiplier[2]
+  svector[15] = 0.0;    //contact_point_and_Lagrange_multiplier[3]
+  svector[16] = 0.0;    //bounding_box_separating_axis_index
+
+  return 0.0;
+}
+
+/* ---------------------------------------------------------------------- */
+
+int PairGranularSuperellipsoid::pack_forward_comm(int n, int *list, double *buf, int /*pbc_flag*/,
+                                                  int * /*pbc*/)
+{
+  int i, j, m;
+
+  m = 0;
+  for (i = 0; i < n; i++) {
+    j = list[i];
+    buf[m++] = mass_rigid[j];
+  }
+  return m;
+}
+
+/* ---------------------------------------------------------------------- */
+
+void PairGranularSuperellipsoid::unpack_forward_comm(int n, int first, double *buf)
+{
+  int i, m, last;
+
+  m = 0;
+  last = first + n;
+  for (i = first; i < last; i++) mass_rigid[i] = buf[m++];
+}
+
+/* ----------------------------------------------------------------------
+   Transfer history
+------------------------------------------------------------------------- */
+
+void PairGranularSuperellipsoid::transfer_history(double *source, double *target, int itype,
+                                                  int jtype)
+{
+  // copy of all history variables (shear, contact point, axis)
+
+  for (int i = 0; i < size_history; i++) {
+    if (i >= default_hist_size && tangential_model[itype][jtype] == CLASSIC) {
+      target[i] = -source[i];    //shear
+    } else {
+      target[i] = source[i];
+    }
+  }
+}
+
+/* ----------------------------------------------------------------------
+   memory usage of local atom-based arrays
+------------------------------------------------------------------------- */
+
+double PairGranularSuperellipsoid::memory_usage()
+{
+  double bytes = (double) nmax * sizeof(double);
+  return bytes;
+}
+
+/* ---------------------------------------------------------------------- */
+
+double PairGranularSuperellipsoid::mix_geom(double val1, double val2)
+{
+  return sqrt(val1 * val2);
+}
+
+/* ---------------------------------------------------------------------- */
+
+double PairGranularSuperellipsoid::mix_mean(double val1, double val2)
+{
+  return 0.5 * (val1 + val2);
+}
+
+/* ---------------------------------------------------------------------- */
+
+bool PairGranularSuperellipsoid::check_contact()
+{
+  bool touching = false;
+  if (rsq >= radsum * radsum) {
+    touching = false;
+  } else {
+    bool skip_contact_detection(false);
+    if (bounding_box) {
+      int separating_axis = (int) (history_data[4]);
+      int new_axis = MathExtraSuperellipsoids::check_oriented_bounding_boxes(
+          xi, Ri, shapei, xj, Rj, shapej, separating_axis);
+      if (new_axis != -1) {
+        skip_contact_detection = true;
+        if (history_update) history_data[4] = (double) new_axis;
+      }
+    }
+    if (skip_contact_detection) {
+      touching = false;
+      return touching;
+    }
+
+    double *X0_prev = history_data;
+
+    // superellipsoid contact detection between atoms i and j
+
+    if (touchjj == 1) {
+      // Continued contact: use true grain shape and the last contact point, stored relative to xref
+      X0[0] = X0_prev[0] + xref[0];
+      X0[1] = X0_prev[1] + xref[1];
+      X0[2] = X0_prev[2] + xref[2];
+      X0[3] = X0_prev[3];
+      // std::cout << "Using old contact point as initial guess between particle " << atom->tag[i] << " and particle " << atom->tag[j] << " : "
+      //           << X0[0] << " " << X0[1] << " " << X0[2] << " Lagrange multiplier mu^2: " << X0[3] << std::endl;
+      int status = MathExtraSuperellipsoids::determine_contact_point(xi, Ri, shapei, blocki, flagi,
+                                                                     xj, Rj, shapej, blockj, flagj,
+                                                                     X0, nij, contact_formulation);
+      if (status == 0) {
+        touching = true;
+      } else if (status == 1) {
+        touching = false;
+      } else {
+        error->warning(FLERR,
+                       "Ellipsoid contact detection (old contact) failed "
+                       "between particle {} and particle {}",
+                       tagi, tagj);
+      }
+    } else {
+      // New contact: Build initial guess incrementally by morphing the particles from spheres to actual shape
+
+      // There might be a better heuristic than the "volume equivalent spheres" suggested in the paper
+      // but this is good enough. We might even be able to use radi and radj, which is cheaper:
+      // MathExtra::scaleadd3(radj / radsum, x[i], radi / radsum, x[j], X0);
+
+      double reqi = std::cbrt(shapei[0] * shapei[1] * shapei[2]);
+      double reqj = std::cbrt(shapej[0] * shapej[1] * shapej[2]);
+      double rsuminv = 1.0 / (reqi + reqj);
+      MathExtra::scaleadd3(reqj * rsuminv, xi, reqi * rsuminv, xj, X0);
+      X0[3] = reqj / reqi;    // Lagrange multiplier mu^2
+      for (int iter_ig = 1; iter_ig <= NUMSTEP_INITIAL_GUESS; iter_ig++) {
+        double frac = iter_ig / double(NUMSTEP_INITIAL_GUESS);
+        shapei[0] = shapei[1] = shapei[2] = reqi;
+        shapej[0] = shapej[1] = shapej[2] = reqj;
+        MathExtra::scaleadd3(1.0 - frac, shapei, frac, shapei0, shapei);
+        MathExtra::scaleadd3(1.0 - frac, shapej, frac, shapej0, shapej);
+        blocki[0] = 2.0 + frac * (blocki0[0] - 2.0);
+        blocki[1] = 2.0 + frac * (blocki0[1] - 2.0);
+        blockj[0] = 2.0 + frac * (blockj0[0] - 2.0);
+        blockj[1] = 2.0 + frac * (blockj0[1] - 2.0);
+
+        // force ellipsoid flag for first initial guess iteration.
+        // Avoid incorrect values of n1/n2 - 2 in second derivatives.
+        int status = MathExtraSuperellipsoids::determine_contact_point(
+            xi, Ri, shapei, blocki, iter_ig == 1 ? AtomVecEllipsoid::BlockType::ELLIPSOID : flagi,
+            xj, Rj, shapej, blockj, iter_ig == 1 ? AtomVecEllipsoid::BlockType::ELLIPSOID : flagj,
+            X0, nij, contact_formulation);
+
+        if (status == 0) {
+          touching = true;
+        } else if (status == 1) {
+          touching = false;
+        } else if (iter_ig == NUMSTEP_INITIAL_GUESS) {
+          // keep trying until last iteration to avoid erroring out too early
+          error->warning(FLERR,
+                         "Ellipsoid contact detection (new contact) failed "
+                         "between particle {} and particle {}",
+                         tagi, tagj);
+        }
+      }
+    }
+  }
+
+  return touching;
+}
+
+/* ---------------------------------------------------------------------- */
+
+void PairGranularSuperellipsoid::calculate_forces()
+{
+  // Store the contact point relative to the reference grain (xref) for the next time step.
+  // This is crucial for periodic BCs, when grains can move by a large amount in one time step:
+  // keeping the previous contact point in the global frame would lead to a bad initial guess.
+
+  if (history_update) {
+    double *X0_prev = history_data;
+    X0_prev[0] = X0[0] - xref[0];
+    X0_prev[1] = X0[1] - xref[1];
+    X0_prev[2] = X0[2] - xref[2];
+    X0_prev[3] = X0[3];
+  }
+
+  double nji[3] = {-nij[0], -nij[1], -nij[2]};
+  // compute overlap depth along normal direction for each grain
+  // overlap is positive for both grains
+  double overlap_i =
+      MathExtraSuperellipsoids::compute_overlap_distance(shapei, blocki, Ri, flagi, X0, nij, xi);
+  double overlap_j =
+      MathExtraSuperellipsoids::compute_overlap_distance(shapej, blockj, Rj, flagj, X0, nji, xj);
+
+  // branch vectors
+  double cr_i[3], cr_j[3];
+  MathExtra::sub3(X0, xi, cr_i);
+  MathExtra::sub3(X0, xj, cr_j);
+
+  // angular velocities, needed for the cross products omega x r at the contact point
+
+  double ex_space[3], ey_space[3], ez_space[3], omegai[3], omegaj[3];
+  MathExtra::q_to_exyz(quati, ex_space, ey_space, ez_space);
+  MathExtra::angmom_to_omega(angmomi, ex_space, ey_space, ez_space, inertiai, omegai);
+  MathExtra::q_to_exyz(quatj, ex_space, ey_space, ez_space);
+  MathExtra::angmom_to_omega(angmomj, ex_space, ey_space, ez_space, inertiaj, omegaj);
+
+  double omega_cross_ri[3], omega_cross_rj[3];
+  MathExtra::cross3(omegai, cr_i, omega_cross_ri);
+  MathExtra::cross3(omegaj, cr_j, omega_cross_rj);
+
+  // relative translational velocity
+  // compute directly the sum of relative translational velocity at contact point
+  // since rotational velocity contribution is different for superellipsoids
+  double cv_i[3], cv_j[3];
+  add3(vi, omega_cross_ri, cv_i);
+  add3(vj, omega_cross_rj, cv_j);
+
+  // total relative velocity at contact point
+  sub3(cv_i, cv_j, vr);
+
+  // normal component
+
+  vnnr = dot3(vr, nij);
+  scale3(vnnr, nij, vn);
+
+  // tangential component
+
+  sub3(vr, vn, vtr);
+
+  vrel = len3(vtr);
+
+  // Approximate contact radius
+
+  // hertzian contact radius approximation
+  if (contact_radius_flag) {
+    double surf_point_i[3], surf_point_j[3], curvature_i, curvature_j;
+    MathExtra::scaleadd3(overlap_i, nij, X0, surf_point_i);
+    MathExtra::scaleadd3(overlap_j, nji, X0, surf_point_j);
+
+    if (curvature_model == MathExtraSuperellipsoids::CURV_MEAN) {
+      curvature_i = MathExtraSuperellipsoids::mean_curvature_superellipsoid(shapei, blocki, flagi,
+                                                                            Ri, surf_point_i, xi);
+      curvature_j = MathExtraSuperellipsoids::mean_curvature_superellipsoid(shapej, blockj, flagj,
+                                                                            Rj, surf_point_j, xj);
+    } else {
+      curvature_i = MathExtraSuperellipsoids::gaussian_curvature_superellipsoid(
+          shapei, blocki, flagi, Ri, surf_point_i, xi);
+      curvature_j = MathExtraSuperellipsoids::gaussian_curvature_superellipsoid(
+          shapej, blockj, flagj, Rj, surf_point_j, xj);
+    }
+    double sum_curvature = curvature_i + curvature_j;
+
+    // Physical upper bound: bounding sphere radius of the smaller particle
+    double max_physical_radius = MIN(radi, radj);
+    double min_physical_radius = MIN_RADIUS_RATIO * max_physical_radius;
+
+    if (sum_curvature > MIN_CURVATURE) {
+      contact_radius = sqrt((overlap_i + overlap_j) / sum_curvature);
+      // Cap the maximum radius (flat faces)
+      contact_radius = MIN(contact_radius, max_physical_radius);
+      // Cap the minimum radius (sharp corners) to prevent force collapse
+      contact_radius = MAX(contact_radius, min_physical_radius);
+    } else {
+      contact_radius = max_physical_radius;
+    }
+
+    // contact_radius is final here; recomputing it without the caps above
+    // would discard them and divide by a vanishing curvature sum
+  }
+
+  if (normal_model[itype][jtype] == HOOKE) {
+    // assuming we get the overlap depth
+    Fnormal = kn[itype][jtype] * (overlap_i + overlap_j);
+  } else if (normal_model[itype][jtype] == HERTZ) {
+    Fnormal = kn[itype][jtype] * (overlap_i + overlap_j) * contact_radius;
+  }
+
+  double damp = gamman[itype][jtype];
+  double damp_prefactor, Fdamp;
+  if (damping_model[itype][jtype] == MASS_VELOCITY) {
+    damp_prefactor = damp * meff;
+    Fdamp = damp_prefactor * vnnr;
+  } else {
+    damp_prefactor = damp * meff * contact_radius;
+    Fdamp = damp_prefactor * vnnr;
+  }
+
+  // normal forces = elastic contact + normal velocity damping
+
+  Fntot = Fnormal + Fdamp;
+  if (limit_damping[itype][jtype] && (Fntot < 0.0)) Fntot = 0.0;
+  double Fncrit = fabs(Fntot);
+
+  // Tangential model
+
+  double hist_increment[3], fdamp[3];
+  double *history = &history_data[default_hist_size];
+  double Fscrit = Fncrit * xmu[itype][jtype];
+  double dampt = xt[itype][jtype] * damp_prefactor;
+  if (tangential_model[itype][jtype] == LINEAR_HISTORY) {
+    // rotate and update displacements / force.
+    // see e.g. eq. 17 of Luding, Gran. Matter 2008, v10,p235
+
+    int frame_update = 0;
+    if (history_update) {
+      double rsht = dot3(history, nij);
+      frame_update = (fabs(rsht) * kt[itype][jtype]) > (EPSILON * Fscrit);
+
+      if (frame_update) rotate_rescale_vec(history, nij);
+
+      // update history, tangential force using velocities at half step
+      // see e.g. eq. 18 of Thornton et al, Pow. Tech. 2013, v223,p30-46
+      scale3(dt, vtr, hist_increment);
+      add3(history, hist_increment, history);
+    }
+
+    // tangential forces = history + tangential velocity damping
+    scale3(-kt[itype][jtype], history, fs);
+
+    scale3(-dampt, vtr, fdamp);
+    add3(fs, fdamp, fs);
+
+    // rescale frictional displacements and forces if needed
+    double magfs = len3(fs);
+    if (magfs > Fscrit) {
+      double shrmag = len3(history);
+      if (shrmag != 0.0) {
+        double magfs_inv = 1.0 / magfs;
+        scale3(Fscrit * magfs_inv, fs, history);
+        sub3(history, fdamp, history);
+        scale3(-1.0 / kt[itype][jtype], history);
+        scale3(Fscrit * magfs_inv, fs);
+      } else {
+        zero3(fs);
+      }
+    }
+
+  } else if (tangential_model[itype][jtype] == CLASSIC) {
+
+    // shear history effects
+
+    if (history_update) {
+      scale3(dt, vtr, hist_increment);
+      add3(history, hist_increment, history);
+    }
+    double shrmag = len3(history);
+
+    if (history_update) {
+      // rotate shear displacements
+      double rsht = dot3(history, nij);
+      scale3(rsht, nij, hist_increment);
+      sub3(history, hist_increment, history);
+    }
+
+    // tangential forces = history + tangential velocity damping
+    if (contact_radius_flag)
+      scale3(-kt[itype][jtype] * contact_radius, history, fs);
+    else
+      scale3(-kt[itype][jtype], history, fs);
+
+    scale3(-dampt, vtr, fdamp);
+    add3(fs, fdamp, fs);
+
+    // rescale frictional displacements and forces if needed
+
+    double magfs = len3(fs);
+
+    if (magfs > Fscrit) {
+      if (shrmag != 0.0) {
+        // Rescale shear force
+        scale3(Fscrit / magfs, fs);
+
+        // Set shear to elastic component of rescaled force
+        //  has extra factor of kt (+ contact radius)
+        sub3(fs, fdamp, history);
+
+        // Remove extra prefactors from shear history
+        if (contact_radius_flag)
+          scale3(-1.0 / (kt[itype][jtype] * contact_radius), history);
+        else
+          scale3(-1.0 / kt[itype][jtype], history);
+      } else
+        zero3(fs);
+    }
+  }
+
+  // forces & torques
+
+  scale3(Fntot, nji, forces);
+  add3(forces, fs, forces);
+
+  cross3(cr_i, forces, torquesi);
+  cross3(forces, cr_j, torquesj);
+}
+
+/* ----------------------------------------------------------------------
+  rotate-rescale vector v so it is perpendicular to unit vector n
+  and has the same magnitude as before
+    Copied from GranSubMod
+  ---------------------------------------------------------------------- */
+void PairGranularSuperellipsoid::rotate_rescale_vec(double *v, double *n)
+{
+  double rsht, shrmag, prjmag, temp_dbl, temp_array[3];
+
+  rsht = dot3(v, n);
+  shrmag = len3(v);
+
+  scale3(rsht, n, temp_array);
+  sub3(v, temp_array, v);
+
+  // also rescale to preserve magnitude
+  prjmag = len3(v);
+  if (prjmag > 0)
+    temp_dbl = shrmag / prjmag;
+  else
+    temp_dbl = 0;
+  scale3(temp_dbl, v);
+}
diff --git a/src/ASPHERE/pair_granular_superellipsoid.h b/src/ASPHERE/pair_granular_superellipsoid.h
new file mode 100644
index 00000000000..abd4fa0a468
--- /dev/null
+++ b/src/ASPHERE/pair_granular_superellipsoid.h
@@ -0,0 +1,130 @@
+/* -*- c++ -*- ----------------------------------------------------------
+   LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
+   https://www.lammps.org/, Sandia National Laboratories
+   LAMMPS development team: developers@lammps.org
+
+   Copyright (2003) Sandia Corporation.  Under the terms of Contract
+   DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
+   certain rights in this software.  This software is distributed under
+   the GNU General Public License.
+
+   See the README file in the top-level LAMMPS directory.
+------------------------------------------------------------------------- */
+/* ----------------------------------------------------------------------
+   Contributing authors: Jacopo Bilotto (EPFL), Jibril B. Coulibaly
+------------------------------------------------------------------------- */
+
+#ifdef PAIR_CLASS
+// clang-format off
+PairStyle(granular/superellipsoid,PairGranularSuperellipsoid);
+// clang-format on
+#else
+
+#ifndef LMP_PAIR_GRANULAR_SUPERELLIPSOID_H
+#define LMP_PAIR_GRANULAR_SUPERELLIPSOID_H
+
+#include "pair.h"
+
+#include "atom_vec_ellipsoid.h"
+
+namespace LAMMPS_NS {
+
+class PairGranularSuperellipsoid : public Pair {
+ public:
+  PairGranularSuperellipsoid(class LAMMPS *);
+  ~PairGranularSuperellipsoid() override;
+  void compute(int, int) override;
+  void settings(int, char **) override;
+  void coeff(int, char **) override;
+  void init_style() override;
+  double init_one(int, int) override;
+  void write_restart(FILE *) override;
+  void read_restart(FILE *) override;
+  void reset_dt() override;
+  double single(int, int, int, int, double, double, double, double &) override;
+  int pack_forward_comm(int, int *, double *, int, int *) override;
+  void unpack_forward_comm(int, int, double *) override;
+  double memory_usage() override;
+  void transfer_history(double *, double *, int, int) override;
+
+ protected:
+  int freeze_group_bit;
+
+  int neighprev;
+  double *onerad_dynamic, *onerad_frozen;
+  double *maxrad_dynamic, *maxrad_frozen;
+
+  class FixDummy *fix_dummy;
+  class FixNeighHistory *fix_history;
+
+  // storage of rigid body masses for use in granular interactions
+
+  class Fix *fix_rigid;    // ptr to rigid body fix, null pointer if none
+  double *mass_rigid;      // rigid mass for owned+ghost atoms
+  int nmax;                // allocated size of mass_rigid
+
+  // Model variables
+  double dt;
+  int **normal_model;
+  int **damping_model;
+  int **tangential_model;
+  int **limit_damping;
+  int default_hist_size;
+  int contact_radius_flag;
+
+  // Normal coefficients
+  double **kn, **gamman;     // Hooke + Hertz
+
+  // Tangential coefficients
+  double **kt, **xt, **xmu;  // linear_history
+
+  // Intermediate values for contact model
+  int history_update, touchjj, itype, jtype;
+  double Fnormal, forces[3], torquesi[3], torquesj[3];
+  double radi, radj, meff, Fntot, contact_radius;
+  double *xi, *xj, *vi, *vj;
+  double fs[3], ft[3];
+  double dx[3], nx[3], r, rsq, rinv, Reff, radsum, delta, dR;
+  double vr[3], vn[3], vnnr, vt[3], wr[3], vtr[3], vrel;
+
+  double *quati, *quatj, *angmomi, *angmomj, *inertiai, *inertiaj;
+  double X0[4], nij[3], Ri[3][3], Rj[3][3];
+  double shapei0[3], blocki0[3], shapej0[3], blockj0[3], shapei[3], blocki[3], shapej[3], blockj[3];
+  double *history_data, *xref;
+  AtomVecEllipsoid::BlockType flagi, flagj;
+  tagint tagi, tagj;
+
+  void allocate();
+  double mix_geom(double, double);
+  double mix_mean(double, double);
+  bool check_contact();
+  void calculate_forces();
+
+ private:
+  int size_history;
+  int heat_flag;
+
+  // optional user-specified global cutoff, per-type user-specified cutoffs
+  double **cutoff_type;
+  double cutoff_global;
+  int contact_formulation;
+  int bounding_box;
+  int curvature_model;
+
+  int extra_svector;
+
+  void rotate_rescale_vec(double *hislocal, double *n);
+
+  // Below not implemented. Placeholder if we decide not to compute local hessian in line search
+  static double
+  shape_and_gradient_local(const double *, const double *, const double *,
+                           double *);    // would return a vector of temporary variables
+  static double hessian_local(
+      const double *, const double *, const double *,
+      double *);    // would use the above vector of temporary variables to compute local hessian
+};
+
+}    // namespace LAMMPS_NS
+
+#endif
+#endif
diff --git a/src/GPU/fix_nve_asphere_gpu.cpp b/src/GPU/fix_nve_asphere_gpu.cpp
index 9b75964c791..6437cf6da6c 100644
--- a/src/GPU/fix_nve_asphere_gpu.cpp
+++ b/src/GPU/fix_nve_asphere_gpu.cpp
@@ -169,6 +169,9 @@ void FixNVEAsphereGPU::init()
   if (!avec)
     error->all(FLERR,"Compute nve/asphere requires atom style ellipsoid");
 
+  if (atom->superellipsoid_flag)
+    error->all(FLERR, "Fix nve/asphere/gpu does not support superellipsoids");
+
   // check that all particles are finite-size ellipsoids
   // no point particles allowed, spherical is OK
 
diff --git a/src/GRANULAR/gran_sub_mod_tangential.cpp b/src/GRANULAR/gran_sub_mod_tangential.cpp
index b43acc73cd4..2d2e1e036cb 100644
--- a/src/GRANULAR/gran_sub_mod_tangential.cpp
+++ b/src/GRANULAR/gran_sub_mod_tangential.cpp
@@ -121,7 +121,8 @@ void GranSubModTangentialLinearHistory::coeffs_to_local()
 void GranSubModTangentialLinearHistory::calculate_forces()
 {
   // Note: this is the same as the base Mindlin calculation except k isn't scaled by contact radius
-  double magfs, magfs_inv, rsht, shrmag, temp_array[3], vtr2[3];
+  double magfs, rsht, shrmag;
+  double hist_increment[3], fdamp[3], vtr2[3];
   int frame_update = 0;
 
   double *nx = gm->nx;
@@ -145,8 +146,8 @@ void GranSubModTangentialLinearHistory::calculate_forces()
 
     // update history, tangential force using velocities at half step
     // see e.g. eq. 18 of Thornton et al, Pow. Tech. 2013, v223,p30-46
-    scale3(dt, vtr, temp_array);
-    add3(history, temp_array, history);
+    scale3(dt, vtr, hist_increment);
+    add3(history, hist_increment, history);
 
     if(gm->synchronized_verlet == 1) {
       rsht = dot3(history, nx_unrotated);
@@ -165,20 +166,21 @@ void GranSubModTangentialLinearHistory::calculate_forces()
   } else {
     copy3(vtr, vtr2);
   }
-  scale3(damp, vtr2, temp_array);
-  sub3(fs, temp_array, fs);
+  scale3(-damp, vtr2, fdamp);
+  add3(fs, fdamp, fs);
 
   // rescale frictional displacements and forces if needed
   magfs = len3(fs);
   if (magfs > Fscrit) {
     shrmag = len3(history);
     if (shrmag != 0.0) {
-      magfs_inv = 1.0 / magfs;
-      scale3(Fscrit * magfs_inv, fs, history);
-      scale3(damp, vtr, temp_array);
-      add3(history, temp_array, history);
+      // Rescale shear force
+      scale3(Fscrit / magfs, fs);
+
+      // Set shear to elastic component of rescaled force
+      //  has extra factor of k that is then removed
+      sub3(fs, fdamp, history);
       scale3(-1.0 / k, history);
-      scale3(Fscrit * magfs_inv, fs);
     } else {
       zero3(fs);
     }
@@ -200,8 +202,8 @@ GranSubModTangentialLinearHistoryClassic::GranSubModTangentialLinearHistoryClass
 
 void GranSubModTangentialLinearHistoryClassic::calculate_forces()
 {
-  double magfs, magfs_inv, rsht, shrmag;
-  double temp_array[3];
+  double magfs, rsht, shrmag;
+  double hist_increment[3], fdamp[3];
 
   double *nx = gm->nx;
   double *vtr = gm->vtr;
@@ -216,8 +218,8 @@ void GranSubModTangentialLinearHistoryClassic::calculate_forces()
 
   // update history
   if (history_update) {
-    scale3(dt, vtr, temp_array);
-    add3(history, temp_array, history);
+    scale3(dt, vtr, hist_increment);
+    add3(history, hist_increment, history);
   }
 
   shrmag = len3(history);
@@ -225,28 +227,40 @@ void GranSubModTangentialLinearHistoryClassic::calculate_forces()
   // rotate shear displacements
   if (history_update) {
     rsht = dot3(history, nx);
-    scale3(rsht, nx, temp_array);
-    sub3(history, temp_array, history);
+    scale3(rsht, nx, hist_increment);
+    sub3(history, hist_increment, history);
   }
 
   // tangential forces = history + tangential velocity damping
+  // classic model can only set contact_radius_flag through hertz
   if (contact_radius_flag)
     scale3(-k * contact_radius, history, fs);
   else
     scale3(-k, history, fs);
-  scale3(damp, vtr, temp_array);
-  sub3(fs, temp_array, fs);
+
+  // damping force, note that damp automatically has a factor
+  //   of contact radius with hertz (sets viscoelastic damping)
+  //   but not with hooke (sets mass_velocity damping)
+
+  scale3(-damp, vtr, fdamp);
+  add3(fs, fdamp, fs);
 
   // rescale frictional displacements and forces if needed
   magfs = len3(fs);
   if (magfs > Fscrit) {
     if (shrmag != 0.0) {
-      magfs_inv = 1.0 / magfs;
-      scale3(Fscrit * magfs_inv, fs, history);
-      scale3(damp, vtr, temp_array);
-      add3(history, temp_array, history);
-      scale3(-1.0 / k, history);
-      scale3(Fscrit * magfs_inv, fs);
+      // Rescale shear force
+      scale3(Fscrit / magfs, fs);
+
+      // Set shear to elastic component of rescaled force
+      //  has extra factor of kt (+ contact radius)
+      sub3(fs, fdamp, history);
+
+      // Remove extra prefactors from shear history
+      if (contact_radius_flag)
+        scale3(-1.0 / (k * contact_radius), history);
+      else
+        scale3(-1.0 / k, history);
     } else {
       zero3(fs);
     }
@@ -323,8 +337,8 @@ void GranSubModTangentialMindlin::mix_coeffs(double *icoeffs, double *jcoeffs)
 
 void GranSubModTangentialMindlin::calculate_forces()
 {
-  double k_scaled, magfs, magfs_inv, rsht, shrmag;
-  double temp_array[3], vtr2[3];
+  double k_scaled, magfs, rsht, shrmag;
+  double hist_increment[3], fdamp[3], vtr2[3];
   int frame_update = 0;
 
   double *nx = gm->nx;
@@ -361,11 +375,11 @@ void GranSubModTangentialMindlin::calculate_forces()
     if (mindlin_force) {
       // tangential force
       // see e.g. eq. 18 of Thornton et al, Pow. Tech. 2013, v223,p30-46
-      scale3(-k_scaled * dt, vtr, temp_array);
+      scale3(-k_scaled * dt, vtr, hist_increment);
     } else {
-      scale3(dt, vtr, temp_array);
+      scale3(dt, vtr, hist_increment);
     }
-    add3(history, temp_array, history);
+    add3(history, hist_increment, history);
 
     if (mindlin_rescale) history[3] = contact_radius;
 
@@ -382,6 +396,14 @@ void GranSubModTangentialMindlin::calculate_forces()
   }
 
   // tangential forces = history + tangential velocity damping
+
+  if (!mindlin_force) {
+    scale3(-k_scaled, history, fs);
+  } else {
+    copy3(history, fs);
+  }
+
+
   // Rotating vtr for damping term in nx direction
   if (frame_update && gm->synchronized_verlet) {
     copy3(vtr, vtr2);
@@ -389,28 +411,22 @@ void GranSubModTangentialMindlin::calculate_forces()
   } else {
     copy3(vtr, vtr2);
   }
-  scale3(-damp, vtr2, fs);
-
-  if (!mindlin_force) {
-    scale3(k_scaled, history, temp_array);
-    sub3(fs, temp_array, fs);
-  } else {
-    add3(fs, history, fs);
-  }
+  scale3(-damp, vtr2, fdamp);
+  add3(fs, fdamp, fs);
 
   // rescale frictional displacements and forces if needed
   magfs = len3(fs);
   if (magfs > Fscrit) {
     shrmag = len3(history);
     if (shrmag != 0.0) {
-      magfs_inv = 1.0 / magfs;
-      scale3(Fscrit * magfs_inv, fs, history);
-      scale3(damp, vtr, temp_array);
-      add3(history, temp_array, history);
-
-      if (!mindlin_force) scale3(-1.0 / k_scaled, history);
-
-      scale3(Fscrit * magfs_inv, fs);
+      // Rescale shear force
+      scale3(Fscrit / magfs, fs);
+
+      // Set shear to elastic component of rescaled force
+      //  may have extra factor of k_scaled that is then removed
+      sub3(fs, fdamp, history);
+      if (!mindlin_force)
+        scale3(-1.0 / k_scaled, history);
     } else {
       zero3(fs);
     }
diff --git a/src/GRANULAR/pair_gran_hertz_history.cpp b/src/GRANULAR/pair_gran_hertz_history.cpp
index 1cd7b9444e0..8037bb52a8a 100644
--- a/src/GRANULAR/pair_gran_hertz_history.cpp
+++ b/src/GRANULAR/pair_gran_hertz_history.cpp
@@ -475,6 +475,11 @@ double PairGranHertzHistory::single(int i, int j, int /*itype*/, int /*jtype*/,
   svector[7] = vt1;
   svector[8] = vt2;
   svector[9] = vt3;
+  // TODO to LAMMPS:
+  // The doc says the last 3 values (8-10) are the components of the relative velocity in the
+  // tangential direction, but `vt` is the relative translational velocity only, i.e., it
+  // ignores the angular velocity. The total relative tangential velocity would be `vtr`.
+  // Should that be corrected? That would break backward compatibility, and this is "legacy code" anyway
 
   return 0.0;
 }
diff --git a/src/GRANULAR/pair_gran_hooke_history.cpp b/src/GRANULAR/pair_gran_hooke_history.cpp
index 60e6edf4d9d..0679be9173c 100644
--- a/src/GRANULAR/pair_gran_hooke_history.cpp
+++ b/src/GRANULAR/pair_gran_hooke_history.cpp
@@ -768,6 +768,11 @@ double PairGranHookeHistory::single(int i, int j, int /*itype*/, int /*jtype*/,
   svector[7] = vt1;
   svector[8] = vt2;
   svector[9] = vt3;
+  // TODO to LAMMPS:
+  // The doc says the last 3 values (8-10) are the components of the relative velocity in the
+  // tangential direction, but `vt` is the relative translational velocity only, i.e., it
+  // ignores the angular velocity. The total relative tangential velocity would be `vtr`.
+  // Should that be corrected? That would break backward compatibility, and this is "legacy code" anyway
 
   return 0.0;
 }
diff --git a/src/GRANULAR/pair_granular.cpp b/src/GRANULAR/pair_granular.cpp
index d26f0dc2569..4a019699909 100644
--- a/src/GRANULAR/pair_granular.cpp
+++ b/src/GRANULAR/pair_granular.cpp
@@ -63,6 +63,9 @@ PairGranular::PairGranular(LAMMPS *lmp) : Pair(lmp)
   maxrad_dynamic = nullptr;
   maxrad_frozen = nullptr;
 
+  types_indices = nullptr;
+  cutoff_type = nullptr;
+
   // set comm size needed by this Pair if used with fix rigid
 
   comm_forward = 1;
@@ -110,13 +113,13 @@ PairGranular::~PairGranular()
 
 void PairGranular::compute(int eflag, int vflag)
 {
-  int i,j,k,ii,jj,inum,jnum,itype,jtype;
-  double factor_lj,mi,mj,meff;
+  int i, j, k, ii, jj, inum, jnum, itype, jtype;
+  double factor_lj, mi, mj, meff;
   double *forces, *torquesi, *torquesj, dq;
 
-  int *ilist,*jlist,*numneigh,**firstneigh;
-  int *touch,**firsttouch;
-  double *history,*allhistory,**firsthistory;
+  int *ilist, *jlist, *numneigh, **firstneigh;
+  int *touch, **firsttouch;
+  double *history, *allhistory, **firsthistory;
 
   bool touchflag = false;
   const bool history_update = update->setupflag == 0;
@@ -148,10 +151,10 @@ void PairGranular::compute(int eflag, int vflag)
     comm->forward_comm(this);
   }
 
+  int *type = atom->type;
   double **x = atom->x;
   double **v = atom->v;
   double **f = atom->f;
-  int *type = atom->type;
   double **omega = atom->omega;
   double **torque = atom->torque;
   double *radius = atom->radius;
@@ -275,10 +278,9 @@ void PairGranular::compute(int eflag, int vflag)
         if (force->newton_pair || j < nlocal) heatflow[j] -= dq;
       }
 
-      if (evflag) {
-        ev_tally_xyz(i,j,nlocal,force->newton_pair,
-          0.0,0.0,forces[0],forces[1],forces[2],model->dx[0],model->dx[1],model->dx[2]);
-      }
+      if (evflag)
+        ev_tally_xyz(i, j, nlocal, force->newton_pair, 0.0, 0.0, forces[0], forces[1], forces[2],
+            model->dx[0], model->dx[1], model->dx[2]);
     }
   }
 }
diff --git a/src/GRANULAR/pair_granular.h b/src/GRANULAR/pair_granular.h
index 0be649a7b51..995d5b13d47 100644
--- a/src/GRANULAR/pair_granular.h
+++ b/src/GRANULAR/pair_granular.h
@@ -58,7 +58,6 @@ class PairGranular : public Pair {
   int neighprev;
   double *onerad_dynamic, *onerad_frozen;
   double *maxrad_dynamic, *maxrad_frozen;
-  double **cut;
 
   class FixDummy *fix_dummy;
   class FixNeighHistory *fix_history;
diff --git a/src/GRAPHICS/dump_image.cpp b/src/GRAPHICS/dump_image.cpp
index d36230e64a6..ec460b99560 100644
--- a/src/GRAPHICS/dump_image.cpp
+++ b/src/GRAPHICS/dump_image.cpp
@@ -1410,16 +1410,15 @@ void DumpImage::create_image()
         color = image->color2rgb("white");
       }
       savedColors saved;
-      if (estyle & 1) {
-        // brighten flat surfaces a little bit
-        saved = reset_lighting(image, 0.3, 0.8, 0.45, 0.8);
-      }
       EllipsoidObj e(elevel);
-      e.draw(image, estyle, color, x[j], avec_ellipsoid->bonus[ellipsoid[j]].shape,
-             avec_ellipsoid->bonus[ellipsoid[j]].quat, ediamvalue, opacity);
-      if (estyle & 1) {
-        // restore previous settings
-        restore_lighting(saved, image);
+      if (avec_ellipsoid->bonus_super) {
+        auto *bonus = avec_ellipsoid->bonus_super;
+        e.draw(image, estyle, color, x[j], bonus[ellipsoid[j]].shape, bonus[ellipsoid[j]].quat,
+               ediamvalue, opacity, bonus[ellipsoid[j]].block);
+      } else {
+        auto *bonus = avec_ellipsoid->bonus;
+        e.draw(image, estyle, color, x[j], bonus[ellipsoid[j]].shape, bonus[ellipsoid[j]].quat,
+               ediamvalue, opacity, nullptr);
       }
       m += size_one;
     }
diff --git a/src/GRAPHICS/image.cpp b/src/GRAPHICS/image.cpp
index c963254cc2d..7361d6753a5 100644
--- a/src/GRAPHICS/image.cpp
+++ b/src/GRAPHICS/image.cpp
@@ -1273,8 +1273,8 @@ void Image::draw_triangle(const double *x, const double *y, const double *z,
   double pixelWidth = (tanPerPixel > 0) ? tanPerPixel * dist : -tanPerPixel / zoom;
   double xf = xmap / pixelWidth;
   double yf = ymap / pixelWidth;
-  int xc = static_cast<int>(xf);
-  int yc = static_cast<int>(yf);
+  int xc = static_cast<int>(floor(xf));
+  int yc = static_cast<int>(floor(yf));
   double width_error = xf - xc;
   double height_error = yf - yc;
 
@@ -1287,10 +1287,10 @@ void Image::draw_triangle(const double *x, const double *y, const double *z,
   double pixelRightFull = rasterRight / pixelWidth;
   double pixelDownFull = rasterDown / pixelWidth;
   double pixelUpFull = rasterUp / pixelWidth;
-  int pixelLeft = std::lround(pixelLeftFull);
-  int pixelRight = std::lround(pixelRightFull);
-  int pixelDown = std::lround(pixelDownFull);
-  int pixelUp = std::lround(pixelUpFull);
+  int pixelLeft = static_cast<int>(ceil(pixelLeftFull));
+  int pixelRight = static_cast<int>(ceil(pixelRightFull));
+  int pixelDown = static_cast<int>(ceil(pixelDownFull));
+  int pixelUp = static_cast<int>(ceil(pixelUpFull));
 
   for (int iy = yc - pixelDown; iy <= yc + pixelUp; iy ++) {
     for (int ix = xc - pixelLeft; ix <= xc + pixelRight; ix ++) {
@@ -1316,12 +1316,6 @@ void Image::draw_triangle(const double *x, const double *y, const double *z,
       double s1[3], s2[3], s3[3];
       double c1[3], c2[3];
 
-      // for grid cell and other triangle meshes:
-      // there can be single pixel gaps due to rounding
-      // using <= if test can leave single-pixel gaps between 2 triangles
-      // using < if test fixes most of them
-      // suggested by Nathan Fabian, Nov 2022
-
       MathExtra::sub3(zlocal, xlocal, s1);
       MathExtra::sub3(ylocal, xlocal, s2);
       MathExtra::sub3(p, xlocal, s3);
diff --git a/src/GRAPHICS/image_objects.cpp b/src/GRAPHICS/image_objects.cpp
index a80c858bca9..9a8c9eb8bf5 100644
--- a/src/GRAPHICS/image_objects.cpp
+++ b/src/GRAPHICS/image_objects.cpp
@@ -67,6 +67,14 @@ inline double radscale(const double *shape, const vec3 &pos)
                pos[2] / shape[2] * pos[2] / shape[2]));
 }
 
+// scale factor to move a position to the surface of a superellipsoid with given parameters
+inline double superscale(const double *shape, const double *block, const vec3 &pos)
+{
+  double a = pow(fabs(pos[0] / shape[0]), block[1]) + pow(fabs(pos[1] / shape[1]), block[1]);
+  double b = pow(fabs(pos[2] / shape[2]), block[0]);
+  return pow(pow(a, block[0] / block[1]) + b, -1.0 / block[0]);
+}
+
 // re-orient list of triangles to point along "dir", then scale and translate it.
 std::vector<triangle> transform(const std::vector<triangle> &triangles, const vec3 &dir,
                                 const vec3 &offs, double len, double width)
@@ -430,7 +438,8 @@ void EllipsoidObj::draw(Image *img, int flag, const double *color, const double
 // draw method for drawing ellipsoids from per-atom data which has a quaternion
 // and the shape list to define the orientation and stretch
 void EllipsoidObj::draw(Image *img, int flag, const double *color, const double *center,
-                        const double *shape, const double *quat, double diameter, double opacity)
+                        const double *shape, const double *quat, double diameter, double opacity,
+                        const double *block)
 {
   // select between triangles or cylinders or both
   bool doframe = true;
@@ -445,7 +454,8 @@ void EllipsoidObj::draw(Image *img, int flag, const double *color, const double
   const vec3 offs{center[0], center[1], center[2]};
 
   // optimization: just draw a sphere if a filled surface is requested and the object is a sphere
-  if (dotri && (shape[0] == shape[1]) && (shape[0] == shape[2])) {
+  // note: this does not apply to superellipsoids
+  if (dotri && !block && (shape[0] == shape[1]) && (shape[0] == shape[2])) {
     img->draw_sphere(center, color, 2.0 * shape[0], opacity);
     return;
   }
@@ -461,9 +471,16 @@ void EllipsoidObj::draw(Image *img, int flag, const double *color, const double
 
     if (dotri) {
       // set shape by shifting each corner to the surface
-      for (int i = 0; i < 3; ++i) {
-        auto &t = tri[i];
-        t = radscale(shape, t) * t;
+      if (block) {
+        for (int i = 0; i < 3; ++i) {
+          auto &t = tri[i];
+          t = superscale(shape, block, t) * t;
+        }
+      } else {
+        for (int i = 0; i < 3; ++i) {
+          auto &t = tri[i];
+          t = radscale(shape, t) * t;
+        }
       }
 
       // rotate
@@ -480,16 +497,32 @@ void EllipsoidObj::draw(Image *img, int flag, const double *color, const double
     }
 
     if (doframe) {
-      // set shape
-      for (int i = 0; i < 3; ++i) {
-        auto &t = tri[i];
-        if (dotri) {
-          // shift the cylinder positions inward by their diameter when using cylinders and
-          // triangles together for a smoother surface to avoid increasing the final size
-          double shapeplus[3] = {shape[0] - diameter, shape[1] - diameter, shape[1] - diameter};
-          t = radscale(shapeplus, t) * t;
-        } else {
-          t = radscale(shape, t) * t;
+      if (block) {
+        // set shape
+        for (int i = 0; i < 3; ++i) {
+          auto &t = tri[i];
+          if (dotri) {
+            // shift the cylinder positions inward by their diameter when using cylinders and
+            // triangles together for a smoother surface to avoid increasing the final size
+            double shapeplus[3] = {shape[0] - diameter, shape[1] - diameter, shape[2] - diameter};
+            t = superscale(shapeplus, block, t) * t;
+          } else {
+            t = superscale(shape, block, t) * t;
+          }
+        }
+
+      } else {
+        // set shape
+        for (int i = 0; i < 3; ++i) {
+          auto &t = tri[i];
+          if (dotri) {
+            // shift the cylinder positions inward by their diameter when using cylinders and
+            // triangles together for a smoother surface to avoid increasing the final size
+            double shapeplus[3] = {shape[0] - diameter, shape[1] - diameter, shape[2] - diameter};
+            t = radscale(shapeplus, t) * t;
+          } else {
+            t = radscale(shape, t) * t;
+          }
         }
       }
 
diff --git a/src/GRAPHICS/image_objects.h b/src/GRAPHICS/image_objects.h
index ccda1ccabc6..1c208d2c380 100644
--- a/src/GRAPHICS/image_objects.h
+++ b/src/GRAPHICS/image_objects.h
@@ -92,9 +92,9 @@ namespace ImageObjects {
     // construct (spherical) triangle mesh by refinining the triangles of an octahedron
     EllipsoidObj(int level = DEF_ELEVEL);
 
-    // draw ellipsoid from triangle mesh for ellipsoid particles
+    // draw ellipsoid from triangle mesh for ellipsoid and superellipsoid particles
     void draw(Image *, int, const double *, const double *, const double *, const double *, double,
-              double opacity = 1.0);
+              double opacity = 1.0, const double *block = nullptr);
 
     // draw ellipsoid from triangle mesh for ellipsoid regions
     void draw(Image *, int, const double *, const double *, const double *, Region *, double,
diff --git a/src/INTEL/fix_nve_asphere_intel.cpp b/src/INTEL/fix_nve_asphere_intel.cpp
index 848afa20cca..ad1c7329687 100644
--- a/src/INTEL/fix_nve_asphere_intel.cpp
+++ b/src/INTEL/fix_nve_asphere_intel.cpp
@@ -53,6 +53,8 @@ void FixNVEAsphereIntel::init()
   if (!avec)
     error->all(FLERR,"Compute nve/asphere requires atom style ellipsoid");
 
+  if (atom->superellipsoid_flag)
+    error->all(FLERR, "Fix nve/asphere/intel does not support superellipsoids");
   // check that all particles are finite-size ellipsoids
   // no point particles allowed, spherical is OK
 
diff --git a/src/OPENMP/fix_nh_asphere_omp.cpp b/src/OPENMP/fix_nh_asphere_omp.cpp
index 35a42d2119d..aefebf7ecea 100644
--- a/src/OPENMP/fix_nh_asphere_omp.cpp
+++ b/src/OPENMP/fix_nh_asphere_omp.cpp
@@ -46,7 +46,8 @@ void FixNHAsphereOMP::init()
   avec = dynamic_cast<AtomVecEllipsoid *>(atom->style_match("ellipsoid"));
   if (!avec)
     error->all(FLERR, Error::NOLASTLINE, "Fix {} requires atom style ellipsoid", style);
-
+  if (atom->superellipsoid_flag)
+    error->all(FLERR, Error::NOLASTLINE, "Fix {} does not support superellipsoids", style);
   // check that all particles are finite-size
   // no point particles allowed, spherical is OK
 
diff --git a/src/RHEO/compute_rheo_property_atom.cpp b/src/RHEO/compute_rheo_property_atom.cpp
index f02304b7c95..47e0d822d21 100644
--- a/src/RHEO/compute_rheo_property_atom.cpp
+++ b/src/RHEO/compute_rheo_property_atom.cpp
@@ -124,8 +124,10 @@ ComputeRHEOPropertyAtom::ComputeRHEOPropertyAtom(LAMMPS *lmp, int narg, char **a
     } else if (utils::strmatch(arg[iarg], "^grad/v/")) {
       i += add_tensor_component(arg[iarg], i, &ComputeRHEOPropertyAtom::pack_gradv) - 1;
     } else if (utils::strmatch(arg[iarg], "^stress/v/")) {
+      pressure_flag = 1;
       i += add_tensor_component(arg[iarg], i, &ComputeRHEOPropertyAtom::pack_viscous_stress) - 1;
     } else if (utils::strmatch(arg[iarg], "^stress/t/")) {
+      pressure_flag = 1;
       i += add_tensor_component(arg[iarg], i, &ComputeRHEOPropertyAtom::pack_total_stress) - 1;
     } else if (strcmp(arg[iarg], "energy") == 0) {
       avec_index[i] = atom->avec->property_atom("esph");
@@ -202,16 +204,22 @@ void ComputeRHEOPropertyAtom::setup()
 {
   if (thermal_flag) {
     auto fixes = modify->get_fix_by_style("rheo/thermal");
+    if (fixes.empty())
+      error->all(FLERR, "Cannot request thermal property without fix rheo/thermal");
     fix_thermal = dynamic_cast<FixRHEOThermal *>(fixes[0]);
   }
 
   if (pressure_flag) {
     auto fixes = modify->get_fix_by_style("rheo/pressure");
+    if (fixes.empty())
+      error->all(FLERR, "Cannot request pressure property without fix rheo/pressure");
     fix_pressure = dynamic_cast<FixRHEOPressure *>(fixes[0]);
   }
 
   if (shell_flag) {
     auto fixes = modify->get_fix_by_style("rheo/oxidation");
+    if (fixes.empty())
+      error->all(FLERR, "Cannot request nbond/shell without fix rheo/oxidation");
     fix_oxidation = dynamic_cast<FixRHEOOxidation *>(fixes[0]);
   }
 }
diff --git a/src/RIGID/fix_rigid.cpp b/src/RIGID/fix_rigid.cpp
index 08a4ab2103b..08b08695b52 100644
--- a/src/RIGID/fix_rigid.cpp
+++ b/src/RIGID/fix_rigid.cpp
@@ -757,6 +757,10 @@ void FixRigid::init()
     gvec = (double *) ifix->extract("gvec", tmp);
   }
 
+  // superellipsoids are not supported
+
+  if (atom->superellipsoid_flag) error->all(FLERR,"Superellipsoids not supported in fix rigid");
+
   // timestep info
 
   dtv = update->dt;
diff --git a/src/RIGID/fix_rigid_small.cpp b/src/RIGID/fix_rigid_small.cpp
index 441d7fb674c..01f543f77b6 100644
--- a/src/RIGID/fix_rigid_small.cpp
+++ b/src/RIGID/fix_rigid_small.cpp
@@ -580,6 +580,10 @@ void FixRigidSmall::init()
     gvec = (double *) ifix->extract("gvec", tmp);
   }
 
+  // superellipsoids are not supported
+
+  if (atom->superellipsoid_flag) error->all(FLERR,"Superellipsoids not supported in fix rigid/small");
+
   // timestep info
 
   dtv = update->dt;
diff --git a/src/SRD/fix_srd.cpp b/src/SRD/fix_srd.cpp
index 5b6bd5988ce..ca45b211e91 100644
--- a/src/SRD/fix_srd.cpp
+++ b/src/SRD/fix_srd.cpp
@@ -358,7 +358,8 @@ void FixSRD::init()
     error->all(FLERR, Error::NOLASTLINE, "Cannot change timestep once fix srd is set up");
   if (comm->style != Comm::BRICK)
     error->all(FLERR, Error::NOLASTLINE, "Fix srd currently only be used with comm_style brick");
-
+  if (atom->superellipsoid_flag)
+    error->all(FLERR, Error::NOLASTLINE, "Fix srd does not currently support superellipsoids");
   // orthogonal vs triclinic simulation box
   // could be static or shearing box
 
diff --git a/src/atom.cpp b/src/atom.cpp
index fec59cd5d6c..f372402d848 100644
--- a/src/atom.cpp
+++ b/src/atom.cpp
@@ -645,7 +645,7 @@ void Atom::set_atomflag_defaults()
   // identical list as 2nd customization in atom.h
 
   labelmapflag = 0;
-  ellipsoid_flag = line_flag = tri_flag = body_flag = 0;
+  ellipsoid_flag = line_flag = tri_flag = body_flag = superellipsoid_flag = 0;
   quat_flag = 0;
   peri_flag = electron_flag = sph_flag = 0;
   molecule_flag = molindex_flag = molatom_flag = 0;
diff --git a/src/atom.h b/src/atom.h
index 5aa8cf0f507..6f782d7a595 100644
--- a/src/atom.h
+++ b/src/atom.h
@@ -191,7 +191,7 @@ class Atom : protected Pointers {
   // 1 if variable is used, 0 if not
 
   int labelmapflag, types_style;
-  int ellipsoid_flag, line_flag, tri_flag, body_flag;
+  int ellipsoid_flag, line_flag, tri_flag, body_flag, superellipsoid_flag;
   int peri_flag, electron_flag, sph_flag;
 
   int molecule_flag, molindex_flag, molatom_flag;
diff --git a/src/atom_vec_ellipsoid.cpp b/src/atom_vec_ellipsoid.cpp
index 84af470409e..b806b0d9c25 100644
--- a/src/atom_vec_ellipsoid.cpp
+++ b/src/atom_vec_ellipsoid.cpp
@@ -29,13 +29,14 @@
 #include <cstring>
 
 using namespace LAMMPS_NS;
-using MathConst::MY_PI;
+
+static constexpr double EPSILON_BLOCK = 1.0e-3;
 
 /* ---------------------------------------------------------------------- */
 
 AtomVecEllipsoid::AtomVecEllipsoid(LAMMPS *lmp) :
     AtomVec(lmp), bonus(nullptr), ellipsoid(nullptr), rmass(nullptr), angmom(nullptr),
-    quat_hold(nullptr)
+    quat_hold(nullptr), bonus_super(nullptr)
 {
   molecular = Atom::ATOMIC;
   bonus_flag = 1;
@@ -46,6 +47,7 @@ AtomVecEllipsoid::AtomVecEllipsoid(LAMMPS *lmp) :
   size_data_bonus = 8;
 
   atom->ellipsoid_flag = 1;
+  atom->superellipsoid_flag = 0;
   atom->rmass_flag = atom->angmom_flag = atom->torque_flag = 1;
 
   nlocal_bonus = nghost_bonus = nmax_bonus = 0;
@@ -74,7 +76,10 @@ AtomVecEllipsoid::AtomVecEllipsoid(LAMMPS *lmp) :
 
 AtomVecEllipsoid::~AtomVecEllipsoid()
 {
-  memory->sfree(bonus);
+  if (atom->superellipsoid_flag)
+    memory->sfree(bonus_super);
+  else
+    memory->sfree(bonus);
 }
 
 /* ----------------------------------------------------------------------
@@ -87,6 +92,7 @@ void AtomVecEllipsoid::grow_pointers()
   ellipsoid = atom->ellipsoid;
   rmass = atom->rmass;
   angmom = atom->angmom;
+  if (atom->superellipsoid_flag) radius = atom->radius;
 }
 
 /* ----------------------------------------------------------------------
@@ -98,7 +104,12 @@ void AtomVecEllipsoid::grow_bonus()
   nmax_bonus = grow_nmax_bonus(nmax_bonus);
   if (nmax_bonus < 0) error->one(FLERR, "Per-processor system is too big");
 
-  bonus = (Bonus *) memory->srealloc(bonus, nmax_bonus * sizeof(Bonus), "atom:bonus");
+  if (atom->superellipsoid_flag) {
+    bonus_super = (BonusSuper *) memory->srealloc(bonus_super, nmax_bonus * sizeof(BonusSuper),
+                                                  "atom:bonus_super");
+  } else {
+    bonus = (Bonus *) memory->srealloc(bonus, nmax_bonus * sizeof(Bonus), "atom:bonus");
+  }
 }
 
 /* ----------------------------------------------------------------------
@@ -117,7 +128,10 @@ void AtomVecEllipsoid::copy_bonus(int i, int j, int delflag)
   // if atom I has bonus data, reset I's bonus.ilocal to loc J
   // do NOT do this if self-copy (I=J) since I's bonus data is already deleted
 
-  if (ellipsoid[i] >= 0 && i != j) bonus[ellipsoid[i]].ilocal = j;
+  if (atom->superellipsoid_flag) {
+    if (ellipsoid[i] >= 0 && i != j) bonus_super[ellipsoid[i]].ilocal = j;
+  } else if (ellipsoid[i] >= 0 && i != j)
+    bonus[ellipsoid[i]].ilocal = j;
   ellipsoid[j] = ellipsoid[i];
 }
 
@@ -128,8 +142,13 @@ void AtomVecEllipsoid::copy_bonus(int i, int j, int delflag)
 
 void AtomVecEllipsoid::copy_bonus_all(int i, int j)
 {
-  ellipsoid[bonus[i].ilocal] = j;
-  memcpy(&bonus[j], &bonus[i], sizeof(Bonus));
+  if (atom->superellipsoid_flag) {
+    ellipsoid[bonus_super[i].ilocal] = j;
+    memcpy(&bonus_super[j], &bonus_super[i], sizeof(BonusSuper));
+  } else {
+    ellipsoid[bonus[i].ilocal] = j;
+    memcpy(&bonus[j], &bonus[i], sizeof(Bonus));
+  }
 }
 
 /* ----------------------------------------------------------------------
@@ -147,8 +166,8 @@ void AtomVecEllipsoid::clear_bonus()
 }
 
 /* ---------------------------------------------------------------------- */
-
-int AtomVecEllipsoid::pack_comm_bonus(int n, int *list, double *buf)
+template <bool is_super>
+int AtomVecEllipsoid::pack_comm_bonus_templated(int n, int *list, double *buf)
 {
   int i, j, m;
   double *quat;
@@ -157,7 +176,11 @@ int AtomVecEllipsoid::pack_comm_bonus(int n, int *list, double *buf)
   for (i = 0; i < n; i++) {
     j = list[i];
     if (ellipsoid[j] >= 0) {
-      quat = bonus[ellipsoid[j]].quat;
+      if (is_super) {
+        quat = bonus_super[ellipsoid[j]].quat;
+      } else {
+        quat = bonus[ellipsoid[j]].quat;
+      }
       buf[m++] = quat[0];
       buf[m++] = quat[1];
       buf[m++] = quat[2];
@@ -168,9 +191,18 @@ int AtomVecEllipsoid::pack_comm_bonus(int n, int *list, double *buf)
   return m;
 }
 
-/* ---------------------------------------------------------------------- */
+int AtomVecEllipsoid::pack_comm_bonus(int n, int *list, double *buf)
+{
+  if (atom->superellipsoid_flag) {
+    return pack_comm_bonus_templated<true>(n, list, buf);
+  } else {
+    return pack_comm_bonus_templated<false>(n, list, buf);
+  }
+}
 
-void AtomVecEllipsoid::unpack_comm_bonus(int n, int first, double *buf)
+/* ---------------------------------------------------------------------- */
+template <bool is_super>
+void AtomVecEllipsoid::unpack_comm_bonus_templated(int n, int first, double *buf)
 {
   int i, m, last;
   double *quat;
@@ -179,7 +211,11 @@ void AtomVecEllipsoid::unpack_comm_bonus(int n, int first, double *buf)
   last = first + n;
   for (i = first; i < last; i++) {
     if (ellipsoid[i] >= 0) {
-      quat = bonus[ellipsoid[i]].quat;
+      if (is_super) {
+        quat = bonus_super[ellipsoid[i]].quat;
+      } else {
+        quat = bonus[ellipsoid[i]].quat;
+      }
       quat[0] = buf[m++];
       quat[1] = buf[m++];
       quat[2] = buf[m++];
@@ -188,12 +224,21 @@ void AtomVecEllipsoid::unpack_comm_bonus(int n, int first, double *buf)
   }
 }
 
-/* ---------------------------------------------------------------------- */
+void AtomVecEllipsoid::unpack_comm_bonus(int n, int first, double *buf)
+{
+  if (atom->superellipsoid_flag) {
+    unpack_comm_bonus_templated<true>(n, first, buf);
+  } else {
+    unpack_comm_bonus_templated<false>(n, first, buf);
+  }
+}
 
-int AtomVecEllipsoid::pack_border_bonus(int n, int *list, double *buf)
+/* ---------------------------------------------------------------------- */
+template <bool is_super>
+int AtomVecEllipsoid::pack_border_bonus_templated(int n, int *list, double *buf)
 {
   int i, j, m;
-  double *shape, *quat;
+  double *shape, *quat, *block, *inertia;
 
   m = 0;
   for (i = 0; i < n; i++) {
@@ -202,8 +247,16 @@ int AtomVecEllipsoid::pack_border_bonus(int n, int *list, double *buf)
       buf[m++] = ubuf(0).d;
     else {
       buf[m++] = ubuf(1).d;
-      shape = bonus[ellipsoid[j]].shape;
-      quat = bonus[ellipsoid[j]].quat;
+      if (is_super) {
+        shape = bonus_super[ellipsoid[j]].shape;
+        quat = bonus_super[ellipsoid[j]].quat;
+        block = bonus_super[ellipsoid[j]].block;
+        inertia = bonus_super[ellipsoid[j]].inertia;
+      } else {
+        shape = bonus[ellipsoid[j]].shape;
+        quat = bonus[ellipsoid[j]].quat;
+      }
+
       buf[m++] = shape[0];
       buf[m++] = shape[1];
       buf[m++] = shape[2];
@@ -211,18 +264,35 @@ int AtomVecEllipsoid::pack_border_bonus(int n, int *list, double *buf)
       buf[m++] = quat[1];
       buf[m++] = quat[2];
       buf[m++] = quat[3];
+
+      if (is_super) {
+        buf[m++] = block[0];
+        buf[m++] = block[1];
+        buf[m++] = inertia[0];
+        buf[m++] = inertia[1];
+        buf[m++] = inertia[2];
+      }
     }
   }
 
   return m;
 }
 
-/* ---------------------------------------------------------------------- */
+int AtomVecEllipsoid::pack_border_bonus(int n, int *list, double *buf)
+{
+  if (atom->superellipsoid_flag) {
+    return pack_border_bonus_templated<true>(n, list, buf);
+  } else {
+    return pack_border_bonus_templated<false>(n, list, buf);
+  }
+}
 
-int AtomVecEllipsoid::unpack_border_bonus(int n, int first, double *buf)
+/* ---------------------------------------------------------------------- */
+template <bool is_super>
+int AtomVecEllipsoid::unpack_border_bonus_templated(int n, int first, double *buf)
 {
   int i, j, m, last;
-  double *shape, *quat;
+  double *shape, *quat, *block, *inertia;
 
   m = 0;
   last = first + n;
@@ -232,8 +302,15 @@ int AtomVecEllipsoid::unpack_border_bonus(int n, int first, double *buf)
     else {
       j = nlocal_bonus + nghost_bonus;
       if (j == nmax_bonus) grow_bonus();
-      shape = bonus[j].shape;
-      quat = bonus[j].quat;
+      if (is_super) {
+        shape = bonus_super[j].shape;
+        quat = bonus_super[j].quat;
+        block = bonus_super[j].block;
+        inertia = bonus_super[j].inertia;
+      } else {
+        shape = bonus[j].shape;
+        quat = bonus[j].quat;
+      }
       shape[0] = buf[m++];
       shape[1] = buf[m++];
       shape[2] = buf[m++];
@@ -241,15 +318,36 @@ int AtomVecEllipsoid::unpack_border_bonus(int n, int first, double *buf)
       quat[1] = buf[m++];
       quat[2] = buf[m++];
       quat[3] = buf[m++];
-      bonus[j].ilocal = i;
+      if (is_super) {
+        block[0] = buf[m++];
+        block[1] = buf[m++];
+        inertia[0] = buf[m++];
+        inertia[1] = buf[m++];
+        inertia[2] = buf[m++];
+        // Particle type inferred from block to reduce comm
+        // TODO: is this a good idea or is that not saving much compared to
+        //       passing the flag in the buffer?
+        bonus_super[j].type = determine_type(block);
+        bonus_super[j].ilocal = i;
+      } else {
+        bonus[j].ilocal = i;
+      }
       ellipsoid[i] = j;
       nghost_bonus++;
     }
   }
-
   return m;
 }
 
+int AtomVecEllipsoid::unpack_border_bonus(int n, int first, double *buf)
+{
+  if (atom->superellipsoid_flag) {
+    return unpack_border_bonus_templated<true>(n, first, buf);
+  } else {
+    return unpack_border_bonus_templated<false>(n, first, buf);
+  }
+}
+
 /* ----------------------------------------------------------------------
    pack data for atom I for sending to another proc
    xyz must be 1st 3 values, so comm::exchange() can test on them
@@ -264,17 +362,38 @@ int AtomVecEllipsoid::pack_exchange_bonus(int i, double *buf)
   else {
     buf[m++] = ubuf(1).d;
     int j = ellipsoid[i];
-    double *shape = bonus[j].shape;
-    double *quat = bonus[j].quat;
-    buf[m++] = shape[0];
-    buf[m++] = shape[1];
-    buf[m++] = shape[2];
-    buf[m++] = quat[0];
-    buf[m++] = quat[1];
-    buf[m++] = quat[2];
-    buf[m++] = quat[3];
-  }
+    if (atom->superellipsoid_flag) {
+      double *shape = bonus_super[j].shape;
+      double *quat = bonus_super[j].quat;
+      double *block = bonus_super[j].block;
+      double *inertia = bonus_super[j].inertia;
+
+      buf[m++] = shape[0];
+      buf[m++] = shape[1];
+      buf[m++] = shape[2];
+      buf[m++] = quat[0];
+      buf[m++] = quat[1];
+      buf[m++] = quat[2];
+      buf[m++] = quat[3];
+      buf[m++] = block[0];
+      buf[m++] = block[1];
+      buf[m++] = inertia[0];
+      buf[m++] = inertia[1];
+      buf[m++] = inertia[2];
+
+    } else {
+      double *shape = bonus[j].shape;
+      double *quat = bonus[j].quat;
 
+      buf[m++] = shape[0];
+      buf[m++] = shape[1];
+      buf[m++] = shape[2];
+      buf[m++] = quat[0];
+      buf[m++] = quat[1];
+      buf[m++] = quat[2];
+      buf[m++] = quat[3];
+    }
+  }
   return m;
 }
 
@@ -288,16 +407,38 @@ int AtomVecEllipsoid::unpack_exchange_bonus(int ilocal, double *buf)
     ellipsoid[ilocal] = -1;
   else {
     if (nlocal_bonus == nmax_bonus) grow_bonus();
-    double *shape = bonus[nlocal_bonus].shape;
-    double *quat = bonus[nlocal_bonus].quat;
-    shape[0] = buf[m++];
-    shape[1] = buf[m++];
-    shape[2] = buf[m++];
-    quat[0] = buf[m++];
-    quat[1] = buf[m++];
-    quat[2] = buf[m++];
-    quat[3] = buf[m++];
-    bonus[nlocal_bonus].ilocal = ilocal;
+    if (atom->superellipsoid_flag) {
+      double *shape = bonus_super[nlocal_bonus].shape;
+      double *quat = bonus_super[nlocal_bonus].quat;
+      double *block = bonus_super[nlocal_bonus].block;
+      double *inertia = bonus_super[nlocal_bonus].inertia;
+      BlockType &type = bonus_super[nlocal_bonus].type;
+      shape[0] = buf[m++];
+      shape[1] = buf[m++];
+      shape[2] = buf[m++];
+      quat[0] = buf[m++];
+      quat[1] = buf[m++];
+      quat[2] = buf[m++];
+      quat[3] = buf[m++];
+      block[0] = buf[m++];
+      block[1] = buf[m++];
+      inertia[0] = buf[m++];
+      inertia[1] = buf[m++];
+      inertia[2] = buf[m++];
+      type = determine_type(block);
+      bonus_super[nlocal_bonus].ilocal = ilocal;
+    } else {
+      double *shape = bonus[nlocal_bonus].shape;
+      double *quat = bonus[nlocal_bonus].quat;
+      shape[0] = buf[m++];
+      shape[1] = buf[m++];
+      shape[2] = buf[m++];
+      quat[0] = buf[m++];
+      quat[1] = buf[m++];
+      quat[2] = buf[m++];
+      quat[3] = buf[m++];
+      bonus[nlocal_bonus].ilocal = ilocal;
+    }
     ellipsoid[ilocal] = nlocal_bonus++;
   }
 
@@ -340,13 +481,28 @@ int AtomVecEllipsoid::pack_restart_bonus(int i, double *buf)
   else {
     buf[m++] = ubuf(1).d;
     int j = ellipsoid[i];
-    buf[m++] = bonus[j].shape[0];
-    buf[m++] = bonus[j].shape[1];
-    buf[m++] = bonus[j].shape[2];
-    buf[m++] = bonus[j].quat[0];
-    buf[m++] = bonus[j].quat[1];
-    buf[m++] = bonus[j].quat[2];
-    buf[m++] = bonus[j].quat[3];
+    if (atom->superellipsoid_flag) {
+      buf[m++] = bonus_super[j].shape[0];
+      buf[m++] = bonus_super[j].shape[1];
+      buf[m++] = bonus_super[j].shape[2];
+      buf[m++] = bonus_super[j].quat[0];
+      buf[m++] = bonus_super[j].quat[1];
+      buf[m++] = bonus_super[j].quat[2];
+      buf[m++] = bonus_super[j].quat[3];
+      buf[m++] = bonus_super[j].block[0];
+      buf[m++] = bonus_super[j].block[1];
+      buf[m++] = bonus_super[j].inertia[0];
+      buf[m++] = bonus_super[j].inertia[1];
+      buf[m++] = bonus_super[j].inertia[2];
+    } else {
+      buf[m++] = bonus[j].shape[0];
+      buf[m++] = bonus[j].shape[1];
+      buf[m++] = bonus[j].shape[2];
+      buf[m++] = bonus[j].quat[0];
+      buf[m++] = bonus[j].quat[1];
+      buf[m++] = bonus[j].quat[2];
+      buf[m++] = bonus[j].quat[3];
+    }
   }
 
   return m;
@@ -365,16 +521,38 @@ int AtomVecEllipsoid::unpack_restart_bonus(int ilocal, double *buf)
     ellipsoid[ilocal] = -1;
   else {
     if (nlocal_bonus == nmax_bonus) grow_bonus();
-    double *shape = bonus[nlocal_bonus].shape;
-    double *quat = bonus[nlocal_bonus].quat;
-    shape[0] = buf[m++];
-    shape[1] = buf[m++];
-    shape[2] = buf[m++];
-    quat[0] = buf[m++];
-    quat[1] = buf[m++];
-    quat[2] = buf[m++];
-    quat[3] = buf[m++];
-    bonus[nlocal_bonus].ilocal = ilocal;
+    if (atom->superellipsoid_flag) {
+      double *shape = bonus_super[nlocal_bonus].shape;
+      double *quat = bonus_super[nlocal_bonus].quat;
+      double *block = bonus_super[nlocal_bonus].block;
+      double *inertia = bonus_super[nlocal_bonus].inertia;
+      BlockType &type = bonus_super[nlocal_bonus].type;
+      shape[0] = buf[m++];
+      shape[1] = buf[m++];
+      shape[2] = buf[m++];
+      quat[0] = buf[m++];
+      quat[1] = buf[m++];
+      quat[2] = buf[m++];
+      quat[3] = buf[m++];
+      block[0] = buf[m++];
+      block[1] = buf[m++];
+      inertia[0] = buf[m++];
+      inertia[1] = buf[m++];
+      inertia[2] = buf[m++];
+      type = determine_type(block);
+      bonus_super[nlocal_bonus].ilocal = ilocal;
+    } else {
+      double *shape = bonus[nlocal_bonus].shape;
+      double *quat = bonus[nlocal_bonus].quat;
+      shape[0] = buf[m++];
+      shape[1] = buf[m++];
+      shape[2] = buf[m++];
+      quat[0] = buf[m++];
+      quat[1] = buf[m++];
+      quat[2] = buf[m++];
+      quat[3] = buf[m++];
+      bonus[nlocal_bonus].ilocal = ilocal;
+    }
     ellipsoid[ilocal] = nlocal_bonus++;
   }
 
@@ -391,27 +569,72 @@ void AtomVecEllipsoid::data_atom_bonus(int m, const std::vector<std::string> &va
 
   if (nlocal_bonus == nmax_bonus) grow_bonus();
 
-  double *shape = bonus[nlocal_bonus].shape;
   int ivalue = 1;
+  double shape[3];
   shape[0] = 0.5 * utils::numeric(FLERR, values[ivalue++], true, lmp);
   shape[1] = 0.5 * utils::numeric(FLERR, values[ivalue++], true, lmp);
   shape[2] = 0.5 * utils::numeric(FLERR, values[ivalue++], true, lmp);
   if (shape[0] <= 0.0 || shape[1] <= 0.0 || shape[2] <= 0.0)
     error->one(FLERR, "Invalid shape in Ellipsoids section of data file");
 
-  double *quat = bonus[nlocal_bonus].quat;
+  double quat[4];
   quat[0] = utils::numeric(FLERR, values[ivalue++], true, lmp);
   quat[1] = utils::numeric(FLERR, values[ivalue++], true, lmp);
   quat[2] = utils::numeric(FLERR, values[ivalue++], true, lmp);
   quat[3] = utils::numeric(FLERR, values[ivalue++], true, lmp);
   MathExtra::qnormalize(quat);
 
-  // reset ellipsoid mass
-  // previously stored density in rmass
+  // Blockiness exponents can be given optionally for superellipsoids
+  if (atom->superellipsoid_flag) {
+    // assign shape and quat to bonus data structure
+    BonusSuper *b = &bonus_super[nlocal_bonus];
+    b->shape[0] = shape[0];
+    b->shape[1] = shape[1];
+    b->shape[2] = shape[2];
+    b->quat[0] = quat[0];
+    b->quat[1] = quat[1];
+    b->quat[2] = quat[2];
+    b->quat[3] = quat[3];
+
+    double *block = bonus_super[nlocal_bonus].block;
+    BlockType &type = bonus_super[nlocal_bonus].type;
+    if (ivalue == (int) values.size()) {
+      block[0] = block[1] = 2.0;
+      type = BlockType::ELLIPSOID;
+    } else {
+      block[0] = utils::numeric(FLERR, values[ivalue++], true, lmp);
+      block[1] = utils::numeric(FLERR, values[ivalue++], true, lmp);
+      type = determine_type(block);
+    }
+    // reset ellipsoid mass
+    // previously stored density in rmass
+
+    rmass[m] *= MathExtra::volume_ellipsoid(shape, block, type);
 
-  rmass[m] *= 4.0 * MY_PI / 3.0 * shape[0] * shape[1] * shape[2];
+    // Principal moments of inertia
+
+    inertia_ellipsoid_principal(shape, rmass[m], bonus_super[nlocal_bonus].inertia, block, type);
+
+    radius[m] = radius_ellipsoid(shape, block, type);
+    bonus_super[nlocal_bonus].ilocal = m;
+
+  } else {
+    // assign shape and quat to bonus data structure
+    Bonus *b = &bonus[nlocal_bonus];
+    b->shape[0] = shape[0];
+    b->shape[1] = shape[1];
+    b->shape[2] = shape[2];
+    b->quat[0] = quat[0];
+    b->quat[1] = quat[1];
+    b->quat[2] = quat[2];
+    b->quat[3] = quat[3];
+
+    // reset ellipsoid mass
+    // previously stored density in rmass
+    rmass[m] *= MathExtra::volume_ellipsoid(shape);
+    bonus[nlocal_bonus].ilocal = m;
+  }
 
-  bonus[nlocal_bonus].ilocal = m;
   ellipsoid[m] = nlocal_bonus++;
 }
 
@@ -422,7 +645,10 @@ void AtomVecEllipsoid::data_atom_bonus(int m, const std::vector<std::string> &va
 double AtomVecEllipsoid::memory_usage_bonus()
 {
   double bytes = 0;
-  bytes += nmax_bonus * sizeof(Bonus);
+  if (atom->superellipsoid_flag)
+    bytes += nmax_bonus * sizeof(BonusSuper);
+  else
+    bytes += nmax_bonus * sizeof(Bonus);
   return bytes;
 }
 
@@ -434,6 +660,7 @@ void AtomVecEllipsoid::create_atom_post(int ilocal)
 {
   rmass[ilocal] = 1.0;
   ellipsoid[ilocal] = -1;
+  if (atom->superellipsoid_flag) radius[ilocal] = 0.0;
 }
 
 /* ----------------------------------------------------------------------
@@ -476,8 +703,15 @@ void AtomVecEllipsoid::pack_data_pre(int ilocal)
     ellipsoid[ilocal] = 1;
 
   if (ellipsoid_flag >= 0) {
-    shape = bonus[ellipsoid_flag].shape;
-    rmass[ilocal] /= 4.0 * MY_PI / 3.0 * shape[0] * shape[1] * shape[2];
+    if (atom->superellipsoid_flag) {
+      shape = bonus_super[ellipsoid_flag].shape;
+      double *block = bonus_super[ellipsoid_flag].block;
+      BlockType type = bonus_super[ellipsoid_flag].type;
+      rmass[ilocal] /= MathExtra::volume_ellipsoid(shape, block, type);
+    } else {
+      shape = bonus[ellipsoid_flag].shape;
+      rmass[ilocal] /= MathExtra::volume_ellipsoid(shape);
+    }
   }
 }
 
@@ -509,13 +743,25 @@ int AtomVecEllipsoid::pack_data_bonus(double *buf, int /*flag*/)
     if (buf) {
       buf[m++] = ubuf(tag[i]).d;
       j = ellipsoid[i];
-      buf[m++] = 2.0 * bonus[j].shape[0];
-      buf[m++] = 2.0 * bonus[j].shape[1];
-      buf[m++] = 2.0 * bonus[j].shape[2];
-      buf[m++] = bonus[j].quat[0];
-      buf[m++] = bonus[j].quat[1];
-      buf[m++] = bonus[j].quat[2];
-      buf[m++] = bonus[j].quat[3];
+      if (atom->superellipsoid_flag) {
+        buf[m++] = 2.0 * bonus_super[j].shape[0];
+        buf[m++] = 2.0 * bonus_super[j].shape[1];
+        buf[m++] = 2.0 * bonus_super[j].shape[2];
+        buf[m++] = bonus_super[j].quat[0];
+        buf[m++] = bonus_super[j].quat[1];
+        buf[m++] = bonus_super[j].quat[2];
+        buf[m++] = bonus_super[j].quat[3];
+        buf[m++] = bonus_super[j].block[0];
+        buf[m++] = bonus_super[j].block[1];
+      } else {
+        buf[m++] = 2.0 * bonus[j].shape[0];
+        buf[m++] = 2.0 * bonus[j].shape[1];
+        buf[m++] = 2.0 * bonus[j].shape[2];
+        buf[m++] = bonus[j].quat[0];
+        buf[m++] = bonus[j].quat[1];
+        buf[m++] = bonus[j].quat[2];
+        buf[m++] = bonus[j].quat[3];
+      }
     } else
       m += size_data_bonus;
   }
@@ -530,10 +776,19 @@ int AtomVecEllipsoid::pack_data_bonus(double *buf, int /*flag*/)
 void AtomVecEllipsoid::write_data_bonus(FILE *fp, int n, double *buf, int /*flag*/)
 {
   int i = 0;
-  while (i < n) {
-    utils::print(fp, "{} {} {} {} {} {} {} {}\n", ubuf(buf[i]).i, buf[i + 1], buf[i + 2], buf[i + 3],
-               buf[i + 4], buf[i + 5], buf[i + 6], buf[i + 7]);
-    i += size_data_bonus;
+  if (atom->superellipsoid_flag) {
+    while (i < n) {
+      utils::print(fp, "{} {} {} {} {} {} {} {} {} {}\n", ubuf(buf[i]).i, buf[i + 1], buf[i + 2],
+                   buf[i + 3], buf[i + 4], buf[i + 5], buf[i + 6], buf[i + 7], buf[i + 8],
+                   buf[i + 9]);
+      i += size_data_bonus;
+    }
+  } else {
+    while (i < n) {
+      utils::print(fp, "{} {} {} {} {} {} {} {}\n", ubuf(buf[i]).i, buf[i + 1], buf[i + 2],
+                   buf[i + 3], buf[i + 4], buf[i + 5], buf[i + 6], buf[i + 7]);
+      i += size_data_bonus;
+    }
   }
 }
 
@@ -552,17 +807,29 @@ void AtomVecEllipsoid::read_data_general_to_restricted(int nlocal_previous, int
   // quat_g2r = quat that rotates from general to restricted triclinic
   // quat_new = ellipsoid quat converted to restricted triclinic
 
-  double quat_g2r[4],quat_new[4];
-  MathExtra::mat_to_quat(domain->rotate_g2r,quat_g2r);
+  double quat_g2r[4], quat_new[4];
+  MathExtra::mat_to_quat(domain->rotate_g2r, quat_g2r);
 
-  for (int i = nlocal_previous; i < nlocal; i++) {
-    if (ellipsoid[i] < 0) continue;
-    j = ellipsoid[i];
-    MathExtra::quatquat(quat_g2r,bonus[j].quat,quat_new);
-    bonus[j].quat[0] = quat_new[0];
-    bonus[j].quat[1] = quat_new[1];
-    bonus[j].quat[2] = quat_new[2];
-    bonus[j].quat[3] = quat_new[3];
+  if (atom->superellipsoid_flag) {
+    for (int i = nlocal_previous; i < nlocal; i++) {
+      if (ellipsoid[i] < 0) continue;
+      j = ellipsoid[i];
+      MathExtra::quatquat(quat_g2r, bonus_super[j].quat, quat_new);
+      bonus_super[j].quat[0] = quat_new[0];
+      bonus_super[j].quat[1] = quat_new[1];
+      bonus_super[j].quat[2] = quat_new[2];
+      bonus_super[j].quat[3] = quat_new[3];
+    }
+  } else {
+    for (int i = nlocal_previous; i < nlocal; i++) {
+      if (ellipsoid[i] < 0) continue;
+      j = ellipsoid[i];
+      MathExtra::quatquat(quat_g2r, bonus[j].quat, quat_new);
+      bonus[j].quat[0] = quat_new[0];
+      bonus[j].quat[1] = quat_new[1];
+      bonus[j].quat[2] = quat_new[2];
+      bonus[j].quat[3] = quat_new[3];
+    }
   }
 }
 
@@ -576,23 +843,32 @@ void AtomVecEllipsoid::write_data_restricted_to_general()
 {
   AtomVec::write_data_restricted_to_general();
 
-  memory->create(quat_hold,nlocal_bonus,4,"atomvec:quat_hold");
+  memory->create(quat_hold, nlocal_bonus, 4, "atomvec:quat_hold");
 
-  for (int i = 0; i < nlocal_bonus; i++)
-    memcpy(quat_hold[i],bonus[i].quat,4*sizeof(double));
+  for (int i = 0; i < nlocal_bonus; i++) memcpy(quat_hold[i], atom->superellipsoid_flag ? bonus_super[i].quat : bonus[i].quat, 4 * sizeof(double));
 
   // quat_r2g = quat that rotates from restricted to general triclinic
   // quat_new = ellipsoid quat converted to general triclinic
 
-  double quat_r2g[4],quat_new[4];
-  MathExtra::mat_to_quat(domain->rotate_r2g,quat_r2g);
+  double quat_r2g[4], quat_new[4];
+  MathExtra::mat_to_quat(domain->rotate_r2g, quat_r2g);
+  if (atom->superellipsoid_flag) {
 
-  for (int i = 0; i < nlocal_bonus; i++) {
-    MathExtra::quatquat(quat_r2g,bonus[i].quat,quat_new);
-    bonus[i].quat[0] = quat_new[0];
-    bonus[i].quat[1] = quat_new[1];
-    bonus[i].quat[2] = quat_new[2];
-    bonus[i].quat[3] = quat_new[3];
+    for (int i = 0; i < nlocal_bonus; i++) {
+      MathExtra::quatquat(quat_r2g, bonus_super[i].quat, quat_new);
+      bonus_super[i].quat[0] = quat_new[0];
+      bonus_super[i].quat[1] = quat_new[1];
+      bonus_super[i].quat[2] = quat_new[2];
+      bonus_super[i].quat[3] = quat_new[3];
+    }
+  } else {
+    for (int i = 0; i < nlocal_bonus; i++) {
+      MathExtra::quatquat(quat_r2g, bonus[i].quat, quat_new);
+      bonus[i].quat[0] = quat_new[0];
+      bonus[i].quat[1] = quat_new[1];
+      bonus[i].quat[2] = quat_new[2];
+      bonus[i].quat[3] = quat_new[3];
+    }
   }
 }
 
@@ -606,10 +882,12 @@ void AtomVecEllipsoid::write_data_restricted_to_general()
 void AtomVecEllipsoid::write_data_restore_restricted()
 {
   AtomVec::write_data_restore_restricted();
-
-  for (int i = 0; i < nlocal_bonus; i++)
-    memcpy(bonus[i].quat,quat_hold[i],4*sizeof(double));
-
+  if (atom->superellipsoid_flag) {
+    for (int i = 0; i < nlocal_bonus; i++)
+      memcpy(bonus_super[i].quat, quat_hold[i], 4 * sizeof(double));
+  } else {
+    for (int i = 0; i < nlocal_bonus; i++) memcpy(bonus[i].quat, quat_hold[i], 4 * sizeof(double));
+  }
   memory->destroy(quat_hold);
   quat_hold = nullptr;
 }
@@ -625,25 +903,215 @@ void AtomVecEllipsoid::set_shape(int i, double shapex, double shapey, double sha
   if (ellipsoid[i] < 0) {
     if (shapex == 0.0 && shapey == 0.0 && shapez == 0.0) return;
     if (nlocal_bonus == nmax_bonus) grow_bonus();
-    double *shape = bonus[nlocal_bonus].shape;
-    double *quat = bonus[nlocal_bonus].quat;
-    shape[0] = shapex;
-    shape[1] = shapey;
-    shape[2] = shapez;
-    quat[0] = 1.0;
-    quat[1] = 0.0;
-    quat[2] = 0.0;
-    quat[3] = 0.0;
-    bonus[nlocal_bonus].ilocal = i;
+    if (atom->superellipsoid_flag) {
+      double *shape = bonus_super[nlocal_bonus].shape;
+      double *quat = bonus_super[nlocal_bonus].quat;
+      double *block = bonus_super[nlocal_bonus].block;
+      double *inertia = bonus_super[nlocal_bonus].inertia;
+      BlockType &type = bonus_super[nlocal_bonus].type;
+      shape[0] = shapex;
+      shape[1] = shapey;
+      shape[2] = shapez;
+      quat[0] = 1.0;
+      quat[1] = 0.0;
+      quat[2] = 0.0;
+      quat[3] = 0.0;
+      block[0] = 2;
+      block[1] = 2;
+      type = BlockType::ELLIPSOID;
+      inertia_ellipsoid_principal(shape, rmass[i], inertia, block, type);
+      radius[i] = radius_ellipsoid(shape, block, type);
+      bonus_super[nlocal_bonus].ilocal = i;
+    } else {
+      double *shape = bonus[nlocal_bonus].shape;
+      double *quat = bonus[nlocal_bonus].quat;
+      shape[0] = shapex;
+      shape[1] = shapey;
+      shape[2] = shapez;
+      quat[0] = 1.0;
+      quat[1] = 0.0;
+      quat[2] = 0.0;
+      quat[3] = 0.0;
+      bonus[nlocal_bonus].ilocal = i;
+    }
     ellipsoid[i] = nlocal_bonus++;
   } else if (shapex == 0.0 && shapey == 0.0 && shapez == 0.0) {
     copy_bonus_all(nlocal_bonus - 1, ellipsoid[i]);
     nlocal_bonus--;
     ellipsoid[i] = -1;
+    if (atom->superellipsoid_flag) radius[i] = 0.0;
   } else {
-    double *shape = bonus[ellipsoid[i]].shape;
-    shape[0] = shapex;
-    shape[1] = shapey;
-    shape[2] = shapez;
+    if (atom->superellipsoid_flag) {
+      double *shape = bonus_super[ellipsoid[i]].shape;
+      double *block = bonus_super[ellipsoid[i]].block;
+      double *inertia = bonus_super[ellipsoid[i]].inertia;
+      BlockType type = bonus_super[ellipsoid[i]].type;
+      shape[0] = shapex;
+      shape[1] = shapey;
+      shape[2] = shapez;
+      inertia_ellipsoid_principal(shape, rmass[i], inertia, block, type);
+      radius[i] = radius_ellipsoid(shape, block, type);
+    } else {
+      double *shape = bonus[ellipsoid[i]].shape;
+      shape[0] = shapex;
+      shape[1] = shapey;
+      shape[2] = shapez;
+    }
+  }
+}
+
+/* ----------------------------------------------------------------------
+   set block values in bonus data for particle I
+   oriented aligned with xyz axes
+   this may create entry in bonus data
+------------------------------------------------------------------------- */
+
+void AtomVecEllipsoid::set_block(int i, double blockn1, double blockn2)
+{
+  if (ellipsoid[i] < 0) {
+    if (nlocal_bonus == nmax_bonus) grow_bonus();
+    double *shape = bonus_super[nlocal_bonus].shape;
+    double *quat = bonus_super[nlocal_bonus].quat;
+    double *block = bonus_super[nlocal_bonus].block;
+    double *inertia = bonus_super[nlocal_bonus].inertia;
+    BlockType &type = bonus_super[nlocal_bonus].type;
+    shape[0] = 0.5;
+    shape[1] = 0.5;
+    shape[2] = 0.5;
+    block[0] = blockn1;
+    block[1] = blockn2;
+    quat[0] = 1.0;
+    quat[1] = 0.0;
+    quat[2] = 0.0;
+    quat[3] = 0.0;
+    bonus_super[nlocal_bonus].ilocal = i;
+    type = determine_type(block);
+    inertia_ellipsoid_principal(shape, rmass[i], inertia, block, type);
+    radius[i] = radius_ellipsoid(shape, block, type);
+    ellipsoid[i] = nlocal_bonus++;
+  } else {
+    double *shape = bonus_super[ellipsoid[i]].shape;
+    double *block = bonus_super[ellipsoid[i]].block;
+    double *inertia = bonus_super[ellipsoid[i]].inertia;
+    BlockType &type = bonus_super[ellipsoid[i]].type;
+    block[0] = blockn1;
+    block[1] = blockn2;
+    type = determine_type(block);
+    inertia_ellipsoid_principal(shape, rmass[i], inertia, block, type);
+    radius[i] = radius_ellipsoid(shape, block, type);
+  }
+}
+
+AtomVecEllipsoid::BlockType AtomVecEllipsoid::determine_type(double *block)
+{
+  BlockType flag(BlockType::GENERAL);
+  if ((std::fabs(block[0] - 2) <= EPSILON_BLOCK) && (std::fabs(block[1] - 2) <= EPSILON_BLOCK))
+    flag = BlockType::ELLIPSOID;
+  else if (std::fabs(block[0] - block[1]) <= EPSILON_BLOCK)
+    flag = BlockType::N1_EQUAL_N2;
+  return flag;
+}
+
+double AtomVecEllipsoid::radius_ellipsoid(double *shape, double *block, BlockType flag_type)
+{
+  if (flag_type == BlockType::ELLIPSOID) return std::max(std::max(shape[0], shape[1]), shape[2]);
+
+  // Superellipsoid
+  double a = shape[0], b = shape[1], c = shape[2];
+  double n1 = block[0], n2 = block[1];
+  if (shape[0] < shape[1]) {
+    a = shape[1];
+    b = shape[0];
+  }
+
+  // Cylinder approximation for n2=2
+
+  if (n2 < 2.0 + EPSILON_BLOCK) return sqrt(a * a + c * c);
+
+  // Ellipsoid approximation for n1=2
+
+  if (n1 < 2.0 + EPSILON_BLOCK) return std::max(c, sqrt(a * a + b * b));
+
+  // Bounding box approximation when n1>2 and n2>2
+
+  return sqrt(a * a + b * b + c * c);
+
+  // General superellipsoid, Eq. (12) of Podlozhnyuk et al. 2017
+  // Unclear whether the exact solution is worth the cost compared to the bounding box diagonal.
+  // If both blockiness exponents are greater than 2, the exact radius does not
+  // seem significantly smaller than the bounding box diagonal: at most a factor sqrt(3), ~73% larger
+  /*
+  double x, y, z, alpha, beta, gamma, xtilde;
+  double small = 0.1; // TO AVOID OVERFLOW IN POW
+
+  alpha = std::fabs(n2 - 2.0) > small ? std::pow(b / a, 2.0 / (n2 - 2.0)) : 0.0;
+  gamma = std::fabs(n1divn2 - 1.0) > small ? std::pow((1.0 + std::pow(alpha, n2)), n1divn2 - 1.0) : 1.0;
+  beta = std::pow(gamma * c * c / (a * a), 1.0 / std::max(n1 - 2.0, small));
+  xtilde = 1.0 / std::pow(std::pow(1.0 + std::pow(alpha, n2), n1divn2) + std::pow(beta, n1), 1.0 / n1);
+  x = a * xtilde;
+  y = alpha * b * xtilde;
+  z = beta * c * xtilde;
+  return sqrt(x * x + y * y + z * z);
+  */
+}
+
+void AtomVecEllipsoid::inertia_ellipsoid_principal(double *shape, double mass, double *idiag,
+                                                   double *block, BlockType flag_type)
+{
+  double rsq0 = shape[0] * shape[0];
+  double rsq1 = shape[1] * shape[1];
+  double rsq2 = shape[2] * shape[2];
+  if (flag_type == BlockType::ELLIPSOID) {
+    double dens = 0.2 * mass;
+    idiag[0] = dens * (rsq1 + rsq2);
+    idiag[1] = dens * (rsq0 + rsq2);
+    idiag[2] = dens * (rsq0 + rsq1);
+  } else {
+    // superellipsoid, Eq. (12) of Jaklic and Solina, 2003
+    double e1 = 2.0 / block[0], e2 = 2.0 / block[1];
+    double beta_tmp1 = MathExtra::beta(0.5 * e1, 1 + 2 * e1);
+    double beta_tmp2 = MathExtra::beta(0.5 * e2, 0.5 * e2);
+    double beta_tmp3 = MathExtra::beta(0.5 * e2, 1.5 * e2);
+    double dens = mass / (MathExtra::beta(0.5 * e1, 1.0 + e1) * beta_tmp2);
+    double m0 = 0.5 * rsq0 * beta_tmp1 * beta_tmp3;
+    double m1 = 0.5 * rsq1 * beta_tmp1 * beta_tmp3;
+    double m2 = rsq2 * MathExtra::beta(1.5 * e1, 1 + e1) * beta_tmp2;
+    idiag[0] = dens * (m1 + m2);
+    idiag[1] = dens * (m0 + m2);
+    idiag[2] = dens * (m0 + m1);
+  }
+}
+
+void AtomVecEllipsoid::process_args(int narg, char **arg)
+{
+  if (narg == 0) return;
+
+  int iarg = 0;
+  while (iarg < narg) {
+    if (strcmp(arg[iarg], "superellipsoid") == 0) {
+      atom->superellipsoid_flag = 1;
+      // Circumscribed radius, not physical radius
+      atom->radius_flag = 1;
+
+      // Bonus buffer sizes: flag/tag + shape(3) + quat(4) + block(2) [+ inertia(3) for border/restart]
+      size_border_bonus = 13;
+      size_restart_bonus_one = 13;
+      size_data_bonus = 10;
+
+      // Add radius to the arrays for communication
+      fields_grow.push_back("radius");
+      fields_copy.push_back("radius");
+      fields_border.push_back("radius");
+      fields_border_vel.push_back("radius");
+      fields_exchange.push_back("radius");
+      fields_restart.push_back("radius");
+      fields_create.push_back("radius");
+
+      setup_fields();
+
+      iarg++;
+    } else {
+      error->all(FLERR, fmt::format("Unknown atom_style ellipsoid argument: {}", arg[iarg]));
+    }
   }
 }
diff --git a/src/atom_vec_ellipsoid.h b/src/atom_vec_ellipsoid.h
index 666f9cbc42f..c9a6435b858 100644
--- a/src/atom_vec_ellipsoid.h
+++ b/src/atom_vec_ellipsoid.h
@@ -26,6 +26,11 @@ namespace LAMMPS_NS {
 
 class AtomVecEllipsoid : virtual public AtomVec {
  public:
+  enum BlockType {
+    ELLIPSOID = 0, // n1 = n2 = 2
+    N1_EQUAL_N2 = 1, // n1 = n2 > 2
+    GENERAL = 2, // n1 != n2, n1 > 2, n2 > 2
+  };
   struct Bonus {
     double shape[3];
     double quat[4];
@@ -33,6 +38,13 @@ class AtomVecEllipsoid : virtual public AtomVec {
   };
   struct Bonus *bonus;
 
+  struct BonusSuper : public Bonus {
+    double block[2];
+    double inertia[3];
+    BlockType type;
+  };
+  struct BonusSuper *bonus_super;
+
   AtomVecEllipsoid(class LAMMPS *);
   ~AtomVecEllipsoid() override;
 
@@ -66,12 +78,13 @@ class AtomVecEllipsoid : virtual public AtomVec {
   // unique to AtomVecEllipsoid
 
   void set_shape(int, double, double, double);
+  void set_block(int, double, double);
 
   int nlocal_bonus;
 
  protected:
   int *ellipsoid;
-  double *rmass;
+  double *radius, *rmass;
   double **angmom;
   double **quat_hold;
 
@@ -81,6 +94,28 @@ class AtomVecEllipsoid : virtual public AtomVec {
 
   virtual void grow_bonus();
   void copy_bonus_all(int, int);
+
+  static BlockType determine_type(double *);
+  static double radius_ellipsoid(double *, double *, BlockType);
+  static void inertia_ellipsoid_principal(double *, double, double *,
+                                   double *block, BlockType);
+
+
+  template <bool is_super>
+  int pack_comm_bonus_templated(int, int *, double *);
+
+  template <bool is_super>
+  void unpack_comm_bonus_templated(int, int, double *);
+
+  template <bool is_super>
+  int pack_border_bonus_templated(int, int *, double *);
+
+  template <bool is_super>
+  int unpack_border_bonus_templated(int, int, double *);
+
+  void process_args(int, char **) override;
+
+
 };
 
 }    // namespace LAMMPS_NS
diff --git a/src/atom_vec_sphere.cpp b/src/atom_vec_sphere.cpp
index 3c7be5d3ee4..6f670911100 100644
--- a/src/atom_vec_sphere.cpp
+++ b/src/atom_vec_sphere.cpp
@@ -109,7 +109,7 @@ void AtomVecSphere::grow_pointers()
 void AtomVecSphere::create_atom_post(int ilocal)
 {
   radius[ilocal] = 0.5;
-  rmass[ilocal] = 4.0 * MY_PI / 3.0 * 0.5 * 0.5 * 0.5;
+  rmass[ilocal] = MY_4PI3 * 0.5 * 0.5 * 0.5;
 }
 
 /* ----------------------------------------------------------------------
@@ -121,7 +121,7 @@ void AtomVecSphere::data_atom_post(int ilocal)
 {
   radius_one = 0.5 * atom->radius[ilocal];
   radius[ilocal] = radius_one;
-  if (radius_one > 0.0) rmass[ilocal] *= 4.0 * MY_PI / 3.0 * radius_one * radius_one * radius_one;
+  if (radius_one > 0.0) rmass[ilocal] *= MY_4PI3 * radius_one * radius_one * radius_one;
 
   if (rmass[ilocal] <= 0.0) error->one(FLERR, "Invalid density in Atoms section of data file");
 
@@ -141,7 +141,7 @@ void AtomVecSphere::pack_data_pre(int ilocal)
 
   radius[ilocal] *= 2.0;
   if (radius_one != 0.0)
-    rmass[ilocal] = rmass_one / (4.0 * MY_PI / 3.0 * radius_one * radius_one * radius_one);
+    rmass[ilocal] = rmass_one / (MY_4PI3 * radius_one * radius_one * radius_one);
 }
 
 /* ----------------------------------------------------------------------
diff --git a/src/compute_property_atom.cpp b/src/compute_property_atom.cpp
index a2f7e4025d0..e7a0c6292af 100644
--- a/src/compute_property_atom.cpp
+++ b/src/compute_property_atom.cpp
@@ -239,6 +239,14 @@ ComputePropertyAtom::ComputePropertyAtom(LAMMPS *lmp, int narg, char **arg) :
         error->all(FLERR,"Compute property/atom {} requires atom style ellipsoid", arg[iarg]);
       pack_choice[i] = &ComputePropertyAtom::pack_shapez;
 
+    } else if (strcmp(arg[iarg],"block1") == 0) {
+      if (!avec_ellipsoid || !atom->superellipsoid_flag)
+        error->all(FLERR,"Compute property/atom {} requires atom style ellipsoid with the superellipsoid keyword", arg[iarg]);
+      pack_choice[i] = &ComputePropertyAtom::pack_block1;
+    } else if (strcmp(arg[iarg],"block2") == 0) {
+      if (!avec_ellipsoid || !atom->superellipsoid_flag)
+        error->all(FLERR,"Compute property/atom {} requires atom style ellipsoid with the superellipsoid keyword", arg[iarg]);
+      pack_choice[i] = &ComputePropertyAtom::pack_block2;
     } else if (strcmp(arg[iarg],"quatw") == 0) {
       if (!avec_ellipsoid && !avec_body && !atom->quat_flag)
         error->all(FLERR,"Compute property/atom {} is not available", arg[iarg]);
@@ -255,7 +263,18 @@ ComputePropertyAtom::ComputePropertyAtom(LAMMPS *lmp, int narg, char **arg) :
       if (!avec_ellipsoid && !avec_body && !atom->quat_flag)
         error->all(FLERR,"Compute property/atom {} is not available", arg[iarg]);
       pack_choice[i] = &ComputePropertyAtom::pack_quatk;
-
+    } else if (strcmp(arg[iarg],"inertiax") == 0) {
+      if (!avec_ellipsoid || !atom->superellipsoid_flag)
+        error->all(FLERR,"Compute property/atom {} requires atom style ellipsoid with the superellipsoid keyword", arg[iarg]);
+      pack_choice[i] = &ComputePropertyAtom::pack_inertiax;
+    } else if (strcmp(arg[iarg],"inertiay") == 0) {
+      if (!avec_ellipsoid || !atom->superellipsoid_flag)
+        error->all(FLERR,"Compute property/atom {} requires atom style ellipsoid with the superellipsoid keyword", arg[iarg]);
+      pack_choice[i] = &ComputePropertyAtom::pack_inertiay;
+    } else if (strcmp(arg[iarg],"inertiaz") == 0) {
+      if (!avec_ellipsoid || !atom->superellipsoid_flag)
+        error->all(FLERR,"Compute property/atom {} requires atom style ellipsoid with the superellipsoid keyword", arg[iarg]);
+      pack_choice[i] = &ComputePropertyAtom::pack_inertiaz;
     } else if (strcmp(arg[iarg],"tqx") == 0) {
       if (!atom->torque_flag)
         error->all(FLERR,"Compute property/atom {} is not available", arg[iarg]);
@@ -1317,50 +1336,166 @@ void ComputePropertyAtom::pack_angmomz(int n)
 
 void ComputePropertyAtom::pack_shapex(int n)
 {
-  AtomVecEllipsoid::Bonus *bonus = avec_ellipsoid->bonus;
   int *ellipsoid = atom->ellipsoid;
   int *mask = atom->mask;
   int nlocal = atom->nlocal;
 
-  for (int i = 0; i < nlocal; i++) {
+  if (atom->superellipsoid_flag) {
+    AtomVecEllipsoid::BonusSuper *bonus = avec_ellipsoid->bonus_super;
+    for (int i = 0; i < nlocal; i++) {
     if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
       buf[n] = 2.0*bonus[ellipsoid[i]].shape[0];
     else buf[n] = 1.0;
     n += nvalues;
   }
+  } else {
+    AtomVecEllipsoid::Bonus *bonus = avec_ellipsoid->bonus;
+    for (int i = 0; i < nlocal; i++) {
+      if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
+        buf[n] = 2.0*bonus[ellipsoid[i]].shape[0];
+      else buf[n] = 1.0;
+      n += nvalues;
+    }
+  }
 }
 
 /* ---------------------------------------------------------------------- */
 
 void ComputePropertyAtom::pack_shapey(int n)
 {
-  AtomVecEllipsoid::Bonus *bonus = avec_ellipsoid->bonus;
   int *ellipsoid = atom->ellipsoid;
   int *mask = atom->mask;
   int nlocal = atom->nlocal;
 
-  for (int i = 0; i < nlocal; i++) {
+  if (atom->superellipsoid_flag) {
+    AtomVecEllipsoid::BonusSuper *bonus = avec_ellipsoid->bonus_super;
+    for (int i = 0; i < nlocal; i++) {
     if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
       buf[n] = 2.0*bonus[ellipsoid[i]].shape[1];
     else buf[n] = 1.0;
     n += nvalues;
   }
+  } else {
+    AtomVecEllipsoid::Bonus *bonus = avec_ellipsoid->bonus;
+    for (int i = 0; i < nlocal; i++) {
+      if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
+        buf[n] = 2.0*bonus[ellipsoid[i]].shape[1];
+      else buf[n] = 1.0;
+      n += nvalues;
+    }
+  }
 }
 
 /* ---------------------------------------------------------------------- */
 
+
 void ComputePropertyAtom::pack_shapez(int n)
 {
-  AtomVecEllipsoid::Bonus *bonus = avec_ellipsoid->bonus;
   int *ellipsoid = atom->ellipsoid;
   int *mask = atom->mask;
   int nlocal = atom->nlocal;
 
-  for (int i = 0; i < nlocal; i++) {
+  if (atom->superellipsoid_flag) {
+    AtomVecEllipsoid::BonusSuper *bonus = avec_ellipsoid->bonus_super;
+    for (int i = 0; i < nlocal; i++) {
     if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
       buf[n] = 2.0*bonus[ellipsoid[i]].shape[2];
     else buf[n] = 1.0;
     n += nvalues;
+    }
+  } else {
+    AtomVecEllipsoid::Bonus *bonus = avec_ellipsoid->bonus;
+    for (int i = 0; i < nlocal; i++) {
+      if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
+        buf[n] = 2.0*bonus[ellipsoid[i]].shape[2];
+      else buf[n] = 1.0;
+      n += nvalues;
+    }
+  }
+}
+
+/* ---------------------------------------------------------------------- */
+
+void ComputePropertyAtom::pack_block1(int n)
+{
+  AtomVecEllipsoid::BonusSuper *bonus = avec_ellipsoid->bonus_super;
+  int *ellipsoid = atom->ellipsoid;
+  int *mask = atom->mask;
+  int nlocal = atom->nlocal;
+
+  for (int i = 0; i < nlocal; i++) {
+    if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
+      buf[n] = bonus[ellipsoid[i]].block[0];
+    else buf[n] = 2.0;
+    n += nvalues;
+  }
+}
+/* ---------------------------------------------------------------------- */
+
+void ComputePropertyAtom::pack_block2(int n)
+{
+  AtomVecEllipsoid::BonusSuper *bonus = avec_ellipsoid->bonus_super;
+  int *ellipsoid = atom->ellipsoid;
+  int *mask = atom->mask;
+  int nlocal = atom->nlocal;
+
+  for (int i = 0; i < nlocal; i++) {
+    if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
+      buf[n] = bonus[ellipsoid[i]].block[1];
+    else buf[n] = 2.0;
+    n += nvalues;
+  }
+}
+
+/* ---------------------------------------------------------------------- */
+
+/* ---------------------------------------------------------------------- */
+
+void ComputePropertyAtom::pack_inertiax(int n)
+{
+  AtomVecEllipsoid::BonusSuper *bonus = avec_ellipsoid->bonus_super;
+  int *ellipsoid = atom->ellipsoid;
+  int *mask = atom->mask;
+  int nlocal = atom->nlocal;
+
+  for (int i = 0; i < nlocal; i++) {
+    if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
+      buf[n] = bonus[ellipsoid[i]].inertia[0];
+    else buf[n] = 1.0;
+    n += nvalues;
+  }
+}
+/* ---------------------------------------------------------------------- */
+
+void ComputePropertyAtom::pack_inertiay(int n)
+{
+  AtomVecEllipsoid::BonusSuper *bonus = avec_ellipsoid->bonus_super;
+  int *ellipsoid = atom->ellipsoid;
+  int *mask = atom->mask;
+  int nlocal = atom->nlocal;
+
+  for (int i = 0; i < nlocal; i++) {
+    if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
+      buf[n] = bonus[ellipsoid[i]].inertia[1];
+    else buf[n] = 1.0;
+    n += nvalues;
+  }
+}
+
+/* ---------------------------------------------------------------------- */
+
+void ComputePropertyAtom::pack_inertiaz(int n)
+{
+  AtomVecEllipsoid::BonusSuper *bonus = avec_ellipsoid->bonus_super;
+  int *ellipsoid = atom->ellipsoid;
+  int *mask = atom->mask;
+  int nlocal = atom->nlocal;
+
+  for (int i = 0; i < nlocal; i++) {
+    if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
+      buf[n] = bonus[ellipsoid[i]].inertia[2];
+    else buf[n] = 1.0;
+    n += nvalues;
   }
 }
 
@@ -1369,16 +1504,27 @@ void ComputePropertyAtom::pack_shapez(int n)
 void ComputePropertyAtom::pack_quatw(int n)
 {
   if (avec_ellipsoid) {
-    AtomVecEllipsoid::Bonus *bonus = avec_ellipsoid->bonus;
+
     int *ellipsoid = atom->ellipsoid;
     int *mask = atom->mask;
     int nlocal = atom->nlocal;
 
-    for (int i = 0; i < nlocal; i++) {
-      if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
-        buf[n] = bonus[ellipsoid[i]].quat[0];
-      else buf[n] = 1.0;
-      n += nvalues;
+    if (atom->superellipsoid_flag) {
+      AtomVecEllipsoid::BonusSuper *bonus = avec_ellipsoid->bonus_super;
+      for (int i = 0; i < nlocal; i++) {
+        if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
+          buf[n] = bonus[ellipsoid[i]].quat[0];
+        else buf[n] = 1.0;
+        n += nvalues;
+      }
+    } else {
+      AtomVecEllipsoid::Bonus *bonus = avec_ellipsoid->bonus;
+      for (int i = 0; i < nlocal; i++) {
+        if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
+          buf[n] = bonus[ellipsoid[i]].quat[0];
+        else buf[n] = 1.0;
+        n += nvalues;
+      }
     }
 
   } else if (avec_body) {
@@ -1390,7 +1536,7 @@ void ComputePropertyAtom::pack_quatw(int n)
     for (int i = 0; i < nlocal; i++) {
       if ((mask[i] & groupbit) && body[i] >= 0)
         buf[n] = bonus[body[i]].quat[0];
-      else buf[n] = 0.0;
+      else buf[n] = 1.0;
       n += nvalues;
     }
   } else {
@@ -1401,7 +1547,7 @@ void ComputePropertyAtom::pack_quatw(int n)
     for (int i = 0; i < nlocal; i++) {
       if (mask[i] & groupbit)
         buf[n] = quat[i][0];
-      else buf[n] = 0.0;
+      else buf[n] = 1.0;
       n += nvalues;
     }
   }
@@ -1412,16 +1558,27 @@ void ComputePropertyAtom::pack_quatw(int n)
 void ComputePropertyAtom::pack_quati(int n)
 {
   if (avec_ellipsoid) {
-    AtomVecEllipsoid::Bonus *bonus = avec_ellipsoid->bonus;
+
     int *ellipsoid = atom->ellipsoid;
     int *mask = atom->mask;
     int nlocal = atom->nlocal;
 
-    for (int i = 0; i < nlocal; i++) {
-      if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
-        buf[n] = bonus[ellipsoid[i]].quat[1];
-      else buf[n] = 0.0;
-      n += nvalues;
+    if (atom->superellipsoid_flag) {
+      AtomVecEllipsoid::BonusSuper *bonus = avec_ellipsoid->bonus_super;
+      for (int i = 0; i < nlocal; i++) {
+        if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
+          buf[n] = bonus[ellipsoid[i]].quat[1];
+        else buf[n] = 0.0;
+        n += nvalues;
+      }
+    } else {
+      AtomVecEllipsoid::Bonus *bonus = avec_ellipsoid->bonus;
+      for (int i = 0; i < nlocal; i++) {
+        if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
+          buf[n] = bonus[ellipsoid[i]].quat[1];
+        else buf[n] = 0.0;
+        n += nvalues;
+      }
     }
 
   } else if (avec_body) {
@@ -1455,16 +1612,27 @@ void ComputePropertyAtom::pack_quati(int n)
 void ComputePropertyAtom::pack_quatj(int n)
 {
   if (avec_ellipsoid) {
-    AtomVecEllipsoid::Bonus *bonus = avec_ellipsoid->bonus;
+
     int *ellipsoid = atom->ellipsoid;
     int *mask = atom->mask;
     int nlocal = atom->nlocal;
 
-    for (int i = 0; i < nlocal; i++) {
-      if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
-        buf[n] = bonus[ellipsoid[i]].quat[2];
-      else buf[n] = 0.0;
-      n += nvalues;
+    if (atom->superellipsoid_flag) {
+      AtomVecEllipsoid::BonusSuper *bonus = avec_ellipsoid->bonus_super;
+      for (int i = 0; i < nlocal; i++) {
+        if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
+          buf[n] = bonus[ellipsoid[i]].quat[2];
+        else buf[n] = 0.0;
+        n += nvalues;
+      }
+    } else {
+      AtomVecEllipsoid::Bonus *bonus = avec_ellipsoid->bonus;
+      for (int i = 0; i < nlocal; i++) {
+        if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
+          buf[n] = bonus[ellipsoid[i]].quat[2];
+        else buf[n] = 0.0;
+        n += nvalues;
+      }
     }
 
   } else if (avec_body) {
@@ -1498,16 +1666,27 @@ void ComputePropertyAtom::pack_quatj(int n)
 void ComputePropertyAtom::pack_quatk(int n)
 {
   if (avec_ellipsoid) {
-    AtomVecEllipsoid::Bonus *bonus = avec_ellipsoid->bonus;
+
     int *ellipsoid = atom->ellipsoid;
     int *mask = atom->mask;
     int nlocal = atom->nlocal;
 
-    for (int i = 0; i < nlocal; i++) {
-      if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
-        buf[n] = bonus[ellipsoid[i]].quat[3];
-      else buf[n] = 0.0;
-      n += nvalues;
+    if (atom->superellipsoid_flag) {
+      AtomVecEllipsoid::BonusSuper *bonus = avec_ellipsoid->bonus_super;
+      for (int i = 0; i < nlocal; i++) {
+        if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
+          buf[n] = bonus[ellipsoid[i]].quat[3];
+        else buf[n] = 0.0;
+        n += nvalues;
+      }
+    } else {
+      AtomVecEllipsoid::Bonus *bonus = avec_ellipsoid->bonus;
+      for (int i = 0; i < nlocal; i++) {
+        if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
+          buf[n] = bonus[ellipsoid[i]].quat[3];
+        else buf[n] = 0.0;
+        n += nvalues;
+      }
     }
 
   } else if (avec_body) {
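Review note: the pack_shape* and pack_quat* hunks above repeat the same loop once for Bonus and once for BonusSuper. One possible way to avoid the duplication, sketched here under assumptions, is a small function template over the bonus type. The structs Bonus and BonusSuper below are hypothetical stand-ins that model only the members used in this patch, and pack_shape with its parameter list is an illustrative name, not something this change set defines.

```cpp
#include <cassert>

// Hypothetical stand-ins for AtomVecEllipsoid::Bonus and ::BonusSuper.
// Only the members touched by the packing loops are modeled.
struct Bonus {
  double shape[3];
  double quat[4];
};

struct BonusSuper {
  double shape[3];
  double block[2];
  double inertia[3];
  double quat[4];
};

// Pack 2*shape[dim] (the diameter along axis dim) for every selected atom
// that owns bonus data; all other atoms get the default value 1.0.
// The loop body is identical for both bonus types, so one template serves both.
template <typename BonusT>
void pack_shape(const BonusT *bonus, const int *ellipsoid, const int *mask, int groupbit,
                int nlocal, int dim, double *buf, int n, int nvalues)
{
  assert(dim >= 0 && dim < 3);
  for (int i = 0; i < nlocal; i++) {
    if ((mask[i] & groupbit) && ellipsoid[i] >= 0)
      buf[n] = 2.0 * bonus[ellipsoid[i]].shape[dim];
    else
      buf[n] = 1.0;
    n += nvalues;
  }
}
```

With such a helper, each pack_shapeX() would shrink to a single branch on atom->superellipsoid_flag that picks the bonus array and forwards to the template.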
diff --git a/src/compute_property_atom.h b/src/compute_property_atom.h
index c6f4b2fd652..ccd88f45d60 100644
--- a/src/compute_property_atom.h
+++ b/src/compute_property_atom.h
@@ -104,10 +104,15 @@ class ComputePropertyAtom : public Compute {
   void pack_shapex(int);
   void pack_shapey(int);
   void pack_shapez(int);
+  void pack_block1(int);
+  void pack_block2(int);
   void pack_quatw(int);
   void pack_quati(int);
   void pack_quatj(int);
   void pack_quatk(int);
+  void pack_inertiax(int);
+  void pack_inertiay(int);
+  void pack_inertiaz(int);
   void pack_tqx(int);
   void pack_tqy(int);
   void pack_tqz(int);
diff --git a/src/fix_langevin.cpp b/src/fix_langevin.cpp
index 9b637853a53..9f8df1362d9 100644
--- a/src/fix_langevin.cpp
+++ b/src/fix_langevin.cpp
@@ -238,6 +238,11 @@ void FixLangevin::init()
         if (ellipsoid[i] < 0) error->one(FLERR, "Fix langevin angmom requires extended particles");
   }
 
+  // check that superellipsoids are not used
+
+  if (atom->superellipsoid_flag)
+    error->all(FLERR, "Fix langevin does not support superellipsoids");
+
   // set force prefactors
 
   if (!atom->rmass) {
diff --git a/src/fix_move.cpp b/src/fix_move.cpp
index be170aea9e3..051bbf903f3 100644
--- a/src/fix_move.cpp
+++ b/src/fix_move.cpp
@@ -349,9 +349,12 @@ FixMove::FixMove(LAMMPS *lmp, int narg, char **arg) :
     for (int i = 0; i < nlocal; i++) {
       quat = nullptr;
       if (mask[i] & groupbit) {
-        if (ellipsoid_flag && ellipsoid[i] >= 0)
-          quat = avec_ellipsoid->bonus[ellipsoid[i]].quat;
-        else if (tri_flag && tri[i] >= 0)
+        if (ellipsoid_flag && ellipsoid[i] >= 0) {
+          if (atom->superellipsoid_flag)
+            quat = avec_ellipsoid->bonus_super[ellipsoid[i]].quat;
+          else
+            quat = avec_ellipsoid->bonus[ellipsoid[i]].quat;
+        } else if (tri_flag && tri[i] >= 0)
           quat = avec_tri->bonus[tri[i]].quat;
         else if (body_flag && body[i] >= 0)
           quat = avec_body->bonus[body[i]].quat;
@@ -779,8 +782,13 @@ void FixMove::initial_integrate(int /*vflag*/)
           if (angmom_flag) {
             quat = inertia = nullptr;
             if (ellipsoid_flag && ellipsoid[i] >= 0) {
-              quat = avec_ellipsoid->bonus[ellipsoid[i]].quat;
-              shape = avec_ellipsoid->bonus[ellipsoid[i]].shape;
+              if (atom->superellipsoid_flag) {
+                quat = avec_ellipsoid->bonus_super[ellipsoid[i]].quat;
+                shape = avec_ellipsoid->bonus_super[ellipsoid[i]].shape;
+              } else {
+                quat = avec_ellipsoid->bonus[ellipsoid[i]].quat;
+                shape = avec_ellipsoid->bonus[ellipsoid[i]].shape;
+              }
               inertia_ellipsoid[0] =
                   INERTIA * rmass[i] * (shape[1] * shape[1] + shape[2] * shape[2]);
               inertia_ellipsoid[1] =
@@ -816,7 +824,10 @@ void FixMove::initial_integrate(int /*vflag*/)
           if (quat_flag && !quat_atom_flag) {
             quat = nullptr;
-            if (ellipsoid_flag && ellipsoid[i] >= 0)
-              quat = avec_ellipsoid->bonus[ellipsoid[i]].quat;
-            else if (tri_flag && tri[i] >= 0)
+            if (ellipsoid_flag && ellipsoid[i] >= 0) {
+              if (atom->superellipsoid_flag)
+                quat = avec_ellipsoid->bonus_super[ellipsoid[i]].quat;
+              else
+                quat = avec_ellipsoid->bonus[ellipsoid[i]].quat;
+            } else if (tri_flag && tri[i] >= 0)
               quat = avec_tri->bonus[tri[i]].quat;
             else if (body_flag && body[i] >= 0)
@@ -923,15 +934,20 @@ void FixMove::initial_integrate(int /*vflag*/)
           if (angmom_flag) {
             quat = inertia = nullptr;
             if (ellipsoid_flag && ellipsoid[i] >= 0) {
-              quat = avec_ellipsoid->bonus[ellipsoid[i]].quat;
-              shape = avec_ellipsoid->bonus[ellipsoid[i]].shape;
-              inertia_ellipsoid[0] =
-                  INERTIA * rmass[i] * (shape[1] * shape[1] + shape[2] * shape[2]);
-              inertia_ellipsoid[1] =
-                  INERTIA * rmass[i] * (shape[0] * shape[0] + shape[2] * shape[2]);
-              inertia_ellipsoid[2] =
-                  INERTIA * rmass[i] * (shape[0] * shape[0] + shape[1] * shape[1]);
-              inertia = inertia_ellipsoid;
+              if (atom->superellipsoid_flag) {
+                quat = avec_ellipsoid->bonus_super[ellipsoid[i]].quat;
+                inertia = avec_ellipsoid->bonus_super[ellipsoid[i]].inertia;
+              } else {
+                quat = avec_ellipsoid->bonus[ellipsoid[i]].quat;
+                shape = avec_ellipsoid->bonus[ellipsoid[i]].shape;
+                inertia_ellipsoid[0] =
+                    INERTIA * rmass[i] * (shape[1] * shape[1] + shape[2] * shape[2]);
+                inertia_ellipsoid[1] =
+                    INERTIA * rmass[i] * (shape[0] * shape[0] + shape[2] * shape[2]);
+                inertia_ellipsoid[2] =
+                    INERTIA * rmass[i] * (shape[0] * shape[0] + shape[1] * shape[1]);
+                inertia = inertia_ellipsoid;
+              }
             } else if (tri_flag && tri[i] >= 0) {
               quat = avec_tri->bonus[tri[i]].quat;
               inertia = avec_tri->bonus[tri[i]].inertia;
@@ -960,7 +976,10 @@ void FixMove::initial_integrate(int /*vflag*/)
           if (quat_flag && !quat_atom_flag) {
             quat = nullptr;
-            if (ellipsoid_flag && ellipsoid[i] >= 0)
-              quat = avec_ellipsoid->bonus[ellipsoid[i]].quat;
-            else if (tri_flag && tri[i] >= 0)
+            if (ellipsoid_flag && ellipsoid[i] >= 0) {
+              if (atom->superellipsoid_flag)
+                quat = avec_ellipsoid->bonus_super[ellipsoid[i]].quat;
+              else
+                quat = avec_ellipsoid->bonus[ellipsoid[i]].quat;
+            } else if (tri_flag && tri[i] >= 0)
               quat = avec_tri->bonus[tri[i]].quat;
             else if (body_flag && body[i] >= 0)
@@ -1440,7 +1459,10 @@ void FixMove::set_arrays(int i)
       if (quat_flag & !quat_atom_flag) {
         quat = nullptr;
-        if (ellipsoid_flag && ellipsoid[i] >= 0)
-          quat = avec_ellipsoid->bonus[ellipsoid[i]].quat;
-        else if (tri_flag && tri[i] >= 0)
+        if (ellipsoid_flag && ellipsoid[i] >= 0) {
+          if (atom->superellipsoid_flag)
+            quat = avec_ellipsoid->bonus_super[ellipsoid[i]].quat;
+          else
+            quat = avec_ellipsoid->bonus[ellipsoid[i]].quat;
+        } else if (tri_flag && tri[i] >= 0)
           quat = avec_tri->bonus[tri[i]].quat;
         else if (body_flag && body[i] >= 0)
@@ -1503,7 +1525,10 @@ void FixMove::set_arrays(int i)
       if (quat_flag && !quat_atom_flag) {
         quat = nullptr;
-        if (ellipsoid_flag && ellipsoid[i] >= 0)
-          quat = avec_ellipsoid->bonus[ellipsoid[i]].quat;
-        else if (tri_flag && tri[i] >= 0)
+        if (ellipsoid_flag && ellipsoid[i] >= 0) {
+          if (atom->superellipsoid_flag)
+            quat = avec_ellipsoid->bonus_super[ellipsoid[i]].quat;
+          else
+            quat = avec_ellipsoid->bonus[ellipsoid[i]].quat;
+        } else if (tri_flag && tri[i] >= 0)
           quat = avec_tri->bonus[tri[i]].quat;
         else if (body_flag && body[i] >= 0)
diff --git a/src/math_extra.cpp b/src/math_extra.cpp
index a36600d970a..4e5c5363ed4 100644
--- a/src/math_extra.cpp
+++ b/src/math_extra.cpp
@@ -17,9 +17,15 @@
 ------------------------------------------------------------------------- */
 
 #include "math_extra.h"
+#include "math_special.h"
+#include "math_const.h"
+#include <algorithm>
 #include <cstdio>
 #include <cstring>
 
+using namespace LAMMPS_NS;
+using MathConst::MY_4PI3;
+
 namespace MathExtra {
 
 /* ----------------------------------------------------------------------
@@ -471,12 +477,12 @@ void quat_to_mat_trans(const double *quat, double mat[3][3])
 
 /* ----------------------------------------------------------------------
    compute space-frame inertia tensor of an ellipsoid
-   radii = 3 radii of ellipsoid
+   shape = 3 semiaxes of ellipsoid
    quat = orientiation quaternion of ellipsoid
    return symmetric inertia tensor as 6-vector in Voigt ordering
 ------------------------------------------------------------------------- */
 
-void inertia_ellipsoid(double *radii, double *quat, double mass,
+void inertia_ellipsoid(double *shape, double *quat, double mass,
                        double *inertia)
 {
   double p[3][3],ptrans[3][3],itemp[3][3],tensor[3][3];
@@ -484,9 +490,31 @@ void inertia_ellipsoid(double *radii, double *quat, double mass,
 
   quat_to_mat(quat,p);
   quat_to_mat_trans(quat,ptrans);
-  idiag[0] = 0.2*mass * (radii[1]*radii[1] + radii[2]*radii[2]);
-  idiag[1] = 0.2*mass * (radii[0]*radii[0] + radii[2]*radii[2]);
-  idiag[2] = 0.2*mass * (radii[0]*radii[0] + radii[1]*radii[1]);
+  idiag[0] = 0.2*mass * (shape[1]*shape[1] + shape[2]*shape[2]);
+  idiag[1] = 0.2*mass * (shape[0]*shape[0] + shape[2]*shape[2]);
+  idiag[2] = 0.2*mass * (shape[0]*shape[0] + shape[1]*shape[1]);
+  diag_times3(idiag,ptrans,itemp);
+  times3(p,itemp,tensor);
+  inertia[0] = tensor[0][0];
+  inertia[1] = tensor[1][1];
+  inertia[2] = tensor[2][2];
+  inertia[3] = tensor[1][2];
+  inertia[4] = tensor[0][2];
+  inertia[5] = tensor[0][1];
+}
+
+/* ----------------------------------------------------------------------
+   rotate principal moments of a superellipsoid into the space frame
+   idiag = 3 principal moments, taken as stored in bonus_super (not recomputed)
+   quat = orientation quaternion, return inertia as 6-vector in Voigt ordering
+------------------------------------------------------------------------- */
+
+void inertia_ellipsoid(double *idiag, double *quat, double *inertia)
+{
+  double p[3][3],ptrans[3][3],itemp[3][3],tensor[3][3];
+
+  quat_to_mat(quat,p);
+  quat_to_mat_trans(quat,ptrans);
   diag_times3(idiag,ptrans,itemp);
   times3(p,itemp,tensor);
   inertia[0] = tensor[0][0];
@@ -600,6 +628,39 @@ void inertia_triangle(double *idiag, double *quat, double /*mass*/,
   inertia[5] = tensor[0][1];
 }
 
+/* ----------------------------------------------------------------------
+   compute the volume of the ellipsoid
+   shape = 3 semiaxes of ellipsoid
+   return volume of the ellipsoid
+------------------------------------------------------------------------- */
+
+double volume_ellipsoid(double *shape)
+{
+  double unitvol = MY_4PI3;
+  return unitvol * shape[0] * shape[1] * shape[2];
+}
+
+/* ----------------------------------------------------------------------
+   compute the volume of the (super)ellipsoid
+   shape = 3 semiaxes of (super)ellipsoid
+   block = blockiness exponents of (super)ellipsoid
+   return volume of the (super)ellipsoid
+------------------------------------------------------------------------- */
+
+double volume_ellipsoid(double *shape, double *block, int flag_super)
+{
+  double unitvol = MY_4PI3;
+
+  // superellipsoid, Eq. (12) of Jaklic and Solina, 2003, for p = q = r = 0
+
+  if (flag_super) {
+    double e1 = 2.0 / block[0], e2 = 2.0 / block[1];
+    unitvol = e1 * e2 * beta(0.5 * e1, 1.0 + e1) *
+                        beta(0.5 * e2, 0.5 * e2);
+  }
+  return unitvol * shape[0] * shape[1] * shape[2];
+}
+
 /* ----------------------------------------------------------------------
    build rotation matrix for a small angle rotation around the X axis
 ------------------------------------------------------------------------- */
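Review note: a quick sanity check of the superellipsoid branch of volume_ellipsoid() above. The sketch below is standalone and copies the expression from the patch; beta_fn and superellipsoid_volume are illustrative names, not LAMMPS functions. For block = {2, 2} the exponents are e1 = e2 = 1, so the prefactor becomes B(1/2, 2) * B(1/2, 1/2) = (4/3) * pi, which agrees with the MY_4PI3 value used for plain ellipsoids.

```cpp
#include <cassert>
#include <cmath>

// Beta function via log-gamma, mirroring MathExtra::beta() from this patch.
double beta_fn(double x, double y)
{
  assert(x > 0.0 && y > 0.0);
  return std::exp(std::lgamma(x) + std::lgamma(y) - std::lgamma(x + y));
}

// Volume of a superellipsoid with semiaxes shape[0..2] and blockiness
// exponents block[0..1]; same expression as the flag_super branch above.
double superellipsoid_volume(const double *shape, const double *block)
{
  const double e1 = 2.0 / block[0];
  const double e2 = 2.0 / block[1];
  const double unitvol = e1 * e2 * beta_fn(0.5 * e1, 1.0 + e1) * beta_fn(0.5 * e2, 0.5 * e2);
  return unitvol * shape[0] * shape[1] * shape[2];
}
```

For very large blockiness exponents the same expression tends toward the bounding-box volume 8 * shape[0] * shape[1] * shape[2], which gives a second cheap check.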
diff --git a/src/math_extra.h b/src/math_extra.h
index 52d1d838ffb..8b5373b8a57 100644
--- a/src/math_extra.h
+++ b/src/math_extra.h
@@ -88,7 +88,8 @@ inline void multiply_shape_shape(const double *one, const double *two, double *a
 // quaternion operations
 
 inline void qnormalize(double *q);
-inline void qconjugate(double *q, double *qc);
+inline void qconjugate(double *q,
+                       double *qc);    // would it be better to have q passed as const double?
 inline void vecquat(double *a, double *b, double *c);
 inline void quatvec(double *a, double *b, double *c);
 inline void quatquat(double *a, double *b, double *c);
@@ -116,16 +117,23 @@ void BuildRyMatrix(double R[3][3], const double angle);
 void BuildRzMatrix(double R[3][3], const double angle);
 
 // moment of inertia operations
-
+void inertia_ellipsoid(double *idiag, double *quat, double *inertia);    // superellipsoid version
 void inertia_ellipsoid(double *shape, double *quat, double mass, double *inertia);
 void inertia_line(double length, double theta, double mass, double *inertia);
 void inertia_triangle(double *v0, double *v1, double *v2, double mass, double *inertia);
 void inertia_triangle(double *idiag, double *quat, double mass, double *inertia);
 
-// triclinic bounding box of a spher
+// volume of ellipsoid
+double volume_ellipsoid(double *shape);
+double volume_ellipsoid(double *shape, double *block, int flag_super);
+
+// triclinic bounding box of a sphere
 
 void tribbox(double *, double, double *);
 
+// alternative to std::beta
+double beta(double x, double y);
+
 }    // namespace MathExtra
 
 /* ----------------------------------------------------------------------
@@ -838,4 +846,9 @@ inline void MathExtra::outer3(const double *v1, const double *v2, double ans[3][
   ans[2][2] = v1[2] * v2[2];
 }
 
+inline double MathExtra::beta(double x, double y)
+{
+  return std::exp(std::lgamma(x) + std::lgamma(y) - std::lgamma(x + y));
+}
+
 #endif
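Review note: the three-argument inertia_ellipsoid() overload declared in this header takes principal moments instead of semiaxes and mass. A minimal standalone sketch of the underlying operation, R * diag(idiag) * R^T with the result in Voigt ordering, is given below. The function name inertia_from_principal is hypothetical, and the rotation matrix is written out directly using the standard unit-quaternion formula rather than calling MathExtra::quat_to_mat().

```cpp
#include <cassert>
#include <cmath>

// Rotate principal moments idiag[0..2] into the space frame using the unit
// quaternion quat = (w, i, j, k). The result is a symmetric tensor stored as
// a 6-vector in Voigt ordering: xx, yy, zz, yz, xz, xy.
void inertia_from_principal(const double *idiag, const double *quat, double *inertia)
{
  const double w = quat[0], i = quat[1], j = quat[2], k = quat[3];
  assert(std::fabs(w * w + i * i + j * j + k * k - 1.0) < 1.0e-8);    // needs a unit quaternion

  // rotation matrix of the quaternion (body frame to space frame)
  const double R[3][3] = {
      {w * w + i * i - j * j - k * k, 2.0 * (i * j - w * k), 2.0 * (i * k + w * j)},
      {2.0 * (i * j + w * k), w * w - i * i + j * j - k * k, 2.0 * (j * k - w * i)},
      {2.0 * (i * k - w * j), 2.0 * (j * k + w * i), w * w - i * i - j * j + k * k}};

  // tensor = R * diag(idiag) * R^T
  double tensor[3][3];
  for (int a = 0; a < 3; a++) {
    for (int b = 0; b < 3; b++) {
      tensor[a][b] = 0.0;
      for (int c = 0; c < 3; c++) tensor[a][b] += R[a][c] * idiag[c] * R[b][c];
    }
  }

  inertia[0] = tensor[0][0];
  inertia[1] = tensor[1][1];
  inertia[2] = tensor[2][2];
  inertia[3] = tensor[1][2];
  inertia[4] = tensor[0][2];
  inertia[5] = tensor[0][1];
}
```

With the identity quaternion the output is just the principal moments with zero off-diagonal terms, and a quarter turn about z swaps the xx and yy entries.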
diff --git a/src/math_special.h b/src/math_special.h
index b5ac1c73fba..3cce45538fa 100644
--- a/src/math_special.h
+++ b/src/math_special.h
@@ -69,7 +69,8 @@ namespace LAMMPS_NS::MathSpecial {
 
   extern double erfcx_y100(const double y100);
 
-  /*! Fast scaled error function complement exp(x*x)*erfc(x) for coul/long styles
+
+ /*! Fast scaled error function complement exp(x*x)*erfc(x) for coul/long styles
    *
    *  This is a portable fast implementation of exp(x*x)*erfc(x) that can be used
    *  in coul/long pair styles as a replacement for the polynomial expansion that
@@ -183,7 +184,7 @@ namespace LAMMPS_NS::MathSpecial {
 
     return yy;
   }
-} // namespace LAMMPS_NS::MathSpecial
+}    // namespace LAMMPS_NS::MathSpecial
 
 
 #endif
diff --git a/src/set.cpp b/src/set.cpp
index 4a780e53e00..6f786ddbd5f 100644
--- a/src/set.cpp
+++ b/src/set.cpp
@@ -44,7 +44,7 @@ using namespace MathConst;
 
 enum{ATOM_SELECT,MOL_SELECT,TYPE_SELECT,GROUP_SELECT,REGION_SELECT};
 
-enum{ANGLE,ANGMOM,APIP_LAMBDA,BOND,CC,CHARGE,DENSITY,DIAMETER,DIHEDRAL,DIPOLE,
+enum{ANGLE,ANGMOM,APIP_LAMBDA,BLOCK,BOND,CC,CHARGE,DENSITY,DIAMETER,DIHEDRAL,DIPOLE,
   DIPOLE_RANDOM,DPD_THETA,EDPD_CV,EDPD_TEMP,EPSILON,IMAGE,IMPROPER,LENGTH,
   MASS,MOLECULE,OMEGA,QUAT,QUAT_RANDOM,RADIUS_ELECTRON,RHEO_STATUS,SHAPE,
   SMD_CONTACT_RADIUS,SMD_MASS_DENSITY,SPH_CV,SPH_E,SPH_RHO,
@@ -212,6 +212,10 @@ void Set::process_args(int caller_flag, int narg, char **arg)
       action->keyword = APIP_LAMBDA;
       process_apip_lambda(iarg,narg,arg,action);
       invoke_choice[naction++] = &Set::invoke_apip_lambda;
+    } else if (strcmp(arg[iarg],"block") == 0) {
+      action->keyword = BLOCK;
+      process_block(iarg, narg, arg, action);
+      invoke_choice[naction++] = &Set::invoke_block;
     } else if (strcmp(arg[iarg],"bond") == 0) {
       action->keyword = BOND;
       process_bond(iarg,narg,arg,action);
@@ -801,8 +805,10 @@ void Set::setrandom(int keyword, Action *action)
     if (domain->dimension == 3) {
       for (i = 0; i < nlocal; i++)
         if (select[i]) {
-          if (avec_ellipsoid && ellipsoid[i] >= 0)
-            quat_one = avec_ellipsoid->bonus[ellipsoid[i]].quat;
+          if (avec_ellipsoid && ellipsoid[i] >= 0) {
+            if (atom->superellipsoid_flag) quat_one = avec_ellipsoid->bonus_super[ellipsoid[i]].quat;
+            else quat_one = avec_ellipsoid->bonus[ellipsoid[i]].quat;
+          }
           else if (avec_tri && tri[i] >= 0)
             quat_one = avec_tri->bonus[tri[i]].quat;
           else if (avec_body && body[i] >= 0)
@@ -828,8 +834,10 @@ void Set::setrandom(int keyword, Action *action)
       double theta2;
       for (i = 0; i < nlocal; i++)
         if (select[i]) {
-          if (avec_ellipsoid && ellipsoid[i] >= 0)
-            quat_one = avec_ellipsoid->bonus[ellipsoid[i]].quat;
+          if (avec_ellipsoid && ellipsoid[i] >= 0) {
+            if (atom->superellipsoid_flag) quat_one = avec_ellipsoid->bonus_super[ellipsoid[i]].quat;
+            else quat_one = avec_ellipsoid->bonus[ellipsoid[i]].quat;
+          }
           else if (avec_body && body[i] >= 0)
             quat_one = avec_body->bonus[body[i]].quat;
           else if (quat_flag)
@@ -1114,6 +1122,57 @@ void Set::invoke_apip_lambda(Action *action)
 
 /* ---------------------------------------------------------------------- */
 
+void Set::process_block(int &iarg, int narg, char **arg, Action *action)
+{
+  if (!atom->superellipsoid_flag)
+    error->all(FLERR,"Cannot set attribute {} for atom style {} (available with ellipsoid with superellipsoid flag)", arg[iarg], atom->get_style());
+  if (iarg+3 > narg) utils::missing_cmd_args(FLERR, "set block", error);
+  if (utils::strmatch(arg[iarg+1],"^v_")) varparse(arg[iarg+1],1,action);
+  else {
+    action->dvalue1 = utils::numeric(FLERR,arg[iarg+1],false,lmp);
+    if (action->dvalue1 < 2.0) error->one(FLERR,"Invalid block in set command");
+  }
+  if (utils::strmatch(arg[iarg+2],"^v_")) varparse(arg[iarg+2],2,action);
+  else {
+    action->dvalue2 = utils::numeric(FLERR,arg[iarg+2],false,lmp);
+    if (action->dvalue2 < 2.0) error->one(FLERR,"Invalid block in set command");
+  }
+  iarg += 3;
+}
+
+
+void Set::invoke_block(Action *action)
+{
+  int nlocal = atom->nlocal;
+  auto *avec_ellipsoid = dynamic_cast<AtomVecEllipsoid *>(atom->style_match("ellipsoid"));
+
+  int varflag = action->varflag;
+  double block1 = 0.0, block2 = 0.0;
+  if (!action->varflag1) block1 = action->dvalue1;
+  if (!action->varflag2) block2 = action->dvalue2;
+
+  for (int i = 0; i < nlocal; i++) {
+    if (!select[i]) continue;
+
+    if (varflag) {
+      if (action->varflag1) block1 = vec1[i];
+      if (action->varflag2) block2 = vec2[i];
+      if (block1 < 2.0 || block2 < 2.0)
+        error->one(FLERR, Error::NOLASTLINE, "Invalid block in set command");
+    }
+
+    avec_ellipsoid->set_block(i, block1, block2);
+  }
+
+  // update global ellipsoid count
+  // TODO: Not sure if block should update the ellipsoid count
+  //       what happens if you call this twice in invoke_shape and invoke_block?
+  //   bigint nlocal_bonus = avec_ellipsoid->nlocal_bonus;
+  //   MPI_Allreduce(&nlocal_bonus,&atom->nellipsoids,1,MPI_LMP_BIGINT,MPI_SUM,world);
+}
+
+/* ---------------------------------------------------------------------- */
+
 void Set::process_bond(int &iarg, int narg, char **arg, Action *action)
 {
   if (atom->avec->bonds_allow == 0)
@@ -1285,7 +1344,9 @@ void Set::invoke_density(Action *action)
       else rmass[i] = 4.0*MY_PI/3.0 * radius[i]*radius[i]*radius[i] * density;
 
     else if (ellipsoid_flag && ellipsoid[i] >= 0) {
-      double *shape = avec_ellipsoid->bonus[ellipsoid[i]].shape;
+      double *shape;
+      if (atom->superellipsoid_flag) shape = avec_ellipsoid->bonus_super[ellipsoid[i]].shape;
+      else shape = avec_ellipsoid->bonus[ellipsoid[i]].shape;
       // could enable 2d ellipse (versus 3d ellipsoid) when time integration
       //   options (fix nve/asphere, fix nh/asphere) are also implemented
       // if (discflag)
@@ -1953,8 +2014,10 @@ void Set::invoke_quat(Action *action)
   for (int i = 0; i < nlocal; i++) {
     if (!select[i]) continue;
 
-    if (avec_ellipsoid && ellipsoid[i] >= 0)
-      quat_one = avec_ellipsoid->bonus[ellipsoid[i]].quat;
+    if (avec_ellipsoid && ellipsoid[i] >= 0) {
+      if (atom->superellipsoid_flag) quat_one = avec_ellipsoid->bonus_super[ellipsoid[i]].quat;
+      else quat_one = avec_ellipsoid->bonus[ellipsoid[i]].quat;
+    }
     else if (avec_tri && tri[i] >= 0)
       quat_one = avec_tri->bonus[tri[i]].quat;
     else if (avec_body && body[i] >= 0)
diff --git a/src/set.h b/src/set.h
index bcf06e6ba18..a44f6b9b8bf 100644
--- a/src/set.h
+++ b/src/set.h
@@ -93,6 +93,7 @@ class Set : public Command {
   void process_angle(int &, int, char **, Action *);
   void process_angmom(int &, int, char **, Action *);
   void process_apip_lambda(int &, int, char **, Action *);
+  void process_block(int &, int, char **, Action *);
   void process_bond(int &, int, char **, Action *);
   void process_cc(int &, int, char **, Action *);
   void process_charge(int &, int, char **, Action *);
@@ -147,6 +148,7 @@ class Set : public Command {
   void invoke_angle(Action *);
   void invoke_angmom(Action *);
   void invoke_apip_lambda(Action *);
+  void invoke_block(Action *);
   void invoke_bond(Action *);
   void invoke_cc(Action *);
   void invoke_charge(Action *);
diff --git a/unittest/formats/test_atom_styles.cpp b/unittest/formats/test_atom_styles.cpp
index f3004951b34..6023fb00469 100644
--- a/unittest/formats/test_atom_styles.cpp
+++ b/unittest/formats/test_atom_styles.cpp
@@ -1459,6 +1459,192 @@ TEST_F(AtomStyleTest, ellipsoid)
     EXPECT_NEAR(bonus[3].quat[3], 0.25056280708573159, EPSILON);
 }
 
+TEST_F(AtomStyleTest, superellipsoid)
+{
+    if (!Info::has_package("ASPHERE")) GTEST_SKIP();
+
+    BEGIN_HIDE_OUTPUT();
+    command("atom_style ellipsoid superellipsoid");
+    END_HIDE_OUTPUT();
+
+    AtomState expected;
+    expected.atom_style     = "ellipsoid";
+    expected.molecular      = Atom::ATOMIC;
+    expected.tag_enable     = 1;
+    expected.ellipsoid_flag = 1;
+    expected.rmass_flag     = 1;
+    expected.radius_flag    = 1;
+    expected.angmom_flag    = 1;
+    expected.torque_flag    = 1;
+    expected.has_type       = true;
+    expected.has_mask       = true;
+    expected.has_image      = true;
+    expected.has_x          = true;
+    expected.has_v          = true;
+    expected.has_f          = true;
+
+    ASSERT_ATOM_STATE_EQ(lmp->atom, expected);
+    ASSERT_EQ(lmp->atom->superellipsoid_flag, 1);
+
+    BEGIN_HIDE_OUTPUT();
+    command("create_box 4 box");
+    command("create_atoms 1 single -2.0  2.0  0.1"); // Point
+    command("create_atoms 2 single  2.0  2.0 -0.1"); // ELLIPSOID (n1=2, n2=2)
+    command("create_atoms 3 single  2.0  2.0 -2.1"); // GENERAL (n1!=n2)
+    command("create_atoms 4 single -2.0 -2.0  0.1"); // N1_EQUAL_N2
+    command("set type 1 mass 4.0");
+    command("set type 2 mass 2.4");
+    command("set type 3 mass 4.4");
+    command("set type 4 mass 5.0");
+    command("set type 2 shape 1.0 1.0 1.0");
+    command("set type 3 shape 3.0 0.8 1.1");
+    command("set type 4 shape 2.0 2.0 2.0");
+    command("set type 3 block 4.0 3.0");
+    command("set type 4 block 3.5 3.5");
+    command("pair_coeff * *");
+    END_HIDE_OUTPUT();
+    ASSERT_THAT(std::string(lmp->atom->atom_style), Eq("ellipsoid"));
+    ASSERT_NE(lmp->atom->avec, nullptr);
+    ASSERT_EQ(lmp->atom->natoms, 4);
+    ASSERT_EQ(lmp->atom->nellipsoids, 3);
+    ASSERT_EQ(lmp->atom->nlocal, 4);
+    ASSERT_EQ(lmp->atom->nghost, 0);
+    ASSERT_NE(lmp->atom->nmax, -1);
+    ASSERT_EQ(lmp->atom->tag_enable, 1);
+    ASSERT_EQ(lmp->atom->molecular, Atom::ATOMIC);
+    ASSERT_EQ(lmp->atom->ntypes, 4);
+    ASSERT_EQ(lmp->atom->nextra_grow, 0);
+    ASSERT_EQ(lmp->atom->nextra_restart, 0);
+    ASSERT_EQ(lmp->atom->nextra_border, 0);
+    ASSERT_EQ(lmp->atom->nextra_grow_max, 0);
+    ASSERT_EQ(lmp->atom->nextra_restart_max, 0);
+    ASSERT_EQ(lmp->atom->nextra_border_max, 0);
+    ASSERT_EQ(lmp->atom->nextra_store, 0);
+    ASSERT_EQ(lmp->atom->extra_grow, nullptr);
+    ASSERT_EQ(lmp->atom->extra_restart, nullptr);
+    ASSERT_EQ(lmp->atom->extra_border, nullptr);
+    ASSERT_EQ(lmp->atom->extra, nullptr);
+
+    ASSERT_EQ(lmp->atom->mass, nullptr);
+    ASSERT_NE(lmp->atom->rmass, nullptr);
+    ASSERT_NE(lmp->atom->radius, nullptr);
+    ASSERT_NE(lmp->atom->ellipsoid, nullptr);
+    ASSERT_EQ(lmp->atom->mass_setflag, nullptr);
+
+    BEGIN_HIDE_OUTPUT();
+    command("write_data test_atom_styles.data nocoeff");
+    command("clear");
+    command("atom_style ellipsoid superellipsoid");
+    command("pair_style zero 4.0");
+    command("units real");
+    command("atom_modify map array");
+    command("read_data test_atom_styles.data");
+    command("pair_coeff * *");
+    END_HIDE_OUTPUT();
+    ASSERT_THAT(std::string(lmp->atom->atom_style), Eq("ellipsoid"));
+    ASSERT_NE(lmp->atom->avec, nullptr);
+    ASSERT_EQ(lmp->atom->natoms, 4);
+    ASSERT_EQ(lmp->atom->nlocal, 4);
+    ASSERT_EQ(lmp->atom->nellipsoids, 3);
+    ASSERT_EQ(lmp->atom->nghost, 0);
+    ASSERT_NE(lmp->atom->nmax, -1);
+    ASSERT_EQ(lmp->atom->tag_enable, 1);
+    ASSERT_EQ(lmp->atom->molecular, Atom::ATOMIC);
+    ASSERT_EQ(lmp->atom->ntypes, 4);
+    ASSERT_EQ(lmp->atom->ellipsoid_flag, 1);
+    ASSERT_NE(lmp->atom->ellipsoid, nullptr);
+    ASSERT_NE(lmp->atom->sametag, nullptr);
+    ASSERT_EQ(lmp->atom->tag_consecutive(), 1);
+    ASSERT_EQ(lmp->atom->map_style, Atom::MAP_ARRAY);
+    ASSERT_EQ(lmp->atom->map_user, 1);
+    ASSERT_EQ(lmp->atom->map_tag_max, 4);
+
+    auto *type      = lmp->atom->type;
+    auto *ellipsoid = lmp->atom->ellipsoid;
+    auto *rmass     = lmp->atom->rmass;
+    auto *avec      = dynamic_cast<AtomVecEllipsoid *>(lmp->atom->avec);
+    auto *bonus     = avec->bonus_super;
+
+    ASSERT_EQ(type[GETIDX(1)], 1);
+    ASSERT_EQ(ellipsoid[GETIDX(1)], -1);
+    EXPECT_NEAR(rmass[GETIDX(1)], 4.0, EPSILON);
+    ASSERT_EQ(type[GETIDX(2)], 2);
+    ASSERT_EQ(ellipsoid[GETIDX(2)], 0);
+    EXPECT_NEAR(rmass[GETIDX(2)], 2.4, EPSILON);
+    EXPECT_NEAR(bonus[0].shape[0], 0.5, EPSILON);
+    EXPECT_NEAR(bonus[0].shape[1], 0.5, EPSILON);
+    EXPECT_NEAR(bonus[0].shape[2], 0.5, EPSILON);
+    EXPECT_NEAR(bonus[0].block[0], 2.0, EPSILON); // set by default
+    EXPECT_NEAR(bonus[0].block[1], 2.0, EPSILON); // set by default
+    EXPECT_NEAR(bonus[0].type, 0, EPSILON); // BlockType::ELLIPSOID
+    ASSERT_EQ(type[GETIDX(3)], 3);
+    ASSERT_EQ(ellipsoid[GETIDX(3)], 1);
+    EXPECT_NEAR(rmass[GETIDX(3)], 4.4, EPSILON);
+    EXPECT_NEAR(bonus[1].shape[0], 1.5, EPSILON);
+    EXPECT_NEAR(bonus[1].shape[1], 0.4, EPSILON);
+    EXPECT_NEAR(bonus[1].shape[2], 0.55, EPSILON);
+    EXPECT_NEAR(bonus[1].block[0], 4.0, EPSILON);
+    EXPECT_NEAR(bonus[1].block[1], 3.0, EPSILON);
+    EXPECT_NEAR(bonus[1].type, 2, EPSILON); // BlockType::GENERAL
+    ASSERT_EQ(type[GETIDX(4)], 4);
+    ASSERT_EQ(ellipsoid[GETIDX(4)], 2);
+    EXPECT_NEAR(rmass[GETIDX(4)], 5.0, EPSILON);
+    EXPECT_NEAR(bonus[2].shape[0], 1.0, EPSILON);
+    EXPECT_NEAR(bonus[2].shape[1], 1.0, EPSILON);
+    EXPECT_NEAR(bonus[2].shape[2], 1.0, EPSILON);
+    EXPECT_NEAR(bonus[2].block[0], 3.5, EPSILON);
+    EXPECT_NEAR(bonus[2].block[1], 3.5, EPSILON);
+    EXPECT_NEAR(bonus[2].type, 1, EPSILON); // BlockType::N1_EQUAL_N2
+
+    BEGIN_HIDE_OUTPUT();
+    command("write_restart test_atom_styles.restart");
+    command("clear");
+    command("read_restart test_atom_styles.restart");
+    command("comm_style tiled");
+    command("replicate 1 1 2 bbox");
+    END_HIDE_OUTPUT();
+
+    ASSERT_THAT(std::string(lmp->atom->atom_style), Eq("ellipsoid"));
+    ASSERT_NE(lmp->atom->avec, nullptr);
+    ASSERT_EQ(lmp->atom->natoms, 8);
+    ASSERT_EQ(lmp->atom->nlocal, 8);
+    ASSERT_EQ(lmp->atom->nellipsoids, 6);
+    ASSERT_EQ(lmp->atom->superellipsoid_flag, 1);
+
+    type      = lmp->atom->type;
+    ellipsoid = lmp->atom->ellipsoid;
+    rmass     = lmp->atom->rmass;
+    avec      = dynamic_cast<AtomVecEllipsoid *>(lmp->atom->avec);
+    bonus     = avec->bonus_super;
+
+    ASSERT_EQ(type[GETIDX(1)], 1);
+    ASSERT_EQ(type[GETIDX(2)], 2);
+    ASSERT_EQ(type[GETIDX(3)], 3);
+    ASSERT_EQ(type[GETIDX(4)], 4);
+    ASSERT_EQ(type[GETIDX(5)], 1);
+    ASSERT_EQ(type[GETIDX(6)], 2);
+    ASSERT_EQ(type[GETIDX(7)], 3);
+    ASSERT_EQ(type[GETIDX(8)], 4);
+    ASSERT_EQ(ellipsoid[GETIDX(1)], -1);
+    ASSERT_EQ(ellipsoid[GETIDX(2)], 0);
+    ASSERT_EQ(ellipsoid[GETIDX(3)], 1);
+    ASSERT_EQ(ellipsoid[GETIDX(4)], 2);
+    ASSERT_EQ(ellipsoid[GETIDX(5)], -1);
+    ASSERT_EQ(ellipsoid[GETIDX(6)], 3);
+    ASSERT_EQ(ellipsoid[GETIDX(7)], 4);
+    ASSERT_EQ(ellipsoid[GETIDX(8)], 5);
+    EXPECT_NEAR(bonus[3].shape[0], 0.5, EPSILON);
+    EXPECT_NEAR(bonus[3].block[0], 2.0, EPSILON);
+    EXPECT_NEAR(bonus[3].block[1], 2.0, EPSILON);
+    EXPECT_NEAR(bonus[4].shape[0], 1.5, EPSILON);
+    EXPECT_NEAR(bonus[4].block[0], 4.0, EPSILON);
+    EXPECT_NEAR(bonus[4].block[1], 3.0, EPSILON);
+    EXPECT_NEAR(bonus[5].shape[0], 1.0, EPSILON);
+    EXPECT_NEAR(bonus[5].block[0], 3.5, EPSILON);
+    EXPECT_NEAR(bonus[5].block[1], 3.5, EPSILON);
+    EXPECT_NEAR(bonus[5].type, 1, EPSILON);
+}
+
 TEST_F(AtomStyleTest, line)
 {
     if (!Info::has_package("ASPHERE")) GTEST_SKIP();
diff --git a/unittest/utils/CMakeLists.txt b/unittest/utils/CMakeLists.txt
index 5d86bc592b1..cb07e9e3ca2 100644
--- a/unittest/utils/CMakeLists.txt
+++ b/unittest/utils/CMakeLists.txt
@@ -154,3 +154,10 @@ endif()
 # =============================================================================
 # End of FFT Testing Infrastructure
 # =============================================================================
+
+# Extra math tests for superellipsoids
+if(PKG_ASPHERE)
+  add_executable(test_math_extra_superellipsoids test_math_extra_superellipsoids.cpp)
+  target_link_libraries(test_math_extra_superellipsoids PRIVATE lammps GTest::GMockMain)
+  add_test(NAME MathExtraSuperellipsoids COMMAND test_math_extra_superellipsoids)
+endif()
\ No newline at end of file
diff --git a/unittest/utils/test_math_extra_superellipsoids.cpp b/unittest/utils/test_math_extra_superellipsoids.cpp
new file mode 100644
index 00000000000..32dbba88b4f
--- /dev/null
+++ b/unittest/utils/test_math_extra_superellipsoids.cpp
@@ -0,0 +1,275 @@
+/* ----------------------------------------------------------------------
+   LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
+   https://www.lammps.org/, Sandia National Laboratories
+   LAMMPS Development team: developers@lammps.org
+
+   Copyright (2003) Sandia Corporation.  Under the terms of Contract
+   DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
+   certain rights in this software.  This software is distributed under
+   the GNU General Public License.
+
+   See the README file in the top-level LAMMPS directory.
+------------------------------------------------------------------------- */
+
+#include "../../src/ASPHERE/math_extra_superellipsoids.h"
+#include "../../src/math_extra.h"
+#include "gmock/gmock.h"
+#include "gtest/gtest.h"
+#include <cmath>
+#include <limits>
+#include <vector>
+// TODO: consider making a fixture with several setup functions?
+
+static constexpr double EPSILON      = 1e-4;
+static constexpr double SOLV_EPSILON = std::numeric_limits<double>::epsilon() * 100;
+
+TEST(HandwrittenSolver, invertible)
+{
+    double A[16] = {4, 2, 1, 3, 0, 5, 2, 1, 1, 0, 3, 2, 2, 1, 0, 4};
+
+    double b[4] = {23.0, 20.0, 18.0, 20.0};
+
+    double expected_solution[4] = {1.0, 2.0, 3.0, 4.0};
+
+    bool success = MathExtraSuperellipsoids::solve_4x4_robust_unrolled(A, b);
+
+    ASSERT_TRUE(success) << "The solver falsely flagged an invertible matrix as singular.";
+
+    for (int i = 0; i < 4; ++i) {
+        ASSERT_NEAR(b[i], expected_solution[i], SOLV_EPSILON) << "Failed at index " << i;
+    }
+}
+
+TEST(ContactPointAndNormal, sphere)
+{
+    // First grain
+    double xci[3]    = {1.0, 5.246, 3.123};
+    double ri        = 2.5;
+    double shapei[3] = {ri, ri, ri};
+    double Ri[3][3]  = {{1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}, {0.0, 0.0, 1.0}};
+    double blocki[2] = {2.0, 2.0};
+    int flagi        = 0;
+
+    // Second grain
+    double xcj[3]    = {2.0, -1.562, 4.607};
+    double rj        = 1.25;
+    double shapej[3] = {rj, rj, rj};
+    double Rj[3][3]  = {{1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}, {0.0, 0.0, 1.0}};
+    double blockj[2] = {2.0, 2.0};
+    int flagj        = 0;
+
+    // Analytical solution
+    double X0_analytical[4]  = {rj * xci[0] / (ri + rj) + ri * xcj[0] / (ri + rj),
+                                rj * xci[1] / (ri + rj) + ri * xcj[1] / (ri + rj),
+                                rj * xci[2] / (ri + rj) + ri * xcj[2] / (ri + rj), rj / ri};
+    double nij_analytical[3] = {xcj[0] - xci[0], xcj[1] - xci[1], xcj[2] - xci[2]};
+    MathExtra::norm3(nij_analytical);
+
+    int method = MathExtraSuperellipsoids::FORMULATION_ALGEBRAIC;
+
+    // Contact detection
+    double X0[4] = {0.0, 0.0, 0.0, 0.0}, nij[3];
+    MathExtraSuperellipsoids::determine_contact_point(xci, Ri, shapei, blocki, flagi, xcj, Rj,
+                                                      shapej, blockj, flagj, X0, nij, method);
+
+    ASSERT_NEAR(X0[0], X0_analytical[0], EPSILON);
+    ASSERT_NEAR(X0[1], X0_analytical[1], EPSILON);
+    ASSERT_NEAR(X0[2], X0_analytical[2], EPSILON);
+    ASSERT_NEAR(X0[3], X0_analytical[3], EPSILON);
+
+    ASSERT_NEAR(nij[0], nij_analytical[0], EPSILON);
+    ASSERT_NEAR(nij[1], nij_analytical[1], EPSILON);
+    ASSERT_NEAR(nij[2], nij_analytical[2], EPSILON);
+
+    // Rotational invariance
+    double anglei   = 0.456;
+    double axisi[3] = {1, 2, 3};
+    MathExtra::norm3(axisi);
+    double quati[4] = {std::cos(anglei), std::sin(anglei) * axisi[0], std::sin(anglei) * axisi[1],
+                       std::sin(anglei) * axisi[2]};
+    MathExtra::quat_to_mat(quati, Ri);
+
+    double anglej   = 0.123;
+    double axisj[3] = {-1, 2, 1};
+    MathExtra::norm3(axisj);
+    double quatj[4] = {std::cos(anglej), std::sin(anglej) * axisj[0], std::sin(anglej) * axisj[1],
+                       std::sin(anglej) * axisj[2]};
+    MathExtra::quat_to_mat(quatj, Rj);
+
+    X0[0] = X0[1] = X0[2] = X0[3] = 0.0;
+    MathExtraSuperellipsoids::determine_contact_point(xci, Ri, shapei, blocki, flagi, xcj, Rj,
+                                                      shapej, blockj, flagj, X0, nij, method);
+
+    ASSERT_NEAR(X0[0], X0_analytical[0], EPSILON) << "Method: " << method;
+    ASSERT_NEAR(X0[1], X0_analytical[1], EPSILON) << "Method: " << method;
+    ASSERT_NEAR(X0[2], X0_analytical[2], EPSILON) << "Method: " << method;
+    ASSERT_NEAR(X0[3], X0_analytical[3], EPSILON) << "Method: " << method;
+
+    ASSERT_NEAR(nij[0], nij_analytical[0], EPSILON);
+    ASSERT_NEAR(nij[1], nij_analytical[1], EPSILON);
+    ASSERT_NEAR(nij[2], nij_analytical[2], EPSILON);
+}
+
+TEST(ContactPointAndNormal, supersphere_mono)
+{
+    double r        = 3.456;
+    double xci[3]   = {-2 * r, 0.0, 0.0};
+    double xcj[3]   = {2 * r, 0.0, 0.0};
+    double shape[3] = {r, r, r};
+    double R[3][3]  = {{1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}, {0.0, 0.0, 1.0}};
+
+    std::vector<double> blocks = {2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0};
+    int method                 = MathExtraSuperellipsoids::FORMULATION_ALGEBRAIC;
+
+    // Analytical solution
+    double X0_analytical[4]  = {0.0, 0.0, 0.0, 1.0};
+    double nij_analytical[3] = {1.0, 0.0, 0.0};
+
+    for (auto n : blocks) {
+        double block[2] = {n, n};
+        int flag        = (n < 2.01) ? 0 : 1;
+
+        // Contact detection
+        // Start away from the solution (0,0,0): a deliberately poor guess makes the test demanding
+        double X0[4] = {r, -r, 2 * r, 0.0}, nij[3];
+
+        int status = MathExtraSuperellipsoids::determine_contact_point(
+            xci, R, shape, block, flag, xcj, R, shape, block, flag, X0, nij, method);
+
+        ASSERT_NEAR(X0[0], X0_analytical[0], EPSILON)
+            << "Method: " << method << " | n: " << n << " | status: " << status << " | X0: ["
+            << X0[0] << ", " << X0[1] << ", " << X0[2] << ", " << X0[3] << "]";
+        // X0[0] is checked above; check the remaining components and the Lagrange multiplier
+        ASSERT_NEAR(X0[1], X0_analytical[1], EPSILON) << "Method: " << method;
+        ASSERT_NEAR(X0[2], X0_analytical[2], EPSILON) << "Method: " << method;
+        ASSERT_NEAR(X0[3], X0_analytical[3], EPSILON) << "Method: " << method;
+
+        ASSERT_NEAR(nij[0], nij_analytical[0], EPSILON);
+        ASSERT_NEAR(nij[1], nij_analytical[1], EPSILON);
+        ASSERT_NEAR(nij[2], nij_analytical[2], EPSILON);
+    }
+}
+
+TEST(ContactPointAndNormal, sphere_geometric)
+{
+    // First grain
+    double ri        = 2.5;
+    double rj        = 1.25;
+    double overlap   = -0.5;
+    double xci[3]    = {-(ri - overlap / 2.0), 0.0, 0.0};
+    double shapei[3] = {ri, ri, ri};
+    double Ri[3][3]  = {{1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}, {0.0, 0.0, 1.0}};
+    double blocki[2] = {2.0, 2.0};
+    int flagi        = 0;
+
+    // Second grain
+    double xcj[3] = {rj - overlap / 2.0, 0.0, 0.0};
+
+    double shapej[3] = {rj, rj, rj};
+    double Rj[3][3]  = {{1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}, {0.0, 0.0, 1.0}};
+    double blockj[2] = {2.0, 2.0};
+    int flagj        = 0;
+
+    // Analytical solution
+    double X0_analytical[4]  = {0.0, 0.0, 0.0, 1.0};
+    double nij_analytical[3] = {xcj[0] - xci[0], xcj[1] - xci[1], xcj[2] - xci[2]};
+    MathExtra::norm3(nij_analytical);
+
+    int method = MathExtraSuperellipsoids::FORMULATION_GEOMETRIC;
+
+    // Contact detection
+    double X0[4] = {.1, .1, .1, 1.0}, nij[3];
+    MathExtraSuperellipsoids::determine_contact_point(xci, Ri, shapei, blocki, flagi, xcj, Rj,
+                                                      shapej, blockj, flagj, X0, nij, method);
+
+    ASSERT_NEAR(X0[0], X0_analytical[0], EPSILON);
+    ASSERT_NEAR(X0[1], X0_analytical[1], EPSILON);
+    ASSERT_NEAR(X0[2], X0_analytical[2], EPSILON);
+    ASSERT_NEAR(X0[3], X0_analytical[3], EPSILON);
+
+    ASSERT_NEAR(nij[0], nij_analytical[0], EPSILON);
+    ASSERT_NEAR(nij[1], nij_analytical[1], EPSILON);
+    ASSERT_NEAR(nij[2], nij_analytical[2], EPSILON);
+
+    // Rotational invariance
+    double anglei   = 0.456;
+    double axisi[3] = {1, 2, 3};
+    MathExtra::norm3(axisi);
+    double quati[4] = {std::cos(anglei), std::sin(anglei) * axisi[0], std::sin(anglei) * axisi[1],
+                       std::sin(anglei) * axisi[2]};
+    MathExtra::quat_to_mat(quati, Ri);
+
+    double anglej   = 0.123;
+    double axisj[3] = {-1, 2, 1};
+    MathExtra::norm3(axisj);
+    double quatj[4] = {std::cos(anglej), std::sin(anglej) * axisj[0], std::sin(anglej) * axisj[1],
+                       std::sin(anglej) * axisj[2]};
+    MathExtra::quat_to_mat(quatj, Rj);
+
+    X0[0] = X0[1] = X0[2] = X0[3] = 0.0;
+    MathExtraSuperellipsoids::determine_contact_point(xci, Ri, shapei, blocki, flagi, xcj, Rj,
+                                                      shapej, blockj, flagj, X0, nij, method);
+
+    ASSERT_NEAR(X0[0], X0_analytical[0], EPSILON) << "Method: " << method;
+    ASSERT_NEAR(X0[1], X0_analytical[1], EPSILON) << "Method: " << method;
+    ASSERT_NEAR(X0[2], X0_analytical[2], EPSILON) << "Method: " << method;
+    ASSERT_NEAR(X0[3], X0_analytical[3], EPSILON) << "Method: " << method;
+
+    ASSERT_NEAR(nij[0], nij_analytical[0], EPSILON);
+    ASSERT_NEAR(nij[1], nij_analytical[1], EPSILON);
+    ASSERT_NEAR(nij[2], nij_analytical[2], EPSILON);
+}
+
+TEST(ContactPointAndNormal, supersphere_poly_geometric)
+{
+    double r1      = 3.456;
+    double r2      = 2.0 * r1; // Polydisperse: radius_2 = 2 * radius_1
+    double overlap = r1 / 10.0;
+    double xci[3]  = {-(r1 - overlap / 2.0), 0.0, 0.0};
+    double xcj[3]  = {r2 - overlap / 2.0, 0.0, 0.0};
+
+    double shapei[3] = {r1, r1, r1};
+    double shapej[3] = {r2, r2, r2};
+
+    // Identity Rotation
+    double R[3][3] = {{1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}, {0.0, 0.0, 1.0}};
+
+    std::vector<double> blocks = {
+        2.0, 3.0, 4.0, 5.0, 6.0,
+        7.0, 8.0, 9.0, 10.0}; // test would not converge for higher n unless starting on the line
+                              // connecting the centers
+    int method = MathExtraSuperellipsoids::FORMULATION_GEOMETRIC;
+
+    double nij_analytical[3] = {1.0, 0.0, 0.0};
+    double X0_analytical[4]  = {0.0, 0.0, 0.0, 1.0};
+
+    for (auto n : blocks) {
+        double block[2] = {n, n};
+        int flag        = (n < 2.01) ? 0 : 1;
+
+        // Initial Guess: Offset from 0 to test convergence
+        double X0[4] = {overlap, overlap, overlap, 1.0 / 2.0}, nij[3];
+        int status   = MathExtraSuperellipsoids::determine_contact_point(
+            xci, R, shapei, block, flag, xcj, R, shapej, block, flag, X0, nij, method);
+
+        ASSERT_NEAR(X0[0], X0_analytical[0], EPSILON)
+            << "Method: " << method << " | n: " << n << " | status: " << status << " | X0: ["
+            << X0[0] << ", " << X0[1] << ", " << X0[2] << ", " << X0[3] << "]";
+
+        ASSERT_EQ(status, 0) << "Failed to converge/detect contact for n=" << n;
+
+        ASSERT_NEAR(X0[0], X0_analytical[0], EPSILON) << "Position X failed for n=" << n;
+        ASSERT_NEAR(X0[1], X0_analytical[1], EPSILON) << "Position Y failed for n=" << n;
+        ASSERT_NEAR(X0[2], X0_analytical[2], EPSILON) << "Position Z failed for n=" << n;
+        ASSERT_NEAR(X0[3], X0_analytical[3], EPSILON) << "Lagrange Multiplier failed for n=" << n;
+
+        ASSERT_NEAR(nij[0], nij_analytical[0], EPSILON) << "Normal X failed for n=" << n;
+        ASSERT_NEAR(nij[1], nij_analytical[1], EPSILON) << "Normal Y failed for n=" << n;
+        ASSERT_NEAR(nij[2], nij_analytical[2], EPSILON) << "Normal Z failed for n=" << n;
+    }
+}
+
+// TODO: supersphere_mono with grains overlapping
+// TODO: supersphere_poly with grains overlapping
+// TODO: more
+// for polydisperse grains, the solution should be at the radii ratio