diff --git a/extensions/2.0/Khronos/KHR_gaussian_splatting/README.md b/extensions/2.0/Khronos/KHR_gaussian_splatting/README.md new file mode 100644 index 0000000000..cccf932263 --- /dev/null +++ b/extensions/2.0/Khronos/KHR_gaussian_splatting/README.md @@ -0,0 +1,620 @@ + + +# KHR\_gaussian\_splatting + +## Contributors + +- Jason Sobotka, Cesium +- Renaud Keriven, Cesium +- Adam Morris, Cesium +- Sean Lilley, Cesium +- Projit Bandyopadhyay, Niantic Spatial +- Daniel Knoblauch, Niantic Spatial +- Ronald Poirrier, Esri +- Jean-Philippe Pons, Esri +- Alexey Knyazev, Khronos +- Marco Hutter, Independent +- Arseny Kapoulkine, Independent +- Nathan Morrical, Nvidia +- Norbert Nopper, Huawei +- Zehui Lin, Huawei +- Chenxi Tu, Huawei +- Michael Nikelsky, Autodesk + +## Status + +Release Candidate + +## Dependencies + +Written against the glTF 2.0 spec. + +## Table of Contents + +- [Contributors](#contributors) +- [Status](#status) +- [Dependencies](#dependencies) +- [Overview](#overview) +- [Adding 3D Gaussian Splats to Primitives](#adding-3d-gaussian-splats-to-primitives) +- [Geometry Type](#geometry-type) +- [Mathematics of rendering using the default Ellipse Kernel](#mathematics-of-rendering-using-the-default-ellipse-kernel) +- [Lighting](#lighting) +- [glTF JSON Example](#gltf-json-example) +- [Extension Properties](#extension-properties) +- [Attributes](#attributes) +- [Extending the Base Extension](#extending-the-base-extension) +- [Appendix A: Spherical Harmonics Reference](#appendix-a-spherical-harmonics-reference) +- [Known Implementations](#known-implementations) +- [Resources](#resources) + +## Overview + +This extension defines basic support for storing 3D Gaussian splats in glTF, bringing structure and conformity to the 3D Gaussian splatting space. 3D Gaussian splatting uses fields of Gaussians that can be treated as a point cloud for the purposes of storage. 
3D Gaussian splats are defined by their position, rotation, scale, and spherical harmonics, which provide both diffuse and specular color. These values are stored as attributes on a point primitive. Since the 3D Gaussian splats are treated as point primitives, a graceful fallback to treating the data as a sparse point cloud is possible.
+
+A key objective of this extension is to establish a solid foundation for integrating 3D Gaussian splatting into glTF, while enabling future enhancements and innovation. To achieve this, the extension is intentionally designed to be extended itself, allowing extensions to introduce new kernel types, color spaces, projection methods, and sorting methods over time as 3D Gaussian splatting techniques evolve and become standards within the glTF ecosystem.
+
+## Adding 3D Gaussian Splats to Primitives
+
+When a primitive contains an `extensions` property defining `KHR_gaussian_splatting`, this indicates to the client that the primitive should be treated as a 3D Gaussian splatting field.
+
+The extension must be listed in `extensionsUsed`:
+
+```json
+    "extensionsUsed" : [
+        "KHR_gaussian_splatting"
+    ]
+```
+
+Other extensions that depend on this extension, such as 3D Gaussian splatting compression extensions, may require that this extension be included in `extensionsRequired`.
+
+## Geometry Type
+
+The `mode` of the `primitive` must be `POINTS`.
+
+## Mathematics of rendering using the default Ellipse Kernel
+
+To render a field of 3D Gaussian splats, the renderer must reconstruct each Gaussian splat using the same forward pass algorithm used during training. This involves using the position, scale, rotation, opacity, and spherical harmonics attributes to reconstruct the Gaussian splat in 3D space.
+
+There are three key stages to reconstructing and rendering a 3D Gaussian splat:
+
+ 1. [3D Gaussian Representation](#3d-gaussian-representation)
+ 2. [Projection of 3D Gaussian onto 2D Kernel](#projection-of-3d-gaussian-onto-2d-kernel)
+ 3. 
[Rendering: Sorting and Alpha Blending](#rendering-sorting-and-alpha-blending) + +This extension defines a default `ellipse` kernel type that is based on the kernel defined in [3D Gaussian Splatting for Real-Time Radiance Field Rendering](https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/). See the [Ellipse Kernel](#ellipse-kernel) section for more details on how this is defined. + +### 3D Gaussian Representation + +Each Gaussian splat using the default `ellipse` kernel represents a 3D Gaussian defined by the following equation: + +```math +G(x) = \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right) +``` + +Where: +- $G(x)$ is the value of the Gaussian at position $x$. +- $x$ is a 3D position vector. +- $\mu$ is the mean vector, representing the center of the Gaussian. +- $\Sigma$ is the 3x3 covariance matrix, defining the Gaussian's size, shape, and orientation. + +For stable optimization and direct manipulation, the 3x3 covariance matrix $\Sigma$ is constructed from a rotation (quaternion) and a scaling vector. Using eigendecomposition of a covariance matrix, we can express $\Sigma$ as: + +```math +\Sigma = \mathbf{R}\mathbf{S}\mathbf{S}^T\mathbf{R}^T +``` + +Where: +- $\mathbf{R}$ is a 3x3 rotation matrix derived from the quaternion. +- $\mathbf{S}$ is a 3x3 scaling matrix constructed from the scale vector. 
+
+To derive the rotation matrix $\mathbf{R}$ from the quaternion $(q_x, q_y, q_z, q_w)$:
+
+```math
+\mathbf{R} = \begin{pmatrix}
+1 - 2(q_y^2 + q_z^2) & 2(q_x q_y - q_w q_z) & 2(q_x q_z + q_w q_y) \\
+2(q_x q_y + q_w q_z) & 1 - 2(q_x^2 + q_z^2) & 2(q_y q_z - q_w q_x) \\
+2(q_x q_z - q_w q_y) & 2(q_y q_z + q_w q_x) & 1 - 2(q_x^2 + q_y^2)
+\end{pmatrix}
+```
+
+To derive the scale matrix $\mathbf{S}$ from the scale vector $(s_x, s_y, s_z)$:
+
+```math
+\mathbf{S} = \begin{pmatrix}
+s_x & 0 & 0 \\
+0 & s_y & 0 \\
+0 & 0 & s_z
+\end{pmatrix}
+```
+
+### Projection of 3D Gaussian onto 2D Kernel
+
+To render the scene, each 3D Gaussian splat must be projected onto a 2D kernel shape based on the camera's perspective. This involves transforming the 3D covariance matrix $\Sigma$ into a 2D covariance matrix $\Sigma'$ that represents the Gaussian's shape on the image plane:
+
+```math
+\Sigma' = \mathbf{J}\mathbf{W}\Sigma\mathbf{W}^T\mathbf{J}^T
+```
+
+Where:
+- $\mathbf{W}$ is the view transformation matrix; it performs a rigid transformation (rotation and translation) from world space to camera space. This is the standard world-to-camera extrinsic matrix.
+- $\mathbf{J}$ is the Jacobian matrix of the projection transformation; it accounts for the perspective projection.
+
+The Jacobian matrix $\mathbf{J}$ for standard perspective projection is defined as:
+
+```math
+\mathbf{J} = \begin{pmatrix}
+\frac{f_x}{z} & 0 & -\frac{f_x x}{z^2} \\
+0 & \frac{f_y}{z} & -\frac{f_y y}{z^2}
+\end{pmatrix}
+```
+
+### Rendering: Sorting and Alpha Blending
+
+Once the 2D projected Gaussian splats are computed, they must be sorted and alpha blended to produce the final image. Alpha blending is order-dependent, so correct sorting is crucial for accurate rendering.
+
+The sorting method for the `ellipse` kernel is based on the depth value of each Gaussian, which is the z-coordinate in camera space. 
Sorting order is back-to-front based on this depth value and typically uses a radix sort for performance. See the [Sorting Method](#sorting-method) section for more details on sorting. + +Once sorted, the final color of each pixel is computed by alpha blending the splats in sorted order. The alpha blending equation is defined as: + +```math +C = \sum_{i \in \mathscr{N}} c_i \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j) +``` + +Where: +- $C$ is the final color of the pixel. +- $\mathscr{N}$ is the set of splats that contribute to the pixel. +- $c_i$ is the color of the $i$-th Gaussian. See [Lighting](#lighting) for details on how to compute this color from the spherical harmonics. +- $\alpha_i = \alpha \cdot G(x)$ where $G(x)$ is the value of the projected 2D Gaussian's probability density function evaluated at the pixel center $x$. +- $\prod_{j=1}^{i-1} (1 - \alpha_j)$ is the accumulated transmittance. + +*Non-normative Note: Within rasterizer implementations of 3DGS, transmittance can often be implicitly handled by the GPU's blending hardware. When using premultiplied alpha in WebGL2, using `glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA)` (or similar) will often work. Without premultiplied alpha, you can use `glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)` (or similar) instead.* + +## Lighting + +At the time of writing, the most common method for lighting 3D Gaussian splats is via the real spherical harmonics. This extension defines attributes to store spherical harmonic coefficients for each splat. The zeroth-order spherical harmonic coefficients are always required. Higher order coefficients are optional. Each color channel has a separate coefficient, so for each degree $ℓ$, there are $(2ℓ + 1)$ coefficients, each containing RGB values. 
+
+These rules may be relaxed by future extensions that define alternative lighting methods or have specific requirements for handling compression, such as when a compression method stores the diffuse color components as linear color values instead of the zeroth-order coefficients.
+
+### Image State & Relighting
+
+Image state is defined by ISO 22028-1:2016 and indicates the rendering state of the image data. **_Display-referred_** (also known as _output-referred_ in ISO 22028-1:2016) image state represents data that has gone through color-rendering appropriate for display. **_Scene-referred_** image state represents the actual radiance of the scene.
+
+The default `ellipse` kernel, based on the original 3D Gaussian splatting paper, typically uses the _BT.709-sRGB_ color space with a _display-referred_ image state for training and rendering. This differs from the typical glTF PBR material model, where scene-referred linear color spaces are used. This extension defines two display-referred color spaces, but scene-referred color spaces may be added by extensions. See: [Available Color Spaces](#available-color-spaces)
+
+Implementations are allowed to relight the splats in an image state other than the one they were trained in, but this may lead to visual differences compared to the original training results.
+
+### Calculating color from Spherical Harmonics
+
+The real spherical harmonics, including the [Condon–Shortley phase](https://mathworld.wolfram.com/Condon-ShortleyPhase.html), are used to calculate the color and specular response of the Gaussians. 3D Gaussian splatting can use up to 45 higher-degree spherical harmonic coefficients in addition to the 3 zeroth-degree (DC) coefficients. Each basis function has a separate coefficient for each of the 3 color channels. This means that each degree has:
+
+```math
+SH_{n} \times 3 = SH_{total}
+```
+
+Where $SH_{n}$ represents the number of basis functions for degree $n$, i.e. $(2n + 1)$. This gives 3 coefficients for Degree 0, 9 for Degree 1, 15 for Degree 2, and finally 21 for Degree 3, for a total of 48. 
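A quick sketch of this coefficient arithmetic (a hypothetical helper, not part of the extension):

```python
def sh_coefficient_count(degree):
    """Coefficient values for SH degree l: (2*l + 1) basis functions x 3 color channels."""
    return (2 * degree + 1) * 3

per_degree = [sh_coefficient_count(l) for l in range(4)]
print(per_degree)       # [3, 9, 15, 21]
print(sum(per_degree))  # 48 values: 3 for the DC component plus 45 for the higher degrees
```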
+
+The diffuse color of the splat can be computed by multiplying the RGB coefficients of the zeroth-order real spherical harmonic by the normalization constant value of $\approx0.282095$ and applying a _0.5_ bias, as described below. The constant is derived from the formula for the real spherical harmonic of degree 0, order 0:
+
+```math
+Y_{0,0}(θ, φ) = \frac{1}{2} \sqrt{\frac{1}{π}} ≈ 0.282095
+```
+
+To keep the spherical harmonics within the [0, 1] range, the forward pass of the training process applies a _0.5_ bias to the DC component of the spherical harmonics. The rendering process must also apply this bias when reconstructing the color values from the spherical harmonics. This allows the training to occur around 0, ensuring numeric stability for the spherical harmonics, while keeping the coefficients within a valid range for easy rendering.
+
+Thus, to calculate the diffuse RGB color from only the DC component, the formula is:
+
+```math
+Color_{diffuse} = SH_{0,0} * 0.2820947917738781 + 0.5
+```
+
+Where $SH_{0,0}$ represents a vector of the RGB coefficients for the zeroth-order real spherical harmonic.
+
+The [Condon–Shortley phase](https://mathworld.wolfram.com/Condon-ShortleyPhase.html) is the $(-1)^m$ sign factor included in some spherical-harmonic definitions to keep their algebraic and normalization properties consistent. This convention ensures the spherical harmonics are orthogonal and behave cleanly under rotation. As defined, terms with odd order ($m$) are negated. For Degree 2 and Degree 3, this gives some of the constants negative signs.
+
+Subsequent degrees of spherical harmonics encode view-dependent lighting effects, such as specular highlights, and are evaluated along the viewing direction from the camera to the splat. 
As an example, the Degree 1 functions are: + +```math +\begin{aligned} +Y_{1,-1}(θ, φ) &= -\sqrt{\frac{3}{4\pi}} \cdot \frac{y}{r}\\ +Y_{1,0}(θ, φ) &= \sqrt{\frac{3}{4\pi}} \cdot \frac{z}{r}\\ +Y_{1,1}(θ, φ) &= -\sqrt{\frac{3}{4\pi}} \cdot \frac{x}{r}\\ +\end{aligned} +``` + +A full list of all spherical harmonic functions and constants including the Condon–Shortley phase can be found in [Appendix A: Spherical Harmonics Reference](#appendix-a-spherical-harmonics-reference). + +For all of these functions, $r$ represents the magnitude of the position vector, calculated as $r = \sqrt{x^2 + y^2 + z^2}$. Within 3D Gaussian splatting, normalization is used to ensure that the direction vectors are unit vectors. Therefore, $r$ is equal to $1$ when evaluating the spherical harmonics for lighting calculations. + +We can use these functions combined with the DC component to calculate the full color of a Gaussian: + +```math +\begin{aligned} +Color_{SH_{0}} =\,&SH_{0,0} \cdot 0.2820947917738781\\\\ +Color_{SH_{1}} =\,&SH_{1,-1} \cdot y \cdot -0.4886025119029199\,+\\ + &SH_{1,0} \cdot z \cdot 0.4886025119029199\,+\\ + &SH_{1,1} \cdot x \cdot -0.4886025119029199\\\\ +Color_{SH_{2}} =\,&SH_{2,-2} \cdot xy \cdot 1.092548430592079\,+\\ + &SH_{2,-1} \cdot yz \cdot -1.092548430592079\,+\\ + &SH_{2,0} \cdot (2z^2 - x^2 - y^2) \cdot 0.3153915652525200\,+\\ + &SH_{2,1} \cdot xz \cdot -1.092548430592079\,+\\ + &SH_{2,2} \cdot (x^2 - y^2) \cdot 0.5462742152960395\\\\ +Color_{SH_{3}} =\,&SH_{3,-3} \cdot y(3x^2 - y^2) \cdot -0.5900435899266435\,+\\ + &SH_{3,-2} \cdot xyz \cdot 2.890611442640554\,+\\ + &SH_{3,-1} \cdot y(4z^2 - x^2 - y^2) \cdot -0.4570457994644657\,+\\ + &SH_{3,0} \cdot z(2z^2 - 3x^2 - 3y^2) \cdot 0.3731763325901154\,+\\ + &SH_{3,1} \cdot x(4z^2 - x^2 - y^2) \cdot -0.4570457994644657\,+\\ + &SH_{3,2} \cdot z(x^2 - y^2) \cdot 1.445305721320277\,+\\ + &SH_{3,3} \cdot x(x^2 - 3y^2) \cdot -0.5900435899266435\\\\ +Color_{final} =\,&Color_{SH_{0}} + Color_{SH_{1}} + Color_{SH_{2}} 
+ Color_{SH_{3}} + 0.5
+\end{aligned}
+```
+
+Where $SH_{\ell,m}$ represents the RGB spherical harmonic coefficients for a particular degree ($\ell$) and order ($m$), and $x$, $y$, and $z$ represent the components of the normalized view direction from the camera to the splat.
+
+The zeroth-order spherical harmonic is always required to ensure that the diffuse color can be accurately reconstructed, but higher-order spherical harmonics are optional. If higher-order spherical harmonics are used, lower-order spherical harmonics must also be provided. Each degree of spherical harmonics must be fully defined; partial definitions are not allowed.
+
+Extensions extending this extension may define alternative lighting methods, have specific requirements for handling compression, or define different spherical harmonics handling.
+
+See [Appendix A: Spherical Harmonics Reference](#appendix-a-spherical-harmonics-reference) for a convenient reference to the spherical harmonic basis functions and normalization constants.
+
+*Non-normative Note: For training software and exporters, it is recommended that the Gaussians are trained within the [glTF coordinate system](https://registry.khronos.org/glTF/specs/2.0/glTF-2.0.html#coordinate-system-and-units) when targeting glTF. Otherwise, when converting pretrained data from other coordinate systems into the [glTF coordinate system](https://registry.khronos.org/glTF/specs/2.0/glTF-2.0.html#coordinate-system-and-units), the positions, quaternions, and spherical harmonics must be properly rotated.*
+
+## glTF JSON Example
+
+A partial glTF JSON example is shown below, including optional attributes and properties. This extension only affects `primitive` nodes containing 3D Gaussian splat data. 
+
+```json
+"meshes": [{
+    "primitives": [{
+        "attributes": {
+            "POSITION": 0,
+            "COLOR_0": 1,
+            "KHR_gaussian_splatting:SCALE": 2,
+            "KHR_gaussian_splatting:ROTATION": 3,
+            "KHR_gaussian_splatting:OPACITY": 4,
+            "KHR_gaussian_splatting:SH_DEGREE_0_COEF_0": 5,
+            "KHR_gaussian_splatting:SH_DEGREE_1_COEF_0": 6,
+            "KHR_gaussian_splatting:SH_DEGREE_1_COEF_1": 7,
+            "KHR_gaussian_splatting:SH_DEGREE_1_COEF_2": 8
+        },
+        "mode": 0,
+        "extensions": {
+            "KHR_gaussian_splatting": {
+                "kernel": "ellipse",
+                "colorSpace": "srgb_rec709_display",
+                "sortingMethod": "cameraDistance",
+                "projection": "perspective"
+            }
+        }
+    }]
+}]
+```
+
+## Extension Properties
+
+### Kernel
+
+Gaussian splats can have a variety of shapes, and the set of shapes in use has the potential to change over time. The `kernel` property is a required property that indicates to the renderer the properties of the kernel used. Renderers are free to ignore any values they do not recognize.
+
+Additional kernel types can be added over time by supplying an extension that defines an alternative shape definition and parameters.
+
+| Kernel Type | Description |
+| --- | --- |
+| ellipse | A 2D ellipse kernel used to project an ellipsoid shape in 3D space. |
+
+```json
+"meshes": [{
+    "primitives": [{
+        // snip...
+        "extensions": {
+            "KHR_gaussian_splatting": {
+                "kernel": "ellipse"
+            }
+        }
+    }]
+}]
+```
+
+#### Ellipse Kernel
+
+The 2D `ellipse` kernel type represents a 3D Gaussian splat as an ellipsoid projected to a 2D ellipse, based on the kernel defined in [3D Gaussian Splatting for Real-Time Radiance Field Rendering](https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/). This simple type contains no parameters. This is the shape used by the reference renderer implementations for 3D Gaussian splatting. Following the original reference implementation, this kernel assumes a _3σ_ cut-off (Mahalanobis distance of 3 units) for correct rendering. 
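As a non-normative sketch of how a renderer might apply this cut-off, the following evaluates a projected splat's alpha contribution and discards samples beyond 3σ. The 2x2 inverse covariance `inv_cov` and pixel offset `d` are hypothetical inputs that a real renderer would compute from the projected covariance $\Sigma'$:

```python
import math

def splat_alpha(opacity, inv_cov, d, cutoff=3.0):
    """Evaluate alpha_i = opacity * G(x) for a 2D offset d = x - mu,
    discarding samples beyond a Mahalanobis distance of `cutoff` (3 sigma)."""
    a, b, c = inv_cov          # symmetric 2x2 inverse covariance: [[a, b], [b, c]]
    m2 = a * d[0] * d[0] + 2.0 * b * d[0] * d[1] + c * d[1] * d[1]
    if m2 > cutoff * cutoff:   # outside the 3-sigma ellipse: no contribution
        return 0.0
    return opacity * math.exp(-0.5 * m2)

print(splat_alpha(0.8, (1.0, 0.0, 1.0), (0.0, 0.0)))   # at the center: 0.8
print(splat_alpha(0.8, (1.0, 0.0, 1.0), (10.0, 0.0)))  # beyond the cut-off: 0.0
```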
+
+The mean vector for the Gaussian splat is provided by the `POSITION` attribute of the mesh primitive. This defines the center of the Gaussian splat ellipsoid in global space.
+
+The opacity of a Gaussian splat is defined by the `KHR_gaussian_splatting:OPACITY` attribute for this kernel. It stores a normalized value between _0.0_ (transparent) and _1.0_ (opaque). A sigmoid activation function applied during training ensures the value stays within this range. Out-of-range values are invalid. This guarantees that renderers can use the stored opacity directly for alpha blending without any extra processing.
+
+The scale (`KHR_gaussian_splatting:SCALE`) and rotation (`KHR_gaussian_splatting:ROTATION`) attributes define the size and orientation of the ellipsoid in 3D space. These attributes represent the covariance matrix of the Gaussian in a factored form. The scale attribute values correspond to the spread of the Gaussian along its local principal axes, and the rotation attribute values correspond to the orientation of those axes in global space.
+
+`KHR_gaussian_splatting:SCALE` is stored in log-space, so the actual scale along each principal axis is computed as `exp(scale)`. This allows for representing a wide range of scales while maintaining numerical stability.
+
+`KHR_gaussian_splatting:ROTATION` is stored as a unit quaternion in the order (x, y, z, w), where `w` is the scalar component. This quaternion represents the rotation from the local space of the Gaussian to global space.
+
+Together, the scale and rotation can be used to reconstruct the full covariance matrix of the Gaussian splat for rendering purposes. Combined with the position attribute, these values fully define the placement, size, and orientation of the ellipsoid in 3D space.
+
+More details on how to interpret these attributes for rendering can be found in the [3D Gaussian Splatting for Real-Time Radiance Field Rendering](https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/) paper. 
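A minimal, non-normative sketch of reconstructing the 3D covariance $\Sigma = \mathbf{R}\mathbf{S}\mathbf{S}^T\mathbf{R}^T$ from the stored attributes, where `scale` is the raw log-space `KHR_gaussian_splatting:SCALE` value and `q` is the `KHR_gaussian_splatting:ROTATION` quaternion in (x, y, z, w) order:

```python
import math

def covariance_from_attributes(scale, q):
    """Sigma = R * S * S^T * R^T, with per-axis scale s = exp(scale)."""
    s = [math.exp(v) for v in scale]              # log-space -> linear scale
    x, y, z, w = q                                # glTF order: (x, y, z, w)
    # Rotation matrix from the unit quaternion (same formula as above).
    r = [
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ]
    # M = R * S; then Sigma = M * M^T.
    m = [[r[i][j] * s[j] for j in range(3)] for i in range(3)]
    return [[sum(m[i][k] * m[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Identity rotation and log-scales of 0 yield a unit covariance:
print(covariance_from_attributes([0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]))
```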
+
+See [Appendix A: Spherical Harmonics Reference](#appendix-a-spherical-harmonics-reference) for the spherical harmonic basis functions and constants needed to properly implement lighting for the ellipse kernel used by this extension.
+
+### Color Space
+
+The `colorSpace` property is a required property that specifies the color space of the 3D Gaussian splat when spherical harmonics are being used for the lighting. The color space is typically determined by the training process for the splats. This color space value only applies to the 3D Gaussian splatting data and does not affect any other color data in the glTF.
+
+Unless specified otherwise by additional extensions, color space information refers to the reconstructed splat color values; therefore, splat reconstruction and alpha blending must be performed on the attribute values as-is, before any color gamut or transfer function conversions.
+
+Additional values can be added over time by defining extensions to add new color spaces. See the section, [Extending the Base Extension](#extending-the-base-extension), for more information.
+
+#### Available Color Spaces
+
+| Color Space | Description |
+| --- | --- |
+| srgb_rec709_display | BT.709 sRGB (display-referred) color space. |
+| lin_rec709_display | BT.709 linear (display-referred) color space. |
+
+*Non-normative Note: The string values for `colorSpace` follow the color space identification pattern recommended by the ASWF Color Interoperability Forum (CIF). See: [ASWF Color Interoperability Forum](https://lf-aswf.atlassian.net/wiki/spaces/CIF/overview)*
+
+### Projection
+
+The `projection` property is an optional property that specifies how the Gaussians should be projected onto the kernel shape. This is typically provided by the training process for the splats. This property is meant to be extended in the future as new projections become standardized within the community. 
+
+This base extension defines a single projection method, `perspective`, which is the default value. This keeps the behavior consistent with the original 3D Gaussian splatting paper.
+
+_Non-normative Note: See [the original 3D Gaussian Splatting paper](https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/) appendix A, "Details of Gradient Computation," for more details on how the perspective projection is computed._
+
+Additional values can be added over time by defining extensions to add new projection methods. See the section, [Extending the Base Extension](#extending-the-base-extension), for more information.
+
+#### Known Projection Methods
+
+| Projection Method | Description |
+| --- | --- |
+| perspective | (Default) The typical 3D perspective projection based on scene depth. |
+
+### Sorting Method
+
+The `sortingMethod` property is an optional property that specifies how the Gaussian particles should be sorted during the rendering process. This is typically provided by the training process for the splats. This property is meant to be extended in the future as new sorting methods become standardized within the community.
+
+This base extension defines a single sorting method, `cameraDistance`, which is the default value. This keeps the behavior consistent with the original 3D Gaussian splatting paper.
+
+Additional values can be added over time by defining extensions to add new sorting methods. See the section, [Extending the Base Extension](#extending-the-base-extension), for more information.
+
+#### Known Sorting Methods
+
+| Sorting Method | Description |
+| --- | --- |
+| cameraDistance | (Default) Sort the splats based on the length of the vector from the splat to the camera origin. |
+
+## Attributes
+
+| Splat Data | glTF Attribute | Accessor Type | Component Type | Required | Notes |
+| --- | --- | --- | --- | --- | --- |
+| Rotation | KHR_gaussian_splatting:ROTATION | VEC4 | _float_<br>_signed byte_ normalized<br>_signed short_ normalized | yes | Rotation is a quaternion in (x, y, z, w) order, with `w` as the scalar component. |
+| Scale | KHR_gaussian_splatting:SCALE | VEC3 | _float_<br>_signed byte_<br>_signed byte_ normalized<br>_signed short_<br>_signed short_ normalized | yes | |
+| Opacity | KHR_gaussian_splatting:OPACITY | SCALAR | _float_<br>_unsigned byte_ normalized<br>_unsigned short_ normalized | yes | |
+| Spherical Harmonics degree 0 | KHR_gaussian_splatting:SH_DEGREE_0_COEF_0 | VEC3 | _float_ | yes (unless using a different method for lighting) | |
+| Spherical Harmonics degree 1 | KHR_gaussian_splatting:SH_DEGREE_1_COEF_n (n = 0 to 2) | VEC3 | _float_ | no (yes if degree 2 or 3 are used) | Packed from lowest order $m$ (-1) to highest (1). |
+| Spherical Harmonics degree 2 | KHR_gaussian_splatting:SH_DEGREE_2_COEF_n (n = 0 to 4) | VEC3 | _float_ | no (yes if degree 3 is used) | Packed from lowest order $m$ (-2) to highest (2). |
+| Spherical Harmonics degree 3 | KHR_gaussian_splatting:SH_DEGREE_3_COEF_n (n = 0 to 6) | VEC3 | _float_ | no | Packed from lowest order $m$ (-3) to highest (3). |
+
+### Basic Attributes
+
+Each 3D Gaussian splat has the following attributes. At minimum, the attributes must contain `POSITION`, `KHR_gaussian_splatting:ROTATION`, `KHR_gaussian_splatting:SCALE`, and `KHR_gaussian_splatting:OPACITY`. `POSITION` is defined by the base glTF specification.
+
+The `KHR_gaussian_splatting:ROTATION` and `KHR_gaussian_splatting:SCALE` attributes support quantized storage using normalized signed `byte` or `short` component types to reduce file size. If quantization is not needed, content creators should use the `float` component type for maximum precision.
+
+The contents of `KHR_gaussian_splatting:OPACITY`, `KHR_gaussian_splatting:ROTATION`, and `KHR_gaussian_splatting:SCALE` are defined by their kernel. See the [Ellipse Kernel](#ellipse-kernel) section for information about how these are defined for the default `ellipse` kernel.
+
+### Spherical Harmonics Attributes
+
+When spherical harmonics are being used for lighting, the coefficients for the diffuse component must be provided using the `KHR_gaussian_splatting:SH_DEGREE_0_COEF_0` attribute semantic. 
The zeroth-order spherical harmonic coefficients are always required to allow for properly handling cases where the diffuse color is not in the _BT.709_ color gamut. The `KHR_gaussian_splatting:SH_DEGREE_ℓ_COEF_n` attributes where ℓ > 0 hold the higher degrees of spherical harmonics data and are not required. If higher degrees of spherical harmonics are used, then lower degrees are implicitly required.
+
+Each increasing degree of spherical harmonics requires more coefficients. At the 1st degree, 3 sets of coefficients are required, increasing to 5 sets for the 2nd degree, and 7 sets at the 3rd degree. With all 3 degrees, this results in 45 spherical harmonic coefficients stored in the `KHR_gaussian_splatting:SH_DEGREE_ℓ_COEF_n` attributes. Attributes are packed from the lowest order ($m$) to highest for each degree. (e.g. Degree 2 spherical harmonics are packed as order $m = -2$ coefficients (RGB), then order $m = -1$ coefficients (RGB), and so forth.)
+
+Spherical harmonic data is packed in an (r, g, b) format within the VEC3 accessor type. Each coefficient contains 3 values representing the red, green, and blue channels of the spherical harmonic coefficient. Spherical harmonic degrees cannot be partially defined. For example, if any degree 2 spherical harmonics attribute semantics are used, then all degree 2 and degree 1 spherical harmonic coefficients must be provided.
+
+### Improving Fallback with COLOR_0
+
+To support better fallback functionality, the `COLOR_0` attribute semantic from the base glTF specification may be used to provide the diffuse color of the 3D Gaussian splat. This allows renderers to color the points in the sparse point cloud when 3D Gaussian splatting is not supported by a renderer. The value of `COLOR_0` is derived by multiplying the 3 zeroth-order (ℓ = 0) spherical harmonic coefficients of the 3D Gaussian splat by the constant zeroth-order basis value and applying the _0.5_ bias for the RGB channels. The alpha channel should contain the opacity of the splat. 
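A non-normative sketch of deriving the fallback `COLOR_0` value from the DC coefficients and opacity, using the zeroth-order constant and the 0.5 bias from the Lighting section (input values are hypothetical):

```python
SH_C0 = 0.2820947917738781  # zeroth-order spherical harmonic constant

def fallback_color(sh_dc, opacity):
    """COLOR_0 = clamped diffuse color from SH_DEGREE_0_COEF_0, alpha = opacity."""
    rgb = [min(max(c * SH_C0 + 0.5, 0.0), 1.0) for c in sh_dc]
    return rgb + [opacity]

# All-zero DC coefficients reconstruct to mid-grey, fully opaque:
print(fallback_color([0.0, 0.0, 0.0], 1.0))  # [0.5, 0.5, 0.5, 1.0]
```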
+
+*_Non-normative Note:_* If the spherical harmonics are in the BT.709 gamut, the diffuse color can be computed from the `KHR_gaussian_splatting:SH_DEGREE_0_COEF_0` attribute by multiplying each of the RGB components by the constant spherical harmonic value of _0.282095_ and adding the _0.5_ bias.
+
+## Extending the Base Extension
+
+3D Gaussian splatting is an evolving technology with new techniques and methods being developed over time. To provide a solid foundation for 3D Gaussian splatting in glTF while allowing for future growth and innovation, this extension is designed to be extensible. New kernel types, color spaces, projection methods, and sorting methods can be added over time without requiring changes to the base extension.
+
+Extensions may define additional attributes or custom properties as needed to support new features. Attribute semantics should be prefixed with their respective extension name to avoid naming collisions. Extensions may also define additional values for the `kernel`, `colorSpace`, `projection`, and `sortingMethod` properties. Custom properties should be included in the body of the new extension object.
+
+*_Non-normative Note: It is possible to share data between two attributes by using the same accessor index for multiple attribute semantics. This can be useful to optimize the storage of data._*
+
+Compression extensions that operate on 3D Gaussian splatting data should extend this base extension to ensure compatibility. Compression extensions must define how the data can be decoded back into the base 3D Gaussian splatting format defined by this extension, but may also allow optimizations specific to their compression method. (e.g. passing textures or other data directly to the GPU for decoding.)
+
+To use an extension that extends `KHR_gaussian_splatting`, the extension must be included within the `extensions` property of the `KHR_gaussian_splatting` extension object. The extension must also be listed in `extensionsUsed` at the top level of the glTF. 
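As a non-normative illustration of these rules, a loader might decide whether it can proceed by comparing `extensionsUsed` and `extensionsRequired` against its own capabilities (the helper and extension names here are illustrative):

```python
def check_extension_support(gltf, supported):
    """Sketch: decide whether a loader can render a glTF that layers
    additional extensions on top of KHR_gaussian_splatting."""
    used = set(gltf.get("extensionsUsed", []))
    required = set(gltf.get("extensionsRequired", []))
    unsupported_required = required - set(supported)
    if unsupported_required:
        return False, sorted(unsupported_required)  # cannot render correctly
    # Unrecognized-but-optional extensions may be ignored (fallback behavior).
    return True, sorted(used - set(supported))

gltf = {
    "extensionsUsed": ["KHR_gaussian_splatting", "EXT_gaussian_splatting_kernel_customShape"],
    "extensionsRequired": ["KHR_gaussian_splatting"],
}
print(check_extension_support(gltf, ["KHR_gaussian_splatting"]))
```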
+
+Extension authors are encouraged to define fallback behaviors for renderers that do not recognize the new extension, but this is not strictly required. If a fallback is not possible, the extension should be listed in `extensionsRequired` to ensure that renderers that do not support the extension do not attempt to render the data incorrectly.
+
+#### Example: Adding additional Kernel Types
+
+*This section is non-normative.*
+
+In order to add additional kernel types, a new extension should be defined that extends `KHR_gaussian_splatting`. This new extension would define the new kernel type and any parameters it may require. A renderer that recognizes the new kernel type can then use the parameters defined in the new extension to render the splats appropriately. Renderers that do not recognize the new kernel type should fall back to the default `ellipse` type.
+
+For example, a new extension `EXT_gaussian_splatting_kernel_customShape` could be defined that adds a new kernel type `customShape` with additional parameters.
+
+```json
+"meshes": [{
+    "primitives": [{
+        // ...omitted for brevity...
+        "extensions": {
+            "KHR_gaussian_splatting": {
+                "kernel": "customShape",
+                "extensions": {
+                    "EXT_gaussian_splatting_kernel_customShape": {
+                        "customParameter1": 1.0,
+                        "customParameter2": [0.0, 1.0, 0.0]
+                    }
+                }
+            }
+        }
+    }]
+}]
+```
+
+If the kernel type requires additional attributes, those attributes should be defined within the new extension using unique semantics to avoid collisions.
+
+```json
+"meshes": [{
+    "primitives": [{
+        "attributes": {
+            "POSITION": 0,
+            "KHR_gaussian_splatting:SCALE": 1,
+            "KHR_gaussian_splatting:ROTATION": 2,
+            "EXT_gaussian_splatting_kernel_customShape:CUSTOM_ATTRIBUTE": 3
+        },
+        // ...omitted for brevity... 
+    "extensions": {
+      "KHR_gaussian_splatting": {
+        "kernel": "customShape",
+        "extensions": {
+          "EXT_gaussian_splatting_kernel_customShape": {
+            "customParameter1": 1.0
+          }
+        }
+      }
+    }
+  }]
+}]
+```
+
+The extension must also be listed in `extensionsUsed` at the top level of the glTF.
+
+```json
+  "extensionsUsed" : [
+    "KHR_gaussian_splatting",
+    "EXT_gaussian_splatting_kernel_customShape"
+  ]
+```
+
+## Appendix A: Spherical Harmonics Reference
+
+*This appendix is non-normative and provided for informational purposes only.*
+
+### Real Spherical Harmonic Basis Functions
+
+Degrees $0$ through $3$, in Cartesian space. Including the [Condon–Shortley phase](https://mathworld.wolfram.com/Condon-ShortleyPhase.html) $(-1)^m$.
+
+```math
+\textbf{Degree 0, ℓ = 0}\\
+\begin{aligned}
+Y_{0,0}(θ, φ) &= \frac{1}{2} \sqrt{\frac{1}{\pi}}\\\\
+\end{aligned}
+```
+
+```math
+\textbf{Degree 1, ℓ = 1}\\
+\begin{aligned}
+Y_{1,-1}(θ, φ) &= -\sqrt{\frac{3}{4\pi}} \cdot \frac{y}{r}\\
+Y_{1,0}(θ, φ) &= \sqrt{\frac{3}{4\pi}} \cdot \frac{z}{r}\\
+Y_{1,1}(θ, φ) &= -\sqrt{\frac{3}{4\pi}} \cdot \frac{x}{r}\\\\
+\end{aligned}
+```
+
+```math
+\textbf{Degree 2, ℓ = 2}\\
+\begin{aligned}
+Y_{2,-2}(θ, φ) &= \frac{1}{2} \sqrt{\frac{15}{\pi}} \cdot \frac{xy}{r^2}\\
+Y_{2,-1}(θ, φ) &= -\frac{1}{2} \sqrt{\frac{15}{\pi}} \cdot \frac{yz}{r^2}\\
+Y_{2,0}(θ, φ) &= \frac{1}{4} \sqrt{\frac{5}{\pi}} \cdot \frac{3z^2 - r^2}{r^2}\\
+Y_{2,1}(θ, φ) &= -\frac{1}{2} \sqrt{\frac{15}{\pi}} \cdot \frac{xz}{r^2}\\
+Y_{2,2}(θ, φ) &= \frac{1}{4} \sqrt{\frac{15}{\pi}} \cdot \frac{x^2 - y^2}{r^2}\\\\
+\end{aligned}
+```
+
+```math
+\textbf{Degree 3, ℓ = 3}\\
+\begin{aligned}
+Y_{3,-3}(θ, φ) &= -\frac{1}{4} \sqrt{\frac{35}{2\pi}} \cdot \frac{y(3x^2 - y^2)}{r^3}\\
+Y_{3,-2}(θ, φ) &= \frac{1}{2} \sqrt{\frac{105}{\pi}} \cdot \frac{xyz}{r^3}\\
+Y_{3,-1}(θ, φ) &= -\frac{1}{4} \sqrt{\frac{21}{2\pi}} \cdot \frac{y(5z^2 - r^2)}{r^3}\\
+Y_{3,0}(θ, φ) &= \frac{1}{4} \sqrt{\frac{7}{\pi}} \cdot \frac{z(5z^2 - 3r^2)}{r^3}\\
+Y_{3,1}(θ, φ) &= -\frac{1}{4} \sqrt{\frac{21}{2\pi}} \cdot \frac{x(5z^2 - r^2)}{r^3}\\
+Y_{3,2}(θ, φ) &= \frac{1}{4} \sqrt{\frac{105}{\pi}} \cdot \frac{z(x^2 - y^2)}{r^3}\\
+Y_{3,3}(θ, φ) &= -\frac{1}{4} \sqrt{\frac{35}{2\pi}} \cdot \frac{x(x^2 - 3y^2)}{r^3}\\
+\end{aligned}
+```
+
+### Table of Constants
+
+| Used In | Expression | Approximate Constant |
+| --- | --- | --- |
+| $Y_{0,0}(θ, φ)$ | $\frac{1}{2} \sqrt{\frac{1}{\pi}}$ | $0.2820947917738781$ |
+| $Y_{1,-1}(θ, φ)$ <br> $Y_{1,1}(θ, φ)$ | $-\sqrt{\frac{3}{4\pi}}$ | $-0.4886025119029199$ |
+| $Y_{1,0}(θ, φ)$ | $\sqrt{\frac{3}{4\pi}}$ | $0.4886025119029199$ |
+| $Y_{2,-2}(θ, φ)$ | $\frac{1}{2} \sqrt{\frac{15}{\pi}}$ | $1.092548430592079$ |
+| $Y_{2,-1}(θ, φ)$ <br> $Y_{2,1}(θ, φ)$ | $-\frac{1}{2} \sqrt{\frac{15}{\pi}}$ | $-1.092548430592079$ |
+| $Y_{2,0}(θ, φ)$ | $\frac{1}{4} \sqrt{\frac{5}{\pi}}$ | $0.3153915652525200$ |
+| $Y_{2,2}(θ, φ)$ | $\frac{1}{4} \sqrt{\frac{15}{\pi}}$ | $0.5462742152960395$ |
+| $Y_{3,-3}(θ, φ)$ <br> $Y_{3,3}(θ, φ)$ | $-\frac{1}{4} \sqrt{\frac{35}{2\pi}}$ | $-0.5900435899266435$ |
+| $Y_{3,-2}(θ, φ)$ | $\frac{1}{2} \sqrt{\frac{105}{\pi}}$ | $2.890611442640554$ |
+| $Y_{3,-1}(θ, φ)$ <br> $Y_{3,1}(θ, φ)$ | $-\frac{1}{4} \sqrt{\frac{21}{2\pi}}$ | $-0.4570457994644657$ |
+| $Y_{3,0}(θ, φ)$ | $\frac{1}{4} \sqrt{\frac{7}{\pi}}$ | $0.3731763325901154$ |
+| $Y_{3,2}(θ, φ)$ | $\frac{1}{4} \sqrt{\frac{105}{\pi}}$ | $1.445305721320277$ |
+
+## Known Implementations
+
+*TODO: Add known implementations before final ratification.*
+
+*NOTE: If you are a developer of a glTF renderer that implements this extension, please open an issue in the glTF GitHub repository to have your implementation listed here.*
+
+## Resources
+
+- [3D Gaussian Splatting for Real-Time Radiance Field Rendering](https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/)
+- [Gaussian Splatting Algorithm Details](https://github.com/joeyan/gaussian_splatting/blob/main/MATH.md)
+
+## Full Khronos Copyright Statement
+
+Copyright 2026 The Khronos Group Inc.
+
+This Specification is protected by copyright laws and contains material proprietary
+to Khronos. Except as described by these terms, it or any components
+may not be reproduced, republished, distributed, transmitted, displayed, broadcast
+or otherwise exploited in any manner without the express prior written permission
+of Khronos.
+
+Khronos grants a conditional copyright license to use and reproduce the unmodified
+Specification for any purpose, without fee or royalty, EXCEPT no licenses to any patent,
+trademark or other intellectual property rights are granted under these terms.
+
+Khronos makes no, and expressly disclaims any, representations or warranties,
+express or implied, regarding this Specification, including, without limitation:
+merchantability, fitness for a particular purpose, non-infringement of any
+intellectual property, correctness, accuracy, completeness, timeliness, and
+reliability.
Under no circumstances will Khronos, or any of its Promoters, +Contributors or Members, or their respective partners, officers, directors, +employees, agents or representatives be liable for any damages, whether direct, +indirect, special or consequential damages for lost revenues, lost profits, or +otherwise, arising from or in connection with these materials. + +This specification has been created under the Khronos Intellectual Property Rights +Policy, which is Attachment A of the Khronos Group Membership Agreement available at +https://www.khronos.org/files/member_agreement.pdf. Khronos grants a conditional +copyright license to use and reproduce the unmodified specification for any purpose, +without fee or royalty, EXCEPT no licenses to any patent, trademark or other +intellectual property rights are granted under these terms. Parties desiring to +implement the specification and make use of Khronos trademarks in relation to that +implementation, and receive reciprocal patent license protection under the Khronos +IP Policy must become Adopters and confirm the implementation as conformant under +the process defined by Khronos for this specification; +see https://www.khronos.org/conformance/adopters/file-format-adopter-program. + +Where this Specification identifies specific sections of external references, only those +specifically identified sections define normative functionality. The Khronos Intellectual +Property Rights Policy excludes external references to materials and associated enabling +technology not created by Khronos from the Scope of this Specification, and any licenses +that may be required to implement such referenced materials and associated technologies +must be obtained separately and may involve royalty payments. + +Khronos® is a registered trademark, and glTF™ is a trademark of The Khronos Group Inc. All +other product names, trademarks, and/or company names are used solely for identification +and belong to their respective owners. 
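
As a non-normative cross-check of the Appendix A table in the README above, the closed-form expressions can be evaluated numerically against the tabulated approximate constants (plain Python sketch; the dictionary keys are informal labels, not spec identifiers):

```python
import math

# A sample of closed-form coefficients from the Appendix A table,
# recomputed numerically and paired with the tabulated values.
SH_TABLE = {
    "Y0,0":  (0.5 * math.sqrt(1.0 / math.pi),     0.2820947917738781),
    "Y1,1":  (-math.sqrt(3.0 / (4.0 * math.pi)), -0.4886025119029199),
    "Y2,-2": (0.5 * math.sqrt(15.0 / math.pi),    1.092548430592079),
    "Y2,0":  (0.25 * math.sqrt(5.0 / math.pi),    0.3153915652525200),
    "Y3,-2": (0.5 * math.sqrt(105.0 / math.pi),   2.890611442640554),
    "Y3,0":  (0.25 * math.sqrt(7.0 / math.pi),    0.3731763325901154),
}

for name, (computed, tabulated) in SH_TABLE.items():
    assert abs(computed - tabulated) < 1e-12, name
```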
+ diff --git a/extensions/2.0/Khronos/KHR_gaussian_splatting/schema/mesh.primitive.KHR_gaussian_splatting.schema.json b/extensions/2.0/Khronos/KHR_gaussian_splatting/schema/mesh.primitive.KHR_gaussian_splatting.schema.json new file mode 100644 index 0000000000..8cdc3c6759 --- /dev/null +++ b/extensions/2.0/Khronos/KHR_gaussian_splatting/schema/mesh.primitive.KHR_gaussian_splatting.schema.json @@ -0,0 +1,69 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "title": "KHR_gaussian_splatting glTF Mesh Primitive Extension", + "type": "object", + "description": "Data defining a 3D Gaussian Splat primitive.", + "allOf": [ + { + "$ref": "glTFProperty.schema.json" + } + ], + "properties": { + "kernel": { + "type": "string", + "description": "Property specifying parameters regarding the kernel used to generate the Gaussians.", + "anyOf": [ + { + "const": "ellipse" + }, + { + "type": "string" + } + ] + }, + "colorSpace": { + "type": "string", + "description": "Property specifying the color space of the spherical harmonics.", + "anyOf": [ + { + "const": "srgb_rec709_display" + }, + { + "const": "lin_rec709_display" + }, + { + "type": "string" + } + ] + }, + "projection": { + "type": "string", + "description": "Optional property specifying how to project the Gaussians to achieve a perspective correct value. This property defaults to perspective.", + "default": "perspective", + "anyOf": [ + { + "const": "perspective" + }, + { + "type": "string" + } + ] + }, + "sortingMethod": { + "type": "string", + "description": "Optional property specifying how to sort the Gaussians during rendering. 
This property defaults to cameraDistance.", + "default": "cameraDistance", + "anyOf": [ + { + "const": "cameraDistance" + }, + { + "type": "string" + } + ] + }, + "extensions": {}, + "extras": {} + }, + "required": ["colorSpace", "kernel"] +} diff --git a/extensions/2.0/Khronos/KHR_materials_diffuse_transmission/README.md b/extensions/2.0/Khronos/KHR_materials_diffuse_transmission/README.md index 38b7b9d186..3885ce51d9 100644 --- a/extensions/2.0/Khronos/KHR_materials_diffuse_transmission/README.md +++ b/extensions/2.0/Khronos/KHR_materials_diffuse_transmission/README.md @@ -11,7 +11,7 @@ - Alex Wood, AGI [@abwood](https://github.com/abwood) - Ed Mackey, AGI [@emackey](https://github.com/emackey) -Copyright 2024 The Khronos Group Inc. All Rights Reserved. glTF is a trademark of The Khronos Group Inc. +Copyright 2025 The Khronos Group Inc. All Rights Reserved. glTF is a trademark of The Khronos Group Inc. See [Appendix](#appendix-full-khronos-copyright-statement) for full Khronos Copyright Statement. ## Status @@ -24,12 +24,12 @@ Written against the glTF 2.0 spec. ## Exclusions -* This extension must not be used on a material that also uses `KHR_materials_pbrSpecularGlossiness`. -* This extension must not be used on a material that also uses `KHR_materials_unlit`. +- This extension must not be used on a material that also uses `KHR_materials_pbrSpecularGlossiness`. +- This extension must not be used on a material that also uses `KHR_materials_unlit`. ## Overview -This extension models the physical phenomenon of light being diffusely transmitted through an infinitely thin material. Thin dielectric objects like leaves or paper diffusely transmit light due to dense volumetric scattering within the object. In 3D graphics, it is common to approximate thin volumetric objects as non-volumetric surfaces. The KHR_materials_diffuse_transmission extension models the diffuse transmission of light through such infinitely thin surfaces. 
For optically thick media (volumes) with short scattering distances and dense scattering behavior, i.e. candles, KHR_materials_diffuse_transmission provides a phenomenologically plausible and cost-effective approximation.
+This extension models the physical phenomenon of light being diffusely transmitted through an infinitely thin material. Thin dielectric objects like leaves or paper diffusely transmit light due to dense volumetric scattering within the object. In 3D graphics, it is common to approximate thin volumetric objects as non-volumetric surfaces. The `KHR_materials_diffuse_transmission` extension models the diffuse transmission of light through such infinitely thin surfaces. For optically thick media (volumes) with short scattering distances and dense scattering behavior, e.g., candles, `KHR_materials_diffuse_transmission` provides a phenomenologically plausible and cost-effective approximation.
@@ -44,43 +44,42 @@ This extension models the physical phenomenon of light being diffusely transmitt
- ## Extending Materials The effect is activated by adding the `KHR_materials_diffuse_transmission` extension to any glTF material. ```json { - "materials": [ - { - "extensions": { - "KHR_materials_diffuse_transmission": { - "diffuseTransmissionFactor": 0.25, - "diffuseTransmissionTexture": { - "index": 0 - }, - "diffuseTransmissionColorFactor": [ - 1.0, - 0.9, - 0.85 - ] - } - } + "materials": [ + { + "extensions": { + "KHR_materials_diffuse_transmission": { + "diffuseTransmissionFactor": 0.25, + "diffuseTransmissionTexture": { + "index": 0 + }, + "diffuseTransmissionColorFactor": [ + 1.0, + 0.9, + 0.85 + ] } - ] + } + } + ] } ``` ## Properties -| | Type | Description | Required | -|-------------------------------------|---------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------| -| **diffuseTransmissionFactor** | `number` | The percentage of non-specularly reflected light that is diffusely transmitted through the surface. | No, default: `0` | -| **diffuseTransmissionTexture** | [`textureInfo`](/specification/2.0/README.md#reference-textureInfo) | A texture that defines the percentage of non-specularly reflected light that is diffusely transmitted through the surface. Stored in the alpha (`A`) channel. Will be multiplied by the diffuseTransmissionFactor. | No | -| **diffuseTransmissionColorFactor** | `number[3]` | The color that modulates the transmitted light. | No, default: `[1, 1, 1]` | -| **diffuseTransmissionColorTexture** | [`textureInfo`](/specification/2.0/README.md#reference-textureInfo) | A texture that defines the color that modulates the diffusely transmitted light, stored in the `RGB` channels and encoded in sRGB. This texture will be multiplied by diffuseTransmissionColorFactor. 
| No | +| | Type | Description | Required | +|-------------------------------------|---------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------| +| **diffuseTransmissionFactor** | `number` | The percentage of non-specularly reflected light that is diffusely transmitted through the surface. | No, default: `0` | +| **diffuseTransmissionTexture** | [`textureInfo`](/specification/2.0/README.md#reference-textureInfo) | A texture that defines the percentage of non-specularly reflected light that is diffusely transmitted through the surface. Stored in the alpha (`A`) channel. Will be multiplied by the `diffuseTransmissionFactor` value. | No | +| **diffuseTransmissionColorFactor** | `number[3]` | The color that modulates the transmitted light. | No, default: `[1, 1, 1]` | +| **diffuseTransmissionColorTexture** | [`textureInfo`](/specification/2.0/README.md#reference-textureInfo) | A texture that defines the color that modulates the diffusely transmitted light, stored in the `RGB` channels and encoded in sRGB. This texture will be multiplied by the `diffuseTransmissionColorFactor` value. | No | -### diffuseTransmissionFactor +### Diffuse Transmission The proportion of light that is diffusely transmitted through a surface, rather than being diffusely re-emitted. This is expressed as a percentage of the light that penetrates the surface (i.e., not specularly reflected), rather than a percentage of the total light incident on the surface. A value of 1.0 indicates that 100% of the light that penetrates the surface is transmitted through it. 
@@ -106,9 +105,26 @@ The proportion of light that is diffusely transmitted through a surface, rather -### diffuseTransmissionColorFactor +When textured, this parameter is sampled from the `A` channel of the `diffuseTransmissionTexture`. The value is linear and is multiplied by the `diffuseTransmissionFactor` to determine the total diffuse transmission value. + +``` +diffuseTransmission = diffuseTransmissionFactor * diffuseTransmissionTexture.a +``` + + + + + + + + +
+ Backlit, occluded plane with blue baseColorFactor and a striped diffuseTransmissionTexture.
(Input texture shown in the top-left).
+
+ +### Diffuse Transmission Color -The proportion of light at each color channel that is not attenuated by the surface transmission. Attenuation is usually defined as an amount of light at each frequency that is reduced over a given distance through a medium by absorption and scattering interactions. However, since this extension deals exclusively with infinitely thin surfaces, attenuation is constant and equal to 1.0 - `diffuseTransmissionColorFactor`. +The proportion of light at each color channel that is not attenuated by the surface transmission. Attenuation is usually defined as an amount of light at each frequency that is reduced over a given distance through a medium by absorption and scattering interactions. However, since this extension deals exclusively with infinitely thin surfaces, attenuation is constant and equal to 1.0 - `diffuseTransmissionColor`. @@ -132,32 +148,8 @@ The proportion of light at each color channel that is not attenuated by the surf
-### diffuseTransmissionTexture
-
-The `A` channel of this texture defines proportion of light that is diffusely transmitted through a surface, rather than being diffusely re-emitted. This is expressed as a percentage of the light that penetrates the surface (i.e., not specularly reflected), rather than a percentage of the total light incident on the surface. A value of 1.0 indicates that 100% of the light that penetrates the surface is transmitted through it.
-
-The value is linear and is multiplied by the `diffuseTransmissionFactor` to determine the total diffuse transmission value.
+When textured, the `RGB` channels of the `diffuseTransmissionColorTexture`, encoded in sRGB, define the proportion of light at each color channel that is not attenuated by the surface transmission. The values are multiplied by the `diffuseTransmissionColorFactor` to determine the final diffuse transmission color.
-```
-diffuseTransmission = diffuseTransmissionFactor * diffuseTransmissionTexture.a
-```
-
-
-
-
-
-
-
-
- Backlit, occluded plane with blue baseColorFactor and a striped diffuseTransmissionTexture.
(Input texture shown in the top-left).
-
- - -### diffuseTransmissionColorTexture - -The `RGB` channels of this texture define the proportion of light at each color channel that is not attenuated by the surface transmission. -The values are multiplied by the `diffuseTransmissionColorFactor` to determine the total diffuse transmission color. ``` diffuseTransmissionColor = diffuseTransmissionColorFactor * diffuseTransmissionColorTexture.rgb ``` @@ -193,6 +185,7 @@ diffuseTransmissionColor = diffuseTransmissionColorFactor * diffuseTransmissionC *This section is normative.* This extension changes the `dielectric_brdf` defined in [Appendix B](https://registry.khronos.org/glTF/specs/2.0/glTF-2.0.html#material-structure) + ``` dielectric_brdf = fresnel_mix( @@ -201,7 +194,9 @@ dielectric_brdf = layer = specular_brdf(α = roughness ^ 2) ) ``` - to the following: + +to the following: + ``` dielectric_brdf = fresnel_mix( @@ -213,12 +208,15 @@ dielectric_brdf = layer = specular_brdf(α = roughness ^ 2) ) ``` -Increasing the strength of the diffuse transmission effect using the `diffuseTransmissionFactor` parameter takes away energy from the diffuse reflection BSDF and passes it to the diffuse transmission BSDF. The specular reflection BSDF and Fresnel weighting are not affected. + +Increasing the strength of the diffuse transmission effect using the `diffuseTransmission` parameter takes away energy from the diffuse reflection BSDF and passes it to the diffuse transmission BSDF. The specular reflection BSDF and Fresnel weighting are not affected. 
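
As a non-normative illustration of how the textured inputs feed the mix above, a small sketch combining the factors with (hypothetical) sampled texel values; sRGB-to-linear decoding of the color texture is assumed to have already been applied:

```python
def diffuse_transmission_inputs(factor, color_factor, texture_a=1.0,
                                color_texture_rgb=(1.0, 1.0, 1.0)):
    """Combine diffuseTransmissionFactor / diffuseTransmissionColorFactor with
    sampled texels into the diffuseTransmission and diffuseTransmissionColor
    parameters used by the dielectric_brdf mix (non-normative sketch)."""
    # diffuseTransmission = diffuseTransmissionFactor * diffuseTransmissionTexture.a
    diffuse_transmission = factor * texture_a
    # diffuseTransmissionColor =
    #     diffuseTransmissionColorFactor * diffuseTransmissionColorTexture.rgb
    diffuse_transmission_color = [f * t for f, t in zip(color_factor, color_texture_rgb)]
    return diffuse_transmission, diffuse_transmission_color
```

With the factors from the JSON example above (`0.25` and `[1.0, 0.9, 0.85]`) and fully white texels, this yields a transmission weight of 0.25 and an unchanged color.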
## Implementation + *This section is non-normative.* With a simple Lambert BRDF model, `diffuse_brdf` and `diffuse_btdf` may be implemented as follows + ``` function diffuse_brdf(color) { if (view and light on same hemisphere) { @@ -258,10 +256,13 @@ function mix(bsdf0, bsdf1, factor) { ## Combining Diffuse Transmission with other Extensions + ### KHR_materials_transmission + Both `KHR_materials_diffuse_transmission` and `KHR_materials_transmission` replace the diffuse BRDF with a mix of diffuse BRDF and a BTDF that transmits light onto the opposite side of the surface. In case of `KHR_materials_transmission`, this is a microfacet BTDF that shares its roughness with the microfacet BRDF. In case of `KHR_materials_diffuse_transmission`, on the other hand, this is a diffuse BTDF. Let's recall the `dielectric_brdf` for `KHR_materials_diffuse_transmission` as defined above + ``` dielectric_brdf = fresnel_mix( @@ -269,12 +270,13 @@ dielectric_brdf = base = mix( diffuse_brdf(color = baseColor), diffuse_btdf(color = diffuseTransmissionColor), - diffuseTransmission, + diffuseTransmission), layer = specular_brdf(α = roughness ^ 2) ) ``` and compare it to the `dielectric_brdf` defined in `KHR_materials_transmission` + ``` dielectric_brdf = fresnel_mix( @@ -336,6 +338,7 @@ Since the diffuse BTDF does not have controls for roughness, the roughness param If `KHR_materials_transmission` is used in combination with `KHR_materials_diffuse_transmission`, the transmission effect overrides the diffuse transmission effect. We can formalize this behavior by combining the two cases from above + ``` dielectric_brdf = fresnel_mix( @@ -352,6 +355,7 @@ diffuse_bsdf = mix( diffuse_btdf(color = diffuseTransmissionColor), diffuseTransmission) ``` + @@ -378,6 +382,7 @@ diffuse_bsdf = mix(
### KHR_materials_volume + When `KHR_materials_diffuse_transmission` is combined with `KHR_materials_volume`, a diffuse transmission BTDF describes the transmission of light through the volume boundary. The object becomes translucent. The light transport inside the volume is solely handled by `KHR_materials_volume` and is not affected by the surface BSDF. @@ -421,14 +426,13 @@ When `KHR_materials_diffuse_transmission` is combined with `KHR_materials_volume
- ## Schema -- [glTF.KHR_materials_diffuse_transmission.schema.json](schema/glTF.KHR_materials_diffuse_transmission.schema.json) +- [material.KHR_materials_diffuse_transmission.schema.json](schema/material.KHR_materials_diffuse_transmission.schema.json) ## Appendix: Full Khronos Copyright Statement -Copyright 2024 The Khronos Group Inc. +Copyright 2025 The Khronos Group Inc. Some parts of this Specification are purely informative and do not define requirements necessary for compliance and so are outside the Scope of this Specification. These diff --git a/extensions/2.0/Khronos/KHR_materials_diffuse_transmission/schema/glTF.KHR_materials_diffuse_transmission.schema.json b/extensions/2.0/Khronos/KHR_materials_diffuse_transmission/schema/material.KHR_materials_diffuse_transmission.schema.json similarity index 68% rename from extensions/2.0/Khronos/KHR_materials_diffuse_transmission/schema/glTF.KHR_materials_diffuse_transmission.schema.json rename to extensions/2.0/Khronos/KHR_materials_diffuse_transmission/schema/material.KHR_materials_diffuse_transmission.schema.json index 38e69464c8..5f8569e548 100644 --- a/extensions/2.0/Khronos/KHR_materials_diffuse_transmission/schema/glTF.KHR_materials_diffuse_transmission.schema.json +++ b/extensions/2.0/Khronos/KHR_materials_diffuse_transmission/schema/material.KHR_materials_diffuse_transmission.schema.json @@ -1,6 +1,6 @@ { "$schema": "http://json-schema.org/draft-04/schema", - "title": "KHR_materials_diffuse_transmission glTF extension", + "title": "KHR_materials_diffuse_transmission glTF Material Extension", "type": "object", "description": "glTF extension that defines the diffuse transmission of the material.", "allOf": [ { "$ref": "glTFProperty.schema.json" } ], @@ -16,25 +16,26 @@ "diffuseTransmissionTexture": { "allOf": [ { "$ref": "textureInfo.schema.json" } ], "description": "A texture that defines the percentage of light transmitted through the surface.", - "gltf_detailedDescription": "A texture that defines the 
strength of the diffuse transmission effect, stored in the alpha (A) channel. Will be multiplied by the diffuseTransmissionFactor." + "gltf_detailedDescription": "A texture that defines the strength of the diffuse transmission effect, stored in the alpha (A) channel. Will be multiplied by the `diffuseTransmissionFactor` value." }, "diffuseTransmissionColorFactor": { "type": "array", "items": { "type": "number", - "minimum": 0.0 + "minimum": 0.0, + "maximum": 1.0 }, "description": "The color of the transmitted light.", "default": [ 1.0, 1.0, 1.0 ], "minItems": 3, "maxItems": 3, "gltf_detailedDescription": "The color of the transmitted light." - }, - "diffuseTransmissionColorTexture": { - "allOf": [ { "$ref": "textureInfo.schema.json" } ], - "description": "A texture that defines the color of the transmitted light", - "gltf_detailedDescription": "A texture that defines the color of the transmitted light, stored in the RGB channels and encoded in sRGB. This texture will be multiplied by diffuseTransmissionColorFactor." - }, + }, + "diffuseTransmissionColorTexture": { + "allOf": [ { "$ref": "textureInfo.schema.json" } ], + "description": "A texture that defines the color of the transmitted light", + "gltf_detailedDescription": "A texture that defines the color of the transmitted light, stored in the RGB channels and encoded in sRGB. This texture will be multiplied by the `diffuseTransmissionColorFactor` value." + }, "extensions": { }, "extras": { } } diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md new file mode 100644 index 0000000000..78bc55e3dc --- /dev/null +++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md @@ -0,0 +1,723 @@ +# KHR\_meshopt\_compression + +## Contributors + +* Arseny Kapoulkine, [@zeux](https://github.com/zeux) +* Jasper St. 
Pierre, [@magcius](https://github.com/magcius) +* Alexey Knyazev, [@lexaknyazev](https://github.com/lexaknyazev) +* Don McCurdy, [@donmccurdy](https://github.com/donmccurdy) + +## Status + +Release Candidate + +## Dependencies + +Written against the glTF 2.0 spec. + +## Exclusions + +- This extension must not be used on a buffer view that also uses `EXT_meshopt_compression`. +- This extension must not be used on a buffer that also uses `EXT_meshopt_compression` (see "Fallback buffers"). + +## Overview + +glTF files come with a variety of binary data - vertex attribute data, index data, morph target deltas, animation inputs/outputs - that can be a substantial fraction of the overall transmission size. To optimize for delivery size, general-purpose compression such as gzip can be used - however, it often doesn't capture some common types of redundancy in glTF binary data. + +This extension provides a generic option for compressing binary data that is tailored to the common types of data seen in glTF buffers. The extension works on a bufferView level and as such is agnostic of how the data is used, supporting geometry (vertex and index data, including morph targets), animation (keyframe time and values) and other data, such as instance transforms for `EXT_mesh_gpu_instancing`. + +Similarly to supercompressed textures (see `KHR_texture_basisu`), this extension assumes that the buffer view data is optimized for GPU efficiency - using quantization and using optimal data order for GPU rendering - and provides a compression layer on top of bufferView data. Each bufferView is compressed in isolation which allows the loaders to maximally efficiently decompress the data directly into GPU storage. + +The compressed format is designed to have two properties beyond optimizing compression ratio - very fast decoding (using WebAssembly SIMD, the decoders run at \~1 GB/sec on modern desktop hardware), and byte-wise storage compatible with general-purpose compression. 
That is, instead of reducing the encoded size as much as possible, the bitstream is constructed in such a way that a general-purpose compressor can compress it further.
+
+This is beneficial for typical Web delivery scenarios, where all files are usually using lossless general-purpose compression (gzip, Brotli, Zstandard) - instead of completely replacing it, the codecs here augment it, while still reducing the size (which is valuable to optimize delivery size when general-purpose compression isn't available, and additionally reduces the performance impact of general-purpose decompression which is typically *much slower* than the decoders proposed here).
+
+## Specifying compressed views
+
+As explained in the overview, this extension operates on bufferViews. This allows the loaders to directly decompress data into GPU memory and minimizes the JSON size impact of specifying compressed data. To specify the compressed representation, the `KHR_meshopt_compression` extension section overrides the source buffer index as well as specifying the buffer parameters and a compression mode/filter (detailed later in the specification):
+
+```json
+{
+    "buffer": 1,
+    "byteOffset": 0,
+    "byteLength": 2368,
+    "byteStride": 16,
+    "target": 34962,
+    "extensions": {
+        "KHR_meshopt_compression": {
+            "buffer": 0,
+            "byteOffset": 1024,
+            "byteLength": 347,
+            "byteStride": 16,
+            "mode": "ATTRIBUTES",
+            "count": 148
+        }
+    }
+}
+```
+
+In this example, the uncompressed buffer contents are stored in buffer 1 (this can be used by loaders that don't implement this extension). The compressed data is stored in a separate buffer, specifying a separate byte range (with compressed data). Note that for decompressors to work, they need to know the compression `mode`, `filter` (for `"ATTRIBUTES"` mode), and additionally the layout of the encoded data - `count` elements with `byteStride` bytes each.
This data is specified in the extension JSON; while in some cases `byteStride` is available on the parent `bufferView` declaration, JSON schema prohibits specifying this for some types of storage such as index data. + +## JSON schema updates + +Each `bufferView` can contain an extension object with the following properties: + +| Property | Type | Description | Required | +|:---------|:--------------|:------------------------------------------| :--------------------------| +| `buffer` | `integer` | The index of the buffer with compressed data. | :white_check_mark: Yes | +| `byteOffset` | `integer` | The offset into the buffer in bytes. | No, default: `0` | +| `byteLength` | `integer` | The length of the compressed data in bytes. | :white_check_mark: Yes | +| `byteStride` | `integer` | The stride, in bytes. | :white_check_mark: Yes | +| `count` | `integer` | The number of elements. | :white_check_mark: Yes | +| `mode` | `string` | The compression mode. | :white_check_mark: Yes | +| `filter` | `string` | The compression filter. | No, default: `"NONE"` | + +`mode` represents the compression mode using an enumerated value that must be one of `"ATTRIBUTES"`, `"TRIANGLES"`, `"INDICES"`. + +`filter` represents the post-decompression filter using an enumerated value that must be one of `"NONE"`, `"OCTAHEDRAL"`, `"QUATERNION"`, `"EXPONENTIAL"`, `"COLOR"`. + +For the extension object to be valid, the following must hold: + +- The parent `bufferView.byteLength` is equal to `byteStride` times `count` +- When `mode` is `"ATTRIBUTES"`, `byteStride` must be divisible by 4 and must be >= 4 and <= 256. 
+- When `mode` is `"TRIANGLES"`, `count` must be divisible by 3 +- When `mode` is `"TRIANGLES"` or `"INDICES"`, `byteStride` must be equal to 2 or 4 +- When `mode` is `"TRIANGLES"` or `"INDICES"`, `filter` must be equal to `"NONE"` or omitted +- When `filter` is `"OCTAHEDRAL"`, `byteStride` must be equal to 4 or 8 +- When `filter` is `"QUATERNION"`, `byteStride` must be equal to 8 +- When `filter` is `"EXPONENTIAL"`, `byteStride` must be divisible by 4 +- When `filter` is `"COLOR"`, `byteStride` must be equal to 4 or 8 + +The compressed bitstream format is defined by the value of the `mode` property. + +The parent `bufferView` properties define a layout which can hold the data decompressed from the extension object. + +## Compression modes and filters + +Compression mode specifies the bitstream layout and the algorithm used to decompress the data, and can be one of: + +- Mode 0: attributes. Suitable for storing sequences of values of arbitrary size, relies on exploiting similarity between bytes of consecutive elements to reduce the size. +- Mode 1: triangles. Suitable for storing indices that represent triangle lists, relies on exploiting topological redundancy of consecutive triangles. +- Mode 2: indices. Suitable for storing indices that don't represent triangle lists, relies on exploiting similarity between consecutive elements. + +In all three modes, the resulting compressed byte sequence is typically noticeably smaller than the buffer view length, *and* can be additionally compressed by using a general purpose compression algorithm such as Deflate for the resulting glTF file (.glb/.bin). + +The format of the bitstream is specified in [Appendix A (Bitstream)](#appendix-a-bitstream). + +When using attribute encoding, for some types of data exploiting the redundancy between consecutive elements is not enough to achieve good compression ratio; quantization can help but isn't always sufficient either. 
To that end, when using mode 0, this extension additionally allows the use of a compression filter that transforms each element stored in the buffer view to make it more compressible with the attribute codec, often making it possible to trade precision for compressed size. Filters don't change the size of the output data; they merely improve the compressed size by reducing entropy. Note that the use of a compression filter restricts `byteStride`, which effectively prohibits storing interleaved data. + +The filter specifies the algorithm used to transform the data after decompression, and can be one of: + +- Filter 0: none. Attribute data is used as is. +- Filter 1: octahedral. Suitable for storing unit length vectors (normals/tangents) as 4-byte or 8-byte values with variable precision octahedral encoding. +- Filter 2: quaternion. Suitable for storing rotation data for animations or instancing as 8-byte values with variable precision max-component encoding. +- Filter 3: exponential. Suitable for storing floating point data as 4-byte values with variable mantissa precision. +- Filter 4: color. Suitable for storing color data as 4-byte or 8-byte values using a variable precision YCoCg color model. + +The filters are detailed further in [Appendix B (Filters)](#appendix-b-filters). + +When using filters, the expectation is that the filter is applied after the attribute decoder on the contents of the resulting bufferView; the resulting data can then be used according to the referencing accessors without further modifications. + +When compression filters are used, the decompressed data may not match the original uncompressed data exactly due to precision loss. When a buffer view using filters also has an uncompressed fallback, the `min` and `max` values in accessor bounds must be exact with respect to the uncompressed fallback data and may not be exact with respect to the compressed data. 
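Putting the property table and validity rules together, a compressed vertex attribute buffer view could look as follows (a hypothetical example; all buffer indices and byte counts are illustrative):

```json
{
  "buffer": 0,
  "byteOffset": 0,
  "byteLength": 141000,
  "byteStride": 12,
  "target": 34962,
  "extensions": {
    "KHR_meshopt_compression": {
      "buffer": 1,
      "byteOffset": 0,
      "byteLength": 53212,
      "byteStride": 12,
      "count": 11750,
      "mode": "ATTRIBUTES"
    }
  }
}
```

Here the parent `bufferView.byteLength` equals `byteStride * count` (12 × 11750 = 141000), `byteStride` is a multiple of 4 within [4, 256], and `filter` is omitted, defaulting to `"NONE"`.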
+ +**Non-normative** To decompress the data, the [meshoptimizer](https://github.com/zeux/meshoptimizer) library may be used; it supports efficient decompression using C++ and/or WebAssembly, including a fast SIMD implementation for attribute decoding. + +## Fallback buffers + +While the extension JSON specifies a separate buffer to source compressed data from, the parent `bufferView` must also have a valid `buffer` reference, as required by the glTF 2.0 specification. To produce glTF files that *require* support for this extension and don't have uncompressed data, the referenced buffer can omit the URI as follows: + +```json +{ "byteLength": 1432878 } +``` + +The `byteLength` property of such a placeholder buffer **MUST** be sufficiently large to contain all uncompressed buffer views referencing it. + +When stored in a GLB file, the placeholder buffer should have index 1 or above, to avoid conflicts with the GLB binary buffer. + +This extension allows buffers to be optionally tagged as fallback by using the `fallback` attribute as follows: + +```json +{ +  "byteLength": 1432878, +  "extensions": { +    "KHR_meshopt_compression": { +      "fallback": true +    } +  } +} +``` + +This is useful to avoid confusion, and may also be used by loaders that support the extension to skip loading of these buffers. + +When a buffer is marked as a fallback buffer, the following must hold: + +- All references to the buffer must come from `bufferView`s that have a `KHR_meshopt_compression` extension specified +- No references to the buffer may come from `KHR_meshopt_compression` extension JSON + +If a fallback buffer doesn't have a URI and doesn't refer to the GLB binary chunk, it follows that `KHR_meshopt_compression` must be a required extension. + +**Non-normative** To ensure consistency between compressed and uncompressed data, encoders should use the decompressed data to populate the fallback buffer view instead of using the original data. 
This reduces the chance of divergence between the two representations. + +## Compressing geometry data + +> This section is non-normative. + +The codecs used by this extension can represent geometry exactly, replicating both vertex and index data without changes in contents or order. However, to get optimal compression, it's necessary to pre-process the data. + +To that end, encoders should optimize vertex and index data for locality of reference. Specifically: + +- Triangle order should be optimized to maximize the recency of previously encountered vertices; this is similar to optimizing meshes for vertex reuse (the post-transform cache) in GPU hardware. +- Vertex order should be linearized in the order that vertices appear in the index stream to get optimal index compression. + +When index data is not available (e.g. point data sets) or represents topology with a lot of seams (e.g. each triangle has unique vertex indices because it specifies flat-shaded normals), encoders could additionally optimize vertex data for spatial locality, so that vertices close together in the vertex stream are close together in space. + +Vertex data should be quantized using the appropriate representation; this extension cleanly interacts with `KHR_mesh_quantization` by compressing already quantized data. + +Morph targets can be treated identically to other vertex attributes, as long as vertex order optimization is performed on all target streams at the same time. It is recommended to use quantized storage for morph target deltas, possibly with a narrower type than that used for baseline values. + +When storing vertex data, mode 0 (attributes) should be used; for index data, mode 1 (triangles) or mode 2 (indices) should be used instead. Mode 1 only supports triangle list storage; indices of other topology types can be stored using mode 2 (indices). 
The use of triangle strip topology is not recommended since it's more efficient to store triangle lists using mode 1 (triangles). These are suggestions; the extension does not require any specific mode to be used for any specific type of data. + +Using filter 1 (octahedral) for normal/tangent data, and filter 4 (color) for color data, may improve compression ratio further. + +While using quantized attributes is recommended for optimal compression, it's also possible to use non-quantized floating point attributes. To increase compression ratio in that case, filter 3 (exponential) is recommended; advanced encoders can additionally constrain the exponent to be the same for all components of a vector, or for all values of the same component across the entire mesh, which can further improve compression ratio. + +## Compressing animation data + +> This section is non-normative. + +To minimize the size of animation data, it is important to reduce the number of stored keyframes and reduce the size of each keyframe. + +To reduce the number of keyframes, encoders can either selectively remove keyframes that don't contribute to the resulting movement, resulting in sparse input/output data, or resample the keyframes uniformly, resulting in uniformly dense data. Resampling can be beneficial since it means that all animation channels in the same animation can share the same input accessor, and provides a convenient quality vs size tradeoff, but it's up to the encoder to pick the optimal strategy. + +Additionally, it's important to identify tracks with the same output value and use a single keyframe for these. + +To reduce the size of each keyframe, rotation data should be quantized using 16-bit normalized components; for additional compression, the use of filter 2 (quaternion) is recommended. Translation/scale data can be compressed using filter 3 (exponential); for scale data, using the same exponent for all three vector components can enhance compression ratio further. 
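As an illustration of the 16-bit normalized quantization suggested above for rotation keyframes, a minimal sketch in C (the helper name `quantize_snorm16` is ours, not part of the extension):

```c
#include <stdint.h>

// Quantize a value in [-1, 1] to a 16-bit signed normalized integer,
// rounding to nearest with ties away from zero; out-of-range inputs
// are clamped.
static int16_t quantize_snorm16(float v)
{
    float c = v < -1.0f ? -1.0f : (v > 1.0f ? 1.0f : v);
    float scaled = c * 32767.0f;
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}
```

Each component of a unit quaternion quantized this way can be stored in a normalized `SHORT` accessor; filter 2 (quaternion) can then compress the stream further.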
+ +Note that animation inputs that specify time values require enough precision to avoid animation distortion. It's recommended to either not use any filters for animation inputs to avoid any precision loss (the attribute encoder can still be efficient at reducing the size of an animation input track even without filters when the inputs are uniformly spaced), or to use filter 3 (exponential) with the maximum mantissa bit count (23). + +After pre-processing, both input and output data should be stored using mode 0 (attributes). + +When the `EXT_mesh_gpu_instancing` extension is used, the instance transform data can also be compressed with the same techniques as animation data, using mode 0 (attributes) with filter 3 (exponential) for position and scale, and filter 2 (quaternion) for rotation. + +## Content guidelines + +> This section is non-normative. + +This extension expands the compression offered by the existing extension `EXT_meshopt_compression`. Since existing tools and pipelines already support that extension, and existing assets already use it, the following guidelines are recommended for content creators and tool authors: + +- Tools that already support the `EXT_meshopt_compression` extension should keep supporting it alongside this extension to be able to read pre-existing assets. +- For maximum compatibility, DCC tools should give users a choice to use either variant when exporting assets. The default option should eventually be switched to the KHR variant once most loaders support it. +- Existing assets that use the EXT variant can be losslessly converted to KHR, if needed, by changing the extension strings inside the glTF JSON. +- When producing assets that target loaders supporting both extensions, using this extension with the v1 format should be preferred since it provides a better compression ratio at no additional runtime cost. + +# Appendix A: Bitstream + +The following sections specify the format of the bitstream for compressed data for various modes. 
+ +## Mode 0: attributes + +Attribute compression exploits similarity between consecutive elements of the buffer by encoding deltas. The deltas are stored for each separate byte which makes the codec more versatile since it can work with components of various sizes. Additionally, the elements are stored with bytes deinterleaved, which means that sequences of deltas are more easily compressible by some general purpose compressors that may run on the resulting data. + +To facilitate efficient decompression, deinterleaving and delta encoding are performed on attribute blocks instead of on the entire buffer; within each block, elements are processed in groups of 16. + +The encoded stream structure is as follows: + +- Header byte, which must be equal to `0xa1` (version 1) or `0xa0` (version 0) +- One or more attribute blocks, detailed below +- Tail padding, which pads the size of the subsequent tail block with zero bytes to a minimum of 24 bytes for version 1 or 32 bytes for version 0 (required for efficient decoding) +- Tail block, which consists of: + - Baseline element stored verbatim (`byteStride` bytes) + - Channel modes (`byteStride / 4` bytes, only in version 1) + +**Non-normative** While using version 1 is preferred for better compression, version 0 is provided for binary compatibility with `EXT_meshopt_compression`. When using version 0, the bitstream is identical to that defined in `EXT_meshopt_compression`. + +Note that there is no way to calculate the length of a stream; instead, the input stream must be correctly sized (using `byteLength`) so that the tail block element can be found. If the decoding procedure reaches the end of stream too early, or any unprocessed bytes remain after decoding and before tail, the stream is invalid. 
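The stream framing rules above can be expressed as a size check; this is a sketch under the stated layout (the function name is ours, and only framing is validated, not the block contents):

```c
#include <stddef.h>

// Checks mode 0 ("attributes") stream framing: a valid header byte, and a
// stream large enough to hold the tail block (baseline element, plus
// byteStride / 4 channel mode bytes in version 1), padded to the
// version-specific minimum of 24 (version 1) or 32 (version 0) bytes.
static int attributes_stream_framing_valid(unsigned char header, size_t streamSize, size_t byteStride)
{
    if (header != 0xa0 && header != 0xa1)
        return 0;

    int version1 = header == 0xa1;
    size_t tail = version1 ? byteStride + byteStride / 4 : byteStride;
    size_t tailMin = version1 ? 24 : 32;

    if (tail < tailMin)
        tail = tailMin; // tail padding rule

    return streamSize >= 1 + tail;
}
```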
+ +Each attribute block encodes a sequence of deltas, with the first element in the first block using the deltas from the baseline element stored in the tail block, and each subsequent element using the deltas from the previous element. The attribute block always stores an integer number of elements, with that number computed as follows: + +``` +maxBlockElements = min((8192 / byteStride) & ~15, 256) +blockElements = min(remainingElements, maxBlockElements) +``` + +Where `remainingElements` is the number of elements that have yet to be decoded (with the initial value of `count` extension property). Decoding the attribute block reduces `remainingElements` value by `blockElements`. + +Each attribute block consists of: +- Control header (only in version 1): `byteStride / 4` bytes specifying a packed 2-bit control mode for each byte position of the element +- `byteStride` "data blocks" (one for each byte of the element), each containing deltas stored for groups of elements + +Each group always contains 16 elements; when the number of elements that needs to be encoded isn't divisible by 16, it gets rounded up and the remaining elements are ignored after decoding. In other terms: + +``` +groupCount = ceil(blockElements / 16) +``` + +For example, a stream with a `byteStride` of 64 containing 200 elements would be broken up into two attribute blocks: one containing 128 elements, and the other containing 72 elements. And these blocks would have 8 and 5 groups, respectively. 
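The block and group sizing above can be written directly as code (a sketch; function names are ours):

```c
#include <stddef.h>

// Maximum number of elements in one attribute block: at most 8192 bytes of
// element data, rounded down to a multiple of 16 elements, capped at 256.
static size_t max_block_elements(size_t byteStride)
{
    size_t n = (8192 / byteStride) & ~(size_t)15;
    return n < 256 ? n : 256;
}

// Number of 16-element groups in a block; a partial trailing group is
// rounded up and its extra elements are ignored after decoding.
static size_t group_count(size_t blockElements)
{
    return (blockElements + 15) / 16;
}
```

For the worked example above (`byteStride` of 64, 200 elements), this yields blocks of 128 and 72 elements with 8 and 5 groups, respectively.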
+ +The control header (only present in version 1) contains 2 bits for each byte position, packed into bytes as follows: + +``` +controlByte = (controlForByte0 << 0) | (controlForByte1 << 2) | (controlForByte2 << 4) | (controlForByte3 << 6) +``` + +The control bits specify the control mode for each byte: + +- control mode 0: Use bit lengths `{0, 1, 2, 4}` for encoding +- control mode 1: Use bit lengths `{1, 2, 4, 8}` for encoding +- control mode 2: All delta bytes are 0; no data is stored for this byte +- control mode 3: Literal encoding; delta bytes are stored uncompressed with no header bits + +The structure of each "data block" (when using control mode 0 or 1, or when using version 0) breaks down as follows: +- Header bits, with 2 bits for each group, aligned to the byte boundary with zero padding if groupCount is not divisible by 4 +- Delta blocks, with variable number of bytes stored for each group + +Header bits are stored from least significant to most significant bit - header bits for 4 consecutive groups are packed in a byte together as follows: + +``` +(headerBitsForGroup0 << 0) | (headerBitsForGroup1 << 2) | (headerBitsForGroup2 << 4) | (headerBitsForGroup3 << 6) +``` + +The header bits establish the delta encoding mode for each group of 16 elements: + +For control mode 0 (version 1): +- delta encoding mode 0: All 16 byte deltas are 0; the size of the encoded block is 0 bytes +- delta encoding mode 1: Deltas are stored in 1-bit sentinel encoding; the size of the encoded block is [2..18] bytes +- delta encoding mode 2: Deltas are stored in 2-bit sentinel encoding; the size of the encoded block is [4..20] bytes +- delta encoding mode 3: Deltas are stored in 4-bit sentinel encoding; the size of the encoded block is [8..24] bytes + +For control mode 1 (version 1): +- delta encoding mode 0: Deltas are stored in 1-bit sentinel encoding; the size of the encoded block is [2..18] bytes +- delta encoding mode 1: Deltas are stored in 2-bit sentinel encoding; the 
size of the encoded block is [4..20] bytes +- delta encoding mode 2: Deltas are stored in 4-bit sentinel encoding; the size of the encoded block is [8..24] bytes +- delta encoding mode 3: All 16 byte deltas are stored as bytes; the size of the encoded block is 16 bytes + +For version 0: +- delta encoding mode 0: All 16 byte deltas are 0; the size of the encoded block is 0 bytes +- delta encoding mode 1: Deltas are stored in 2-bit sentinel encoding; the size of the encoded block is [4..20] bytes +- delta encoding mode 2: Deltas are stored in 4-bit sentinel encoding; the size of the encoded block is [8..24] bytes +- delta encoding mode 3: All 16 byte deltas are stored as bytes; the size of the encoded block is 16 bytes + +When using the sentinel encoding, each delta is stored as a 1-bit, 2-bit, or 4-bit value in packed bytes. For 2-bit and 4-bit encodings, deltas are stored from most significant to least significant bit inside the byte. For 1-bit encoding, deltas are stored from least significant to most significant bit to facilitate better reuse of lookup tables in efficient implementations. The 1-bit encoding is packed as follows with 8 deltas per byte: + +``` +(delta0 << 0) | (delta1 << 1) | (delta2 << 2) | (delta3 << 3) | (delta4 << 4) | (delta5 << 5) | (delta6 << 6) | (delta7 << 7) +``` + +The 2-bit encoding is packed as follows with 4 deltas per byte: + +``` +(delta3 << 0) | (delta2 << 2) | (delta1 << 4) | (delta0 << 6) +``` + +And the 4-bit encoding is packed as follows with 2 deltas per byte: + +``` +(delta1 << 0) | (delta0 << 4) +``` + +A delta that has all bits set to 1 (corresponds to `1` for 1-bit encoding, `3` for 2-bit encoding, and `15` for 4-bit encoding, otherwise known as "sentinel") indicates that the real delta value is outside of the bit range, and is stored as a full byte after the bit deltas for this group. + +To decode deltas into original values, the channel modes (specified in the tail block for version 1) are used. 
When using version 0, the channel mode is assumed to be 0 (byte deltas); other modes can only be present in version 1. The encoded stride is split into `byteStride / 4` channels, and each channel specifies the mode in a single byte in the tail block, with the low 4 bits of the byte specifying the mode: + +**Channel mode 0 (byte deltas)**: Byte deltas are stored as zigzag-encoded differences between the byte values of the element and the byte values of the previous element in the same position; the zigzag encoding scheme works as follows: + +``` +encode(uint8_t v) = ((v & 0x80) != 0) ? ~(v << 1) : (v << 1) +decode(uint8_t v) = ((v & 1) != 0) ? ~(v >> 1) : (v >> 1) +``` + +For a complete example, assuming 4-bit sentinel coding, the following byte sequence: + +``` +0x17 0x5f 0xf0 0xbc 0x77 0xa9 0x21 0x00 0x34 0xb5 +``` + +Encodes 16 deltas, where the first 8 bytes of the sequence specify 16 4-bit deltas, and the last 2 bytes of the sequence specify the explicit delta code values encoded for elements 3 and 4 in the sequence. After de-zigzagging, the decoded deltas look like: + +``` +-1 -4 -3 26 -91 0 -6 6 -4 -4 5 -5 1 -1 0 0 +``` + +Finally, note that the deltas are computed in 8-bit integer space with wraparound two's complement arithmetic; for example, if the values of the first byte of two consecutive elements are `0x00` and `0xff`, the byte delta that is stored is `-1` (`1` after zigzag encoding). + +**Channel mode 1 (2-byte deltas)**: 2-byte deltas are computed as zigzag-encoded differences between 16-bit values of the element and the previous element in the same position; the zigzag encoding scheme works as follows: + +``` +encode(uint16_t v) = ((v & 0x8000) != 0) ? ~(v << 1) : (v << 1) +decode(uint16_t v) = ((v & 1) != 0) ? ~(v >> 1) : (v >> 1) +``` + +The deltas are computed in 16-bit integer space with wraparound two's complement arithmetic. Values are assumed to be little-endian, so the least significant byte is encoded before the most significant byte. 
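The byte-wise zigzag transform for channel mode 0 follows directly from the definitions above; a small sketch in C (function names are ours):

```c
#include <stdint.h>

// Zigzag maps small signed deltas to small unsigned values so they pack
// well into the sentinel encodings; all arithmetic wraps in 8 bits.
static uint8_t zigzag8_encode(uint8_t v)
{
    return (uint8_t)(((v & 0x80) != 0) ? ~(v << 1) : (v << 1));
}

static uint8_t zigzag8_decode(uint8_t v)
{
    return (uint8_t)(((v & 1) != 0) ? ~(v >> 1) : (v >> 1));
}
```

Applied to the worked example above, decoding the explicit bytes `0x34` and `0xb5` yields `26` and `-91`, matching the listed deltas.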
+ +**Channel mode 2 (4-byte XOR deltas)**: 4-byte deltas are computed as XOR between 32-bit values of the element and the previous element in the same position, with an additional rotation `r` applied based on the high 4 bits of the channel mode byte: + +``` +rotate(uint32_t v, uint r) = (v << r) | (v >> ((32 - r) & 31)) +``` + +The deltas are computed in 32-bit integer space. Values are assumed to be little-endian, so the least significant byte is encoded before the most significant byte. + +Because the channel mode defines encoding for 4 bytes at once, it's impossible to mix modes 0 and 1 within the same channel: if the first 2-byte group of an aligned 4-byte group uses 2-byte deltas, the second 2-byte group must use 2-byte deltas as well. + +Streams that use channel mode 3 or above, as well as streams that use channel mode 0 or 1 with high 4 bits of the channel mode byte not equal to 0, are invalid. + +## Mode 1: triangles + +Triangle compression compresses triangle list indices by exploiting similarity between consecutive triangles. Given a triangle stream that has been optimized for locality, very often subsequent triangles share an edge with the recently encoded triangle. The encoder uses a few other techniques to try to encode most triangles in optimized triangle lists into a single byte. 
+ +The encoded stream structure is as follows: + +- Header byte, which must be equal to `0xe1` +- Triangle codes, referred to as `code` below, with a single byte for each triangle (for a total of `count` extension property divided by 3, since `count` counts index values) +- Extra data which is necessary to decode triangles that don't fit into a single byte, referred to as `data` below +- Tail block, which consists of a 16-byte lookup table (containing 16 one-byte values), referred to as `codeaux` below + +Note that there is no way to calculate the length of a stream; instead, the input stream must be correctly sized (using `byteLength`) so that the tail block element can be found. If the decoding procedure reaches the end of stream too early, or any unprocessed bytes remain after decoding and before tail, the stream is invalid. + +There are two limitations on the structure of the 16-byte lookup table: + +- The last two bytes must be 0 +- Neither high four bits nor low four bits of any of 16 bytes can be equal to `0xf`. + +During the decoding process, decoder maintains five variables: + +- current offset into `data` section +- `next`: a `uint32_t` referring to the expected next unique index (also known as high-watermark), starts at 0 and is incremented with unsigned 32-bit wraparound +- `last`: a `uint32_t` referring to the last encoded index, starts at 0 +- `edgefifo`: a 16-entry FIFO with two `uint32_t` vertex indices in each entry; initial contents is undefined +- `vertexfifo`: a 16-entry FIFO with a `uint32_t` vertex index in each entry; initial contents is undefined + +To decode each triangle, the decoder needs to analyze the `code` byte, read additional bytes from `data` as necessary, and update the internal state correctly. The `code` byte encoding is optimized to reach a single byte per triangle in most common cases; the resulting data can often be compressed by a general purpose compressor running on the resulting .bin/.glb file. 
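The two FIFOs described above are commonly realized as small circular buffers where index 0 is the most recently pushed entry; one possible sketch (type and function names are ours):

```c
#include <stdint.h>

// 16-entry vertex FIFO; the edge FIFO works the same way with a pair of
// indices per entry. Pushing moves the logical front backwards, so that
// fifo_read(f, 0) returns the most recently pushed vertex.
typedef struct {
    uint32_t entries[16];
    unsigned front;
} VertexFifo;

static void fifo_push(VertexFifo* f, uint32_t v)
{
    f->front = (f->front - 1) & 15; // unsigned wraparound keeps index in [0, 15]
    f->entries[f->front] = v;
}

static uint32_t fifo_read(const VertexFifo* f, unsigned index)
{
    return f->entries[(f->front + index) & 15];
}
```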
When extra data is necessary to decode a triangle and it represents an index value, the decoder uses varint-7 encoding (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)). The encoding stores a 32-bit unsigned integer as a sequence of bytes, where each byte's most significant bit indicates whether more bytes follow. The sequence must consist of 1-5 bytes where the most significant bit of the last byte must be 0 and the most significant bits of all prior bytes must be 1. The value is reconstructed by concatenating the lower 7 bits of each byte, least significant bits first, ignoring extra bits: + +``` +0x7f => 0x7f +0x81 0x04 => 0x201 +0xff 0xa0 0x05 => 0x1507f +``` + +When decoding the deltas, the 32-bit value is read using the varint-7 encoding (with unsigned 32-bit wraparound). The resulting value specifies a zigzag-encoded signed delta from `last` and can be decoded as follows: + +``` +uint32_t decodeIndex(uint32_t v) { +    uint32_t delta = (v & 1) != 0 ? ~(v >> 1) : (v >> 1); +    last += delta; // unsigned 32-bit wraparound +    return last; +} +``` + +Any streams that require the decoder to read an edge or vertex FIFO entry that was not previously written are invalid. + +The encoding for `code` is split into various cases, some of which are self-sufficient and some of which need to read extra data. The encoding is detailed below; in each case, the triangle (a, b, c) is emitted to the output at the end. + +- `0xX0`, where `X < 0xf`: Encodes a recently encountered edge and a `next` vertex. +  - The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge). +  - The third index, `c`, is set to `next`. +  - `next` is incremented. + +  - Edge (c, b) is pushed to the edge FIFO. +  - Edge (a, c) is pushed to the edge FIFO. + +  - Vertex c is pushed to the vertex FIFO. + +- `0xXY`, where `X < 0xf` and `0 < Y < 0xd`: Encodes a recently encountered edge and a recently encountered vertex. 
+ - The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge). + - The third index, `c`, is read from the vertex FIFO at index Y (where 0 is the most recently added vertex; note that 0 is never actually read here, since `Y > 0`). + + - Edge (c, b) is pushed to the edge FIFO. + - Edge (a, c) is pushed to the edge FIFO. + +- `0xXd` or `0xXe`, where `X < 0xf`: Encodes a recently encountered edge and a vertex that's adjacent to `last`. + + - The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge). + - The third index, `c`, is equal to `last-1` for `0xXd` and `last+1` for `0xXe`. + - `last` is set to `c` (effectively decrementing or incrementing it accordingly). + + - Edge (c, b) is pushed to the edge FIFO. + - Edge (a, c) is pushed to the edge FIFO. + + - Vertex c is pushed to the vertex FIFO. + +- `0xXf`, where `X < 0xf`: Encodes a recently encountered edge and a free-standing vertex encoded explicitly. + + - The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge). + - The third index, `c`, is decoded using `decodeIndex` by reading extra bytes from `data` (note, this also updates `last`). + + - Edge (c, b) is pushed to edge FIFO. + - Edge (a, c) is pushed to edge FIFO. + + - Vertex c is pushed to the vertex FIFO. + +- `0xfY`, where `Y < 0xe`: Encodes three indices using `codeaux` table lookup and vertex FIFO. + + - The table `codeaux` is used to read the element Y; let's assume that results in `0xZW`. + + - The first index, `a`, is set to `next`. + - `next` is incremented. + + - If `Z == 0`: + - The second index, `b`, is set to `next`. + - `next` is incremented. + - Otherwise the second index, `b`, is read from vertex FIFO at index `Z-1` (where 0 is the most recently added vertex). + + - If `W == 0`: + - The third index, `c`, is set to `next`. + - `next` is incremented. 
+ - Otherwise the third index, `c`, is read from vertex FIFO at index `W-1` (where 0 is the most recently added vertex). + + - Note that in the process `next` is incremented from 1 to 3 times depending on values of Z/W. + + - Edge (b, a) is pushed to the edge FIFO. + - Edge (c, b) is pushed to the edge FIFO. + - Edge (a, c) is pushed to the edge FIFO. + + - Vertex a is pushed to the vertex FIFO. + - Vertex b is pushed to the vertex FIFO if `Z == 0`. + - Vertex c is pushed to the vertex FIFO if `W == 0`. + +- `0xfe` or `0xff`: Encodes three indices explicitly. + - Read one byte from `data` as-is, without using LEB128 decoding; let's assume that results in `0xZW`. + - If `0xZW` == `0x00`, then `next` is reset to 0. This is a special mechanism used to restart the `next` sequence which is useful for concatenating independent triangle streams. This must be done before further processing. + + - If using `0xfe` encoding: + - The first index, `a`, is set to `next`. + - `next` is incremented. + - Otherwise the first index, `a`, is read using `decodeIndex` by reading extra bytes from `data` (note, this also updates `last`). + + - If `Z == 0`: + - The second index, `b`, is set to `next`. + - `next` is incremented. + - Else if `Z < 0xf`: + - The second index, `b`, is read from vertex FIFO at index `Z-1` (where 0 is the most recently added vertex). + - Otherwise the second index, `b`, is read using `decodeIndex` by reading extra bytes from `data` (note, this also updates `last`). + + - If `W == 0`: + - The third index, `c`, is set to `next`. + - `next` is incremented. + - Else if `W < 0xf`: + - The third index, `c`, is read from vertex FIFO at index `W-1` (where 0 is the most recently added vertex). + - Otherwise the third index, `c`, is read using `decodeIndex` by reading extra bytes from `data` (note, this also updates `last`). + + - Edge (b, a) is pushed to the edge FIFO. + - Edge (c, b) is pushed to the edge FIFO. + - Edge (a, c) is pushed to the edge FIFO. 
+ + - Vertex a is pushed to the vertex FIFO. + - Vertex b is pushed to the vertex FIFO if `Z == 0` or `Z == 0xf`. + - Vertex c is pushed to the vertex FIFO if `W == 0` or `W == 0xf`. + +After decoding, the triangle indices a, b, c are emitted as 32-bit unsigned integers (if `byteStride == 4`) or 16-bit unsigned integers with wraparound (if `byteStride == 2`). + +At the end of the decoding, `data` is expected to be fully read by all the triangle codes and not contain any extra bytes. + +## Mode 2: indices + +Index compression exploits similarity between consecutive indices. Note that, unlike the triangle index compression (mode 1), this mode doesn't assume a specific topology and as such is less efficient in terms of the resulting size. However, unlike mode 1, this mode can be used to compress triangle strips, line lists and other types of mesh index data, and can additionally be used to compress non-mesh index data such as sparse indices for accessors. + +The encoded stream structure is as follows: + +- Header byte, which must be equal to `0xd1` +- A sequence of index deltas (with number of elements equal to `count` extension property), with encoding specified below +- Tail block, which consists of 4 padding bytes that should be set to 0 + +Note that there is no way to calculate the length of a stream; instead, the input stream must be correctly sized (using `byteLength`). If the decoding procedure reaches the end of stream too early, or any unprocessed bytes remain after decoding and before tail, the stream is invalid. + +Instead of simply encoding deltas vs the previous index, the decoder tracks *two* baseline index values, that start at 0. Each delta is specified in relation to one of these values and updates it so that the next delta that references the same baseline uses the encoded index value as a reference. 
This encoding is more efficient at handling some types of bimodal sequences where two independent monotonic sequences are spliced together, which can occur for some common cases of triangle strips or line lists. + +To specify the index delta, the varint-7 encoding (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)) is used. The encoding stores a 32-bit unsigned integer as a sequence of bytes, where each byte's most significant bit indicates whether more bytes follow. The sequence must consist of 1-5 bytes where the most significant bit of the last byte must be 0 and the most significant bits of all prior bytes must be 1. The value is reconstructed by concatenating the lower 7 bits of each byte, least significant bits first, ignoring extra bits: + +``` +0x7f => 0x7f +0x81 0x04 => 0x201 +0xff 0xa0 0x05 => 0x1507f +``` + +When decoding the deltas, the 32-bit value is read using the varint-7 encoding (with unsigned 32-bit wraparound). The least significant bit of the value indicates one of the baseline values; the remaining bits specify a zigzag-encoded signed delta and can be decoded as follows: + +``` +uint32_t decode(uint32_t v) { +    uint32_t baseline = v & 1; +    uint32_t delta = (v & 2) != 0 ? ~(v >> 2) : (v >> 2); +    last[baseline] += delta; // unsigned 32-bit wraparound +    return last[baseline]; +} +``` + +After decoding, the resulting value is emitted as a 32-bit unsigned integer (if `byteStride == 4`) or a 16-bit unsigned integer with wraparound (if `byteStride == 2`). + +It's up to the encoder to determine the optimal selection of the baseline for each index; this encoding scheme can be used to do basic delta encoding (with the baseline bit always set to 0) as well as more complex bimodal encodings. Since the zigzag-encoded delta uses a 31-bit integer, the deltas are limited to [-2^30..2^30-1]. + +# Appendix B: Filters + +Filters are functions that transform each encoded attribute. 
For each filter, this document specifies the transformation used for decoding the data; it's up to the encoder to pick the parameters of the encoding for each element to balance quality and precision. + +For performance reasons, the results of the decoding process are only specified to within one unit in the last place (ULP) in terms of the decoded data; e.g. if a filter results in a 16-bit signed normalized integer, decoding may produce results within 1/32767 of the specified value. The exponential filter is an exception and must be decoded exactly. + +## Filter 1: octahedral + +The octahedral filter allows encoding unit length 3D vectors (normals/tangents) using octahedral encoding, which results in a better quality vs precision tradeoff compared to storing raw components. + +This filter is only valid if `byteStride` is 4 or 8. When `byteStride` is 4, the input and output of this filter are four 8-bit signed components, and when `byteStride` is 8, the input and output of this filter are four 16-bit signed components. + +The input to the filter is four 8-bit or 16-bit components, where the first two specify the X and Y components in octahedral encoding, encoded as signed normalized K-bit integers (2 <= K <= 16, integers are stored in two's complement format), and the third component explicitly encodes 1.0 as a signed normalized K-bit integer. The last component may contain arbitrary data which is passed through unfiltered (this can be useful for tangents). + +The encoding of the third component allows computing K for each vector independently from the bit representation; it must encode 1.0 precisely, which is equivalent to `(1 << (K - 1)) - 1` as an integer. Values of the third component that aren't equal to `(1 << (K - 1)) - 1` for a valid `K` are invalid, and the result of decoding such vectors is unspecified. + +When storing a K-bit integer in an 8-bit or 16-bit component when K is less than the component's bit width, the remaining bits (e.g. 
top 6 bits in case of K=10) must be equal to the sign bit; the valid range of the resulting integer is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.

The output of the filter is three decoded unit vector components, stored as 8-bit or 16-bit normalized integers, and the last input component passed through verbatim.

```
void decode(intN_t input[4], intN_t output[4]) {
    // input[2] encodes a K-bit representation of 1.0
    float32_t one = input[2];

    float32_t x = input[0] / one;
    float32_t y = input[1] / one;
    float32_t z = 1.0 - abs(x) - abs(y);

    // octahedral fixup for negative hemisphere
    float32_t t = min(z, 0.0);

    x -= copysign(t, x);
    y -= copysign(t, y);

    // renormalize (x, y, z)
    float32_t len = sqrt(x * x + y * y + z * z);

    x /= len;
    y /= len;
    z /= len;

    output[0] = round(x * INTN_MAX);
    output[1] = round(y * INTN_MAX);
    output[2] = round(z * INTN_MAX);
    output[3] = input[3];
}
```

`INTN_MAX` is equal to 127 when using 8-bit components (N is 8) and equal to 32767 when using 16-bit components (N is 16).

`copysign` returns the value with the magnitude of the first argument and the sign of the second argument.

`round` returns the nearest integer value, rounding halfway cases away from zero.

## Filter 2: quaternion

The quaternion filter encodes unit-length quaternions using normalized 16-bit integers for all components; it gives the encoder control over the precision used for the components and provides better quality than naively encoding each component one by one.

This filter is only valid if `byteStride` is 8.

The input to the filter is three quaternion components, excluding the component with the largest magnitude, encoded as signed normalized K-bit integers (4 <= K <= 16, integers are stored in two's complement format), together with the index of the largest component that is omitted from the encoding.
The largest component is assumed to always be positive (which is possible due to quaternion double-cover). To allow per-element control over K, the last input element must explicitly encode 1.0 as a signed normalized K-bit integer, except for the least significant 2 bits, which store the index of the maximum component.

When storing a K-bit integer in a 16-bit component where K is less than the component's bit width, the remaining bits (e.g. top 6 bits in case of K=10) must be equal to the sign bit; the valid range of the resulting integer is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.

The output of the filter is four decoded quaternion components, stored as 16-bit normalized integers.

After eliminating the maximum component, the maximum magnitude of the remaining components is `1.0/sqrt(2.0)`. Because of this, the input components store the original component value scaled by `sqrt(2.0)` to increase precision.
```
void decode(int16_t input[4], int16_t output[4]) {
    float32_t range = 1.0 / sqrt(2.0);

    // input[3] encodes a K-bit representation of 1.0 except for bottom two bits
    float32_t one = input[3] | 3;

    float32_t x = input[0] / one * range;
    float32_t y = input[1] / one * range;
    float32_t z = input[2] / one * range;

    float32_t w = sqrt(max(0.0, 1.0 - x * x - y * y - z * z));

    int maxcomp = input[3] & 3;

    // maxcomp specifies a cyclic rotation of the quaternion components
    output[(maxcomp + 1) % 4] = round(x * 32767.0);
    output[(maxcomp + 2) % 4] = round(y * 32767.0);
    output[(maxcomp + 3) % 4] = round(z * 32767.0);
    output[(maxcomp + 0) % 4] = round(w * 32767.0);
}
```

## Filter 3: exponential

The exponential filter encodes floating-point values with a range close to the full range of a 32-bit floating-point value; it gives the encoder control over the exponent/mantissa split to trade quality for precision, and its bit structure aligns with byte boundaries, which facilitates better compression.

This filter is only valid if `byteStride` is a multiple of 4.

The input to the filter is a sequence of 32-bit little-endian integers, with the most significant 8 bits specifying a (signed) exponent value and the remaining 24 bits specifying a (signed) mantissa value. The integers are stored in two's complement format.

The output of the filter is a sequence of 32-bit floating-point values, represented according to the IEEE 754 standard. Each value is computed from the integer input as `2^e * m`:

```
float32_t decode(int32_t input) {
    int32_t e = input >> 24;
    int32_t m = (input << 8) >> 8;
    return pow(2.0, e) * m;
}
```

The valid range of `e` is [-100, +100], which facilitates performant implementations. Decoding out-of-range values results in unspecified behavior, and encoders are expected to clamp `e` to the valid range.
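To make the bit layout concrete, the exponential decoding can be illustrated with a short, non-normative Python sketch (the function name is ours, not part of the specification). Python integers are unbounded, so the sketch reinterprets the 32-bit word as a signed two's complement value before extracting the exponent and mantissa:

```python
import struct

def decode_exponential(word: int) -> float:
    # Reinterpret the unsigned 32-bit word as a signed two's complement integer.
    i = struct.unpack('<i', struct.pack('<I', word & 0xFFFFFFFF))[0]
    e = i >> 24  # signed 8-bit exponent (arithmetic shift)
    # Sign-extend the low 24 bits to recover the signed mantissa.
    m = struct.unpack('<i', struct.pack('<I', (i << 8) & 0xFFFFFFFF))[0] >> 8
    return (2.0 ** e) * m  # value = 2^e * m

# e = -1, m = 3 encodes 3 * 2^-1 = 1.5
assert decode_exponential(0xFF000003) == 1.5
# e = 2, m = -5 encodes -5 * 2^2 = -20.0
assert decode_exponential(0x02FFFFFB) == -20.0
```

The pack/unpack round-trips stand in for the sign-extending shifts that the C-like pseudocode gets for free from fixed-width integer types.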
## Filter 4: color

The color filter encodes color data using the YCoCg color model, which improves compression for typical color data by exploiting correlation between the color channels.

This filter is only valid if `byteStride` is 4 or 8. When `byteStride` is 4, the input and output of this filter are four 8-bit components; when `byteStride` is 8, the input and output are four 16-bit components.

The input to the filter is four 8-bit or 16-bit components, where the first component stores the Y (luma) value as a K-bit unsigned integer, the second and third components store the Co/Cg (chrominance) values as K-bit signed integers, and the fourth component stores the alpha value as a (K-1)-bit unsigned integer with bit K-1 set to 1 and more significant bits set to 0. K can therefore be determined from the position of the most significant set bit of the fourth component. Here 2 <= K <= 16, and signed integers are stored in two's complement format.

When storing a K-bit integer in an 8-bit or 16-bit component when K is less than the component's bit width, the remaining bits (e.g. top 6 bits in case of K=10) must be zero for the first and fourth components, and equal to the sign bit for the second and third components, which are signed; the valid range of the two signed integers is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.

The transformation uses YCoCg encoding; reconstruction of RGB values can be performed in integer space or in floating-point space, depending on the implementation. The Y, Co, and Cg values must be chosen so that the original RGB values can be reconstructed using 32-bit signed integer math, with the final result fitting into a K-bit unsigned integer ([0..2^K-1]).

The output of the filter is four decoded color components (R, G, B, A), stored as 8-bit or 16-bit unsigned normalized integers.
+ +``` +void decode(uintN_t input[4], uintN_t output[4]) { + // recover scale from alpha high bit + int as = (1 << (findMSB(input[3]) + 1)) - 1; + + // convert to RGB in fixed point + int y = input[0], co = intN_t(input[1]), cg = intN_t(input[2]); + + int r = y + co - cg; + int g = y + cg; + int b = y - co - cg; + + // expand alpha by one bit to match other components, replicating least significant bit + int a = input[3] & (as >> 1); + a = (a << 1) | (a & 1); + + // compute scaling factor + float ss = UINTN_MAX / float(as); + + output[0] = round(float(r) * ss); + output[1] = round(float(g) * ss); + output[2] = round(float(b) * ss); + output[3] = round(float(a) * ss); +} + +// returns position of most significant bit set (0-based) +int findMSB(uintN_t v) { + for (int i = N - 1; i >= 0; --i) { + if (v & (1u << i)) { + return i; + } + } + return -1; +} +``` + +`UINTN_MAX` is equal to 255 when using 8-bit components (N is 8) and equal to 65535 when using 16-bit components (N is 16). + +# Appendix C: Differences from EXT_meshopt_compression + +This extension is derived from `EXT_meshopt_compression` with the following changes: + +- Vertex data supports an upgraded v1 format which provides more granular bit packing (via control modes) and enhanced delta encoding (via channel modes) to compress data better +- For compatibility, the v0 format (identical to `EXT_meshopt_compression` format) is still supported; however, use of v1 format is preferred +- New `COLOR` filter supports lossy color compression at higher compression ratios using YCoCg encoding + +These improvements achieve better compression ratios for typical glTF content while maintaining the same fast decompression performance. 
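As a non-normative cross-check of the `COLOR` filter's YCoCg reconstruction described in Appendix B, the decode step for 8-bit components can be sketched in Python (function and variable names are ours):

```python
def decode_color_8(inp):
    """Decode one element of the COLOR filter (four 8-bit components) to RGBA."""
    # Recover the scale from the alpha component's highest set bit:
    # bit K-1 of inp[3] is set, so bit_length() == K and a_scale == 2^K - 1.
    a_scale = (1 << inp[3].bit_length()) - 1

    # Y is unsigned; Co/Cg are signed 8-bit two's complement values.
    y = inp[0]
    co = inp[1] - 256 if inp[1] >= 128 else inp[1]
    cg = inp[2] - 256 if inp[2] >= 128 else inp[2]

    # YCoCg -> RGB as specified by the filter.
    r = y + co - cg
    g = y + cg
    b = y - co - cg

    # Drop the marker bit, then expand alpha by one bit, replicating the LSB.
    a = inp[3] & (a_scale >> 1)
    a = (a << 1) | (a & 1)

    # Rescale from the K-bit range to the full 8-bit unsigned normalized range.
    ss = 255.0 / a_scale
    return tuple(round(v * ss) for v in (r, g, b, a))

# K = 8: y=120, co=25, cg=-10 (stored as 246), alpha payload 100 (stored as 0x80|100)
assert decode_color_8([120, 25, 246, 228]) == (155, 110, 105, 200)
```

Note that Python's `round` uses banker's rounding on halfway cases; a conforming decoder must round halfway cases away from zero as specified for the filters (the values in this example are exact, so the difference does not arise here).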
diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/schema/buffer.KHR_meshopt_compression.schema.json b/extensions/2.0/Khronos/KHR_meshopt_compression/schema/buffer.KHR_meshopt_compression.schema.json new file mode 100644 index 0000000000..1afcc64aac --- /dev/null +++ b/extensions/2.0/Khronos/KHR_meshopt_compression/schema/buffer.KHR_meshopt_compression.schema.json @@ -0,0 +1,16 @@ +{ + "$schema": "http://json-schema.org/draft-04/schema", + "title": "KHR_meshopt_compression buffer extension", + "type": "object", + "description": "Compressed data for bufferView.", + "allOf": [ { "$ref": "glTFProperty.schema.json" } ], + "properties": { + "fallback": { + "type": "boolean", + "description": "Set to true to indicate that the buffer is only referenced by bufferViews that have KHR_meshopt_compression extension and as such doesn't need to be loaded.", + "default": false + }, + "extensions": { }, + "extras": { } + } +} diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/schema/bufferView.KHR_meshopt_compression.schema.json b/extensions/2.0/Khronos/KHR_meshopt_compression/schema/bufferView.KHR_meshopt_compression.schema.json new file mode 100644 index 0000000000..7f4fc117a9 --- /dev/null +++ b/extensions/2.0/Khronos/KHR_meshopt_compression/schema/bufferView.KHR_meshopt_compression.schema.json @@ -0,0 +1,51 @@ +{ + "$schema": "http://json-schema.org/draft-04/schema", + "title": "KHR_meshopt_compression bufferView extension", + "type": "object", + "description": "Compressed data for bufferView.", + "allOf": [ { "$ref": "glTFProperty.schema.json" } ], + "properties": { + "buffer": { + "allOf": [ { "$ref": "glTFid.schema.json" } ], + "description": "The index of the buffer with compressed data." 
+ }, + "byteOffset": { + "type": "integer", + "description": "The offset into the buffer in bytes.", + "minimum": 0, + "default": 0 + }, + "byteLength": { + "type": "integer", + "description": "The length of the compressed data in bytes.", + "minimum": 1 + }, + "byteStride": { + "type": "integer", + "description": "The stride, in bytes.", + "minimum": 2, + "maximum": 256 + }, + "count": { + "type": "integer", + "description": "The number of elements.", + "minimum": 1 + }, + "mode": { + "type": "string", + "description": "The compression mode.", + "enum": [ "ATTRIBUTES", "TRIANGLES", "INDICES" ] + }, + "filter": { + "type": "string", + "description": "The compression filter.", + "enum": [ "NONE", "OCTAHEDRAL", "QUATERNION", "EXPONENTIAL", "COLOR" ], + "default": "NONE" + }, + "extensions": { }, + "extras": { } + }, + "required": [ + "buffer", "byteLength", "byteStride", "count", "mode" + ] +} diff --git a/extensions/2.0/Khronos/KHR_node_visibility/README.md b/extensions/2.0/Khronos/KHR_node_visibility/README.md new file mode 100644 index 0000000000..7bbcc4ccf3 --- /dev/null +++ b/extensions/2.0/Khronos/KHR_node_visibility/README.md @@ -0,0 +1,112 @@ +# KHR\_node\_visibility + +## Contributors + +- Dwight Rodgers, Adobe +- Peter Martin, Adobe +- Emmett Lalish, Google +- Alexey Knyazev, Independent + +## Status + +Release Candidate + +## Dependencies + +Written against the glTF 2.0 spec. + +## Overview + +This extension allows to control visibility of node hierarchies. It is intended for use in conjunction with `KHR_animation_pointer` and/or interactivity extensions but can also be used on its own. + +## Extending Nodes + +The `KHR_node_visibility` extension object is added to the objects within the `nodes` array. The extension object contains a single boolean `visible` property. This value is mutable through JSON pointers as defined in the glTF 2.0 Asset Object Model and controls visibility of the node that contains it and all its children nodes recursively. 
A value of `false` causes all nodes below in the hierarchy to be omitted from display, including any descendant nodes whose own `visible` property is `true`.

When a node is not visible, all its visual features, including but not limited to meshes, light sources (e.g., attached with `KHR_lights_punctual`), point clouds, particles, billboards, volumetric effects, etc., are not rendered. Visibility affects neither cameras nor a node's interactivity features such as selectability or hoverability.

| Property | Type | Description | Required |
|-------------|-----------|----------------------------------------|---------------------|
| **visible** | `boolean` | Specifies whether the node is visible. | No, default: `true` |

In other words, a node is visible if and only if its own `visible` property is `true` and all its parents are visible. This allows a single change of a `visible` property at a high level of the hierarchy to hide or show complex (multi-node) objects.

In the following example, both nodes (and therefore their meshes) are initially hidden.

```json
{
  "nodes": [
    {
      "children": [1],
      "mesh": 0,
      "extensions": {
        "KHR_node_visibility": {
          "visible": false
        }
      }
    },
    {
      "mesh": 1
    }
  ]
}
```

## Extending glTF 2.0 Asset Object Model

The following pointer template represents the mutable property defined by this extension.

| Pointer | Type |
|----------------------------------------------------|--------|
| `/nodes/{}/extensions/KHR_node_visibility/visible` | `bool` |

## JSON Schema

- [node.KHR_node_visibility.schema.json](schema/node.KHR_node_visibility.schema.json)

## Appendix: Full Khronos Copyright Statement

Copyright 2025 The Khronos Group Inc.

This Specification is protected by copyright laws and contains material proprietary
to Khronos.
Except as described by these terms, it or any components +may not be reproduced, republished, distributed, transmitted, displayed, broadcast +or otherwise exploited in any manner without the express prior written permission +of Khronos. + +Khronos grants a conditional copyright license to use and reproduce the unmodified +Specification for any purpose, without fee or royalty, EXCEPT no licenses to any patent, +trademark or other intellectual property rights are granted under these terms. + +Khronos makes no, and expressly disclaims any, representations or warranties, +express or implied, regarding this Specification, including, without limitation: +merchantability, fitness for a particular purpose, non-infringement of any +intellectual property, correctness, accuracy, completeness, timeliness, and +reliability. Under no circumstances will Khronos, or any of its Promoters, +Contributors or Members, or their respective partners, officers, directors, +employees, agents or representatives be liable for any damages, whether direct, +indirect, special or consequential damages for lost revenues, lost profits, or +otherwise, arising from or in connection with these materials. + +This specification has been created under the Khronos Intellectual Property Rights +Policy, which is Attachment A of the Khronos Group Membership Agreement available at +https://www.khronos.org/files/member_agreement.pdf. Khronos grants a conditional +copyright license to use and reproduce the unmodified specification for any purpose, +without fee or royalty, EXCEPT no licenses to any patent, trademark or other +intellectual property rights are granted under these terms. 
Parties desiring to +implement the specification and make use of Khronos trademarks in relation to that +implementation, and receive reciprocal patent license protection under the Khronos +IP Policy must become Adopters and confirm the implementation as conformant under +the process defined by Khronos for this specification; +see https://www.khronos.org/conformance/adopters/file-format-adopter-program. + +Where this Specification identifies specific sections of external references, only those +specifically identified sections define normative functionality. The Khronos Intellectual +Property Rights Policy excludes external references to materials and associated enabling +technology not created by Khronos from the Scope of this Specification, and any licenses +that may be required to implement such referenced materials and associated technologies +must be obtained separately and may involve royalty payments. + +Khronos® is a registered trademark, and glTF™ is a trademark of The Khronos Group Inc. All +other product names, trademarks, and/or company names are used solely for identification +and belong to their respective owners. \ No newline at end of file diff --git a/extensions/2.0/Khronos/KHR_node_visibility/schema/node.KHR_node_visibility.schema.json b/extensions/2.0/Khronos/KHR_node_visibility/schema/node.KHR_node_visibility.schema.json new file mode 100644 index 0000000000..d6bc68199d --- /dev/null +++ b/extensions/2.0/Khronos/KHR_node_visibility/schema/node.KHR_node_visibility.schema.json @@ -0,0 +1,17 @@ +{ + "$schema": "http://json-schema.org/draft-04/schema", + "title": "KHR_node_visibility glTF Node Extension", + "type": "object", + "description": "glTF extension that defines node's visibility.", + "allOf": [ { "$ref": "glTFProperty.schema.json" } ], + "properties": { + "visible": { + "type": "boolean", + "description": "Specifies whether the node is visible.", + "default": true, + "gltf_detailedDescription": "Specifies whether the node is visible. 
A value of false means that the node and all its children are hidden." + }, + "extensions": { }, + "extras": { } + } +} diff --git a/extensions/2.0/Vendor/EXT_mesh_primitive_restart/README.md b/extensions/2.0/Vendor/EXT_mesh_primitive_restart/README.md index 89153aa8be..b77497001c 100644 --- a/extensions/2.0/Vendor/EXT_mesh_primitive_restart/README.md +++ b/extensions/2.0/Vendor/EXT_mesh_primitive_restart/README.md @@ -1,3 +1,8 @@ + + # EXT_mesh_primitive_restart ## Contributors diff --git a/specification/2.0/ObjectModel.adoc b/specification/2.0/ObjectModel.adoc index a08b83b5f8..268e37df47 100644 --- a/specification/2.0/ObjectModel.adoc +++ b/specification/2.0/ObjectModel.adoc @@ -119,7 +119,7 @@ This document defines a set of <> that glTF implemen 1. The Object Model is defined only for valid glTF assets. Querying or setting properties of invalid glTF assets are undefined. -2. Upon loading an asset, an implementation registers specific glTF object properties for the Object Model by resolving JSON pointers identified by templates provided by this document to JSON properties of the asset being loaded. Undefined glTF properties that have schema-default values are considered defined with their default values. +2. Upon loading an asset, an implementation registers specific glTF object properties for the Object Model by resolving JSON pointers identified by templates provided by this document to JSON properties of the asset being loaded. Undefined glTF properties that have default values are considered defined with their default values. 3. Each instance of empty curly braces (`{}`) in the pointer templates is replaced with the corresponding array element index for each glTF asset property matching the template. 
@@ -185,8 +185,6 @@ The following pointer templates represent mutable properties defined in the core | `/materials/{}/pbrMetallicRoughness/baseColorFactor` | `float4` | `/materials/{}/pbrMetallicRoughness/metallicFactor` | `float` | `/materials/{}/pbrMetallicRoughness/roughnessFactor` | `float` -| `/meshes/{}/weights` | `float[]` -| `/meshes/{}/weights/{}` | `float` | `/nodes/{}/translation` | `float3` | `/nodes/{}/rotation` | `float4` | `/nodes/{}/scale` | `float3` @@ -194,10 +192,26 @@ The following pointer templates represent mutable properties defined in the core | `/nodes/{}/weights/{}` | `float` |==== +The `/nodes/{}/translation`, `/nodes/{}/rotation`, and `/nodes/{}/scale` pointers represent the current TRS properties of the node. If the static `matrix` property is defined on the node object in JSON, the corresponding rotation and scale pointers are undefined; the translation pointer is always defined. + +[NOTE] +.Rationale +==== +Since the glTF 2.0 specification allows negative scale factors, TRS matrix decomposition is ambiguous with regard to rotation and scale. Two implementations that choose different scale factor signs for the same matrix may produce two different rotation quaternions for it. Therefore, updating only the rotation or the scale may result in different node transformation matrices in different implementations. + +This restriction does not apply to the translation because the translation vector directly corresponds to the first three values of the last column of the transformation matrix regardless of the decomposition process. +==== + +The `/nodes/{}/weights` and `/nodes/{}/weights/{}` pointers represent the current morph target weights (as an array and as individual scalars respectively) of the mesh instantiated by the node regardless of whether the static `weights` property is defined on the node object in JSON. If the node instantiates no mesh or if the mesh has no morph targets, these pointers are undefined. 
+ [NOTE] .Note ==== -As in the core glTF 2.0 Specification, lengths of the `weights` arrays match the number of the associated morph targets. +As in the core glTF 2.0 Specification, lengths of the `/nodes/{}/weights` arrays match the number of the associated morph targets. + +If a mesh defines default morph target weights (via its own `weights` JSON property), they are used as the `/nodes/{}/weights` and `/nodes/{}/weights/{}` default values for nodes that instantiate that mesh. + +If a mesh does not define default morph target weights, the `/nodes/{}/weights` and `/nodes/{}/weights/{}` default values are all zeros for nodes that instantiate that mesh. ==== Additionally, the following pointer templates represent read-only runtime properties. @@ -212,7 +226,6 @@ Additionally, the following pointer templates represent read-only runtime proper | `/meshes.length` | `int` | Number of meshes | `/meshes/{}/primitives.length` | `int` | Number of primitives | `/meshes/{}/primitives/{}/material` | `int` | Index of the material -| `/meshes/{}/weights.length` | `int` | Number of morph targets | `/nodes.length` | `int` | Number of nodes | `/nodes/{}/camera` | `int` | Index of the camera | `/nodes/{}/children.length` | `int` | Number of children nodes @@ -222,7 +235,7 @@ Additionally, the following pointer templates represent read-only runtime proper | `/nodes/{}/mesh` | `int` | Index of the mesh | `/nodes/{}/parent` | `int` | Index of the parent node | `/nodes/{}/skin` | `int` | Index of the skin -| `/nodes/{}/weights.length` | `int` | Number of the associated mesh's morph targets +| `/nodes/{}/weights.length` | `int` | Number of the instantiated mesh's morph targets | `/scene` | `int` | Index of the scene | `/scenes.length` | `int` | Number of scenes | `/scenes/{}/nodes.length` | `int` | Number of root nodes diff --git a/specification/2.0/Specification.adoc b/specification/2.0/Specification.adoc index cc19eb74e3..48d3649b30 100644 --- a/specification/2.0/Specification.adoc +++ 
b/specification/2.0/Specification.adoc @@ -3049,11 +3049,11 @@ with the Trowbridge-Reitz/GGX microfacet distribution D = \frac{\alpha^2 \, \chi^{+}(N \cdot H)}{\pi ((N \cdot H)^2 (\alpha^2 - 1) + 1)^2} ++++ -and the separable form of the Smith joint masking-shadowing function +and the height-correlated form of the Smith joint masking-shadowing function [latexmath] ++++ -G = \frac{2 \, \left| N \cdot L \right| \, \chi^{+}(H \cdot L)}{\left| N \cdot L \right| + \sqrt{\alpha^2 + (1 - \alpha^2) (N \cdot L)^2}} \frac{2 \, \left| N \cdot V \right| \, \chi^{+}(H \cdot V)}{\left| N \cdot V \right| + \sqrt{\alpha^2 + (1 - \alpha^2) (N \cdot V)^2}} +G = \frac{2 \, \left| N \cdot L \right| \, \left| N \cdot V \right| \, \chi^{+}(H \cdot L) \, \chi^{+}(H \cdot V)}{\left| N \cdot V \right| \, \sqrt{\alpha^2 + (1 - \alpha^2) (N \cdot L)^2} + \left| N \cdot L \right| \, \sqrt{\alpha^2 + (1 - \alpha^2) (N \cdot V)^2}} ++++ where χ^+^(*x*) denotes the Heaviside function: 1 if *x* > 0 and 0 if *x* <= 0. See <> for a derivation of the formulas. @@ -3076,7 +3076,7 @@ with [latexmath] ++++ -\nu = \frac{\, \chi^{+}(H \cdot L)}{\left| N \cdot L\right| + \sqrt{\alpha^2 + (1 - \alpha^2) (N \cdot L)^2}} \frac{\, \chi^{+}(H \cdot V)}{\left| N \cdot V \right| + \sqrt{\alpha^2 + (1 - \alpha^2) (N \cdot V)^2}} +\nu = \frac{\chi^{+}(H \cdot L) \, \chi^{+}(H \cdot V)}{2 \, (\left| N \cdot V \right| \, \sqrt{\alpha^2 + (1 - \alpha^2) (N \cdot L)^2} + \left| N \cdot L \right| \, \sqrt{\alpha^2 + (1 - \alpha^2) (N \cdot V)^2})} ++++ Thus, we have the function @@ -3088,6 +3088,11 @@ function specular_brdf(α) { } ---- +[NOTE] +==== +A roughness of zero (α = 0) cannot be evaluated directly with this formulation. As α approaches zero, the GGX distribution latexmath:[D] collapses to a delta function and the BRDF becomes singular, leading to divisions by zero or numerically unstable results. Implementations should therefore never use α = 0 in the equations above. 
Instead, α should be clamped to a small positive value, or the surface should be handled as an ideal specular (mirror) reflector using a separate code path. + +==== [[diffuse-brdf]] === Diffuse BRDF