4 changes: 3 additions & 1 deletion docs/guides/interface_models.md
@@ -228,7 +228,7 @@ As one last note, if your model returns field quantity data, this will be stored
## Specifying a surrogate method
The `Component.interpolator` attribute provides the underlying surrogate "interpolation" method, i.e. the specific mathematical relationship that approximates the model's outputs as a function of its inputs. In this sense, we use the terms "interpolator" and "surrogate" interchangeably to mean the underlying approximation method -- the `Component.interpolator` does not necessarily have to "interpolate" the output by passing through all the training data directly. The naming convention mostly arises from the usage of polynomial interpolation in sparse grids.

Currently, the available methods are [Lagrange][amisc.interpolator.Lagrange] polynomial interpolation (the default), [Linear][amisc.interpolator.Linear] regression, and [GPR][amisc.interpolator.GPR]. Multivariate Lagrange polynomials are formed by a tensor product of univariate Lagrange polynomials in each input dimension, and they integrate well with the `SparseGrid` data structure. Lagrange polynomials work well up to an input dimension of around 12-15 for sufficiently smooth functions. More details on how they work can be found in the [theory](../theory/polynomials.md) section. Linear regression and Gaussian process regression are implemented through the [scikit-learn](https://scikit-learn.org/stable/) library; linear regression may optionally include polynomial features, while Gaussian process regression may optionally include scalers.

You may configure the interpolation method via:
```python
...
```

@@ -322,6 +322,8 @@ Generally, the surrogate predictions (e.g. `Component.predict`) make use of `num
!!! Note "Loop dimensions"
    If you pass a scalar input array of arbitrary shape (say $(100, 2, 5)$ for example), `Component.call_model()` treats all axes in the array as "loop dimensions". So effectively, all the axes will be flattened to a single 1d array of length `N = np.prod(input.shape)`, which would be $N=1000$ for an input shape of $(100, 2, 5)$. Then, `call_model` will loop over $N$, passing each input sample to the underlying model (unless `vectorized=True`). The outputs will be reshaped back to the original input "loop" shape before returning. If you pass an input array for one variable, you should pass the same shape array for all other inputs, or at least an array that is broadcastable to the full loop shape. See the [`numpy` broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html) rules.

Note that the leading dimensions of the arrays returned from `Component.call_model()` will always be the loop dimensions; the loop shape for singleton values is assumed to be `(1,)`, and each returned array will always be at least a 1d numpy array (even if you pass in only floats, for example).
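The flattening-and-reshaping convention described above can be sketched in plain `numpy`. This is an illustrative stand-in, not the actual `Component.call_model` implementation; the helper name `call_model_sketch` is hypothetical:

```python
import numpy as np

def call_model_sketch(model, x):
    """Sketch of the loop-dimension convention: flatten all input axes,
    evaluate the model once per sample, and restore the loop shape."""
    x = np.atleast_1d(x)                       # singletons get loop shape (1,)
    loop_shape = x.shape                       # every axis is a loop dimension
    flat = x.reshape(-1)                       # N = np.prod(loop_shape) samples
    out = np.array([model(xi) for xi in flat]) # one model call per sample
    return out.reshape(loop_shape)             # reshape back to the loop shape

y = call_model_sketch(np.sin, np.random.rand(100, 2, 5))
print(y.shape)  # (100, 2, 5)
```

Note how a bare float input still comes back as a 1d array of shape `(1,)`, matching the singleton convention above.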

The situation is a bit more complicated for non-scalar inputs (i.e. field quantities). You may still pass arrays of field quantities as inputs, so that they are looped over in the same fashion as scalar inputs. However, which axes form the "loop shape" and which axes are part of the field quantity's own shape must be inferred or indicated. Generally, as long as you provide at least one scalar input, the loop shape is known, and the leading axes of the field quantity that match the loop shape are treated as loop dimensions, while the rest are assumed to be the field quantity's own shape. For example:
```python
x = np.random.rand(100, 2) # scalar
...
```
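A plain-`numpy` sketch of the shape inference described above (the arrays `x` and `f` are hypothetical inputs; this mimics the convention, not the actual `amisc` internals):

```python
import numpy as np

x = np.random.rand(100, 2)      # scalar input -> the loop shape is (100, 2)
f = np.random.rand(100, 2, 50)  # field quantity: its leading axes (100, 2)
                                # match the loop shape, so the remaining axes
                                # are the field quantity's own shape

loop_shape = x.shape
field_shape = f.shape[len(loop_shape):]
print(field_shape)  # (50,)
```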
4 changes: 3 additions & 1 deletion src/amisc/component.py
@@ -976,7 +976,9 @@ def call_model(self, inputs: dict | Dataset,
        break
    elif key == var:
        if output_dict.get(key) is None:
-            output_dict.setdefault(key, np.full((N, *np.atleast_1d(val).shape), np.nan))
+            _val = np.atleast_1d(val)
+            _extra_shape = () if len(_val.shape) == 1 and _val.shape[0] == 1 else _val.shape
+            output_dict.setdefault(key, np.full((N, *_extra_shape), np.nan))
        output_dict[key][i, ...] = np.atleast_1d(val)
        is_component_var = True
        break
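The effect of this change can be demonstrated in isolation with plain `numpy` (the helper name `alloc_output` is hypothetical; it mirrors only the allocation logic from the patched lines): singleton outputs now get a flat `(N,)` array instead of `(N, 1)`, while array-valued outputs keep their own shape appended.

```python
import numpy as np

def alloc_output(N, val):
    """Allocate an output buffer for N samples of `val`, mirroring the patch:
    singleton values -> shape (N,); array values of shape s -> shape (N, *s)."""
    _val = np.atleast_1d(val)
    _extra_shape = () if len(_val.shape) == 1 and _val.shape[0] == 1 else _val.shape
    return np.full((N, *_extra_shape), np.nan)

print(alloc_output(10, 3.14).shape)         # (10,)  -- previously (10, 1)
print(alloc_output(10, np.zeros(5)).shape)  # (10, 5)
```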
13 changes: 3 additions & 10 deletions src/amisc/utils.py
@@ -405,15 +405,14 @@ def _shorten_shape(name, array):
 def format_outputs(outputs: Dataset, loop_shape: tuple[int, ...]) -> Dataset:
     """Reshape all outputs to the common loop shape. Loop shape is as obtained from a call to `format_inputs`.
     Assumes that all outputs are the same along the first dimension. This first dimension gets reshaped back into
-    the `loop_shape`. Singleton outputs are squeezed along the last dimension. A singleton loop shape is squeezed
-    along the first dimension.
+    the `loop_shape`.

     !!! Example
         ```python
         outputs = {'x': np.random.rand(10, 1, 5), 'y': np.random.rand(10, 1), 'z': np.random.rand(10, 20, 3)}
         loop_shape = (2, 5)
         fmt_outputs = format_outputs(outputs, loop_shape)
-        # Output: {'x': np.ndarray(2, 5, 1, 5), 'y': np.ndarray(2, 5), 'z': np.ndarray(200, 3)}, (2, 5, 20, 3)
+        # Output: {'x': np.ndarray(2, 5, 1, 5), 'y': np.ndarray(2, 5, 1), 'z': np.ndarray(2, 5, 20, 3)}
         ```

     :param outputs: `dict` of output arrays
@@ -423,13 +422,7 @@ def format_outputs(outputs: Dataset, loop_shape: tuple[int, ...]) -> Dataset:
     output_dict = {}
     for key, val in outputs.items():
         val = np.atleast_1d(val)
-        output_shape = val.shape[1:]  # Assumes (N, ...) output shape to start with
-        val = val.reshape(loop_shape + output_shape)
-        if output_shape == (1,):
-            val = np.atleast_1d(np.squeeze(val, axis=-1))  # Squeeze singleton outputs
-        if loop_shape == (1,):
-            val = np.atleast_1d(np.squeeze(val, axis=0))  # Squeeze singleton loop dimensions
-        output_dict[key] = val
+        output_dict[key] = val.reshape(loop_shape + val.shape[1:])  # Assumes (N, ...) output shape to start with
     return output_dict


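The new behavior can be reproduced with a standalone sketch (the helper `format_outputs_sketch` is a simplified stand-in for the patched `format_outputs`, using a plain dict instead of a `Dataset`); it matches the updated docstring example, with no squeezing of singleton output or loop dimensions:

```python
import numpy as np

def format_outputs_sketch(outputs, loop_shape):
    """Reshape the first (flattened) axis of each output back to loop_shape."""
    return {key: np.atleast_1d(val).reshape(loop_shape + np.atleast_1d(val).shape[1:])
            for key, val in outputs.items()}

outputs = {'x': np.random.rand(10, 1, 5),
           'y': np.random.rand(10, 1),
           'z': np.random.rand(10, 20, 3)}
fmt = format_outputs_sketch(outputs, (2, 5))
print({key: val.shape for key, val in fmt.items()})
# {'x': (2, 5, 1, 5), 'y': (2, 5, 1), 'z': (2, 5, 20, 3)}
```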