Add PhACE architecture #921
CI workflow (architecture test matrix):

```diff
@@ -20,7 +20,8 @@ jobs:
       - mace
       - nanopet
       - pet
-      - soap-bpnn
+      - phace
+      - soap-bpnn

   runs-on: ubuntu-22.04
```
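For orientation, the edited list plausibly sits inside a GitHub Actions job matrix along these lines (a sketch reconstructed from the hunk; the workflow's file path and surrounding keys are not shown in this diff):

```yaml
jobs:
  tests:
    strategy:
      matrix:
        architecture:    # one CI job per architecture
          - mace
          - nanopet
          - pet
          - phace        # newly added by this PR
          - soap-bpnn
    runs-on: ubuntu-22.04
```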
New file (the architecture's `__init__.py`, exposing the model and trainer entry points):

```python
from .model import PhACE
from .trainer import Trainer


__model__ = PhACE
__trainer__ = Trainer

__authors__ = [
    ("Filippo Bigi <filippo.bigi@epfl.ch>", "@frostedoyster"),
]

__maintainers__ = [
    ("Filippo Bigi <filippo.bigi@epfl.ch>", "@frostedoyster"),
]
```
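Presumably the `__model__` and `__trainer__` attributes are what metatrain's architecture registry reads to locate the entry points. If so, a minimal options file selecting the new architecture might look like the sketch below (hypothetical: the `experimental.phace` name is taken from the docstrings in the next file, and the remaining keys follow metatrain's usual options layout), to be run with `mtt train options.yaml`:

```yaml
# options.yaml (sketch)
architecture:
  name: experimental.phace  # use the default hypers defined below

training_set:
  systems:
    read_from: dataset.xyz  # hypothetical dataset file
  targets:
    energy:
      key: energy
```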
New file (module docstring and hyperparameter schemas; 280 lines added):

```python
"""
PhACE
=====

PhACE is a physics-inspired equivariant neural network architecture. Compared to,
for example, MACE and GRACE, it uses a geometrically motivated basis and a fast
and elegant tensor product implementation. The tensor product used in PhACE
leverages an equivariant representation that differs from the typical spherical
one. You can read more about it here:
https://pubs.acs.org/doi/10.1021/acs.jpclett.4c02376.

{{SECTION_INSTALLATION}}

{{SECTION_DEFAULT_HYPERS}}

Tuning hyperparameters
----------------------

The default hyperparameters above will work well in most cases, but they may not
be optimal for your specific use case. There is a good number of parameters to
tune, both for the :ref:`model <architecture-{{architecture}}_model_hypers>` and
the :ref:`trainer <architecture-{{architecture}}_trainer_hypers>`. Here, we
provide a **list of the parameters that are in general the most important** (in
decreasing order of importance) for the PhACE architecture:

.. container:: mtt-hypers-remove-classname

    .. autoattribute:: {{model_hypers_path}}.radial_basis
        :no-index:

    .. autoattribute:: {{model_hypers_path}}.num_element_channels
        :no-index:

    .. autoattribute:: {{trainer_hypers_path}}.num_epochs
        :no-index:

    .. autoattribute:: {{trainer_hypers_path}}.batch_size
        :no-index:

    .. autoattribute:: {{model_hypers_path}}.num_message_passing_layers
        :no-index:

    .. autoattribute:: {{trainer_hypers_path}}.learning_rate
        :no-index:

    .. autoattribute:: {{model_hypers_path}}.cutoff
        :no-index:

    .. autoattribute:: {{model_hypers_path}}.force_rectangular
        :no-index:

    .. autoattribute:: {{model_hypers_path}}.spherical_linear_layers
        :no-index:
"""

from typing import Literal, Optional

from typing_extensions import TypedDict

from metatrain.utils.additive import FixedCompositionWeights
from metatrain.utils.hypers import init_with_defaults
from metatrain.utils.loss import LossSpecification
from metatrain.utils.scaler import FixedScalerWeights


class RadialBasisHypers(TypedDict):
    """Hyperparameters concerning the radial basis functions used in the model."""

    max_eigenvalue: float = 25.0
    """Maximum eigenvalue for the radial basis."""

    scale: float = 0.7
    """Scaling factor for the radial basis."""

    optimizable_lengthscales: bool = False
    """Whether the length scales in the radial basis are optimizable."""


###########################
#  MODEL HYPERPARAMETERS  #
###########################


class ModelHypers(TypedDict):
    """Hyperparameters for the experimental.phace model."""

    max_correlation_order_per_layer: int = 3
    """Maximum correlation order per layer."""

    num_message_passing_layers: int = 2
    """Number of message passing layers.

    Increasing this value might increase the accuracy of the model (especially on
    larger datasets), at the expense of computational efficiency.
    """

    cutoff: float = 5.0
    """Cutoff radius for neighbor search.

    This should be set to a distance beyond which most interactions between atoms
    are expected to be negligible. A lower cutoff will lead to faster models.
    """

    cutoff_width: float = 1.0
    """Width of the cutoff smoothing function."""

    num_element_channels: int = 128
    """Number of channels per element.

    This determines the size of the embedding used to encode the atomic species,
    and it increases or decreases the size of the internal features used in the
    model.
    """

    force_rectangular: bool = False
    """Makes the number of channels per irrep the same.

    This might improve accuracy with a limited increase in computational cost.
    """

    spherical_linear_layers: bool = False
    """Whether to perform linear layers in the spherical representation."""

    radial_basis: RadialBasisHypers = init_with_defaults(RadialBasisHypers)
    """Hyperparameters for the radial basis functions.

    Raising ``max_eigenvalue`` from its default will increase the number of
    spherical irreducible representations (irreps) used in the model, which can
    improve accuracy at the cost of computational efficiency. Increasing this
    value will also increase the number of radial basis functions (and therefore
    internal features) used for each irrep.
    """

    nu_scaling: float = 0.1
    """Scaling for the nu term."""

    mp_scaling: float = 0.1
    """Scaling for message passing."""

    overall_scaling: float = 1.0
    """Overall scaling factor."""

    disable_nu_0: bool = True
    """Whether to disable nu=0."""

    use_sphericart: bool = False
    """Whether to use the ``sphericart`` library (spherical harmonics evaluated in
    Cartesian coordinates)."""

    head_num_layers: int = 1
    """Number of layers in the head."""

    heads: dict[str, Literal["linear", "mlp"]] = {}
    """Heads to use in the model, with options being "linear" or "mlp"."""

    zbl: bool = False
    """Whether to use the ZBL potential in the model."""


#############################
#  TRAINER HYPERPARAMETERS  #
#############################


class TrainerHypers(TypedDict):
    """Hyperparameters for training the experimental.phace model."""

    compile: bool = True
    """Whether to use ``torch.compile`` during training.

    This can lead to significant speedups, but it will cause a compilation step at
    the beginning of training which might take up to 5-10 minutes, mainly depending
    on ``max_eigenvalue``.
    """

    distributed: bool = False
    """Whether to use distributed training."""

    distributed_port: int = 39591
    """Port for DDP communication."""

    batch_size: int = 8
    """Batch size for training.

    Decrease this value if you run into out-of-memory errors during training. You
    can try to increase it if your structures are very small (fewer than 20 atoms)
    and you have a good GPU.
    """

    num_epochs: int = 1000
    """Number of epochs to train the model.

    A larger number of epochs might lead to better accuracy. In general, if you see
    that the validation metrics are not much worse than the training ones at the
    end of training, it might be a good idea to increase this value.
    """

    learning_rate: float = 0.01
    """Learning rate for the optimizer.

    You can try to increase this value (e.g., to 0.02 or 0.03) if training is very
    slow, or decrease it (e.g., to 0.005 or less) if you see that training explodes
    in the first few epochs.
    """

    warmup_fraction: float = 0.01
    """Fraction of training steps for learning rate warmup."""

    gradient_clipping: Optional[float] = None
```
Inline review thread on `gradient_clipping`:

> **Member:** As discussed - shall we set a default here? (The suggested change itself is not shown.)
>
> **Author:** Yes, I think so. I'm running some more experiments to understand what a good default could be.
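Until a default lands, users could opt in explicitly in the options file; a hypothetical snippet (the `training` section name is assumed from metatrain's usual options layout):

```yaml
architecture:
  name: experimental.phace
  training:
    gradient_clipping: 10.0  # hypothetical value; tune per dataset
```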
The file then continues:

```python
    """Gradient clipping value. If None, no clipping is applied."""

    log_interval: int = 1
    """Interval to log metrics during training."""

    checkpoint_interval: int = 25
    """Interval to save model checkpoints."""

    scale_targets: bool = True
    """Whether to scale targets during training."""

    atomic_baseline: FixedCompositionWeights = {}
    """The baselines for each target.

    By default, ``metatrain`` will fit a linear model (:class:`CompositionModel
    <metatrain.utils.additive.composition.CompositionModel>`) to compute the
    least squares baseline for each atomic species for each target.

    However, this hyperparameter allows you to provide your own baselines.
    The value of the hyperparameter should be a dictionary where the keys are the
    target names, and the values are either (1) a single baseline to be used for
    all atomic types, or (2) a dictionary mapping atomic types to their baselines.
    For example:

    - ``atomic_baseline: {"energy": {1: -0.5, 6: -10.0}}`` will fix the energy
      baseline for hydrogen (Z=1) to -0.5 and for carbon (Z=6) to -10.0, while
      fitting the baselines for the energy of all other atomic types, as well
      as fitting the baselines for all other targets.
    - ``atomic_baseline: {"energy": -5.0}`` will fix the energy baseline for
      all atomic types to -5.0.
    - ``atomic_baseline: {"mtt:dos": 0.0}`` sets the baseline for the "mtt:dos"
      target to 0.0, effectively disabling the atomic baseline for that target.

    This atomic baseline is subtracted from the targets during training, which
    saves the main model from having to learn atomic contributions and likely
    makes training easier. When the model is used in evaluation mode, the atomic
    baseline is added on top of the model predictions automatically.

    .. note::
        This atomic baseline is a per-atom contribution. Therefore, if the
        property you are predicting is a sum over all atoms (e.g., total energy),
        the contribution of the atomic baseline to the total property will be the
        atomic baseline multiplied by the number of atoms of that type in the
        structure.

    .. note::
        If a MACE model is loaded through the ``mace_model`` hyperparameter, the
        atomic baselines in the MACE model are used by default for the target
        indicated in ``mace_head_target``. If you want to override them, you need
        to explicitly set the baselines for that target in this hyperparameter.
    """

    fixed_scaling_weights: FixedScalerWeights = {}
    """Fixed scaling weights for the model."""

    num_workers: Optional[int] = None
    """Number of workers for data loading."""

    per_structure_targets: list[str] = []
    """List of targets to calculate per-structure losses."""

    log_separate_blocks: bool = False
    """Whether to log per-block error during training."""

    log_mae: bool = False
    """Whether to log MAE alongside RMSE during training."""

    best_model_metric: Literal["rmse_prod", "mae_prod", "loss"] = "rmse_prod"
    """Metric used to select the best model checkpoint."""

    loss: str | dict[str, LossSpecification] = "mse"
    """Loss function used for training."""
```
Review comment (Contributor): Can we copy the trainer hypers that are common to PET from the PET trainer, so that they have more extensive documentation? This would also rename …

Review thread on `max_eigenvalue`:

> As this needs tuning with respect to the max L of the target, some useful heuristics on how to set this would be useful. For instance, with all other parameters as default, I found I had to set it to `max_eigenvalue: 200.0` to reach L=8.
>
> IIRC the Laplacian-eigenvalues paper uses `n_max` as the only parameter, and then finds `l_max` as a consequence. That is far more user-friendly than asking for a mysterious eigenvalue. I don't know if this also applies with the physical weighting, but it should.
>
> Thoughts on this one @frostedoyster?
>
> **Author:** I don't think it's a good idea, because there are many exact truncations that you can't get (or that are ambiguous) if you move to `n_max` or `l_max`. I would rather add more documentation to the eigenvalue.
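To make the thread concrete: in a Laplacian-eigenstate radial basis (as in the paper linked from the module docstring), basis functions are indexed by (n, l) and their eigenvalues grow with both indices, so a single eigenvalue cutoff jointly truncates the radial and angular expansions. The sketch below illustrates that truncation logic under the assumption that eigenvalues are proportional to the squared zeros of the spherical Bessel functions j_l; the actual PhACE basis (including its `scale` weighting) may differ, so the printed numbers are illustrative only.

```python
# Sketch: how one eigenvalue cutoff jointly truncates the radial (n) and
# angular (l) indices in a Laplacian-eigenstate basis. Assumes eigenvalues
# proportional to the squared zeros z_{n,l} of the spherical Bessel functions
# j_l; the real PhACE implementation may weight these differently.
import numpy as np
from scipy.optimize import brentq
from scipy.special import spherical_jn


def bessel_zeros_below(l: int, z_max: float) -> list[float]:
    """Positive zeros of the spherical Bessel function j_l that are <= z_max."""
    zeros = []
    x, step = l + 1.0, 0.5  # the first zero of j_l lies above l + 1
    while x < z_max:
        # A sign change of j_l over [x, x + step] brackets exactly one zero
        # (consecutive zeros are at least ~pi apart, so 0.5 cannot skip one).
        if spherical_jn(l, x) * spherical_jn(l, x + step) < 0.0:
            zeros.append(brentq(lambda t: spherical_jn(l, t), x, x + step))
        x += step
    return [z for z in zeros if z <= z_max]


def included_nl(max_eigenvalue: float) -> list[tuple[int, int]]:
    """All (n, l) pairs whose eigenvalue z_{n,l}**2 is <= max_eigenvalue."""
    z_max = float(np.sqrt(max_eigenvalue))
    pairs, l = [], 0
    while True:
        zeros = bessel_zeros_below(l, z_max)
        if not zeros:
            break  # eigenvalues grow with l, so no higher l can qualify
        pairs.extend((n, l) for n in range(len(zeros)))
        l += 1
    return pairs


for e in (25.0, 200.0):  # the default, and the value reported in the review
    pairs = included_nl(e)
    print(f"max_eigenvalue={e}: {len(pairs)} (n, l) pairs, "
          f"l_max={max(l for _, l in pairs)}")
```

Under this assumption, the default `max_eigenvalue: 25.0` keeps only a handful of low-(n, l) functions, while `200.0` pushes `l_max` close to the L=8 reported above, consistent with the reviewer's observation; it also shows why the author prefers the eigenvalue knob, since some of these mixed truncations cannot be expressed as a single (`n_max`, `l_max`) rectangle.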