Problem fitting bulk polarizabilities for H2O, HCl, and LiCl

Hello!

I am currently trying to use TENSOAP to fit the system polarizability tensors for relatively small boxes of H2O, HCl in water, and LiCl in water. The system sizes are specified as follows:

- H2O: 384 atoms (128 molecules) in a 12.42 Angstrom box
- HCl: 308 atoms in a 14.897 Angstrom box
- LiCl: 337 atoms in a 15.165 Angstrom box

Since I am testing this on nodes with limited memory (at most 64 GB of RAM), I am using small sets of 50, 100, 200, and 300 frames for each system. To set up the polarizability fit, I am following the README example for bulk water with sparsification. My script for each system is as follows:

```
# Get the L = 0 kernel without sparsification
sagpr_get_PS -f coords_with_polar.xyz -lm 0 -p -o PS0
sagpr_get_kernel -ps PS0.npy -z 2 -s PS0_natoms.npy -o kernel0

# Get the L = 2 kernel with sparsification
# Sparsify on all configurations, retaining only 100 components
sagpr_get_PS -f coords_with_polar.xyz -lm 2 -p -nc 100 -o PS2
# Get the sparse power spectrum
sagpr_get_PS -f coords_with_polar.xyz -lm 2 -p -sf PS2 -o PS2_sparse
# Get the L = 2 kernel
sagpr_get_kernel -z 2 -ps PS2_sparse.npy -ps0 PS0.npy -s PS2_sparse_natoms.npy -o kernel2

# Fit the polarizability, setting -rdm so that half the frames are used for training and half for testing (e.g. 100 frames: 50 train, 50 test)
sagpr_train -r 2 -reg 1e-8 1e-5 -f coords_with_polar.xyz -k kernel0.npy kernel2.npy -p polarizability -rdm 50 -pr -t 1.0
```
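A consistency check between the input file and the saved kernels might help narrow this down. The following is only a sketch under stated assumptions: it assumes `kernel0.npy` and `kernel2.npy` are dense arrays whose first two dimensions equal the number of frames, with trailing dimensions `(2L+1, 2L+1)` for the L = 2 kernel. That layout is an assumption about the TENSOAP output, not something confirmed here. `count_xyz_frames` and `kernel_problems` are hypothetical helper names and are not part of TENSOAP.

```python
import os
import numpy as np


def count_xyz_frames(path):
    """Count frames in an XYZ file: each frame is an atom-count line,
    one comment line, then that many atom lines."""
    frames = 0
    with open(path) as fh:
        while True:
            header = fh.readline()
            if not header.strip():
                break  # end of file (or a trailing blank line)
            n_atoms = int(header.split()[0])
            for _ in range(n_atoms + 1):  # comment line + atom lines
                fh.readline()
            frames += 1
    return frames


def kernel_problems(kernel, n_frames, L):
    """Return a list of shape problems (empty if none were found).

    ASSUMPTION: the kernel has shape (n_frames, n_frames, 2L+1, 2L+1);
    for L = 0 a plain (n_frames, n_frames) array is also accepted.
    """
    problems = []
    degen = 2 * L + 1
    if tuple(kernel.shape[:2]) != (n_frames, n_frames):
        problems.append(
            f"L={L}: leading shape {tuple(kernel.shape[:2])}, "
            f"expected {(n_frames, n_frames)}"
        )
    allowed_trailing = [(degen, degen)]
    if L == 0:
        allowed_trailing.append(())
    if tuple(kernel.shape[2:]) not in allowed_trailing:
        problems.append(
            f"L={L}: trailing shape {tuple(kernel.shape[2:])}, "
            f"expected {(degen, degen)}"
        )
    return problems


if __name__ == "__main__" and os.path.exists("coords_with_polar.xyz"):
    n_frames = count_xyz_frames("coords_with_polar.xyz")
    print(f"frames in coords_with_polar.xyz: {n_frames}")
    for L, name in ((0, "kernel0.npy"), (2, "kernel2.npy")):
        if not os.path.exists(name):
            print(f"{name}: not found")
            continue
        found = kernel_problems(np.load(name), n_frames, L)
        for line in found or [f"L={L}: shape looks consistent"]:
            print(line)
```

Run from the directory containing the files produced by the script above, this would show whether either kernel disagrees with the number of frames in `coords_with_polar.xyz`.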

With 50 frames and an `-rdm` value of 25, this workflow works for all three systems, and I can generate a `predictions_cartesian.txt` file containing the 9 components of the polarizability tensor. With 100 frames, however, the workflow only works for H2O. For HCl and LiCl, the code gets only as far as writing `predictions_L0.txt` (the predictions for the L=0 component) and then fails with the following error:

```
Loading kernel matrices...
...Kernels loaded.

testing data points:  50  
training data points:  50  
--------------------------------
RESULTS FOR L=0 MODULI (lambda=0.000000)
-----------------------------------------------------
STD 8.857322520103208
ABS RMSE 1.7207289535930423
RMSE = 19.4272 %
Traceback (most recent call last):
  File "/home/frankhu/TENSOAP/bin/sagpr_train", line 442, in <module>
    main()
  File "/home/frankhu/TENSOAP/bin/sagpr_train", line 48, in main
    [ov, tv, na] = sagpr_utils.do_sagpr_spherical(kernel[l],spherical_tensor[l],reg[l],rank_str=str_rank,nat=nat,fractrain=fractrain,rdm=rdm,sel=sel,peratom=peratom,prediction=pre
  File "/home/frankhu/TENSOAP/soapfast/utils/sagpr_utils.py", line 92, in do_sagpr_spherical
    invktrvec = get_weights(ktrain,vtrain_part,mode,jitter)
  File "/home/frankhu/TENSOAP/soapfast/utils/sagpr_utils.py", line 18, in get_weights
    return scipy.linalg.solve(ktrain,vtrain_part)
  File "/home/frankhu/mambaforge/envs/tensoap_env/lib/python3.10/site-packages/scipy/linalg/_basic.py", line 191, in solve
    raise ValueError('Input b has to have same number of rows as '
ValueError: Input b has to have same number of rows as input a
```
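The final `ValueError` comes from `scipy.linalg.solve`, which requires the right-hand side to have the same number of rows as the matrix. The sketch below reproduces that failure outside TENSOAP, using stand-in arrays. The sizes are hypothetical: 250 corresponds to 50 training frames times 5 components for L = 2, which assumes the training kernel and targets are flattened that way. `try_solve` is a hypothetical helper name.

```python
import numpy as np
import scipy.linalg


def try_solve(n_rows_a, n_rows_b):
    """Return 'ok' if the solve succeeds, else the ValueError message."""
    a = np.eye(n_rows_a)   # stand-in for the training kernel matrix
    b = np.ones(n_rows_b)  # stand-in for the training target vector
    try:
        scipy.linalg.solve(a, b)
        return "ok"
    except ValueError as err:
        return f"ValueError: {err}"


print(try_solve(250, 250))  # matching row counts
print(try_solve(250, 245))  # mismatched row counts
```

If this matches the situation in `get_weights`, then `ktrain` and `vtrain_part` have different row counts for the L=2 component, so printing both shapes just before the call would show which of the two is unexpected.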
With 200 frames, the same error occurs for both H2O and HCl (LiCl runs out of memory on my system), and the same is true with 300 frames. As this is my first time using TENSOAP, any help or suggestions on how to resolve this issue, or on how to fit these systems in general, would be greatly appreciated.

Thank you very much!

Frank
