@riclarsson
Contributor

This moves a lot of C++ code from source files to headers. It also introduces nonstd::pow.

The former will likely make some compile times a lot longer. The latter expresses std::pow(v, x) as std::exp(x * std::log(v)), since the logarithm and the natural exponential are built-in operations. All tests complete without issues, and the numerical error in the cases I tested was actually 0, so I guess std::pow simply goes through some extra logic and ends up evaluating that same expression anyway.
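As a minimal sketch of the identity described above: the actual nonstd::pow is not shown in this thread, so the name `pow_via_exp_log` and the `sketch` namespace below are hypothetical, and the function assumes a strictly positive base.

```cpp
#include <cmath>

namespace sketch {

// Hypothetical stand-in for the PR's nonstd::pow (the real implementation is
// not shown in this thread). Uses v^x = exp(x * ln(v)), valid only for v > 0.
inline double pow_via_exp_log(double v, double x) {
  return std::exp(x * std::log(v));
}

}  // namespace sketch
```

This is only a drop-in replacement where the base is known to be positive, such as a temperature ratio. For v <= 0, std::log returns -inf or NaN, whereas std::pow still handles cases like a negative base with an integer exponent.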

I ran some tests on all this. Each lbl_temperature_X test makes 10 million calls to the respective temperature model, with input generated by an RNG. There are four scenarios. The keyword "Original" means the ARTS3 inlining as it is today, and "Inline" means the code is in the header. Each of these is run with and without the other change, "using exp() and log() instead of pow()", which is as described above. Note that some of the methods below were already inline because they are constexpr-able.
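For illustration, a timing loop of this kind could look roughly like the sketch below. The test harness used for the numbers in this PR is not shown here, so everything in the block is an assumption: the `sketch` namespace, the `bench_t1` helper, the temperature range, and the model form X0 * (T0 / T)^X1.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

namespace sketch {

// Assumed "T1"-style temperature model: X0 * (T0 / T)^X1, with the power
// written as exp(X1 * log(T0 / T)). Requires T0 / T > 0.
inline double t1_model(double X0, double X1, double T0, double T) {
  return X0 * std::exp(X1 * std::log(T0 / T));
}

struct BenchResult {
  double milliseconds;  // wall-clock time of the call loop only
  double checksum;      // sum of results, so the loop cannot be optimized away
};

// Hypothetical benchmark: generate n_calls random temperatures up front,
// then time only the model evaluations.
inline BenchResult bench_t1(std::size_t n_calls, unsigned seed = 42) {
  std::mt19937 rng(seed);
  std::uniform_real_distribution<double> temperature(150.0, 350.0);
  std::vector<double> T(n_calls);
  for (auto& t : T) t = temperature(rng);

  double checksum = 0.0;
  const auto start = std::chrono::steady_clock::now();
  for (double t : T) checksum += t1_model(1.0, 0.7, 296.0, t);
  const auto stop = std::chrono::steady_clock::now();

  const std::chrono::duration<double, std::milli> elapsed = stop - start;
  return {elapsed.count(), checksum};
}

}  // namespace sketch
```

Calling `sketch::bench_t1(10'000'000)` would match the call count quoted above. Whether the model is visible to the compiler at the call site (header versus separate translation unit) is exactly the difference between the "Inline" and "Original" scenarios.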

The short of all this is that there is roughly a 10x speed-up in the most commonly used line-shape function. This is the function call that I often refer to as "as slow as 1000 frequency points", so hopefully the linear growth in time complexity with frequency will now start at 100 frequency points rather than 1000.

Original:

| Method | Average Time | Min Time | Max Time | # Calls |
| --- | --- | --- | --- | --- |
| lbl_temperature_aer | 6.94 ms | 6.94 ms | 6.94 ms | 1 |
| lbl_temperature_dpl | 170.89 ms | 170.89 ms | 170.89 ms | 1 |
| lbl_temperature_poly | 24.00 ms | 24.00 ms | 24.00 ms | 1 |
| lbl_temperature_t0 | 6.59 ms | 6.59 ms | 6.59 ms | 1 |
| lbl_temperature_t1 | 152.56 ms | 152.56 ms | 152.56 ms | 1 |
| lbl_temperature_t2 | 157.69 ms | 157.69 ms | 157.69 ms | 1 |
| lbl_temperature_t3 | 6.96 ms | 6.96 ms | 6.96 ms | 1 |
| lbl_temperature_t4 | 155.86 ms | 155.86 ms | 155.86 ms | 1 |
| lbl_temperature_t5 | 155.66 ms | 155.66 ms | 155.66 ms | 1 |

Original using exp() and log() instead of pow():

| Method | Average Time | Min Time | Max Time | # Calls |
| --- | --- | --- | --- | --- |
| lbl_temperature_aer | 6.94 ms | 6.94 ms | 6.94 ms | 1 |
| lbl_temperature_dpl | 159.35 ms | 159.35 ms | 159.35 ms | 1 |
| lbl_temperature_poly | 24.06 ms | 24.06 ms | 24.06 ms | 1 |
| lbl_temperature_t0 | 6.51 ms | 6.51 ms | 6.51 ms | 1 |
| lbl_temperature_t1 | 148.66 ms | 148.66 ms | 148.66 ms | 1 |
| lbl_temperature_t2 | 157.67 ms | 157.67 ms | 157.67 ms | 1 |
| lbl_temperature_t3 | 6.66 ms | 6.66 ms | 6.66 ms | 1 |
| lbl_temperature_t4 | 153.97 ms | 153.97 ms | 153.97 ms | 1 |
| lbl_temperature_t5 | 150.23 ms | 150.23 ms | 150.23 ms | 1 |

Inline:

| Method | Average Time | Min Time | Max Time | # Calls |
| --- | --- | --- | --- | --- |
| lbl_temperature_aer | 6.98 ms | 6.98 ms | 6.98 ms | 1 |
| lbl_temperature_dpl | 117.90 ms | 117.90 ms | 117.90 ms | 1 |
| lbl_temperature_poly | 14.92 ms | 14.92 ms | 14.92 ms | 1 |
| lbl_temperature_t0 | 6.57 ms | 6.57 ms | 6.57 ms | 1 |
| lbl_temperature_t1 | 57.71 ms | 57.71 ms | 57.71 ms | 1 |
| lbl_temperature_t2 | 61.02 ms | 61.02 ms | 61.02 ms | 1 |
| lbl_temperature_t3 | 6.97 ms | 6.97 ms | 6.97 ms | 1 |
| lbl_temperature_t4 | 57.97 ms | 57.97 ms | 57.97 ms | 1 |
| lbl_temperature_t5 | 58.04 ms | 58.04 ms | 58.04 ms | 1 |

Inline using exp() and log() instead of pow():

| Method | Average Time | Min Time | Max Time | # Calls |
| --- | --- | --- | --- | --- |
| lbl_temperature_aer | 6.94 ms | 6.94 ms | 6.94 ms | 1 |
| lbl_temperature_dpl | 33.91 ms | 33.91 ms | 33.91 ms | 1 |
| lbl_temperature_poly | 15.17 ms | 15.17 ms | 15.17 ms | 1 |
| lbl_temperature_t0 | 6.00 ms | 6.00 ms | 6.00 ms | 1 |
| lbl_temperature_t1 | 16.99 ms | 16.99 ms | 16.99 ms | 1 |
| lbl_temperature_t2 | 18.97 ms | 18.97 ms | 18.97 ms | 1 |
| lbl_temperature_t3 | 6.51 ms | 6.51 ms | 6.51 ms | 1 |
| lbl_temperature_t4 | 17.68 ms | 17.68 ms | 17.68 ms | 1 |
| lbl_temperature_t5 | 18.23 ms | 18.23 ms | 18.23 ms | 1 |
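To make the clang numbers easier to compare, the speed-up of the best scenario over the original is just the ratio of the two timings. The sketch below copies three rows from the "Original" and "Inline using exp() and log() instead of pow()" tables above; the `Timing` struct and `speedup` helper are hypothetical names used only for illustration.

```cpp
#include <array>

namespace sketch {

struct Timing {
  const char* method;
  double original_ms;   // "Original" table (clang)
  double optimized_ms;  // "Inline using exp() and log() instead of pow()" table (clang)
};

// Three rows copied from the clang tables above.
constexpr std::array<Timing, 3> clang_timings{{
    {"lbl_temperature_dpl", 170.89, 33.91},
    {"lbl_temperature_t1", 152.56, 16.99},
    {"lbl_temperature_poly", 24.00, 15.17},
}};

// Speed-up factor: how many times faster the optimized scenario is.
constexpr double speedup(const Timing& t) {
  return t.original_ms / t.optimized_ms;
}

}  // namespace sketch
```

With these inputs the factors come out at about 5.0 for lbl_temperature_dpl, about 9.0 for lbl_temperature_t1, and about 1.6 for lbl_temperature_poly.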

All of the timings above are from clang, so I also tested GCC for the original and the best scenario.

Original:

| Method | Average Time | Min Time | Max Time | # Calls |
| --- | --- | --- | --- | --- |
| lbl_temperature_aer | 6.66 ms | 6.66 ms | 6.66 ms | 1 |
| lbl_temperature_dpl | 126.01 ms | 126.01 ms | 126.01 ms | 1 |
| lbl_temperature_poly | 24.94 ms | 24.94 ms | 24.94 ms | 1 |
| lbl_temperature_t0 | 6.70 ms | 6.70 ms | 6.70 ms | 1 |
| lbl_temperature_t1 | 67.43 ms | 67.43 ms | 67.43 ms | 1 |
| lbl_temperature_t2 | 94.36 ms | 94.36 ms | 94.36 ms | 1 |
| lbl_temperature_t3 | 6.73 ms | 6.73 ms | 6.73 ms | 1 |
| lbl_temperature_t4 | 73.14 ms | 73.14 ms | 73.14 ms | 1 |
| lbl_temperature_t5 | 67.67 ms | 67.67 ms | 67.67 ms | 1 |

Inline using exp() and log() instead of pow():

| Method | Average Time | Min Time | Max Time | # Calls |
| --- | --- | --- | --- | --- |
| lbl_temperature_aer | 6.66 ms | 6.66 ms | 6.66 ms | 1 |
| lbl_temperature_dpl | 34.83 ms | 34.83 ms | 34.83 ms | 1 |
| lbl_temperature_poly | 15.41 ms | 15.41 ms | 15.41 ms | 1 |
| lbl_temperature_t0 | 6.79 ms | 6.79 ms | 6.79 ms | 1 |
| lbl_temperature_t1 | 17.54 ms | 17.54 ms | 17.54 ms | 1 |
| lbl_temperature_t2 | 18.46 ms | 18.46 ms | 18.46 ms | 1 |
| lbl_temperature_t3 | 6.66 ms | 6.66 ms | 6.66 ms | 1 |
| lbl_temperature_t4 | 18.07 ms | 18.07 ms | 18.07 ms | 1 |
| lbl_temperature_t5 | 17.89 ms | 17.89 ms | 17.89 ms | 1 |

@riclarsson
Contributor Author

@olemke I am concerned about the error in disort.

I was looking through the code to see if this optimization had introduced some errors (and it had, so those are now fixed with a spiffed-up version of Rational). But logically, my change should not be causing this.

The only difference between LGPL and normal ARTS behavior for DISORT-CPP is that I rely on LAPACK for the matrix eigenvalues instead of my own port of the code. This means there seems to be an update to OpenBLAS causing this issue; they released a new version 5 days ago. How do we deal with this? Should I reduce the accuracy demanded by the test? Or open a bug report with OpenBLAS?

@olemke
Member

olemke commented Jan 21, 2026

Something has changed in the packaging of libopenblas. Even though libopenblas is installed, CMake decided to link against libblas (the netlib version).

I'll commit a fix that worked locally by changing our dependency from libopenblas to openblas. It seems that openblas installs additional files that make CMake detect it correctly again.

@riclarsson riclarsson merged commit 9d2a06c into atmtools:main Jan 21, 2026
8 checks passed
@riclarsson riclarsson deleted the inline-stuff branch January 21, 2026 09:00
@riclarsson
Contributor Author

@olemke Thank you for the fix!
