neo_bspline: 1D B-spline + matrix-free CGLS LSQ#151
neo_bspline: 1D B-spline + matrix-free CGLS LSQ#151krystophny wants to merge 15 commits intomainfrom
Conversation
PR Compliance Guide 🔍Below is a summary of compliance checks for this PR:
Compliance status legend🟢 - Fully Compliant🟡 - Partial Compliant 🔴 - Not Compliant ⚪ - Requires Further Human Verification 🏷️ - Compliance label |
||||||||||||||||||||||||||||||||||||||||||
PR Code Suggestions ✨Explore these optional code suggestions:
|
|||||||||||||||||
Update: Unified High-Performance ImplementationThis commit consolidates all B-spline functionality (1D/2D/3D) into a single optimized module with grid-based API: Key Changes
Benchmark Results (up to 100,000 points)3D Performance (cubic B-splines, degree 3):
Evaluation Performance (3D):
Note: The LSQ construction in All 44 tests pass. |
Cache spans and basis function values before CG iterations, then reuse them through cached apply_A3D and apply_A3D_T operators. This eliminates redundant find_span and basis_funs calls in each iteration. Benchmark results for 9261 3D data points: - Before: 1.05s - After: 0.26s (4x speedup)
Consolidate all B-spline functionality (1D/2D/3D) into a single module with optimized grid-based API exploiting separable tensor product structure: - Remove neo_bspline_3d.f90 and merge into unified neo_bspline.f90 - Precompute basis functions per grid line (O(n1+n2+n3) storage vs O(n*n*n)) - OpenMP parallel do + SIMD for construction operators - SIMD-only evaluation (no thread overhead for single-point queries) - New grid-based LSQ CGLS API: bspline_Nd_lsq_cgls(spl, x1, x2, ..., f_grid, coeff) - Update all tests and benchmarks for new API - 44/44 tests pass This enables efficient least-squares fitting on regular grids by caching basis functions once per coordinate axis instead of per grid point.
…rpolation Reorganize the B-spline code into clean, single-responsibility modules: - neo_bspline_base: core types, initialization, evaluation, basis functions - neo_bspline_interp: direct interpolation via LAPACK collocation solve - neo_bspline_lsq: least-squares fitting via matrix-free CGLS The direct interpolation variant solves A*c = f where A_ij = B_j(x_i), using separable tensor-product structure in 2D/3D to reduce dense solve to dimension-by-dimension LU factorizations. OpenMP parallelizes the back-substitution loops in 3D. Benchmarks updated to compare all three methods: interpolate module, neo_bspline LSQ, and neo_bspline direct. Large grid sizes skip direct interpolation (O(n^3) scaling) to avoid timeouts.
6b4b71e to
d5355ef
Compare
User description
This PR introduces a new 1D B-spline module with a textbook Cox–de Boor basis and a matrix-free CGLS LSQ solver, plus tests and plotting utilities.
Highlights:
[1/9] cd /home/ert/code/libneo/extra/MyMPILib && /home/ert/code/libneo/extra/MyMPILib/Scripts/do_versioning.sh
Versioning MyMPILib...
[2/9] Building Fortran preprocessed extra/MyMPILib/CMakeFiles/MyMPILib.dir/Specific/mpiprovider_module.f90-pp.f90
[3/9] Generating Fortran dyndep file extra/MyMPILib/CMakeFiles/MyMPILib.dir/Fortran.dd
[4/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Specific/mpiprovider_module.f90.o
[5/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Generic/genericWorkunit_module.f90.o
[6/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Generic/initWorkunit_module.f90.o
[7/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Internal/wuMergeWorkunit_module.f90.o
[8/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Internal/wuMergeChunk_module.f90.o
[9/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Internal/wuDataRequester_module.f90.o
[10/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Generic/scheduler_module.f90.o
[11/15] Linking Fortran static library extra/MyMPILib/libMyMPILib.a
[12/15] Building Fortran object test/CMakeFiles/test_mympilib.x.dir/source/derived_scheduler_module.f90.o
[13/15] Linking Fortran executable test/test_arnoldi.x
[14/15] Building Fortran object test/CMakeFiles/test_mympilib.x.dir/source/test_mympilib.f90.o
[15/15] Linking Fortran executable test/test_mympilib.x
cd build && ctest
Test project /home/ert/code/libneo/build
Start 1: test_binsrc
1/42 Test Make MPI optional and minimize dependencies on Fortran version and external libs #1: test_binsrc ....................... Passed 0.01 sec
Start 2: test_boozer_class
2/42 Test Integrate, document and test VMEC interfaces #2: test_boozer_class ................. Passed 0.02 sec
Start 3: test_efit_class
3/42 Test Supply latest version of all magnetic field routine variants #3: test_efit_class ................... Passed 0.02 sec
Start 4: test_geqdsk_tools
4/42 Test Ninjarun: handle Fortran module dependencies #4: test_geqdsk_tools ................. Passed 0.03 sec
Start 5: test_simpson
5/42 Test bdivfree routine: double precision should be changed back to double complex #5: test_simpson ...................... Passed 0.01 sec
Start 6: test_hdf5_tools
6/42 Test Remove aug from repository #6: test_hdf5_tools ................... Passed 2.02 sec
Start 7: test_util
7/42 Test Vmec #7: test_util ......................... Passed 0.01 sec
Start 8: test_neo_bspline_1d
8/42 Test Python VMEC magfie #8: test_neo_bspline_1d ............... Passed 0.02 sec
Start 9: test_interpolate
9/42 Test Adapting for Marconi Intel configuration #9: test_interpolate .................. Passed 1.86 sec
Start 10: test_batch_interpolate
10/42 Test Merge changes from magfie #10: test_batch_interpolate ............ Passed 1.86 sec
Start 11: test_collision_freqs
11/42 Test Move issues from magfie #11: test_collision_freqs .............. Passed 0.01 sec
Start 12: test_transport
12/42 Test Update build and CI/CD based on magfie #12: test_transport .................... Passed 0.01 sec
Start 13: test_vmec_modules
13/42 Test First eqdsk tests #13: test_vmec_modules ................. Passed 0.01 sec
Start 14: test_coordinate_systems
14/42 Test extract vacfield (and coil_tools) from MEPHIT #14: test_coordinate_systems ........... Passed 0.01 sec
Start 15: test_analytical_circular
15/42 Test Consistency check mixes geometric and logical coordinates #15: test_analytical_circular .......... Passed 0.06 sec
Start 16: test_ascot5_compare
16/42 Test Add check for existence of convexwall #16: test_ascot5_compare ............... Passed 17.08 sec
Start 17: test_arnoldi_setup
17/42 Test Evaluate equilibrium field only if specified #17: test_arnoldi_setup ................ Passed 0.35 sec
Start 18: test_arnoldi
18/42 Test Implement Biot-Savart variant #18: test_arnoldi ...................... Passed 0.06 sec
Start 19: test_mympilib
19/42 Test Replace pfile format by mgrid format #19: test_mympilib ..................... Passed 0.06 sec
Start 20: test_system_utility
20/42 Test Check maybe undefined behavior #20: test_system_utility ............... Passed 0.01 sec
Start 21: setup_test_jorek_field
21/42 Test Created converter for the conversion between poloidal and toroidal flux lable s_pol and s_tor #21: setup_test_jorek_field ............ Passed 0.01 sec
Start 30: test_jorek_field
22/42 Test Test even order in spline three to five #30: test_jorek_field .................. Passed 0.05 sec
Start 31: test_field
23/42 Test Integration test API #31: test_field ........................ Passed 0.05 sec
Start 22: cleanup_test_jorek_field
24/42 Test SPLIME #22: cleanup_test_jorek_field .......... Passed 0.01 sec
Start 23: test_biotsavart
25/42 Test Revised EQDSK class #23: test_biotsavart ................... Passed 0.39 sec
Start 24: test_example_field
26/42 Test Support CHEASE EQDSK files #24: test_example_field ................ Passed 0.01 sec
Start 25: test_biotsavart_field
27/42 Test Support FREEGS EQDSK files #25: test_biotsavart_field ............. Passed 0.02 sec
Start 26: test_mesh
28/42 Test Updated comments/naming #26: test_mesh ......................... Passed 0.01 sec
Start 27: test_field_mesh
29/42 Test branch psipol2phitor: Created converter between poloidal and toroidal #27: test_field_mesh ................... Passed 0.01 sec
Start 28: test_polylag_field
30/42 Test Add check for q profile for EQDSK #28: test_polylag_field ................ Passed 0.09 sec
Start 29: test_spline_field
31/42 Test Add routines to convert to harmonics in flux coordinates #29: test_spline_field ................. Passed 0.49 sec
Start 32: test_stretch_coords
32/42 Test Add Boozer Fourier transformations #32: test_stretch_coords ............... Passed 0.12 sec
Start 33: test_coil_tools_biot_savart
33/42 Test Add checks between Fortran and Python EQDSK routines #33: test_coil_tools_biot_savart ....... Passed 17.48 sec
Start 34: tilted_coil_generate_geometry
34/42 Test Add splines for covariant B components #34: tilted_coil_generate_geometry ..... Passed 0.14 sec
Start 35: tilted_coil_fourier_modes
35/42 Test Design MARS interface via OMFIT #35: tilted_coil_fourier_modes ......... Passed 26.39 sec
Start 36: tilted_coil_field_validation
36/42 Test Extend EQDSK interface to output B0 #36: tilted_coil_field_validation ...... Passed 3.81 sec
Start 37: test_polylag_5
37/42 Test Move all tests that depend on data in /proj/plasma to CODE #37: test_polylag_5 .................... Passed 0.01 sec
Start 38: test_poincare
38/42 Test Small workflow changes for development environment #38: test_poincare ..................... Passed 0.02 sec
Start 39: test_odeint_allroutines_context
39/42 Test Activate highest optimization for coil_tools #39: test_odeint_allroutines_context ... Passed 0.01 sec
Start 40: test_odeint_thread_safety
40/42 Test Splime2 #40: test_odeint_thread_safety ......... Passed 0.01 sec
Start 41: build_test_golden_record_odeint
41/42 Test Fix index errors in 3D splines #41: build_test_golden_record_odeint ... Passed 0.02 sec
Start 42: test_golden_record_odeint
42/42 Test Add derivatives to splines #42: test_golden_record_odeint ......... Passed 0.19 sec
100% tests passed, 0 tests failed out of 42
Label Time Summary:
biot_savart = 17.48 secproc (1 test)
build-helper = 0.02 secproc (1 test)
field = 1.25 secproc (10 tests)
magfie = 47.82 secproc (4 tests)
poincare = 0.02 secproc (1 test)
polylag = 0.01 secproc (1 test)
tilted_coil = 30.34 sec*proc (3 tests)
Total Test time (real) = 72.91 sec passes all Fortran and Python tests.
This provides a clean, textbook 1D B-spline basis and a proven matrix-free LSQ core as a foundation for future 2D/3D and batch extensions.
PR Type
Enhancement
Description
Implement 1D B-spline basis with Cox–de Boor recursion
Add matrix-free CGLS solver for least-squares fitting
Include partition of unity and LSQ accuracy tests
Provide Python visualization utility for fit results
Diagram Walkthrough
File Walkthrough
neo_bspline.f90
1D B-spline basis and matrix-free CGLS implementationsrc/interpolate/neo_bspline.f90
bspline_1dtype with degree, control points, andopen-uniform knot vector
bspline_1d_init_uniformfor knot vector initialization on [x_min,x_max]
bspline_1d_evalusing Cox–de Boor recursion for basisevaluation
bspline_1d_lsq_cglswith operator-basedA and A^T
find_span,basis_funs,apply_A,apply_ATtest_neo_bspline_1d.f90
B-spline partition of unity and LSQ accuracy teststest/source/test_neo_bspline_1d.f90
accuracy
data
plot_neo_bspline_1d.py
Python visualization utility for B-spline LSQ fitstest/scripts/plot_neo_bspline_1d.py
CMakeLists.txt
Register neo_bspline module in build systemsrc/interpolate/CMakeLists.txt
neo_bspline.f90to interpolate library buildCMakeLists.txt
Add neo_bspline test to CMake and CTesttest/CMakeLists.txt
test_neo_bspline_1d.xexecutable target