Deepdive: libneo now has correctness coverage for `evaluate_batch_splines_3d_der2_rmix` (see PR #228), but the general (non-5/5/5) path currently falls back to the full `evaluate_batch_splines_3d_der2_core` and then slices out (dr2, drd2, drd3). This is correct, but it also computes the three second derivatives that rmix discards (d2/dx2^2, d2/dx2dx3, d2/dx3^2), so it can be slower than a dedicated rmix kernel.

Goal:
- Implement an optimized rmix kernel for general spline orders (and NQ > 1) that computes y, dy, and only the second derivatives (d2/dx1^2, d2/dx1dx2, d2/dx1dx3).

Acceptance:
- Keep the PR #228 regression tests passing (and add performance microbenchmarks if appropriate).
- Demonstrate a speedup in a representative hot path (e.g., a Newton solver that needs only the rmix derivatives).
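To make the goal concrete, here is a language-agnostic sketch in Python of where an rmix kernel saves work. It assumes (hypothetically; this is not necessarily libneo's storage layout) that each cell holds power-form coefficients `c[q][i][j][k]` multiplying `t1**i * t2**j * t3**k`, with `t1, t2, t3` the local offsets from the cell origin. The names `horner_*` and `eval_cell_der2_rmix` are illustrative only. The sketch reduces x3 first, then x2, then x1, carrying only the derivative orders that the three rmix outputs actually need.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: c[i][j][k] multiplies t1**i * t2**j * t3**k on one cell.
Coeffs3D = List[List[List[float]]]


def horner_value(a: Sequence[float], t: float) -> float:
    """p(t) = sum_n a[n] * t**n via Horner's rule."""
    p = 0.0
    for coef in reversed(a):
        p = p * t + coef
    return p


def horner_d1(a: Sequence[float], t: float) -> Tuple[float, float]:
    """Return p(t) and p'(t) in a single Horner pass."""
    p = dp = 0.0
    for coef in reversed(a):
        dp = dp * t + p  # uses p before it is updated
        p = p * t + coef
    return p, dp


def horner_d2(a: Sequence[float], t: float) -> Tuple[float, float, float]:
    """Return p(t), p'(t) and p''(t) in a single Horner pass."""
    p = dp = d2p = 0.0
    for coef in reversed(a):
        d2p = d2p * t + 2.0 * dp  # uses dp before it is updated
        dp = dp * t + p
        p = p * t + coef
    return p, dp, d2p


def eval_cell_der2_rmix(coeffs: List[Coeffs3D], t1: float, t2: float, t3: float):
    """For each of NQ quantities return
    (y, (dy/dx1, dy/dx2, dy/dx3), (d2/dx1^2, d2/dx1dx2, d2/dx1dx3)).
    The other second derivatives (d2/dx2^2, d2/dx2dx3, d2/dx3^2) are never formed.
    """
    out = []
    for c in coeffs:  # loop over quantities (NQ may be > 1)
        g, g2, g3 = [], [], []  # per-i results after reducing over j and k
        for c_i in c:
            # Reduce x3: value and d/dx3 only (no second x3-derivative).
            h, h3 = [], []
            for c_ij in c_i:
                v, dv = horner_d1(c_ij, t3)
                h.append(v)
                h3.append(dv)
            # Reduce x2: value and d/dx2 of h, but only the value of h3,
            # which skips both the x2 curvature and the x2-x3 cross term.
            v, dv2 = horner_d1(h, t2)
            g.append(v)
            g2.append(dv2)
            g3.append(horner_value(h3, t2))
        # Reduce x1: second derivative for g, first derivative for g2 and g3.
        y, dy1, d11 = horner_d2(g, t1)
        dy2, d12 = horner_d1(g2, t1)
        dy3, d13 = horner_d1(g3, t1)
        out.append((y, (dy1, dy2, dy3), (d11, d12, d13)))
    return out
```

For f = t1*t2*t3 (only `c[1][1][1] = 1`) evaluated at (2, 3, 5), this returns y = 30, dy = (15, 10, 6) and rmix second derivatives (0, 5, 3). Compared with a full der2 evaluation, the savings come from dropping the second x3-derivative per (i, j) pair and the two extra x2 reductions per i; whether that translates into a measurable speedup in libneo's batch layout is exactly what the acceptance benchmark should establish.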