
Simplify mixed precision: compute types on demand instead of caching #759


Conversation

ChrisRackauckas-Claude
Contributor

Summary

  • Simplified the mixed-precision implementations by computing types on demand rather than caching them
  • Reduces complexity while maintaining zero allocations for subsequent solves
  • Cleaner implementation as requested in review feedback

Changes

Modified all six mixed-precision implementations to compute the T32 and Torig types on demand in their solve! functions instead of storing them in the cache:

  • MKL32MixedLUFactorization
  • OpenBLAS32MixedLUFactorization
  • AppleAccelerate32MixedLUFactorization
  • RF32MixedLUFactorization
  • CUDAOffload32MixedLUFactorization
  • MetalOffload32MixedLUFactorization

Before

# In init_cacheval:
T32 = eltype(A) <: Complex ? ComplexF32 : Float32
Torig = eltype(u)
return (luinst, ipiv, A_32, b_32, u_32, T32, Torig)

# In solve!:
fact, ipiv, A_32, b_32, u_32, T32, Torig = @get_cacheval(cache, :MKL32MixedLUFactorization)

After

# In init_cacheval:
return (luinst, ipiv, A_32, b_32, u_32)

# In solve!:
fact, ipiv, A_32, b_32, u_32 = @get_cacheval(cache, :MKL32MixedLUFactorization)
# Compute types on demand
T32 = eltype(A) <: Complex ? ComplexF32 : Float32
Torig = eltype(cache.u)

Performance

The type computations (eltype(A) <: Complex and eltype(cache.u)) operate on types rather than data: Julia resolves them from the argument types, typically at compile time, so they don't allocate. Computing them on demand therefore has negligible performance impact while making the code cleaner and easier to understand.
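The pattern can be sketched in isolation. The helper name below (mixed_types) is hypothetical, not from the PR; it only mirrors the two on-demand computations shown in the After snippet:

```julia
# Hypothetical helper mirroring the on-demand type computation in solve!.
# The ternary over eltype(A) is decided by the matrix's element type, not
# its contents, so no work proportional to the data is done and nothing
# is allocated.
mixed_types(A::AbstractMatrix, u::AbstractVector) =
    (eltype(A) <: Complex ? ComplexF32 : Float32, eltype(u))

A = rand(4, 4); u = rand(4)
T32, Torig = mixed_types(A, u)    # (Float32, Float64) for real Float64 inputs

Ac = rand(ComplexF64, 4, 4)
T32c, _ = mixed_types(Ac, u)      # ComplexF32 for complex inputs
```

Because the result depends only on the types of A and u, which are already fixed in the cache, recomputing it each solve! is equivalent to reading it from the cache tuple, minus the bookkeeping.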

Related

This is a cleaner reimplementation of #758 based on review feedback.

🤖 Generated with Claude Code

Remove cached T32 and Torig types from init_cacheval return tuples.
Instead compute these types on demand in solve! functions to reduce
complexity while maintaining zero allocations for subsequent solves.

This change affects all mixed precision implementations:
- MKL32MixedLUFactorization
- OpenBLAS32MixedLUFactorization
- AppleAccelerate32MixedLUFactorization
- RF32MixedLUFactorization
- CUDAOffload32MixedLUFactorization
- MetalOffload32MixedLUFactorization

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@ChrisRackauckas ChrisRackauckas merged commit 7492b7f into SciML:main Aug 23, 2025
130 of 136 checks passed