add GPU support for steadystate

ytdHuang · ytdHuang · commit 535e1f546ae7 · 2025-04-16T13:54:52.000+09:00
diff --git a/docs/src/extensions/CUDA.md b/docs/src/extensions/CUDA.md
@@ -12,7 +12,7 @@ typeof(M.data) <: CuSparseMatrixCSC # solve on GPU
 We wrapped several functions in CUDA and CUDA.CUSPARSE in order to not only converting QuantumObject.data into GPU arrays, but also changing the element type and word size (32 and 64) since some of the GPUs perform better in 32-bit. The functions are listed as follows (where input A is a QuantumObject):
 
 Therefore, we wrapped several functions in `CUDA` and `CUDA.CUSPARSE` in order to not only converting a HEOMLS-matrix-type object into GPU arrays, but also changing the element type and word size (`32` and `64`) since some of the GPUs perform better in `32`-bit. The functions are listed as follows (where input `M` is a `AbstractHEOMLSMatrix`):
-- `cu(M, word_size=64)` : Translate `M.data` into CUDA arrays with specified `word_size`.
+- `cu(M, word_size=64)` : Translate `M.data` into CUDA arrays with specified `word_size` (defaults to `64`).
 - `CuSparseMatrixCSC{T}(M)` : Translate `M.data` into the type `CuSparseMatrixCSC{T, Int32}`
 
 ### Demonstration
@@ -21,7 +21,6 @@ The extension will be automatically loaded if user imports the package `CUDA.jl`
 
 ```julia
 using HierarchicalEOM
-using LinearSolve # to change the solver for better GPU performance
 using CUDA
 CUDA.allowscalar(false) # Avoid unexpected scalar indexing
 ```
@@ -43,12 +42,12 @@ tier  = 3
 tlist = 0:0.1:10
 ωlist = -10:1:10
 
-σm = [0 1; 0  0]
-σz = [1 0; 0 -1]
-II = [1 0; 0  1]
-d_up = kron(     σm, II)
-d_dn = kron(-1 * σz, σm)
-ρ0   = kron([1 0; 0 0], [1 0; 0 0])
+σm = sigmam()
+σz = sigmaz()
+II = qeye(2)
+d_up = tensor(     σm, II)
+d_dn = tensor(-1 * σz, σm)
+ψ0   = tensor(basis(2, 0), basis(2, 0))
 Hsys = ϵ * (d_up' * d_up + d_dn' * d_dn) + U * (d_up' * d_up * d_dn' * d_dn)
 
 bath_up = Fermion_Lorentz_Pade(d_up, Γ, μ, W, kT, N)
@@ -62,34 +61,52 @@ M_even_gpu = cu(M_even_cpu, word_size = 32)
 # odd HEOMLS matrix
 M_odd_cpu  = M_Fermion(Hsys, tier, bath_list, ODD)
 M_odd_gpu  = cu(M_odd_cpu, word_size = 32)
+```
+
+### Solving time evolution with CPU
+
+```julia
+ados_list = HEOMsolve(M_even_cpu, ψ0, tlist)
+```
+
+### Solving time evolution with GPU
+
+```julia
+ados_list = HEOMsolve(M_even_gpu, ψ0, tlist)
+```
+
+### Solving steady state with CPU using linear-solve method
 
-# solve steady state with CPU
+```julia
 ados_ss = steadystate(M_even_cpu);
 ```
 
-!!! note "Note"
-    This extension does not support for solving [stationary state](@ref doc-Stationary-State) on GPU since it is not efficient and might get wrong solutions. If you really want to obtain the stationary state with GPU, you can repeatedly solve the [time evolution](@ref doc-Time-Evolution) until you find it.
+### Solving steady state with GPU using linear-solve method
 
-### Solving time evolution with CPU
+```julia
+ados_ss = steadystate(M_even_gpu);
+```
+
+### Solving steady state with CPU using ODE method
 
 ```julia
-ados_list_cpu = HEOMsolve(M_even_cpu, ρ0, tlist)
+ados_ss = steadystate(M_even_cpu, ψ0);
 ```
 
-### Solving time evolution with GPU
+### Solving steady state with GPU using ODE method
 
 ```julia
-ados_list_gpu = HEOMsolve(M_even_gpu, ρ0, tlist)
+ados_ss = steadystate(M_even_gpu, ψ0);
 ```
 
-### Solving Spectrum with CPU
+### Solving spectrum with CPU
 
 ```julia
-dos_cpu = DensityOfStates(M_odd_cpu, ados_ss, d_up, ωlist)
+dos = DensityOfStates(M_odd_cpu, ados_ss, d_up, ωlist)
 ```
 
-### Solving Spectrum with GPU
+### Solving spectrum with GPU
 
 ```julia
-dos_gpu = DensityOfStates(M_odd_gpu, ados_ss, d_up, ωlist; solver=KrylovJL_BICGSTAB(rtol=1f-10, atol=1f-12))
+dos = DensityOfStates(M_odd_gpu, ados_ss, d_up, ωlist)
 ```
diff --git a/docs/src/stationary_state.md b/docs/src/stationary_state.md
@@ -3,6 +3,9 @@
 
 To solve the stationary state of the reduced state and also all the [ADOs](@ref doc-ADOs), you only need to call [`steadystate`](@ref). Different methods are implemented with different input parameters of the function which makes it easy to switch between different methods. The output of the function [`steadystate`](@ref) for each methods will always be in the type of the auxiliary density operators [`ADOs`](@ref).
 
+!!! compat "Extension for CUDA.jl"
+    `HierarchicalEOM.jl` provides an extension to support GPU ([`CUDA.jl`](https://github.com/JuliaGPU/CUDA.jl)) acceleration for [`steadystate`](@ref). See [here](@ref doc-ext-CUDA) for more details.
+
 ## Solve with [LinearSolve.jl](http://linearsolve.sciml.ai/stable/)
 The first method is implemented by solving the linear problem
 ```math
@@ -16,6 +19,7 @@ The first method is implemented by solving the linear problem
 M::AbstractHEOMLSMatrix  
 ados_steady = steadystate(M)
 ```
+
 !!! warning "Unphysical solution"
     This method does not require an initial condition ``\rho^{(m,n,p)}_{\textbf{j} \vert \textbf{q}}(0)``. Although this method works for most of the cases, it does not guarantee that one can obtain a physical (or unique) solution. If there is any problem within the solution, please try the second method which solves with an initial condition, as shown below.
 
diff --git a/docs/src/time_evolution.md b/docs/src/time_evolution.md
@@ -78,7 +78,7 @@ end
 The first method is implemented by solving the ordinary differential equation (ODE). `HierarchicalEOM.jl` wraps some of the functions in [`DifferentialEquations.jl`](https://diffeq.sciml.ai/stable/), which is a very rich numerical library for solving the differential equations and provides many ODE solvers. It offers quite a few options for the user to tailor the solver to their specific needs. The default solver (and its corresponding settings) are chosen to suit commonly encountered problems and should work fine for most of the cases. If you require more specialized methods, such as the choice of algorithm, please refer to [DifferentialEquations solvers](@ref ODE-solvers) and also the documentation of [`DifferentialEquations.jl`](https://diffeq.sciml.ai/stable/).
 
 !!! compat "Extension for CUDA.jl"
-    `HierarchicalEOM.jl` provides an extension to support GPU ([`CUDA.jl`](https://github.com/JuliaGPU/CUDA.jl)) acceleration for solving the time evolution (only for ODE method with time-independent system Hamiltonian). See [here](@ref doc-ext-CUDA) for more details.
+    `HierarchicalEOM.jl` provides an extension to support GPU ([`CUDA.jl`](https://github.com/JuliaGPU/CUDA.jl)) acceleration for [`HEOMsolve`](@ref) (only for the ODE method). See [here](@ref doc-ext-CUDA) for more details.
 
 See the docstring of this method:  
 
diff --git a/ext/HierarchicalEOM_CUDAExt.jl b/ext/HierarchicalEOM_CUDAExt.jl
@@ -1,7 +1,7 @@
 module HierarchicalEOM_CUDAExt
 
 using HierarchicalEOM
-import HierarchicalEOM: _HandleVectorType, _HandleTraceVectorType
+import HierarchicalEOM: _HandleVectorType, _HandleTraceVectorType, _HandleSteadyStateMatrix, _SteadyStateConstraint
 import QuantumToolbox: _CType, _convert_eltype_wordsize, makeVal, getVal
 import CUDA
 import CUDA: cu, CuArray
@@ -81,4 +81,7 @@ _convert_to_gpu_matrix(A::AddedOperator, ElType) = AddedOperator(map(op -> _conv
 _HandleVectorType(M::Type{<:CuSparseMatrixCSC}, V::SparseVector) = CuArray{_CType(eltype(M))}(V)
 
 _HandleTraceVectorType(M::Type{<:CuSparseMatrixCSC}, V::SparseVector) = CuSparseVector{_CType(eltype(M))}(V)
+
+_HandleSteadyStateMatrix(M::AbstractHEOMLSMatrix{<:MatrixOperator{T,MT}}) where {T<:Number,MT<:CuSparseMatrixCSC} =
+    M.data.A + cu(_SteadyStateConstraint(T, prod(M.dimensions), size(M, 1)))
 end
diff --git a/src/HeomBase.jl b/src/HeomBase.jl
@@ -95,16 +95,12 @@ _HandleTraceVectorType(M::AbstractHEOMLSMatrix, V::SparseVector) =
     _HandleTraceVectorType(_get_SciML_matrix_wrapper(M), V)
 _HandleTraceVectorType(M::Type{<:SparseMatrixCSC}, V::SparseVector) = V
 
-function _HandleSteadyStateMatrix(M::AbstractHEOMLSMatrix{<:MatrixOperator})
-    S = size(M, 1)
-    ElType = eltype(M)
-    D = prod(M.dimensions)
-    A = copy(M.data.A)
-
-    # sparse(row_idx, col_idx, values, row_dims, col_dims)
-    A += sparse(ones(ElType, D), [(n - 1) * (D + 1) + 1 for n in 1:D], ones(ElType, D), S, S)
-    return A
-end
+_HandleSteadyStateMatrix(M::AbstractHEOMLSMatrix{<:MatrixOperator{T,MT}}) where {T<:Number,MT<:SparseMatrixCSC} =
+    M.data.A + _SteadyStateConstraint(T, prod(M.dimensions), size(M, 1))
+
+# this adds the trace == 1 constraint for the reduced density operator during the linear solve of steadystate
+_SteadyStateConstraint(T::Type{<:Number}, D::Int, S::Int) =
+    sparse(ones(T, D), [(n - 1) * (D + 1) + 1 for n in 1:D], ones(T, D), S, S)
 
 function _check_sys_dim_and_ADOs_num(A, B)
     if (A.dimensions != B.dimensions)
diff --git a/test/gpu/CUDAExt.jl b/test/gpu/CUDAExt.jl
@@ -88,23 +88,18 @@ CUDA.@time @testset "CUDA Extension" begin
     L_even_cpu = M_Fermion(Hsys, tier, bath_list; verbose = false)
     L_even_gpu = cu(L_even_cpu)
     ados_cpu = steadystate(L_even_cpu; verbose = false)
-    ados_gpu = steadystate(L_even_gpu, ψ0, 10; verbose = false)
+    ados_gpu1 = steadystate(L_even_gpu; verbose = false)
+    ados_gpu2 = steadystate(L_even_gpu, ψ0, 10; verbose = false)
     @test L_even_gpu.data.A isa CUDA.CUSPARSE.CuSparseMatrixCSC{ComplexF64,Int32}
-    @test all(isapprox.(ados_cpu.data, ados_gpu.data; atol = 1e-6))
+    @test all(isapprox.(ados_cpu.data, ados_gpu1.data; atol = 1e-6))
+    @test all(isapprox.(ados_cpu.data, ados_gpu2.data; atol = 1e-6))
 
     ## solve density of states
     ωlist = -5:0.5:5
     L_odd_cpu = M_Fermion(Hsys, tier, bath_list, ODD; verbose = false)
     L_odd_gpu = cu(L_odd_cpu, word_size = 32)
     dos_cpu = DensityOfStates(L_odd_cpu, ados_cpu, d_up, ωlist; verbose = false)
-    dos_gpu = DensityOfStates(
-        L_odd_gpu,
-        ados_cpu,
-        d_up,
-        ωlist;
-        solver = KrylovJL_BICGSTAB(rtol = 1.0f-10, atol = 1.0f-12),
-        verbose = false,
-    )
+    dos_gpu = DensityOfStates(L_odd_gpu, ados_cpu, d_up, ωlist; verbose = false)
     @test L_odd_gpu.data.A isa CUDA.CUSPARSE.CuSparseMatrixCSC{ComplexF32,Int32}
     for (i, ω) in enumerate(ωlist)
         @test dos_cpu[i] ≈ dos_gpu[i] atol = 1e-6