Skip to content

Commit 535e1f5

Browse files
committed
add GPU support for steadystate
1 parent 47e4757 commit 535e1f5

File tree

6 files changed

+56
-41
lines changed

6 files changed

+56
-41
lines changed

docs/src/extensions/CUDA.md

Lines changed: 36 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ typeof(M.data) <: CuSparseMatrixCSC # solve on GPU
1212
We wrapped several functions in CUDA and CUDA.CUSPARSE in order to not only converting QuantumObject.data into GPU arrays, but also changing the element type and word size (32 and 64) since some of the GPUs perform better in 32-bit. The functions are listed as follows (where input A is a QuantumObject):
1313

1414
Therefore, we wrapped several functions in `CUDA` and `CUDA.CUSPARSE` in order to not only converting a HEOMLS-matrix-type object into GPU arrays, but also changing the element type and word size (`32` and `64`) since some of the GPUs perform better in `32`-bit. The functions are listed as follows (where input `M` is a `AbstractHEOMLSMatrix`):
15-
- `cu(M, word_size=64)` : Translate `M.data` into CUDA arrays with specified `word_size`.
15+
- `cu(M, word_size=64)` : Translate `M.data` into CUDA arrays with specified `word_size` (default to `64`).
1616
- `CuSparseMatrixCSC{T}(M)` : Translate `M.data` into the type `CuSparseMatrixCSC{T, Int32}`
1717

1818
### Demonstration
@@ -21,7 +21,6 @@ The extension will be automatically loaded if user imports the package `CUDA.jl`
2121

2222
```julia
2323
using HierarchicalEOM
24-
using LinearSolve # to change the solver for better GPU performance
2524
using CUDA
2625
CUDA.allowscalar(false) # Avoid unexpected scalar indexing
2726
```
@@ -43,12 +42,12 @@ tier = 3
4342
tlist = 0:0.1:10
4443
ωlist = -10:1:10
4544

46-
σm = [0 1; 0 0]
47-
σz = [1 0; 0 -1]
48-
II = [1 0; 0 1]
49-
d_up = kron( σm, II)
50-
d_dn = kron(-1 * σz, σm)
51-
ρ0 = kron([1 0; 0 0], [1 0; 0 0])
45+
σm = sigmam()
46+
σz = sigmaz()
47+
II = qeye(2)
48+
d_up = tensor( σm, II)
49+
d_dn = tensor(-1 * σz, σm)
50+
ψ0 = tensor(basis(2, 0), basis(2, 0))
5251
Hsys = ϵ * (d_up' * d_up + d_dn' * d_dn) + U * (d_up' * d_up * d_dn' * d_dn)
5352

5453
bath_up = Fermion_Lorentz_Pade(d_up, Γ, μ, W, kT, N)
@@ -62,34 +61,52 @@ M_even_gpu = cu(M_even_cpu, word_size = 32)
6261
# odd HEOMLS matrix
6362
M_odd_cpu = M_Fermion(Hsys, tier, bath_list, ODD)
6463
M_odd_gpu = cu(M_odd_cpu, word_size = 32)
64+
```
65+
66+
### Solving time evolution with CPU
67+
68+
```julia
69+
ados_list = HEOMsolve(M_even_cpu, ψ0, tlist)
70+
```
71+
72+
### Solving time evolution with GPU
73+
74+
```julia
75+
ados_list = HEOMsolve(M_even_gpu, ψ0, tlist)
76+
```
77+
78+
### Solving steady state with CPU using linear-solve method
6579

66-
# solve steady state with CPU
80+
```julia
6781
ados_ss = steadystate(M_even_cpu);
6882
```
6983

70-
!!! note "Note"
71-
This extension does not support for solving [stationary state](@ref doc-Stationary-State) on GPU since it is not efficient and might get wrong solutions. If you really want to obtain the stationary state with GPU, you can repeatedly solve the [time evolution](@ref doc-Time-Evolution) until you find it.
84+
### Solving steady state with GPU using linear-solve method
7285

73-
### Solving time evolution with CPU
86+
```julia
87+
ados_ss = steadystate(M_even_gpu);
88+
```
89+
90+
### Solving steady state with CPU using ODE method
7491

7592
```julia
76-
ados_list_cpu = HEOMsolve(M_even_cpu, ρ0, tlist)
93+
ados_ss = steadystate(M_even_cpu, ψ0);
7794
```
7895

79-
### Solving time evolution with GPU
96+
### Solving steady state with GPU using ODE method
8097

8198
```julia
82-
ados_list_gpu = HEOMsolve(M_even_gpu, ρ0, tlist)
99+
ados_ss = steadystate(M_even_gpu, ψ0);
83100
```
84101

85-
### Solving Spectrum with CPU
102+
### Solving spectrum with CPU
86103

87104
```julia
88-
dos_cpu = DensityOfStates(M_odd_cpu, ados_ss, d_up, ωlist)
105+
dos = DensityOfStates(M_odd_cpu, ados_ss, d_up, ωlist)
89106
```
90107

91-
### Solving Spectrum with GPU
108+
### Solving spectrum with GPU
92109

93110
```julia
94-
dos_gpu = DensityOfStates(M_odd_gpu, ados_ss, d_up, ωlist; solver=KrylovJL_BICGSTAB(rtol=1f-10, atol=1f-12))
111+
dos = DensityOfStates(M_odd_gpu, ados_ss, d_up, ωlist)
95112
```

docs/src/stationary_state.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,9 @@
33

44
To solve the stationary state of the reduced state and also all the [ADOs](@ref doc-ADOs), you only need to call [`steadystate`](@ref). Different methods are implemented with different input parameters of the function which makes it easy to switch between different methods. The output of the function [`steadystate`](@ref) for each methods will always be in the type of the auxiliary density operators [`ADOs`](@ref).
55

6+
!!! compat "Extension for CUDA.jl"
7+
`HierarchicalEOM.jl` provides an extension to support GPU ([`CUDA.jl`](https://github.com/JuliaGPU/CUDA.jl)) acceleration for [`steadystate`](@ref). See [here](@ref doc-ext-CUDA) for more details.
8+
69
## Solve with [LinearSolve.jl](http://linearsolve.sciml.ai/stable/)
710
The first method is implemented by solving the linear problem
811
```math
@@ -16,6 +19,7 @@ The first method is implemented by solving the linear problem
1619
M::AbstractHEOMLSMatrix
1720
ados_steady = steadystate(M)
1821
```
22+
1923
!!! warning "Unphysical solution"
2024
This method does not require an initial condition ``\rho^{(m,n,p)}_{\textbf{j} \vert \textbf{q}}(0)``. Although this method works for most of the cases, it does not guarantee that one can obtain a physical (or unique) solution. If there is any problem within the solution, please try the second method which solves with an initial condition, as shown below.
2125

docs/src/time_evolution.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -78,7 +78,7 @@ end
7878
The first method is implemented by solving the ordinary differential equation (ODE). `HierarchicalEOM.jl` wraps some of the functions in [`DifferentialEquations.jl`](https://diffeq.sciml.ai/stable/), which is a very rich numerical library for solving the differential equations and provides many ODE solvers. It offers quite a few options for the user to tailor the solver to their specific needs. The default solver (and its corresponding settings) are chosen to suit commonly encountered problems and should work fine for most of the cases. If you require more specialized methods, such as the choice of algorithm, please refer to [DifferentialEquations solvers](@ref ODE-solvers) and also the documentation of [`DifferentialEquations.jl`](https://diffeq.sciml.ai/stable/).
7979

8080
!!! compat "Extension for CUDA.jl"
81-
`HierarchicalEOM.jl` provides an extension to support GPU ([`CUDA.jl`](https://github.com/JuliaGPU/CUDA.jl)) acceleration for solving the time evolution (only for ODE method with time-independent system Hamiltonian). See [here](@ref doc-ext-CUDA) for more details.
81+
`HierarchicalEOM.jl` provides an extension to support GPU ([`CUDA.jl`](https://github.com/JuliaGPU/CUDA.jl)) acceleration for [`HEOMsolve`](@ref) (only for ODE method). See [here](@ref doc-ext-CUDA) for more details.
8282

8383
See the docstring of this method:
8484

ext/HierarchicalEOM_CUDAExt.jl

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
module HierarchicalEOM_CUDAExt
22

33
using HierarchicalEOM
4-
import HierarchicalEOM: _HandleVectorType, _HandleTraceVectorType
4+
import HierarchicalEOM: _HandleVectorType, _HandleTraceVectorType, _HandleSteadyStateMatrix, _SteadyStateConstraint
55
import QuantumToolbox: _CType, _convert_eltype_wordsize, makeVal, getVal
66
import CUDA
77
import CUDA: cu, CuArray
@@ -81,4 +81,7 @@ _convert_to_gpu_matrix(A::AddedOperator, ElType) = AddedOperator(map(op -> _conv
8181
_HandleVectorType(M::Type{<:CuSparseMatrixCSC}, V::SparseVector) = CuArray{_CType(eltype(M))}(V)
8282

8383
_HandleTraceVectorType(M::Type{<:CuSparseMatrixCSC}, V::SparseVector) = CuSparseVector{_CType(eltype(M))}(V)
84+
85+
_HandleSteadyStateMatrix(M::AbstractHEOMLSMatrix{<:MatrixOperator{T,MT}}) where {T<:Number,MT<:CuSparseMatrixCSC} =
86+
M.data.A + cu(_SteadyStateConstraint(T, prod(M.dimensions), size(M, 1)))
8487
end

src/HeomBase.jl

Lines changed: 6 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -95,16 +95,12 @@ _HandleTraceVectorType(M::AbstractHEOMLSMatrix, V::SparseVector) =
9595
_HandleTraceVectorType(_get_SciML_matrix_wrapper(M), V)
9696
_HandleTraceVectorType(M::Type{<:SparseMatrixCSC}, V::SparseVector) = V
9797

98-
function _HandleSteadyStateMatrix(M::AbstractHEOMLSMatrix{<:MatrixOperator})
99-
S = size(M, 1)
100-
ElType = eltype(M)
101-
D = prod(M.dimensions)
102-
A = copy(M.data.A)
103-
104-
# sparse(row_idx, col_idx, values, row_dims, col_dims)
105-
A += sparse(ones(ElType, D), [(n - 1) * (D + 1) + 1 for n in 1:D], ones(ElType, D), S, S)
106-
return A
107-
end
98+
_HandleSteadyStateMatrix(M::AbstractHEOMLSMatrix{<:MatrixOperator{T,MT}}) where {T<:Number,MT<:SparseMatrixCSC} =
99+
M.data.A + _SteadyStateConstraint(T, prod(M.dimensions), size(M, 1))
100+
101+
# this adds the trace == 1 contraint for reduced density operator during linear solve of steadystate
102+
_SteadyStateConstraint(T::Type{<:Number}, D::Int, S::Int) =
103+
sparse(ones(T, D), [(n - 1) * (D + 1) + 1 for n in 1:D], ones(T, D), S, S)
108104

109105
function _check_sys_dim_and_ADOs_num(A, B)
110106
if (A.dimensions != B.dimensions)

test/gpu/CUDAExt.jl

Lines changed: 5 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -88,23 +88,18 @@ CUDA.@time @testset "CUDA Extension" begin
8888
L_even_cpu = M_Fermion(Hsys, tier, bath_list; verbose = false)
8989
L_even_gpu = cu(L_even_cpu)
9090
ados_cpu = steadystate(L_even_cpu; verbose = false)
91-
ados_gpu = steadystate(L_even_gpu, ψ0, 10; verbose = false)
91+
ados_gpu1 = steadystate(L_even_gpu; verbose = false)
92+
ados_gpu2 = steadystate(L_even_gpu, ψ0, 10; verbose = false)
9293
@test L_even_gpu.data.A isa CUDA.CUSPARSE.CuSparseMatrixCSC{ComplexF64,Int32}
93-
@test all(isapprox.(ados_cpu.data, ados_gpu.data; atol = 1e-6))
94+
@test all(isapprox.(ados_cpu.data, ados_gpu1.data; atol = 1e-6))
95+
@test all(isapprox.(ados_cpu.data, ados_gpu2.data; atol = 1e-6))
9496

9597
## solve density of states
9698
ωlist = -5:0.5:5
9799
L_odd_cpu = M_Fermion(Hsys, tier, bath_list, ODD; verbose = false)
98100
L_odd_gpu = cu(L_odd_cpu, word_size = 32)
99101
dos_cpu = DensityOfStates(L_odd_cpu, ados_cpu, d_up, ωlist; verbose = false)
100-
dos_gpu = DensityOfStates(
101-
L_odd_gpu,
102-
ados_cpu,
103-
d_up,
104-
ωlist;
105-
solver = KrylovJL_BICGSTAB(rtol = 1.0f-10, atol = 1.0f-12),
106-
verbose = false,
107-
)
102+
dos_gpu = DensityOfStates(L_odd_gpu, ados_cpu, d_up, ωlist; verbose = false)
108103
@test L_odd_gpu.data.A isa CUDA.CUSPARSE.CuSparseMatrixCSC{ComplexF32,Int32}
109104
for (i, ω) in enumerate(ωlist)
110105
@test dos_cpu[i] dos_gpu[i] atol = 1e-6

0 commit comments

Comments
 (0)