
Conversation

Malmahrouqi3
Collaborator

@Malmahrouqi3 Malmahrouqi3 commented Jul 1, 2025

User description

Description

Reduced PR derived from #855.

Essentially, I refactored the math-critical parts of the s_hllc_riemann_solver subroutine.


PR Type

Enhancement


Description

  • Refactor HLLC Riemann solver variable initialization

  • Consolidate loop structures for better performance

  • Optimize Reynolds number calculations

  • Streamline pressure and energy computations


Diagram Walkthrough

flowchart LR
  A["Variable Initialization"] --> B["Loop Consolidation"]
  B --> C["Reynolds Number Optimization"]
  C --> D["Energy Calculation Streamlining"]
  D --> E["Performance Improvement"]

File Walkthrough

Relevant files
Enhancement
m_riemann_solvers.fpp
HLLC Riemann solver performance optimization                         

src/simulation/m_riemann_solvers.fpp

  • Move variable initializations to loop beginning for better
    organization
  • Consolidate separate loops into single loops for efficiency
  • Optimize Reynolds number calculations by combining left/right
    computations
  • Streamline pressure assignment and energy calculations
+54/-154

@Copilot Copilot AI review requested due to automatic review settings July 1, 2025 03:28
@Malmahrouqi3 Malmahrouqi3 requested a review from a team as a code owner July 1, 2025 03:28

qodo-merge-pro bot commented Jul 1, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Logic Error

In the merged loop for the mpp_lim case, the left-state normalization is now tied to the same loop structure as the accumulation, which could produce incorrect results: alpha_L_sum is modified while it is still being accumulated.

    do i = 1, num_fluids
        qL_prim_rs${XYZ}$_vf(j, k, l, i) = max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, i))
        qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = min(max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)), 1._wp)
        alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
        qR_prim_rs${XYZ}$_vf(j + 1, k, l, i) = max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, i))
        qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = min(max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)), 1._wp)
        alpha_R_sum = alpha_R_sum + qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)
    end do

    !$acc loop seq
    do i = 1, num_fluids
        qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
        qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)/max(alpha_R_sum, sgm_eps)
    end do
end if
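To make the concern concrete, here is a language-neutral sketch (in Python, with made-up fraction values — nothing here comes from the solver itself) of why a normalization must only ever use the fully accumulated sum:

```python
# Illustrative only: normalizing volume fractions against a partially
# accumulated sum gives wrong answers; the sum must be completed first.
sgm_eps = 1e-16          # stand-in for the solver's small epsilon
alphas = [0.2, 0.3, 0.5]

# Wrong: divide by the running sum inside the accumulation loop.
running, wrong = 0.0, []
for a in alphas:
    running += a
    wrong.append(a / max(running, sgm_eps))   # uses an incomplete sum

# Right: finish accumulating, then normalize in a second pass.
total = sum(alphas)
right = [a / max(total, sgm_eps) for a in alphas]
# wrong[0] is 1.0 (0.2/0.2), while right stays ~[0.2, 0.3, 0.5]
```

This is why the two-pass structure (accumulate, then normalize) must be preserved even when the left and right states share a loop.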
Duplicate Calculation

The shear moduli G_L and G_R are calculated twice in the hyperelasticity branch: once at the beginning of the energy adjustment block and again specifically for hyperelasticity, leading to redundant computation.

G_L = 0._wp; G_R = 0._wp;
!$acc loop seq
do i = 1, num_fluids
    ! Mixture left and right shear modulus
    G_L = G_L + alpha_L(i)*Gs(i)
    G_R = G_R + alpha_R(i)*Gs(i)
end do
Missing Assignment

A missing assignment completing the Re_L(i) calculation in the viscous Reynolds number computation could lead to incorrect viscous flux calculations.

Re_L(i) = 1._wp/max(Re_L(i), sgm_eps)
Re_R(i) = 1._wp/max(Re_R(i), sgm_eps)

Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR refactors the s_hllc_riemann_solver subroutine in m_riemann_solvers.fpp to streamline variable initialization, consolidate energy adjustment logic, and introduce a loop_end helper for fluid loops.

  • Consolidated zero-initialization of primitive-state variables at the start of each cell loop
  • Merged hypoelastic and hyperelastic energy adjustments into a single conditional block
  • Introduced loop_end to simplify fluid‐count branching


qodo-merge-pro bot commented Jul 1, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: General
Suggestion: Remove duplicate shear modulus calculation

The shear modulus calculation for G_L and G_R is duplicated in both the outer
scope and within the hyperelasticity branch. Remove the redundant calculation
inside the hyperelasticity branch to avoid unnecessary computation and potential
inconsistency.

src/simulation/m_riemann_solvers.fpp [1381-1431]

 if (hypoelasticity .or. hyperelasticity) then
     G_L = 0._wp; G_R = 0._wp
     !$acc loop seq
     do i = 1, num_fluids
         G_L = G_L + alpha_L(i)*Gs(i)
         G_R = G_R + alpha_R(i)*Gs(i)
     end do
     if (hypoelasticity) then
         ...
     else if (hyperelasticity) then
         ...
-        G_L = 0._wp; G_R = 0._wp;
-        !$acc loop seq
-        do i = 1, num_fluids
-            ! Mixture left and right shear modulus
-            G_L = G_L + alpha_L(i)*Gs(i)
-            G_R = G_R + alpha_R(i)*Gs(i)
-        end do
+        ! Elastic contribution to energy if G large enough
+        if (G_L > verysmall .and. G_R > verysmall) then
+            E_L = E_L + G_L*qL_prim_rs${XYZ}$_vf(j, k, l, xiend + 1)
+            E_R = E_R + G_R*qR_prim_rs${XYZ}$_vf(j + 1, k, l, xiend + 1)
+        end if
         ...
     end if
 end if
Suggestion importance[1-10]: 6


Why: The suggestion correctly identifies that the calculation for G_L and G_R is duplicated, which is a code quality issue introduced in the PR.

Impact: Low
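As a minimal sketch of the proposed hoisting (Python with invented numbers, not the solver's types): compute the mixture modulus once and reuse it in every branch.

```python
# Illustrative only: accumulate the mixture shear modulus once, outside
# the hypo-/hyperelastic branches, instead of repeating the loop.
alpha_L = [0.4, 0.6]   # made-up volume fractions
Gs      = [2.0, 8.0]   # made-up per-fluid shear moduli

def mixture_modulus(alphas, moduli):
    # Shared accumulation, reused by both elasticity branches.
    return sum(a * g for a, g in zip(alphas, moduli))

G_L = mixture_modulus(alpha_L, Gs)   # 0.4*2 + 0.6*8
```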


@Malmahrouqi3
Collaborator Author

Pushing this after verifying that the select tests which previously failed now pass cleanly.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

codecov bot commented Jul 1, 2025

Codecov Report

❌ Patch coverage is 42.85714% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 40.91%. Comparing base (fe87fc5) to head (5d7c04a).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
src/simulation/m_riemann_solvers.fpp 42.85% 22 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #912      +/-   ##
==========================================
- Coverage   40.92%   40.91%   -0.02%     
==========================================
  Files          70       70              
  Lines       20299    20270      -29     
  Branches     2521     2520       -1     
==========================================
- Hits         8307     8293      -14     
+ Misses      10454    10439      -15     
  Partials     1538     1538              

☔ View full report in Codecov by Sentry.

@sbryngelson sbryngelson self-requested a review July 4, 2025 01:59
@sbryngelson sbryngelson marked this pull request as draft July 19, 2025 21:39
@sbryngelson sbryngelson self-requested a review August 19, 2025 19:01
@Malmahrouqi3 Malmahrouqi3 marked this pull request as ready for review September 2, 2025 16:00

qodo-merge-pro bot commented Sep 2, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

In multiple places, E_L computation was removed or reordered relative to pres_L assignment; ensure pres_L is set before using it and that the added initializations do not skip required branches. Specifically validate the new placement of pres_L/pres_R and E_L/E_R calculations to avoid using uninitialized or stale values.

pres_L = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx)
pres_R = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx)

E_L = gamma_L*pres_L + pi_inf_L + 5.e-1_wp*rho_L*vel_L_rms + qv_L
E_R = gamma_R*pres_R + pi_inf_R + 5.e-1_wp*rho_R*vel_R_rms + qv_R

H_L = (E_L + pres_L)/rho_L
Normalization Change

Alpha normalization for left states is now combined with right in one block. Confirm that accumulating alpha_L_sum and normalizing in the same conditional path preserves prior behavior for all mpp_lim and num_fluids cases, especially when alpha_*_sum can be near zero.

if (mpp_lim) then
    $:GPU_LOOP(parallelism='[seq]')
    do i = 1, num_fluids
        qL_prim_rs${XYZ}$_vf(j, k, l, i) = max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, i))
        qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = min(max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)), 1._wp)
        qR_prim_rs${XYZ}$_vf(j + 1, k, l, i) = max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, i))
        qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = min(max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)), 1._wp)
        alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
        alpha_R_sum = alpha_R_sum + qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)
    end do

    $:GPU_LOOP(parallelism='[seq]')
    do i = 1, num_fluids
        qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
        qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)/max(alpha_R_sum, sgm_eps)
    end do
Logic Simplification

The branch handling for num_fluids (>2, ==1, else) was consolidated and reordered; verify equivalence for edge cases (num_fluids==2 and num_fluids>2 without mpp_lim) to ensure rho/gamma/pi_inf/qv are still correctly derived on both sides.

! Retain this in the refactor
if (mpp_lim .and. (num_fluids > 2)) then
    $:GPU_LOOP(parallelism='[seq]')
    do i = 1, num_fluids
        rho_L = rho_L + qL_prim_rs${XYZ}$_vf(j, k, l, i)
        gamma_L = gamma_L + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)*gammas(i)
        pi_inf_L = pi_inf_L + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)*pi_infs(i)
        qv_L = qv_L + qL_prim_rs${XYZ}$_vf(j, k, l, i)*qvs(i)
        rho_R = rho_R + qR_prim_rs${XYZ}$_vf(j + 1, k, l, i)
        gamma_R = gamma_R + qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)*gammas(i)
        pi_inf_R = pi_inf_R + qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)*pi_infs(i)
        qv_R = qv_R + qR_prim_rs${XYZ}$_vf(j + 1, k, l, i)*qvs(i)
    end do
else if (num_fluids > 2) then
    $:GPU_LOOP(parallelism='[seq]')
    do i = 1, num_fluids - 1
        rho_L = rho_L + qL_prim_rs${XYZ}$_vf(j, k, l, i)
        gamma_L = gamma_L + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)*gammas(i)
        pi_inf_L = pi_inf_L + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)*pi_infs(i)
        qv_L = qv_L + qL_prim_rs${XYZ}$_vf(j, k, l, i)*qvs(i)
        rho_R = rho_R + qR_prim_rs${XYZ}$_vf(j + 1, k, l, i)
        gamma_R = gamma_R + qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)*gammas(i)
        pi_inf_R = pi_inf_R + qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)*pi_infs(i)
        qv_R = qv_R + qR_prim_rs${XYZ}$_vf(j + 1, k, l, i)*qvs(i)
    end do
else
    rho_L = qL_prim_rs${XYZ}$_vf(j, k, l, 1)
    gamma_L = gammas(1)
    pi_inf_L = pi_infs(1)
    qv_L = qvs(1)
    rho_R = qR_prim_rs${XYZ}$_vf(j + 1, k, l, 1)
    gamma_R = gammas(1)
    pi_inf_R = pi_infs(1)
    qv_R = qvs(1)
end if


qodo-merge-pro bot commented Sep 2, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue
Suggestion: Add division by zero protection

Division by zero can occur if Cv_L, Cv_R, Gamm_L-1, or Gamm_R-1 equals zero. Add
safety checks to prevent potential floating-point exceptions in thermodynamic
calculations.

src/simulation/m_riemann_solvers.fpp [2406-2407]

-Gamm_L = Cp_L/Cv_L; Gamm_R = Cp_R/Cv_R
-gamma_L = 1.0_wp/(Gamm_L - 1.0_wp); gamma_R = 1.0_wp/(Gamm_R - 1.0_wp)
+Gamm_L = Cp_L/max(Cv_L, sgm_eps); Gamm_R = Cp_R/max(Cv_R, sgm_eps)
+gamma_L = 1.0_wp/max(Gamm_L - 1.0_wp, sgm_eps); gamma_R = 1.0_wp/max(Gamm_R - 1.0_wp, sgm_eps)
Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies a potential division-by-zero risk and proposes a reasonable fix using max with a small epsilon sgm_eps, which is a common pattern in this codebase.

Impact: Medium
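The max(..., sgm_eps) clamp that this suggestion relies on can be sketched outside Fortran; this Python snippet (invented names and values, not the solver's) shows the pattern:

```python
# Illustrative only: clamp a denominator away from zero before dividing,
# trading a possible FP exception for a large-but-finite result.
sgm_eps = 1e-16

def safe_ratio(num, den):
    return num / max(den, sgm_eps)

safe = safe_ratio(1.0, 2.0)     # unchanged when the denominator is fine
guarded = safe_ratio(1.0, 0.0)  # huge but finite instead of a crash
```

Note the trade-off: the clamp silently alters results near the epsilon, so it is a stabilization choice, not a correctness proof.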


Comment on lines 1223 to 1239
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qL_prim_rs${XYZ}$_vf(j, k, l, i) = max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, i))
qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = min(max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)), 1._wp)
alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
end do

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
end do

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qR_prim_rs${XYZ}$_vf(j + 1, k, l, i) = max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, i))
qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = min(max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)), 1._wp)
alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
alpha_R_sum = alpha_R_sum + qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)
end do

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)/max(alpha_R_sum, sgm_eps)
end do
end if

Suggestion: Guard against the case where all volume fractions are zero after limiting. If both alpha_*_sum are zero, the normalization keeps zeros and can cause later divisions by rho_* or invalid EOS mixtures. Ensure a fallback that assigns a minimum fraction to a valid phase to keep sums positive. [possible issue, importance: 8]

Suggested change
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qL_prim_rs${XYZ}$_vf(j, k, l, i) = max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, i))
qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = min(max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)), 1._wp)
alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
end do
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
end do
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qR_prim_rs${XYZ}$_vf(j + 1, k, l, i) = max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, i))
qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = min(max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)), 1._wp)
alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
alpha_R_sum = alpha_R_sum + qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)
end do
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)/max(alpha_R_sum, sgm_eps)
end do
end if

becomes:

if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qL_prim_rs${XYZ}$_vf(j, k, l, i) = max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, i))
qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = min(max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)), 1._wp)
qR_prim_rs${XYZ}$_vf(j + 1, k, l, i) = max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, i))
qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = min(max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)), 1._wp)
alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
alpha_R_sum = alpha_R_sum + qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)
end do
if (alpha_L_sum <= sgm_eps) then
qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + 1) = 1._wp
alpha_L_sum = 1._wp
end if
if (alpha_R_sum <= sgm_eps) then
qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + 1) = 1._wp
alpha_R_sum = 1._wp
end if
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)/max(alpha_R_sum, sgm_eps)
end do
end if
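The fallback idea can be sketched as follows (Python, with a hypothetical normalize_with_fallback helper and made-up inputs — not the solver's code):

```python
# Illustrative only: if limiting clamps every volume fraction to zero,
# assign the whole fraction to one phase so the sum stays positive.
sgm_eps = 1e-16

def normalize_with_fallback(alphas):
    alphas = [min(max(0.0, a), 1.0) for a in alphas]   # limiting step
    total = sum(alphas)
    if total <= sgm_eps:
        alphas[0] = 1.0   # degenerate state: keep one valid phase
        total = 1.0
    return [a / max(total, sgm_eps) for a in alphas]

degenerate = normalize_with_fallback([-1e-3, -2e-3])  # -> [1.0, 0.0]
regular = normalize_with_fallback([0.3, 0.9])         # sums to ~1
```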

Comment on lines 1276 to 1277
E_L = gamma_L*pres_L + pi_inf_L + 5.e-1_wp*rho_L*vel_L_rms + qv_L

E_R = gamma_R*pres_R + pi_inf_R + 5.e-1_wp*rho_R*vel_R_rms + qv_R

Suggestion: Protect against zero or negative rho_* before using in energy/enthalpy computations. During limiting or reconstruction, rho_* can be zero, leading to NaNs downstream when computing H_* or wave speeds. Clamp to a minimum positive value. [possible issue, importance: 8]

Suggested change
E_L = gamma_L*pres_L + pi_inf_L + 5.e-1_wp*rho_L*vel_L_rms + qv_L
E_R = gamma_R*pres_R + pi_inf_R + 5.e-1_wp*rho_R*vel_R_rms + qv_R

becomes:

rho_L = max(rho_L, sgm_eps)
rho_R = max(rho_R, sgm_eps)
E_L = gamma_L*pres_L + pi_inf_L + 5.e-1_wp*rho_L*vel_L_rms + qv_L
E_R = gamma_R*pres_R + pi_inf_R + 5.e-1_wp*rho_R*vel_R_rms + qv_R

Comment on lines 1224 to 1238
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qL_prim_rs${XYZ}$_vf(j, k, l, i) = max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, i))
qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = min(max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)), 1._wp)
alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
end do

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
end do

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qR_prim_rs${XYZ}$_vf(j + 1, k, l, i) = max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, i))
qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = min(max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)), 1._wp)
alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
alpha_R_sum = alpha_R_sum + qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)
end do

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)/max(alpha_R_sum, sgm_eps)
end do

Suggestion: Guard against the case where all alpha values clamp to zero after limiting, which would make alpha_*_sum be zero and leave all alpha entries at zero after normalization. This can lead to rho=0 and divisions by zero downstream when forming H and wave speeds. If the sum is below sgm_eps, redistribute uniformly (or keep the largest component) to ensure a valid partition of unity. [possible issue, importance: 8]

Suggested change
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qL_prim_rs${XYZ}$_vf(j, k, l, i) = max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, i))
qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = min(max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)), 1._wp)
alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
end do
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
end do
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qR_prim_rs${XYZ}$_vf(j + 1, k, l, i) = max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, i))
qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = min(max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)), 1._wp)
alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
alpha_R_sum = alpha_R_sum + qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)
end do
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)/max(alpha_R_sum, sgm_eps)
end do

becomes:

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qL_prim_rs${XYZ}$_vf(j, k, l, i) = max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, i))
qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = min(max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)), 1._wp)
qR_prim_rs${XYZ}$_vf(j + 1, k, l, i) = max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, i))
qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = min(max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)), 1._wp)
alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
alpha_R_sum = alpha_R_sum + qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)
end do
if (alpha_L_sum < sgm_eps) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = 1._wp/real(num_fluids, wp)
end do
alpha_L_sum = 1._wp
end if
if (alpha_R_sum < sgm_eps) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = 1._wp/real(num_fluids, wp)
end do
alpha_R_sum = 1._wp
end if
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/alpha_L_sum
qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)/alpha_R_sum
end do

Comment on lines +1643 to 1646
E_L = gamma_L*pres_L + pi_inf_L + 5.e-1_wp*rho_L*vel_L_rms + qv_L
E_R = gamma_R*pres_R + pi_inf_R + 5.e-1_wp*rho_R*vel_R_rms + qv_R

H_L = (E_L + pres_L)/rho_L

Suggestion: Protect divisions by rho_L and rho_R in enthalpy when density can approach zero (e.g., after limiting or at vacuum states). Clamp densities with max(rho, sgm_eps) to avoid NaNs and ensure stable wave speed estimates in the HLLC solver. [possible issue, importance: 8]

New proposed code:
 E_L = gamma_L*pres_L + pi_inf_L + 5.e-1_wp*rho_L*vel_L_rms + qv_L
 E_R = gamma_R*pres_R + pi_inf_R + 5.e-1_wp*rho_R*vel_R_rms + qv_R
 
-H_L = (E_L + pres_L)/rho_L
-H_R = (E_R + pres_R)/rho_R
+H_L = (E_L + pres_L)/max(rho_L, sgm_eps)
+H_R = (E_R + pres_R)/max(rho_R, sgm_eps)

Comment on lines 1258 to 1273
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_L(i) = dflt_real

Re_R(i) = dflt_real
if (Re_size(i) > 0) Re_L(i) = 0._wp

if (Re_size(i) > 0) Re_R(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_L(i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + Re_idx(i, q))/Res(i, q) &
+ Re_L(i)
end do

Re_L(i) = 1._wp/max(Re_L(i), sgm_eps)

end do

$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_R(i) = dflt_real

if (Re_size(i) > 0) Re_R(i) = 0._wp

$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_R(i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + Re_idx(i, q))/Res(i, q) &
+ Re_R(i)
end do

Re_L(i) = 1._wp/max(Re_L(i), sgm_eps)
Re_R(i) = 1._wp/max(Re_R(i), sgm_eps)
end do

Suggestion: Prevent division by zero if any Res(i,q) entries are zero or extremely small due to user input or preprocessing. Apply max(Res(i,q), sgm_eps) in the denominator when accumulating to ensure finite Reynolds numbers. [general, importance: 7]

Suggested change
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_L(i) = dflt_real
Re_R(i) = dflt_real
if (Re_size(i) > 0) Re_L(i) = 0._wp
if (Re_size(i) > 0) Re_R(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_L(i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + Re_idx(i, q))/Res(i, q) &
+ Re_L(i)
end do
Re_L(i) = 1._wp/max(Re_L(i), sgm_eps)
end do
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_R(i) = dflt_real
if (Re_size(i) > 0) Re_R(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_R(i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + Re_idx(i, q))/Res(i, q) &
+ Re_R(i)
end do
Re_L(i) = 1._wp/max(Re_L(i), sgm_eps)
Re_R(i) = 1._wp/max(Re_R(i), sgm_eps)
end do

becomes:

$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_L(i) = dflt_real
Re_R(i) = dflt_real
if (Re_size(i) > 0) Re_L(i) = 0._wp
if (Re_size(i) > 0) Re_R(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_L(i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + Re_idx(i, q))/max(Res(i, q), sgm_eps) &
+ Re_L(i)
Re_R(i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + Re_idx(i, q))/max(Res(i, q), sgm_eps) &
+ Re_R(i)
end do
Re_L(i) = 1._wp/max(Re_L(i), sgm_eps)
Re_R(i) = 1._wp/max(Re_R(i), sgm_eps)
end do

Member


why do we have two loops when we could have one


Ask

why do we have two loops when we could have one

Answer:

In the original code, Re_L and Re_R were computed in two separate loops (first for the left state, then for the right). Functionally, both loops perform the same accumulation over q for each i, then apply the same stabilization 1/max(..., sgm_eps). There’s no data dependency between the left and right accumulations, so keeping them separate wasn’t necessary.

In this refactor, they’re consolidated into a single loop over i that:

  • Initializes both Re_L(i) and Re_R(i),
  • Accumulates both left and right sums in the same inner loop over q,
  • Finalizes both with the same normalization.

Benefits of one loop:

  • Reduces loop overhead and improves cache locality by touching Re_idx/Res and the corresponding qL/qR entries once per i.
  • Lowers code duplication, making maintenance less error-prone (e.g., any future fixes like adding max(Res(i,q), sgm_eps) can be made in one place).
  • Better GPU mapping: fewer separate GPU_LOOP regions and improved kernel fusion potential.

In short, there’s no correctness need for two loops; a single loop computes both sides independently and more efficiently.
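The equivalence claimed above is easy to check in a sketch (Python, with invented weights — not the solver's arrays): the fused loop produces identical results to the two separate loops because each side accumulates in the same order.

```python
# Illustrative only: fusing the independent left/right Reynolds
# accumulations into one loop does not change the arithmetic.
sgm_eps = 1e-16
res = [[10.0, 20.0], [5.0]]   # made-up Res(i, q) values
aL  = [[0.4, 0.6], [1.0]]     # made-up left-state weights
aR  = [[0.5, 0.5], [1.0]]     # made-up right-state weights

def two_loops():
    re_L = [1.0 / max(sum(a / r for a, r in zip(aL[i], res[i])), sgm_eps)
            for i in range(2)]
    re_R = [1.0 / max(sum(a / r for a, r in zip(aR[i], res[i])), sgm_eps)
            for i in range(2)]
    return re_L, re_R

def one_loop():
    re_L, re_R = [], []
    for i in range(2):        # one pass over i handles both sides
        sL = sR = 0.0
        for q in range(len(res[i])):
            sL += aL[i][q] / res[i][q]
            sR += aR[i][q] / res[i][q]
        re_L.append(1.0 / max(sL, sgm_eps))
        re_R.append(1.0 / max(sR, sgm_eps))
    return re_L, re_R
```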

@sbryngelson sbryngelson self-requested a review September 8, 2025 19:52