docs: expand and reorganize tutorial by psychocoderHPC · Pull Request #5 · psychocoderHPC/alpaka3

psychocoderHPC · 2026-04-02T15:35:58Z

Summary by CodeRabbit

New Features
- Added comprehensive tutorial library with 20+ example snippets covering kernels, memory operations, algorithms, random numbers, atomics, shared memory, and warp-level operations.
- Added backend-parameterized test infrastructure for cross-backend compatibility validation.
Documentation
- Added 25+ new tutorial pages covering foundations, kernel fundamentals, hierarchical execution, algorithms, memory operations, performance tuning, and vendor interoperability.
- Enhanced existing documentation with collapsible source file references and improved navigation structure.
Chores
- Removed debug logging from build configuration.
- Minor whitespace cleanup in documentation.

coderabbitai · 2026-04-08T14:31:49Z

📝 Walkthrough

This pull request introduces a comprehensive tutorial overhaul for alpaka, adding 19 new C++ example/test snippets covering kernels, shared memory, algorithms, random numbers, and numerics, along with 30+ new Sphinx documentation pages that establish a structured learning path. Existing snippets are converted to backend-parameterized tests via TEMPLATE_LIST_TEST_CASE, and documentation capitalization is standardized to lowercase alpaka.

Changes

Cohort / File(s)	Summary
Documentation Infrastructure `docs/source/conf.py`, `docs/source/index.rst`	Updated Sphinx theme configuration (collapse navigation, depth) and reorganized tutorial toctree with new topic structure and adjusted maxdepth.
Core Tutorial Pages `docs/source/tutorial/foundations.rst`, `docs/source/tutorial/intro.rst`, `docs/source/tutorial/mentalModel.rst`, `docs/source/tutorial/execution.rst`	Introduced new foundational pages defining mental model (`IdxRange`, `FrameSpec`, `makeIdxMap`), tutorial structure, execution configuration, and device enumeration concepts.
Kernel Tutorial Pages `docs/source/tutorial/kernels.rst`, `docs/source/tutorial/kernel.rst`, `docs/source/tutorial/hierarchy.rst`, `docs/source/tutorial/multidim.rst`, `docs/source/tutorial/sharedMemory.rst`, `docs/source/tutorial/memFence.rst`, `docs/source/tutorial/chunked.rst`, `docs/source/tutorial/atomics.rst`, `docs/source/tutorial/miniProject.rst`	Added comprehensive kernel-focused tutorials covering basic kernels, execution hierarchy (blocks/threads/warps), multidimensional kernels, shared memory, memory fences, chunked frames, atomics, and a mini project combining multiple concepts.
Numerics and Algorithms Tutorials `docs/source/tutorial/numerics.rst`, `docs/source/tutorial/algorithms.rst`, `docs/source/tutorial/random.rst`, `docs/source/tutorial/math.rst`, `docs/source/tutorial/intrinsics.rst`, `docs/source/tutorial/warp.rst`	Added tutorial pages documenting host-side algorithms, random number generation, math functions, bit intrinsics, and warp-level communication.
Data and Memory Tutorials `docs/source/tutorial/memory.rst`, `docs/source/tutorial/memoryOperations.rst`, `docs/source/tutorial/views.rst`, `docs/source/tutorial/device.rst`, `docs/source/tutorial/queue.rst`, `docs/source/tutorial/vector.rst`, `docs/source/tutorial/events.rst`	Updated and added pages covering memory allocation/operations, views/subviews, device selection, queues, vector types, and event-based synchronization.
Migration and Advanced Tutorials `docs/source/tutorial/migration.rst`, `docs/source/tutorial/migrationMap.rst`, `docs/source/tutorial/portingKernel.rst`, `docs/source/tutorial/backendDifferences.rst`, `docs/source/tutorial/vendorInterop.rst`, `docs/source/tutorial/tuning.rst`	Added migration guidance for CUDA/HIP/SYCL users, backend-specific considerations, kernel porting example, vendor interop patterns, and performance tuning strategy.
Existing Documentation Updates `docs/source/advanced/cmake.rst`, `docs/source/advanced/datastorage.rst`, `docs/source/basic/terms.rst`, `docs/source/basic/cheatsheet.rst`, `docs/source/basic/example.rst`, `docs/source/basic/install.rst`, `docs/source/basic/library.rst`, `docs/source/contribution/*`, `docs/source/dev/logging.rst`	Standardized capitalization to lowercase `alpaka`, added "Complete Source File" collapsible sections to example pages, and cleaned up whitespace.
Example/Snippet Infrastructure `docs/snippets/example/include/docsTest.hpp`, `docs/snippets/example/CMakeLists.txt`	Added new test header defining `docs::test::TestBackends` type alias for backend-parameterized tests; removed cmake status logging.
New Kernel Example Snippets `docs/snippets/example/02_execution.cpp`, `docs/snippets/example/08_events.cpp`, `docs/snippets/example/12_kernelIntro.cpp`, `docs/snippets/example/13_hierarchy.cpp`, `docs/snippets/example/16_sharedMemory.cpp`, `docs/snippets/example/18_multidimKernel.cpp`, `docs/snippets/example/22_atomics.cpp`, `docs/snippets/example/24_math.cpp`, `docs/snippets/example/26_warp.cpp`, `docs/snippets/example/28_chunkedFrames.cpp`, `docs/snippets/example/30_random.cpp`, `docs/snippets/example/31_monteCarloPi.cpp`, `docs/snippets/example/32_intrinsics.cpp`, `docs/snippets/example/34_memFence.cpp`, `docs/snippets/example/36_portingKernel.cpp`, `docs/snippets/example/38_vendorInterop.cpp`, `docs/snippets/example/40_imagePipeline.cpp`	Added 17 new Catch2 templated test/example snippets covering device enumeration, events, basic kernels, hierarchy, shared memory, multidimensional kernels, atomics, math functions, warp shuffles, chunked frames, random numbers, Monte Carlo, bit intrinsics, memory fences, SAXPY porting, vendor interop, and image pipelines.
Updated Example Snippets `docs/snippets/example/05_device.cpp`, `docs/snippets/example/06_queue.cpp`, `docs/snippets/example/10_memory.cpp`, `docs/snippets/example/15_kernel.cpp`, `docs/snippets/example/20_simdKernel.cpp`, `docs/snippets/example/11_views.cpp`	Converted test cases from single-backend `TEST_CASE` to `TEMPLATE_LIST_TEST_CASE` over `docs::test::TestBackends`; strengthened kernel operator signatures to require `onAcc::concepts::Acc auto const&`; switched device/executor selection from hardcoded values to backend-driven configuration; added early availability checks.

Sequence Diagram(s)

sequenceDiagram
    participant Host
    participant Device as onHost::Device
    participant Queue as onHost::Queue
    participant Kernel as Kernel Functor

    Host->>Host: 1. Enumerate backends via<br/>onHost::allBackends(...)
    Host->>Host: 2. Select available backend<br/>from TestBackends
    Host->>Device: 3. Create device via<br/>selector.makeDevice(0)
    Host->>Queue: 4. Create queue for device
    Host->>Queue: 5. Allocate device buffers<br/>via onHost::malloc
    Host->>Queue: 6. Copy host data to device<br/>via onHost::memcpy
    Host->>Queue: 7. Enqueue kernel execution<br/>with FrameSpec
    Queue->>Kernel: 8. Launch kernel on device<br/>with executor & accelerator
    Kernel->>Kernel: 9. Each worker iterates<br/>via makeIdxMap over range
    Kernel->>Device: 10. Perform work<br/>(atomics, shared mem, etc.)
    Queue->>Host: 11. Return after enqueue<br/>(for non-blocking queue)
    Host->>Queue: 12. onHost::wait(queue)<br/>to synchronize completion
    Host->>Queue: 13. Copy result back<br/>via onHost::memcpy
    Host->>Host: 14. Validate results<br/>via CHECK assertions

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Whiskers twitching with tutorial delight,
New kernels hop through documentation bright,
From device selection to warp-shuffle ways,
The alpaka burrow's expanded its maze!
Backend-neutral and structured with care,
Let learners discover portability everywhere! 🎓✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: expanding and reorganizing the tutorial documentation across numerous new and updated files.

coderabbitai

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

docs/snippets/example/06_queue.cpp (1)

45-45: ⚠️ Potential issue | 🟡 Minor

Fix typo in tutorial comment (Line 45).

“untile” should be “until”.

✏️ Proposed fix

-    // no wait required, enqueue will wait untile the task is finished
+    // no wait required, enqueue will wait until the task is finished

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/snippets/example/06_queue.cpp` at line 45, Replace the typo in the
inline comment inside docs/snippets/example/06_queue.cpp where the comment reads
"// no wait required, enqueue will wait untile the task is finished" by changing
"untile" to "until" so it reads "// no wait required, enqueue will wait until
the task is finished"; update the exact comment text in the file (search for the
"enqueue will wait untile" substring) to apply this simple spelling fix.

🧹 Nitpick comments (4)

docs/source/tutorial/memory.rst (1)
8-12: Consider adding an explicit cross-link to the memory-operations page.

Since this chapter now focuses on allocation concepts, a short :ref: link to the dedicated memory operations section would improve navigation for readers looking for copy/fill/memset details.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/tutorial/memory.rst` around lines 8 - 12, Add an explicit RTD
cross-reference to the memory operations page by inserting a short
:ref:`memory-operations` link (rendered as "memory operations") into the
paragraph that introduces allocation concepts — e.g., append or parenthetically
add "see :ref:`memory-operations` for copy/fill/memset details" after the
sentence about allocation concepts so readers can jump directly to the
copy/fill/memset reference.
docs/snippets/example/14_algorithms.cpp (1)
13-13: Unused include <bit>.

The <bit> header doesn't appear to be used anywhere in this file. Consider removing it to keep the includes clean.
Proposed fix
 #include <array>
-#include <bit>
 #include <functional>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/snippets/example/14_algorithms.cpp` at line 13, Remove the unused
include directive by deleting the line that contains `#include <bit>` from the
file (it is not referenced anywhere in this snippet), then rebuild/run tests to
ensure no compile errors; focus on the include removal and keeping other
includes intact.
docs/snippets/example/16_sharedMemory.cpp (1)
148-151: Don't size the dyn-shared cache from thread count unless that's the contract.

This currently assumes m_spec.getNumThreads().x() matches the number of cached elements. The kernel indexes cache[idx.x()] over range::frameExtent, so changing the launch to multiple frame elements per thread would under-allocate the shared buffer. Either derive the byte count from the cached frame extent or call out the 1:1 assumption explicitly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/snippets/example/16_sharedMemory.cpp` around lines 148 - 151, The
dyn-shared size calculation in operator()(auto const executor, auto const& out,
auto const& in, int factor) currently uses m_spec.getNumThreads().x() which
assumes a 1:1 mapping between threads and cached elements; change this to derive
the byte count from the cached frame extent (the number of elements indexed by
cache[idx.x()] over range::frameExtent) or, if the 1:1 mapping is intended, add
an explicit assertion/comment documenting the contract; update the size
expression to use the cached frame extent (or add an assert that
m_spec.getNumThreads().x() == cached_frame_extent) so the shared buffer is
correctly allocated when multiple frame elements per thread are launched.
docs/source/tutorial/backendDifferences.rst (1)
48-49: Consider adding an explicit cross-reference link.

The text mentions "The dedicated vendor-interop chapter" but doesn't include a :doc: link. For consistency with other tutorial pages and reader convenience, consider:
-The dedicated vendor-interop chapter shows the pattern.
+The dedicated :doc:`vendorInterop` chapter shows the pattern.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/tutorial/backendDifferences.rst` around lines 48 - 49, Add an
explicit Sphinx cross-reference to the vendor-interop chapter where the text
currently says "The dedicated vendor-interop chapter shows the pattern"; replace
or augment that phrase with a :doc: role linking to the vendor-interop page
(e.g. :doc:`vendor-interop` or the actual doc name used in the project) so
readers can click through directly from backendDifferences.rst.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/snippets/example/28_chunkedFrames.cpp`:
- Around line 79-81: The code currently computes numFrames by integer-dividing
hostOut.size() by frameExtent (via Vec{...} / frameExtent), which silently
truncates any remainder; before constructing numFrames and onHost::FrameSpec do
an explicit check that hostOut.size() is evenly divisible by the total number of
elements per frame (compute the frame element count from frameExtent), and if
not, fail fast (throw, assert, or log+exit) so trailing elements aren't silently
dropped; refer to frameExtent, hostOut, numFrames and onHost::FrameSpec when
adding the divisibility check and error path.

In `@docs/snippets/example/34_memFence.cpp`:
- Around line 79-81: The unbounded consumer spin using onAcc::atomicCas on
readyFlag can hang the test; replace the infinite while loop that spins on
onAcc::atomicCas(acc, &readyFlag[0u], 0u, 0u) with a bounded polling loop that
tries a fixed number of iterations (or polls with a timeout), and if the loop
exhausts without success increment mismatchCounter to report the timeout and
break out so the test fails instead of hanging; ensure you reference and update
the same readyFlag index and use mismatchCounter to record the failure.
- Around line 32-43: The second barrier (onAcc::syncBlockThreads(acc)) makes the
release/acquire memFence pair redundant; to demonstrate fence semantics replace
the barrier-based synchronization with fence-only ordering by removing the
second onAcc::syncBlockThreads(acc) call (the one immediately before reading
observedB) and keep the release fence in the tid==0 block and the acquire fence
before reading observedA; this preserves shared, tid, onAcc::memFence and the
fence semantics you intend to show.

In `@docs/snippets/example/38_vendorInterop.cpp`:
- Around line 75-79: The host-path currently uses std::transform over
input.getExtents().x(), which truncates multidimensional
alpaka::concepts::IMdSpan inputs; either restrict this overload to 1D spans or
compute the full element count by multiplying all dimensions from
input.getExtents() and use that count (and appropriate pointer arithmetic) in
std::transform; locate the host dispatch that uses input,
input.getExtents().x(), outPtr and the lambda and change it to validate/require
a 1D IMdSpan or replace input.getExtents().x() with the product of all extents
(or call alpaka::onHost::transform like the fallback) so all elements are
transformed.

In `@docs/source/tutorial/atomics.rst`:
- Around line 74-78: The docs incorrectly state the default scope as
onAcc::scope::Device (capital D); update the text to use the correct lowercase
constant onAcc::scope::device for consistency with the API and the other
examples (see onAcc::atomicAdd and onAcc::scope::block/onAcc::scope::device).

In `@docs/source/tutorial/memFence.rst`:
- Around line 45-48: Update the prose in the producer/consumer description and
the summary to explicitly state that the ready flag operations must be atomic:
mention that the producer must perform an atomic store/update (e.g., atomicExch)
to set the ready flag and the consumer must read/update it atomically (e.g.,
atomicCas or atomic load), and ensure the text near the examples that reference
atomicExch and atomicCas explicitly uses the word "atomic" so readers don’t miss
the requirement.

In `@docs/source/tutorial/random.rst`:
- Around line 31-33: Update the documentation snippet to use the lowercase
instance name used throughout the codebase: replace the type reference
`rand::interval::CO` with the constant instance `rand::interval::co` so the
examples (which use `rand::engine::Philox4x32x10` and
`rand::distribution::UniformReal<float>`) compile when copy-pasted; ensure any
other occurrences in the same file also use `rand::interval::co` instead of
`rand::interval::CO`.
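
The divisibility check requested for `28_chunkedFrames.cpp` above can be sketched in plain C++. This is only an illustration of the fail-fast idea: `checkedNumFrames` is a hypothetical helper that is not part of alpaka, and it works on flat element counts rather than on the `Vec` and `onHost::FrameSpec` types used in the snippet.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not an alpaka API): returns the number of whole frames
// that cover totalElements, and throws instead of silently truncating when the
// element count is not a multiple of the frame size.
inline std::size_t checkedNumFrames(std::size_t totalElements, std::size_t elementsPerFrame)
{
    if(elementsPerFrame == 0u)
        throw std::invalid_argument("elementsPerFrame must be non-zero");

    if(totalElements % elementsPerFrame != 0u)
        throw std::invalid_argument(
            "output size " + std::to_string(totalElements) + " is not a multiple of the frame size "
            + std::to_string(elementsPerFrame));

    return totalElements / elementsPerFrame;
}
```

With this helper, 12 elements and a frame size of 4 give 3 frames, while 10 elements and a frame size of 4 raise an exception rather than dropping the last 2 elements.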

---

Outside diff comments:
In `@docs/snippets/example/06_queue.cpp`:
- Line 45: Replace the typo in the inline comment inside
docs/snippets/example/06_queue.cpp where the comment reads "// no wait required,
enqueue will wait untile the task is finished" by changing "untile" to "until"
so it reads "// no wait required, enqueue will wait until the task is finished";
update the exact comment text in the file (search for the "enqueue will wait
untile" substring) to apply this simple spelling fix.

---

Nitpick comments:
In `@docs/snippets/example/14_algorithms.cpp`:
- Line 13: Remove the unused include directive by deleting the line that
contains `#include <bit>` from the file (it is not referenced anywhere in this
snippet), then rebuild/run tests to ensure no compile errors; focus on the
include removal and keeping other includes intact.

In `@docs/snippets/example/16_sharedMemory.cpp`:
- Around line 148-151: The dyn-shared size calculation in operator()(auto const
executor, auto const& out, auto const& in, int factor) currently uses
m_spec.getNumThreads().x() which assumes a 1:1 mapping between threads and
cached elements; change this to derive the byte count from the cached frame
extent (the number of elements indexed by cache[idx.x()] over
range::frameExtent) or, if the 1:1 mapping is intended, add an explicit
assertion/comment documenting the contract; update the size expression to use
the cached frame extent (or add an assert that m_spec.getNumThreads().x() ==
cached_frame_extent) so the shared buffer is correctly allocated when multiple
frame elements per thread are launched.

In `@docs/source/tutorial/backendDifferences.rst`:
- Around line 48-49: Add an explicit Sphinx cross-reference to the
vendor-interop chapter where the text currently says "The dedicated
vendor-interop chapter shows the pattern"; replace or augment that phrase with a
:doc: role linking to the vendor-interop page (e.g. :doc:`vendor-interop` or the
actual doc name used in the project) so readers can click through directly from
backendDifferences.rst.

In `@docs/source/tutorial/memory.rst`:
- Around line 8-12: Add an explicit RTD cross-reference to the memory operations
page by inserting a short :ref:`memory-operations` link (rendered as "memory
operations") into the paragraph that introduces allocation concepts — e.g.,
append or parenthetically add "see :ref:`memory-operations` for copy/fill/memset
details" after the sentence about allocation concepts so readers can jump
directly to the copy/fill/memset reference.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1d03d702-71d7-442b-aebc-c7776b784bef

📥 Commits

Reviewing files that changed from the base of the PR and between e24efe3 and 2b84202.

📒 Files selected for processing (61)

docs/snippets/example/02_execution.cpp
docs/snippets/example/05_device.cpp
docs/snippets/example/06_queue.cpp
docs/snippets/example/08_events.cpp
docs/snippets/example/10_memory.cpp
docs/snippets/example/11_views.cpp
docs/snippets/example/12_kernelIntro.cpp
docs/snippets/example/13_hierarchy.cpp
docs/snippets/example/14_algorithms.cpp
docs/snippets/example/15_kernel.cpp
docs/snippets/example/16_sharedMemory.cpp
docs/snippets/example/18_multidimKernel.cpp
docs/snippets/example/20_simdKernel.cpp
docs/snippets/example/22_atomics.cpp
docs/snippets/example/24_math.cpp
docs/snippets/example/26_warp.cpp
docs/snippets/example/28_chunkedFrames.cpp
docs/snippets/example/30_random.cpp
docs/snippets/example/31_monteCarloPi.cpp
docs/snippets/example/32_intrinsics.cpp
docs/snippets/example/34_memFence.cpp
docs/snippets/example/36_portingKernel.cpp
docs/snippets/example/38_vendorInterop.cpp
docs/snippets/example/40_imagePipeline.cpp
docs/snippets/example/CMakeLists.txt
docs/snippets/example/include/docsTest.hpp
docs/source/advanced/cmake.rst
docs/source/advanced/datastorage.rst
docs/source/basic/terms.rst
docs/source/conf.py
docs/source/index.rst
docs/source/tutorial/algorithms.rst
docs/source/tutorial/atomics.rst
docs/source/tutorial/backendDifferences.rst
docs/source/tutorial/chunked.rst
docs/source/tutorial/device.rst
docs/source/tutorial/events.rst
docs/source/tutorial/execution.rst
docs/source/tutorial/foundations.rst
docs/source/tutorial/hierarchy.rst
docs/source/tutorial/intrinsics.rst
docs/source/tutorial/intro.rst
docs/source/tutorial/kernel.rst
docs/source/tutorial/kernels.rst
docs/source/tutorial/math.rst
docs/source/tutorial/memFence.rst
docs/source/tutorial/memory.rst
docs/source/tutorial/memoryOperations.rst
docs/source/tutorial/mentalModel.rst
docs/source/tutorial/migration.rst
docs/source/tutorial/migrationMap.rst
docs/source/tutorial/miniProject.rst
docs/source/tutorial/multidim.rst
docs/source/tutorial/numerics.rst
docs/source/tutorial/portingKernel.rst
docs/source/tutorial/random.rst
docs/source/tutorial/sharedMemory.rst
docs/source/tutorial/tuning.rst
docs/source/tutorial/vendorInterop.rst
docs/source/tutorial/views.rst
docs/source/tutorial/warp.rst

💤 Files with no reviewable changes (1)

docs/snippets/example/CMakeLists.txt

coderabbitai · 2026-04-08T14:54:31Z

+            if(tid == 0u)
+            {
+                shared[0] = 10;
+                onAcc::memFence(acc, onAcc::scope::block, onAcc::order::release);
+                shared[1] = 20;
+            }
+
+            onAcc::syncBlockThreads(acc);
+
+            auto observedB = shared[1];
+            onAcc::memFence(acc, onAcc::scope::block, onAcc::order::acquire);
+            auto observedA = shared[0];


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n docs/snippets/example/34_memFence.cpp

Repository: psychocoderHPC/alpaka3

Length of output: 5362

The second syncBlockThreads() makes the release/acquire fence pair redundant.

The barrier at line 39 ensures all threads synchronize before reading, which means all of thread 0's writes are already visible to other threads regardless of the fence semantics. This renders the release fence at line 35 unnecessary for achieving the intended ordering in this example, turning what should demonstrate fence semantics into a barrier-based example instead.

🧰 Tools

🪛 Cppcheck (2.20.0)

[error] 40-40: Found an exit path from function with non-void return type that has missing return statement

(missingReturn)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/snippets/example/34_memFence.cpp` around lines 32 - 43, The second barrier (onAcc::syncBlockThreads(acc)) makes the release/acquire memFence pair redundant; to demonstrate fence semantics replace the barrier-based synchronization with fence-only ordering by removing the second onAcc::syncBlockThreads(acc) call (the one immediately before reading observedB) and keep the release fence in the tid==0 block and the acquire fence before reading observedA; this preserves shared, tid, onAcc::memFence and the fence semantics you intend to show.

coderabbitai · 2026-04-08T14:54:31Z

+            while(onAcc::atomicCas(acc, &readyFlag[0u], 0u, 0u) == 0u)
+            {
+            }


⚠️ Potential issue | 🟠 Major

Bound the consumer spin loop.

If readyFlag is never published on one backend, this test hangs the docs suite instead of failing. Turn the busy-wait into a bounded poll and report timeout through mismatchCounter.

🛡️ Proposed fix

-            while(onAcc::atomicCas(acc, &readyFlag[0u], 0u, 0u) == 0u)
-            {
-            }
+            uint32_t spins = 0u;
+            while(onAcc::atomicCas(acc, &readyFlag[0u], 0u, 0u) == 0u)
+            {
+                if(++spins == 1000000u)
+                {
+                    onAcc::atomicAdd(acc, &mismatchCounter[0u], 1u);
+                    return;
+                }
+            }

🧰 Tools

🪛 Cppcheck (2.20.0)

[error] 80-80: Found an exit path from function with non-void return type that has missing return statement

(missingReturn)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/snippets/example/34_memFence.cpp` around lines 79 - 81, The unbounded consumer spin using onAcc::atomicCas on readyFlag can hang the test; replace the infinite while loop that spins on onAcc::atomicCas(acc, &readyFlag[0u], 0u, 0u) with a bounded polling loop that tries a fixed number of iterations (or polls with a timeout), and if the loop exhausts without success increment mismatchCounter to report the timeout and break out so the test fails instead of hanging; ensure you reference and update the same readyFlag index and use mismatchCounter to record the failure.
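
The bounded-poll pattern described above can be tried out on the host with standard C++ atomics. This is a minimal sketch under stated assumptions: `std::atomic` stands in for `onAcc::atomicCas` and `onAcc::atomicAdd`, and the names `waitForFlag`, `consume`, and `maxSpins` are hypothetical and do not appear in the snippet under review.

```cpp
#include <atomic>
#include <cstdint>

// Polls `flag` at most `maxSpins` times.
// Returns true if the flag was observed non-zero, false on timeout.
inline bool waitForFlag(std::atomic<std::uint32_t> const& flag, std::uint32_t maxSpins)
{
    for(std::uint32_t spins = 0u; spins < maxSpins; ++spins)
    {
        if(flag.load(std::memory_order_acquire) != 0u)
            return true;
    }
    return false;
}

// Consumer step: a timeout is recorded in `mismatchCounter` so that the caller
// sees a failing check instead of a hang.
inline void consume(
    std::atomic<std::uint32_t> const& readyFlag,
    std::atomic<std::uint32_t>& mismatchCounter,
    std::uint32_t maxSpins)
{
    if(!waitForFlag(readyFlag, maxSpins))
        mismatchCounter.fetch_add(1u, std::memory_order_relaxed);
}
```

The spin limit is a trade-off: too small a value can report a timeout on a slow but correct backend, so a generous bound such as the 1000000 iterations in the proposed fix is a reasonable starting point.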

coderabbitai · 2026-04-08T14:54:31Z

+                std::transform(
+                    input.data(),
+                    input.data() + input.getExtents().x(),
+                    outPtr,
+                    [=](float value) { return scale * value + shift; });


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify current element-count logic and related usage context.
rg -n -C2 'input\.getExtents\(\)\.x\(\)' docs/snippets/example/38_vendorInterop.cpp
rg -n -C2 'AffineTransform::call\(' docs/snippets/example
rg -n -C2 'fnDispatch\(' docs/snippets/example/38_vendorInterop.cpp

Repository: psychocoderHPC/alpaka3

Length of output: 1208

🏁 Script executed:

cat -n docs/snippets/example/38_vendorInterop.cpp | sed -n '60,85p'

Repository: psychocoderHPC/alpaka3

Length of output: 1023

🏁 Script executed:

cat -n docs/snippets/example/38_vendorInterop.cpp | sed -n '40,60p'

Repository: psychocoderHPC/alpaka3

Length of output: 844

🏁 Script executed:

cat -n docs/snippets/example/38_vendorInterop.cpp | sed -n '85,110p'

Repository: psychocoderHPC/alpaka3

Length of output: 1164

🏁 Script executed:

# Check if there are any other 2D/3D usage examples in the file
rg -n 'getExtents\(\)' docs/snippets/example/38_vendorInterop.cpp
# Also check alpaka::onHost::transform to understand how it handles multidimensional spans
rg -n 'onHost::transform' docs/snippets/example/38_vendorInterop.cpp

Repository: psychocoderHPC/alpaka3

Length of output: 165

Host dispatch truncates multidimensional inputs.

The function at lines 63-81 accepts any alpaka::concepts::IMdSpan (including 2D/3D), but at line 77 it processes only input.getExtents().x() elements, so elements beyond the x extent are left untransformed. The alpaka fallback at lines 44-57 correctly handles multidimensional spans via alpaka::onHost::transform. Either constrain this overload to 1D spans explicitly, or compute the total element count from all extents before calling std::transform.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/snippets/example/38_vendorInterop.cpp` around lines 75 - 79, The host-path currently uses std::transform over input.getExtents().x(), which truncates multidimensional alpaka::concepts::IMdSpan inputs; either restrict this overload to 1D spans or compute the full element count by multiplying all dimensions from input.getExtents() and use that count (and appropriate pointer arithmetic) in std::transform; locate the host dispatch that uses input, input.getExtents().x(), outPtr and the lambda and change it to validate/require a 1D IMdSpan or replace input.getExtents().x() with the product of all extents (or call alpaka::onHost::transform like the fallback) so all elements are transformed.
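
The second option above (transforming the product of all extents) can be sketched in plain C++. This sketch assumes the data is stored contiguously, which the review does not confirm for every `IMdSpan`; the names `elementCount` and `affineTransformHost` are hypothetical, and `std::array` stands in for the extents type returned by `getExtents()`.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <functional>
#include <numeric>

// Total number of elements described by an N-dimensional extent.
template<std::size_t Dim>
std::size_t elementCount(std::array<std::size_t, Dim> const& extents)
{
    return std::accumulate(extents.begin(), extents.end(), std::size_t{1}, std::multiplies<std::size_t>{});
}

// Applies out[i] = scale * in[i] + shift to every element, not just the first
// extents[0] of them. Assumes `in` and `out` are contiguous buffers that hold
// at least elementCount(extents) floats.
template<std::size_t Dim>
void affineTransformHost(
    float const* in,
    float* out,
    std::array<std::size_t, Dim> const& extents,
    float scale,
    float shift)
{
    auto const count = elementCount(extents);
    std::transform(in, in + count, out, [=](float value) { return scale * value + shift; });
}
```

For a 2x3 input this touches all 6 elements. If the span can be pitched or strided, a flat pointer range is not valid and restricting the overload to 1D (or delegating to `alpaka::onHost::transform`, as the fallback does) is the safer choice.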

coderabbitai

🧹 Nitpick comments (1)

docs/snippets/example/16_sharedMemory.cpp (1)

149-157: Avoid assert-only sizing logic for dynamic shared memory.

At Line 154, divisibility is enforced only via assert. In release builds, this can under-allocate shared memory when extents aren’t evenly divisible by block count, which can cascade into OOB accesses in DynamicScaleKernel.

Proposed hardening

         uint32_t operator()(auto const executor, auto const& out, auto const& in, int factor) const
         {
             alpaka::unused(executor, out, in, factor);
             auto const totalCachedElements = in.getExtents().product();
             auto const numBlocks = m_spec.getNumBlocks().product();
-            assert(totalCachedElements % numBlocks == 0u);
-            auto const cachedFrameExtent = totalCachedElements / numBlocks;
+            if(numBlocks == 0u)
+                return 0u;
+            assert(totalCachedElements % numBlocks == 0u);
+            auto const cachedFrameExtent = (totalCachedElements + numBlocks - 1u) / numBlocks;
             return static_cast<uint32_t>(cachedFrameExtent * sizeof(int));
         }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/snippets/example/16_sharedMemory.cpp` around lines 149 - 157, The sizing
currently relies on an assert inside operator()(...) which is removed in release
builds and can under-allocate dynamic shared memory; replace the assert-based
divisibility assumption with a safe runtime computation: compute numBlocks =
m_spec.getNumBlocks().product() and cachedFrameExtent = (totalCachedElements +
numBlocks - 1) / numBlocks (i.e., round up) to guarantee enough shared memory,
and optionally check for numBlocks == 0 and handle/throw/log; update the return
to use the rounded-up cachedFrameExtent * sizeof(int) so DynamicScaleKernel
never under-allocates.
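
The rounded-up sizing described above comes down to a ceiling division. The following stand-alone sketch shows the arithmetic only; `dynSharedBytes` and its parameters are hypothetical names, and plain integers stand in for `in.getExtents().product()` and `m_spec.getNumBlocks().product()`.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes of dynamic shared memory per block when totalCachedElements are split
// across numBlocks blocks. The per-block element count is rounded up so that a
// remainder never leads to an under-sized buffer.
inline std::uint32_t dynSharedBytes(
    std::size_t totalCachedElements,
    std::size_t numBlocks,
    std::size_t elementSize)
{
    if(numBlocks == 0u)
        return 0u; // nothing is launched, so nothing needs to be cached

    std::size_t const elementsPerBlock = (totalCachedElements + numBlocks - 1u) / numBlocks;
    return static_cast<std::uint32_t>(elementsPerBlock * elementSize);
}
```

For example, 10 elements over 4 blocks yields 3 elements per block instead of the truncated 2, at the cost of a few unused bytes in the last block.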

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@docs/snippets/example/16_sharedMemory.cpp`:
- Around line 149-157: The sizing currently relies on an assert inside
operator()(...) which is removed in release builds and can under-allocate
dynamic shared memory; replace the assert-based divisibility assumption with a
safe runtime computation: compute numBlocks = m_spec.getNumBlocks().product()
and cachedFrameExtent = (totalCachedElements + numBlocks - 1) / numBlocks (i.e.,
round up) to guarantee enough shared memory, and optionally check for numBlocks
== 0 and handle/throw/log; update the return to use the rounded-up
cachedFrameExtent * sizeof(int) so DynamicScaleKernel never under-allocates.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5bc28376-006a-4887-aaa5-a8b5a169d4d1

📥 Commits

Reviewing files that changed from the base of the PR and between 2b84202 and bf34dc3.

📒 Files selected for processing (38)

docs/snippets/example/06_queue.cpp
docs/snippets/example/14_algorithms.cpp
docs/snippets/example/16_sharedMemory.cpp
docs/snippets/example/28_chunkedFrames.cpp
docs/source/advanced/cmake.rst
docs/source/advanced/datastorage.rst
docs/source/basic/cheatsheet.rst
docs/source/basic/example.rst
docs/source/basic/install.rst
docs/source/basic/library.rst
docs/source/basic/terms.rst
docs/source/contribution/sphinx.rst
docs/source/contribution/tools.rst
docs/source/dev/logging.rst
docs/source/tutorial/algorithms.rst
docs/source/tutorial/atomics.rst
docs/source/tutorial/backendDifferences.rst
docs/source/tutorial/chunked.rst
docs/source/tutorial/device.rst
docs/source/tutorial/events.rst
docs/source/tutorial/execution.rst
docs/source/tutorial/hierarchy.rst
docs/source/tutorial/intrinsics.rst
docs/source/tutorial/kernel.rst
docs/source/tutorial/math.rst
docs/source/tutorial/memFence.rst
docs/source/tutorial/memory.rst
docs/source/tutorial/memoryOperations.rst
docs/source/tutorial/miniProject.rst
docs/source/tutorial/multidim.rst
docs/source/tutorial/portingKernel.rst
docs/source/tutorial/queue.rst
docs/source/tutorial/random.rst
docs/source/tutorial/sharedMemory.rst
docs/source/tutorial/vector.rst
docs/source/tutorial/vendorInterop.rst
docs/source/tutorial/views.rst
docs/source/tutorial/warp.rst

💤 Files with no reviewable changes (4)

docs/source/contribution/sphinx.rst
docs/source/dev/logging.rst
docs/source/contribution/tools.rst
docs/source/basic/library.rst

✅ Files skipped from review due to trivial changes (28)

docs/source/basic/example.rst
docs/source/basic/install.rst
docs/source/basic/cheatsheet.rst
docs/source/tutorial/device.rst
docs/source/tutorial/queue.rst
docs/source/tutorial/vector.rst
docs/source/advanced/datastorage.rst
docs/source/tutorial/events.rst
docs/source/tutorial/miniProject.rst
docs/source/tutorial/multidim.rst
docs/source/tutorial/hierarchy.rst
docs/source/tutorial/atomics.rst
docs/source/tutorial/math.rst
docs/source/tutorial/warp.rst
docs/source/tutorial/backendDifferences.rst
docs/source/tutorial/views.rst
docs/source/tutorial/vendorInterop.rst
docs/source/tutorial/memFence.rst
docs/source/tutorial/chunked.rst
docs/source/tutorial/execution.rst
docs/source/tutorial/kernel.rst
docs/source/tutorial/sharedMemory.rst
docs/source/tutorial/portingKernel.rst
docs/source/tutorial/algorithms.rst
docs/source/tutorial/intrinsics.rst
docs/source/tutorial/random.rst
docs/source/advanced/cmake.rst
docs/source/tutorial/memoryOperations.rst

🚧 Files skipped from review as they are similar to previous changes (2)

docs/source/basic/terms.rst
docs/source/tutorial/memory.rst

codex and others added 2 commits April 2, 2026 15:04

docs: expand and reorganize tutorial

3d270fe

style: apply pre-commit formatting

7db2242

psychocoderHPC force-pushed the docs/tutorial-refresh branch from e59ee25 to 7db2242 Compare April 2, 2026 15:43

codex added 9 commits April 2, 2026 20:01

docs: fix backend-compatible snippet examples

4c77bf4

docs: fix cuda vendor interop snippet

c501e9e

docs: prefer plain size_t in snippets

6a1b092

docs: split memory operations into its own page

a191496

docs: show helper functors in tutorial snippets

35c1620

docs: use explicit accelerator concept in kernels

dd390ce

Fix warp docs example lane indexing

b116081

Refresh docs snippet backend coverage

34aba38

Apply pre-commit fixes to docs snippets

2b84202

coderabbitai bot reviewed Apr 8, 2026

codex added 2 commits April 10, 2026 14:44

docs: address tutorial review follow-ups

4dd7294

docs: collapse full source examples

bf34dc3

coderabbitai bot reviewed Apr 10, 2026

docs: expand and reorganize tutorial #5

psychocoderHPC wants to merge 13 commits into dev from docs/tutorial-refresh

2 participants
