Conversation
e59ee25 to
7db2242
Compare
📝 Walkthrough
This pull request introduces a comprehensive tutorial overhaul for alpaka, adding 19 new C++ example/test snippets covering kernels, shared memory, algorithms, random numbers, and numerics, along with 30+ new Sphinx documentation pages that establish a structured learning path. Existing snippets are converted to backend-parameterized tests via
Changes
Sequence Diagram(s)
sequenceDiagram
participant Host
participant Device as onHost::Device
participant Queue as onHost::Queue
participant Kernel as Kernel Functor
Host->>Host: 1. Enumerate backends via<br/>onHost::allBackends(...)
Host->>Host: 2. Select available backend<br/>from TestBackends
Host->>Device: 3. Create device via<br/>selector.makeDevice(0)
Host->>Queue: 4. Create queue for device
Host->>Queue: 5. Allocate device buffers<br/>via onHost::malloc
Host->>Queue: 6. Copy host data to device<br/>via onHost::memcpy
Host->>Queue: 7. Enqueue kernel execution<br/>with FrameSpec
Queue->>Kernel: 8. Launch kernel on device<br/>with executor & accelerator
Kernel->>Kernel: 9. Each worker iterates<br/>via makeIdxMap over range
Kernel->>Device: 10. Perform work<br/>(atomics, shared mem, etc.)
Queue->>Host: 11. Return after enqueue<br/>(for non-blocking queue)
Host->>Queue: 12. onHost::wait(queue)<br/>to synchronize completion
Host->>Queue: 13. Copy result back<br/>via onHost::memcpy
Host->>Host: 14. Validate results<br/>via CHECK assertions
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/snippets/example/06_queue.cpp (1)
45-45: ⚠️ Potential issue | 🟡 Minor
Fix typo in tutorial comment (Line 45).
“untile” should be “until”.
✏️ Proposed fix
- // no wait required, enqueue will wait untile the task is finished
+ // no wait required, enqueue will wait until the task is finished
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/snippets/example/06_queue.cpp` at line 45, Replace the typo in the inline comment inside docs/snippets/example/06_queue.cpp where the comment reads "// no wait required, enqueue will wait untile the task is finished" by changing "untile" to "until" so it reads "// no wait required, enqueue will wait until the task is finished"; update the exact comment text in the file (search for the "enqueue will wait untile" substring) to apply this simple spelling fix.
🧹 Nitpick comments (4)
docs/source/tutorial/memory.rst (1)
8-12: Consider adding an explicit cross-link to the memory-operations page.
Since this chapter now focuses on allocation concepts, a short :ref: link to the dedicated memory operations section would improve navigation for readers looking for copy/fill/memset details.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/source/tutorial/memory.rst` around lines 8 - 12, Add an explicit RTD cross-reference to the memory operations page by inserting a short :ref:`memory-operations` link (rendered as "memory operations") into the paragraph that introduces allocation concepts, e.g., append or parenthetically add "see :ref:`memory-operations` for copy/fill/memset details" after the sentence about allocation concepts so readers can jump directly to the copy/fill/memset reference.
docs/snippets/example/14_algorithms.cpp (1)
13-13: Unused include <bit>.
The <bit> header doesn't appear to be used anywhere in this file. Consider removing it to keep the includes clean.
Proposed fix
 #include <array>
-#include <bit>
 #include <functional>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/snippets/example/14_algorithms.cpp` at line 13, Remove the unused include directive by deleting the line that contains `#include` <bit> from the file (it is not referenced anywhere in this snippet), then rebuild/run tests to ensure no compile errors; focus on the include removal and keeping other includes intact.
docs/snippets/example/16_sharedMemory.cpp (1)
148-151: Don't size the dyn-shared cache from thread count unless that's the contract.
This currently assumes m_spec.getNumThreads().x() matches the number of cached elements. The kernel indexes cache[idx.x()] over range::frameExtent, so changing the launch to multiple frame elements per thread would under-allocate the shared buffer. Either derive the byte count from the cached frame extent or call out the 1:1 assumption explicitly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/snippets/example/16_sharedMemory.cpp` around lines 148 - 151, The dyn-shared size calculation in operator()(auto const executor, auto const& out, auto const& in, int factor) currently uses m_spec.getNumThreads().x() which assumes a 1:1 mapping between threads and cached elements; change this to derive the byte count from the cached frame extent (the number of elements indexed by cache[idx.x()] over range::frameExtent) or, if the 1:1 mapping is intended, add an explicit assertion/comment documenting the contract; update the size expression to use the cached frame extent (or add an assert that m_spec.getNumThreads().x() == cached_frame_extent) so the shared buffer is correctly allocated when multiple frame elements per thread are launched.
docs/source/tutorial/backendDifferences.rst (1)
48-49: Consider adding an explicit cross-reference link.
The text mentions "The dedicated vendor-interop chapter" but doesn't include a :doc: link. For consistency with other tutorial pages and reader convenience, consider:
-The dedicated vendor-interop chapter shows the pattern.
+The dedicated :doc:`vendorInterop` chapter shows the pattern.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/source/tutorial/backendDifferences.rst` around lines 48 - 49, Add an explicit Sphinx cross-reference to the vendor-interop chapter where the text currently says "The dedicated vendor-interop chapter shows the pattern"; replace or augment that phrase with a :doc: role linking to the vendor-interop page (e.g. :doc:`vendor-interop` or the actual doc name used in the project) so readers can click through directly from backendDifferences.rst.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/snippets/example/28_chunkedFrames.cpp`:
- Around line 79-81: The code currently computes numFrames by integer-dividing
hostOut.size() by frameExtent (via Vec{...} / frameExtent), which silently
truncates any remainder; before constructing numFrames and onHost::FrameSpec do
an explicit check that hostOut.size() is evenly divisible by the total number of
elements per frame (compute the frame element count from frameExtent), and if
not, fail fast (throw, assert, or log+exit) so trailing elements aren't silently
dropped; refer to frameExtent, hostOut, numFrames and onHost::FrameSpec when
adding the divisibility check and error path.
In `@docs/snippets/example/34_memFence.cpp`:
- Around line 79-81: The unbounded consumer spin using onAcc::atomicCas on
readyFlag can hang the test; replace the infinite while loop that spins on
onAcc::atomicCas(acc, &readyFlag[0u], 0u, 0u) with a bounded polling loop that
tries a fixed number of iterations (or polls with a timeout), and if the loop
exhausts without success increment mismatchCounter to report the timeout and
break out so the test fails instead of hanging; ensure you reference and update
the same readyFlag index and use mismatchCounter to record the failure.
- Around line 32-43: The second barrier (onAcc::syncBlockThreads(acc)) makes the
release/acquire memFence pair redundant; to demonstrate fence semantics replace
the barrier-based synchronization with fence-only ordering by removing the
second onAcc::syncBlockThreads(acc) call (the one immediately before reading
observedB) and keep the release fence in the tid==0 block and the acquire fence
before reading observedA; this preserves shared, tid, onAcc::memFence and the
fence semantics you intend to show.
In `@docs/snippets/example/38_vendorInterop.cpp`:
- Around line 75-79: The host-path currently uses std::transform over
input.getExtents().x(), which truncates multidimensional
alpaka::concepts::IMdSpan inputs; either restrict this overload to 1D spans or
compute the full element count by multiplying all dimensions from
input.getExtents() and use that count (and appropriate pointer arithmetic) in
std::transform; locate the host dispatch that uses input,
input.getExtents().x(), outPtr and the lambda and change it to validate/require
a 1D IMdSpan or replace input.getExtents().x() with the product of all extents
(or call alpaka::onHost::transform like the fallback) so all elements are
transformed.
In `@docs/source/tutorial/atomics.rst`:
- Around line 74-78: The docs incorrectly state the default scope as
onAcc::scope::Device (capital D); update the text to use the correct lowercase
constant onAcc::scope::device for consistency with the API and the other
examples (see onAcc::atomicAdd and onAcc::scope::block/onAcc::scope::device).
In `@docs/source/tutorial/memFence.rst`:
- Around line 45-48: Update the prose in the producer/consumer description and
the summary to explicitly state that the ready flag operations must be atomic:
mention that the producer must perform an atomic store/update (e.g., atomicExch)
to set the ready flag and the consumer must read/update it atomically (e.g.,
atomicCas or atomic load), and ensure the text near the examples that reference
atomicExch and atomicCas explicitly uses the word "atomic" so readers don’t miss
the requirement.
In `@docs/source/tutorial/random.rst`:
- Around line 31-33: Update the documentation snippet to use the lowercase
instance name used throughout the codebase: replace the type reference
`rand::interval::CO` with the constant instance `rand::interval::co` so the
examples (which use `rand::engine::Philox4x32x10` and
`rand::distribution::UniformReal<float>`) compile when copy-pasted; ensure any
other occurrences in the same file also use `rand::interval::co` instead of
`rand::interval::CO`.
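The divisibility check requested for 28_chunkedFrames.cpp could look roughly like the sketch below. It is written in plain standard C++ rather than against the alpaka API; `checkedFrameCount` and its parameters are hypothetical names, and how the element count per frame is obtained from `frameExtent` is left to the snippet author.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of alpaka: returns how many frames cover
// `numElements`, and throws instead of silently truncating a remainder.
inline std::size_t checkedFrameCount(std::size_t numElements, std::size_t elementsPerFrame)
{
    if(elementsPerFrame == 0u)
        throw std::invalid_argument("elementsPerFrame must be non-zero");

    if(numElements % elementsPerFrame != 0u)
        throw std::invalid_argument(
            "element count " + std::to_string(numElements) + " is not a multiple of the frame size "
            + std::to_string(elementsPerFrame));

    // Safe: the division is exact at this point.
    return numElements / elementsPerFrame;
}
```

Calling such a helper before building the frame specification turns a dropped tail into an immediate, visible failure.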
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1d03d702-71d7-442b-aebc-c7776b784bef
📒 Files selected for processing (61)
- docs/snippets/example/02_execution.cpp
- docs/snippets/example/05_device.cpp
- docs/snippets/example/06_queue.cpp
- docs/snippets/example/08_events.cpp
- docs/snippets/example/10_memory.cpp
- docs/snippets/example/11_views.cpp
- docs/snippets/example/12_kernelIntro.cpp
- docs/snippets/example/13_hierarchy.cpp
- docs/snippets/example/14_algorithms.cpp
- docs/snippets/example/15_kernel.cpp
- docs/snippets/example/16_sharedMemory.cpp
- docs/snippets/example/18_multidimKernel.cpp
- docs/snippets/example/20_simdKernel.cpp
- docs/snippets/example/22_atomics.cpp
- docs/snippets/example/24_math.cpp
- docs/snippets/example/26_warp.cpp
- docs/snippets/example/28_chunkedFrames.cpp
- docs/snippets/example/30_random.cpp
- docs/snippets/example/31_monteCarloPi.cpp
- docs/snippets/example/32_intrinsics.cpp
- docs/snippets/example/34_memFence.cpp
- docs/snippets/example/36_portingKernel.cpp
- docs/snippets/example/38_vendorInterop.cpp
- docs/snippets/example/40_imagePipeline.cpp
- docs/snippets/example/CMakeLists.txt
- docs/snippets/example/include/docsTest.hpp
- docs/source/advanced/cmake.rst
- docs/source/advanced/datastorage.rst
- docs/source/basic/terms.rst
- docs/source/conf.py
- docs/source/index.rst
- docs/source/tutorial/algorithms.rst
- docs/source/tutorial/atomics.rst
- docs/source/tutorial/backendDifferences.rst
- docs/source/tutorial/chunked.rst
- docs/source/tutorial/device.rst
- docs/source/tutorial/events.rst
- docs/source/tutorial/execution.rst
- docs/source/tutorial/foundations.rst
- docs/source/tutorial/hierarchy.rst
- docs/source/tutorial/intrinsics.rst
- docs/source/tutorial/intro.rst
- docs/source/tutorial/kernel.rst
- docs/source/tutorial/kernels.rst
- docs/source/tutorial/math.rst
- docs/source/tutorial/memFence.rst
- docs/source/tutorial/memory.rst
- docs/source/tutorial/memoryOperations.rst
- docs/source/tutorial/mentalModel.rst
- docs/source/tutorial/migration.rst
- docs/source/tutorial/migrationMap.rst
- docs/source/tutorial/miniProject.rst
- docs/source/tutorial/multidim.rst
- docs/source/tutorial/numerics.rst
- docs/source/tutorial/portingKernel.rst
- docs/source/tutorial/random.rst
- docs/source/tutorial/sharedMemory.rst
- docs/source/tutorial/tuning.rst
- docs/source/tutorial/vendorInterop.rst
- docs/source/tutorial/views.rst
- docs/source/tutorial/warp.rst
💤 Files with no reviewable changes (1)
- docs/snippets/example/CMakeLists.txt
if(tid == 0u)
{
    shared[0] = 10;
    onAcc::memFence(acc, onAcc::scope::block, onAcc::order::release);
    shared[1] = 20;
}

onAcc::syncBlockThreads(acc);

auto observedB = shared[1];
onAcc::memFence(acc, onAcc::scope::block, onAcc::order::acquire);
auto observedA = shared[0];
🧩 Analysis chain
🏁 Script executed:
cat -n docs/snippets/example/34_memFence.cpp
Repository: psychocoderHPC/alpaka3
Length of output: 5362
The second syncBlockThreads() makes the release/acquire fence pair redundant.
The barrier at line 39 ensures all threads synchronize before reading, which means all of thread 0's writes are already visible to other threads regardless of the fence semantics. This renders the release fence at line 35 unnecessary for achieving the intended ordering in this example, turning what should demonstrate fence semantics into a barrier-based example instead.
🧰 Tools
🪛 Cppcheck (2.20.0)
[error] 40-40: Found an exit path from function with non-void return type that has missing return statement
(missingReturn)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/snippets/example/34_memFence.cpp` around lines 32 - 43, The second
barrier (onAcc::syncBlockThreads(acc)) makes the release/acquire memFence pair
redundant; to demonstrate fence semantics replace the barrier-based
synchronization with fence-only ordering by removing the second
onAcc::syncBlockThreads(acc) call (the one immediately before reading observedB)
and keep the release fence in the tid==0 block and the acquire fence before
reading observedA; this preserves shared, tid, onAcc::memFence and the fence
semantics you intend to show.
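For readers weighing this suggestion, the following standard C++ sketch shows a release/acquire fence pair doing the ordering work without any barrier. It is only an analogue: `std::atomic_thread_fence` stands in for `onAcc::memFence`, and `fenceHandOff` is a hypothetical name. Note that fences only order memory accesses; they do not make a reader wait, so the sketch still needs an atomic flag to know when the payload has been published.

```cpp
#include <atomic>
#include <thread>

// Standard C++ analogue of the reviewed pattern; none of these names are alpaka API.
inline int fenceHandOff()
{
    int payload = 0; // plain, non-atomic data
    std::atomic<int> ready{0}; // flag used to publish the payload
    int observed = -1;

    std::thread producer(
        [&]
        {
            payload = 10;
            // Release fence: the payload write cannot be reordered after the flag store below.
            std::atomic_thread_fence(std::memory_order_release);
            ready.store(1, std::memory_order_relaxed);
        });

    std::thread consumer(
        [&]
        {
            // Wait until the producer has set the flag.
            while(ready.load(std::memory_order_relaxed) == 0)
                std::this_thread::yield();
            // Acquire fence: the payload read cannot be reordered before the flag load above.
            std::atomic_thread_fence(std::memory_order_acquire);
            observed = payload;
        });

    producer.join();
    consumer.join();
    return observed;
}
```

If the barrier is removed from the snippet as proposed, the fence pair alone guarantees only that a thread which observes the second write also observes the first; it does not guarantee that either write has happened yet when the reader runs.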
while(onAcc::atomicCas(acc, &readyFlag[0u], 0u, 0u) == 0u)
{
}
Bound the consumer spin loop.
If readyFlag is never published on one backend, this test hangs the docs suite instead of failing. Turn the busy-wait into a bounded poll and report timeout through mismatchCounter.
🛡️ Proposed fix
- while(onAcc::atomicCas(acc, &readyFlag[0u], 0u, 0u) == 0u)
- {
- }
+ uint32_t spins = 0u;
+ while(onAcc::atomicCas(acc, &readyFlag[0u], 0u, 0u) == 0u)
+ {
+     if(++spins == 1000000u)
+     {
+         onAcc::atomicAdd(acc, &mismatchCounter[0u], 1u);
+         return;
+     }
+ }
🧰 Tools
🪛 Cppcheck (2.20.0)
[error] 80-80: Found an exit path from function with non-void return type that has missing return statement
(missingReturn)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/snippets/example/34_memFence.cpp` around lines 79 - 81, The unbounded
consumer spin using onAcc::atomicCas on readyFlag can hang the test; replace the
infinite while loop that spins on onAcc::atomicCas(acc, &readyFlag[0u], 0u, 0u)
with a bounded polling loop that tries a fixed number of iterations (or polls
with a timeout), and if the loop exhausts without success increment
mismatchCounter to report the timeout and break out so the test fails instead of
hanging; ensure you reference and update the same readyFlag index and use
mismatchCounter to record the failure.
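The bounded-poll idea can also be factored into a small helper, sketched here in standard C++ with `std::atomic` in place of the alpaka atomics. `waitForFlag` and `maxSpins` are hypothetical names, and a suitable spin limit depends on the backend.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: polls `flag` at most `maxSpins` times.
// Returns true once the flag is non-zero, false if the limit is reached first.
inline bool waitForFlag(std::atomic<std::uint32_t> const& flag, std::uint32_t maxSpins)
{
    for(std::uint32_t spins = 0u; spins < maxSpins; ++spins)
    {
        // Acquire load so that data published before the flag is visible afterwards.
        if(flag.load(std::memory_order_acquire) != 0u)
            return true;
    }
    return false; // timed out: the caller should record a failure instead of hanging
}
```

A caller would increment its mismatch counter when the helper returns false, which makes a missing publication show up as a failed test rather than a stalled docs suite.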
std::transform(
    input.data(),
    input.data() + input.getExtents().x(),
    outPtr,
    [=](float value) { return scale * value + shift; });
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify current element-count logic and related usage context.
rg -n -C2 'input\.getExtents\(\)\.x\(\)' docs/snippets/example/38_vendorInterop.cpp
rg -n -C2 'AffineTransform::call\(' docs/snippets/example
rg -n -C2 'fnDispatch\(' docs/snippets/example/38_vendorInterop.cpp
Repository: psychocoderHPC/alpaka3
Length of output: 1208
🏁 Script executed:
cat -n docs/snippets/example/38_vendorInterop.cpp | sed -n '60,85p'
Repository: psychocoderHPC/alpaka3
Length of output: 1023
🏁 Script executed:
cat -n docs/snippets/example/38_vendorInterop.cpp | sed -n '40,60p'
Repository: psychocoderHPC/alpaka3
Length of output: 844
🏁 Script executed:
cat -n docs/snippets/example/38_vendorInterop.cpp | sed -n '85,110p'
Repository: psychocoderHPC/alpaka3
Length of output: 1164
🏁 Script executed:
# Check if there are any other 2D/3D usage examples in the file
rg -n 'getExtents\(\)' docs/snippets/example/38_vendorInterop.cpp
# Also check alpaka::onHost::transform to understand how it handles multidimensional spans
rg -n 'onHost::transform' docs/snippets/example/38_vendorInterop.cpp
Repository: psychocoderHPC/alpaka3
Length of output: 165
Host dispatch truncates multidimensional inputs.
The function at lines 63-81 accepts any alpaka::concepts::IMdSpan (including 2D/3D), but at line 77 it processes only input.getExtents().x() elements, so the elements along the remaining dimensions are never transformed. The alpaka fallback at lines 44-57 handles multidimensional spans correctly via alpaka::onHost::transform. Either constrain this overload explicitly to 1D spans, or compute the total element count from all extents before calling std::transform.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/snippets/example/38_vendorInterop.cpp` around lines 75 - 79, The
host-path currently uses std::transform over input.getExtents().x(), which
truncates multidimensional alpaka::concepts::IMdSpan inputs; either restrict
this overload to 1D spans or compute the full element count by multiplying all
dimensions from input.getExtents() and use that count (and appropriate pointer
arithmetic) in std::transform; locate the host dispatch that uses input,
input.getExtents().x(), outPtr and the lambda and change it to validate/require
a 1D IMdSpan or replace input.getExtents().x() with the product of all extents
(or call alpaka::onHost::transform like the fallback) so all elements are
transformed.
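The second option (use the product of all extents) can be sketched in standard C++ as follows. This assumes a contiguous, unpadded buffer; for pitched or strided storage a flat `std::transform` over the product would not be correct. `elementCount` and `affine` are hypothetical names, and `std::array` stands in for the extents object.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <vector>

// Total number of elements described by an N-dimensional extent.
template<std::size_t N>
std::size_t elementCount(std::array<std::size_t, N> const& extents)
{
    return std::accumulate(extents.begin(), extents.end(), std::size_t{1}, std::multiplies<>{});
}

// Applies scale * x + shift to every element of a contiguous N-dimensional buffer.
template<std::size_t N>
std::vector<float> affine(
    std::vector<float> const& in,
    std::array<std::size_t, N> const& extents,
    float scale,
    float shift)
{
    std::size_t const count = elementCount(extents);
    if(in.size() < count)
        throw std::invalid_argument("input buffer is smaller than the extents require");

    std::vector<float> out(count);
    // Iterate over all elements, not just those along one dimension.
    std::transform(
        in.begin(),
        in.begin() + static_cast<std::ptrdiff_t>(count),
        out.begin(),
        [=](float value) { return scale * value + shift; });
    return out;
}
```

With extents {2, 3} this touches all six elements, whereas using only one extent would stop after that dimension's length.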
🧹 Nitpick comments (1)
docs/snippets/example/16_sharedMemory.cpp (1)
149-157: Avoid assert-only sizing logic for dynamic shared memory.
At Line 154, divisibility is enforced only via assert. In release builds, this can under-allocate shared memory when extents aren’t evenly divisible by block count, which can cascade into OOB accesses in DynamicScaleKernel.
Proposed hardening
 uint32_t operator()(auto const executor, auto const& out, auto const& in, int factor) const
 {
     alpaka::unused(executor, out, in, factor);
     auto const totalCachedElements = in.getExtents().product();
     auto const numBlocks = m_spec.getNumBlocks().product();
-    assert(totalCachedElements % numBlocks == 0u);
-    auto const cachedFrameExtent = totalCachedElements / numBlocks;
+    if(numBlocks == 0u)
+        return 0u;
+    assert(totalCachedElements % numBlocks == 0u);
+    auto const cachedFrameExtent = (totalCachedElements + numBlocks - 1u) / numBlocks;
     return static_cast<uint32_t>(cachedFrameExtent * sizeof(int));
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/snippets/example/16_sharedMemory.cpp` around lines 149 - 157, The sizing currently relies on an assert inside operator()(...) which is removed in release builds and can under-allocate dynamic shared memory; replace the assert-based divisibility assumption with a safe runtime computation: compute numBlocks = m_spec.getNumBlocks().product() and cachedFrameExtent = (totalCachedElements + numBlocks - 1) / numBlocks (i.e., round up) to guarantee enough shared memory, and optionally check for numBlocks == 0 and handle/throw/log; update the return to use the rounded-up cachedFrameExtent * sizeof(int) so DynamicScaleKernel never under-allocates.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@docs/snippets/example/16_sharedMemory.cpp`:
- Around line 149-157: The sizing currently relies on an assert inside
operator()(...) which is removed in release builds and can under-allocate
dynamic shared memory; replace the assert-based divisibility assumption with a
safe runtime computation: compute numBlocks = m_spec.getNumBlocks().product()
and cachedFrameExtent = (totalCachedElements + numBlocks - 1) / numBlocks (i.e.,
round up) to guarantee enough shared memory, and optionally check for numBlocks
== 0 and handle/throw/log; update the return to use the rounded-up
cachedFrameExtent * sizeof(int) so DynamicScaleKernel never under-allocates.
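The round-up arithmetic in this suggestion can be checked in isolation with a small standard C++ sketch. `sharedBytesPerBlock` is a hypothetical helper, not the functor from 16_sharedMemory.cpp, and it takes plain integers instead of the extents and frame specification used there.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes of shared memory needed per block when
// `totalElements` are spread over `numBlocks` blocks, rounding up so that
// a non-divisible element count never under-allocates.
constexpr std::uint32_t sharedBytesPerBlock(
    std::size_t totalElements,
    std::size_t numBlocks,
    std::size_t elementSize)
{
    if(numBlocks == 0u)
        return 0u; // nothing is launched, so nothing needs to be cached

    // Ceiling division: (a + b - 1) / b.
    std::size_t const elementsPerBlock = (totalElements + numBlocks - 1u) / numBlocks;
    return static_cast<std::uint32_t>(elementsPerBlock * elementSize);
}
```

For 10 elements over 4 blocks this reserves room for 3 elements per block, where truncating division would reserve only 2. When the count divides evenly, both forms agree, so the round-up only changes behaviour in the case the assert was meant to catch.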
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 5bc28376-006a-4887-aaa5-a8b5a169d4d1
📒 Files selected for processing (38)
- docs/snippets/example/06_queue.cpp
- docs/snippets/example/14_algorithms.cpp
- docs/snippets/example/16_sharedMemory.cpp
- docs/snippets/example/28_chunkedFrames.cpp
- docs/source/advanced/cmake.rst
- docs/source/advanced/datastorage.rst
- docs/source/basic/cheatsheet.rst
- docs/source/basic/example.rst
- docs/source/basic/install.rst
- docs/source/basic/library.rst
- docs/source/basic/terms.rst
- docs/source/contribution/sphinx.rst
- docs/source/contribution/tools.rst
- docs/source/dev/logging.rst
- docs/source/tutorial/algorithms.rst
- docs/source/tutorial/atomics.rst
- docs/source/tutorial/backendDifferences.rst
- docs/source/tutorial/chunked.rst
- docs/source/tutorial/device.rst
- docs/source/tutorial/events.rst
- docs/source/tutorial/execution.rst
- docs/source/tutorial/hierarchy.rst
- docs/source/tutorial/intrinsics.rst
- docs/source/tutorial/kernel.rst
- docs/source/tutorial/math.rst
- docs/source/tutorial/memFence.rst
- docs/source/tutorial/memory.rst
- docs/source/tutorial/memoryOperations.rst
- docs/source/tutorial/miniProject.rst
- docs/source/tutorial/multidim.rst
- docs/source/tutorial/portingKernel.rst
- docs/source/tutorial/queue.rst
- docs/source/tutorial/random.rst
- docs/source/tutorial/sharedMemory.rst
- docs/source/tutorial/vector.rst
- docs/source/tutorial/vendorInterop.rst
- docs/source/tutorial/views.rst
- docs/source/tutorial/warp.rst
💤 Files with no reviewable changes (4)
- docs/source/contribution/sphinx.rst
- docs/source/dev/logging.rst
- docs/source/contribution/tools.rst
- docs/source/basic/library.rst
✅ Files skipped from review due to trivial changes (28)
- docs/source/basic/example.rst
- docs/source/basic/install.rst
- docs/source/basic/cheatsheet.rst
- docs/source/tutorial/device.rst
- docs/source/tutorial/queue.rst
- docs/source/tutorial/vector.rst
- docs/source/advanced/datastorage.rst
- docs/source/tutorial/events.rst
- docs/source/tutorial/miniProject.rst
- docs/source/tutorial/multidim.rst
- docs/source/tutorial/hierarchy.rst
- docs/source/tutorial/atomics.rst
- docs/source/tutorial/math.rst
- docs/source/tutorial/warp.rst
- docs/source/tutorial/backendDifferences.rst
- docs/source/tutorial/views.rst
- docs/source/tutorial/vendorInterop.rst
- docs/source/tutorial/memFence.rst
- docs/source/tutorial/chunked.rst
- docs/source/tutorial/execution.rst
- docs/source/tutorial/kernel.rst
- docs/source/tutorial/sharedMemory.rst
- docs/source/tutorial/portingKernel.rst
- docs/source/tutorial/algorithms.rst
- docs/source/tutorial/intrinsics.rst
- docs/source/tutorial/random.rst
- docs/source/advanced/cmake.rst
- docs/source/tutorial/memoryOperations.rst
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/source/basic/terms.rst
- docs/source/tutorial/memory.rst
Summary by CodeRabbit
New Features
Documentation
Chores