Fix CL_INVALID_WORK_ITEM_SIZE for "reduce" test #143
bashbaug merged 2 commits into KhronosGroup:main
bashbaug left a comment:
Interesting. Are you able to disclose which device needs this change?
Regardless, could you please make the same change to the C++ version of this sample also, in main.cpp? I think it would have the same problem. Thanks!
The work-group size (WGS) may exceed the maximum allowed per-dimension limit reported by CL_DEVICE_MAX_WORK_ITEM_SIZES[0], leading to clEnqueueNDRangeKernel failing with CL_INVALID_WORK_ITEM_SIZE. This patch queries the device's maximum work-item sizes and clamps WGS to the valid maximum for the x-dimension. This ensures correct behavior on devices with smaller limits and improves portability. Signed-off-by: Xin Jin <xin.jin@arm.com>
The work-group size (WGS) may exceed the maximum allowed per-dimension limit reported by CL_DEVICE_MAX_WORK_ITEM_SIZES[0], leading to clEnqueueNDRangeKernel failing with CL_INVALID_WORK_ITEM_SIZE. The reduce sample launches kernels with a 1-D NDRange over a flat array of integers, so only the x-dimension limit is relevant. This patch queries the device's maximum work-item sizes and clamps WGS to the valid maximum for the x-dimension. This ensures correct behavior on devices with smaller limits and improves portability. Signed-off-by: Xin Jin <xin.jin@arm.com>
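The clamp described in the commit message can be sketched roughly as follows. This is a minimal illustration rather than the code from the patch: the helper name clamp_wgs_x is hypothetical, and the array is assumed to have been filled by the host with clGetDeviceInfo and CL_DEVICE_MAX_WORK_ITEM_SIZES before the call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not taken from the patch): clamp a requested 1-D
 * work-group size to the device's x-dimension limit.
 *
 * max_item_sizes is assumed to be the array the host filled with
 *   clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_ITEM_SIZES,
 *                   sizeof(size_t) * dims, max_item_sizes, NULL);
 * where dims comes from CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS. */
static size_t clamp_wgs_x(size_t wgs, const size_t *max_item_sizes)
{
    assert(max_item_sizes != NULL);

    /* The sample uses a 1-D NDRange, so only entry [0] matters. */
    const size_t limit_x = max_item_sizes[0];
    return (wgs > limit_x) ? limit_x : wgs;
}
```

With a reported x-dimension limit of 64, for instance, a requested WGS of 256 would be reduced to 64, while a requested WGS of 32 would be left unchanged.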
Force-pushed from 007315a to 4728311.
Thanks for the review, and sorry for the delay. This issue was observed on a device where the requested WGS exceeded the per-dimension limit (CL_DEVICE_MAX_WORK_ITEM_SIZES[0]). To improve portability, the fix clamps the WGS to the valid maximum, independent of the specific device. I’ve also applied the same change to the C++ version (reduce.cpp), as suggested; it is included in the second commit of this PR.
bashbaug left a comment:
Thanks, sorry for taking so long! LGTM.
This PR fixes issues in the reduce sample where the chosen work-group size (WGS) could exceed the device’s per-dimension maximum (CL_DEVICE_MAX_WORK_ITEM_SIZES[0]), causing clEnqueueNDRangeKernel to fail with CL_INVALID_WORK_ITEM_SIZE.
The reduce sample has two implementations (C and C++), and both required fixes: the C version now clamps WGS against the device’s maximum, and the same clamp is applied to the reduce.cpp application, ensuring consistent behavior.
Together, these patches make both variants of the reduce sample more robust and portable across a wider range of OpenCL devices.
Signed-off-by: Xin Jin <xin.jin@arm.com>