
feat(qwen3, cpu): add support for Qwen3 model on x86 architecture#561

Merged
chenghuaWang merged 8 commits into UbiquitousLearning:main from HayzelHan:feat/x86-qwen3
Dec 13, 2025

Conversation

@HayzelHan (Contributor) commented Dec 5, 2025

  • Update Qwen3 model implementation with FlashAttention2 support and enable compilation options
  • Fix rmsnorm and softmax operators and refine quantization parameters
  • Add support for elementwise operator
  • Add NYI messages for unsupported operations

Summary by CodeRabbit

  • New Features

    • Enabled additional example models (Qwen3, Qwen2 variants, LLaMA, DeepSeek OCR).
    • Added runtime-dispatched FP32 elementwise operations (vector and scalar) with x86 support.
  • Improvements

    • Added x86 build/test workflow.
    • Optimized elementwise and softmax/rmsnorm memory access for unaligned loads/stores and x86 paths.
    • Introduced x86-specific execution paths and NYI fallbacks where applicable.
  • Documentation

    • Added comprehensive x86 CPU backend guide.


@coderabbitai bot (Contributor) commented Dec 5, 2025

Walkthrough

Introduces HWY-based dynamic dispatch and x86-focused FP32 element-wise kernels (vector and scalar), adds kernel dispatch headers/impl, switches some x86 memory accesses to unaligned variants, updates quantize FP16 type, gates several ops by architecture, enables example builds, and adds an x86 CI workflow.
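The dispatch flow described above can be sketched in heavily simplified form as a function-pointer dispatcher. The names `EwAddFn` and `select_ew_add` below are hypothetical illustrations; Highway's real mechanism re-includes the kernel header via foreach_target.h and resolves the best compiled target with HWY_EXPORT/HWY_DYNAMIC_DISPATCH, which this sketch does not reproduce.

```cpp
#include <cstddef>

// Hypothetical sketch of runtime kernel dispatch, loosely mirroring the
// call_elewise_*_fp32 wrappers described above. A plain function pointer
// stands in for Highway's target table.
using EwAddFn = void (*)(float* out, const float* x, const float* y, size_t n);

namespace scalar {
// Portable fallback: one element per iteration.
inline void ew_add_fp32(float* out, const float* x, const float* y, size_t n) {
  for (size_t i = 0; i < n; ++i) out[i] = x[i] + y[i];
}
}  // namespace scalar

// Select an implementation once; a real dispatcher would probe CPU features
// (SSE4/AVX2/AVX-512) and pick the widest supported kernel.
inline EwAddFn select_ew_add() {
  return &scalar::ew_add_fp32;  // assumption: always use the scalar fallback here
}

// Public wrapper analogous in spirit to call_elewise_add_fp32.
inline void call_elewise_add_fp32(float* out, const float* x, const float* y, size_t n) {
  static const EwAddFn fn = select_ew_add();  // resolved on first call
  fn(out, x, y, n);
}
```

The key property this models is that callers pay the selection cost once and every later call goes straight through the cached pointer.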

Changes

Cohort / File(s) Summary
Build System / Examples
examples/CMakeLists.txt, examples/qwen3/main.cpp
Uncommented several example subdirs and switched Qwen3 include from modeling_qwen3.hpp to modeling_qwen3_fa2.hpp.
Kernel Dispatch API
mllm/backends/cpu/kernels/common/kernel_dispatch.hpp, mllm/backends/cpu/kernels/common/kernel_dispatch.cpp
New HWY dynamic-dispatch header and implementation exposing/exporting FP32 elementwise vector and scalar dispatch wrappers (call_elewise_*_fp32) in mllm::cpu::common.
Element-wise Implementations
mllm/backends/cpu/kernels/common/elewise-inl.hpp
Renamed internal routines; added FP32 out-parameter elementwise functions, scalar right-hand-side implementations, and operator functors; added FP32 scalar wrappers.
Kernel Includes
mllm/backends/cpu/kernels/Kernels.hpp
Added includes for kernel_dispatch.hpp in platform-free and other include regions.
Elewise Dispatch/Usage
mllm/backends/cpu/ops/ElewiseOps.cpp
Added x86/x86_64 branches preferring HWY-accelerated FP32 elementwise and scalar paths, with ARM as fallback; NYI stubs for several ops on x86.
Kernel Integration
mllm/backends/cpu/kernels/common/*
Integrated elewise-inl.hpp via the new dispatch mechanism; influenced kernel compilation per-target.
x86 Memory Access
mllm/backends/cpu/kernels/x86/rmsnorm.cpp, mllm/backends/cpu/kernels/x86/softmax.cpp
Replaced aligned Load/Store intrinsics with unaligned variants (LoadU/StoreU).
Quantize Type Update
mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp
Changed lookup_fp16_to_fp32 parameter from uint16_t to mllm_fp16_t.
Linear & Transpose Ops
mllm/backends/cpu/ops/LinearOp.cpp, mllm/backends/cpu/ops/TransposeOp.cpp
Added architecture guards: x86 emits NYI for some impl types/variants; ARM paths preserved.
CI/CD
.github/workflows/build-x86.yml
New GitHub Actions workflow to test Linux x86_64 compilation using tasks/build_x86.yaml.
Docs
docs/cpu_backend/x86/index.rst
Added comprehensive documentation for the X86 CPU backend (overview, build/run, performance).

Sequence Diagram(s)

sequenceDiagram
    participant Op as ElewiseOps (caller)
    participant Disp as kernel_dispatch (call_elewise_*_fp32)
    participant Impl as HWY target impl
    participant Mem as Memory/input buffers

    Op->>Disp: call_elewise_add_fp32(out, x, y, n)
    Disp->>Impl: dispatch to selected HWY target (per-target include)
    Impl->>Mem: LoadU x/y segments
    Impl->>Impl: vectorized compute (add / scalar op)
    Impl->>Mem: StoreU results to out
    Disp-->>Op: return

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Areas needing extra attention:
    • elewise-inl.hpp: scalar operator correctness, tail handling, and API changes (out-parameter semantics).
    • kernel_dispatch.*: HWY per-target re-inclusion pattern and exported symbol correctness.
    • Call sites across the codebase: ensure all callers use new call_elewise_*_fp32 signatures.
    • Architecture guards in ElewiseOps.cpp, LinearOp.cpp, TransposeOp.cpp and NYI placement.
    • Softmax/rmsnorm LoadU/StoreU changes for correctness on unaligned inputs.

Possibly related PRs

Suggested reviewers

  • xumengwei
  • liang1232018
  • yirongjie
  • oreomaker

Poem

🐇 Hop, hop — a highway made of code,

Vectors dance where scalars strode,
Dispatch calls, the kernels hum,
x86 joins, the rabbits drum,
Build and run — the SIMD road.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required 80.00% threshold. Run @coderabbitai generate docstrings to improve coverage.
  • Linked Issues — ❓ Inconclusive: PR 561 aligns with adding x86 architecture support, but no explicit issue links or tracking references are documented. Link relevant issues or the epic tracking x86 support to improve traceability and context.
✅ Passed checks (3 passed)
  • Title — ✅ Passed: the title accurately reflects the main feature addition (Qwen3 model support on x86 architecture), the primary focus of this changeset.
  • Description — ✅ Passed: the description provides a clear bullet-point summary of key changes (Qwen3 with FlashAttention2, operator fixes, elementwise support, NYI messages), but lacks the structured template format recommended in the repository guidelines.
  • Out of Scope Changes — ✅ Passed: changes are well-scoped; Qwen3 model updates, operator fixes, elementwise support, and x86 compilation enablement align with the stated PR objectives.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
mllm/backends/cpu/ops/TransposeOp.cpp (3)

26-53: Align x86 behaviour and int64 handling across transpose special-cases

The new x86/x86_64 branches for kFloat32/kFloat16 in CASE 1–3 correctly surface unsupported patterns via NYI, while ARM keeps using the optimized kernels. However, for kInt64 in these cases there is still only an ARM #if and no explicit x86 handling. On an x86 build, those kInt64 cases effectively become:

case kInt64: {
  // no call
  break;
}

which silently turns the transpose into a no-op for int64 tensors instead of failing fast.

To avoid silent data corruption when an int64 transpose is accidentally used on x86, consider adding explicit x86 branches (or wrapping the whole case kInt64 in the same #if/#elif structure) so that x86 either:

  • Calls a generic non-ARM fallback, or
  • Emits a clear NYI("Transpose op(...) int64 not supported in x86").

This same pattern applies to CASE 1 (HW→WH), CASE 2 (BSHD→BHSD) and CASE 3 (last-two-dims) kInt64 branches.

Also applies to: 55-85, 87-121


87-91: Fix batch accumulation bug in CASE 3 (last-dims transpose)

In CASE 3 you compute batch as:

int batch = 0;
for (int i = 0; i < input_shape.size() - 2; i++) { batch *= input_shape[i]; }

Because batch starts at 0 and is only multiplied, batch is always 0, so the transpose_last_dims_* kernels will never process any batch element. This makes the entire CASE 3 path a no-op for all dtypes/architectures.

Initialize batch to 1 instead so it correctly accumulates the product of the leading dimensions, e.g.:

-    int batch = 0;
-    for (int i = 0; i < input_shape.size() - 2; i++) { batch *= input_shape[i]; }
+    int batch = 1;
+    for (int i = 0; i < input_shape.size() - 2; i++) { batch *= input_shape[i]; }

This is independent of the existing FIXME about (dim0 + dim1 == 1) and should be fixed regardless.
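The bug and its fix come down to seeding the product with the multiplicative identity. A tiny standalone helper (leading_dims_product is a hypothetical name for illustration) makes the behavior easy to verify:

```cpp
#include <cstddef>
#include <vector>

// Minimal illustration of the CASE 3 batch computation. Seeded with 1,
// the leading dimensions accumulate correctly; the buggy version seeded
// with 0 would always return 0 and skip every batch element.
inline int leading_dims_product(const std::vector<int>& shape) {
  int batch = 1;  // must start at 1, not 0
  for (size_t i = 0; i + 2 < shape.size(); ++i) batch *= shape[i];
  return batch;
}
```

For a shape {2, 3, 4, 5} the last two dims are transposed, so the batch count is 2 * 3 = 6; for a plain 2-D shape the product over zero leading dims is 1.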


123-153: General permute CASE 4 is a silent no-op on x86; add explicit x86 handling / NYI

In CASE 4 (general permute), the dtype branches are guarded only by ARM macros, for example:

case kFloat32: {
#if defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
  arm::permute_fp32(...);
#endif
  break;
}

On x86/x86_64, the #if body is compiled out, so the code effectively just hits break; with no kernel call and no NYI, meaning the operation becomes a silent no-op and leaves output unpermuted. That’s more dangerous than the explicit NYI behaviour you added in CASE 1–3.

To align behaviour and avoid silent incorrect results on x86, consider restructuring these branches similar to CASE 1–3, e.g.:

       case kFloat32: {
-#if defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
-        arm::permute_fp32(input.ptr<mllm_fp32_t>(), output.ptr<mllm_fp32_t>(), input_shape.data(), permute_axis.data(),
-                          permute_axis.size());
-#endif
+ #if defined(MLLM_HOST_ARCH_X86_64) || defined(MLLM_HOST_ARCH_X86)
+        NYI("Transpose op(general permute) fp32 not supported in x86");
+ #elif defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
+        arm::permute_fp32(input.ptr<mllm_fp32_t>(), output.ptr<mllm_fp32_t>(), input_shape.data(), permute_axis.data(),
+                          permute_axis.size());
+ #endif
         break;
       }
       case kFloat16: {
-#if defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
-        arm::permute_fp16(input.ptr<mllm_fp16_t>(), output.ptr<mllm_fp16_t>(), input_shape.data(), permute_axis.data(),
-                          permute_axis.size());
-#endif
+ #if defined(MLLM_HOST_ARCH_X86_64) || defined(MLLM_HOST_ARCH_X86)
+        NYI("Transpose op(general permute) fp16 not supported in x86");
+ #elif defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
+        arm::permute_fp16(input.ptr<mllm_fp16_t>(), output.ptr<mllm_fp16_t>(), input_shape.data(), permute_axis.data(),
+                          permute_axis.size());
+ #endif
         break;
       }
       case kInt64: {
-#if defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
-        arm::permute_generic<mllm_int64_t>(input.ptr<mllm_int64_t>(), output.ptr<mllm_int64_t>(), input_shape.data(),
-                                           permute_axis.data(), permute_axis.size());
-#endif
+ #if defined(MLLM_HOST_ARCH_X86_64) || defined(MLLM_HOST_ARCH_X86)
+        NYI("Transpose op(general permute) int64 not supported in x86");
+ #elif defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
+        arm::permute_generic<mllm_int64_t>(input.ptr<mllm_int64_t>(), output.ptr<mllm_int64_t>(), input_shape.data(),
+                                           permute_axis.data(), permute_axis.size());
+ #endif
         break;
       }

(or route x86 to a generic non-ARM permute implementation if/when one exists). This keeps x86 semantics explicit and consistent with the earlier transpose special-cases.

🧹 Nitpick comments (2)
mllm/backends/cpu/ops/ElewiseOps.cpp (1)

143-157: Consider adding a fallback for unsupported platforms.

The code has branches for x86 and ARM but no fallback else clause. On platforms that are neither x86 nor ARM (e.g., RISC-V, WebAssembly), these operations will silently do nothing, leading to incorrect results without any warning.

Consider adding a fallback path or NYI message:

 #elif defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
         cpu::arm::ew_add_fp32(output.ptr<mllm_fp32_t>(), input0.ptr<mllm_fp32_t>(), input1.ptr<mllm_fp32_t>(), output.numel(),
                               options_.getThreads());
+#else
+        NYI("AddOp not supported on this platform.");
 #endif
mllm/backends/cpu/kernels/common/elewise-inl.hpp (1)

113-139: Consider reusing existing operator structs to eliminate duplication.

The scalar operator structs (AddScalarOp, SubScalarOp, MulScalarOp, DivScalarOp) are identical to the non-scalar operators (AddOp, SubOp, MulOp, DivOp) defined at lines 39-65. Since the operators work on vector types and don't distinguish between scalar-broadcast and element-wise operations, you could directly reuse the existing operators.

For example, update the scalar functions to reuse existing operators:

-template<typename T>
-HWY_NOINLINE HWY_MAYBE_UNUSED void element_wise_add_scalar(T* out, const T* x, const T* y, size_t n) {
-  __elementwise_scalar(out, x, y, n, AddScalarOp{});
-}
+template<typename T>
+HWY_NOINLINE HWY_MAYBE_UNUSED void element_wise_add_scalar(T* out, const T* x, const T* y, size_t n) {
+  __elementwise_scalar(out, x, y, n, AddOp{});
+}

Then remove the duplicate scalar operator structs (lines 113-139).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e917a70 and 2eb1586.

📒 Files selected for processing (11)
  • examples/CMakeLists.txt (1 hunks)
  • examples/qwen3/main.cpp (1 hunks)
  • mllm/backends/cpu/kernels/Kernels.hpp (1 hunks)
  • mllm/backends/cpu/kernels/common/elewise-inl.hpp (1 hunks)
  • mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp (1 hunks)
  • mllm/backends/cpu/kernels/x86/rmsnorm.cpp (1 hunks)
  • mllm/backends/cpu/kernels/x86/softmax.cpp (3 hunks)
  • mllm/backends/cpu/ops/ElewiseOps.cpp (11 hunks)
  • mllm/backends/cpu/ops/LinearOp.cpp (1 hunks)
  • mllm/backends/cpu/ops/TransposeOp.cpp (3 hunks)
  • mllm/models/qwen3/modeling_qwen3_fa2.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp
  • mllm/backends/cpu/ops/ElewiseOps.cpp
  • mllm/backends/cpu/kernels/Kernels.hpp
  • mllm/models/qwen3/modeling_qwen3_fa2.hpp
  • mllm/backends/cpu/kernels/x86/rmsnorm.cpp
  • mllm/backends/cpu/kernels/common/elewise-inl.hpp
  • mllm/backends/cpu/ops/LinearOp.cpp
  • mllm/backends/cpu/ops/TransposeOp.cpp
  • mllm/backends/cpu/kernels/x86/softmax.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp
  • mllm/backends/cpu/ops/ElewiseOps.cpp
  • mllm/backends/cpu/kernels/Kernels.hpp
  • mllm/models/qwen3/modeling_qwen3_fa2.hpp
  • mllm/backends/cpu/kernels/x86/rmsnorm.cpp
  • mllm/backends/cpu/kernels/common/elewise-inl.hpp
  • mllm/backends/cpu/ops/LinearOp.cpp
  • mllm/backends/cpu/ops/TransposeOp.cpp
  • mllm/backends/cpu/kernels/x86/softmax.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp
  • mllm/backends/cpu/ops/ElewiseOps.cpp
  • mllm/backends/cpu/kernels/Kernels.hpp
  • mllm/models/qwen3/modeling_qwen3_fa2.hpp
  • mllm/backends/cpu/kernels/x86/rmsnorm.cpp
  • mllm/backends/cpu/kernels/common/elewise-inl.hpp
  • mllm/backends/cpu/ops/LinearOp.cpp
  • mllm/backends/cpu/ops/TransposeOp.cpp
  • mllm/backends/cpu/kernels/x86/softmax.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/cpu/ops/ElewiseOps.cpp
  • mllm/backends/cpu/kernels/x86/rmsnorm.cpp
  • mllm/backends/cpu/ops/LinearOp.cpp
  • mllm/backends/cpu/ops/TransposeOp.cpp
  • mllm/backends/cpu/kernels/x86/softmax.cpp
🧬 Code graph analysis (3)
mllm/backends/cpu/ops/ElewiseOps.cpp (2)
mllm/backends/cpu/kernels/common/elewise-inl.hpp (8)
  • element_wise_add (68-68)
  • element_wise_add_scalar (142-142)
  • element_wise_sub (73-73)
  • element_wise_sub_scalar (147-147)
  • element_wise_mul (78-78)
  • element_wise_mul_scalar (152-152)
  • element_wise_div (83-83)
  • element_wise_div_scalar (157-157)
mllm/backends/cpu/kernels/arm/elementwise.hpp (8)
  • ew_add_fp32 (262-263)
  • ew_add_fp32_scalar (324-325)
  • ew_sub_fp32 (265-266)
  • ew_sub_fp32_scalar (327-328)
  • ew_mul_fp32 (268-269)
  • ew_mul_fp32_scalar (330-331)
  • ew_div_fp32 (271-272)
  • ew_div_fp32_scalar (333-334)
mllm/backends/cpu/ops/TransposeOp.cpp (1)
mllm/backends/cpu/kernels/arm/transpose.cpp (6)
  • transpose_hw_wh_fp32 (13-46)
  • transpose_hw_wh_fp32 (13-13)
  • transpose_bshd_bhsd_fp32 (48-73)
  • transpose_bshd_bhsd_fp32 (48-49)
  • transpose_last_dims_fp32 (160-200)
  • transpose_last_dims_fp32 (160-161)
mllm/backends/cpu/kernels/x86/softmax.cpp (1)
mllm/backends/cpu/kernels/x86/math.hpp (1)
  • vexpq_fast_f32 (19-19)
🪛 Clang (14.0.6)
mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp

[error] 114-114: parameter name 'f' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/cpu/kernels/common/elewise-inl.hpp

[error] lines 68, 73, 78, 83, 142, 147, 152, 157 (element_wise_add/sub/mul/div and their _scalar variants): variable 'HWY_MAYBE_UNUSED' is non-const and globally accessible, consider making it const (cppcoreguidelines-avoid-non-const-global-variables); invalid case style for variable 'HWY_MAYBE_UNUSED' (readability-identifier-naming); 2 adjacent parameters of similar type ('const int *') are easily swapped by mistake (bugprone-easily-swappable-parameters); parameter names 'x' and 'y' are too short, expected at least 3 characters (readability-identifier-length)

[error] line 92: declaration uses identifier '__elementwise_scalar', which is a reserved identifier (bugprone-reserved-identifier)

[error] lines 93-95, 98, 107, 108: variables 'd', 'N', 'idx', 'sVec', 'vx', 'result' are not initialized (cppcoreguidelines-init-variables); names 'd', 'N', 'vx' are too short, expected at least 3 characters (readability-identifier-length); 'N' and 'sVec' use invalid case style (readability-identifier-naming)

🔇 Additional comments (13)
mllm/backends/cpu/ops/LinearOp.cpp (1)

293-309: Architecture-guarded NYI for x86 is acceptable, but verify default fallback behavior.

The explicit NYI for x86 is a reasonable approach to mark unsupported implementations. However, note that when MLLM_USE_BLAS is not defined and K < 4 (lines 119-124), kDefault auto-selects kMllmBlas, which will hit this NYI on x86 at runtime.

Consider whether x86 without BLAS should default to kGGUF regardless of K size to avoid the unexpected NYI:

 #else
     if (K >= 4) {
       impl_type = aops::LinearImplTypes::kGGUF;
-    } else
-    // All fallback to mllm blas
-    {
+    } else {
+#if defined(MLLM_HOST_ARCH_X86_64) || defined(MLLM_HOST_ARCH_X86)
+      impl_type = aops::LinearImplTypes::kGGUF;  // x86 lacks kMllmBlas support
+#else
       impl_type = aops::LinearImplTypes::kMllmBlas;
+#endif
     }
 #endif

Alternatively, if hitting NYI for small-K x86 scenarios is intentional as a known limitation, this is fine as-is.

examples/CMakeLists.txt (1)

8-9: LGTM!

Enabling qwen3 and qwen3_service build targets aligns with the PR objective to add Qwen3 model support on x86 architecture.

mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp (1)

114-127: Signature update looks correct.

The parameter type change from uint16_t to mllm_fp16_t improves type consistency with the FP16 abstraction used elsewhere in the codebase.

Note: The lazy initialization of table_f32_f16 (lines 115-122) has a pre-existing race condition if called from multiple threads simultaneously. Consider using std::call_once or static initialization if thread safety becomes a concern.
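A minimal sketch of the std::call_once approach suggested above, assuming a placeholder conversion (the real table would be filled from ggml's actual FP16-to-FP32 conversion, which is stubbed out here for illustration):

```cpp
#include <array>
#include <cstdint>
#include <mutex>

namespace demo {
// 65536-entry FP16 bit-pattern -> FP32 lookup table (requires C++17 for
// inline variables).
inline std::array<float, 1 << 16> table_f32_f16;
inline std::once_flag table_once;

// Placeholder conversion for illustration only; not ggml's real half->float.
inline float fp16_bits_to_fp32_stub(uint16_t bits) {
  return static_cast<float>(bits);
}

// Thread-safe lazy initialization: std::call_once guarantees the table is
// filled exactly once, even under concurrent first calls.
inline float lookup_fp16_to_fp32(uint16_t h) {
  std::call_once(table_once, [] {
    for (uint32_t i = 0; i < table_f32_f16.size(); ++i) {
      table_f32_f16[i] = fp16_bits_to_fp32_stub(static_cast<uint16_t>(i));
    }
  });
  return table_f32_f16[h];
}
}  // namespace demo
```

The trade-off versus an unguarded lazy check is one atomic fast-path test per call; static local initialization of the whole table would achieve the same guarantee with zero per-call cost if eager filling at first use is acceptable.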

examples/qwen3/main.cpp (1)

4-4: LGTM!

Header updated to use the FlashAttention2 variant (modeling_qwen3_fa2.hpp), aligning with the PR objective to add FA2 support for Qwen3.

mllm/backends/cpu/kernels/Kernels.hpp (1)

40-40: LGTM!

Adding the elewise-inl.hpp include for non-ARM platforms enables the HWY-accelerated element-wise operations used by the x86 path in ElewiseOps.cpp.

mllm/backends/cpu/kernels/x86/softmax.cpp (1)

38-65: LGTM!

Switching to LoadU/StoreU for unaligned memory access is the correct approach when alignment of input/output buffers cannot be guaranteed. This improves robustness and prevents potential segfaults on x86 when data is not aligned to SIMD register boundaries.
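For context, the pattern can be shown with SSE intrinsics (x86-only; add_four_unaligned is an illustrative helper, not code from this PR): the unaligned variants _mm_loadu_ps/_mm_storeu_ps accept any float pointer, whereas the aligned _mm_load_ps/_mm_store_ps require 16-byte alignment and may fault otherwise.

```cpp
#include <immintrin.h>  // SSE intrinsics; x86/x86_64 only

// Adds four floats from two possibly misaligned sources into a possibly
// misaligned destination. Using the *u* variants makes this safe regardless
// of buffer alignment, at a small (often negligible on modern CPUs) cost.
inline void add_four_unaligned(float* out, const float* x, const float* y) {
  __m128 vx = _mm_loadu_ps(x);  // no alignment requirement
  __m128 vy = _mm_loadu_ps(y);
  _mm_storeu_ps(out, _mm_add_ps(vx, vy));
}
```

On recent x86 microarchitectures, unaligned loads on actually aligned data run at essentially the same speed as aligned loads, so defaulting to LoadU/StoreU mainly buys robustness.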

mllm/backends/cpu/kernels/x86/rmsnorm.cpp (1)

30-45: LGTM!

Consistent with the softmax changes, switching to unaligned LoadU/StoreU operations ensures robustness for input buffers that may not be aligned to SIMD boundaries. Both the add_unit_offset branches are updated consistently.

mllm/models/qwen3/modeling_qwen3_fa2.hpp (1)

262-266: Verify model weight compatibility with the new registration name.

The registration name changed from "lm_head_out" to "lm_head". This is a more conventional name, but it's a breaking change for any existing model weight files that use the old key name.

Ensure that model weight files for Qwen3 use "lm_head" as the weight key, or that the model loader handles this mapping appropriately. If existing weights use "lm_head_out", they will fail to load correctly with this change.

mllm/backends/cpu/ops/ElewiseOps.cpp (2)

143-149: LGTM!

The x86 HWY-accelerated element-wise add implementation correctly uses the element_wise_add function from elewise-inl.hpp with the proper out-of-place parameter ordering (out, x, y, n).


847-851: Good use of NYI for unsupported operations.

Explicitly marking AbsOp (and similar ops like LogOp, ExpOp, ClipOp, SinOp, CosOp) as not-yet-implemented on x86 is better than silent failure. These can be tracked and implemented in future work.

mllm/backends/cpu/kernels/common/elewise-inl.hpp (3)

91-111: LGTM! Scalar element-wise implementation follows Highway patterns.

The implementation correctly broadcasts the scalar value to a vector and applies the operation efficiently using Highway SIMD primitives. The tail handling with LoadN/StoreN properly handles non-multiple-of-lane counts.
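The broadcast-plus-tail pattern can be sketched in plain C++ without Highway. Here kLanes stands in for Lanes(d), the splatted local plays the role of Set(d, y[0]), and the remainder loop substitutes for LoadN/StoreN; all names are illustrative.

```cpp
#include <cstddef>

constexpr size_t kLanes = 4;  // stand-in for the target's SIMD lane count

// Elementwise add with a scalar right-hand side: y points at a single value
// that is conceptually broadcast across every lane.
inline void elementwise_add_scalar_sketch(float* out, const float* x,
                                          const float* y, size_t n) {
  const float s = y[0];  // broadcast once, outside the loop
  size_t i = 0;
  for (; i + kLanes <= n; i += kLanes) {        // full "vectors"
    for (size_t l = 0; l < kLanes; ++l) out[i + l] = x[i + l] + s;
  }
  for (; i < n; ++i) out[i] = x[i] + s;         // tail (LoadN/StoreN analogue)
}
```

The tail loop is what makes arbitrary n safe; masked LoadN/StoreN in Highway achieve the same without scalar fallback code.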


142-159: LGTM! Scalar wrapper functions correctly delegate to the scalar implementation.

The four scalar element-wise wrapper functions follow the established pattern and correctly delegate to __elementwise_scalar with the appropriate operator.


68-85: Verify all call sites are updated for the new out-of-place API signature.

This change converts the element-wise operation functions from in-place to explicit out-of-place semantics, with out now as the first parameter. Confirm that all callers throughout the codebase have been updated to pass arguments in the new order: element_wise_add(out, x, y, n) instead of the previous signature. This is a breaking change that affects all usage sites.
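A minimal sketch of the new call-site shape (signatures assumed from the review, not copied from the codebase; the body is a scalar stand-in for the vectorized kernel):

```cpp
#include <cassert>
#include <cstddef>

// New out-of-place convention: the destination buffer comes first,
// followed by the two inputs and the element count.
void element_wise_add(float* out, const float* x, const float* y, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = x[i] + y[i];
}

// A call site updated for the new order: out first, then x, y, n.
void caller_example(float* out, const float* a, const float* b, std::size_t n) {
  element_wise_add(out, a, b, n);
}
```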

Collaborator


Please use lm_head_out for BC (backward compatibility). The Arm CPU path uses lm_head_out.

int out_offset = batch * broadcast_naive_loops * vector_size + l * vector_size;

#if defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
cpu::common::HWY_NAMESPACE::element_wise_div(out + out_offset, a + a_offset, b + b_offset, vector_size);
Collaborator

Is cpu::common::HWY_NAMESPACE the right way to use Highway?

IIRC, we should include the Highway operator header file multiple times and use Highway's dispatcher to dispatch each op.

#add_subdirectory(llama)
add_subdirectory(minicpm_o)
add_subdirectory(minicpm4)
#add_subdirectory(qwen3)
Collaborator

Please uncomment all targets. I think this was introduced by @KKkai0315.

@chenghuaWang
Collaborator

Can you add a CI/CD test for the X86 backend? Just referencing the macOS test scripts is fine.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
mllm/backends/cpu/kernels/common/elewise-inl.hpp (1)

15-34: Reserved identifier usage: __elementwise.

Identifiers starting with double underscore (__) are reserved in C++. Rename to avoid potential conflicts with compiler/library internals.

 template<typename T, typename Op>
-HWY_INLINE void __elementwise(const T* HWY_RESTRICT x, const T* HWY_RESTRICT y, T* HWY_RESTRICT out, size_t count, Op&& op) {
+HWY_INLINE void elementwise_impl(const T* HWY_RESTRICT x, const T* HWY_RESTRICT y, T* HWY_RESTRICT out, size_t count, Op&& op) {

Also update the call sites (lines 65, 69, 73, 77).

🧹 Nitpick comments (2)
mllm/backends/cpu/kernels/common/kernel_dispatch.cpp (1)

67-75: Remove or document commented-out GELU code.

Commented-out code without explanation adds noise. Either remove it or add a TODO comment explaining why it's preserved and when it will be implemented.

-//===----------------------------------------------------------------------===//
-// GELU
-//===----------------------------------------------------------------------===//
-// HWY_EXPORT(gelu_fp32);
-// 
-// HWY_DLLEXPORT void call_gelu_fp32(mllm_fp32_t* out, const mllm_fp32_t* in, size_t n) {
-//   HWY_DYNAMIC_DISPATCH(gelu_fp32)(out, in, n);
-// }
+// TODO: Add GELU dispatch when elewise-inl.hpp implements gelu_fp32
mllm/backends/cpu/kernels/common/elewise-inl.hpp (1)

106-132: Duplicate functors: *ScalarOp are identical to *Op.

AddScalarOp, SubScalarOp, MulScalarOp, and DivScalarOp have identical implementations to their non-scalar counterparts. Reuse the existing functors to reduce duplication.

-struct AddScalarOp {
-  template<class D, class V>
-  HWY_INLINE V operator()(D d, V a, V b) const {
-    return hn::Add(a, b);
-  }
-};
-
-struct SubScalarOp {
-  template<class D, class V>
-  HWY_INLINE V operator()(D d, V a, V b) const {
-    return hn::Sub(a, b);
-  }
-};
-
-struct MulScalarOp {
-  template<class D, class V>
-  HWY_INLINE V operator()(D d, V a, V b) const {
-    return hn::Mul(a, b);
-  }
-};
-
-struct DivScalarOp {
-  template<class D, class V>
-  HWY_INLINE V operator()(D d, V a, V b) const {
-    return hn::Div(a, b);
-  }
-};
-
 HWY_NOINLINE HWY_MAYBE_UNUSED void elewise_add_scalar_fp32(mllm_fp32_t* out, const mllm_fp32_t* x, const mllm_fp32_t y, size_t n) {
-  __elementwise_scalar(out, x, y, n, AddScalarOp{});
+  elementwise_scalar_impl(out, x, y, n, AddOp{});
 }
 
 HWY_NOINLINE HWY_MAYBE_UNUSED void elewise_sub_scalar_fp32(mllm_fp32_t* out, const mllm_fp32_t* x, const mllm_fp32_t y, size_t n) {
-  __elementwise_scalar(out, x, y, n, SubScalarOp{});
+  elementwise_scalar_impl(out, x, y, n, SubOp{});
 }
 
 HWY_NOINLINE HWY_MAYBE_UNUSED void elewise_mul_scalar_fp32(mllm_fp32_t* out, const mllm_fp32_t* x, const mllm_fp32_t y, size_t n) {
-  __elementwise_scalar(out, x, y, n, MulScalarOp{});
+  elementwise_scalar_impl(out, x, y, n, MulOp{});
 }
 
 HWY_NOINLINE HWY_MAYBE_UNUSED void elewise_div_scalar_fp32(mllm_fp32_t* out, const mllm_fp32_t* x, const mllm_fp32_t y, size_t n) {
-  __elementwise_scalar(out, x, y, n, DivScalarOp{});
+  elementwise_scalar_impl(out, x, y, n, DivOp{});
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2eb1586 and 8514471.

📒 Files selected for processing (7)
  • .github/workflows/build-x86.yml (1 hunks)
  • examples/CMakeLists.txt (1 hunks)
  • mllm/backends/cpu/kernels/Kernels.hpp (1 hunks)
  • mllm/backends/cpu/kernels/common/elewise-inl.hpp (2 hunks)
  • mllm/backends/cpu/kernels/common/kernel_dispatch.cpp (1 hunks)
  • mllm/backends/cpu/kernels/common/kernel_dispatch.hpp (1 hunks)
  • mllm/backends/cpu/ops/ElewiseOps.cpp (11 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • examples/CMakeLists.txt
  • mllm/backends/cpu/kernels/Kernels.hpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/cpu/kernels/common/kernel_dispatch.hpp
  • mllm/backends/cpu/ops/ElewiseOps.cpp
  • mllm/backends/cpu/kernels/common/elewise-inl.hpp
  • mllm/backends/cpu/kernels/common/kernel_dispatch.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/cpu/kernels/common/kernel_dispatch.hpp
  • mllm/backends/cpu/ops/ElewiseOps.cpp
  • mllm/backends/cpu/kernels/common/elewise-inl.hpp
  • mllm/backends/cpu/kernels/common/kernel_dispatch.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/cpu/kernels/common/kernel_dispatch.hpp
  • mllm/backends/cpu/ops/ElewiseOps.cpp
  • mllm/backends/cpu/kernels/common/elewise-inl.hpp
  • mllm/backends/cpu/kernels/common/kernel_dispatch.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/cpu/ops/ElewiseOps.cpp
  • mllm/backends/cpu/kernels/common/kernel_dispatch.cpp
🧬 Code graph analysis (3)
mllm/backends/cpu/kernels/common/kernel_dispatch.hpp (2)
mllm/backends/cpu/kernels/common/elewise-inl.hpp (1)
  • void (16-34)
mllm/backends/cpu/kernels/common/kernel_dispatch.cpp (8)
  • void (35-37)
  • void (39-41)
  • void (43-45)
  • void (47-49)
  • void (51-53)
  • void (55-57)
  • void (59-61)
  • void (63-65)
mllm/backends/cpu/kernels/common/elewise-inl.hpp (1)
mllm/backends/cpu/kernels/common/kernel_dispatch.cpp (8)
  • void (35-37)
  • void (39-41)
  • void (43-45)
  • void (47-49)
  • void (51-53)
  • void (55-57)
  • void (59-61)
  • void (63-65)
mllm/backends/cpu/kernels/common/kernel_dispatch.cpp (1)
mllm/backends/cpu/kernels/common/elewise-inl.hpp (9)
  • elewise_add_fp32 (64-64)
  • elewise_sub_fp32 (68-68)
  • elewise_mul_fp32 (72-72)
  • elewise_div_fp32 (76-76)
  • elewise_add_scalar_fp32 (134-134)
  • elewise_sub_scalar_fp32 (138-138)
  • elewise_mul_scalar_fp32 (142-142)
  • elewise_div_scalar_fp32 (146-146)
  • void (16-34)
🪛 actionlint (1.7.9)
.github/workflows/build-x86.yml

39-39: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 Clang (14.0.6)
mllm/backends/cpu/kernels/common/elewise-inl.hpp

[error] 64-64: variable 'HWY_MAYBE_UNUSED' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 64-64: invalid case style for variable 'HWY_MAYBE_UNUSED'

(readability-identifier-naming,-warnings-as-errors)


[error] 64-64: 2 adjacent parameters of 'elewise_add_fp32' of similar type ('const int *') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 64-64: parameter name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 64-64: parameter name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 68-68: variable 'HWY_MAYBE_UNUSED' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 68-68: invalid case style for variable 'HWY_MAYBE_UNUSED'

(readability-identifier-naming,-warnings-as-errors)


[error] 68-68: 2 adjacent parameters of 'elewise_sub_fp32' of similar type ('const int *') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 68-68: parameter name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 68-68: parameter name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 72-72: variable 'HWY_MAYBE_UNUSED' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 72-72: invalid case style for variable 'HWY_MAYBE_UNUSED'

(readability-identifier-naming,-warnings-as-errors)


[error] 72-72: 2 adjacent parameters of 'elewise_mul_fp32' of similar type ('const int *') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 72-72: parameter name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 72-72: parameter name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 76-76: variable 'HWY_MAYBE_UNUSED' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 76-76: invalid case style for variable 'HWY_MAYBE_UNUSED'

(readability-identifier-naming,-warnings-as-errors)


[error] 76-76: 2 adjacent parameters of 'elewise_div_fp32' of similar type ('const int *') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 76-76: parameter name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 76-76: parameter name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 85-85: declaration uses identifier '__elementwise_scalar', which is a reserved identifier

(bugprone-reserved-identifier,-warnings-as-errors)


[error] 86-86: variable 'd' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 86-86: variable name 'd' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 87-87: variable 'N' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 87-87: variable name 'N' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 87-87: invalid case style for variable 'N'

(readability-identifier-naming,-warnings-as-errors)


[error] 88-88: variable 'idx' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 91-91: variable 'sVec' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 91-91: invalid case style for variable 'sVec'

(readability-identifier-naming,-warnings-as-errors)


[error] 100-100: variable 'vx' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 100-100: variable name 'vx' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 101-101: variable 'result' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 134-134: variable 'HWY_MAYBE_UNUSED' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 134-134: invalid case style for variable 'HWY_MAYBE_UNUSED'

(readability-identifier-naming,-warnings-as-errors)


[error] 134-134: parameter name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 134-134: parameter name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 138-138: variable 'HWY_MAYBE_UNUSED' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 138-138: invalid case style for variable 'HWY_MAYBE_UNUSED'

(readability-identifier-naming,-warnings-as-errors)


[error] 138-138: parameter name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 138-138: parameter name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 142-142: variable 'HWY_MAYBE_UNUSED' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 142-142: invalid case style for variable 'HWY_MAYBE_UNUSED'

(readability-identifier-naming,-warnings-as-errors)


[error] 142-142: parameter name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 142-142: parameter name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 146-146: variable 'HWY_MAYBE_UNUSED' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 146-146: invalid case style for variable 'HWY_MAYBE_UNUSED'

(readability-identifier-naming,-warnings-as-errors)


[error] 146-146: parameter name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 146-146: parameter name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/cpu/kernels/common/kernel_dispatch.cpp

[error] 12-12: macro 'HWY_TARGET_INCLUDE' used to declare a constant; consider using a 'constexpr' constant

(cppcoreguidelines-macro-usage,-warnings-as-errors)

🔇 Additional comments (7)
.github/workflows/build-x86.yml (1)

1-51: Workflow structure and design are solid.

The CI/CD workflow properly addresses the reviewer's request for X86 backend compilation testing. The path-based triggering, repository guard, and build steps are well-configured. Python 3.12 with pip caching is appropriate for build speed.

mllm/backends/cpu/ops/ElewiseOps.cpp (3)

143-149: x86 HWY dispatch integration looks correct.

The dynamic dispatch pattern using cpu::common::call_elewise_*_fp32 is appropriate for Highway's runtime dispatch mechanism. This addresses the previous review comment about proper Highway usage.


144-148: x86 implementation lacks thread count parameter.

The ARM implementations accept options_.getThreads() for multi-threading, but the x86 HWY-based implementations don't expose threading control. Verify this is intentional or consider adding threading support for parity.


847-851: NYI placeholders for x86 are appropriate.

Clear indication that these operations need x86 implementations. Consider adding TODO comments with tracking issue references if these are planned for near-term implementation.

mllm/backends/cpu/kernels/common/kernel_dispatch.hpp (1)

1-28: Clean header structure for HWY dispatch declarations.

The header is well-organized with appropriate includes, namespace usage, and function declarations. The separation of vectorized and scalar variants is clear.

mllm/backends/cpu/kernels/common/kernel_dispatch.cpp (1)

7-15: HWY dynamic dispatch pattern is correctly implemented.

The HWY_TARGET_INCLUDE and foreach_target.h pattern is the standard Highway approach for runtime CPU feature detection and dispatch. The static analysis warning about preferring constexpr is a false positive—this macro pattern is required by Highway's design.
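A simplified model of what this dispatch achieves (a sketch with an assumed feature probe, not Highway's actual macro machinery): the best implementation for the running CPU is resolved once at first call and cached, so subsequent calls jump straight to the chosen kernel.

```cpp
#include <cassert>
#include <cstddef>

using AddFn = void (*)(float*, const float*, const float*, std::size_t);

// Portable baseline kernel.
void add_generic(float* out, const float* x, const float* y, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = x[i] + y[i];
}

// In real code this would be the AVX2/AVX-512 build of the same kernel;
// here it shares the scalar body for illustration.
void add_fast(float* out, const float* x, const float* y, std::size_t n) {
  add_generic(out, x, y, n);
}

bool cpu_has_avx2() { return true; }  // placeholder feature probe (assumption)

AddFn resolve_add() { return cpu_has_avx2() ? add_fast : add_generic; }

void call_add(float* out, const float* x, const float* y, std::size_t n) {
  static AddFn fn = resolve_add();  // resolved once, cached thereafter
  fn(out, x, y, n);
}
```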

mllm/backends/cpu/kernels/common/elewise-inl.hpp (1)

64-78: FP32 elementwise implementations look correct.

The implementations correctly use the HWY vectorized operations with proper function signatures matching the dispatch declarations.

Comment on lines +162 to +173
#if defined(MLLM_HOST_ARCH_X86_64) || defined(MLLM_HOST_ARCH_X86)
// Process each batch separately
for (int batch = 0; batch < batch_dims; ++batch) {
// Each batch processes broadcast_naive_loops iterations of vector_size elements
for (int l = 0; l < broadcast_naive_loops; ++l) {
int a_offset = batch * broadcast_naive_loops * vector_size + l * vector_size;
int b_offset = batch * vector_size; // b doesn't broadcast over loops dimension
int out_offset = batch * broadcast_naive_loops * vector_size + l * vector_size;

#if defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
cpu::common::call_elewise_add_fp32(out + out_offset, a + a_offset, b + b_offset, vector_size);
}
}
Contributor

⚠️ Potential issue | 🟠 Major

Potential integer overflow in offset calculations.

When batch, broadcast_naive_loops, and vector_size are large, the expression batch * broadcast_naive_loops * vector_size can overflow int. Consider using size_t or int64_t for offset calculations.

-        for (int batch = 0; batch < batch_dims; ++batch) {
+        for (int64_t batch = 0; batch < batch_dims; ++batch) {
           // Each batch processes broadcast_naive_loops iterations of vector_size elements
-          for (int l = 0; l < broadcast_naive_loops; ++l) {
-            int a_offset = batch * broadcast_naive_loops * vector_size + l * vector_size;
-            int b_offset = batch * vector_size;  // b doesn't broadcast over loops dimension
-            int out_offset = batch * broadcast_naive_loops * vector_size + l * vector_size;
+          for (int64_t l = 0; l < broadcast_naive_loops; ++l) {
+            size_t a_offset = batch * broadcast_naive_loops * vector_size + l * vector_size;
+            size_t b_offset = batch * vector_size;  // b doesn't broadcast over loops dimension
+            size_t out_offset = batch * broadcast_naive_loops * vector_size + l * vector_size;

Note: This same issue exists in the ARM paths and in all other broadcast_naive loops (Sub, Mul, Div ops).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#if defined(MLLM_HOST_ARCH_X86_64) || defined(MLLM_HOST_ARCH_X86)
// Process each batch separately
for (int batch = 0; batch < batch_dims; ++batch) {
// Each batch processes broadcast_naive_loops iterations of vector_size elements
for (int l = 0; l < broadcast_naive_loops; ++l) {
int a_offset = batch * broadcast_naive_loops * vector_size + l * vector_size;
int b_offset = batch * vector_size; // b doesn't broadcast over loops dimension
int out_offset = batch * broadcast_naive_loops * vector_size + l * vector_size;
#if defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
cpu::common::call_elewise_add_fp32(out + out_offset, a + a_offset, b + b_offset, vector_size);
}
}
#if defined(MLLM_HOST_ARCH_X86_64) || defined(MLLM_HOST_ARCH_X86)
// Process each batch separately
for (int64_t batch = 0; batch < batch_dims; ++batch) {
// Each batch processes broadcast_naive_loops iterations of vector_size elements
for (int64_t l = 0; l < broadcast_naive_loops; ++l) {
size_t a_offset = batch * broadcast_naive_loops * vector_size + l * vector_size;
size_t b_offset = batch * vector_size; // b doesn't broadcast over loops dimension
size_t out_offset = batch * broadcast_naive_loops * vector_size + l * vector_size;
cpu::common::call_elewise_add_fp32(out + out_offset, a + a_offset, b + b_offset, vector_size);
}
}
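The hazard can be shown in isolation (a sketch with illustrative dimensions, not the project's loop): the product batch * loops * vector_size can exceed INT_MAX for large tensors, at which point a 32-bit signed multiply is undefined behavior, whereas widening the operands first keeps the offset exact.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Offsets computed in int64_t: all operands are widened before the
// multiply, so no intermediate product can wrap at 32 bits.
int64_t offset64(int64_t batch, int64_t loops, int64_t vec) {
  return batch * loops * vec;
}
```

For example, 70000 * 70000 already exceeds INT32_MAX, so any offset built from dimensions of that magnitude must be widened before multiplying.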

- Updated qwen3 model runtime using `modeling_qwen3_fa2.hpp`
- Enabled qwen3 compilation options
- Fixed rmsnorm and softmax operators
- Fixed parameters for quantize
- Added `NYI` messages for some unsupported ops
- Added support for common CPU elementwise operators (fp32)
- Replace cpu::common::HWY_NAMESPACE with proper Highway dynamic dispatcher
- Create general kernel_dispatch.hpp/cpp with HWY_DYNAMIC_DISPATCH pattern
- Use lm_head_out in `modeling_qwen3_fa2.hpp`
- Uncomment all compilation targets in `examples/CMakeLists.txt`
- Add GitHub Actions workflow file for Linux (Ubuntu) x86_64 build
- Fix python version in `.github/workflows/build-x86.yml`
- Rename function name to avoid using reserved identifiers
- Add #else fallback for unsupported architectures
- Fix potential integer overflow in offset calculations
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
mllm/backends/cpu/ops/TransposeOp.cpp (1)

87-100: batch is computed incorrectly (always 0).
int batch = 0; then multiplying keeps it at 0, so arm::transpose_last_dims_* will run with batch == 0. This looks like a functional bug.

-  int batch = 0;
+  int batch = 1;
   for (int i = 0; i < input_shape.size() - 2; i++) { batch *= input_shape[i]; }
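The fix boils down to using the multiplicative identity as the accumulator seed; a minimal stand-alone version of the corrected fold (shape handling assumed):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Folds all leading (non-matrix) dimensions into a batch count. The bug
// was seeding the accumulator with 0, which zeroes every product; the
// seed must be 1, the multiplicative identity.
int leading_batch(const std::vector<int>& shape) {
  int batch = 1;  // was 0 in the buggy code
  for (std::size_t i = 0; i + 2 < shape.size(); ++i) batch *= shape[i];
  return batch;
}
```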
mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp (1)

114-127: Fix data race in FP16 lookup table initialization—avoid std::call_once in hot path.

The table_f32_f16_init flag has a data race: multiple threads calling lookup_fp16_to_fp32() during parallel inference can race on the flag without synchronization. Since MLLM_FP16_TO_FP32 is invoked in hot loops (vec_dot, quantization, GEMM kernels), this is a critical issue.

Do not use std::call_once for the fix—it adds atomic fences and potential mutex overhead on every call, which is unacceptable in hot paths. Instead, use one of:

  • Eager initialization: initialize the table at program or model load time (best for inference servers).
  • Atomic fast-path + mutex slow-path: load std::atomic<bool> with acquire semantics; only acquire mutex on miss.
  • Thread-local initialization: per-thread tables (less efficient, avoid if possible).

Also add static_assert(sizeof(mllm_fp16_t) == sizeof(uint16_t), "...") to guard the memcpy assumption. Rename parameter f to fp16 for clarity.
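The suggested atomic fast-path + mutex slow-path can be sketched as follows (assumed names and a trivial stand-in table, not the project's actual FP16 conversion): the hot path is a single acquire-load and branch, and the mutex is only taken by threads that race on first use.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

namespace {
float g_table[256];  // illustrative stand-in for the FP16->FP32 table
std::atomic<bool> g_table_ready{false};
std::mutex g_table_mutex;

// Slow path: at most one thread fills the table; the relaxed re-check
// under the lock makes late arrivals skip re-initialization.
void init_table_slow() {
  std::lock_guard<std::mutex> lock(g_table_mutex);
  if (!g_table_ready.load(std::memory_order_relaxed)) {
    for (int i = 0; i < 256; ++i) g_table[i] = static_cast<float>(i);
    g_table_ready.store(true, std::memory_order_release);  // publish table
  }
}
}  // namespace

// Hot path: one acquire-load; after initialization, no atomic RMW or
// lock is ever taken again.
float lookup(uint8_t idx) {
  if (!g_table_ready.load(std::memory_order_acquire)) init_table_slow();
  return g_table[idx];
}
```

The release store pairs with the acquire load, so any thread that observes the flag as true also observes the fully initialized table.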

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8514471 and bd046b9.

📒 Files selected for processing (14)
  • .github/workflows/build-x86.yml (1 hunks)
  • docs/cpu_backend/x86/index.rst (1 hunks)
  • examples/CMakeLists.txt (1 hunks)
  • examples/qwen3/main.cpp (1 hunks)
  • mllm/backends/cpu/kernels/Kernels.hpp (1 hunks)
  • mllm/backends/cpu/kernels/common/elewise-inl.hpp (3 hunks)
  • mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp (1 hunks)
  • mllm/backends/cpu/kernels/common/kernel_dispatch.cpp (1 hunks)
  • mllm/backends/cpu/kernels/common/kernel_dispatch.hpp (1 hunks)
  • mllm/backends/cpu/kernels/x86/rmsnorm.cpp (1 hunks)
  • mllm/backends/cpu/kernels/x86/softmax.cpp (3 hunks)
  • mllm/backends/cpu/ops/ElewiseOps.cpp (11 hunks)
  • mllm/backends/cpu/ops/LinearOp.cpp (1 hunks)
  • mllm/backends/cpu/ops/TransposeOp.cpp (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • mllm/backends/cpu/ops/LinearOp.cpp
  • .github/workflows/build-x86.yml
  • mllm/backends/cpu/kernels/common/kernel_dispatch.hpp
  • mllm/backends/cpu/kernels/Kernels.hpp
  • examples/CMakeLists.txt
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp
  • mllm/backends/cpu/ops/TransposeOp.cpp
  • mllm/backends/cpu/ops/ElewiseOps.cpp
  • mllm/backends/cpu/kernels/x86/rmsnorm.cpp
  • mllm/backends/cpu/kernels/x86/softmax.cpp
  • mllm/backends/cpu/kernels/common/elewise-inl.hpp
  • mllm/backends/cpu/kernels/common/kernel_dispatch.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp
  • mllm/backends/cpu/ops/TransposeOp.cpp
  • mllm/backends/cpu/ops/ElewiseOps.cpp
  • mllm/backends/cpu/kernels/x86/rmsnorm.cpp
  • mllm/backends/cpu/kernels/x86/softmax.cpp
  • mllm/backends/cpu/kernels/common/elewise-inl.hpp
  • mllm/backends/cpu/kernels/common/kernel_dispatch.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp
  • mllm/backends/cpu/ops/TransposeOp.cpp
  • mllm/backends/cpu/ops/ElewiseOps.cpp
  • mllm/backends/cpu/kernels/x86/rmsnorm.cpp
  • mllm/backends/cpu/kernels/x86/softmax.cpp
  • mllm/backends/cpu/kernels/common/elewise-inl.hpp
  • mllm/backends/cpu/kernels/common/kernel_dispatch.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/cpu/ops/TransposeOp.cpp
  • mllm/backends/cpu/ops/ElewiseOps.cpp
  • mllm/backends/cpu/kernels/x86/rmsnorm.cpp
  • mllm/backends/cpu/kernels/x86/softmax.cpp
  • mllm/backends/cpu/kernels/common/kernel_dispatch.cpp
🧠 Learnings (2)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Suggest adding unit tests for untested complex logic or edge cases.

Applied to files:

  • mllm/backends/cpu/ops/ElewiseOps.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Applied to files:

  • mllm/backends/cpu/ops/ElewiseOps.cpp
🧬 Code graph analysis (5)
mllm/backends/cpu/ops/TransposeOp.cpp (2)
mllm/backends/cpu/kernels/arm/transpose.hpp (3)
  • transpose_hw_wh_fp32 (16-16)
  • transpose_bshd_bhsd_fp32 (18-19)
  • transpose_last_dims_fp32 (21-22)
mllm/backends/cpu/kernels/arm/transpose.cpp (6)
  • transpose_hw_wh_fp32 (13-46)
  • transpose_hw_wh_fp32 (13-13)
  • transpose_bshd_bhsd_fp32 (48-73)
  • transpose_bshd_bhsd_fp32 (48-49)
  • transpose_last_dims_fp32 (160-200)
  • transpose_last_dims_fp32 (160-161)
mllm/backends/cpu/ops/ElewiseOps.cpp (2)
mllm/backends/cpu/kernels/common/kernel_dispatch.hpp (8)
  • call_elewise_add_fp32 (20-20)
  • call_elewise_add_scalar_fp32 (28-28)
  • call_elewise_sub_fp32 (21-21)
  • call_elewise_sub_scalar_fp32 (29-29)
  • call_elewise_mul_fp32 (22-22)
  • call_elewise_mul_scalar_fp32 (30-30)
  • call_elewise_div_fp32 (23-23)
  • call_elewise_div_scalar_fp32 (31-31)
mllm/backends/cpu/kernels/arm/elementwise.hpp (8)
  • ew_add_fp32 (262-263)
  • ew_add_fp32_scalar (324-325)
  • ew_sub_fp32 (265-266)
  • ew_sub_fp32_scalar (327-328)
  • ew_mul_fp32 (268-269)
  • ew_mul_fp32_scalar (330-331)
  • ew_div_fp32 (271-272)
  • ew_div_fp32_scalar (333-334)
mllm/backends/cpu/kernels/x86/softmax.cpp (1)
mllm/backends/cpu/kernels/x86/math.hpp (1)
  • vexpq_fast_f32 (19-19)
mllm/backends/cpu/kernels/common/elewise-inl.hpp (2)
mllm/backends/cpu/kernels/common/kernel_dispatch.cpp (8)
  • void (36-38)
  • void (40-42)
  • void (44-46)
  • void (48-50)
  • void (52-54)
  • void (56-58)
  • void (60-62)
  • void (64-66)
mllm/backends/cpu/kernels/common/gelu-inl.hpp (2)
  • HWY_MAYBE_UNUSED (49-69)
  • HWY_MAYBE_UNUSED (71-75)
mllm/backends/cpu/kernels/common/kernel_dispatch.cpp (1)
mllm/backends/cpu/kernels/common/elewise-inl.hpp (9)
  • elewise_add_fp32 (64-64)
  • elewise_sub_fp32 (68-68)
  • elewise_mul_fp32 (72-72)
  • elewise_div_fp32 (76-76)
  • elewise_add_scalar_fp32 (134-134)
  • elewise_sub_scalar_fp32 (138-138)
  • elewise_mul_scalar_fp32 (142-142)
  • elewise_div_scalar_fp32 (146-146)
  • void (16-34)
🪛 Clang (14.0.6)
mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp

[error] 114-114: parameter name 'f' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/cpu/kernels/common/elewise-inl.hpp

[error] 64-64: variable 'HWY_MAYBE_UNUSED' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 64-64: invalid case style for variable 'HWY_MAYBE_UNUSED'

(readability-identifier-naming,-warnings-as-errors)


[error] 64-64: 2 adjacent parameters of 'elewise_add_fp32' of similar type ('const int *') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 64-64: parameter name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 64-64: parameter name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 68-68: variable 'HWY_MAYBE_UNUSED' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 68-68: invalid case style for variable 'HWY_MAYBE_UNUSED'

(readability-identifier-naming,-warnings-as-errors)


[error] 68-68: 2 adjacent parameters of 'elewise_sub_fp32' of similar type ('const int *') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 68-68: parameter name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 68-68: parameter name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 72-72: variable 'HWY_MAYBE_UNUSED' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 72-72: invalid case style for variable 'HWY_MAYBE_UNUSED'

(readability-identifier-naming,-warnings-as-errors)


[error] 72-72: 2 adjacent parameters of 'elewise_mul_fp32' of similar type ('const int *') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 72-72: parameter name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 72-72: parameter name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 76-76: variable 'HWY_MAYBE_UNUSED' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 76-76: invalid case style for variable 'HWY_MAYBE_UNUSED'

(readability-identifier-naming,-warnings-as-errors)


[error] 76-76: 2 adjacent parameters of 'elewise_div_fp32' of similar type ('const int *') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 76-76: parameter name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 76-76: parameter name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 86-86: variable 'd' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 86-86: variable name 'd' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 87-87: variable 'N' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 87-87: variable name 'N' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 87-87: invalid case style for variable 'N'

(readability-identifier-naming,-warnings-as-errors)


[error] 88-88: variable 'idx' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 91-91: variable 'sVec' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 91-91: invalid case style for variable 'sVec'

(readability-identifier-naming,-warnings-as-errors)


[error] 100-100: variable 'vx' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 100-100: variable name 'vx' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 101-101: variable 'result' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 134-134: variable 'HWY_MAYBE_UNUSED' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 134-134: invalid case style for variable 'HWY_MAYBE_UNUSED'

(readability-identifier-naming,-warnings-as-errors)


[error] 134-134: parameter name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 134-134: parameter name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 138-138: variable 'HWY_MAYBE_UNUSED' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 138-138: invalid case style for variable 'HWY_MAYBE_UNUSED'

(readability-identifier-naming,-warnings-as-errors)


[error] 138-138: parameter name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 138-138: parameter name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 142-142: variable 'HWY_MAYBE_UNUSED' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 142-142: invalid case style for variable 'HWY_MAYBE_UNUSED'

(readability-identifier-naming,-warnings-as-errors)


[error] 142-142: parameter name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 142-142: parameter name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 146-146: variable 'HWY_MAYBE_UNUSED' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 146-146: invalid case style for variable 'HWY_MAYBE_UNUSED'

(readability-identifier-naming,-warnings-as-errors)


[error] 146-146: parameter name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 146-146: parameter name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/cpu/kernels/common/kernel_dispatch.cpp

[error] 13-13: macro 'HWY_TARGET_INCLUDE' used to declare a constant; consider using a 'constexpr' constant

(cppcoreguidelines-macro-usage,-warnings-as-errors)

🔇 Additional comments (7)
examples/qwen3/main.cpp (1)

4-4: The header switch to modeling_qwen3_fa2.hpp is correct and safe. The FA2 implementation maintains full API compatibility with the original—both have identical public interfaces (constructor, forward() method, kvCache() method). The only difference is internal: the FA2 variant passes true to the StaticCache constructor's use_fa2 parameter (vs. false in the original), which doesn't affect the client code in this example file.

mllm/backends/cpu/kernels/x86/softmax.cpp (1)

37-66: Good switch to unaligned HWY memory ops (safer, same math).
Using hn::LoadU/StoreU here removes any hidden alignment requirement on X/Y without changing the reduction / exp / scale logic.

mllm/backends/cpu/kernels/x86/rmsnorm.cpp (1)

26-46: LGTM: LoadU/StoreU is the right default unless you can prove alignment.
This should prevent misaligned access issues without affecting results.
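The comments above can be illustrated with a minimal, portable sketch. Highway's `hn::LoadU`/`hn::StoreU` tolerate any address, whereas aligned variants may fault; the `memcpy`-based helpers below are the scalar analogue of that unaligned-safe access pattern (these helper names are illustrative, not mllm's API):

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Portable analogue of hn::LoadU / hn::StoreU: memcpy-based access is
// well-defined at any alignment. Hardware-aligned loads (e.g. _mm_load_ps)
// require the pointer to be a multiple of the vector width and may fault.
inline float load_unaligned_f32(const unsigned char* p) {
  float v;
  std::memcpy(&v, p, sizeof(v));  // defined behavior at any address
  return v;
}

inline void store_unaligned_f32(unsigned char* p, float v) {
  std::memcpy(p, &v, sizeof(v));
}
```

The `+ 1` offset in a round-trip test deliberately produces a misaligned address, which is exactly the case the unaligned ops make safe.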

mllm/backends/cpu/ops/ElewiseOps.cpp (1)

142-162: Nice: explicit #else NYI(...) prevents silent output corruption on unsupported arch.
This addresses the “do nothing and still return” failure mode for fp32 elementwise ops.

Also applies to: 325-345, 507-527, 689-709

mllm/backends/cpu/kernels/common/elewise-inl.hpp (2)

15-148: Elementwise + scalar FP32 kernels look correct; nice fix vs reserved identifier.
Vector + tail handling is coherent (LoadU/StoreU + LoadN/StoreN), and the scalar RHS path is implemented consistently.
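The "vector body + tail" shape praised here can be sketched in plain C++. This is a scalar stand-in for the kernel structure (full-width chunks via LoadU/StoreU, then a LoadN/StoreN remainder); `kLanes` is a hypothetical lane count standing in for what Highway would report via `Lanes(d)`:

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 8;  // hypothetical SIMD lane count

// Sketch of the loop shape in elewise-inl.hpp: process full "vector-width"
// chunks first, then the n % kLanes tail elements.
void ew_add_sketch(float* out, const float* x, const float* y, std::size_t n) {
  std::size_t i = 0;
  // Full-width body: one LoadU/Add/StoreU per chunk in the real kernel.
  for (; i + kLanes <= n; i += kLanes) {
    for (std::size_t j = 0; j < kLanes; ++j) { out[i + j] = x[i + j] + y[i + j]; }
  }
  // Tail: remaining elements, handled with LoadN/StoreN in the real kernel.
  for (; i < n; ++i) { out[i] = x[i] + y[i]; }
}
```

A length that is not a multiple of the lane count (e.g. 11 with `kLanes = 8`) exercises both the body and the tail path.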


1-10: No ODR risk in elewise-inl.hpp — proper Highway dynamic dispatch pattern. This file is correctly included only in kernel_dispatch.cpp as part of Highway's multi-target code generation mechanism. The functions are defined within HWY_NAMESPACE blocks that are re-instantiated with different namespace values for each enabled target (via foreach_target.h). HWY_EXPORT and HWY_DYNAMIC_DISPATCH macros properly handle symbol resolution across target variants. No ODR violation occurs.

mllm/backends/cpu/kernels/common/kernel_dispatch.cpp (1)

4-80: Pattern is correct and production-ready.

The dynamic dispatch scaffold properly implements Google Highway 1.3.0's multi-target dispatch mechanism: foreach_target.h re-includes the source for each enabled target, HWY_NAMESPACE wraps target-specific implementations, and HWY_DYNAMIC_DISPATCH() selects the right variant at runtime. Architecture guards are correctly applied—x86 code is isolated from ARM with no competing definitions. All implementation functions are present in elewise-inl.hpp, and call sites in ElewiseOps.cpp properly branch between x86 (cpu::common) and ARM (cpu::arm) codepaths.
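For readers unfamiliar with Highway, the dispatch mechanism described above can be approximated with plain function pointers. This is only a conceptual analogue of `HWY_EXPORT`/`HWY_DYNAMIC_DISPATCH` (in the real mechanism, `foreach_target.h` generates the per-target namespaces and the feature check is cached); the namespaces and names below are illustrative:

```cpp
#include <cassert>

// Each "target" compiles its own copy of the kernel.
namespace avx2 { inline float mul1(float a, float b) { return a * b; } }
namespace sse4 { inline float mul1(float a, float b) { return a * b; } }

using Mul1Fn = float (*)(float, float);

// Analogue of HWY_DYNAMIC_DISPATCH: pick the best variant at runtime from
// CPU feature detection; Highway resolves this once and caches the choice.
inline Mul1Fn resolve_mul1(bool has_avx2) {
  return has_avx2 ? &avx2::mul1 : &sse4::mul1;
}
```

Both variants compute the same result; only the instruction set used to compute it differs, which is why the dispatch can be selected late without changing semantics.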

Comment on lines +107 to +116
3. Download pre-converted models from **our HuggingFace organization** (recommended on x86):

Due to current compatibility issues with the mllm-converter on x86 architecture, we recommend downloading pre-converted quantized models from our HuggingFace organization `mllmTeam <https://huggingface.co/UbiquitousLearning>`_:

Example command:

   .. code-block:: bash

      wget https://huggingface.co/mllmTeam/qwen-3-0.6b-mllm/blob/main/qwen-3-0.6b-q4_k.mllm


⚠️ Potential issue | 🟠 Major

Fix the Hugging Face download command (blob URL won’t fetch the model file).
Use a raw/resolve URL so wget downloads the artifact, not an HTML page.

-   wget https://huggingface.co/mllmTeam/qwen-3-0.6b-mllm/blob/main/qwen-3-0.6b-q4_k.mllm
+   wget -O qwen-3-0.6b-q4_k.mllm https://huggingface.co/mllmTeam/qwen-3-0.6b-mllm/resolve/main/qwen-3-0.6b-q4_k.mllm
🤖 Prompt for AI Agents
In docs/cpu_backend/x86/index.rst around lines 107 to 116, the wget example uses
a Hugging Face "blob" HTML URL which will download an HTML page instead of the
model artifact; update the example to point to the repository file raw/resolve
endpoint (e.g. use the Hugging Face raw or /resolve/main file URL or append
?raw=true) so wget retrieves the actual .mllm file instead of an HTML page.
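The fix is a one-token URL rewrite: Hugging Face serves the raw artifact from `/resolve/<rev>/<path>`, while `/blob/<rev>/<path>` is the HTML file viewer. A minimal sketch of the transformation (helper name is illustrative):

```cpp
#include <cassert>
#include <string>

// Rewrite a Hugging Face "blob" (HTML viewer) URL into the "resolve"
// (raw file download) URL; returns the input unchanged if no /blob/ segment.
std::string blob_to_resolve(std::string url) {
  const std::string from = "/blob/";
  const std::string to = "/resolve/";
  const auto pos = url.find(from);
  if (pos != std::string::npos) { url.replace(pos, from.size(), to); }
  return url;
}
```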

Comment on lines +149 to 151
#else
NYI("AddOp not supported on this architecture.");
#endif

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

files=(
  "mllm/backends/cpu/ops/ElewiseOps.cpp"
  "mllm/backends/cpu/ops/TransposeOp.cpp"
  "mllm/backends/cpu/kernels/x86/softmax.cpp"
  "mllm/backends/cpu/kernels/x86/rmsnorm.cpp"
  "mllm/backends/cpu/kernels/common/kernel_dispatch.cpp"
  "mllm/backends/cpu/kernels/common/elewise-inl.hpp"
  "mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp"
)

python - "${files[@]}" <<'PY'
import sys, pathlib, re

BAD_BYTES = set(list(range(0x00,0x09)) + list(range(0x0B,0x0D)) + list(range(0x0E,0x20)) + [0x7F] + list(range(0x80,0xA0)))
files = sys.argv[1:]
fail = False

for p in files:
    b = pathlib.Path(p).read_bytes()
    if b.startswith(b"\xef\xbb\xbf"):
        print(f"[FAIL] BOM found: {p}")
        fail = True
    bad = [i for i, bb in enumerate(b) if bb in BAD_BYTES]
    if bad:
        print(f"[FAIL] Control code bytes found: {p} (first at byte offset {bad[0]})")
        fail = True
    if b"\r\n" in b or b"\r" in b:
        print(f"[FAIL] CRLF/CR found (expected LF only): {p}")
        fail = True

    text = b.decode("utf-8", errors="strict")
    for i, line in enumerate(text.split("\n"), 1):
        if re.search(r"[ \t]+$", line):
            print(f"[FAIL] Trailing whitespace: {p}:{i}")
            fail = True

if fail:
    sys.exit(1)
print("[OK] No BOM/control-codes/CRLF/trailing-whitespace in checked files.")
PY

Repository: UbiquitousLearning/mllm

Length of output: 255


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# First, check if the file exists
if [ ! -f "mllm/backends/cpu/ops/ElewiseOps.cpp" ]; then
    echo "[FAIL] File not found: mllm/backends/cpu/ops/ElewiseOps.cpp"
    exit 1
fi

# Check for trailing whitespace at the specified lines
echo "Checking lines 149-151, 159-161, 191-193 for trailing whitespace:"

# Get line count first
total_lines=$(wc -l < "mllm/backends/cpu/ops/ElewiseOps.cpp")
echo "Total lines in file: $total_lines"

# Check specific line ranges
for line_num in 149 150 151 159 160 161 191 192 193; do
    if [ "$line_num" -le "$total_lines" ]; then
        line_content=$(sed -n "${line_num}p" "mllm/backends/cpu/ops/ElewiseOps.cpp")
        # Check if line ends with space or tab
        if [[ "$line_content" =~ [[:space:]]$ ]]; then
            echo "[FAIL] Line $line_num has trailing whitespace: $(sed -n "${line_num}p" "mllm/backends/cpu/ops/ElewiseOps.cpp" | cat -A)"
        else
            echo "[OK] Line $line_num: no trailing whitespace"
        fi
    fi
done

# Also do a comprehensive check for any trailing whitespace in the entire file
echo ""
echo "Comprehensive check for trailing whitespace in entire file:"
trailing_lines=$(grep -n '[[:space:]]$' "mllm/backends/cpu/ops/ElewiseOps.cpp" || true)
if [ -n "$trailing_lines" ]; then
    echo "[FAIL] Lines with trailing whitespace found:"
    echo "$trailing_lines"
else
    echo "[OK] No trailing whitespace found in file"
fi

Repository: UbiquitousLearning/mllm

Length of output: 2141


Remove trailing whitespace (repo guideline violation).

Lines 150, 160, and 192 contain trailing spaces after the NYI() calls. Files under mllm/** must not have trailing whitespace at line ends.

-        NYI("AddOp not supported on this architecture.");                              
+        NYI("AddOp not supported on this architecture.");
🤖 Prompt for AI Agents
In mllm/backends/cpu/ops/ElewiseOps.cpp around lines 149-151, 158-162 and
190-194, there are trailing spaces after the NYI("...") calls (specifically
noted at lines 150, 160 and 192); remove the trailing whitespace at the end of
those lines so the NYI calls end immediately before the line break, and ensure
no other trailing spaces remain in the file (run your editor's "trim trailing
whitespace" or a linter/formatter to verify).

Comment on lines +166 to 193
#if defined(MLLM_HOST_ARCH_X86_64) || defined(MLLM_HOST_ARCH_X86)
        // Process each batch separately
        for (int batch = 0; batch < batch_dims; ++batch) {
          // Each batch processes broadcast_naive_loops iterations of vector_size elements
          for (int l = 0; l < broadcast_naive_loops; ++l) {
            size_t a_offset = batch * broadcast_naive_loops * vector_size + l * vector_size;
            size_t b_offset = batch * vector_size;  // b doesn't broadcast over loops dimension
            size_t out_offset = batch * broadcast_naive_loops * vector_size + l * vector_size;

            cpu::common::call_elewise_add_fp32(out + out_offset, a + a_offset, b + b_offset, vector_size);
          }
        }
#elif defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
        // Process each batch separately
        for (int batch = 0; batch < batch_dims; ++batch) {
          // Each batch processes broadcast_naive_loops iterations of vector_size elements
          for (int l = 0; l < broadcast_naive_loops; ++l) {
            size_t a_offset = batch * broadcast_naive_loops * vector_size + l * vector_size;
            size_t b_offset = batch * vector_size;  // b doesn't broadcast over loops dimension
            size_t out_offset = batch * broadcast_naive_loops * vector_size + l * vector_size;

            cpu::arm::ew_add_fp32(out + out_offset, a + a_offset, b + b_offset, vector_size, options_.getThreads());
          }
        }
#else
        NYI("AddOp not supported on this architecture.");
#endif

⚠️ Potential issue | 🟠 Major

size_t offsets still risk overflow due to int intermediate math.
batch * broadcast_naive_loops * vector_size is computed as int then converted to size_t. Cast inputs before multiplying (apply similarly in Add/Sub/Mul/Div).

-            size_t a_offset = batch * broadcast_naive_loops * vector_size + l * vector_size;
-            size_t b_offset = batch * vector_size;  // b doesn't broadcast over loops dimension
-            size_t out_offset = batch * broadcast_naive_loops * vector_size + l * vector_size;
+            const size_t batch_u = static_cast<size_t>(batch);
+            const size_t loops_u = static_cast<size_t>(broadcast_naive_loops);
+            const size_t vec_u = static_cast<size_t>(vector_size);
+            const size_t l_u = static_cast<size_t>(l);
+            const size_t a_offset = batch_u * loops_u * vec_u + l_u * vec_u;
+            const size_t b_offset = batch_u * vec_u;  // b doesn't broadcast over loops dimension
+            const size_t out_offset = batch_u * loops_u * vec_u + l_u * vec_u;

Also applies to: 349-376, 531-558, 713-740

🤖 Prompt for AI Agents
In mllm/backends/cpu/ops/ElewiseOps.cpp around lines 166 to 193 (and similarly
for ranges 349-376, 531-558, 713-740), the offsets are computed using int
arithmetic then assigned to size_t which can overflow; cast the loop indices or
operands to size_t before performing multiplications (e.g., use (size_t)batch,
(size_t)broadcast_naive_loops, (size_t)vector_size) so all intermediate math is
done in size_t, and apply the same fix to the corresponding Add/Sub/Mul/Div
blocks to ensure safe offset calculation across architectures.

Comment on lines +30 to 43
#if defined(MLLM_HOST_ARCH_X86_64) || defined(MLLM_HOST_ARCH_X86)
      NYI("Transpose op(HW -> WH) fp32 not supported in x86");
#elif defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
      arm::transpose_hw_wh_fp32(input.ptr<mllm_fp32_t>(), output.ptr<mllm_fp32_t>(), input_shape[0], input_shape[1]);
#endif
      break;
    }
    case kFloat16: {
#if defined(MLLM_HOST_ARCH_X86_64) || defined(MLLM_HOST_ARCH_X86)
      NYI("Transpose op(HW -> WH) fp16 not supported in x86");
#elif defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
      arm::transpose_hw_wh_fp16(input.ptr<mllm_fp16_t>(), output.ptr<mllm_fp16_t>(), input_shape[0], input_shape[1]);
#endif
      break;

⚠️ Potential issue | 🟠 Major

Add #else NYI(...) to avoid silent no-op on non-{x86,arm}.
Right now, if someone builds on another arch (e.g., WASM/RISC-V), these branches can fall through without writing output. Recommend making the unsupported case explicit.

 #if defined(MLLM_HOST_ARCH_X86_64) || defined(MLLM_HOST_ARCH_X86)
   NYI("Transpose op(HW -> WH) fp32 not supported in x86");
 #elif defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
   arm::transpose_hw_wh_fp32(...);
+#else
+  NYI("Transpose op(HW -> WH) fp32 not supported on this architecture");
 #endif

Also applies to: 59-74, 95-110

🤖 Prompt for AI Agents
In mllm/backends/cpu/ops/TransposeOp.cpp around lines 30-43 (also apply same
change to 59-74 and 95-110), the architecture-specific #if/#elif blocks
currently omit an #else path so building on non-x86/non-ARM targets results in a
silent no-op; add an #else branch that calls NYI("Transpose op(...) not
supported on this arch") (with the appropriate dtype/shape message) before the
#endif to make unsupported architectures explicit and ensure the code
reports/notifies instead of falling through.
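The recommended pattern can be sketched in isolation. This is not mllm's code: the compiler-builtin arch macros and the `nyi()` helper below stand in for `MLLM_HOST_ARCH_*` and the project's `NYI` macro, and the string return is just a probe for which branch compiled in:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in for mllm's NYI macro: fail loudly instead of silently skipping.
[[noreturn]] inline void nyi(const std::string& what) {
  throw std::runtime_error("NYI: " + what);
}

// Every arch #if chain ends in an #else so an unsupported target (e.g.
// WASM, RISC-V) reports an error rather than falling through as a no-op.
inline const char* transpose_path() {
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__)
  return "x86";  // would call the x86 kernel (or NYI for now)
#elif defined(__aarch64__) || defined(__arm__)
  return "arm";  // would call arm::transpose_hw_wh_fp32
#else
  nyi("Transpose op(HW -> WH) fp32 not supported on this architecture");
#endif
}
```

On an unsupported target the branch throws instead of returning with the output buffer untouched, which is precisely the silent-corruption mode the review flags.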

@chenghuaWang chenghuaWang left a comment

LGTM

@chenghuaWang chenghuaWang merged commit 0c945f3 into UbiquitousLearning:main Dec 13, 2025
5 checks passed
@HayzelHan HayzelHan deleted the feat/x86-qwen3 branch December 14, 2025 01:57