
feat(ascend): initial Ascend backend and add elementwise add op #564

Merged
chenghuaWang merged 16 commits into UbiquitousLearning:main from lywbarca:lyw/ascend-backend
Dec 23, 2025

Conversation

@lywbarca

@lywbarca lywbarca commented Dec 12, 2025

Please check Guidelines for Contributing.

Summary by CodeRabbit

  • New Features

    • Added full Ascend backend: memory pooling, device/context/stream management, dispatcher, allocator, and operator support (elementwise add and host-device transfers).
  • Documentation

    • Added comprehensive Ascend backend design docs.
  • Build System

    • Enabled Ascend backend build and cross-compile configs; exposed Ascend-related CMake options.
  • Tests

    • Added Ascend kernel tests and test harness for FP16 validation.
  • Chores

    • Registered Ascend device types/dispatcher and minor CPU math adjustments.


@coderabbitai
Contributor

coderabbitai bot commented Dec 12, 2025

Walkthrough

Adds a complete Ascend backend: device enumeration, global ATB/ACL context and stream, per-device memory pools and manager, pool-backed allocator, dispatcher, ATB-based ops (Add, X2X), tests, CMake build integration, and documentation.

Changes

Cohort / File(s) Summary
Allocator & Registration
mllm/backends/ascend/AscendAllocator.hpp, mllm/backends/ascend/AscendAllocator.cpp, mllm/backends/ascend/Register.cpp
New pool-backed AscendAllocator with storage->block-id tracking, constructor/destructor logging, alloc/free mapping; initAscendBackend() creates 100MB pools, registers backend, allocator, and dispatcher.
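The storage-to-block-id bookkeeping described above can be sketched in plain C++. This is a simplified illustration, not the actual mllm API: `PoolBackend`, `SketchAllocator`, and the heap stand-in for the device pointer are all hypothetical; the real allocator maps a `Storage`'s device pointer to the pool block id so that `free` can return the right block.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for AscendMemoryManager/AscendMemoryPool: hands out
// monotonically increasing block ids.
struct PoolBackend {
  int next_id = 0;
  int allocateBlock(std::size_t /*size*/) { return next_id++; }
  void freeBlock(int /*id*/) {}
};

class SketchAllocator {
 public:
  explicit SketchAllocator(PoolBackend& pool) : pool_(pool) {}

  // Allocate a pool block and remember which block backs this pointer.
  void* alloc(std::size_t size) {
    int id = pool_.allocateBlock(size);
    void* ptr = ::operator new(size);  // stand-in for the pool's device pointer
    storage_to_block_id_[ptr] = id;
    return ptr;
  }

  // Free must route through the map so the pool releases the right block.
  // Returns false for an unknown pointer (i.e. a double free).
  bool free(void* ptr) {
    auto found = storage_to_block_id_.find(ptr);
    if (found == storage_to_block_id_.end()) return false;
    pool_.freeBlock(found->second);
    ::operator delete(ptr);
    storage_to_block_id_.erase(found);
    return true;
  }

 private:
  PoolBackend& pool_;
  std::unordered_map<void*, int> storage_to_block_id_;  // device ptr -> block id
};
```

The double-free check here is also why the review below asks for explicit error handling around the map lookups.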
Memory Manager & Pool
mllm/backends/ascend/memory/AscendMemoryManager.hpp, mllm/backends/ascend/memory/AscendMemoryManager.cpp, mllm/backends/ascend/memory/AscendMemoryPool.hpp, mllm/backends/ascend/memory/AscendMemoryPool.cpp, mllm/backends/ascend/memory/MemoryBlock.hpp
New AscendMemoryManager singleton and per-device AscendMemoryPool: block allocation/free with 64-byte alignment, free/used registries, thread-safety, block-id APIs and pool lifecycle.
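The 64-byte-aligned block allocation policy can be illustrated with a minimal sketch. This is a deliberately simplified bump-style variant (the real pool keeps free/used registries so freed blocks can be reused); `SketchPool` and its offsets standing in for device addresses are illustrative, not the actual AscendMemoryPool interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>

// Round a size up to the pool's 64-byte alignment.
constexpr std::size_t kAlign = 64;
constexpr std::size_t alignUp(std::size_t n) {
  return (n + kAlign - 1) / kAlign * kAlign;
}

class SketchPool {
 public:
  explicit SketchPool(std::size_t capacity) : capacity_(capacity) {}

  // Returns a block id, or -1 if the pool is exhausted.
  int allocateBlock(std::size_t size) {
    std::lock_guard<std::mutex> lock(mutex_);
    std::size_t aligned = alignUp(size);
    if (offset_ + aligned > capacity_) return -1;
    int id = next_id_++;
    used_[id] = offset_;  // block id -> aligned offset into the pool
    offset_ += aligned;
    return id;
  }

  // Returns false for an unknown id, signalling a double free.
  bool freeBlock(int id) {
    std::lock_guard<std::mutex> lock(mutex_);
    return used_.erase(id) == 1;
  }

  std::size_t offsetOf(int id) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto found = used_.find(id);
    return found == used_.end() ? SIZE_MAX : found->second;
  }

 private:
  std::size_t capacity_;
  std::size_t offset_ = 0;
  int next_id_ = 0;
  std::map<int, std::size_t> used_;
  std::mutex mutex_;
};
```

Every block start lands on a 64-byte boundary because sizes are rounded up before the offset advances, which is the same invariant the real pool maintains over its `aclrtMalloc`'d base region.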
Common Utilities & Device Meta
mllm/backends/ascend/AscendCommon.hpp, mllm/backends/ascend/AscendCommon.cpp
Global ATB/ACL init and stream accessors, AscendDeviceMetaInfo device enumeration, AscendTensorHandle RAII, prepare/copy/verify tensor helpers, AscendTimer RAII, ACL/ATB check macros.
Backend, Dispatcher & Registration
mllm/backends/ascend/AscendBackend.cpp, mllm/backends/ascend/AscendDispatcher.hpp, mllm/backends/ascend/AscendDispatcher.cpp
Backend registers op factories and logs device memory; AscendDispatcher added to route/async tasks (kExecuteOp/kExecuteModule), optional Perfetto tracing, process/asyncReceive implementation and factory.
Ascend Ops
mllm/backends/ascend/ops/AscendElewiseOps.hpp, mllm/backends/ascend/ops/AscendElewiseOps.cpp, mllm/backends/ascend/ops/AscendX2XOp.hpp, mllm/backends/ascend/ops/AscendX2XOp.cpp
Adds AscendAddOp (ATB ELEWISE_ADD setup/forward, workspace allocation via memory manager) and AscendX2XOp (CPU↔Ascend and Ascend↔Ascend memcpy paths) plus factories.
Build & CMake
mllm/backends/ascend/CMakeLists.txt, CMakeLists.txt, tests/CMakeLists.txt, tests/ascend/CMakeLists.txt
New MLLM_ASCEND_SOURCES, unified public linkage to ascendcl/opapi/nnopbase/atb/MllmRT, MLLM_USE_ASCEND_MEMPOOL define, ASCEND/ATB env include/link exposure, MLLM_RUN_MODE/MLLM_SOC_VERSION cache vars, conditional tests inclusion.
Tests & Test Helpers
tests/ascend/KernelTest.cpp, tests/ascend/AscendKernelTest.hpp, tests/ascend/KernelTestHelper.hpp
GoogleTest suite for FP16 Add: a harness that initializes the context/backend, creates AscendKernelTest with AddFloat16Test generating FP16 inputs, runs the Ascend op, and validates against a CPU reference.
Docs & Build Tasks
docs/ascend_backend/core_design.rst, docs/ascend_backend/index.rst, docs/index.rst, tasks/build_arm_ascend.yaml
New Ascend backend design doc and index entries; ARM Ascend cross-build YAML added.
Core enums/IDs
mllm/core/DeviceTypes.hpp, mllm/engine/Dispatcher.hpp, mllm/utils/Common.hpp
Added DeviceTypes strings for kAscend/kAscendHost, Dispatcher::ascend_dispatcher_id constant, and ExitCode::kAscendError.
Minor CPU kernels (precision) changes
mllm/backends/cpu/kernels/arm/rmsnorm.cpp, mllm/backends/cpu/kernels/arm/softmax.cpp
Replaced std::sqrtf→std::sqrt and std::expf→std::exp for numeric calls (precision/consistency).
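The motivation for this swap: the f-suffixed names (`std::sqrtf`, `std::expf`) are not reliably declared in namespace `std` on every toolchain, while `std::sqrt` and `std::exp` are guaranteed to have `float` overloads, so the computation stays in `float` without a round-trip through `double`. A minimal illustration (the `rmsnorm`-style helper below is a hypothetical example, not the actual kernel code):

```cpp
#include <cmath>
#include <type_traits>

// float in, float out: std::sqrt resolves to its float overload, so no
// double promotion occurs and std::sqrtf is never needed.
float rsqrt_eps(float sum_sq, float n, float eps) {
  return 1.0f / std::sqrt(sum_sq / n + eps);
}

static_assert(std::is_same<decltype(std::sqrt(1.0f)), float>::value,
              "std::sqrt has a guaranteed float overload");
```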

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Dispatcher as AscendDispatcher
    participant Backend as AscendBackend
    participant MemMgr as AscendMemoryManager
    participant MemPool as AscendMemoryPool
    participant Alloc as AscendAllocator
    participant ATB as ATB/ACL

    Client->>Dispatcher: submit Task (kExecuteOp / kExecuteModule)
    activate Dispatcher
    Dispatcher->>Dispatcher: route & optional trace
    Dispatcher->>Backend: forward(inputs, outputs)
    activate Backend
    Backend->>MemMgr: allocateBlock(workspace_size)
    activate MemMgr
    MemMgr->>MemPool: allocateBlock(size)
    activate MemPool
    MemPool-->>MemMgr: block_id + ptr
    deactivate MemPool
    MemMgr-->>Backend: workspace ptr + id
    deactivate MemMgr

    Backend->>ATB: create op, fill tensor descs, setup
    ATB-->>Backend: op ready
    Backend->>ATB: execute op
    ATB-->>Backend: done
    Backend->>MemMgr: freeBlock(block_id)
    Backend-->>Dispatcher: complete
    deactivate Backend
    Dispatcher-->>Client: result
    deactivate Dispatcher
sequenceDiagram
    participant Init
    participant Register as initAscendBackend()
    participant MemMgr as AscendMemoryManager
    participant MemPool as AscendMemoryPool
    participant Alloc as AscendAllocator
    participant Dispatcher as AscendDispatcher
    participant Context as GlobalContext

    Init->>Register: call initAscendBackend()
    activate Register
    Register->>MemMgr: createMemoryPool(100MB) per device
    activate MemMgr
    loop per device
        MemMgr->>MemPool: construct pool (aclrtMalloc base region)
        MemPool-->>MemMgr: pool ready
    end
    deactivate MemMgr
    Register->>Context: register AscendBackend
    Register->>MemMgr: register allocator to memory manager
    Register->>Dispatcher: create & register AscendDispatcher
    Register-->>Init: done
    deactivate Register

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • xumengwei
  • liang1232018
  • yirongjie
  • oreomaker

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description only contains the contribution template boilerplate without any actual details about the changes, objectives, or implementation notes. Add a detailed description explaining the Ascend backend implementation, memory management design, dispatcher architecture, supported operations, and testing approach. Include motivation and key design decisions.
Docstring Coverage ⚠️ Warning Docstring coverage is 14.61% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: adding an initial Ascend backend implementation with an elementwise add operator support.


@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 13

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
mllm/backends/ascend/AscendAllocator.hpp (1)

15-44: Delete copy/move for allocator (mutex member + resource tracking) to satisfy Rule of Five

 class AscendAllocator final : public Allocator {
  public:
   AscendAllocator();
-  ~AscendAllocator();
+  ~AscendAllocator() override;
+
+  AscendAllocator(const AscendAllocator&) = delete;
+  AscendAllocator& operator=(const AscendAllocator&) = delete;
+  AscendAllocator(AscendAllocator&&) = delete;
+  AscendAllocator& operator=(AscendAllocator&&) = delete;
🟠 Major comments (18)
examples/CMakeLists.txt-18-18 (1)

18-18: Add conditional compilation guard for Ascend demo.

The Ascend demo is added unconditionally, which will cause build failures on systems without the Ascend SDK. Following the pattern of other backends (e.g., QNN with MLLM_BUILD_QNN_BACKEND), this should be conditional.

Apply this diff to add a conditional guard:

+if(MLLM_BUILD_ASCEND_BACKEND)
   add_subdirectory(ascend_add_demo)
+endif()
mllm/backends/ascend/ops/AscnedElewiseOps.cpp-1-4 (1)

1-4: Fix filename typo (Ascned → Ascend) to avoid build/grep confusion.
Even if CMake references it correctly today, this will cause ongoing maintenance issues (and is easy to miss in future edits).

mllm/backends/ascend/memory/AscendMemoryPool.cpp-73-104 (1)

73-104: Harden free/get semantics (and fix clang-tidy iterator naming).
getBlockPtr() should set addr=nullptr on failure; otherwise callers can accidentally reuse stale pointers. Also clang-tidy is flagging it naming length.

 void AscendMemoryPool::freeBlock(int block_id) {
     std::unique_lock<std::mutex> lock(block_mutex_);
@@
-    auto it = used_blocks_.find(block_id);
-    if (it != used_blocks_.end()) {
-        free_blocks_.insert(*it);
-        used_blocks_.erase(it);
+    auto used_it = used_blocks_.find(block_id);
+    if (used_it != used_blocks_.end()) {
+        free_blocks_.insert(*used_it);
+        used_blocks_.erase(used_it);
     } else {
         MLLM_ERROR("Double free block id {}", block_id);
     }
 }
 
 void AscendMemoryPool::getBlockPtr(int block_id, void *&addr) {
     std::unique_lock<std::mutex> lock(block_mutex_);
+    addr = nullptr;
@@
-    auto it = used_blocks_.find(block_id);
-    if (it != used_blocks_.end()) {
-        addr = it->second.address_;
+    auto used_it = used_blocks_.find(block_id);
+    if (used_it != used_blocks_.end()) {
+        addr = used_it->second.address_;
     } else {
         MLLM_ERROR("Get block address error, block id {}", block_id);
     }
 }
mllm/backends/ascend/AscendAllocator.cpp-23-47 (1)

23-47: Prevent size truncation when allocating blocks (size_t → uint32_t).
AscendMemoryManager::allocateBlock takes uint32_t; storage->size_ is typically size_t. Please validate/cast explicitly and fail cleanly if size_ > UINT32_MAX.

 bool AscendAllocator::alloc(Storage* storage) {
     auto& mem_manager = getAscendMemoryManager();
     int block_id = -1;
-    mem_manager.allocateBlock(storage->size_, block_id);
+    if (storage->size_ > std::numeric_limits<uint32_t>::max()) {
+        MLLM_ERROR("Storage size too large for Ascend memory pool: {} bytes", storage->size_);
+        return false;
+    }
+    mem_manager.allocateBlock(static_cast<uint32_t>(storage->size_), block_id);
mllm/backends/ascend/ops/AscendElewiseOps.hpp-6-25 (1)

6-25: Make header self-contained (+ add brief API docs).
This file uses std::shared_ptr/std::make_shared but doesn’t include <memory> (and similarly relies on transitive <vector>). Also, public op/factory declarations should have a short comment per guidelines.

 #pragma once
 
+#include <memory>
+#include <vector>
+
 #include "mllm/core/BaseOp.hpp"
 #include "mllm/core/aops/ElewiseOps.hpp"
 #include "mllm/core/OpTypes.hpp"
 
 namespace mllm::ascend {
 
+// Ascend implementation of elementwise add (currently no-broadcast, same-dtype).
 class AscendAddOp final : public aops::AddOp {
  public:
   explicit AscendAddOp(const aops::AddOpOptions& options);
@@
 class AscendAddOpFactory final : public TypedOpFactory<OpTypes::kAdd, aops::AddOpOptions> {
  public:
   std::shared_ptr<BaseOp> createOpImpl(const aops::AddOpOptions& options) override {
     return std::make_shared<AscendAddOp>(options);
   }
 };

Committable suggestion skipped: line range outside the PR's diff.

mllm/backends/ascend/memory/AscendMemoryManager.hpp-6-35 (1)

6-35: Tighten API surface (includes + docs + avoid returning mutable ref).
Add <cstddef> for size_t, add short comments for public APIs, and prefer returning std::shared_ptr<AscendMemoryPool> (by value) or const std::shared_ptr<...>& to avoid external mutation of memory_pools_.

 #pragma once
 
+#include <cstddef>
 #include <memory>
 #include <vector>
 #include <cstdint>
 #include "AscendMemoryPool.hpp"
@@
 class AscendMemoryManager {
 public:  
     AscendMemoryManager();
 
+    // Create a pool for the current device (or specified device if supported by impl).
     void createMemoryPool(size_t pool_size);
@@
-    std::shared_ptr<AscendMemoryPool> &getMemoryPool();
+    std::shared_ptr<AscendMemoryPool> getMemoryPool();

Committable suggestion skipped: line range outside the PR's diff.

mllm/backends/ascend/AscendAllocator.hpp-40-43 (1)

40-43: Fix map key/comment mismatch (void* vs “Storage ptr”)

Either change the key to Storage* (if that’s what you mean), or update the comment to match what the pointer actually represents (e.g., storage->data() pointer / device ptr).

-  std::unordered_map<void*, int> storage_to_block_id_;  // Storage ptr -> block ID
+  std::unordered_map<void*, int> storage_to_block_id_;  // device ptr (e.g., storage->data()) -> block ID
mllm/backends/ascend/memory/AscendMemoryManager.cpp-21-34 (1)

21-34: createMemoryPool: check aclrtSetDevice, restore previous device, and prevent double-init leaks

Today it changes global device context and doesn’t restore it; also repeated calls will keep pushing pools.

 void AscendMemoryManager::createMemoryPool(size_t pool_size)
 {
-    uint32_t device_count = 0;
+    uint32_t device_count{};
     auto ret = aclrtGetDeviceCount(&device_count);
     MLLM_ACL_CHECK(ret);
+
+    int32_t prev_device_id{};
+    ret = aclrtGetDevice(&prev_device_id);
+    MLLM_ACL_CHECK(ret);
+
+    if (!memory_pools_.empty()) {
+        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "createMemoryPool called more than once");
+    }
+
     for (size_t i = 0; i < device_count; i++) {
-
-        aclrtSetDevice(i);
+        ret = aclrtSetDevice(static_cast<int32_t>(i));
+        MLLM_ACL_CHECK(ret);
 
         std::shared_ptr<AscendMemoryPool> memory_pool = std::make_shared<AscendMemoryPool>(pool_size);
         memory_pools_.push_back(memory_pool);
         MLLM_INFO("create mempool for device {} success", i);
     }
+
+    ret = aclrtSetDevice(prev_device_id);
+    MLLM_ACL_CHECK(ret);
 }
mllm/backends/ascend/AscendDispatcher.cpp-34-44 (1)

34-44: Remove dead MLLM_EMPTY_SCOPE statement in switch block, and consider guarding this capture with shared_from_this() for async safety

The MLLM_EMPTY_SCOPE; statement in the switch case is a no-op and can be removed. More importantly, the lambda [this, task] captures a raw pointer to the dispatcher. While the design assumes DispatcherManager keeps the dispatcher alive via shared_ptr, using shared_from_this() would provide defensive safety: if the dispatcher is ever removed from the manager while async work is pending, the raw this pointer would become invalid. Consider using [self = shared_from_this(), task] instead, which requires adding enable_shared_from_this<AscendDispatcher> to the class hierarchy.

mllm/backends/ascend/AscendCommon.cpp-124-185 (1)

124-185: Revisit aclInit/aclFinalize inside AscendDeviceMetaInfo ctor
Calling aclInit()/aclFinalize() inside a singleton constructor (Line 127, 174) can conflict with other runtime init (including getGlobalAtbContext() which does ACL calls but doesn’t init). Prefer a single, well-defined ACL lifetime owner (or make enumeration reuse the existing initialized runtime and avoid finalizing here).

mllm/backends/ascend/AscendCommon.hpp-44-50 (1)

44-50: Default-initialize AscendDeviceInfo fields to satisfy member-init checks

 struct AscendDeviceInfo {
-  std::string name;
-  unsigned int id;
-  size_t total_memory;  // bytes
-  size_t free_memory;   // bytes
-  std::string soc_version;
+  std::string name{};
+  unsigned int id{0};
+  size_t total_memory{0};  // bytes
+  size_t free_memory{0};   // bytes
+  std::string soc_version{};
 };
mllm/backends/ascend/AscendCommon.cpp-14-41 (1)

14-41: Guard against null Tensor internals and potential shared-storage corruption in release()
tensor_.impl()->storage()->ptr_ = nullptr; (Line 39) assumes impl/storage are non-null and that you “own” that storage uniquely. If Tensor storage is shared, nulling the pointer can break other tensors.

At minimum, add null checks; ideally, avoid writing Tensor internals directly (provide a Tensor API / storage owner API).

 void AscendTensorHandle::release() {
   if (block_id_ >= 0) {
     auto& mem_mgr = getAscendMemoryManager();
     mem_mgr.freeBlock(block_id_);
     block_id_ = -1;
-    tensor_.impl()->storage()->ptr_ = nullptr;
+    if (auto* impl = tensor_.impl(); impl && impl->storage()) {
+      impl->storage()->ptr_ = nullptr;
+    }
   }
 }
mllm/backends/ascend/AscendCommon.hpp-69-89 (1)

69-89: Initialize tensor_ and add [[nodiscard]] to query/accessors
Addresses clang-tidy findings and avoids accidentally ignoring ownership state.

 struct AscendTensorHandle {
   AscendTensorHandle() = default;
@@
-  bool valid() const { return block_id_ >= 0; }  // Check if handle owns a valid memory block
+  [[nodiscard]] bool valid() const { return block_id_ >= 0; }
 
-  Tensor& tensor() { return tensor_; }  // Access tensor
-  const Tensor& tensor() const { return tensor_; }  // Access tensor (const)
-  int blockId() const { return block_id_; }  // Get memory block ID
+  [[nodiscard]] Tensor& tensor() { return tensor_; }
+  [[nodiscard]] const Tensor& tensor() const { return tensor_; }
+  [[nodiscard]] int blockId() const { return block_id_; }
 
  private:
-  Tensor tensor_;
+  Tensor tensor_{};
   int block_id_{-1};
 };
mllm/backends/ascend/CMakeLists.txt-1-2 (1)

1-2: Add conditional guards for Ascend library linking based on RUN_MODE

RUN_MODE is defined (Line 1) but only used to conditionally apply the -g flag (Line 59); the Ascend libraries (ascendcl, opapi, nnopbase, atb—Lines 45–51) are linked unconditionally regardless of RUN_MODE. If cpu and sim modes should build without NPU-specific libraries, wrap the target_link_libraries() call (Lines 45–51) and related include/link directory setup (Lines 35–42) with if(RUN_MODE STREQUAL "npu") ... endif(). Additionally, the unconditional -O2 flag (Line 60) should align with the build type; consider applying it conditionally or only for Release builds.

mllm/backends/ascend/AscendCommon.hpp-53-66 (1)

53-66: Make special member intent explicit for the singleton (AscendDeviceMetaInfo)
Clang-tidy will keep complaining unless you explicitly delete moves / default dtor.

 class AscendDeviceMetaInfo {
  public:
   AscendDeviceMetaInfo();
+  ~AscendDeviceMetaInfo() = default;
@@
   AscendDeviceMetaInfo(const AscendDeviceMetaInfo&) = delete;
   AscendDeviceMetaInfo& operator=(const AscendDeviceMetaInfo&) = delete;
+  AscendDeviceMetaInfo(AscendDeviceMetaInfo&&) = delete;
+  AscendDeviceMetaInfo& operator=(AscendDeviceMetaInfo&&) = delete;
mllm/backends/ascend/AscendCommon.cpp-43-77 (1)

43-77: Avoid uint32_t sizing / add allocation failure handling / reduce “swappable params” risk

  • bytes is uint32_t (Line 58): can overflow for large tensors; prefer size_t.
  • allocateBlock(bytes, block_id) (Line 60): no visible error handling if allocation fails/throws.
  • Signature (..., int batch, int size) (Line 43–45) is easy to swap.
-  const uint32_t bytes = static_cast<uint32_t>(expected_elements * sizeof(half_float::half));
+  const size_t bytes = expected_elements * sizeof(half_float::half);
 
   mem_mgr.allocateBlock(bytes, block_id);
+  MLLM_RT_ASSERT(block_id >= 0);

Consider changing API to something like prepareAscendTensor(const std::vector<float>& host_data, std::array<int, 2> shape) or prepareAscendTensor(..., int batch, int features).

mllm/backends/ascend/AscendCommon.cpp-113-122 (1)

113-122: Add bounds validation and use tensor dtype mapping

The function unconditionally writes to desc.shape.dims[i] without checking the array capacity—add a bounds check to ensure shape.size() doesn't exceed ATB's dimension limit (assert or validate the rank).

The hardcoded ACL_FLOAT16 dtype ignores t.dtype() which is available on the Tensor class. Implement dtype mapping from DataTypes to ACL dtype constants (e.g., kFloat32 → ACL_FLOAT, kFloat16 → ACL_FLOAT16, kInt8 → ACL_INT8).

mllm/backends/ascend/AscendCommon.hpp-17-28 (1)

17-28: Replace function-like error-check macros with inline functions

Both macros can trigger clang-tidy warnings (macro-usage) and reduce type safety. Replace with inline functions:

-#define MLLM_ACL_CHECK(err)                                                                                    \
-  if (err != ACL_SUCCESS) {                                                                                    \
-    MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "ACL error code {}: {}", int(err), aclGetRecentErrMsg()); \
-  }
-
-#define MLLM_ATB_CHECK(err)                                                                                    \
-  if (err != atb::NO_ERROR) {                                                                                  \
-    MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "ATB error code {}", int(err));                           \
-  }
+namespace mllm::ascend {
+inline void aclCheck(aclError err) {
+  if (err != ACL_SUCCESS) {
+    MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "ACL error code {}: {}", int(err), aclGetRecentErrMsg());
+  }
+}
+inline void atbCheck(atb::ErrorCode err) {
+  if (err != atb::NO_ERROR) {
+    MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "ATB error code {}", int(err));
+  }
+}
+}  // namespace mllm::ascend

Update call sites: MLLM_ACL_CHECK(ret) → mllm::ascend::aclCheck(ret) (same for ATB).

🟡 Minor comments (4)
examples/ascend_add_demo/main.cpp-117-117 (1)

117-117: Replace Chinese comment with English.

Comments in source files should be in English per the coding guidelines.

Apply this diff:

-    // 清理内存池中的块
+    // Release memory pool blocks
     x_handle.release();
examples/ascend_add_demo/main.cpp-52-52 (1)

52-52: Remove debug print statement.

This appears to be a leftover debug statement from development.

Apply this diff:

     auto& ctx = Context::instance();
-    std::cout << "context over" <<std::endl;
     auto z_ascend = ctx.buildOpAndSubmitTask(
examples/ascend_add_demo/CMakeLists.txt-11-11 (1)

11-11: Replace Chinese comment with English.

Comments in source files should be in English per the coding guidelines.

Apply this diff:

-  ascendcl  # 添加 ACL 库,因为 main.cpp 中直接使用了 aclrtMemcpy
+  ascendcl  # ACL library required for aclrtMemcpy used in main.cpp
mllm/backends/ascend/AscendCommon.cpp-1-187 (1)

1-187: Remove trailing whitespace at line 81 of AscendCommon.cpp

Verification found trailing whitespace at line 81 that violates the repository's encoding and formatting rules.

🧹 Nitpick comments (11)
tasks/build_arm_ascend.yaml (2)

9-9: Android API level 28 is outdated; document the rationale or consider updating.

Android API level 28 (Android 9.0, released in 2018) is quite old. Modern Android projects typically target API 31+. If API 28 is intentional for broad compatibility, document this decision. If not, consider updating to a more recent level.


12-12: Hard-coded install prefix may limit portability.

The install prefix /root/mllm-install-android-arm64-v8a assumes /root/ is accessible and writable. While the current CI/CD workflow runs with elevated permissions, this will be problematic if the build configuration is used in non-root environments or in containers with restricted permissions.

Consider using a relative path for better portability:

-        - "-DCMAKE_INSTALL_PREFIX=/root/mllm-install-android-arm64-v8a"
+        - "-DCMAKE_INSTALL_PREFIX=./mllm-install-android-arm64-v8a"

This aligns with the pattern used in other build configurations like tasks/build_sdk_android.yaml and allows the install directory to be placed relative to the build directory regardless of the execution environment.

mllm/backends/ascend/Register.cpp (2)

19-19: Consider making memory pool size configurable.

The 100MB pool size is hardcoded. While the comment mentions "can be adjusted as needed," exposing this as a configuration parameter (environment variable or function argument) would provide better flexibility for different deployment scenarios.

Consider this approach:

// Add parameter with default
void initAscendBackend(size_t pool_size = 100 * 1024 * 1024) {
  auto& ctx = Context::instance();
  
  ascend::getAscendMemoryManager().createMemoryPool(pool_size);
  MLLM_INFO("Ascend memory pool initialized with {} MB", pool_size / (1024 * 1024));
  // ... rest of function
}

36-38: Remove or document commented code scaffolding.

The commented custom ops registration is placeholder code. Either remove it or add a TODO comment explaining when/how it should be used.

If keeping for documentation purposes:

// TODO: Example of registering custom ops:
// ctx.registerCustomizedOp(kAscend, "CustomOpName", 
//                          std::make_shared<CustomOpFactory>());
mllm/backends/ascend/AscendDispatcher.hpp (1)

17-38: Add brief API comments for new public types (required by repo guidelines)

Public header introduces new API surface (options, dispatcher, factory) but has no purpose/behavior comments (params/returns/errors).

mllm/backends/ascend/memory/AscendMemoryPool.hpp (2)

20-24: Prefer size_t for allocation sizes to avoid truncation

uint32_t size can truncate large allocations on 64-bit builds; consider size_t.


15-25: Add brief comments for public methods (purpose / params / errors)

This is a new public header API and should describe allocation policy, failure behavior, and thread-safety expectations.

mllm/backends/ascend/CMakeLists.txt (2)

17-29: Avoid bare file(GLOB ...) for sources (or add CONFIGURE_DEPENDS)
Without CONFIGURE_DEPENDS, adding/removing files won’t reliably trigger a CMake reconfigure.

-file(GLOB MLLM_ASCEND_CORE_FILES ${CMAKE_CURRENT_SOURCE_DIR}/Ascend*.cpp)
-file(GLOB MLLM_ASCEND_MEMORY_FILES ${CMAKE_CURRENT_SOURCE_DIR}/memory/*.cpp)
-file(GLOB MLLM_ASCEND_OPS_FILES ${CMAKE_CURRENT_SOURCE_DIR}/ops/*.cpp)
+file(GLOB CONFIGURE_DEPENDS MLLM_ASCEND_CORE_FILES "${CMAKE_CURRENT_SOURCE_DIR}/Ascend*.cpp")
+file(GLOB CONFIGURE_DEPENDS MLLM_ASCEND_MEMORY_FILES "${CMAKE_CURRENT_SOURCE_DIR}/memory/*.cpp")
+file(GLOB CONFIGURE_DEPENDS MLLM_ASCEND_OPS_FILES "${CMAKE_CURRENT_SOURCE_DIR}/ops/*.cpp")

30-42: Quote env-derived paths; consider avoiding PUBLIC exposure of machine-local include dirs
Unquoted $ENV{...} breaks if the path contains spaces. Also, PUBLIC include dirs propagate into dependents (and can leak build-machine paths).

-  target_include_directories(MllmAscendBackend PUBLIC $ENV{ASCEND_HOME_PATH}/include)
-  target_link_directories(MllmAscendBackend PRIVATE $ENV{ASCEND_HOME_PATH}/lib64)
+  target_include_directories(MllmAscendBackend PUBLIC "$ENV{ASCEND_HOME_PATH}/include")
+  target_link_directories(MllmAscendBackend PRIVATE "$ENV{ASCEND_HOME_PATH}/lib64")
@@
-  target_include_directories(MllmAscendBackend PUBLIC $ENV{ATB_HOME_PATH}/include)
-  target_link_directories(MllmAscendBackend PRIVATE $ENV{ATB_HOME_PATH}/lib)
+  target_include_directories(MllmAscendBackend PUBLIC "$ENV{ATB_HOME_PATH}/include")
+  target_link_directories(MllmAscendBackend PRIVATE "$ENV{ATB_HOME_PATH}/lib")
mllm/backends/ascend/AscendCommon.hpp (2)

41-42: Rename ttensor (identifier-length rule)
fillAtbTensorDesc(const Tensor& t, ...) (Line 41) is flagged; rename for readability + clang-tidy compliance.


92-95: Reduce “easily swappable parameters” for prepareAscendTensor
(host_data, int batch, int size) is easy to call incorrectly. Prefer prepareAscendTensor(host_data, /*shape=*/{batch, size}) or rename size → features (or similar).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 439e3cc and 6b743b7.

📒 Files selected for processing (23)
  • examples/CMakeLists.txt (1 hunks)
  • examples/ascend_add_demo/CMakeLists.txt (1 hunks)
  • examples/ascend_add_demo/main.cpp (1 hunks)
  • mllm/backends/ascend/AscendAllocator.cpp (1 hunks)
  • mllm/backends/ascend/AscendAllocator.hpp (2 hunks)
  • mllm/backends/ascend/AscendBackend.cpp (1 hunks)
  • mllm/backends/ascend/AscendCommon.cpp (1 hunks)
  • mllm/backends/ascend/AscendCommon.hpp (1 hunks)
  • mllm/backends/ascend/AscendDispatcher.cpp (1 hunks)
  • mllm/backends/ascend/AscendDispatcher.hpp (1 hunks)
  • mllm/backends/ascend/CMakeLists.txt (2 hunks)
  • mllm/backends/ascend/Register.cpp (1 hunks)
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp (1 hunks)
  • mllm/backends/ascend/memory/AscendMemoryManager.hpp (1 hunks)
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp (1 hunks)
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp (1 hunks)
  • mllm/backends/ascend/memory/MemoryBlock.hpp (1 hunks)
  • mllm/backends/ascend/ops/AscendElewiseOps.hpp (1 hunks)
  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp (1 hunks)
  • mllm/core/DeviceTypes.hpp (2 hunks)
  • mllm/engine/Dispatcher.hpp (1 hunks)
  • mllm/utils/Common.hpp (1 hunks)
  • tasks/build_arm_ascend.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/engine/Dispatcher.hpp
  • mllm/backends/ascend/ops/AscendElewiseOps.hpp
  • mllm/backends/ascend/memory/AscendMemoryManager.hpp
  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
  • mllm/backends/ascend/AscendCommon.hpp
  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp
  • mllm/backends/ascend/memory/MemoryBlock.hpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/utils/Common.hpp
  • mllm/core/DeviceTypes.hpp
  • mllm/backends/ascend/Register.cpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendAllocator.hpp
  • mllm/backends/ascend/CMakeLists.txt
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/engine/Dispatcher.hpp
  • mllm/backends/ascend/ops/AscendElewiseOps.hpp
  • mllm/backends/ascend/memory/AscendMemoryManager.hpp
  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
  • mllm/backends/ascend/AscendCommon.hpp
  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp
  • mllm/backends/ascend/memory/MemoryBlock.hpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/utils/Common.hpp
  • mllm/core/DeviceTypes.hpp
  • mllm/backends/ascend/Register.cpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendAllocator.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/engine/Dispatcher.hpp
  • mllm/backends/ascend/ops/AscendElewiseOps.hpp
  • mllm/backends/ascend/memory/AscendMemoryManager.hpp
  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
  • mllm/backends/ascend/AscendCommon.hpp
  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp
  • mllm/backends/ascend/memory/MemoryBlock.hpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/utils/Common.hpp
  • mllm/core/DeviceTypes.hpp
  • mllm/backends/ascend/Register.cpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendAllocator.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/Register.cpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/AscendCommon.cpp
{mllm,mllm-cli,pymllm}/**/*.{toml,json,txt,md}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

When suggesting new dependencies, evaluate their maturity, licensing, and maintenance status. Prefer well-established and actively maintained libraries.

Files:

  • mllm/backends/ascend/CMakeLists.txt
🧬 Code graph analysis (7)
mllm/backends/ascend/memory/AscendMemoryManager.hpp (1)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (3)
  • size (20-20)
  • block_id (22-22)
  • block_id (24-24)
mllm/backends/ascend/AscendBackend.cpp (4)
mllm/backends/ascend/AscendBackend.hpp (1)
  • AscendBackend (13-13)
mllm/backends/ascend/AscendAllocator.cpp (2)
  • createAscendAllocator (104-104)
  • createAscendAllocator (104-104)
mllm/backends/ascend/AscendAllocator.hpp (1)
  • createAscendAllocator (46-46)
mllm/backends/ascend/AscendCommon.hpp (1)
  • instance (57-60)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (1)
mllm/backends/ascend/memory/AscendMemoryPool.cpp (2)
  • AscendMemoryPool (14-26)
  • AscendMemoryPool (28-36)
mllm/backends/ascend/AscendCommon.hpp (1)
mllm/backends/ascend/AscendCommon.cpp (5)
  • AscendTensorHandle (14-15)
  • AscendTensorHandle (17-17)
  • AscendTensorHandle (19-22)
  • other (24-32)
  • other (24-24)
mllm/backends/ascend/ops/AscnedElewiseOps.cpp (1)
mllm/backends/ascend/ops/AscendElewiseOps.hpp (5)
  • AscendAddOp (14-14)
  • options (22-24)
  • options (22-22)
  • inputs (16-16)
  • inputs (17-17)
mllm/backends/ascend/Register.cpp (5)
mllm/backends/ascend/AscendCommon.hpp (1)
  • instance (57-60)
mllm/backends/ascend/memory/AscendMemoryManager.hpp (1)
  • pool_size (18-18)
mllm/engine/DispatcherManager.hpp (1)
  • dispatcher (49-49)
mllm/backends/ascend/AscendDispatcher.cpp (2)
  • createAscendDispatcher (84-87)
  • createAscendDispatcher (84-85)
mllm/backends/ascend/AscendDispatcher.hpp (1)
  • createAscendDispatcher (36-37)
mllm/backends/ascend/AscendAllocator.hpp (1)
mllm/backends/ascend/AscendAllocator.cpp (2)
  • AscendAllocator (11-13)
  • AscendAllocator (15-21)
🪛 Clang (14.0.6)
mllm/engine/Dispatcher.hpp

[error] 28-28: invalid case style for global constant 'ascend_dispatcher_id'

(readability-identifier-naming,-warnings-as-errors)

examples/ascend_add_demo/main.cpp

[error] 98-98: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 111-111: if with identical then and else branches

(bugprone-branch-clone,-warnings-as-errors)

mllm/backends/ascend/memory/AscendMemoryPool.hpp

[error] 15-15: class 'AscendMemoryPool' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/ascend/AscendCommon.hpp

[error] 18-18: function-like macro 'MLLM_ACL_CHECK' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 19-19: macro argument should be enclosed in parentheses

(bugprone-macro-parentheses,-warnings-as-errors)


[error] 24-24: function-like macro 'MLLM_ATB_CHECK' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 25-25: macro argument should be enclosed in parentheses

(bugprone-macro-parentheses,-warnings-as-errors)


[error] 41-41: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 44-44: constructor does not initialize these fields: name, id, total_memory, free_memory, soc_version

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 53-53: class 'AscendDeviceMetaInfo' defines a copy constructor and a copy assignment operator but does not define a destructor, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 69-69: constructor does not initialize these fields: tensor_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 80-80: function 'valid' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 83-83: function 'tensor' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 84-84: function 'blockId' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)

mllm/backends/ascend/AscendDispatcher.hpp

[error] 13-13: constructor does not initialize these fields: MLLM_EMPTY_SCOPE

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 21-21: parameter name 'id' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/ascend/memory/AscendMemoryPool.cpp

[error] 14-14: constructor does not initialize these fields: block_mutex_, used_blocks_, free_blocks_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 42-42: method 'allocateBlock' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 45-45: variable 'align_size' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 59-59: variable 'cur_mem_ptr_align' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 73-73: method 'freeBlock' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 81-81: variable name 'it' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 90-90: method 'getBlockPtr' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 98-98: variable name 'it' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/ascend/memory/MemoryBlock.hpp

[error] 11-11: constructor does not initialize these fields: block_id_, block_size_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

mllm/backends/ascend/memory/AscendMemoryManager.cpp

[error] 13-13: variable 'g_ascendMemoryManager' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 13-13: invalid case style for variable 'g_ascendMemoryManager'

(readability-identifier-naming,-warnings-as-errors)


[error] 15-15: use '= default' to define a trivial default constructor

(modernize-use-equals-default,-warnings-as-errors)


[error] 21-21: method 'createMemoryPool' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 23-23: variable 'device_count' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 38-38: variable 'device_id' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 46-46: variable 'device_id' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

mllm/backends/ascend/Register.cpp

[error] 19-19: variable 'pool_size' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

mllm/backends/ascend/AscendCommon.cpp

[error] 12-12: variable 'g_atb_stream' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 43-43: 3 adjacent parameters of 'prepareAscendTensor' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 46-46: variable 'expected_elements' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 58-58: variable 'bytes' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 80-80: variable 'ctx' provides global access to a non-const object; consider making the pointed-to data 'const'

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 113-113: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/ascend/AscendAllocator.hpp

[error] 15-15: class 'AscendAllocator' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

🔇 Additional comments (9)
mllm/utils/Common.hpp (1)

34-34: LGTM! Well-integrated enum value addition.

The kAscendError enum value is correctly placed alongside other backend-specific error codes and follows the established naming convention.

mllm/core/DeviceTypes.hpp (1)

36-37: LGTM! Ascend device type integration follows existing patterns.

The additions to both conversion functions are consistent with existing device types and properly handle the new Ascend and AscendHost device types.

Also applies to: 52-55

mllm/engine/Dispatcher.hpp (1)

28-28: LGTM! Dispatcher ID follows existing naming conventions.

The naming style is consistent with all other dispatcher IDs in the file (cpu_dispatcher_id, cuda_dispatcher_id, etc.). While the static analysis tool flags the naming convention, changing only this identifier would create inconsistency with the established codebase pattern.

mllm/backends/ascend/AscendBackend.cpp (1)

13-21: LGTM! Backend initialization follows established patterns.

The constructor properly registers the operation factory and logs device information. The lambda for byte-to-MB conversion is clear and the logging will be helpful for users to verify their Ascend hardware configuration.

examples/ascend_add_demo/main.cpp (1)

17-21: LGTM! Excellent error handling and resource management.

The demo demonstrates good practices:

  • Comprehensive error handling with try-catch
  • Proper cleanup on both success and error paths (lines 70-72, 118-119)
  • Clear verification logic with tolerance checking
  • Informative output for users

Also applies to: 63-73, 111-121, 123-126

mllm/backends/ascend/ops/AscnedElewiseOps.cpp (2)

22-35: The file mllm/backends/ascend/ops/AscnedElewiseOps.cpp does not exist in the repository. The review comment references a code location that cannot be found in the codebase. Verify the correct file path and file name before proceeding with this review.

Likely an incorrect or invalid review comment.


37-116: The file mllm/backends/ascend/ops/AscnedElewiseOps.cpp does not exist in the repository, and the code snippet shown in the review comment cannot be located through any search. This means either:

  1. The code is being added in a pending change not yet in the repository, or
  2. The file path is incorrect (note the typo "Ascned" instead of "Ascend")

Without access to the actual file being reviewed, the substantive concerns cannot be definitively verified. However, the review comment raises legitimate C++ safety concerns that are generally valid:

  • null-pointer checks for context objects are prudent defensive programming
  • RAII guards for resource management (operation handles) align with C++ best practices per the coding guidelines
  • Explicit handling of memory allocation failures is required for production-ready code

To properly verify this review comment, please:

  1. Confirm the correct file path
  2. Provide context on whether this is code in an uncommitted change or pending PR
  3. Ensure the file is accessible in the current repository state
mllm/backends/ascend/memory/AscendMemoryPool.cpp (1)

12-26:

The file mllm/backends/ascend/memory/AscendMemoryPool.cpp mentioned in this review comment could not be located in the current repository state. Manual verification is required to:

  1. Confirm the file exists at the specified path or locate its actual location
  2. Verify the header file AscendMemoryPool.hpp and whether it declares a default argument for the constructor
  3. Validate the presence of trailing whitespace on the lines mentioned
  4. Confirm clang-tidy member-initialization warnings with -Werror flag in the CI configuration
  5. Review the actual class member declarations to ensure the proposed initializer list in the diff matches all members

The review comment's claims regarding default argument placement, clang-tidy warnings, and whitespace violations cannot be verified without access to the actual source files.

mllm/backends/ascend/AscendAllocator.hpp (1)

20-20: Double-check ctrlByMemManager() semantics now that allocations are memory-pool backed

If this flag is used by the engine to decide who owns lifecycle/free behavior, returning false may be wrong after switching to a global memory manager + block IDs.

Comment on lines +56 to +77
    case TaskTypes::kExecuteModule: {
      auto moduleName = static_cast<nn::Module*>(task->custom_context_ptr)->getModuleName();
#ifdef MLLM_PERFETTO_ENABLE
      MLLM_PERF_TRACE_EVENT("mllm.ascend.execute.", perfetto::DynamicString{moduleName},
                            [&](perfetto::EventContext ctx) {
                              int cnt = 0;
                              for (auto& i : task->inputs) {
                                ctx.AddDebugAnnotation(perfetto::DynamicString{"inputs-" + std::to_string(cnt++)},
                                                       i.shape());
                              }
                            });
#endif
      auto ascendBackend = std::static_pointer_cast<AscendBackend>(Context::instance().getBackend(kAscend));

      task->outputs = ((nn::Module*)(task->custom_context_ptr))->forward(task->inputs, task->args);

      // TODO:
      // ascendBackend->graphExecute(moduleName, task->inputs, task->outputs);
      break;
    }
    default: NYI("AscendDispatcher::process not supported task type");
  }

⚠️ Potential issue | 🔴 Critical

Avoid unchecked custom_context_ptr cast + fix unused ascendBackend (-Werror risk)

  • task->custom_context_ptr is used as nn::Module* without null/type validation.
  • ascendBackend is currently unused (likely -Wunused-variable).
     case TaskTypes::kExecuteModule: {
-      auto moduleName = static_cast<nn::Module*>(task->custom_context_ptr)->getModuleName();
+      auto* module = static_cast<nn::Module*>(task->custom_context_ptr);
+      if (module == nullptr) {
+        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Task custom_context_ptr is null (expected nn::Module*)");
+      }
+      auto moduleName = module->getModuleName();
 #ifdef MLLM_PERFETTO_ENABLE
       MLLM_PERF_TRACE_EVENT("mllm.ascend.execute.", perfetto::DynamicString{moduleName},
                             [&](perfetto::EventContext ctx) {
                               int cnt = 0;
                               for (auto& i : task->inputs) {
                                 ctx.AddDebugAnnotation(perfetto::DynamicString{"inputs-" + std::to_string(cnt++)},
                                                        i.shape());
                               }
                             });
 #endif
-      auto ascendBackend = std::static_pointer_cast<AscendBackend>(Context::instance().getBackend(kAscend));
+      [[maybe_unused]] auto ascend_backend =
+          std::static_pointer_cast<AscendBackend>(Context::instance().getBackend(kAscend));
 
-      task->outputs = ((nn::Module*)(task->custom_context_ptr))->forward(task->inputs, task->args);
+      task->outputs = module->forward(task->inputs, task->args);
 
-      // TODO:
-      // ascendBackend->graphExecute(moduleName, task->inputs, task->outputs);
+      // TODO: Wire graph execution path (if/when supported on Ascend backend).
+      // ascend_backend->graphExecute(moduleName, task->inputs, task->outputs);
       break;
     }
🤖 Prompt for AI Agents
In mllm/backends/ascend/AscendDispatcher.cpp around lines 56 to 77, avoid the
unchecked static cast of task->custom_context_ptr to nn::Module* and resolve the
unused ascendBackend variable: first validate that task->custom_context_ptr is
non-null and is a valid nn::Module instance (use a safe cast such as
dynamic_cast on a properly-typed pointer or check a runtime type field) and
handle the error path (log and bail out) if the pointer is null or the cast
fails; replace the static_cast with that safe cast and use the resulting
nn::Module* to call forward; for ascendBackend either remove the variable if not
needed or mark it used (e.g., call the TODO graphExecute path or explicitly
suppress unused-warning) so there is no -Wunused-variable.

Comment on lines +44 to +51
std::shared_ptr<AscendMemoryPool> &AscendMemoryManager::getMemoryPool()
{
    size_t device_id = static_cast<size_t>(getDeviceId());
    if (device_id >= memory_pools_.size()) {
        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Invalid device id {}", device_id);
    }
    return memory_pools_[device_id];
}

⚠️ Potential issue | 🔴 Critical

getMemoryPool: handle negative device id and “not initialized” state explicitly

Casting -1 to size_t creates a huge index and hides the real error.

 std::shared_ptr<AscendMemoryPool> &AscendMemoryManager::getMemoryPool()
 {
-    size_t device_id = static_cast<size_t>(getDeviceId());
+    if (memory_pools_.empty()) {
+        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Ascend memory pools not initialized");
+    }
+    const int32_t device_id_i32 = getDeviceId();
+    if (device_id_i32 < 0) {
+        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Invalid device id {}", device_id_i32);
+    }
+    const size_t device_id = static_cast<size_t>(device_id_i32);
     if (device_id >= memory_pools_.size()) {
         MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Invalid device id {}", device_id);
     }
     return memory_pools_[device_id];
 }
🧰 Tools
🪛 Clang (14.0.6)

[error] 46-46: variable 'device_id' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/backends/ascend/memory/AscendMemoryManager.cpp around lines 44 to 51,
the current code casts getDeviceId() to size_t which hides negative values
(e.g., -1) by producing a huge unsigned index; change to read the device id as a
signed integer (auto or int), explicitly check for device_id < 0 and call
MLLM_ERROR_EXIT with a clear "device not initialized" or "invalid device id"
message, then check the non-negative device_id against memory_pools_.size()
before returning memory_pools_[device_id]; this ensures negative/uninitialized
ids are handled distinctly from out-of-range indices.

Comment on lines +42 to +71
void AscendMemoryPool::allocateBlock(uint32_t size, int &block_id) {
    std::unique_lock<std::mutex> lock(block_mutex_);

    size_t align_size = ((size + 31) & ~31) + 32;

    for (auto it = free_blocks_.begin(); it != free_blocks_.end(); it++) {
        if (it->second.block_size_ >= align_size) {
            block_id = it->second.block_id_;
            used_blocks_.insert(*it);
            free_blocks_.erase(it);
            MLLM_INFO("find free block id {} to allocate", block_id);
            return;
        }
    }

    if (remain_size_ > align_size) {
        block_id = generateBlocksId();
        uint64_t cur_mem_ptr_align = (reinterpret_cast<uint64_t>(cur_mem_ptr_) + 63) & ~63;
        remain_size_ -= (cur_mem_ptr_align - reinterpret_cast<uint64_t>(cur_mem_ptr_));
        cur_mem_ptr_ = reinterpret_cast<void *>(cur_mem_ptr_align);

        MemoryBlock block = {block_id, align_size, cur_mem_ptr_};
        used_blocks_.insert({block_id, block});
        remain_size_ -= align_size;
        cur_mem_ptr_ = reinterpret_cast<uint8_t *>(cur_mem_ptr_) + align_size;
        MLLM_INFO("allocate block id {} for size {}", block_id, align_size);
        return;
    }
    MLLM_ERROR("allocate block fail");
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# 1. Find AscendMemoryPool and related files
echo "=== Finding Ascend memory-related files ==="
find . -type f -name "*AscendMemory*" -o -name "*AscendAllocator*" 2>/dev/null | head -20

# 2. Look for AscendAllocator::alignSize
echo -e "\n=== Searching for AscendAllocator::alignSize ==="
rg "alignSize" --type cpp --type h -B 2 -A 5 2>/dev/null | head -60

# 3. Check current memory pool implementation
echo -e "\n=== Current AscendMemoryPool.cpp (lines around 42-71) ==="
wc -l mllm/backends/ascend/memory/AscendMemoryPool.cpp 2>/dev/null

Repository: UbiquitousLearning/mllm

Length of output: 4294


Fix pool capacity check + alignment math (padding underflow / exact-fit bug).

The allocation check uses remain_size_ > align_size but fails to account for pointer alignment padding, causing exact-fit allocations to be rejected and potential underflow of remain_size_ in edge cases. For example, with remain_size_ = 96 and align_size = 96, the check 96 > 96 fails even though the allocation is possible. Additionally, with padding to a 64B boundary, a value like remain_size_ = 144 passes the check but underflows after subtracting both padding (64 bytes) and the block size (96 bytes), resulting in -16. The alignment formula ((size + 31) & ~31) + 32 also uses an inconsistent approach compared to pointer alignment (ptr + 63) & ~63, and the uint32_t size parameter can overflow in the arithmetic.

Use uintptr_t and unify to 64B alignment with constexpr size_t kAlign = 64. Cast the size parameter to size_t to prevent overflow. Change the capacity check to remain_size_ >= padding + align_size to account for alignment overhead and allow exact-fit allocations. Rename the iterator variable to free_it for clarity. Set block_id = -1 on allocation failure and enhance the error log with allocation parameters for debugging.

 void AscendMemoryPool::allocateBlock(uint32_t size, int &block_id) {
     std::unique_lock<std::mutex> lock(block_mutex_);
 
-    size_t align_size = ((size + 31) & ~31) + 32;  
+    constexpr size_t kAlign = 64;
+    const size_t req = static_cast<size_t>(size);
+    const size_t align_size = (req + (kAlign - 1)) & ~(kAlign - 1);
 
-    for (auto it = free_blocks_.begin(); it != free_blocks_.end(); it++) {
-        if (it->second.block_size_ >= align_size) {
-            block_id = it->second.block_id_;
-            used_blocks_.insert(*it);
-            free_blocks_.erase(it);
+    for (auto free_it = free_blocks_.begin(); free_it != free_blocks_.end(); ++free_it) {
+        if (free_it->second.block_size_ >= align_size) {
+            block_id = free_it->second.block_id_;
+            used_blocks_.insert(*free_it);
+            free_blocks_.erase(free_it);
             MLLM_INFO("find free block id {} to allocate", block_id);
             return;
         }
     }
 
-    if (remain_size_ > align_size) {
+    const uintptr_t cur = reinterpret_cast<uintptr_t>(cur_mem_ptr_);
+    const uintptr_t aligned = (cur + (kAlign - 1)) & ~(static_cast<uintptr_t>(kAlign - 1));
+    const size_t padding = static_cast<size_t>(aligned - cur);
+    if (remain_size_ >= padding + align_size) {
         block_id = generateBlocksId();
-        uint64_t cur_mem_ptr_align = (reinterpret_cast<uint64_t>(cur_mem_ptr_) + 63) & ~63;  
-        remain_size_ -= (cur_mem_ptr_align - reinterpret_cast<uint64_t>(cur_mem_ptr_));
-        cur_mem_ptr_ = reinterpret_cast<void *>(cur_mem_ptr_align);
+        remain_size_ -= padding;
+        cur_mem_ptr_ = reinterpret_cast<void *>(aligned);
 
         MemoryBlock block = {block_id, align_size, cur_mem_ptr_};
         used_blocks_.insert({block_id, block});
         remain_size_ -= align_size;
         cur_mem_ptr_ = reinterpret_cast<uint8_t *>(cur_mem_ptr_) + align_size;
         MLLM_INFO("allocate block id {} for size {}", block_id, align_size);
         return;
     }
-    MLLM_ERROR("allocate block fail");
+    block_id = -1;
+    MLLM_ERROR("allocate block fail: req={}, aligned={}, remain={}", req, align_size, remain_size_);
 }
🧰 Tools
🪛 Clang (14.0.6)

[error] 42-42: method 'allocateBlock' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 45-45: variable 'align_size' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 59-59: variable 'cur_mem_ptr_align' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/backends/ascend/memory/AscendMemoryPool.cpp around lines 42 to 71, the
allocation logic does incorrect alignment math and capacity checking: change to
use constexpr size_t kAlign = 64 and cast the uint32_t size parameter to size_t
to avoid overflow, compute align_size as a 64-byte-aligned size (use the same
64B alignment rule you use for the pointer), compute padding =
((uintptr_t(cur_mem_ptr_) + (kAlign-1)) & ~(kAlign-1)) -
uintptr_t(cur_mem_ptr_), and then check remain_size_ >= padding + align_size
(allowing exact-fit and preventing underflow); when advancing pointer use
uintptr_t for alignment and update remain_size_ by subtracting padding before
subtracting align_size; rename the free-block iterator to free_it for clarity;
on failure set block_id = -1 and make the error log include current size,
align_size, padding and remain_size_ for debugging.

Comment on lines +15 to +38
class AscendMemoryPool {
 public:
    explicit AscendMemoryPool(size_t pool_size);
    ~AscendMemoryPool();

    void allocateBlock(uint32_t size, int& block_id);

    void freeBlock(int block_id);

    void getBlockPtr(int block_id, void*& addr);

 private:
    uint64_t generateBlocksId();

    std::atomic<uint64_t> id_ = 0;
    std::mutex block_mutex_;

    void* base_mem_ptr_ = nullptr;
    void* cur_mem_ptr_ = nullptr;
    int64_t remain_size_ = 0;

    std::unordered_map<int, MemoryBlock> used_blocks_;
    std::unordered_map<int, MemoryBlock> free_blocks_;
};

⚠️ Potential issue | 🔴 Critical

Define copy/move semantics (Rule of Five) for pool owning raw device memory

The type owns base_mem_ptr_ and holds a mutex and maps, so copying or moving it would be unsafe. The std::mutex member already suppresses the implicit copy operations, but explicitly deleting all four special members documents intent and satisfies the clang-tidy special-member-functions check.

 class AscendMemoryPool {
 public:
     explicit AscendMemoryPool(size_t pool_size);
     ~AscendMemoryPool();
+
+    AscendMemoryPool(const AscendMemoryPool&) = delete;
+    AscendMemoryPool& operator=(const AscendMemoryPool&) = delete;
+    AscendMemoryPool(AscendMemoryPool&&) = delete;
+    AscendMemoryPool& operator=(AscendMemoryPool&&) = delete;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 class AscendMemoryPool {
  public:
     explicit AscendMemoryPool(size_t pool_size);
     ~AscendMemoryPool();
+
+    AscendMemoryPool(const AscendMemoryPool&) = delete;
+    AscendMemoryPool& operator=(const AscendMemoryPool&) = delete;
+    AscendMemoryPool(AscendMemoryPool&&) = delete;
+    AscendMemoryPool& operator=(AscendMemoryPool&&) = delete;

     void allocateBlock(uint32_t size, int& block_id);

     void freeBlock(int block_id);

     void getBlockPtr(int block_id, void*& addr);

  private:
     uint64_t generateBlocksId();

     std::atomic<uint64_t> id_ = 0;
     std::mutex block_mutex_;

     void* base_mem_ptr_ = nullptr;
     void* cur_mem_ptr_ = nullptr;
     int64_t remain_size_ = 0;

     std::unordered_map<int, MemoryBlock> used_blocks_;
     std::unordered_map<int, MemoryBlock> free_blocks_;
 };
🧰 Tools
🪛 Clang (14.0.6)

[error] 15-15: class 'AscendMemoryPool' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (4)
mllm/backends/ascend/AscendDispatcher.cpp (1)

55-73: Unchecked custom_context_ptr cast and unused ascendBackend variable.

The task->custom_context_ptr is cast to nn::Module* without null validation (lines 56, 69), which could cause undefined behavior. Additionally, ascendBackend on line 67 is unused, which will trigger -Wunused-variable warnings.

Apply this diff to add validation and suppress the unused warning:

     case TaskTypes::kExecuteModule: {   //TODO: execute module
-      auto moduleName = static_cast<nn::Module*>(task->custom_context_ptr)->getModuleName();
+      auto* module = static_cast<nn::Module*>(task->custom_context_ptr);
+      if (module == nullptr) {
+        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Task custom_context_ptr is null");
+      }
+      auto moduleName = module->getModuleName();
 #ifdef MLLM_PERFETTO_ENABLE
       MLLM_PERF_TRACE_EVENT("mllm.ascend.execute.", perfetto::DynamicString{moduleName},
                             [&](perfetto::EventContext ctx) {
                               int cnt = 0;
                               for (auto& i : task->inputs) {
                                 ctx.AddDebugAnnotation(perfetto::DynamicString{"inputs-" + std::to_string(cnt++)},
                                                        i.shape());
                               }
                             });
 #endif
-      auto ascendBackend = std::static_pointer_cast<AscendBackend>(Context::instance().getBackend(kAscend));
+      [[maybe_unused]] auto ascendBackend =
+          std::static_pointer_cast<AscendBackend>(Context::instance().getBackend(kAscend));
 
-      task->outputs = ((nn::Module*)(task->custom_context_ptr))->forward(task->inputs, task->args);
+      task->outputs = module->forward(task->inputs, task->args);
mllm/backends/ascend/AscendDispatcher.hpp (1)

13-15: AscendDispatcherOptions using MLLM_EMPTY_SCOPE causes uninitialized field warning.

Static analysis reports that the implicit constructor doesn't initialize fields introduced by MLLM_EMPTY_SCOPE. If the struct is currently empty, prefer a truly empty struct body or a comment placeholder.

 struct AscendDispatcherOptions {
-  MLLM_EMPTY_SCOPE;
+  // Reserved for future options.
 };
mllm/backends/ascend/memory/AscendMemoryPool.hpp (1)

15-38: Define copy/move semantics (Rule of Five) - class owns raw device memory.

The class defines a non-default destructor that frees base_mem_ptr_, but copy/move operations are implicitly generated. Copying this class would cause double-free of device memory. The mutex and maps also make copying semantically incorrect.

 class AscendMemoryPool {
 public:
     explicit AscendMemoryPool(size_t pool_size);
     ~AscendMemoryPool();
+
+    AscendMemoryPool(const AscendMemoryPool&) = delete;
+    AscendMemoryPool& operator=(const AscendMemoryPool&) = delete;
+    AscendMemoryPool(AscendMemoryPool&&) = delete;
+    AscendMemoryPool& operator=(AscendMemoryPool&&) = delete;
 
     void allocateBlock(uint32_t size, int& block_id);
mllm/backends/ascend/memory/AscendMemoryManager.cpp (1)

43-50: CRITICAL: Handle negative device ID before casting to size_t.

This issue was flagged in a previous review but remains unaddressed. Casting a negative device ID (e.g., -1) to size_t wraps to a huge unsigned value (SIZE_MAX for -1), so the bounds check fires with a meaningless id in the error message and the real failure, a negative device id, is obscured.

Apply the fix suggested in the previous review:

 std::shared_ptr<AscendMemoryPool> &AscendMemoryManager::getMemoryPool()
 {
-    size_t device_id = static_cast<size_t>(getDeviceId());
+    if (memory_pools_.empty()) {
+        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Ascend memory pools not initialized");
+    }
+    const int32_t device_id_i32 = getDeviceId();
+    if (device_id_i32 < 0) {
+        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Invalid device id {}", device_id_i32);
+    }
+    const size_t device_id = static_cast<size_t>(device_id_i32);
     if (device_id >= memory_pools_.size()) {
         MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Invalid device id {}", device_id);
     }
     return memory_pools_[device_id];
 }
🧹 Nitpick comments (6)
examples/ascend_add_demo/main.cpp (3)

22-41: Consider extracting repetitive print logic into a helper function.

The loops for printing vectors (lines 29-34 and 36-41) follow the same pattern and could be refactored into a reusable helper:

void printVector(const std::string& label, const std::vector<float>& vec) {
  std::cout << "   " << label << ": [";
  for (size_t i = 0; i < vec.size(); ++i) {
    std::cout << vec[i];
    if (i < vec.size() - 1) std::cout << ", ";
  }
  std::cout << "]" << std::endl;
}

This would reduce duplication and improve readability.


82-109: Consider extracting verification logic for reusability.

The verification loop (lines 100-109) and the print loops (lines 83-88, 90-95) could be extracted into helper functions to reduce duplication and improve testability:

bool verifyResults(const std::vector<float>& actual, 
                   const std::vector<float>& expected, 
                   float tolerance) {
  bool correct = true;
  for (size_t i = 0; i < actual.size(); ++i) {
    float diff = std::abs(actual[i] - expected[i]);
    if (diff > tolerance) {
      correct = false;
      std::cout << "   ✗ Mismatch at index " << i 
                << ": expected " << expected[i] 
                << ", got " << actual[i] 
                << " (diff: " << diff << ")" << std::endl;
    }
  }
  return correct;
}

98-98: Consider using uppercase literal suffix for consistency.

Static analysis suggests using 0.1F instead of 0.1f for consistency with coding standards, though this is a minor stylistic preference.

mllm/backends/ascend/AscendDispatcher.cpp (1)

55-55: Fix TODO comment format.

The TODO comment should have a space after // per coding guidelines.

-    case TaskTypes::kExecuteModule: {   //TODO: execute module
+    case TaskTypes::kExecuteModule: {   // TODO: execute module
mllm/backends/ascend/AscendCommon.cpp (1)

50-52: Adjacent parameters of similar type (batch, size) are easily swapped.

Both batch and size are adjacent int parameters, making accidental swaps at call sites likely. Consider using a struct or named parameters for clarity.

+struct AscendTensorShape {
+  int batch;
+  int size;
+};
+
-AscendTensorHandle prepareAscendTensor(const std::vector<float>& host_data,
-                                       int batch,
-                                       int size) {
+AscendTensorHandle prepareAscendTensor(const std::vector<float>& host_data,
+                                       const AscendTensorShape& shape) {
+  const int batch = shape.batch;
+  const int size = shape.size;

Alternatively, reorder to (int batch, int size, const std::vector<float>& host_data) to separate integer params from the vector, reducing swap risk.

mllm/backends/ascend/AscendAllocator.cpp (1)

82-91: Use consistent logging macros instead of std::cout.

The rest of the file uses MLLM_INFO, MLLM_WARN, and MLLM_ERROR for logging. Using std::cout here is inconsistent and bypasses the logging infrastructure.

 bool AscendAllocator::generalAlloc(void** ptr, size_t cap, size_t align) {
-    //we don't support generalAlloc , therefore return false
-    std::cout << "generalAlloc is not supported in AscendAllocator" << std::endl;
+    // generalAlloc is not supported in AscendAllocator
+    MLLM_WARN("generalAlloc is not supported in AscendAllocator");
     return false;
 }
 
 void AscendAllocator::generalFree(void* ptr) {
-    //we don't support generalFree , therefore do nothing
-    std::cout << "generalFree is not supported in AscendAllocator" << std::endl;
+    // generalFree is not supported in AscendAllocator
+    MLLM_WARN("generalFree is not supported in AscendAllocator");
 }

This also allows removing #include <iostream> from line 6.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6b743b7 and 4b37bb5.

📒 Files selected for processing (7)
  • examples/ascend_add_demo/main.cpp (1 hunks)
  • mllm/backends/ascend/AscendAllocator.cpp (1 hunks)
  • mllm/backends/ascend/AscendCommon.cpp (1 hunks)
  • mllm/backends/ascend/AscendDispatcher.cpp (1 hunks)
  • mllm/backends/ascend/AscendDispatcher.hpp (1 hunks)
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp (1 hunks)
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/AscendCommon.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
🧠 Learnings (3)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/* : No line may end with trailing whitespace.

Applied to files:

  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.

Applied to files:

  • mllm/backends/ascend/AscendDispatcher.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).

Applied to files:

  • mllm/backends/ascend/AscendCommon.cpp
🧬 Code graph analysis (4)
mllm/backends/ascend/AscendDispatcher.cpp (3)
mllm/backends/ascend/AscendDispatcher.hpp (4)
  • AscendDispatcher (21-22)
  • task (24-24)
  • task (26-26)
  • task (28-28)
mllm/engine/DispatcherManager.hpp (1)
  • thread_pool_ (37-37)
mllm/backends/ascend/AscendCommon.hpp (1)
  • instance (57-60)
mllm/backends/ascend/AscendAllocator.cpp (3)
mllm/backends/ascend/AscendAllocator.hpp (2)
  • AscendAllocator (17-17)
  • AscendAllocator (18-18)
mllm/backends/ascend/memory/AscendMemoryManager.cpp (2)
  • getAscendMemoryManager (15-18)
  • getAscendMemoryManager (15-15)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (2)
  • block_id (22-22)
  • block_id (24-24)
mllm/backends/ascend/AscendCommon.cpp (3)
mllm/backends/ascend/AscendCommon.hpp (10)
  • AscendTensorHandle (70-70)
  • AscendTensorHandle (71-71)
  • AscendTensorHandle (72-72)
  • AscendTensorHandle (74-74)
  • AscendTensorHandle (76-76)
  • other (77-77)
  • tensor_ (82-82)
  • tensor_ (83-83)
  • block_id_ (80-80)
  • block_id_ (84-84)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (3)
  • block_id (22-22)
  • block_id (24-24)
  • size (20-20)
mllm/backends/ascend/memory/AscendMemoryManager.cpp (2)
  • getAscendMemoryManager (15-18)
  • getAscendMemoryManager (15-15)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (2)
mllm/backends/ascend/memory/AscendMemoryManager.hpp (4)
  • pool_size (18-18)
  • size (24-24)
  • block_id (26-26)
  • block_id (28-28)
mllm/backends/ascend/memory/AscendMemoryPool.cpp (2)
  • AscendMemoryPool (14-26)
  • AscendMemoryPool (28-36)
🪛 Clang (14.0.6)
examples/ascend_add_demo/main.cpp

[error] 98-98: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 111-111: if with identical then and else branches

(bugprone-branch-clone,-warnings-as-errors)

mllm/backends/ascend/memory/AscendMemoryManager.cpp

[error] 20-20: method 'createMemoryPool' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 22-22: variable 'device_count' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 37-37: variable 'device_id' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 45-45: variable 'device_id' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

mllm/backends/ascend/AscendCommon.cpp

[error] 50-50: 3 adjacent parameters of 'prepareAscendTensor' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 53-53: variable 'expected_elements' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 64-64: variable 'device_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 65-65: variable 'bytes' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 80-80: variable 'ctx' provides global access to a non-const object; consider making the pointed-to data 'const'

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 116-116: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/ascend/AscendDispatcher.hpp

[error] 13-13: constructor does not initialize these fields: MLLM_EMPTY_SCOPE

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

mllm/backends/ascend/memory/AscendMemoryPool.hpp

[error] 15-15: class 'AscendMemoryPool' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

🔇 Additional comments (13)
examples/ascend_add_demo/main.cpp (6)

1-12: LGTM!

The includes are appropriate for an Ascend backend demo, covering ACL operations, tensor management, context execution, and elementwise operations.


17-20: LGTM!

Ascend backend initialization is properly sequenced and wrapped in exception handling.


43-48: LGTM!

Tensor preparation correctly uses the prepareAscendTensor helper and obtains tensor references for subsequent operations.


60-80: LGTM with note on resource management.

The result copying and fp16-to-float conversion is correctly implemented. The error path (lines 68-73) releases x_handle and y_handle but not z_ascend, which appears intentional assuming the Tensor class uses RAII for automatic cleanup.


111-115: LGTM!

The pass/fail reporting logic is correct. Note that the static analysis hint claiming "identical then and else branches" at line 111 is a false positive—the branches clearly differ (PASSED vs FAILED messages).


117-125: LGTM!

Resource cleanup and exception handling are properly implemented. The explicit release of tensor handles before return and the catch block for reporting errors are appropriate.

mllm/backends/ascend/AscendCommon.cpp (2)

19-48: LGTM - AscendTensorHandle RAII implementation.

The move semantics are correctly implemented with proper resource cleanup in release(). The self-assignment check in move assignment and invalidation of the source handle are good practices.


79-101: Thread-safe initialization pattern looks good.

The use of std::once_flag with std::call_once ensures thread-safe one-time initialization of the global ATB context and stream. This addresses the previous review concern about race conditions.

mllm/backends/ascend/AscendAllocator.cpp (1)

23-47: Memory pool allocation implementation looks correct.

The allocation flow properly:

  1. Allocates a block from the memory manager
  2. Gets the pointer for the block
  3. Handles failures by freeing allocated blocks
  4. Tracks the mapping from pointer to block ID

The mutex protection for the map access ensures thread safety.

mllm/backends/ascend/memory/AscendMemoryPool.hpp (1)

20-24: Memory pool API design looks appropriate.

The interface provides the essential block management operations: allocateBlock, freeBlock, and getBlockPtr. Using output parameters for block_id and addr aligns with the C-style ACL APIs being wrapped.

mllm/backends/ascend/memory/AscendMemoryManager.cpp (3)

13-18: LGTM! Singleton pattern correctly implemented.

The function-local static singleton pattern is thread-safe per C++11 and addresses the past review feedback about avoiding file-scope globals.


35-41: LGTM! Proper error handling and defensive initialization.

The function correctly initializes device_id to -1 and validates the ACL call result.


52-65: LGTM! Clean delegation pattern.

The block management methods appropriately delegate to the per-device memory pool.

}
}

void fillAtbTensorDesc(const Tensor& t, atb::TensorDesc& desc) {
Contributor

🛠️ Refactor suggestion | 🟠 Major

Rename short parameter t to satisfy identifier length requirements.

The parameter name t is too short per readability-identifier-length guidelines. Rename to a descriptive identifier.

-void fillAtbTensorDesc(const Tensor& t, atb::TensorDesc& desc) {
+void fillAtbTensorDesc(const Tensor& tensor, atb::TensorDesc& desc) {
   desc.dtype = ACL_FLOAT16; // Currently hardcoded as per demo, can be expanded later
   desc.format = ACL_FORMAT_ND;
 
-  auto shape = t.shape();
+  auto shape = tensor.shape();

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Clang (14.0.6)

[error] 116-116: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/backends/ascend/AscendCommon.cpp around line 116, the function parameter
named `t` is too short; rename it to a descriptive identifier such as `tensor`
(or `inputTensor`) and update all references inside the function body to the new
name; also adjust the function declaration/signature in any corresponding header
or other translation units to match the new parameter name to satisfy the
readability-identifier-length rule.

    MLLM_ACL_CHECK(ret);
    for (size_t i = 0; i < device_count; i++) {
        aclrtSetDevice(i);
Contributor

⚠️ Potential issue | 🔴 Critical

Check the return value of aclrtSetDevice.

If aclrtSetDevice fails, the memory pool will be created with an incorrect device context, leading to device mismatch errors in subsequent operations.

Apply this diff to check the return value:

-        aclrtSetDevice(i);
+        ret = aclrtSetDevice(i);
+        MLLM_ACL_CHECK(ret);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
aclrtSetDevice(i);
ret = aclrtSetDevice(i);
MLLM_ACL_CHECK(ret);
🤖 Prompt for AI Agents
In mllm/backends/ascend/memory/AscendMemoryManager.cpp around line 27, the call
to aclrtSetDevice(i) currently ignores its return value; modify the code to
check the returned aclError (or appropriate status type), handle non-success
results by logging an error and returning or throwing (depending on existing
error handling patterns) before proceeding to create the memory pool, and ensure
the device is only used when aclrtSetDevice reports success to avoid
device-context mismatches.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
examples/ascend_add_demo/main.cpp (1)

52-52: Remove debug print statement.

This debug print "context over" should be removed before merging.

🧹 Nitpick comments (5)
examples/ascend_add_demo/main.cpp (1)

60-88: Fix inconsistent indentation.

Lines 60-88 use 2-space indentation while the surrounding code in the same try block uses 4-space indentation. Align all code within the try block to consistent 4-space indentation.

mllm/backends/ascend/AscendCommon.hpp (4)

19-28: Parenthesize macro arguments for safety.

The macro arguments should be enclosed in parentheses to prevent operator precedence issues when complex expressions are passed.

 #define MLLM_ACL_CHECK(err)                                                                                    \
-  if (err != ACL_SUCCESS) {                                                                                    \
+  if ((err) != ACL_SUCCESS) {                                                                                  \
     MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "ACL error code {}: {}", int(err), aclGetRecentErrMsg()); \
   }
 
 #define MLLM_ATB_CHECK(err)                                                                                    \
-  if (err != atb::NO_ERROR) {                                                                                  \
+  if ((err) != atb::NO_ERROR) {                                                                                \
     MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "ATB error code {}", int(err));                           \
   }

45-51: Consider default-initializing struct members.

AscendDeviceInfo members are uninitialized by default. Adding default values prevents undefined behavior if a field is not explicitly set.

 struct AscendDeviceInfo {
   std::string name;
-  unsigned int id;
-  size_t total_memory;  // bytes
-  size_t free_memory;   // bytes
+  unsigned int id{0};
+  size_t total_memory{0};  // bytes
+  size_t free_memory{0};   // bytes
   std::string soc_version;
 };

81-85: Consider adding [[nodiscard]] to accessor functions.

Functions like valid(), tensor(), and blockId() return values that should typically not be ignored.

-  void release();  // Manually release memory block and invalidate handle
-  bool valid() const { return block_id_ >= 0; }  // Check if handle owns a valid memory block
+  void release();
+  [[nodiscard]] bool valid() const { return block_id_ >= 0; }
 
-  Tensor& tensor() { return tensor_; }  // Access tensor
-  const Tensor& tensor() const { return tensor_; }  // Access tensor (const)
-  int blockId() const { return block_id_; }  // Get memory block ID
+  [[nodiscard]] Tensor& tensor() { return tensor_; }
+  [[nodiscard]] const Tensor& tensor() const { return tensor_; }
+  [[nodiscard]] int blockId() const { return block_id_; }

117-127: Delete copy/move operations for AscendTimer.

Since AscendTimer is an RAII timer that manages timing state, it should not be copied or moved. Explicitly delete these operations to prevent misuse.

 class AscendTimer {
  public:
   explicit AscendTimer(const char* tag, bool sync_before = true, bool sync_after = true);
   ~AscendTimer();
 
+  AscendTimer(const AscendTimer&) = delete;
+  AscendTimer& operator=(const AscendTimer&) = delete;
+  AscendTimer(AscendTimer&&) = delete;
+  AscendTimer& operator=(AscendTimer&&) = delete;
+
  private:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b37bb5 and 939a216.

📒 Files selected for processing (5)
  • docs/ascend_backend/core_design.rst (1 hunks)
  • examples/ascend_add_demo/main.cpp (1 hunks)
  • mllm/backends/ascend/AscendCommon.cpp (1 hunks)
  • mllm/backends/ascend/AscendCommon.hpp (1 hunks)
  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/ascend_backend/core_design.rst
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)


Files:

  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendCommon.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)


Files:

  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendCommon.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)


Files:

  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
  • mllm/backends/ascend/AscendCommon.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendCommon.hpp
🧬 Code graph analysis (4)
mllm/backends/ascend/ops/AscnedElewiseOps.cpp (1)
mllm/backends/ascend/AscendCommon.cpp (6)
  • getGlobalAtbContext (173-195)
  • getGlobalAtbContext (173-173)
  • fillAtbTensorDesc (210-219)
  • fillAtbTensorDesc (210-210)
  • syncGlobalAtbStream (202-208)
  • syncGlobalAtbStream (202-202)
mllm/backends/ascend/AscendCommon.cpp (3)
mllm/backends/ascend/AscendCommon.hpp (12)
  • AscendTensorHandle (71-71)
  • AscendTensorHandle (72-72)
  • AscendTensorHandle (73-73)
  • AscendTensorHandle (75-75)
  • AscendTensorHandle (77-77)
  • other (78-78)
  • tensor_ (83-83)
  • tensor_ (84-84)
  • block_id_ (81-81)
  • block_id_ (85-85)
  • AscendDeviceMetaInfo (56-56)
  • AscendDeviceMetaInfo (63-63)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (3)
  • block_id (22-22)
  • block_id (24-24)
  • size (20-20)
mllm/backends/ascend/memory/AscendMemoryManager.cpp (2)
  • getAscendMemoryManager (15-18)
  • getAscendMemoryManager (15-15)
mllm/backends/ascend/AscendCommon.hpp (1)
mllm/backends/ascend/AscendCommon.cpp (7)
  • AscendTensorHandle (23-24)
  • AscendTensorHandle (26-26)
  • AscendTensorHandle (28-31)
  • other (33-41)
  • other (33-33)
  • AscendTimer (154-162)
  • AscendTimer (164-171)
examples/ascend_add_demo/main.cpp (3)
mllm/backends/ascend/AscendCommon.cpp (6)
  • prepareAscendTensor (54-81)
  • prepareAscendTensor (54-56)
  • verifyAscendTensor (103-142)
  • verifyAscendTensor (103-108)
  • verifyAscendTensor (144-152)
  • verifyAscendTensor (144-149)
mllm/backends/ascend/AscendCommon.hpp (3)
  • prepareAscendTensor (93-95)
  • verifyAscendTensor (101-106)
  • verifyAscendTensor (109-114)
mllm/backends/qnn/op/QNNElewiseOp.hpp (1)
  • kAdd (36-38)
🪛 Clang (14.0.6)
mllm/backends/ascend/AscendCommon.cpp

[error] 54-54: 3 adjacent parameters of 'prepareAscendTensor' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 57-57: variable 'expected_elements' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 68-68: variable 'device_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 69-69: variable 'bytes' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 87-87: variable 'elem_cnt' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 103-103: 3 adjacent parameters of 'verifyAscendTensor' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 103-103: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 122-122: variable name 'ok' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 144-144: 3 adjacent parameters of 'verifyAscendTensor' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 144-144: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 174-174: variable 'ctx' provides global access to a non-const object; consider making the pointed-to data 'const'

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 210-210: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/ascend/AscendCommon.hpp

[error] 19-19: function-like macro 'MLLM_ACL_CHECK' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 20-20: macro argument should be enclosed in parentheses

(bugprone-macro-parentheses,-warnings-as-errors)


[error] 25-25: function-like macro 'MLLM_ATB_CHECK' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 26-26: macro argument should be enclosed in parentheses

(bugprone-macro-parentheses,-warnings-as-errors)


[error] 42-42: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 45-45: constructor does not initialize these fields: name, id, total_memory, free_memory, soc_version

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 54-54: class 'AscendDeviceMetaInfo' defines a copy constructor and a copy assignment operator but does not define a destructor, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 70-70: constructor does not initialize these fields: tensor_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 81-81: function 'valid' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 84-84: function 'tensor' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 85-85: function 'blockId' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 101-101: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 103-103: 1e-2f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 103-103: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 104-104: 1e-2f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 104-104: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 109-109: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 111-111: 1e-2f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 111-111: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 112-112: 1e-2f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 112-112: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 117-117: class 'AscendTimer' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 130-130: function-like macro 'ASCEND_TIME_SCOPE' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 131-131: function-like macro 'ASCEND_TIME_SCOPE_NOSYNC' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)

examples/ascend_add_demo/main.cpp

[error] 62-62: variable 'correct' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 84-84: if with identical then and else branches

(bugprone-branch-clone,-warnings-as-errors)

🔇 Additional comments (6)
examples/ascend_add_demo/main.cpp (1)

14-99: Demo structure looks good.

The demo properly initializes the backend, prepares test data, executes the Add operation, verifies results with tolerances, and handles errors via try/catch. The use of AscendTensorHandle RAII for resource management is appropriate.

mllm/backends/ascend/ops/AscnedElewiseOps.cpp (1)

37-118: Forward implementation looks correct.

The ATB workflow follows the expected pattern: create operation, setup tensors, configure VariantPack, setup workspace, execute, synchronize, and cleanup. Error handling is comprehensive with appropriate MLLM_ERROR_EXIT calls.

mllm/backends/ascend/AscendCommon.cpp (3)

23-52: AscendTensorHandle RAII implementation is correct.

Move semantics properly transfer ownership by resetting block_id_ to -1 in the source. The release() method correctly handles both managed blocks (via memory manager) and unmanaged tensors.


54-81: Note: Returned handle has block_id = -1.

This is intentional since memory is allocated via Tensor::empty/alloc rather than the memory manager. The release() method correctly handles this case by calling tensor.delete_().


173-195: Thread-safe ATB context initialization is correct.

Using std::call_once ensures the context is initialized exactly once, even under concurrent access. The initialization sequence (set device → create context → create stream → set execute stream) follows ATB requirements.

mllm/backends/ascend/AscendCommon.hpp (1)

1-133: Header structure and API documentation are good.

The header provides clear documentation for each function and class. The namespace organization and include structure are appropriate for a backend utility header.

Comment on lines +221 to +282
AscendDeviceMetaInfo::AscendDeviceMetaInfo() {
#ifndef ASCENDC_CPU_DEBUG
// Initialize ACL to query devices
auto ret = aclInit(nullptr);
if (ret != ACL_SUCCESS) {
MLLM_ERROR("Failed to initialize ACL for device enumeration: {}", ret);
return;
}

// Get device count
uint32_t device_count = 0;
ret = aclrtGetDeviceCount(&device_count);
if (ret != ACL_SUCCESS) {
MLLM_ERROR("Failed to get Ascend device count: {}", ret);
aclFinalize();
return;
}

// Collect info for each device
for (uint32_t i = 0; i < device_count; ++i) {
AscendDeviceInfo info;
info.id = i;
info.name = "Ascend Device " + std::to_string(i);

// Set device to query its properties
ret = aclrtSetDevice(i);
if (ret == ACL_SUCCESS) {
// Get memory information
size_t free_mem = 0, total_mem = 0;
ret = aclrtGetMemInfo(ACL_HBM_MEM, &free_mem, &total_mem);
if (ret == ACL_SUCCESS) {
info.total_memory = total_mem;
info.free_memory = free_mem;
} else {
info.total_memory = 0;
info.free_memory = 0;
}

// SOC version - platform specific, set to unknown for now
info.soc_version = "Unknown";
} else {
info.total_memory = 0;
info.free_memory = 0;
info.soc_version = "Unknown";
}

devices.push_back(info);
}

// Finalize ACL after enumeration
aclFinalize();
#else
// In CPU debug mode, add a dummy device
AscendDeviceInfo info;
info.id = 0;
info.name = "Ascend CPU Debug Device";
info.total_memory = 0;
info.free_memory = 0;
info.soc_version = "CPU_DEBUG";
devices.push_back(info);
#endif
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Search for AscendDeviceMetaInfo and getGlobalAtbContext usage
rg -n 'AscendDeviceMetaInfo::instance|getGlobalAtbContext' --type=cpp

Repository: UbiquitousLearning/mllm

Length of output: 513


🏁 Script executed:

#!/bin/bash
# First, locate the AscendCommon files and understand the structure
find . -name "*.cpp" -o -name "*.h" | grep -i ascend | head -20

Repository: UbiquitousLearning/mllm

Length of output: 502


🏁 Script executed:

#!/bin/bash
# Search for usage of both functions
rg -n 'AscendDeviceMetaInfo|getGlobalAtbContext' --type cpp --type hpp

Repository: UbiquitousLearning/mllm

Length of output: 94


🏁 Script executed:

#!/bin/bash
# Look at the file structure and find relevant files
ls -la mllm/backends/ascend/ 2>/dev/null || echo "Path not found, searching..."
find . -path "*/ascend/*" -name "*.cpp" -o -path "*/ascend/*" -name "*.h" | head -20

Repository: UbiquitousLearning/mllm

Length of output: 1585


🏁 Script executed:

#!/bin/bash
# Get the file content to understand the context better
find . -type f \( -name "*.cpp" -o -name "*.h" \) -path "*ascend*" 2>/dev/null | head -5

Repository: UbiquitousLearning/mllm

Length of output: 269


🏁 Script executed:

#!/bin/bash
# Search for the usage pattern
rg 'AscendDeviceMetaInfo::instance' . --type cpp --type hpp -A 3 2>/dev/null | head -40

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Search for getGlobalAtbContext
rg 'getGlobalAtbContext' . --type cpp --type hpp -A 3 2>/dev/null | head -40

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Check where aclFinalize is called
rg 'aclFinalize' . --type cpp --type hpp -B 2 -A 2 2>/dev/null

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

#!/bin/bash
stat mllm/backends/ascend/AscendCommon.cpp 2>/dev/null && echo "File exists" || echo "File not found"
ls -la mllm/backends/ascend/ 2>/dev/null | head -10

Repository: UbiquitousLearning/mllm

Length of output: 667


🏁 Script executed:

#!/bin/bash
# Read AscendCommon.cpp to see the full context of both functions
cat -n mllm/backends/ascend/AscendCommon.cpp | sed -n '150,290p'

Repository: UbiquitousLearning/mllm

Length of output: 4570


🏁 Script executed:

#!/bin/bash
# Read AscendBackend.cpp to see the usage pattern
cat -n mllm/backends/ascend/AscendBackend.cpp

Repository: UbiquitousLearning/mllm

Length of output: 1239


🏁 Script executed:

#!/bin/bash
# Look at the getGlobalAtbContext implementation more closely
rg -A 30 'atb::Context\* getGlobalAtbContext' mllm/backends/ascend/AscendCommon.cpp

Repository: UbiquitousLearning/mllm

Length of output: 783


🏁 Script executed:

#!/bin/bash
# Also check if aclInit and aclFinalize are called anywhere else
rg 'aclInit|aclFinalize|aclrtSetDevice' mllm/backends/ascend/ --type cpp

Repository: UbiquitousLearning/mllm

Length of output: 628


Remove aclFinalize() call from AscendDeviceMetaInfo constructor or defer it until backend shutdown.

Calling aclFinalize() at line 271 after device enumeration terminates the ACL runtime. Since getGlobalAtbContext() (used later in operations) requires aclrtSetDevice() to function, subsequent ACL operations will fail if invoked after this constructor returns. Either:

  1. Remove the aclFinalize() call and let backend shutdown handle cleanup
  2. Ensure getGlobalAtbContext() is initialized before device enumeration
🤖 Prompt for AI Agents
In mllm/backends/ascend/AscendCommon.cpp around lines 221 to 282, the
constructor calls aclFinalize() after enumerating devices which shuts down the
ACL runtime and breaks subsequent ACL calls (e.g.,
aclrtSetDevice()/getGlobalAtbContext()); remove the aclFinalize() call from this
constructor (do not finalize ACL here) and instead ensure ACL is finalized once
during backend shutdown/cleanup; update error paths to avoid finalizing ACL
prematurely and document that ACL initialization lifetime spans device
enumeration through backend shutdown.

Comment on lines +130 to +131
#define ASCEND_TIME_SCOPE(tag) ::mllm::ascend::AscendTimer timer_scope_##__LINE__(tag, true, true)
#define ASCEND_TIME_SCOPE_NOSYNC(tag) ::mllm::ascend::AscendTimer timer_scope_##__LINE__(tag, false, false)

⚠️ Potential issue | 🟡 Minor

Fix __LINE__ concatenation in macros.

__LINE__ requires a two-level macro expansion to work correctly in token concatenation. Currently, the variable name will literally be timer_scope___LINE__ rather than timer_scope_123.

+#define ASCEND_TIME_SCOPE_CONCAT_INNER(a, b) a##b
+#define ASCEND_TIME_SCOPE_CONCAT(a, b) ASCEND_TIME_SCOPE_CONCAT_INNER(a, b)
-#define ASCEND_TIME_SCOPE(tag) ::mllm::ascend::AscendTimer timer_scope_##__LINE__(tag, true, true)
-#define ASCEND_TIME_SCOPE_NOSYNC(tag) ::mllm::ascend::AscendTimer timer_scope_##__LINE__(tag, false, false)
+#define ASCEND_TIME_SCOPE(tag) ::mllm::ascend::AscendTimer ASCEND_TIME_SCOPE_CONCAT(timer_scope_, __LINE__)(tag, true, true)
+#define ASCEND_TIME_SCOPE_NOSYNC(tag) ::mllm::ascend::AscendTimer ASCEND_TIME_SCOPE_CONCAT(timer_scope_, __LINE__)(tag, false, false)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#define ASCEND_TIME_SCOPE(tag) ::mllm::ascend::AscendTimer timer_scope_##__LINE__(tag, true, true)
#define ASCEND_TIME_SCOPE_NOSYNC(tag) ::mllm::ascend::AscendTimer timer_scope_##__LINE__(tag, false, false)
#define ASCEND_TIME_SCOPE_CONCAT_INNER(a, b) a##b
#define ASCEND_TIME_SCOPE_CONCAT(a, b) ASCEND_TIME_SCOPE_CONCAT_INNER(a, b)
#define ASCEND_TIME_SCOPE(tag) ::mllm::ascend::AscendTimer ASCEND_TIME_SCOPE_CONCAT(timer_scope_, __LINE__)(tag, true, true)
#define ASCEND_TIME_SCOPE_NOSYNC(tag) ::mllm::ascend::AscendTimer ASCEND_TIME_SCOPE_CONCAT(timer_scope_, __LINE__)(tag, false, false)
🧰 Tools
🪛 Clang (14.0.6)

[error] 130-130: function-like macro 'ASCEND_TIME_SCOPE' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 131-131: function-like macro 'ASCEND_TIME_SCOPE_NOSYNC' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/backends/ascend/AscendCommon.hpp around lines 130-131, the macros use
token concatenation with __LINE__ directly which prevents proper expansion;
change them to use a two-level helper macro (e.g., a CONCAT/CONCAT_INNER pair)
so __LINE__ is expanded before pasting, then use the outer concatenation macro
to build the variable name (e.g., CONCAT(timer_scope_, __LINE__)) and keep the
rest of the macro arguments the same; update both ASCEND_TIME_SCOPE and
ASCEND_TIME_SCOPE_NOSYNC to call the outer helper to ensure the variable becomes
timer_scope_<lineNumber>.

Comment on lines +1 to +3
// Copyright (c) MLLM Team.
// Licensed under the MIT License.


⚠️ Potential issue | 🟡 Minor

Filename typo: "AscnedElewiseOps.cpp" should be "AscendElewiseOps.cpp".

The filename has a typo - "Ascned" instead of "Ascend". This should be corrected to maintain consistency with the header file name AscendElewiseOps.hpp.

🤖 Prompt for AI Agents
In mllm/backends/ascend/ops/AscnedElewiseOps.cpp lines 1-3, the filename
contains a typo ("AscnedElewiseOps.cpp"); rename the file to
AscendElewiseOps.cpp to match the header AscendElewiseOps.hpp and the project
naming, then update all references (includes, CMakeLists.txt / build files,
source lists, and any #include or forward declarations) to use the corrected
filename; ensure any case-sensitive platforms and CI are accounted for and run a
build to verify there are no remaining broken references.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
mllm/backends/ascend/ops/AscnedElewiseOps.cpp (1)

95-108: Workspace memory leak on error path.

If op->Execute() fails at line 107, the workspace allocated at line 99 is never freed because MLLM_ERROR_EXIT prevents reaching the cleanup code at lines 113-116. This is the same issue flagged in the previous review.

Use RAII or a scope guard pattern to ensure workspace is freed even on error:

struct WorkspaceGuard {
  AscendMemoryManager& mem_mgr_;
  int block_id_;
  WorkspaceGuard(AscendMemoryManager& mgr, int id) : mem_mgr_(mgr), block_id_(id) {}
  ~WorkspaceGuard() { 
    if (block_id_ != -1) mem_mgr_.freeBlock(block_id_); 
  }
  WorkspaceGuard(const WorkspaceGuard&) = delete;
  WorkspaceGuard& operator=(const WorkspaceGuard&) = delete;
};

Then use:

void* workspace = nullptr;
int workspace_block_id = -1;
if (workspaceSize > 0) {
  auto& mem_mgr = getAscendMemoryManager();
  mem_mgr.allocateBlock(static_cast<uint32_t>(workspaceSize), workspace_block_id);
  mem_mgr.getBlockPtr(workspace_block_id, workspace);
}
WorkspaceGuard ws_guard(getAscendMemoryManager(), workspace_block_id);  // Automatic cleanup
🧹 Nitpick comments (1)
mllm/backends/ascend/ops/AscnedElewiseOps.cpp (1)

22-36: Remove commented-out code.

The commented-out allocation code (lines 23-34) should be removed entirely rather than left in the source. If this code serves as documentation of a previous approach, consider adding a brief comment explaining why BaseOp::setup() is used instead, or move the explanation to documentation.

Apply this diff to clean up:

 void AscendAddOp::setup(const std::vector<Tensor>& inputs, std::vector<Tensor>& outputs) {
-  // for (auto& t : outputs) {
-  //   if (!t.isNil()) {
-  //     auto& mem_mgr = getAscendMemoryManager();
-  //     int block_id = -1;
-  //     void* device_ptr = nullptr;
-      
-  //     mem_mgr.allocateBlock(static_cast<uint32_t>(t.bytes()), block_id);
-  //     mem_mgr.getBlockPtr(block_id, device_ptr);
-      
-  //     t.impl()->storage()->ptr_ = device_ptr;
-  //   }
-  // }
   BaseOp::setup(inputs, outputs);
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 939a216 and b487ef4.

📒 Files selected for processing (1)
  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
🧬 Code graph analysis (1)
mllm/backends/ascend/ops/AscnedElewiseOps.cpp (2)
mllm/backends/ascend/AscendCommon.hpp (3)
  • getGlobalAtbContext (33-33)
  • fillAtbTensorDesc (42-42)
  • syncGlobalAtbStream (39-39)
mllm/backends/ascend/AscendCommon.cpp (6)
  • getGlobalAtbContext (173-195)
  • getGlobalAtbContext (173-173)
  • fillAtbTensorDesc (210-219)
  • fillAtbTensorDesc (210-210)
  • syncGlobalAtbStream (202-208)
  • syncGlobalAtbStream (202-202)
🔇 Additional comments (4)
mllm/backends/ascend/ops/AscnedElewiseOps.cpp (4)

1-16: LGTM: Copyright header and includes are appropriate.

The MIT license header follows project conventions, and the included headers (ATB/ACL libraries, memory manager, common utilities) are appropriate for implementing the Ascend Add operation.


18-21: LGTM: Constructor correctly delegates to base class.

The constructor properly forwards the options to the base aops::AddOp class.


38-51: LGTM: Input validation is thorough and appropriate.

The validation checks ensure:

  • Correct number of inputs/outputs
  • Dtype consistency across tensors
  • No-broadcast constraint for this initial implementation

The use of NYI for unsupported features clearly communicates current limitations.


110-119: Cleanup logic is correct but vulnerable to error paths.

The cleanup sequence (sync stream, free workspace, destroy operation) is correct. However, as flagged in previous comments, this cleanup is bypassed if MLLM_ERROR_EXIT is called earlier. Once RAII guards are added for workspace and operation, this code can remain as the happy-path cleanup with the guards handling error paths.

Comment on lines +53 to +93
atb::infer::ElewiseParam addParam;
addParam.elewiseType = atb::infer::ElewiseParam::ELEWISE_ADD;

atb::Operation* op = nullptr;
auto st = atb::CreateOperation(addParam, &op);
if (st != atb::NO_ERROR || op == nullptr) {
MLLM_ERROR_EXIT(ExitCode::kAscendError, "ATB CreateOperation(ELEWISE_ADD) failed, status={}", static_cast<int>(st));
}

atb::Context* atb_ctx = getGlobalAtbContext();

atb::Tensor atb_x;
atb::Tensor atb_y;
atb::Tensor atb_z;

fillAtbTensorDesc(x, atb_x.desc);
fillAtbTensorDesc(y, atb_y.desc);
fillAtbTensorDesc(z, atb_z.desc);

atb_x.deviceData = reinterpret_cast<uint8_t*>(x.ptr<void>());
atb_x.dataSize = x.bytes();
atb_y.deviceData = reinterpret_cast<uint8_t*>(y.ptr<void>());
atb_y.dataSize = y.bytes();
atb_z.deviceData = reinterpret_cast<uint8_t*>(z.ptr<void>());
atb_z.dataSize = z.bytes();

atb::SVector<atb::Tensor> inTensors;
atb::SVector<atb::Tensor> outTensors;
inTensors.push_back(atb_x);
inTensors.push_back(atb_y);
outTensors.push_back(atb_z);

atb::VariantPack vp;
vp.inTensors = inTensors;
vp.outTensors = outTensors;

uint64_t workspaceSize = 0;
st = op->Setup(vp, workspaceSize, atb_ctx);
if (st != atb::NO_ERROR) {
MLLM_ERROR_EXIT(ExitCode::kAscendError, "ATB AddOp Setup failed, status={}", static_cast<int>(st));
}

⚠️ Potential issue | 🔴 Critical

Resource leak: ATB operation not destroyed on error.

If op->Setup() fails at line 92, MLLM_ERROR_EXIT is called before reaching atb::DestroyOperation(op) at line 118. The ATB operation is leaked.

Consider using RAII or a scope guard to ensure cleanup:

// Add at the top of forward() or create a helper class
struct AtbOperationGuard {
  atb::Operation* op_;
  explicit AtbOperationGuard(atb::Operation* op) : op_(op) {}
  ~AtbOperationGuard() { if (op_) atb::DestroyOperation(op_); }
  AtbOperationGuard(const AtbOperationGuard&) = delete;
  AtbOperationGuard& operator=(const AtbOperationGuard&) = delete;
};

Then use:

auto st = atb::CreateOperation(addParam, &op);
if (st != atb::NO_ERROR || op == nullptr) {
  MLLM_ERROR_EXIT(ExitCode::kAscendError, "ATB CreateOperation(ELEWISE_ADD) failed, status={}", static_cast<int>(st));
}
AtbOperationGuard op_guard(op);  // Automatic cleanup
🤖 Prompt for AI Agents
In mllm/backends/ascend/ops/AscnedElewiseOps.cpp around lines 53-93, the ATB
Operation created by atb::CreateOperation is not cleaned up if op->Setup(...)
fails, causing a resource leak; after successful CreateOperation (and null
check) instantiate a small RAII guard or scope-guard object that holds the
atb::Operation* and calls atb::DestroyOperation in its destructor (delete
copy/move as appropriate), and place this guard immediately after the success
path so any early returns or error exits (like MLLM_ERROR_EXIT) will still
destroy the operation; ensure the guard only runs when op is non-null and that
the normal end-of-function cleanup does not double-destroy (e.g., release or
reset the guard before the existing explicit DestroyOperation call).

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
mllm/backends/ascend/AscendCommon.cpp (1)

220-281: Critical: aclFinalize() call breaks subsequent ACL operations.

Calling aclFinalize() at line 270 terminates the ACL runtime immediately after device enumeration. Since getGlobalAtbContext() (used by operators) requires a valid ACL runtime and calls aclrtSetDevice(), any subsequent ACL operations will fail after this constructor completes.

This issue was previously flagged in past review comments and remains unresolved.

Recommended solution:
Remove the aclFinalize() call from this constructor and defer ACL finalization to backend shutdown or application exit. ACL should remain initialized for the lifetime of the backend.

🔎 Proposed fix
     devices.push_back(info);
   }
 
-  // Finalize ACL after enumeration
-  aclFinalize();
+  // Note: ACL should remain initialized for the lifetime of the backend.
+  // Finalization will be handled during backend shutdown.
 #else
   // In CPU debug mode, add a dummy device
🧹 Nitpick comments (7)
docs/ascend_backend/index.rst (1)

8-9: Consider removing extra trailing empty line.

The file ends with two empty lines. Per coding guidelines, it's recommended to have only a single trailing newline.

🔎 Proposed fix
 
   core_design
-
-
+
tests/ascend/KernelTestHelper.hpp (1)

7-17: Consider removing empty SetUp/TearDown methods if unused.

The SetUp() and TearDown() methods are currently empty. If no per-test setup or cleanup is needed, these methods can be removed entirely, as GoogleTest provides default implementations.

🔎 Proposed simplification
 class KernelTest : public testing::Test {
  public:
   KernelTest() = default;
   ~KernelTest() override = default;
-
-  // If the constructor and destructor are not enough for setting up
-  // and cleaning up each test, you can define the following methods:
-  void SetUp() override {}
-
-  void TearDown() override {}
 };

Note: The static analysis warning about special member functions is a false positive for test fixtures and can be safely ignored.

tests/CMakeLists.txt (1)

14-16: Consider consistency with other backend patterns.

The Ascend backend uses an environment variable check (ENV{ASCEND_HOME_PATH}), while the CUDA backend uses a CMake option (MLLM_BUILD_CUDA_BACKEND). This may be intentional due to SDK location requirements, but consider whether a CMake option with a separate ASCEND_HOME_PATH variable would provide more consistent build configuration.

mllm/backends/ascend/ops/AscendX2XOp.cpp (2)

34-44: Simplify error handling by removing redundant check.

The if (ret != ACL_SUCCESS) check before MLLM_ACL_CHECK(ret) is redundant. The macro should handle error checking internally, so you can call it directly after the memcpy operation.

🔎 Proposed simplification
     auto ret = aclrtMemcpy(
         dst_data, data_size,
         src_data, data_size,
         ACL_MEMCPY_HOST_TO_DEVICE);
-
-    if (ret != ACL_SUCCESS) {
-      MLLM_ACL_CHECK(ret);
-    }
+    MLLM_ACL_CHECK(ret);
 
     syncGlobalAtbStream();

Apply the same simplification to lines 54-61 and 74-81 for consistency.


73-85: Skip synchronization when source and destination pointers are identical.

When src_data == dst_data, the memcpy is correctly skipped, but the function still calls syncGlobalAtbStream() at line 83 (outside the if-block). For consistency and to avoid unnecessary synchronization overhead, move the sync call inside the if-block.

🔎 Proposed fix
     if (src_data != dst_data) {
       auto ret = aclrtMemcpy(
           dst_data, data_size,
           src_data, data_size,
           ACL_MEMCPY_DEVICE_TO_DEVICE);
-
-      if (ret != ACL_SUCCESS) {
-        MLLM_ACL_CHECK(ret);
-      }
-
+      MLLM_ACL_CHECK(ret);
       syncGlobalAtbStream();
     }
     return;
mllm/backends/ascend/AscendCommon.cpp (2)

209-218: Rename short parameter name for better readability.

The parameter name t at line 209 is too short per coding guidelines. Rename it to a more descriptive identifier such as tensor.

Based on coding guidelines, parameter names should be at least 3 characters long for clarity.

🔎 Proposed fix
-void fillAtbTensorDesc(const Tensor& t, atb::TensorDesc& desc) {
+void fillAtbTensorDesc(const Tensor& tensor, atb::TensorDesc& desc) {
   desc.dtype = ACL_FLOAT16; // Currently hardcoded as per demo, can be expanded later
   desc.format = ACL_FORMAT_ND;
 
-  auto shape = t.shape();
+  auto shape = tensor.shape();
   desc.shape.dimNum = static_cast<uint64_t>(shape.size());
   for (uint64_t i = 0; i < desc.shape.dimNum; ++i) {
     desc.shape.dims[i] = static_cast<int64_t>(shape[i]);
   }
 }

Note: This change should also be reflected in the corresponding header file declaration.


121-135: Rename short variable name for better readability.

The variable name ok at line 121 is too short per coding guidelines. Consider renaming it to a more descriptive identifier such as is_valid or all_match.

🔎 Proposed fix
-  bool ok = true;
+  bool is_valid = true;
   for (size_t i = 0; i < actual.size(); ++i) {
     const float diff = std::abs(actual[i] - expected[i]);
     const float thr = atol + rtol * std::abs(expected[i]);
     if (diff > thr) {
-      ok = false;
+      is_valid = false;
       if (verbose) {
         std::cout << "[AscendVerify] idx " << i
                   << " expected " << expected[i]
                   << " got " << actual[i]
                   << " diff " << diff
                   << " thr " << thr << "\n";
       }
     }
   }
 
   if (verbose) {
-    std::cout << (ok ? "[AscendVerify] OK" : "[AscendVerify] FAIL") << "\n";
+    std::cout << (is_valid ? "[AscendVerify] OK" : "[AscendVerify] FAIL") << "\n";
   }
-  return ok;
+  return is_valid;
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b487ef4 and bd8eea4.

📒 Files selected for processing (13)
  • docs/ascend_backend/core_design.rst (1 hunks)
  • docs/ascend_backend/index.rst (1 hunks)
  • docs/index.rst (1 hunks)
  • examples/CMakeLists.txt (1 hunks)
  • mllm/backends/ascend/AscendBackend.cpp (1 hunks)
  • mllm/backends/ascend/AscendCommon.cpp (1 hunks)
  • mllm/backends/ascend/ops/AscendX2XOp.cpp (1 hunks)
  • mllm/backends/ascend/ops/AscendX2XOp.hpp (1 hunks)
  • tests/CMakeLists.txt (1 hunks)
  • tests/ascend/AscendKernelTest.hpp (1 hunks)
  • tests/ascend/CMakeLists.txt (1 hunks)
  • tests/ascend/KernelTest.cpp (1 hunks)
  • tests/ascend/KernelTestHelper.hpp (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • examples/CMakeLists.txt
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/ascend/ops/AscendX2XOp.hpp
  • mllm/backends/ascend/ops/AscendX2XOp.cpp
  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/AscendCommon.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/ascend/ops/AscendX2XOp.hpp
  • mllm/backends/ascend/ops/AscendX2XOp.cpp
  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/AscendCommon.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/ascend/ops/AscendX2XOp.hpp
  • mllm/backends/ascend/ops/AscendX2XOp.cpp
  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/AscendCommon.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/ascend/ops/AscendX2XOp.cpp
  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/AscendCommon.cpp
🧬 Code graph analysis (1)
mllm/backends/ascend/ops/AscendX2XOp.cpp (1)
mllm/backends/ascend/ops/AscendX2XOp.hpp (4)
  • AscendX2XOp (14-14)
  • options (21-23)
  • options (21-21)
  • inputs (16-16)
🪛 Clang (14.0.6)
tests/ascend/AscendKernelTest.hpp

[error] 12-12: class 'AscendKernelTest' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 18-18: method 'AddFloat16Test' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)

tests/ascend/KernelTestHelper.hpp

[error] 7-7: class 'KernelTest' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/ascend/AscendCommon.cpp

[error] 54-54: 3 adjacent parameters of 'prepareAscendTensor' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 57-57: variable 'expected_elements' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 68-68: variable 'device_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 69-69: variable 'bytes' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 92-92: variable 'elem_cnt' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 95-95: variable 'src' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 102-102: 3 adjacent parameters of 'verifyAscendTensor' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 102-102: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 121-121: variable name 'ok' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 143-143: 3 adjacent parameters of 'verifyAscendTensor' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 143-143: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 173-173: variable 'ctx' provides global access to a non-const object; consider making the pointed-to data 'const'

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 209-209: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

tests/ascend/KernelTest.cpp

[error] 16-16: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)


[error] 21-21: 8 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 21-21: 8 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 22-22: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 22-22: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 23-23: 32 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 23-23: 32 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

🔇 Additional comments (8)
docs/index.rst (1)

354-358: LGTM!

The Ascend backend documentation entry is properly integrated into the main documentation index with consistent formatting.

tests/ascend/KernelTest.cpp (2)

16-26: LGTM!

The test systematically validates the Add operation across various tensor shapes, from small ({2,3}) to larger ({32,32}) dimensions. The static analysis warnings about magic numbers are false positives—these values represent test tensor dimensions and are self-documenting in this context.


28-44: LGTM!

The main function properly orchestrates the test lifecycle: initializes GoogleTest, sets up the Ascend backend and runtime context, executes tests, reports memory usage for leak detection, and performs cleanup. The sequence is correct and demonstrates good test hygiene.

tests/ascend/CMakeLists.txt (1)

1-26: LGTM!

The CMake configuration properly sets up the Ascend kernel test target with:

  • Conditional SDK path inclusion when ASCEND_HOME_PATH is defined
  • Correct library dependencies (GoogleTest, runtime, backends, Ascend CL)
  • Appropriate C++20 standard requirement
  • GoogleTest integration via include(GoogleTest)

The configuration follows CMake best practices and aligns with the project's build structure.

tests/ascend/AscendKernelTest.hpp (2)

18-50: Test implementation is sound with minor style consideration.

The test properly validates the Add operation by:

  1. Generating random FP16 inputs on CPU
  2. Computing a reference result with element-wise FP16 addition
  3. Executing the operation on Ascend hardware
  4. Validating results with allClose (tolerance 1e-2f, appropriate for FP16 precision)

Minor note: Line 19 uses using namespace mllm within the method. While marked with NOLINT, consider using explicit mllm:: prefixes to maintain namespace hygiene (purely stylistic for tests).

The static analysis suggestions can be safely ignored:

  • The "special member functions" warning is a false positive for test fixtures
  • Making the method static would prevent it from accessing inherited test fixture state if needed in the future

28-34: FP16 arithmetic is correctly supported.

The mllm_fp16_t type is defined as either float16_t (native ARM floating-point type) or half_float::half (established library type with full operator support). Both implementations properly support the + operator for floating-point addition. The same pattern is used consistently in CPU ElementwiseKernelTest and is known to work correctly.

mllm/backends/ascend/AscendBackend.cpp (1)

14-23: Verify device enumeration success in constructor.

The constructor properly registers operation factories and logs device information during initialization. However, AscendDeviceMetaInfo::instance().devices is not guaranteed to be populated—device enumeration can fail silently if aclInit() or aclrtGetDeviceCount() fails. While errors are logged in the singleton constructor, the Backend constructor does not validate that devices were successfully enumerated. If initialization fails, the devices vector will be empty and the logging loop will silently not execute, leaving users unaware of the failure.

Consider adding a check after accessing devices to ensure enumeration succeeded, or log a warning if no devices were found.

mllm/backends/ascend/AscendCommon.cpp (1)

102-107: Verify the parameter order to prevent accidental swaps.

Static analysis warns that parameters atol, rtol, and verbose at line 102 have similar or convertible types and could be easily swapped. While the current usage appears correct, consider using strongly-typed parameters or adding runtime validation to catch misuse.

You can add a comment documenting the parameter order, or wrap the tolerances in distinct strong types (or pass them via a parameters struct with designated initializers) so arguments cannot be swapped silently:

// Parameters: atol (absolute tolerance), rtol (relative tolerance), verbose (logging flag)
bool verifyAscendTensor(const Tensor& tensor,
                        const std::vector<float>& expected,
                        float atol,
                        float rtol,
                        bool verbose,
                        std::vector<float>* actual_out = nullptr);

MllmRT
MllmCPUBackend
MllmAscendBackend
ascendcl
Collaborator

ascendcl should be linked into MllmAscendBackend

testing::InitGoogleTest(&argc, argv);

// Initialize Ascend backend
mllm::initAscendBackend();
Collaborator

init mllm context before init ascend backend

if(MLLM_BUILD_CUDA_BACKEND)
add_subdirectory(cuda)
endif()
if(DEFINED ENV{ASCEND_HOME_PATH})
Collaborator

if (MLLM_BUILD_ASCEND_BACKEND)

set(ASCEND_CANN_PACKAGE_PATH "/usr/local/Ascend/ascend-toolkit/latest"
CACHE STRING "ASCEND CANN package installation directory"
)
set(SOC_VERSION "Ascend310B1" CACHE STRING "system on chip type")
Collaborator

Make this configurable in mllm's top cmake files

set(SOC_VERSION "Ascend310B1" CACHE STRING "system on chip type")

if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE "Debug" CACHE STRING "Build type Release/Debug (default Debug)" FORCE)
Collaborator

indent should use 2 spaces


if(DEFINED ENV{ASCEND_HOME_PATH})
target_include_directories(MllmAscendBackend PUBLIC $ENV{ASCEND_HOME_PATH}/include)
target_link_directories(MllmAscendBackend PRIVATE $ENV{ASCEND_HOME_PATH}/lib64)
Collaborator

make this PUBLIC


if(DEFINED ENV{ATB_HOME_PATH})
target_include_directories(MllmAscendBackend PUBLIC $ENV{ATB_HOME_PATH}/include)
target_link_directories(MllmAscendBackend PRIVATE $ENV{ATB_HOME_PATH}/lib)
Collaborator

make this PUBLIC

endif()


target_link_libraries(MllmAscendBackend PRIVATE
Collaborator

PUBLIC

@@ -0,0 +1,121 @@
// Copyright (c) MLLM Team.
Collaborator

The file name has typo Ascned -> Ascend

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
mllm/backends/ascend/ops/AscendElewiseOps.cpp (1)

44-48: Resource leak: ATB operation and workspace not destroyed on error paths.

If op->Setup() (line 79) or op->Execute() (line 95) fails, MLLM_ERROR_EXIT is called before atb::DestroyOperation(op) at line 106, leaking the ATB operation. Additionally, if Execute() fails after workspace allocation, the workspace is also leaked.

Consider using RAII for cleanup:

🔎 Proposed fix using scope guards
+#include <utility>  // for std::exchange if needed
+
+namespace {
+struct AtbOpGuard {
+  atb::Operation* op_ = nullptr;
+  explicit AtbOpGuard(atb::Operation* op) : op_(op) {}
+  ~AtbOpGuard() { if (op_) atb::DestroyOperation(op_); }
+  void release() { op_ = nullptr; }
+  AtbOpGuard(const AtbOpGuard&) = delete;
+  AtbOpGuard& operator=(const AtbOpGuard&) = delete;
+};
+}  // namespace
+
 void AscendAddOp::forward(const std::vector<Tensor>& inputs, std::vector<Tensor>& outputs) {
   // ... validation code ...
 
   atb::Operation* op = nullptr;
   auto st = atb::CreateOperation(addParam, &op);
   if (st != atb::NO_ERROR || op == nullptr) {
     MLLM_ERROR_EXIT(ExitCode::kAscendError, "ATB CreateOperation(ELEWISE_ADD) failed, status={}", static_cast<int>(st));
   }
+  AtbOpGuard op_guard(op);
 
   // ... rest of forward() ...
 
   syncGlobalAtbStream();
 
   if (workspace_block_id != -1) {
     auto& mem_mgr = getAscendMemoryManager();
     mem_mgr.freeBlock(workspace_block_id);
   }
 
-  atb::DestroyOperation(op);
+  op_guard.release();
+  atb::DestroyOperation(op);
 }

Also applies to: 77-81, 94-96

🧹 Nitpick comments (2)
CMakeLists.txt (1)

48-51: Consider adding validation for allowed values.

The new cache variables are well-documented. For improved user experience, consider constraining allowed values using set_property:

set(MLLM_RUN_MODE "npu" CACHE STRING "Run mode for mLLM backends: cpu/sim/npu")
set_property(CACHE MLLM_RUN_MODE PROPERTY STRINGS "cpu" "sim" "npu")

set(MLLM_SOC_VERSION "Ascend310B1" CACHE STRING "SOC version for Ascend backend")

This provides dropdown selection in CMake GUIs and catches invalid values early.

tests/ascend/CMakeLists.txt (1)

24-24: Unused include(GoogleTest) without test discovery.

The GoogleTest module is included but gtest_discover_tests() is never called, so the test won't be registered with CTest. Either add discovery or remove the include:

🔎 Option A: Add test discovery
 include(GoogleTest)
+gtest_discover_tests(Mllm-Test-AscendKernel)
🔎 Option B: Remove unused include
-include(GoogleTest)
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 7becc18 and 04e56be.

📒 Files selected for processing (8)
  • CMakeLists.txt
  • mllm/backends/ascend/CMakeLists.txt
  • mllm/backends/ascend/ops/AscendElewiseOps.cpp
  • mllm/backends/cpu/kernels/arm/rmsnorm.cpp
  • mllm/backends/cpu/kernels/arm/softmax.cpp
  • tests/CMakeLists.txt
  • tests/ascend/CMakeLists.txt
  • tests/ascend/KernelTest.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • mllm/backends/ascend/CMakeLists.txt
🧰 Additional context used
🧬 Code graph analysis (2)
mllm/backends/cpu/kernels/arm/rmsnorm.cpp (1)
mllm/backends/cpu/kernels/arm/math.hpp (2)
  • vsquare_mean_fp32 (270-284)
  • vsquare_mean_fp32 (270-270)
mllm/backends/cpu/kernels/arm/softmax.cpp (2)
mllm/nn/Functional.cpp (2)
  • exp (81-81)
  • exp (81-81)
mllm/nn/Functional.hpp (1)
  • exp (115-115)
🪛 Clang (14.0.6)
tests/ascend/KernelTest.cpp

[error] 16-16: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)


[error] 21-21: 8 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 21-21: 8 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 22-22: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 22-22: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 23-23: 32 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 23-23: 32 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

🔇 Additional comments (8)
mllm/backends/cpu/kernels/arm/rmsnorm.cpp (2)

20-20: LGTM! Improved consistency.

Replacing std::sqrtf with std::sqrt aligns this function with the rest of the file (rmsnorm_fp32_inplace at line 71 and rmsnorm_fp16_inplace at line 160 already use std::sqrt). The generic std::sqrt is more idiomatic in modern C++ and produces identical results for float arguments.


109-109: LGTM! Improved consistency.

Replacing std::sqrtf with std::sqrt aligns this function with rmsnorm_fp16_inplace (line 160). This change ensures all four RMS normalization functions in this file consistently use the generic std::sqrt, which is the more idiomatic C++ approach.

mllm/backends/cpu/kernels/arm/softmax.cpp (1)

28-28: Clarify the rationale for this change.

This modification to the CPU ARM softmax kernel appears unrelated to the PR's stated objective of adding an Ascend backend. Could you explain why these changes were made? If this is intended as a consistency improvement alongside the std::sqrt changes, stating that in the PR description would help reviewers.

Also applies to: 115-115, 166-166, 232-232

tests/ascend/KernelTest.cpp (2)

16-26: Test case structure looks good.

The magic numbers flagged by static analysis (8, 16, 32) are test input dimensions representing a range of sizes for FP16 add validation. In test code, explicit literal dimensions are appropriate since they serve as test case data rather than configuration values that need naming.

The test correctly validates multiple dimension pairs for the AddFloat16 operation.


28-44: Initialization and cleanup sequence is correct.

The main() function properly initializes the MLLM context before the Ascend backend, runs tests, reports memory usage, and shuts down the context. This is the correct order of operations.

tests/CMakeLists.txt (1)

14-16: Consistent pattern with existing backends.

The conditional block follows the same pattern as the CUDA backend inclusion, correctly guarding Ascend test compilation behind the MLLM_BUILD_ASCEND_BACKEND option.

mllm/backends/ascend/ops/AscendElewiseOps.cpp (2)

22-24: Setup delegates correctly to base class.

The setup() method properly delegates to BaseOp::setup() for tensor shape inference.


26-39: Good validation and clear NYI markers.

The input validation correctly asserts tensor counts, dtype consistency, and element count matching. The NYI markers clearly document the current limitations (same dtype, no broadcasting).

Comment on lines +85 to +88
if (workspaceSize > 0) {
auto& mem_mgr = getAscendMemoryManager();
mem_mgr.allocateBlock(static_cast<uint32_t>(workspaceSize), workspace_block_id);
mem_mgr.getBlockPtr(workspace_block_id, workspace);
Contributor

⚠️ Potential issue | 🟡 Minor

Narrowing conversion: uint64_t to uint32_t may truncate large workspace sizes.

workspaceSize is uint64_t but allocateBlock takes uint32_t. For large workspaces exceeding 4GB, this truncates silently. Consider either:

  1. Checking if workspaceSize exceeds UINT32_MAX and erroring
  2. Updating allocateBlock to accept uint64_t
🔎 Proposed bounds check
   if (workspaceSize > 0) {
+    if (workspaceSize > static_cast<uint64_t>(UINT32_MAX)) {
+      MLLM_ERROR_EXIT(ExitCode::kAscendError, "Workspace size {} exceeds 32-bit limit", workspaceSize);
+    }
     auto& mem_mgr = getAscendMemoryManager();
     mem_mgr.allocateBlock(static_cast<uint32_t>(workspaceSize), workspace_block_id);
     mem_mgr.getBlockPtr(workspace_block_id, workspace);
   }
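To make the failure mode concrete, here is a self-contained sketch (plain C++, no mllm types; `fitsInPoolApi` and `narrowed` are illustrative names) of how a just-over-4GiB request shrinks under the narrowing cast:

```cpp
#include <cassert>
#include <cstdint>

// Returns true when the request can be expressed losslessly as uint32_t.
bool fitsInPoolApi(uint64_t workspace_size) {
  return workspace_size <= UINT32_MAX;
}

// What the unchecked narrowing cast actually produces: the low 32 bits only.
uint32_t narrowed(uint64_t workspace_size) {
  return static_cast<uint32_t>(workspace_size);
}
```

Gating the cast behind a `fitsInPoolApi`-style check corresponds to option 1 above.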
🤖 Prompt for AI Agents
In mllm/backends/ascend/ops/AscendElewiseOps.cpp around lines 85 to 88,
workspaceSize is a uint64_t but allocateBlock accepts uint32_t which can
truncate values > UINT32_MAX; add a guard that checks if workspaceSize >
UINT32_MAX and handle it (log an error and return/fail or throw) before calling
allocateBlock, or alternatively update the memory manager API to accept uint64_t
and propagate the type change through allocateBlock and its callers; if keeping
allocateBlock as uint32_t, cast only after the bounds check and ensure
workspace_block_id and workspace are handled consistently on the error path.

  float sum = 0.f;
  for (int i = 0; i < len; ++i) {
-   auto tmp = std::expf(X[i * stride] - max_value);
+   auto tmp = std::exp(X[i * stride] - max_value);

⚠️ Potential issue | 🟠 Major

Performance regression: prefer std::expf for single-precision computations.

Replacing std::expf with std::exp introduces unnecessary double-precision computation in the scalar fallback paths. std::exp(float_arg) promotes the float to double, computes the exponential in double precision, then demotes back to float—significantly slower than std::expf which computes directly in single precision.

These scalar paths handle stride≠1, small tensors (len≤16), and tail elements. While the vectorized paths (which use vexpq_fast_f32) are unaffected, this change degrades performance for non-contiguous memory layouts and small inputs with no apparent benefit.

🔎 Proposed fix
-      auto tmp = std::exp(X[i * stride] - max_value);
+      auto tmp = std::expf(X[i * stride] - max_value);

And for the tail loop:

-    float tmp = std::exp(X[i] - max_value);
+    float tmp = std::expf(X[i] - max_value);

Also applies to: 115-115

🤖 Prompt for AI Agents
In mllm/backends/cpu/kernels/arm/softmax.cpp around lines 28 and 115, the scalar
fallback and tail loops use std::exp on float values which promotes to double
and slows single-precision paths; change these calls to std::expf (and any other
scalar float exp usages in those nearby lines) so the exponential is computed in
single precision, keeping vectorized vexpq_fast_f32 unaffected; ensure the
argument types remain float and that <cmath> is available for expf.

  float sum = 0.f;
  for (int i = 0; i < len; ++i) {
-   auto tmp = std::expf(X[i * stride] - max_value);
+   auto tmp = std::exp(X[i * stride] - max_value);

⚠️ Potential issue | 🟠 Major

Performance regression: prefer std::expf for half-precision inputs.

Replacing std::expf with std::exp introduces an unnecessarily expensive promotion path for half-precision (fp16) inputs. With fp16 arguments, std::exp promotes fp16→double (8 bytes), computes in double precision, then the result is cast to fp16. Using std::expf would promote fp16→float (4 bytes), compute in single precision—both faster promotion and faster computation.

This affects the same scalar fallback paths as the fp32 version (stride≠1, len≤16, tail elements), and is inconsistent with the vectorized paths that use vexpq_fast_f16.

🔎 Proposed fix
-      auto tmp = std::exp(X[i * stride] - max_value);
+      auto tmp = std::expf(X[i * stride] - max_value);

And for the tail loop:

-    float tmp = std::exp(X[i] - max_value);
+    float tmp = std::expf(X[i] - max_value);

Also applies to: 232-232

🤖 Prompt for AI Agents
In mllm/backends/cpu/kernels/arm/softmax.cpp around lines 166 and 232, the
scalar fallback uses std::exp which forces fp16 inputs to promote to double and
back; change these calls to std::expf so fp16 promotes to float and computes in
single precision for better performance and consistency with vexpq_fast_f16,
ensuring the surrounding expressions still use float temporaries (or cast to
float) before calling std::expf and then cast the result back to the fp16
storage type as needed.

Comment on lines +8 to +13
target_link_libraries(Mllm-Test-AscendKernel PRIVATE
  gtest_main
  MllmRT
  MllmCPUBackend
  MllmAscendBackend
)

⚠️ Potential issue | 🔴 Critical

Linker conflict: gtest_main provides main() but test has custom main().

KernelTest.cpp defines its own main() function for Ascend context initialization, but linking against gtest_main will also provide a main(). This causes a linker error (duplicate symbol).

Replace gtest_main with gtest:

🔎 Proposed fix
 target_link_libraries(Mllm-Test-AscendKernel PRIVATE
-  gtest_main
+  gtest
   MllmRT
   MllmCPUBackend
   MllmAscendBackend
 )
🤖 Prompt for AI Agents
In tests/ascend/CMakeLists.txt around lines 8 to 13, the target currently links
against gtest_main which provides its own main(), causing a duplicate-symbol
linker error because KernelTest.cpp defines a custom main(); replace gtest_main
with gtest in the target_link_libraries invocation so the test uses the gtest
library without bringing its main(), then reconfigure/rebuild to verify the
duplicate main() linker error is resolved.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (12)
mllm/backends/ascend/AscendCommon.hpp (1)

130-131: Fix __LINE__ concatenation in macros.

The __LINE__ macro requires a two-level expansion to work correctly in token concatenation. As flagged in a previous review, without the helper macros, the variable name will literally be timer_scope___LINE__.
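The standard fix is a two-macro indirection; the names below (`MLLM_CONCAT`, `TIMER_SCOPE_VAR`) are illustrative, not necessarily those in AscendCommon.hpp:

```cpp
// Two-level expansion: the outer macro expands __LINE__ first,
// then the inner macro pastes the tokens.
#define MLLM_CONCAT_IMPL(a, b) a##b
#define MLLM_CONCAT(a, b) MLLM_CONCAT_IMPL(a, b)

// Each use now gets a unique name: timer_scope_12, timer_scope_13, ...
#define TIMER_SCOPE_VAR int MLLM_CONCAT(timer_scope_, __LINE__) = 0

// Direct pasting (a##__LINE__) would emit the literal name
// timer_scope___LINE__, so two uses in one scope would fail to compile.
int demo() {
  TIMER_SCOPE_VAR;
  TIMER_SCOPE_VAR;  // a distinct variable thanks to the line number
  return 0;
}
```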

tests/ascend/CMakeLists.txt (1)

8-13: Linker conflict: gtest_main provides main() but test has custom main().

KernelTest.cpp defines its own main() function for Ascend context initialization, but linking against gtest_main will also provide a main(). This causes a duplicate symbol linker error.

🔎 Proposed fix
 target_link_libraries(Mllm-Test-AscendKernel PRIVATE
-  gtest_main
+  gtest
   MllmRT
   MllmCPUBackend
   MllmAscendBackend
 )
mllm/backends/ascend/memory/MemoryBlock.hpp (1)

11-15: Initialize all struct members to prevent undefined behavior.

The struct members block_id_ and block_size_ are uninitialized. If a MemoryBlock is default-constructed, these fields will contain garbage values, leading to undefined behavior when accessed.

🔎 Proposed fix
 struct MemoryBlock {
-    int64_t block_id_;          
-    size_t block_size_;         
+    int64_t block_id_ = -1;          
+    size_t block_size_ = 0;         
     void* address_ = nullptr;  
 };
mllm/backends/ascend/memory/AscendMemoryPool.cpp (1)

42-71: Alignment and capacity check issues previously flagged.

The issues with alignment math, capacity checking (> vs >=), and missing block_id = -1 on failure were already identified in a previous review comment. Please address those concerns.
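For reference, the usual power-of-two alignment round-up is a one-liner; this is a sketch of the arithmetic, not the pool's actual implementation:

```cpp
#include <cstddef>

// Round size up to the next multiple of align; align must be a power of two.
size_t alignUp(size_t size, size_t align) {
  return (size + align - 1) & ~(align - 1);
}
```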

mllm/backends/ascend/ops/AscendElewiseOps.cpp (2)

44-48: ATB operation leak on error paths previously flagged.

This issue was identified in a previous review. The ATB operation created at line 45 is not destroyed if Setup() (line 78-81) or Execute() (lines 94-96) fails via MLLM_ERROR_EXIT. Consider using RAII as suggested in the prior review.
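One RAII shape that fits is a std::unique_ptr with a custom deleter. `Operation` and `DestroyOperation` below are self-contained stand-ins for the ATB types (assumed, not quoted from the ATB headers):

```cpp
#include <cassert>
#include <memory>

// Stand-ins for atb::Operation / atb::DestroyOperation so this sketch
// compiles on its own.
struct Operation {};
inline int g_destroyed = 0;
void DestroyOperation(Operation* op) { ++g_destroyed; delete op; }

struct OpDeleter {
  void operator()(Operation* op) const {
    if (op != nullptr) DestroyOperation(op);
  }
};
using OpPtr = std::unique_ptr<Operation, OpDeleter>;

// The deleter fires on every exit path, including early error returns,
// so a failed Setup()/Execute() no longer leaks the operation.
bool runOp(bool setup_fails) {
  OpPtr op(new Operation{});
  if (setup_fails) return false;  // op is destroyed here too
  // ... Setup() / Execute() would go here ...
  return true;
}
```

Note that this protects early returns; a hard process exit (as MLLM_ERROR_EXIT may perform) bypasses destructors regardless.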


85-89: Truncation risk on workspace size previously flagged.

The cast from uint64_t workspaceSize to uint32_t was identified in a previous review as a potential truncation issue for workspaces exceeding 4GB.

mllm/backends/ascend/AscendDispatcher.hpp (1)

13-15: MLLM_EMPTY_SCOPE causes uninitialized field warning.

Static analysis reports that the implicit constructor doesn't initialize the field introduced by MLLM_EMPTY_SCOPE. If the options struct is truly empty, consider removing the macro or adding a default constructor.

🔎 Proposed fix
 struct AscendDispatcherOptions {
-  MLLM_EMPTY_SCOPE;
+  // Reserved for future options.
 };
mllm/backends/ascend/memory/AscendMemoryPool.hpp (1)

15-38: Missing copy/move semantics (Rule of Five) for class owning raw device memory.

AscendMemoryPool defines a destructor that frees base_mem_ptr_ but lacks copy/move constructors and assignment operators. Copying this class would cause double-free. Delete or explicitly define these operations.

🔎 Proposed fix
 class AscendMemoryPool {
 public:
     explicit AscendMemoryPool(size_t pool_size);
     ~AscendMemoryPool();

+    // Non-copyable, non-movable (owns device memory)
+    AscendMemoryPool(const AscendMemoryPool&) = delete;
+    AscendMemoryPool& operator=(const AscendMemoryPool&) = delete;
+    AscendMemoryPool(AscendMemoryPool&&) = delete;
+    AscendMemoryPool& operator=(AscendMemoryPool&&) = delete;
+
     void allocateBlock(uint32_t size, int& block_id);
mllm/backends/ascend/memory/AscendMemoryManager.cpp (2)

27-27: Check the return value of aclrtSetDevice.

If aclrtSetDevice fails, the memory pool will be created with an incorrect device context. This could cause device mismatch errors in subsequent operations.

🔎 Proposed fix
-        aclrtSetDevice(i);
+        auto set_ret = aclrtSetDevice(i);
+        MLLM_ACL_CHECK(set_ret);

43-50: Handle negative device ID before casting to size_t.

If getDeviceId() returns -1 (device not set), casting to size_t produces a huge value that passes the bounds check but accesses invalid memory. Check for negative values explicitly.

🔎 Proposed fix
 std::shared_ptr<AscendMemoryPool> &AscendMemoryManager::getMemoryPool()
 {
-    size_t device_id = static_cast<size_t>(getDeviceId());
+    if (memory_pools_.empty()) {
+        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Ascend memory pools not initialized");
+    }
+    const int32_t device_id_signed = getDeviceId();
+    if (device_id_signed < 0) {
+        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Invalid device id {}", device_id_signed);
+    }
+    const size_t device_id = static_cast<size_t>(device_id_signed);
     if (device_id >= memory_pools_.size()) {
         MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Invalid device id {}", device_id);
     }
     return memory_pools_[device_id];
 }
mllm/backends/ascend/AscendCommon.cpp (2)

269-270: aclFinalize() in constructor terminates ACL runtime prematurely.

Calling aclFinalize() after device enumeration will shut down the ACL runtime. Subsequent ACL operations (e.g., in getGlobalAtbContext()) will fail. Remove this call and let backend shutdown handle ACL cleanup.

🔎 Proposed fix
     devices.push_back(info);
   }

-  // Finalize ACL after enumeration
-  aclFinalize();
+  // Note: ACL should remain initialized for subsequent operations.
+  // aclFinalize() should only be called during backend shutdown.
 #else

209-218: Rename parameter t to a more descriptive name.

The parameter name t is too short per coding guidelines. Rename to tensor for clarity.

🔎 Proposed fix
-void fillAtbTensorDesc(const Tensor& t, atb::TensorDesc& desc) {
+void fillAtbTensorDesc(const Tensor& tensor, atb::TensorDesc& desc) {
   desc.dtype = ACL_FLOAT16; // Currently hardcoded as per demo, can be expanded later
   desc.format = ACL_FORMAT_ND;

-  auto shape = t.shape();
+  auto shape = tensor.shape();
   desc.shape.dimNum = static_cast<uint64_t>(shape.size());
🧹 Nitpick comments (15)
mllm/backends/ascend/ops/AscendElewiseOps.hpp (1)

12-25: LGTM!

The AscendAddOp and AscendAddOpFactory classes follow the established factory pattern used by other backends. The class hierarchy and method signatures are appropriate.

Consider adding brief docstrings for setup and forward methods to document their expected behavior and any Ascend-specific semantics. As per coding guidelines, public APIs should have clear comments explaining purpose, parameters, and behavior.

mllm/backends/ascend/AscendCommon.hpp (2)

19-28: Parenthesize macro arguments to prevent operator precedence issues.

The err argument in both macros should be wrapped in parentheses to avoid unexpected behavior if a complex expression is passed.

🔎 Proposed fix
 #define MLLM_ACL_CHECK(err)                                                                                    \
-  if (err != ACL_SUCCESS) {                                                                                    \
+  if ((err) != ACL_SUCCESS) {                                                                                  \
     MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "ACL error code {}: {}", int(err), aclGetRecentErrMsg()); \
   }

 #define MLLM_ATB_CHECK(err)                                                                                    \
-  if (err != atb::NO_ERROR) {                                                                                  \
+  if ((err) != atb::NO_ERROR) {                                                                                \
     MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "ATB error code {}", int(err));                           \
   }

54-67: Consider adding move semantics to AscendDeviceMetaInfo.

The class deletes copy operations but doesn't explicitly handle move operations or define a destructor. Since it's a singleton with a std::vector member, the compiler-generated defaults are likely fine, but explicitly defaulting them would silence the static analysis warning and clarify intent.

🔎 Proposed fix
 class AscendDeviceMetaInfo {
  public:
   AscendDeviceMetaInfo();
+  ~AscendDeviceMetaInfo() = default;

   static AscendDeviceMetaInfo& instance() {
     static AscendDeviceMetaInfo instance;
     return instance;
   }

   AscendDeviceMetaInfo(const AscendDeviceMetaInfo&) = delete;
   AscendDeviceMetaInfo& operator=(const AscendDeviceMetaInfo&) = delete;
+  AscendDeviceMetaInfo(AscendDeviceMetaInfo&&) = delete;
+  AscendDeviceMetaInfo& operator=(AscendDeviceMetaInfo&&) = delete;

   std::vector<AscendDeviceInfo> devices;
 };
tests/ascend/CMakeLists.txt (1)

24-24: Missing gtest_discover_tests call.

The include(GoogleTest) is present but there's no gtest_discover_tests() call to register the tests with CTest. This means ctest won't discover and run these tests.

🔎 Proposed fix
 include(GoogleTest)
+gtest_discover_tests(Mllm-Test-AscendKernel)
mllm/backends/ascend/AscendBackend.cpp (1)

14-23: Move lambda outside the loop for efficiency.

The bytes_to_mb lambda is defined inside the loop body, causing it to be recreated on each iteration. Move it before the loop for cleaner code.

🔎 Proposed fix
 AscendBackend::AscendBackend() : Backend(kAscend, createAscendAllocator()) {
   regOpFactory<AscendAddOpFactory>();
   regOpFactory<AscendX2XOpFactory>();
+  const auto bytes_to_mb = [](size_t bytes) { return bytes / (1024.0 * 1024.0); };
   auto& devices = AscendDeviceMetaInfo::instance().devices;
   for (const auto& device : devices) {
-    const auto bytes_to_mb = [](size_t bytes) { return bytes / (1024.0 * 1024.0); };
     MLLM_INFO("Found Ascend device {} (ID: {}, SOC: {}, Memory: {:.2f} MB free / {:.2f} MB total)", device.name,
               device.id, device.soc_version, bytes_to_mb(device.free_memory), bytes_to_mb(device.total_memory));
   }
 }
mllm/backends/ascend/AscendAllocator.hpp (1)

15-18: Consider explicitly deleting copy/move operations.

The class contains a std::mutex which is non-copyable, but explicitly deleting copy/move constructors and assignment operators makes the intent clear and satisfies the Rule of Five.

🔎 Proposed fix
 class AscendAllocator final : public Allocator {
  public:
   AscendAllocator();
   ~AscendAllocator();
+
+  AscendAllocator(const AscendAllocator&) = delete;
+  AscendAllocator& operator=(const AscendAllocator&) = delete;
+  AscendAllocator(AscendAllocator&&) = delete;
+  AscendAllocator& operator=(AscendAllocator&&) = delete;
 
   inline bool ctrlByMemManager() override { return false; }
tests/ascend/AscendKernelTest.hpp (1)

18-50: Improve const-correctness and add failure diagnostics.

Two minor issues:

  1. Line 20: Use const auto& since shapes is passed by const reference.
  2. Lines 45-47: Log which shape failed to aid debugging.
🔎 Proposed fix
   bool AddFloat16Test(const std::vector<mllm::Tensor::shape_t>& shapes) {
     using namespace mllm;  // NOLINT
-    for (auto& shape : shapes) {
+    for (const auto& shape : shapes) {
       // 1. Construct random FP16 inputs on CPU
       Tensor x_cpu = Tensor::random(shape, -3, 3, kFloat16, kCPU);
       Tensor y_cpu = Tensor::random(shape, -3, 3, kFloat16, kCPU);
       // ... existing code ...
       auto result = mllm::test::allClose(z_cpu, ref_cpu, 1e-2f, 1e-2f);
       if (!result.is_close) {
+        // Log failure details for debugging
+        MLLM_ERROR("AddFloat16Test failed for shape, max_diff={}", result.max_diff);
         return false;
       }
     }
     return true;
   }
mllm/backends/ascend/Register.cpp (1)

19-19: Consider making pool size configurable.

The hardcoded 100MB pool size may not be optimal for all use cases. Consider making this configurable via environment variable or initialization options.

🔎 Example
// Could read from environment or options struct
size_t pool_size = getenv("MLLM_ASCEND_POOL_SIZE_MB") 
    ? std::stoul(getenv("MLLM_ASCEND_POOL_SIZE_MB")) * 1024 * 1024
    : 100 * 1024 * 1024;  // Default 100MB
mllm/backends/ascend/memory/AscendMemoryManager.hpp (2)

22-22: Returning reference to shared_ptr is unusual.

getMemoryPool() returns std::shared_ptr<AscendMemoryPool>&. Typically, a shared_ptr is returned by value (for shared ownership) or by const reference (for read-only access). Returning a non-const reference lets callers reset or reassign the manager's internal pointer, silently dropping the pool for every other user.

🔎 Proposed fix
-    std::shared_ptr<AscendMemoryPool> &getMemoryPool();
+    std::shared_ptr<AscendMemoryPool> getMemoryPool();
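A minimal illustration of the hazard, with hypothetical `Pool`/`Manager` names standing in for the real classes:

```cpp
#include <cassert>
#include <memory>

struct Pool { int id = 0; };

struct Manager {
  std::shared_ptr<Pool> pool_ = std::make_shared<Pool>();
  std::shared_ptr<Pool>& getMemoryPoolRef() { return pool_; }  // hazardous
  std::shared_ptr<Pool> getMemoryPool() { return pool_; }      // safe copy
};
```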

24-24: uint32_t size limits allocations to 4GB.

Large model tensors or workspaces may exceed 4GB. Consider using size_t for future-proofing.

mllm/backends/ascend/memory/AscendMemoryPool.cpp (1)

90-104: Set addr to nullptr on failure for safer usage.

When block_id is invalid or not found, addr remains uninitialized/unchanged. Setting it to nullptr on error paths prevents use of stale values.

🔎 Proposed fix
 void AscendMemoryPool::getBlockPtr(int block_id, void *&addr) {
     std::unique_lock<std::mutex> lock(block_mutex_);
 
     if (block_id < 0) {
         MLLM_INFO("Invalid block id {} to get ptr", block_id);
+        addr = nullptr;
         return;
     }
 
     auto it = used_blocks_.find(block_id);
     if (it != used_blocks_.end()) {
         addr = it->second.address_;
     } else {
+        addr = nullptr;
         MLLM_ERROR("Get block address error, block id {}", block_id);
     }
 }
mllm/backends/ascend/ops/AscendX2XOp.cpp (1)

39-41: Redundant error check before MLLM_ACL_CHECK.

The pattern if (ret != ACL_SUCCESS) { MLLM_ACL_CHECK(ret); } is redundant—MLLM_ACL_CHECK already handles the success case internally. Simplify to just MLLM_ACL_CHECK(ret); directly after the aclrtMemcpy call.

🔎 Proposed fix
     auto ret = aclrtMemcpy(
         dst_data, data_size,
         src_data, data_size,
         ACL_MEMCPY_HOST_TO_DEVICE);
-
-    if (ret != ACL_SUCCESS) {
-      MLLM_ACL_CHECK(ret);
-    }
+    MLLM_ACL_CHECK(ret);

     syncGlobalAtbStream();

Apply the same pattern to the other two cases (lines 59-61 and 79-81).

Also applies to: 59-61, 79-81

mllm/backends/ascend/AscendAllocator.cpp (1)

82-91: Inconsistent logging: use MLLM_WARN instead of std::cout.

The rest of this file uses structured logging (MLLM_INFO, MLLM_WARN, MLLM_ERROR), but generalAlloc and generalFree use raw std::cout. For consistency and to enable log filtering, use the logging macros.

🔎 Proposed fix
 bool AscendAllocator::generalAlloc(void** ptr, size_t cap, size_t align) {
-    //we don't support generalAlloc , therefore return false
-    std::cout << "generalAlloc is not supported in AscendAllocator" << std::endl;
+    // generalAlloc is not supported
+    MLLM_WARN("generalAlloc is not supported in AscendAllocator");
     return false;
 }

 void AscendAllocator::generalFree(void* ptr) {
-    //we don't support generalFree , therefore do nothing
-    std::cout << "generalFree is not supported in AscendAllocator" << std::endl;
+    // generalFree is not supported
+    MLLM_WARN("generalFree is not supported in AscendAllocator");
 }
mllm/backends/ascend/AscendCommon.cpp (1)

90-90: const_cast on tensor parameter is a code smell.

The const_cast to call .to(kCPU) suggests the method might have side effects. Consider accepting the tensor by non-const reference if mutation is expected, or document why this is safe.

mllm/backends/ascend/CMakeLists.txt (1)

14-23: Using file(GLOB ...) for sources can miss newly added files.

CMake's glob doesn't re-run automatically when files are added. Consider either explicitly listing sources or ensuring developers know to re-run CMake when adding new files. This is acceptable for development but worth documenting.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 04e56be and 2fa8eed.

📒 Files selected for processing (33)
  • CMakeLists.txt
  • docs/ascend_backend/core_design.rst
  • docs/ascend_backend/index.rst
  • docs/index.rst
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/AscendAllocator.hpp
  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendCommon.hpp
  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/backends/ascend/CMakeLists.txt
  • mllm/backends/ascend/Register.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.hpp
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
  • mllm/backends/ascend/memory/MemoryBlock.hpp
  • mllm/backends/ascend/ops/AscendElewiseOps.cpp
  • mllm/backends/ascend/ops/AscendElewiseOps.hpp
  • mllm/backends/ascend/ops/AscendX2XOp.cpp
  • mllm/backends/ascend/ops/AscendX2XOp.hpp
  • mllm/backends/cpu/kernels/arm/rmsnorm.cpp
  • mllm/backends/cpu/kernels/arm/softmax.cpp
  • mllm/core/DeviceTypes.hpp
  • mllm/engine/Dispatcher.hpp
  • mllm/utils/Common.hpp
  • tasks/build_arm_ascend.yaml
  • tests/CMakeLists.txt
  • tests/ascend/AscendKernelTest.hpp
  • tests/ascend/CMakeLists.txt
  • tests/ascend/KernelTest.cpp
  • tests/ascend/KernelTestHelper.hpp
✅ Files skipped from review due to trivial changes (3)
  • mllm/backends/cpu/kernels/arm/rmsnorm.cpp
  • tasks/build_arm_ascend.yaml
  • docs/ascend_backend/core_design.rst
🚧 Files skipped from review as they are similar to previous changes (7)
  • docs/index.rst
  • mllm/backends/ascend/AscendDispatcher.cpp
  • tests/CMakeLists.txt
  • docs/ascend_backend/index.rst
  • mllm/backends/cpu/kernels/arm/softmax.cpp
  • mllm/backends/ascend/ops/AscendX2XOp.hpp
  • CMakeLists.txt
🧰 Additional context used
📓 Path-based instructions (5)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/core/DeviceTypes.hpp
  • mllm/backends/ascend/Register.cpp
  • mllm/backends/ascend/ops/AscendX2XOp.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.hpp
  • mllm/backends/ascend/ops/AscendElewiseOps.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/utils/Common.hpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/CMakeLists.txt
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/ops/AscendElewiseOps.hpp
  • mllm/backends/ascend/AscendCommon.hpp
  • mllm/engine/Dispatcher.hpp
  • mllm/backends/ascend/AscendAllocator.hpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp
  • mllm/backends/ascend/memory/MemoryBlock.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/core/DeviceTypes.hpp
  • mllm/backends/ascend/Register.cpp
  • mllm/backends/ascend/ops/AscendX2XOp.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.hpp
  • mllm/backends/ascend/ops/AscendElewiseOps.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/utils/Common.hpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/ops/AscendElewiseOps.hpp
  • mllm/backends/ascend/AscendCommon.hpp
  • mllm/engine/Dispatcher.hpp
  • mllm/backends/ascend/AscendAllocator.hpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp
  • mllm/backends/ascend/memory/MemoryBlock.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/Register.cpp
  • mllm/backends/ascend/ops/AscendX2XOp.cpp
  • mllm/backends/ascend/ops/AscendElewiseOps.cpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/core/DeviceTypes.hpp
  • mllm/backends/ascend/Register.cpp
  • mllm/backends/ascend/ops/AscendX2XOp.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.hpp
  • mllm/backends/ascend/ops/AscendElewiseOps.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/utils/Common.hpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/ops/AscendElewiseOps.hpp
  • mllm/backends/ascend/AscendCommon.hpp
  • mllm/engine/Dispatcher.hpp
  • mllm/backends/ascend/AscendAllocator.hpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp
  • mllm/backends/ascend/memory/MemoryBlock.hpp
{mllm,mllm-cli,pymllm}/**/*.{toml,json,txt,md}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

When suggesting new dependencies, evaluate their maturity, licensing, and maintenance status. Prefer well-established and actively maintained libraries.

Files:

  • mllm/backends/ascend/CMakeLists.txt
🧠 Learnings (5)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Use named constants instead of magic numbers.

Applied to files:

  • mllm/backends/ascend/CMakeLists.txt
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Applied to files:

  • mllm/backends/ascend/CMakeLists.txt
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : Encourage consistent coding style and patterns with the existing codebase.

Applied to files:

  • mllm/backends/ascend/CMakeLists.txt
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Avoid unnecessary object creation in loops or hot paths.

Applied to files:

  • mllm/backends/ascend/CMakeLists.txt
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/* : No line may end with trailing whitespace.

Applied to files:

  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
🧬 Code graph analysis (10)
mllm/backends/ascend/AscendBackend.cpp (4)
mllm/backends/ascend/AscendBackend.hpp (1)
  • AscendBackend (13-13)
mllm/backends/ascend/AscendAllocator.cpp (2)
  • createAscendAllocator (107-107)
  • createAscendAllocator (107-107)
mllm/backends/ascend/AscendAllocator.hpp (1)
  • createAscendAllocator (46-46)
mllm/backends/ascend/AscendCommon.hpp (1)
  • instance (58-61)
mllm/backends/ascend/Register.cpp (5)
mllm/backends/ascend/AscendCommon.hpp (1)
  • instance (58-61)
mllm/backends/ascend/memory/AscendMemoryManager.hpp (1)
  • pool_size (18-18)
mllm/engine/DispatcherManager.hpp (1)
  • dispatcher (49-49)
mllm/backends/ascend/AscendDispatcher.cpp (2)
  • createAscendDispatcher (83-86)
  • createAscendDispatcher (83-84)
mllm/backends/ascend/AscendDispatcher.hpp (1)
  • createAscendDispatcher (36-37)
mllm/backends/ascend/ops/AscendX2XOp.cpp (1)
mllm/backends/ascend/ops/AscendX2XOp.hpp (4)
  • AscendX2XOp (14-14)
  • options (21-23)
  • options (21-21)
  • inputs (16-16)
mllm/backends/ascend/memory/AscendMemoryManager.hpp (1)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (3)
  • size (20-20)
  • block_id (22-22)
  • block_id (24-24)
mllm/backends/ascend/AscendAllocator.cpp (2)
mllm/backends/ascend/memory/AscendMemoryManager.cpp (2)
  • getAscendMemoryManager (15-18)
  • getAscendMemoryManager (15-15)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (2)
  • block_id (22-22)
  • block_id (24-24)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (2)
mllm/backends/ascend/memory/AscendMemoryManager.hpp (4)
  • pool_size (18-18)
  • size (24-24)
  • block_id (26-26)
  • block_id (28-28)
mllm/backends/ascend/memory/AscendMemoryPool.cpp (2)
  • AscendMemoryPool (14-26)
  • AscendMemoryPool (28-36)
mllm/backends/ascend/AscendCommon.cpp (3)
mllm/backends/ascend/AscendCommon.hpp (6)
  • AscendTensorHandle (71-71)
  • AscendTensorHandle (72-72)
  • AscendTensorHandle (73-73)
  • AscendTensorHandle (75-75)
  • AscendTensorHandle (77-77)
  • other (78-78)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (3)
  • block_id (22-22)
  • block_id (24-24)
  • size (20-20)
mllm/backends/ascend/memory/AscendMemoryManager.cpp (2)
  • getAscendMemoryManager (15-18)
  • getAscendMemoryManager (15-15)
mllm/backends/ascend/AscendCommon.hpp (1)
mllm/backends/ascend/AscendCommon.cpp (7)
  • AscendTensorHandle (23-24)
  • AscendTensorHandle (26-26)
  • AscendTensorHandle (28-31)
  • other (33-41)
  • other (33-33)
  • AscendTimer (153-161)
  • AscendTimer (163-170)
mllm/backends/ascend/AscendAllocator.hpp (1)
mllm/backends/ascend/AscendAllocator.cpp (2)
  • AscendAllocator (11-13)
  • AscendAllocator (15-21)
mllm/backends/ascend/memory/AscendMemoryManager.cpp (3)
mllm/backends/ascend/memory/AscendMemoryManager.hpp (5)
  • AscendMemoryManager (16-16)
  • pool_size (18-18)
  • size (24-24)
  • block_id (26-26)
  • block_id (28-28)
mllm/backends/ascend/memory/AscendMemoryPool.cpp (6)
  • allocateBlock (42-71)
  • allocateBlock (42-42)
  • freeBlock (73-88)
  • freeBlock (73-73)
  • getBlockPtr (90-104)
  • getBlockPtr (90-90)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (3)
  • size (20-20)
  • block_id (22-22)
  • block_id (24-24)
🪛 Clang (14.0.6)
mllm/backends/ascend/Register.cpp

[error] 19-19: variable 'pool_size' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

tests/ascend/KernelTestHelper.hpp

[error] 7-7: class 'KernelTest' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/ascend/AscendDispatcher.hpp

[error] 13-13: constructor does not initialize these fields: MLLM_EMPTY_SCOPE

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

tests/ascend/AscendKernelTest.hpp

[error] 12-12: class 'AscendKernelTest' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 18-18: method 'AddFloat16Test' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)

mllm/backends/ascend/memory/AscendMemoryPool.hpp

[error] 15-15: class 'AscendMemoryPool' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/ascend/AscendCommon.cpp

[error] 54-54: 3 adjacent parameters of 'prepareAscendTensor' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 57-57: variable 'expected_elements' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 68-68: variable 'device_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 69-69: variable 'bytes' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 92-92: variable 'elem_cnt' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 95-95: variable 'src' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 102-102: 3 adjacent parameters of 'verifyAscendTensor' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 102-102: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 121-121: variable name 'ok' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 143-143: 3 adjacent parameters of 'verifyAscendTensor' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 143-143: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 173-173: variable 'ctx' provides global access to a non-const object; consider making the pointed-to data 'const'

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 209-209: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

tests/ascend/KernelTest.cpp

[error] 16-16: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)


[error] 21-21: 8 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 21-21: 8 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 22-22: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 22-22: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 23-23: 32 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 23-23: 32 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

mllm/backends/ascend/AscendCommon.hpp

[error] 19-19: function-like macro 'MLLM_ACL_CHECK' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 20-20: macro argument should be enclosed in parentheses

(bugprone-macro-parentheses,-warnings-as-errors)


[error] 25-25: function-like macro 'MLLM_ATB_CHECK' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 26-26: macro argument should be enclosed in parentheses

(bugprone-macro-parentheses,-warnings-as-errors)


[error] 42-42: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 45-45: constructor does not initialize these fields: name, id, total_memory, free_memory, soc_version

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 54-54: class 'AscendDeviceMetaInfo' defines a copy constructor and a copy assignment operator but does not define a destructor, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 70-70: constructor does not initialize these fields: tensor_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 81-81: function 'valid' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 84-84: function 'tensor' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 85-85: function 'blockId' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 101-101: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 103-103: 1e-2f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 103-103: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 104-104: 1e-2f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 104-104: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 109-109: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 111-111: 1e-2f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 111-111: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 112-112: 1e-2f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 112-112: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 117-117: class 'AscendTimer' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 130-130: function-like macro 'ASCEND_TIME_SCOPE' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 131-131: function-like macro 'ASCEND_TIME_SCOPE_NOSYNC' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)

mllm/engine/Dispatcher.hpp

[error] 28-28: invalid case style for global constant 'ascend_dispatcher_id'

(readability-identifier-naming,-warnings-as-errors)

mllm/backends/ascend/AscendAllocator.hpp

[error] 15-15: class 'AscendAllocator' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/ascend/memory/AscendMemoryManager.cpp

[error] 20-20: method 'createMemoryPool' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 22-22: variable 'device_count' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 37-37: variable 'device_id' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 45-45: variable 'device_id' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

mllm/backends/ascend/memory/AscendMemoryPool.cpp

[error] 14-14: constructor does not initialize these fields: block_mutex_, used_blocks_, free_blocks_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 42-42: method 'allocateBlock' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 45-45: variable 'align_size' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 59-59: variable 'cur_mem_ptr_align' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 73-73: method 'freeBlock' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 81-81: variable name 'it' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 90-90: method 'getBlockPtr' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 98-98: variable name 'it' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/ascend/memory/MemoryBlock.hpp

[error] 11-11: constructor does not initialize these fields: block_id_, block_size_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

🔇 Additional comments (17)
mllm/utils/Common.hpp (1)

34-34: LGTM!

The new kAscendError enum value follows the existing pattern for backend-specific error codes and is correctly placed alongside other backend errors.

mllm/core/DeviceTypes.hpp (1)

36-37: LGTM!

The string conversion functions are correctly extended to support kAscend and kAscendHost device types, following the established pattern for other backends.

Also applies to: 52-55

mllm/engine/Dispatcher.hpp (1)

28-28: LGTM!

The new ascend_dispatcher_id constant follows the established naming convention and initialization pattern used by other dispatcher IDs in this class.

tests/ascend/KernelTest.cpp (2)

28-44: LGTM!

The test harness correctly initializes the mllm context before the Ascend backend and properly cleans up resources after test execution. The initialization order addresses the concern raised in the previous review.


16-26: LGTM!

The test covers a good range of tensor sizes for the FP16 Add operation. The inline test data is appropriate for unit tests.

tests/ascend/KernelTestHelper.hpp (1)

7-17: LGTM!

The test fixture scaffolding is appropriate. The static analysis warning about special member functions is a false positive for GoogleTest fixtures—test instances are managed by the framework and are not copied or moved.

mllm/backends/ascend/memory/AscendMemoryPool.cpp (1)

73-88: LGTM: freeBlock implementation is correct.

The double-free detection and logging are appropriate. The early return for invalid block_id < 0 prevents unnecessary map lookups.
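The double-free guard described here can be sketched as follows — a hypothetical `BlockPool`, not the real `AscendMemoryPool`: freeing an id that is not currently in the used map is rejected rather than corrupting the free list, and negative ids are rejected before any lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Illustrative pool: tracks live blocks by id so a second free of the
// same id (or an invalid id) can be detected and ignored.
class BlockPool {
 public:
  int allocate(std::size_t size) {
    int id = next_id_++;
    used_[id] = size;
    return id;
  }

  bool freeBlock(int block_id) {
    if (block_id < 0) return false;       // early return avoids the map lookup
    auto iter = used_.find(block_id);
    if (iter == used_.end()) return false;  // double free or unknown id
    used_.erase(iter);
    return true;
  }

 private:
  int next_id_ = 0;
  std::unordered_map<int, std::size_t> used_;
};
```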

mllm/backends/ascend/ops/AscendElewiseOps.cpp (2)

26-40: LGTM: Input validation and constraint checks are appropriate.

The assertions for input/output sizes and the dtype/numel validation provide clear error messages for unsupported cases.


98-106: LGTM: Resource cleanup path is correct for the success case.

Stream synchronization, workspace deallocation, and ATB operation destruction are handled properly in the normal execution path.

mllm/backends/ascend/ops/AscendX2XOp.cpp (1)

17-91: Well-structured device transfer implementation.

The forward method correctly handles all three device transfer cases (CPU→Ascend, Ascend→CPU, Ascend→Ascend) with appropriate ACL memcpy directions and synchronization. The early-return pattern keeps the logic clear.
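The three transfer cases can be summarized with a small dispatch sketch — the enum and kind names here are illustrative stand-ins for the ACL memcpy kinds (e.g. `ACL_MEMCPY_HOST_TO_DEVICE`), not the actual op code:

```cpp
#include <cassert>
#include <string>

// Illustrative device tags; the real op inspects tensor device types.
enum class Device { CPU, Ascend };

// Maps a (src, dst) pair to the memcpy direction the transfer would use.
std::string memcpyKind(Device src, Device dst) {
  if (src == Device::CPU && dst == Device::Ascend) return "HOST_TO_DEVICE";
  if (src == Device::Ascend && dst == Device::CPU) return "DEVICE_TO_HOST";
  if (src == Device::Ascend && dst == Device::Ascend) return "DEVICE_TO_DEVICE";
  return "HOST_TO_HOST";
}
```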

mllm/backends/ascend/AscendAllocator.cpp (1)

23-47: Good memory pool integration with proper error handling.

The allocation flow correctly handles failures at each step: block allocation failure, pointer retrieval failure (with cleanup via freeBlock), and successful mapping. Thread-safe access to the block map is ensured.

mllm/backends/ascend/AscendDispatcher.hpp (1)

17-37: Dispatcher class structure looks good.

The AscendDispatcher class properly inherits from Dispatcher, uses the shared pointer pattern consistently, and the factory function follows the established codebase patterns.

mllm/backends/ascend/memory/AscendMemoryManager.cpp (1)

52-65: Delegation methods are clean and correct.

The allocateBlock, freeBlock, and getBlockPtr methods correctly delegate to the per-device memory pool obtained via getMemoryPool().

mllm/backends/ascend/AscendCommon.cpp (2)

172-194: Thread-safe ATB context initialization is correctly implemented.

The use of std::once_flag with std::call_once ensures the ATB context and stream are initialized exactly once in a thread-safe manner. This addresses the previous race condition concern.
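A minimal sketch of the once-only initialization pattern, with `FakeContext` and `getGlobalContext` as illustrative stand-ins for the real ATB context API: `std::call_once` guarantees the lambda body runs exactly once even when the accessor is entered concurrently.

```cpp
#include <cassert>
#include <mutex>

// Stand-in for the global ATB context; the real one holds the ACL stream.
struct FakeContext {
  bool ready = false;
};

static std::once_flag g_ctx_once;
static FakeContext g_ctx;

FakeContext& getGlobalContext() {
  std::call_once(g_ctx_once, [] {
    // the real code would create the ATB context and ACL stream here
    g_ctx.ready = true;
  });
  return g_ctx;
}
```

Repeated callers all observe the same initialized instance, which is what makes the lazy-init accessor safe to call from any thread.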


23-52: Well-implemented RAII handle with proper move semantics.

AscendTensorHandle correctly implements move constructor/assignment with proper resource transfer (setting other.block_id_ = -1), and the destructor properly releases resources via the release() helper.
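The move-semantics pattern described here can be sketched with a hypothetical `Handle` in place of `AscendTensorHandle`: ownership is tracked by `block_id_`, and the moved-from handle is neutralized with `-1` so its destructor does not double-release.

```cpp
#include <cassert>
#include <utility>

// Illustrative RAII handle over a pool block id.
class Handle {
 public:
  explicit Handle(int block_id) : block_id_(block_id) {}
  ~Handle() { release(); }

  Handle(const Handle&) = delete;
  Handle& operator=(const Handle&) = delete;

  Handle(Handle&& other) noexcept : block_id_(other.block_id_) {
    other.block_id_ = -1;  // moved-from handle no longer owns the block
  }

  Handle& operator=(Handle&& other) noexcept {
    if (this != &other) {
      release();
      block_id_ = other.block_id_;
      other.block_id_ = -1;
    }
    return *this;
  }

  int blockId() const { return block_id_; }
  bool valid() const { return block_id_ >= 0; }

 private:
  void release() {
    if (block_id_ >= 0) {
      // the real handle would return the block to the memory pool here
      block_id_ = -1;
    }
  }

  int block_id_ = -1;
};
```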

mllm/backends/ascend/CMakeLists.txt (2)

27-39: Environment-based path handling is appropriate.

The warnings when ASCEND_HOME_PATH or ATB_HOME_PATH are not set provide good feedback to developers. This follows the established pattern for optional SDK paths.


42-48: PUBLIC linkage is correctly applied.

All dependencies (ascendcl, opapi, nnopbase, atb, MllmRT) are linked with PUBLIC visibility, allowing consumers of MllmAscendBackend to transitively access these libraries. This addresses previous review feedback.


constexpr size_t POOL_SIZE = 104857600; // 100 MiB,

AscendMemoryPool::AscendMemoryPool(size_t pool_size = POOL_SIZE) {
Contributor

⚠️ Potential issue | 🟡 Minor

Default parameter should be in the declaration, not the definition.

Default arguments belong in the declaration (header), not the out-of-line definition: callers only see the header, so a default that appears only in the .cpp file is invisible to them, and specifying the same default in both places is ill-formed.

🔎 Proposed fix

In AscendMemoryPool.hpp, add the default parameter:

explicit AscendMemoryPool(size_t pool_size = 104857600);  // 100 MiB

In AscendMemoryPool.cpp, remove the default:

-AscendMemoryPool::AscendMemoryPool(size_t pool_size = POOL_SIZE) {
+AscendMemoryPool::AscendMemoryPool(size_t pool_size) {
🧰 Tools
🪛 Clang (14.0.6)

[error] 14-14: constructor does not initialize these fields: block_mutex_, used_blocks_, free_blocks_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/backends/ascend/memory/AscendMemoryPool.cpp around line 14, the
constructor definition currently includes a default parameter which must be
placed in the declaration instead; remove the "= POOL_SIZE" from the definition,
add the default to the declaration in AscendMemoryPool.hpp (e.g. explicit
AscendMemoryPool(size_t pool_size = POOL_SIZE); or the numeric value), and
ensure both header and cpp signatures match (no default in .cpp) before
rebuilding.
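The declaration/definition split can be illustrated with a minimal stand-in class (hypothetical `Pool`, not the real `AscendMemoryPool`): the default argument lives on the declared signature, and the out-of-line definition must not repeat it.

```cpp
#include <cassert>
#include <cstddef>

// Named constant instead of a magic number, as the repo guidelines require.
constexpr std::size_t kDefaultPoolSize = 100 * 1024 * 1024;  // 100 MiB

class Pool {
 public:
  // Default argument goes on the declaration only.
  explicit Pool(std::size_t pool_size = kDefaultPoolSize);
  std::size_t size() const { return size_; }

 private:
  std::size_t size_;
};

// Out-of-line definition: repeating "= kDefaultPoolSize" here would be
// ill-formed, so the parameter appears without a default.
Pool::Pool(std::size_t pool_size) : size_(pool_size) {}
```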

Comment on lines +18 to +21
// 1. Create memory pool
size_t pool_size = 100 * 1024 * 1024; // 100MB, can be adjusted as needed
ascend::getAscendMemoryManager().createMemoryPool(pool_size);
MLLM_INFO("Ascend memory pool initialized");
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find and examine the createMemoryPool implementation
find . -type f -name "*.cpp" -o -name "*.h" | xargs grep -l "createMemoryPool" | head -20

Repository: UbiquitousLearning/mllm

Length of output: 156


🏁 Script executed:

# Check the memory manager header file
rg -n "createMemoryPool" --type=cpp --type=hpp -A 3 -B 1

Repository: UbiquitousLearning/mllm

Length of output: 94


🏁 Script executed:

# Examine the full Register.cpp file context around lines 18-21
rg -n "initAscendBackend|createMemoryPool" mllm/backends/ascend/Register.cpp -A 10 -B 5

Repository: UbiquitousLearning/mllm

Length of output: 814


🏁 Script executed:

# Find and examine the AscendMemoryManager header and implementation
find . -path "*/ascend/memory/AscendMemoryManager*" -type f

Repository: UbiquitousLearning/mllm

Length of output: 174


🏁 Script executed:

# Look at the createMemoryPool method signature and implementation
rg -n "createMemoryPool" mllm/backends/ascend/memory/ -A 10 -B 2

Repository: UbiquitousLearning/mllm

Length of output: 2196


🏁 Script executed:

# Check if there's any return value or exception handling expected
rg -n "class.*MemoryManager|void createMemoryPool|bool createMemoryPool|int createMemoryPool" mllm/backends/ascend/memory/ -A 5

Repository: UbiquitousLearning/mllm

Length of output: 813


🏁 Script executed:

# Check what MLLM_ACL_CHECK does - it seems to be an error handling macro
rg -n "MLLM_ACL_CHECK" mllm/ -B 2 -A 2 | head -50

Repository: UbiquitousLearning/mllm

Length of output: 3108


🏁 Script executed:

# Find the definition of MLLM_ACL_CHECK
rg -n "#define MLLM_ACL_CHECK" mllm/ -A 5

Repository: UbiquitousLearning/mllm

Length of output: 690


🏁 Script executed:

# Also check the rest of the createMemoryPool implementation
cat -n mllm/backends/ascend/memory/AscendMemoryManager.cpp | sed -n '20,40p'

Repository: UbiquitousLearning/mllm

Length of output: 843


🏁 Script executed:

# Check if aclrtSetDevice has error handling elsewhere or if it's documented
rg -n "aclrtSetDevice" mllm/backends/ascend/ -B 2 -A 2 | head -40

Repository: UbiquitousLearning/mllm

Length of output: 1597


🏁 Script executed:

# Check AscendMemoryPool constructor to see if it can fail
rg -n "class AscendMemoryPool|AscendMemoryPool::" mllm/backends/ascend/memory/AscendMemoryPool -A 10 | head -50

Repository: UbiquitousLearning/mllm

Length of output: 151


🏁 Script executed:

# Find AscendMemoryPool files
find . -name "*AscendMemoryPool*" -type f

Repository: UbiquitousLearning/mllm

Length of output: 168


🏁 Script executed:

# Check the AscendMemoryPool header
rg -n "class AscendMemoryPool" mllm/ -A 15

Repository: UbiquitousLearning/mllm

Length of output: 1253


🏁 Script executed:

# Check the AscendMemoryPool constructor implementation
cat -n mllm/backends/ascend/memory/AscendMemoryPool.cpp | head -60

Repository: UbiquitousLearning/mllm

Length of output: 2483


🏁 Script executed:

# Also check if there are any try-catch or error handling in the constructor
rg -n "AscendMemoryPool::AscendMemoryPool" mllm/backends/ascend/memory/ -A 20

Repository: UbiquitousLearning/mllm

Length of output: 1906


🏁 Script executed:

# Check if there's any pattern of validating memory pool after creation in Register.cpp
rg -n "getMemoryPool\|memory_pool" mllm/backends/ascend/Register.cpp -A 3 -B 3

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Check how other backends handle initialization errors in their register functions
rg -n "void init.*Backend\(\)" mllm/backends/ -A 15 | head -80

Repository: UbiquitousLearning/mllm

Length of output: 4676


Add error handling and validation for memory pool initialization.

The createMemoryPool() call lacks error validation in Register.cpp. Additionally, the implementation has multiple error-handling gaps:

  • aclrtSetDevice(i) in AscendMemoryManager::createMemoryPool() is not checked for errors
  • AscendMemoryPool constructor fails silently (logs error but returns with null pointers) instead of throwing, leaving the pool in a broken state
  • Register.cpp should validate successful pool creation before proceeding

Consider either adding error checks with MLLM_ACL_CHECK for aclrtSetDevice, or having AscendMemoryPool throw an exception on allocation failure so the initialization chain fails explicitly.

🧰 Tools
🪛 Clang (14.0.6)

[error] 19-19: variable 'pool_size' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/backends/ascend/Register.cpp around lines 18-21, the call to
ascend::getAscendMemoryManager().createMemoryPool(pool_size) is unvalidated;
update this block to check the return or thrown exception from createMemoryPool
and abort/init-fail with a clear MLLM_ERROR/MLLM_ACL_CHECK if pool creation
fails. In the backend, ensure AscendMemoryManager::createMemoryPool checks and
handles aclrtSetDevice return codes (use MLLM_ACL_CHECK or equivalent) and
change AscendMemoryPool construction to signal failure explicitly (either throw
an exception on allocation/initialization errors or return a nullable/Status
result) so createMemoryPool can detect failure and propagate it back to
Register.cpp for proper logging and process exit.
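The fail-fast pattern suggested above can be sketched as follows. `aclrtSetDevice` is stubbed here so the snippet is self-contained, and `CHECK_ACL` is a hypothetical stand-in for an `MLLM_ACL_CHECK`-style macro; the real ACL API lives in `acl/acl.h` and returns an `aclError` code.

```cpp
#include <cassert>
#include <cstdio>

using aclError = int;
constexpr aclError ACL_SUCCESS = 0;

// Stub: the real aclrtSetDevice binds the calling thread to a device
// and returns an ACL error code.
static aclError aclrtSetDevice(int device_id) {
  return device_id >= 0 ? ACL_SUCCESS : 1;
}

// Illustrative check macro: log the failing call and propagate failure
// to the caller instead of continuing with a broken pool.
#define CHECK_ACL(expr)                                       \
  do {                                                        \
    aclError ret_ = (expr);                                   \
    if (ret_ != ACL_SUCCESS) {                                \
      std::fprintf(stderr, "ACL call failed: %s (code %d)\n", \
                   #expr, ret_);                              \
      return false;                                           \
    }                                                         \
  } while (0)

// createMemoryPool-style init that reports failure explicitly so the
// registration code can abort instead of proceeding with null pointers.
bool createMemoryPoolChecked(int device_id) {
  CHECK_ACL(aclrtSetDevice(device_id));
  // ... allocate the device pool; throw or return false on failure ...
  return true;
}
```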

Collaborator

@chenghuaWang chenghuaWang left a comment

LGTM

@chenghuaWang chenghuaWang merged commit 4b414f6 into UbiquitousLearning:main Dec 23, 2025
5 checks passed
2 participants