
feat(ascend): initial Ascend backend and add elementwise add op #564

Merged
chenghuaWang merged 16 commits into UbiquitousLearning:main from lywbarca:lyw/ascend-backend
Dec 23, 2025

Conversation

@lywbarca

@lywbarca lywbarca commented Dec 12, 2025

Please check Guidelines for Contributing.

Summary by CodeRabbit

  • New Features

    • Added full Ascend backend: memory pooling, device/context/stream management, dispatcher, allocator, and operator support (elementwise add and host-device transfers).
  • Documentation

    • Added comprehensive Ascend backend design docs.
  • Build System

    • Enabled Ascend backend build and cross-compile configs; exposed Ascend-related CMake options.
  • Tests

    • Added Ascend kernel tests and test harness for FP16 validation.
  • Chores

    • Registered Ascend device types/dispatcher and minor CPU math adjustments.


@coderabbitai
Contributor

coderabbitai bot commented Dec 12, 2025

Walkthrough

Adds a complete Ascend backend: device enumeration, global ATB/ACL context and stream, per-device memory pools and manager, pool-backed allocator, dispatcher, ATB-based ops (Add, X2X), tests, CMake build integration, and documentation.

Changes

Cohort / File(s) Summary
Allocator & Registration
mllm/backends/ascend/AscendAllocator.hpp, mllm/backends/ascend/AscendAllocator.cpp, mllm/backends/ascend/Register.cpp
New pool-backed AscendAllocator with storage->block-id tracking, constructor/destructor logging, alloc/free mapping; initAscendBackend() creates 100MB pools, registers backend, allocator, and dispatcher.
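The storage-to-block-id bookkeeping described above can be sketched in plain C++. This is a simplified illustration, not the actual mllm API: `PoolBackend`, `SketchAllocator`, and the heap stand-in for the device pointer are all hypothetical; the real allocator maps a `Storage`'s device pointer to the pool block id so that `free` can return the right block.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for AscendMemoryManager/AscendMemoryPool: hands out
// monotonically increasing block ids.
struct PoolBackend {
  int next_id = 0;
  int allocateBlock(std::size_t /*size*/) { return next_id++; }
  void freeBlock(int /*id*/) {}
};

class SketchAllocator {
 public:
  explicit SketchAllocator(PoolBackend& pool) : pool_(pool) {}

  // Allocate a pool block and remember which block backs this pointer.
  void* alloc(std::size_t size) {
    int id = pool_.allocateBlock(size);
    void* ptr = ::operator new(size);  // stand-in for the pool's device pointer
    storage_to_block_id_[ptr] = id;
    return ptr;
  }

  // Free must route through the map so the pool releases the right block.
  // Returns false for an unknown pointer (i.e. a double free).
  bool free(void* ptr) {
    auto found = storage_to_block_id_.find(ptr);
    if (found == storage_to_block_id_.end()) return false;
    pool_.freeBlock(found->second);
    ::operator delete(ptr);
    storage_to_block_id_.erase(found);
    return true;
  }

 private:
  PoolBackend& pool_;
  std::unordered_map<void*, int> storage_to_block_id_;  // device ptr -> block id
};
```

The double-free check here is also why the review below asks for explicit error handling around the map lookups.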
Memory Manager & Pool
mllm/backends/ascend/memory/AscendMemoryManager.hpp, mllm/backends/ascend/memory/AscendMemoryManager.cpp, mllm/backends/ascend/memory/AscendMemoryPool.hpp, mllm/backends/ascend/memory/AscendMemoryPool.cpp, mllm/backends/ascend/memory/MemoryBlock.hpp
New AscendMemoryManager singleton and per-device AscendMemoryPool: block allocation/free with 64-byte alignment, free/used registries, thread-safety, block-id APIs and pool lifecycle.
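The 64-byte-aligned block allocation policy can be illustrated with a minimal sketch. This is a deliberately simplified bump-style variant (the real pool keeps free/used registries so freed blocks can be reused); `SketchPool` and its offsets standing in for device addresses are illustrative, not the actual AscendMemoryPool interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>

// Round a size up to the pool's 64-byte alignment.
constexpr std::size_t kAlign = 64;
constexpr std::size_t alignUp(std::size_t n) {
  return (n + kAlign - 1) / kAlign * kAlign;
}

class SketchPool {
 public:
  explicit SketchPool(std::size_t capacity) : capacity_(capacity) {}

  // Returns a block id, or -1 if the pool is exhausted.
  int allocateBlock(std::size_t size) {
    std::lock_guard<std::mutex> lock(mutex_);
    std::size_t aligned = alignUp(size);
    if (offset_ + aligned > capacity_) return -1;
    int id = next_id_++;
    used_[id] = offset_;  // block id -> aligned offset into the pool
    offset_ += aligned;
    return id;
  }

  // Returns false for an unknown id, signalling a double free.
  bool freeBlock(int id) {
    std::lock_guard<std::mutex> lock(mutex_);
    return used_.erase(id) == 1;
  }

  std::size_t offsetOf(int id) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto found = used_.find(id);
    return found == used_.end() ? SIZE_MAX : found->second;
  }

 private:
  std::size_t capacity_;
  std::size_t offset_ = 0;
  int next_id_ = 0;
  std::map<int, std::size_t> used_;
  std::mutex mutex_;
};
```

Every block start lands on a 64-byte boundary because sizes are rounded up before the offset advances, which is the same invariant the real pool maintains over its `aclrtMalloc`'d base region.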
Common Utilities & Device Meta
mllm/backends/ascend/AscendCommon.hpp, mllm/backends/ascend/AscendCommon.cpp
Global ATB/ACL init and stream accessors, AscendDeviceMetaInfo device enumeration, AscendTensorHandle RAII, prepare/copy/verify tensor helpers, AscendTimer RAII, ACL/ATB check macros.
Backend, Dispatcher & Registration
mllm/backends/ascend/AscendBackend.cpp, mllm/backends/ascend/AscendDispatcher.hpp, mllm/backends/ascend/AscendDispatcher.cpp
Backend registers op factories and logs device memory; AscendDispatcher added to route/async tasks (kExecuteOp/kExecuteModule), optional Perfetto tracing, process/asyncReceive implementation and factory.
Ascend Ops
mllm/backends/ascend/ops/AscendElewiseOps.hpp, mllm/backends/ascend/ops/AscendElewiseOps.cpp, mllm/backends/ascend/ops/AscendX2XOp.hpp, mllm/backends/ascend/ops/AscendX2XOp.cpp
Adds AscendAddOp (ATB ELEWISE_ADD setup/forward, workspace allocation via memory manager) and AscendX2XOp (CPU↔Ascend and Ascend↔Ascend memcpy paths) plus factories.
Build & CMake
mllm/backends/ascend/CMakeLists.txt, CMakeLists.txt, tests/CMakeLists.txt, tests/ascend/CMakeLists.txt
New MLLM_ASCEND_SOURCES, unified public linkage to ascendcl/opapi/nnopbase/atb/MllmRT, MLLM_USE_ASCEND_MEMPOOL define, ASCEND/ATB env include/link exposure, MLLM_RUN_MODE/MLLM_SOC_VERSION cache vars, conditional tests inclusion.
Tests & Test Helpers
tests/ascend/KernelTest.cpp, tests/ascend/AscendKernelTest.hpp, tests/ascend/KernelTestHelper.hpp
GoogleTest suite for FP16 Add: a harness that initializes the context/backend, creates AscendKernelTest with AddFloat16Test generating FP16 inputs, runs the Ascend op, and validates against a CPU reference.
Docs & Build Tasks
docs/ascend_backend/core_design.rst, docs/ascend_backend/index.rst, docs/index.rst, tasks/build_arm_ascend.yaml
New Ascend backend design doc and index entries; ARM Ascend cross-build YAML added.
Core enums/IDs
mllm/core/DeviceTypes.hpp, mllm/engine/Dispatcher.hpp, mllm/utils/Common.hpp
Added DeviceTypes strings for kAscend/kAscendHost, Dispatcher::ascend_dispatcher_id constant, and ExitCode::kAscendError.
Minor CPU kernels (precision) changes
mllm/backends/cpu/kernels/arm/rmsnorm.cpp, mllm/backends/cpu/kernels/arm/softmax.cpp
Replaced std::sqrtf→std::sqrt and std::expf→std::exp for numeric calls (precision/consistency).
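The motivation for this swap: the f-suffixed names (`std::sqrtf`, `std::expf`) are not reliably declared in namespace `std` on every toolchain, while `std::sqrt` and `std::exp` are guaranteed to have `float` overloads, so the computation stays in `float` without a round-trip through `double`. A minimal illustration (the `rmsnorm`-style helper below is a hypothetical example, not the actual kernel code):

```cpp
#include <cmath>
#include <type_traits>

// float in, float out: std::sqrt resolves to its float overload, so no
// double promotion occurs and std::sqrtf is never needed.
float rsqrt_eps(float sum_sq, float n, float eps) {
  return 1.0f / std::sqrt(sum_sq / n + eps);
}

static_assert(std::is_same<decltype(std::sqrt(1.0f)), float>::value,
              "std::sqrt has a guaranteed float overload");
```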

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Dispatcher as AscendDispatcher
    participant Backend as AscendBackend
    participant MemMgr as AscendMemoryManager
    participant MemPool as AscendMemoryPool
    participant Alloc as AscendAllocator
    participant ATB as ATB/ACL

    Client->>Dispatcher: submit Task (kExecuteOp / kExecuteModule)
    activate Dispatcher
    Dispatcher->>Dispatcher: route & optional trace
    Dispatcher->>Backend: forward(inputs, outputs)
    activate Backend
    Backend->>MemMgr: allocateBlock(workspace_size)
    activate MemMgr
    MemMgr->>MemPool: allocateBlock(size)
    activate MemPool
    MemPool-->>MemMgr: block_id + ptr
    deactivate MemPool
    MemMgr-->>Backend: workspace ptr + id
    deactivate MemMgr

    Backend->>ATB: create op, fill tensor descs, setup
    ATB-->>Backend: op ready
    Backend->>ATB: execute op
    ATB-->>Backend: done
    Backend->>MemMgr: freeBlock(block_id)
    Backend-->>Dispatcher: complete
    deactivate Backend
    Dispatcher-->>Client: result
    deactivate Dispatcher
sequenceDiagram
    participant Init
    participant Register as initAscendBackend()
    participant MemMgr as AscendMemoryManager
    participant MemPool as AscendMemoryPool
    participant Alloc as AscendAllocator
    participant Dispatcher as AscendDispatcher
    participant Context as GlobalContext

    Init->>Register: call initAscendBackend()
    activate Register
    Register->>MemMgr: createMemoryPool(100MB) per device
    activate MemMgr
    loop per device
        MemMgr->>MemPool: construct pool (aclrtMalloc base region)
        MemPool-->>MemMgr: pool ready
    end
    deactivate MemMgr
    Register->>Context: register AscendBackend
    Register->>MemMgr: register allocator to memory manager
    Register->>Dispatcher: create & register AscendDispatcher
    Register-->>Init: done
    deactivate Register

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • xumengwei
  • liang1232018
  • yirongjie
  • oreomaker

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description only contains the contribution template boilerplate without any actual details about the changes, objectives, or implementation notes. Add a detailed description explaining the Ascend backend implementation, memory management design, dispatcher architecture, supported operations, and testing approach. Include motivation and key design decisions.
Docstring Coverage ⚠️ Warning Docstring coverage is 14.61% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: adding an initial Ascend backend implementation with an elementwise add operator support.


@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 13

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
mllm/backends/ascend/AscendAllocator.hpp (1)

15-44: Delete copy/move for allocator (mutex member + resource tracking) to satisfy Rule of Five

 class AscendAllocator final : public Allocator {
  public:
   AscendAllocator();
-  ~AscendAllocator();
+  ~AscendAllocator() override;
+
+  AscendAllocator(const AscendAllocator&) = delete;
+  AscendAllocator& operator=(const AscendAllocator&) = delete;
+  AscendAllocator(AscendAllocator&&) = delete;
+  AscendAllocator& operator=(AscendAllocator&&) = delete;
🟠 Major comments (18)
examples/CMakeLists.txt-18-18 (1)

18-18: Add conditional compilation guard for Ascend demo.

The Ascend demo is added unconditionally, which will cause build failures on systems without the Ascend SDK. Following the pattern of other backends (e.g., QNN with MLLM_BUILD_QNN_BACKEND), this should be conditional.

Apply this diff to add a conditional guard:

+if(MLLM_BUILD_ASCEND_BACKEND)
   add_subdirectory(ascend_add_demo)
+endif()
mllm/backends/ascend/ops/AscnedElewiseOps.cpp-1-4 (1)

1-4: Fix filename typo (Ascned → Ascend) to avoid build/grep confusion.
Even if CMake references it correctly today, this will cause ongoing maintenance issues (and is easy to miss in future edits).

mllm/backends/ascend/memory/AscendMemoryPool.cpp-73-104 (1)

73-104: Harden free/get semantics (and fix clang-tidy iterator naming).
getBlockPtr() should set addr=nullptr on failure; otherwise callers can accidentally reuse stale pointers. Also clang-tidy is flagging it naming length.

 void AscendMemoryPool::freeBlock(int block_id) {
     std::unique_lock<std::mutex> lock(block_mutex_);
@@
-    auto it = used_blocks_.find(block_id);
-    if (it != used_blocks_.end()) {
-        free_blocks_.insert(*it);
-        used_blocks_.erase(it);
+    auto used_it = used_blocks_.find(block_id);
+    if (used_it != used_blocks_.end()) {
+        free_blocks_.insert(*used_it);
+        used_blocks_.erase(used_it);
     } else {
         MLLM_ERROR("Double free block id {}", block_id);
     }
 }
 
 void AscendMemoryPool::getBlockPtr(int block_id, void *&addr) {
     std::unique_lock<std::mutex> lock(block_mutex_);
+    addr = nullptr;
@@
-    auto it = used_blocks_.find(block_id);
-    if (it != used_blocks_.end()) {
-        addr = it->second.address_;
+    auto used_it = used_blocks_.find(block_id);
+    if (used_it != used_blocks_.end()) {
+        addr = used_it->second.address_;
     } else {
         MLLM_ERROR("Get block address error, block id {}", block_id);
     }
 }
mllm/backends/ascend/AscendAllocator.cpp-23-47 (1)

23-47: Prevent size truncation when allocating blocks (size_t → uint32_t).
AscendMemoryManager::allocateBlock takes uint32_t; storage->size_ is typically size_t. Please validate/cast explicitly and fail cleanly if size_ > UINT32_MAX.

 bool AscendAllocator::alloc(Storage* storage) {
     auto& mem_manager = getAscendMemoryManager();
     int block_id = -1;
-    mem_manager.allocateBlock(storage->size_, block_id);
+    if (storage->size_ > std::numeric_limits<uint32_t>::max()) {
+        MLLM_ERROR("Storage size too large for Ascend memory pool: {} bytes", storage->size_);
+        return false;
+    }
+    mem_manager.allocateBlock(static_cast<uint32_t>(storage->size_), block_id);
mllm/backends/ascend/ops/AscendElewiseOps.hpp-6-25 (1)

6-25: Make header self-contained (+ add brief API docs).
This file uses std::shared_ptr/std::make_shared but doesn’t include <memory> (and similarly relies on transitive <vector>). Also, public op/factory declarations should have a short comment per guidelines.

 #pragma once
 
+#include <memory>
+#include <vector>
+
 #include "mllm/core/BaseOp.hpp"
 #include "mllm/core/aops/ElewiseOps.hpp"
 #include "mllm/core/OpTypes.hpp"
 
 namespace mllm::ascend {
 
+// Ascend implementation of elementwise add (currently no-broadcast, same-dtype).
 class AscendAddOp final : public aops::AddOp {
  public:
   explicit AscendAddOp(const aops::AddOpOptions& options);
@@
 class AscendAddOpFactory final : public TypedOpFactory<OpTypes::kAdd, aops::AddOpOptions> {
  public:
   std::shared_ptr<BaseOp> createOpImpl(const aops::AddOpOptions& options) override {
     return std::make_shared<AscendAddOp>(options);
   }
 };

Committable suggestion skipped: line range outside the PR's diff.

mllm/backends/ascend/memory/AscendMemoryManager.hpp-6-35 (1)

6-35: Tighten API surface (includes + docs + avoid returning mutable ref).
Add <cstddef> for size_t, add short comments for public APIs, and prefer returning std::shared_ptr<AscendMemoryPool> (by value) or const std::shared_ptr<...>& to avoid external mutation of memory_pools_.

 #pragma once
 
+#include <cstddef>
 #include <memory>
 #include <vector>
 #include <cstdint>
 #include "AscendMemoryPool.hpp"
@@
 class AscendMemoryManager {
 public:  
     AscendMemoryManager();
 
+    // Create a pool for the current device (or specified device if supported by impl).
     void createMemoryPool(size_t pool_size);
@@
-    std::shared_ptr<AscendMemoryPool> &getMemoryPool();
+    std::shared_ptr<AscendMemoryPool> getMemoryPool();

Committable suggestion skipped: line range outside the PR's diff.

mllm/backends/ascend/AscendAllocator.hpp-40-43 (1)

40-43: Fix map key/comment mismatch (void* vs “Storage ptr”)

Either change the key to Storage* (if that’s what you mean), or update the comment to match what the pointer actually represents (e.g., storage->data() pointer / device ptr).

-  std::unordered_map<void*, int> storage_to_block_id_;  // Storage ptr -> block ID
+  std::unordered_map<void*, int> storage_to_block_id_;  // device ptr (e.g., storage->data()) -> block ID
mllm/backends/ascend/memory/AscendMemoryManager.cpp-21-34 (1)

21-34: createMemoryPool: check aclrtSetDevice, restore previous device, and prevent double-init leaks

Today it changes global device context and doesn’t restore it; also repeated calls will keep pushing pools.

 void AscendMemoryManager::createMemoryPool(size_t pool_size)
 {
-    uint32_t device_count = 0;
+    uint32_t device_count{};
     auto ret = aclrtGetDeviceCount(&device_count);
     MLLM_ACL_CHECK(ret);
+
+    int32_t prev_device_id{};
+    ret = aclrtGetDevice(&prev_device_id);
+    MLLM_ACL_CHECK(ret);
+
+    if (!memory_pools_.empty()) {
+        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "createMemoryPool called more than once");
+    }
+
     for (size_t i = 0; i < device_count; i++) {
-
-        aclrtSetDevice(i);
+        ret = aclrtSetDevice(static_cast<int32_t>(i));
+        MLLM_ACL_CHECK(ret);
 
         std::shared_ptr<AscendMemoryPool> memory_pool = std::make_shared<AscendMemoryPool>(pool_size);
         memory_pools_.push_back(memory_pool);
         MLLM_INFO("create mempool for device {} success", i);
     }
+
+    ret = aclrtSetDevice(prev_device_id);
+    MLLM_ACL_CHECK(ret);
 }
mllm/backends/ascend/AscendDispatcher.cpp-34-44 (1)

34-44: Remove dead MLLM_EMPTY_SCOPE statement in switch block, and consider guarding this capture with shared_from_this() for async safety

The MLLM_EMPTY_SCOPE; statement in the switch case is a no-op and can be removed. More importantly, the lambda [this, task] captures a raw pointer to the dispatcher. While the design assumes DispatcherManager keeps the dispatcher alive via shared_ptr, using shared_from_this() would provide defensive safety: if the dispatcher is ever removed from the manager while async work is pending, the raw this pointer would become invalid. Consider using [self = shared_from_this(), task] instead, which requires adding enable_shared_from_this<AscendDispatcher> to the class hierarchy.

mllm/backends/ascend/AscendCommon.cpp-124-185 (1)

124-185: Revisit aclInit/aclFinalize inside AscendDeviceMetaInfo ctor
Calling aclInit()/aclFinalize() inside a singleton constructor (Line 127, 174) can conflict with other runtime init (including getGlobalAtbContext() which does ACL calls but doesn’t init). Prefer a single, well-defined ACL lifetime owner (or make enumeration reuse the existing initialized runtime and avoid finalizing here).

mllm/backends/ascend/AscendCommon.hpp-44-50 (1)

44-50: Default-initialize AscendDeviceInfo fields to satisfy member-init checks

 struct AscendDeviceInfo {
-  std::string name;
-  unsigned int id;
-  size_t total_memory;  // bytes
-  size_t free_memory;   // bytes
-  std::string soc_version;
+  std::string name{};
+  unsigned int id{0};
+  size_t total_memory{0};  // bytes
+  size_t free_memory{0};   // bytes
+  std::string soc_version{};
 };
mllm/backends/ascend/AscendCommon.cpp-14-41 (1)

14-41: Guard against null Tensor internals and potential shared-storage corruption in release()
tensor_.impl()->storage()->ptr_ = nullptr; (Line 39) assumes impl/storage are non-null and that you “own” that storage uniquely. If Tensor storage is shared, nulling the pointer can break other tensors.

At minimum, add null checks; ideally, avoid writing Tensor internals directly (provide a Tensor API / storage owner API).

 void AscendTensorHandle::release() {
   if (block_id_ >= 0) {
     auto& mem_mgr = getAscendMemoryManager();
     mem_mgr.freeBlock(block_id_);
     block_id_ = -1;
-    tensor_.impl()->storage()->ptr_ = nullptr;
+    if (auto* impl = tensor_.impl(); impl && impl->storage()) {
+      impl->storage()->ptr_ = nullptr;
+    }
   }
 }
mllm/backends/ascend/AscendCommon.hpp-69-89 (1)

69-89: Initialize tensor_ and add [[nodiscard]] to query/accessors
Addresses clang-tidy findings and avoids accidentally ignoring ownership state.

 struct AscendTensorHandle {
   AscendTensorHandle() = default;
@@
-  bool valid() const { return block_id_ >= 0; }  // Check if handle owns a valid memory block
+  [[nodiscard]] bool valid() const { return block_id_ >= 0; }
 
-  Tensor& tensor() { return tensor_; }  // Access tensor
-  const Tensor& tensor() const { return tensor_; }  // Access tensor (const)
-  int blockId() const { return block_id_; }  // Get memory block ID
+  [[nodiscard]] Tensor& tensor() { return tensor_; }
+  [[nodiscard]] const Tensor& tensor() const { return tensor_; }
+  [[nodiscard]] int blockId() const { return block_id_; }
 
  private:
-  Tensor tensor_;
+  Tensor tensor_{};
   int block_id_{-1};
 };
mllm/backends/ascend/CMakeLists.txt-1-2 (1)

1-2: Add conditional guards for Ascend library linking based on RUN_MODE

RUN_MODE is defined (Line 1) but only used to conditionally apply the -g flag (Line 59); the Ascend libraries (ascendcl, opapi, nnopbase, atb—Lines 45–51) are linked unconditionally regardless of RUN_MODE. If cpu and sim modes should build without NPU-specific libraries, wrap the target_link_libraries() call (Lines 45–51) and related include/link directory setup (Lines 35–42) with if(RUN_MODE STREQUAL "npu") ... endif(). Additionally, the unconditional -O2 flag (Line 60) should align with the build type; consider applying it conditionally or only for Release builds.

mllm/backends/ascend/AscendCommon.hpp-53-66 (1)

53-66: Make special member intent explicit for the singleton (AscendDeviceMetaInfo)
Clang-tidy will keep complaining unless you explicitly delete moves / default dtor.

 class AscendDeviceMetaInfo {
  public:
   AscendDeviceMetaInfo();
+  ~AscendDeviceMetaInfo() = default;
@@
   AscendDeviceMetaInfo(const AscendDeviceMetaInfo&) = delete;
   AscendDeviceMetaInfo& operator=(const AscendDeviceMetaInfo&) = delete;
+  AscendDeviceMetaInfo(AscendDeviceMetaInfo&&) = delete;
+  AscendDeviceMetaInfo& operator=(AscendDeviceMetaInfo&&) = delete;
mllm/backends/ascend/AscendCommon.cpp-43-77 (1)

43-77: Avoid uint32_t sizing / add allocation failure handling / reduce “swappable params” risk

  • bytes is uint32_t (Line 58): can overflow for large tensors; prefer size_t.
  • allocateBlock(bytes, block_id) (Line 60): no visible error handling if allocation fails/throws.
  • Signature (..., int batch, int size) (Line 43–45) is easy to swap.
-  const uint32_t bytes = static_cast<uint32_t>(expected_elements * sizeof(half_float::half));
+  const size_t bytes = expected_elements * sizeof(half_float::half);
 
   mem_mgr.allocateBlock(bytes, block_id);
+  MLLM_RT_ASSERT(block_id >= 0);

Consider changing API to something like prepareAscendTensor(const std::vector<float>& host_data, std::array<int, 2> shape) or prepareAscendTensor(..., int batch, int features).

mllm/backends/ascend/AscendCommon.cpp-113-122 (1)

113-122: Add bounds validation and use tensor dtype mapping

The function unconditionally writes to desc.shape.dims[i] without checking the array capacity—add a bounds check to ensure shape.size() doesn't exceed ATB's dimension limit (assert or validate the rank).

The hardcoded ACL_FLOAT16 dtype ignores t.dtype() which is available on the Tensor class. Implement dtype mapping from DataTypes to ACL dtype constants (e.g., kFloat32 → ACL_FLOAT, kFloat16 → ACL_FLOAT16, kInt8 → ACL_INT8).

mllm/backends/ascend/AscendCommon.hpp-17-28 (1)

17-28: Replace function-like error-check macros with inline functions

Both macros can trigger clang-tidy warnings (macro-usage) and reduce type safety. Replace with inline functions:

-#define MLLM_ACL_CHECK(err)                                                                                    \
-  if (err != ACL_SUCCESS) {                                                                                    \
-    MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "ACL error code {}: {}", int(err), aclGetRecentErrMsg()); \
-  }
-
-#define MLLM_ATB_CHECK(err)                                                                                    \
-  if (err != atb::NO_ERROR) {                                                                                  \
-    MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "ATB error code {}", int(err));                           \
-  }
+namespace mllm::ascend {
+inline void aclCheck(aclError err) {
+  if (err != ACL_SUCCESS) {
+    MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "ACL error code {}: {}", int(err), aclGetRecentErrMsg());
+  }
+}
+inline void atbCheck(atb::ErrorCode err) {
+  if (err != atb::NO_ERROR) {
+    MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "ATB error code {}", int(err));
+  }
+}
+}  // namespace mllm::ascend

Update call sites: MLLM_ACL_CHECK(ret) → mllm::ascend::aclCheck(ret) (same for ATB).

🟡 Minor comments (4)
examples/ascend_add_demo/main.cpp-117-117 (1)

117-117: Replace Chinese comment with English.

Comments in source files should be in English per the coding guidelines.

Apply this diff:

-    // 清理内存池中的块
+    // Release memory pool blocks
     x_handle.release();
examples/ascend_add_demo/main.cpp-52-52 (1)

52-52: Remove debug print statement.

This appears to be a leftover debug statement from development.

Apply this diff:

     auto& ctx = Context::instance();
-    std::cout << "context over" <<std::endl;
     auto z_ascend = ctx.buildOpAndSubmitTask(
examples/ascend_add_demo/CMakeLists.txt-11-11 (1)

11-11: Replace Chinese comment with English.

Comments in source files should be in English per the coding guidelines.

Apply this diff:

-  ascendcl  # 添加 ACL 库,因为 main.cpp 中直接使用了 aclrtMemcpy
+  ascendcl  # ACL library required for aclrtMemcpy used in main.cpp
mllm/backends/ascend/AscendCommon.cpp-1-187 (1)

1-187: Remove trailing whitespace at line 81 of AscendCommon.cpp

Verification found trailing whitespace at line 81 that violates the repository's encoding and formatting rules.

🧹 Nitpick comments (11)
tasks/build_arm_ascend.yaml (2)

9-9: Android API level 28 is outdated; document the rationale or consider updating.

Android API level 28 (Android 9.0, released in 2018) is quite old. Modern Android projects typically target API 31+. If API 28 is intentional for broad compatibility, document this decision. If not, consider updating to a more recent level.


12-12: Hard-coded install prefix may limit portability.

The install prefix /root/mllm-install-android-arm64-v8a assumes /root/ is accessible and writable. While the current CI/CD workflow runs with elevated permissions, this will be problematic if the build configuration is used in non-root environments or in containers with restricted permissions.

Consider using a relative path for better portability:

-        - "-DCMAKE_INSTALL_PREFIX=/root/mllm-install-android-arm64-v8a"
+        - "-DCMAKE_INSTALL_PREFIX=./mllm-install-android-arm64-v8a"

This aligns with the pattern used in other build configurations like tasks/build_sdk_android.yaml and allows the install directory to be placed relative to the build directory regardless of the execution environment.

mllm/backends/ascend/Register.cpp (2)

19-19: Consider making memory pool size configurable.

The 100MB pool size is hardcoded. While the comment mentions "can be adjusted as needed," exposing this as a configuration parameter (environment variable or function argument) would provide better flexibility for different deployment scenarios.

Consider this approach:

// Add parameter with default
void initAscendBackend(size_t pool_size = 100 * 1024 * 1024) {
  auto& ctx = Context::instance();
  
  ascend::getAscendMemoryManager().createMemoryPool(pool_size);
  MLLM_INFO("Ascend memory pool initialized with {} MB", pool_size / (1024 * 1024));
  // ... rest of function
}

36-38: Remove or document commented code scaffolding.

The commented custom ops registration is placeholder code. Either remove it or add a TODO comment explaining when/how it should be used.

If keeping for documentation purposes:

// TODO: Example of registering custom ops:
// ctx.registerCustomizedOp(kAscend, "CustomOpName", 
//                          std::make_shared<CustomOpFactory>());
mllm/backends/ascend/AscendDispatcher.hpp (1)

17-38: Add brief API comments for new public types (required by repo guidelines)

Public header introduces new API surface (options, dispatcher, factory) but has no purpose/behavior comments (params/returns/errors).

mllm/backends/ascend/memory/AscendMemoryPool.hpp (2)

20-24: Prefer size_t for allocation sizes to avoid truncation

uint32_t size can truncate large allocations on 64-bit builds; consider size_t.


15-25: Add brief comments for public methods (purpose / params / errors)

This is a new public header API and should describe allocation policy, failure behavior, and thread-safety expectations.

mllm/backends/ascend/CMakeLists.txt (2)

17-29: Avoid bare file(GLOB ...) for sources (or add CONFIGURE_DEPENDS)
Without CONFIGURE_DEPENDS, adding/removing files won’t reliably trigger a CMake reconfigure.

-file(GLOB MLLM_ASCEND_CORE_FILES ${CMAKE_CURRENT_SOURCE_DIR}/Ascend*.cpp)
-file(GLOB MLLM_ASCEND_MEMORY_FILES ${CMAKE_CURRENT_SOURCE_DIR}/memory/*.cpp)
-file(GLOB MLLM_ASCEND_OPS_FILES ${CMAKE_CURRENT_SOURCE_DIR}/ops/*.cpp)
+file(GLOB CONFIGURE_DEPENDS MLLM_ASCEND_CORE_FILES "${CMAKE_CURRENT_SOURCE_DIR}/Ascend*.cpp")
+file(GLOB CONFIGURE_DEPENDS MLLM_ASCEND_MEMORY_FILES "${CMAKE_CURRENT_SOURCE_DIR}/memory/*.cpp")
+file(GLOB CONFIGURE_DEPENDS MLLM_ASCEND_OPS_FILES "${CMAKE_CURRENT_SOURCE_DIR}/ops/*.cpp")

30-42: Quote env-derived paths; consider avoiding PUBLIC exposure of machine-local include dirs
Unquoted $ENV{...} breaks if the path contains spaces. Also, PUBLIC include dirs propagate into dependents (and can leak build-machine paths).

-  target_include_directories(MllmAscendBackend PUBLIC $ENV{ASCEND_HOME_PATH}/include)
-  target_link_directories(MllmAscendBackend PRIVATE $ENV{ASCEND_HOME_PATH}/lib64)
+  target_include_directories(MllmAscendBackend PUBLIC "$ENV{ASCEND_HOME_PATH}/include")
+  target_link_directories(MllmAscendBackend PRIVATE "$ENV{ASCEND_HOME_PATH}/lib64")
@@
-  target_include_directories(MllmAscendBackend PUBLIC $ENV{ATB_HOME_PATH}/include)
-  target_link_directories(MllmAscendBackend PRIVATE $ENV{ATB_HOME_PATH}/lib)
+  target_include_directories(MllmAscendBackend PUBLIC "$ENV{ATB_HOME_PATH}/include")
+  target_link_directories(MllmAscendBackend PRIVATE "$ENV{ATB_HOME_PATH}/lib")
mllm/backends/ascend/AscendCommon.hpp (2)

41-42: Rename ttensor (identifier-length rule)
fillAtbTensorDesc(const Tensor& t, ...) (Line 41) is flagged; rename for readability + clang-tidy compliance.


92-95: Reduce “easily swappable parameters” for prepareAscendTensor
(host_data, int batch, int size) is easy to call incorrectly. Prefer prepareAscendTensor(host_data, /*shape=*/{batch, size}) or rename size → features (or similar).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 439e3cc and 6b743b7.

📒 Files selected for processing (23)
  • examples/CMakeLists.txt (1 hunks)
  • examples/ascend_add_demo/CMakeLists.txt (1 hunks)
  • examples/ascend_add_demo/main.cpp (1 hunks)
  • mllm/backends/ascend/AscendAllocator.cpp (1 hunks)
  • mllm/backends/ascend/AscendAllocator.hpp (2 hunks)
  • mllm/backends/ascend/AscendBackend.cpp (1 hunks)
  • mllm/backends/ascend/AscendCommon.cpp (1 hunks)
  • mllm/backends/ascend/AscendCommon.hpp (1 hunks)
  • mllm/backends/ascend/AscendDispatcher.cpp (1 hunks)
  • mllm/backends/ascend/AscendDispatcher.hpp (1 hunks)
  • mllm/backends/ascend/CMakeLists.txt (2 hunks)
  • mllm/backends/ascend/Register.cpp (1 hunks)
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp (1 hunks)
  • mllm/backends/ascend/memory/AscendMemoryManager.hpp (1 hunks)
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp (1 hunks)
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp (1 hunks)
  • mllm/backends/ascend/memory/MemoryBlock.hpp (1 hunks)
  • mllm/backends/ascend/ops/AscendElewiseOps.hpp (1 hunks)
  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp (1 hunks)
  • mllm/core/DeviceTypes.hpp (2 hunks)
  • mllm/engine/Dispatcher.hpp (1 hunks)
  • mllm/utils/Common.hpp (1 hunks)
  • tasks/build_arm_ascend.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/engine/Dispatcher.hpp
  • mllm/backends/ascend/ops/AscendElewiseOps.hpp
  • mllm/backends/ascend/memory/AscendMemoryManager.hpp
  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
  • mllm/backends/ascend/AscendCommon.hpp
  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp
  • mllm/backends/ascend/memory/MemoryBlock.hpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/utils/Common.hpp
  • mllm/core/DeviceTypes.hpp
  • mllm/backends/ascend/Register.cpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendAllocator.hpp
  • mllm/backends/ascend/CMakeLists.txt
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/engine/Dispatcher.hpp
  • mllm/backends/ascend/ops/AscendElewiseOps.hpp
  • mllm/backends/ascend/memory/AscendMemoryManager.hpp
  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
  • mllm/backends/ascend/AscendCommon.hpp
  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp
  • mllm/backends/ascend/memory/MemoryBlock.hpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/utils/Common.hpp
  • mllm/core/DeviceTypes.hpp
  • mllm/backends/ascend/Register.cpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendAllocator.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/engine/Dispatcher.hpp
  • mllm/backends/ascend/ops/AscendElewiseOps.hpp
  • mllm/backends/ascend/memory/AscendMemoryManager.hpp
  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
  • mllm/backends/ascend/AscendCommon.hpp
  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp
  • mllm/backends/ascend/memory/MemoryBlock.hpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/utils/Common.hpp
  • mllm/core/DeviceTypes.hpp
  • mllm/backends/ascend/Register.cpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendAllocator.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/Register.cpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/AscendCommon.cpp
{mllm,mllm-cli,pymllm}/**/*.{toml,json,txt,md}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

When suggesting new dependencies, evaluate their maturity, licensing, and maintenance status. Prefer well-established and actively maintained libraries.

Files:

  • mllm/backends/ascend/CMakeLists.txt
🧬 Code graph analysis (7)
mllm/backends/ascend/memory/AscendMemoryManager.hpp (1)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (3)
  • size (20-20)
  • block_id (22-22)
  • block_id (24-24)
mllm/backends/ascend/AscendBackend.cpp (4)
mllm/backends/ascend/AscendBackend.hpp (1)
  • AscendBackend (13-13)
mllm/backends/ascend/AscendAllocator.cpp (2)
  • createAscendAllocator (104-104)
  • createAscendAllocator (104-104)
mllm/backends/ascend/AscendAllocator.hpp (1)
  • createAscendAllocator (46-46)
mllm/backends/ascend/AscendCommon.hpp (1)
  • instance (57-60)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (1)
mllm/backends/ascend/memory/AscendMemoryPool.cpp (2)
  • AscendMemoryPool (14-26)
  • AscendMemoryPool (28-36)
mllm/backends/ascend/AscendCommon.hpp (1)
mllm/backends/ascend/AscendCommon.cpp (5)
  • AscendTensorHandle (14-15)
  • AscendTensorHandle (17-17)
  • AscendTensorHandle (19-22)
  • other (24-32)
  • other (24-24)
mllm/backends/ascend/ops/AscnedElewiseOps.cpp (1)
mllm/backends/ascend/ops/AscendElewiseOps.hpp (5)
  • AscendAddOp (14-14)
  • options (22-24)
  • options (22-22)
  • inputs (16-16)
  • inputs (17-17)
mllm/backends/ascend/Register.cpp (5)
mllm/backends/ascend/AscendCommon.hpp (1)
  • instance (57-60)
mllm/backends/ascend/memory/AscendMemoryManager.hpp (1)
  • pool_size (18-18)
mllm/engine/DispatcherManager.hpp (1)
  • dispatcher (49-49)
mllm/backends/ascend/AscendDispatcher.cpp (2)
  • createAscendDispatcher (84-87)
  • createAscendDispatcher (84-85)
mllm/backends/ascend/AscendDispatcher.hpp (1)
  • createAscendDispatcher (36-37)
mllm/backends/ascend/AscendAllocator.hpp (1)
mllm/backends/ascend/AscendAllocator.cpp (2)
  • AscendAllocator (11-13)
  • AscendAllocator (15-21)
🪛 Clang (14.0.6)
mllm/engine/Dispatcher.hpp

[error] 28-28: invalid case style for global constant 'ascend_dispatcher_id'

(readability-identifier-naming,-warnings-as-errors)

examples/ascend_add_demo/main.cpp

[error] 98-98: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 111-111: if with identical then and else branches

(bugprone-branch-clone,-warnings-as-errors)

mllm/backends/ascend/memory/AscendMemoryPool.hpp

[error] 15-15: class 'AscendMemoryPool' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/ascend/AscendCommon.hpp

[error] 18-18: function-like macro 'MLLM_ACL_CHECK' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 19-19: macro argument should be enclosed in parentheses

(bugprone-macro-parentheses,-warnings-as-errors)


[error] 24-24: function-like macro 'MLLM_ATB_CHECK' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 25-25: macro argument should be enclosed in parentheses

(bugprone-macro-parentheses,-warnings-as-errors)


[error] 41-41: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 44-44: constructor does not initialize these fields: name, id, total_memory, free_memory, soc_version

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 53-53: class 'AscendDeviceMetaInfo' defines a copy constructor and a copy assignment operator but does not define a destructor, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 69-69: constructor does not initialize these fields: tensor_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 80-80: function 'valid' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 83-83: function 'tensor' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 84-84: function 'blockId' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)

mllm/backends/ascend/AscendDispatcher.hpp

[error] 13-13: constructor does not initialize these fields: MLLM_EMPTY_SCOPE

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 21-21: parameter name 'id' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/ascend/memory/AscendMemoryPool.cpp

[error] 14-14: constructor does not initialize these fields: block_mutex_, used_blocks_, free_blocks_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 42-42: method 'allocateBlock' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 45-45: variable 'align_size' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 59-59: variable 'cur_mem_ptr_align' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 73-73: method 'freeBlock' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 81-81: variable name 'it' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 90-90: method 'getBlockPtr' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 98-98: variable name 'it' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/ascend/memory/MemoryBlock.hpp

[error] 11-11: constructor does not initialize these fields: block_id_, block_size_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

mllm/backends/ascend/memory/AscendMemoryManager.cpp

[error] 13-13: variable 'g_ascendMemoryManager' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 13-13: invalid case style for variable 'g_ascendMemoryManager'

(readability-identifier-naming,-warnings-as-errors)


[error] 15-15: use '= default' to define a trivial default constructor

(modernize-use-equals-default,-warnings-as-errors)


[error] 21-21: method 'createMemoryPool' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 23-23: variable 'device_count' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 38-38: variable 'device_id' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 46-46: variable 'device_id' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

mllm/backends/ascend/Register.cpp

[error] 19-19: variable 'pool_size' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

mllm/backends/ascend/AscendCommon.cpp

[error] 12-12: variable 'g_atb_stream' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 43-43: 3 adjacent parameters of 'prepareAscendTensor' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 46-46: variable 'expected_elements' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 58-58: variable 'bytes' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 80-80: variable 'ctx' provides global access to a non-const object; consider making the pointed-to data 'const'

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 113-113: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/ascend/AscendAllocator.hpp

[error] 15-15: class 'AscendAllocator' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

🔇 Additional comments (9)
mllm/utils/Common.hpp (1)

34-34: LGTM! Well-integrated enum value addition.

The kAscendError enum value is correctly placed alongside other backend-specific error codes and follows the established naming convention.

mllm/core/DeviceTypes.hpp (1)

36-37: LGTM! Ascend device type integration follows existing patterns.

The additions to both conversion functions are consistent with existing device types and properly handle the new Ascend and AscendHost device types.

Also applies to: 52-55

mllm/engine/Dispatcher.hpp (1)

28-28: LGTM! Dispatcher ID follows existing naming conventions.

The naming style is consistent with all other dispatcher IDs in the file (cpu_dispatcher_id, cuda_dispatcher_id, etc.). While the static analysis tool flags the naming convention, changing only this identifier would create inconsistency with the established codebase pattern.

mllm/backends/ascend/AscendBackend.cpp (1)

13-21: LGTM! Backend initialization follows established patterns.

The constructor properly registers the operation factory and logs device information. The lambda for byte-to-MB conversion is clear and the logging will be helpful for users to verify their Ascend hardware configuration.

examples/ascend_add_demo/main.cpp (1)

17-21: LGTM! Excellent error handling and resource management.

The demo demonstrates good practices:

  • Comprehensive error handling with try-catch
  • Proper cleanup on both success and error paths (lines 70-72, 118-119)
  • Clear verification logic with tolerance checking
  • Informative output for users

Also applies to: 63-73, 111-121, 123-126

mllm/backends/ascend/ops/AscnedElewiseOps.cpp (2)

22-35: The file mllm/backends/ascend/ops/AscnedElewiseOps.cpp does not exist in the repository. The review comment references a code location that cannot be found in the codebase. Verify the correct file path and file name before proceeding with this review.

Likely an incorrect or invalid review comment.


37-116: The file mllm/backends/ascend/ops/AscnedElewiseOps.cpp does not exist in the repository, and the code snippet shown in the review comment cannot be located through any search. This means either:

  1. The code is being added in a pending change not yet in the repository, or
  2. The file path is incorrect (note the typo "Ascned" instead of "Ascend")

Without access to the actual file being reviewed, the substantive concerns cannot be definitively verified. However, the review comment raises legitimate C++ safety concerns that are generally valid:

  • null-pointer checks for context objects are prudent defensive programming
  • RAII guards for resource management (operation handles) align with C++ best practices per the coding guidelines
  • Explicit handling of memory allocation failures is required for production-ready code

To properly verify this review comment, please:

  1. Confirm the correct file path
  2. Provide context on whether this is code in an uncommitted change or pending PR
  3. Ensure the file is accessible in the current repository state
mllm/backends/ascend/memory/AscendMemoryPool.cpp (1)

12-26:

The file mllm/backends/ascend/memory/AscendMemoryPool.cpp mentioned in this review comment could not be located in the current repository state. Manual verification is required to:

  1. Confirm the file exists at the specified path or locate its actual location
  2. Verify the header file AscendMemoryPool.hpp and whether it declares a default argument for the constructor
  3. Validate the presence of trailing whitespace on the lines mentioned
  4. Confirm clang-tidy member-initialization warnings with -Werror flag in the CI configuration
  5. Review the actual class member declarations to ensure the proposed initializer list in the diff matches all members

The review comment's claims regarding default argument placement, clang-tidy warnings, and whitespace violations cannot be verified without access to the actual source files.

mllm/backends/ascend/AscendAllocator.hpp (1)

20-20: Double-check ctrlByMemManager() semantics now that allocations are memory-pool backed

If this flag is used by the engine to decide who owns lifecycle/free behavior, returning false may be wrong after switching to a global memory manager + block IDs.

Comment on lines +56 to +77
    case TaskTypes::kExecuteModule: {
      auto moduleName = static_cast<nn::Module*>(task->custom_context_ptr)->getModuleName();
#ifdef MLLM_PERFETTO_ENABLE
      MLLM_PERF_TRACE_EVENT("mllm.ascend.execute.", perfetto::DynamicString{moduleName},
                            [&](perfetto::EventContext ctx) {
                              int cnt = 0;
                              for (auto& i : task->inputs) {
                                ctx.AddDebugAnnotation(perfetto::DynamicString{"inputs-" + std::to_string(cnt++)},
                                                       i.shape());
                              }
                            });
#endif
      auto ascendBackend = std::static_pointer_cast<AscendBackend>(Context::instance().getBackend(kAscend));

      task->outputs = ((nn::Module*)(task->custom_context_ptr))->forward(task->inputs, task->args);

      // TODO:
      // ascendBackend->graphExecute(moduleName, task->inputs, task->outputs);
      break;
    }
    default: NYI("AscendDispatcher::process not supported task type");
  }

⚠️ Potential issue | 🔴 Critical

Avoid unchecked custom_context_ptr cast + fix unused ascendBackend (-Werror risk)

  • task->custom_context_ptr is used as nn::Module* without null/type validation.
  • ascendBackend is currently unused (likely -Wunused-variable).
     case TaskTypes::kExecuteModule: {
-      auto moduleName = static_cast<nn::Module*>(task->custom_context_ptr)->getModuleName();
+      auto* module = static_cast<nn::Module*>(task->custom_context_ptr);
+      if (module == nullptr) {
+        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Task custom_context_ptr is null (expected nn::Module*)");
+      }
+      auto moduleName = module->getModuleName();
 #ifdef MLLM_PERFETTO_ENABLE
       MLLM_PERF_TRACE_EVENT("mllm.ascend.execute.", perfetto::DynamicString{moduleName},
                             [&](perfetto::EventContext ctx) {
                               int cnt = 0;
                               for (auto& i : task->inputs) {
                                 ctx.AddDebugAnnotation(perfetto::DynamicString{"inputs-" + std::to_string(cnt++)},
                                                        i.shape());
                               }
                             });
 #endif
-      auto ascendBackend = std::static_pointer_cast<AscendBackend>(Context::instance().getBackend(kAscend));
+      [[maybe_unused]] auto ascend_backend =
+          std::static_pointer_cast<AscendBackend>(Context::instance().getBackend(kAscend));
 
-      task->outputs = ((nn::Module*)(task->custom_context_ptr))->forward(task->inputs, task->args);
+      task->outputs = module->forward(task->inputs, task->args);
 
-      // TODO:
-      // ascendBackend->graphExecute(moduleName, task->inputs, task->outputs);
+      // TODO: Wire graph execution path (if/when supported on Ascend backend).
+      // ascend_backend->graphExecute(moduleName, task->inputs, task->outputs);
       break;
     }
🤖 Prompt for AI Agents
In mllm/backends/ascend/AscendDispatcher.cpp around lines 56 to 77, avoid the
unchecked static cast of task->custom_context_ptr to nn::Module* and resolve the
unused ascendBackend variable: first validate that task->custom_context_ptr is
non-null and is a valid nn::Module instance (use a safe cast such as
dynamic_cast on a properly-typed pointer or check a runtime type field) and
handle the error path (log and bail out) if the pointer is null or the cast
fails; replace the static_cast with that safe cast and use the resulting
nn::Module* to call forward; for ascendBackend either remove the variable if not
needed or mark it used (e.g., call the TODO graphExecute path or explicitly
suppress unused-warning) so there is no -Wunused-variable.

Comment on lines +44 to +51
std::shared_ptr<AscendMemoryPool> &AscendMemoryManager::getMemoryPool()
{
    size_t device_id = static_cast<size_t>(getDeviceId());
    if (device_id >= memory_pools_.size()) {
        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Invalid device id {}", device_id);
    }
    return memory_pools_[device_id];
}

⚠️ Potential issue | 🔴 Critical

getMemoryPool: handle negative device id and “not initialized” state explicitly

Casting -1 to size_t creates a huge index and hides the real error.

 std::shared_ptr<AscendMemoryPool> &AscendMemoryManager::getMemoryPool()
 {
-    size_t device_id = static_cast<size_t>(getDeviceId());
+    if (memory_pools_.empty()) {
+        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Ascend memory pools not initialized");
+    }
+    const int32_t device_id_i32 = getDeviceId();
+    if (device_id_i32 < 0) {
+        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Invalid device id {}", device_id_i32);
+    }
+    const size_t device_id = static_cast<size_t>(device_id_i32);
     if (device_id >= memory_pools_.size()) {
         MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Invalid device id {}", device_id);
     }
     return memory_pools_[device_id];
 }
🧰 Tools
🪛 Clang (14.0.6)

[error] 46-46: variable 'device_id' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/backends/ascend/memory/AscendMemoryManager.cpp around lines 44 to 51,
the current code casts getDeviceId() to size_t which hides negative values
(e.g., -1) by producing a huge unsigned index; change to read the device id as a
signed integer (auto or int), explicitly check for device_id < 0 and call
MLLM_ERROR_EXIT with a clear "device not initialized" or "invalid device id"
message, then check the non-negative device_id against memory_pools_.size()
before returning memory_pools_[device_id]; this ensures negative/uninitialized
ids are handled distinctly from out-of-range indices.

Comment on lines +42 to +71
void AscendMemoryPool::allocateBlock(uint32_t size, int &block_id) {
    std::unique_lock<std::mutex> lock(block_mutex_);

    size_t align_size = ((size + 31) & ~31) + 32;

    for (auto it = free_blocks_.begin(); it != free_blocks_.end(); it++) {
        if (it->second.block_size_ >= align_size) {
            block_id = it->second.block_id_;
            used_blocks_.insert(*it);
            free_blocks_.erase(it);
            MLLM_INFO("find free block id {} to allocate", block_id);
            return;
        }
    }

    if (remain_size_ > align_size) {
        block_id = generateBlocksId();
        uint64_t cur_mem_ptr_align = (reinterpret_cast<uint64_t>(cur_mem_ptr_) + 63) & ~63;
        remain_size_ -= (cur_mem_ptr_align - reinterpret_cast<uint64_t>(cur_mem_ptr_));
        cur_mem_ptr_ = reinterpret_cast<void *>(cur_mem_ptr_align);

        MemoryBlock block = {block_id, align_size, cur_mem_ptr_};
        used_blocks_.insert({block_id, block});
        remain_size_ -= align_size;
        cur_mem_ptr_ = reinterpret_cast<uint8_t *>(cur_mem_ptr_) + align_size;
        MLLM_INFO("allocate block id {} for size {}", block_id, align_size);
        return;
    }
    MLLM_ERROR("allocate block fail");
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# 1. Find AscendMemoryPool and related files
echo "=== Finding Ascend memory-related files ==="
find . -type f -name "*AscendMemory*" -o -name "*AscendAllocator*" 2>/dev/null | head -20

# 2. Look for AscendAllocator::alignSize
echo -e "\n=== Searching for AscendAllocator::alignSize ==="
rg "alignSize" --type cpp --type h -B 2 -A 5 2>/dev/null | head -60

# 3. Check current memory pool implementation
echo -e "\n=== Current AscendMemoryPool.cpp (lines around 42-71) ==="
wc -l mllm/backends/ascend/memory/AscendMemoryPool.cpp 2>/dev/null

Repository: UbiquitousLearning/mllm

Length of output: 4294


Fix pool capacity check + alignment math (padding underflow / exact-fit bug).

The allocation check uses remain_size_ > align_size but fails to account for pointer alignment padding, causing exact-fit allocations to be rejected and potential underflow of remain_size_ in edge cases. For example, with remain_size_ = 96 and align_size = 96, the check 96 > 96 fails even though the allocation is possible. Additionally, with padding to a 64B boundary, a value like remain_size_ = 144 passes the check but underflows after subtracting both padding (64 bytes) and the block size (96 bytes), resulting in -16. The alignment formula ((size + 31) & ~31) + 32 also uses an inconsistent approach compared to pointer alignment (ptr + 63) & ~63, and the uint32_t size parameter can overflow in the arithmetic.

Use uintptr_t and unify to 64B alignment with constexpr size_t kAlign = 64. Cast the size parameter to size_t to prevent overflow. Change the capacity check to remain_size_ >= padding + align_size to account for alignment overhead and allow exact-fit allocations. Rename the iterator variable to free_it for clarity. Set block_id = -1 on allocation failure and enhance the error log with allocation parameters for debugging.

 void AscendMemoryPool::allocateBlock(uint32_t size, int &block_id) {
     std::unique_lock<std::mutex> lock(block_mutex_);
 
-    size_t align_size = ((size + 31) & ~31) + 32;  
+    constexpr size_t kAlign = 64;
+    const size_t req = static_cast<size_t>(size);
+    const size_t align_size = (req + (kAlign - 1)) & ~(kAlign - 1);
 
-    for (auto it = free_blocks_.begin(); it != free_blocks_.end(); it++) {
-        if (it->second.block_size_ >= align_size) {
-            block_id = it->second.block_id_;
-            used_blocks_.insert(*it);
-            free_blocks_.erase(it);
+    for (auto free_it = free_blocks_.begin(); free_it != free_blocks_.end(); ++free_it) {
+        if (free_it->second.block_size_ >= align_size) {
+            block_id = free_it->second.block_id_;
+            used_blocks_.insert(*free_it);
+            free_blocks_.erase(free_it);
             MLLM_INFO("find free block id {} to allocate", block_id);
             return;
         }
     }
 
-    if (remain_size_ > align_size) {
+    const uintptr_t cur = reinterpret_cast<uintptr_t>(cur_mem_ptr_);
+    const uintptr_t aligned = (cur + (kAlign - 1)) & ~(static_cast<uintptr_t>(kAlign - 1));
+    const size_t padding = static_cast<size_t>(aligned - cur);
+    if (remain_size_ >= padding + align_size) {
         block_id = generateBlocksId();
-        uint64_t cur_mem_ptr_align = (reinterpret_cast<uint64_t>(cur_mem_ptr_) + 63) & ~63;  
-        remain_size_ -= (cur_mem_ptr_align - reinterpret_cast<uint64_t>(cur_mem_ptr_));
-        cur_mem_ptr_ = reinterpret_cast<void *>(cur_mem_ptr_align);
+        remain_size_ -= padding;
+        cur_mem_ptr_ = reinterpret_cast<void *>(aligned);
 
         MemoryBlock block = {block_id, align_size, cur_mem_ptr_};
         used_blocks_.insert({block_id, block});
         remain_size_ -= align_size;
         cur_mem_ptr_ = reinterpret_cast<uint8_t *>(cur_mem_ptr_) + align_size;
         MLLM_INFO("allocate block id {} for size {}", block_id, align_size);
         return;
     }
-    MLLM_ERROR("allocate block fail");
+    block_id = -1;
+    MLLM_ERROR("allocate block fail: req={}, aligned={}, remain={}", req, align_size, remain_size_);
 }
🧰 Tools
🪛 Clang (14.0.6)

[error] 42-42: method 'allocateBlock' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 45-45: variable 'align_size' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 59-59: variable 'cur_mem_ptr_align' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/backends/ascend/memory/AscendMemoryPool.cpp around lines 42 to 71, the
allocation logic does incorrect alignment math and capacity checking: change to
use constexpr size_t kAlign = 64 and cast the uint32_t size parameter to size_t
to avoid overflow, compute align_size as a 64-byte-aligned size (use the same
64B alignment rule you use for the pointer), compute padding =
((uintptr_t(cur_mem_ptr_) + (kAlign-1)) & ~(kAlign-1)) -
uintptr_t(cur_mem_ptr_), and then check remain_size_ >= padding + align_size
(allowing exact-fit and preventing underflow); when advancing pointer use
uintptr_t for alignment and update remain_size_ by subtracting padding before
subtracting align_size; rename the free-block iterator to free_it for clarity;
on failure set block_id = -1 and make the error log include current size,
align_size, padding and remain_size_ for debugging.

Comment on lines +15 to +38
class AscendMemoryPool {
 public:
    explicit AscendMemoryPool(size_t pool_size);
    ~AscendMemoryPool();

    void allocateBlock(uint32_t size, int& block_id);

    void freeBlock(int block_id);

    void getBlockPtr(int block_id, void*& addr);

 private:
    uint64_t generateBlocksId();

    std::atomic<uint64_t> id_ = 0;
    std::mutex block_mutex_;

    void* base_mem_ptr_ = nullptr;
    void* cur_mem_ptr_ = nullptr;
    int64_t remain_size_ = 0;

    std::unordered_map<int, MemoryBlock> used_blocks_;
    std::unordered_map<int, MemoryBlock> free_blocks_;
};

⚠️ Potential issue | 🔴 Critical

Define copy/move semantics (Rule of Five) for pool owning raw device memory

The type owns base_mem_ptr_ and holds a mutex and maps, so copying or moving it would be unsafe. The std::mutex member already suppresses the implicit copy operations, but explicitly deleting all four special members documents intent and satisfies the clang-tidy special-member-functions check.

 class AscendMemoryPool {
 public:
     explicit AscendMemoryPool(size_t pool_size);
     ~AscendMemoryPool();
+
+    AscendMemoryPool(const AscendMemoryPool&) = delete;
+    AscendMemoryPool& operator=(const AscendMemoryPool&) = delete;
+    AscendMemoryPool(AscendMemoryPool&&) = delete;
+    AscendMemoryPool& operator=(AscendMemoryPool&&) = delete;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 class AscendMemoryPool {
  public:
     explicit AscendMemoryPool(size_t pool_size);
     ~AscendMemoryPool();
+
+    AscendMemoryPool(const AscendMemoryPool&) = delete;
+    AscendMemoryPool& operator=(const AscendMemoryPool&) = delete;
+    AscendMemoryPool(AscendMemoryPool&&) = delete;
+    AscendMemoryPool& operator=(AscendMemoryPool&&) = delete;

     void allocateBlock(uint32_t size, int& block_id);

     void freeBlock(int block_id);

     void getBlockPtr(int block_id, void*& addr);

  private:
     uint64_t generateBlocksId();

     std::atomic<uint64_t> id_ = 0;
     std::mutex block_mutex_;

     void* base_mem_ptr_ = nullptr;
     void* cur_mem_ptr_ = nullptr;
     int64_t remain_size_ = 0;

     std::unordered_map<int, MemoryBlock> used_blocks_;
     std::unordered_map<int, MemoryBlock> free_blocks_;
 };
🧰 Tools
🪛 Clang (14.0.6)

[error] 15-15: class 'AscendMemoryPool' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (4)
mllm/backends/ascend/AscendDispatcher.cpp (1)

55-73: Unchecked custom_context_ptr cast and unused ascendBackend variable.

The task->custom_context_ptr is cast to nn::Module* without null validation (lines 56, 69), which could cause undefined behavior. Additionally, ascendBackend on line 67 is unused, which will trigger -Wunused-variable warnings.

Apply this diff to add validation and suppress the unused warning:

     case TaskTypes::kExecuteModule: {   //TODO: execute module
-      auto moduleName = static_cast<nn::Module*>(task->custom_context_ptr)->getModuleName();
+      auto* module = static_cast<nn::Module*>(task->custom_context_ptr);
+      if (module == nullptr) {
+        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Task custom_context_ptr is null");
+      }
+      auto moduleName = module->getModuleName();
 #ifdef MLLM_PERFETTO_ENABLE
       MLLM_PERF_TRACE_EVENT("mllm.ascend.execute.", perfetto::DynamicString{moduleName},
                             [&](perfetto::EventContext ctx) {
                               int cnt = 0;
                               for (auto& i : task->inputs) {
                                 ctx.AddDebugAnnotation(perfetto::DynamicString{"inputs-" + std::to_string(cnt++)},
                                                        i.shape());
                               }
                             });
 #endif
-      auto ascendBackend = std::static_pointer_cast<AscendBackend>(Context::instance().getBackend(kAscend));
+      [[maybe_unused]] auto ascendBackend =
+          std::static_pointer_cast<AscendBackend>(Context::instance().getBackend(kAscend));
 
-      task->outputs = ((nn::Module*)(task->custom_context_ptr))->forward(task->inputs, task->args);
+      task->outputs = module->forward(task->inputs, task->args);
mllm/backends/ascend/AscendDispatcher.hpp (1)

13-15: AscendDispatcherOptions using MLLM_EMPTY_SCOPE causes uninitialized field warning.

Static analysis reports that the implicit constructor doesn't initialize fields introduced by MLLM_EMPTY_SCOPE. If the struct is currently empty, prefer a truly empty struct body or a comment placeholder.

 struct AscendDispatcherOptions {
-  MLLM_EMPTY_SCOPE;
+  // Reserved for future options.
 };
mllm/backends/ascend/memory/AscendMemoryPool.hpp (1)

15-38: Define copy/move semantics (Rule of Five) - class owns raw device memory.

The class defines a non-default destructor that frees base_mem_ptr_, but copy/move operations are implicitly generated. Copying this class would cause double-free of device memory. The mutex and maps also make copying semantically incorrect.

 class AscendMemoryPool {
 public:
     explicit AscendMemoryPool(size_t pool_size);
     ~AscendMemoryPool();
+
+    AscendMemoryPool(const AscendMemoryPool&) = delete;
+    AscendMemoryPool& operator=(const AscendMemoryPool&) = delete;
+    AscendMemoryPool(AscendMemoryPool&&) = delete;
+    AscendMemoryPool& operator=(AscendMemoryPool&&) = delete;
 
     void allocateBlock(uint32_t size, int& block_id);
mllm/backends/ascend/memory/AscendMemoryManager.cpp (1)

43-50: CRITICAL: Handle negative device ID before casting to size_t.

This issue was flagged in a previous review but remains unaddressed. Casting a negative device ID (e.g., -1) to size_t wraps to a huge unsigned value (SIZE_MAX for -1), so the bounds check fires with a meaningless id in the error message and the real failure, a negative device id, is obscured.

Apply the fix suggested in the previous review:

 std::shared_ptr<AscendMemoryPool> &AscendMemoryManager::getMemoryPool()
 {
-    size_t device_id = static_cast<size_t>(getDeviceId());
+    if (memory_pools_.empty()) {
+        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Ascend memory pools not initialized");
+    }
+    const int32_t device_id_i32 = getDeviceId();
+    if (device_id_i32 < 0) {
+        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Invalid device id {}", device_id_i32);
+    }
+    const size_t device_id = static_cast<size_t>(device_id_i32);
     if (device_id >= memory_pools_.size()) {
         MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Invalid device id {}", device_id);
     }
     return memory_pools_[device_id];
 }
🧹 Nitpick comments (6)
examples/ascend_add_demo/main.cpp (3)

22-41: Consider extracting repetitive print logic into a helper function.

The loops for printing vectors (lines 29-34 and 36-41) follow the same pattern and could be refactored into a reusable helper:

void printVector(const std::string& label, const std::vector<float>& vec) {
  std::cout << "   " << label << ": [";
  for (size_t i = 0; i < vec.size(); ++i) {
    std::cout << vec[i];
    if (i < vec.size() - 1) std::cout << ", ";
  }
  std::cout << "]" << std::endl;
}

This would reduce duplication and improve readability.


82-109: Consider extracting verification logic for reusability.

The verification loop (lines 100-109) and the print loops (lines 83-88, 90-95) could be extracted into helper functions to reduce duplication and improve testability:

bool verifyResults(const std::vector<float>& actual, 
                   const std::vector<float>& expected, 
                   float tolerance) {
  bool correct = true;
  for (size_t i = 0; i < actual.size(); ++i) {
    float diff = std::abs(actual[i] - expected[i]);
    if (diff > tolerance) {
      correct = false;
      std::cout << "   ✗ Mismatch at index " << i 
                << ": expected " << expected[i] 
                << ", got " << actual[i] 
                << " (diff: " << diff << ")" << std::endl;
    }
  }
  return correct;
}

98-98: Consider using uppercase literal suffix for consistency.

Static analysis suggests using 0.1F instead of 0.1f for consistency with coding standards, though this is a minor stylistic preference.

mllm/backends/ascend/AscendDispatcher.cpp (1)

55-55: Fix TODO comment format.

The TODO comment should have a space after // per coding guidelines.

-    case TaskTypes::kExecuteModule: {   //TODO: execute module
+    case TaskTypes::kExecuteModule: {   // TODO: execute module
mllm/backends/ascend/AscendCommon.cpp (1)

50-52: Adjacent parameters of similar type (batch, size) are easily swapped.

Both batch and size are adjacent int parameters, making accidental swaps at call sites likely. Consider using a struct or named parameters for clarity.

+struct AscendTensorShape {
+  int batch;
+  int size;
+};
+
-AscendTensorHandle prepareAscendTensor(const std::vector<float>& host_data,
-                                       int batch,
-                                       int size) {
+AscendTensorHandle prepareAscendTensor(const std::vector<float>& host_data,
+                                       const AscendTensorShape& shape) {
+  const int batch = shape.batch;
+  const int size = shape.size;

Alternatively, reorder to (int batch, int size, const std::vector<float>& host_data) to separate integer params from the vector, reducing swap risk.

mllm/backends/ascend/AscendAllocator.cpp (1)

82-91: Use consistent logging macros instead of std::cout.

The rest of the file uses MLLM_INFO, MLLM_WARN, and MLLM_ERROR for logging. Using std::cout here is inconsistent and bypasses the logging infrastructure.

 bool AscendAllocator::generalAlloc(void** ptr, size_t cap, size_t align) {
-    //we don't support generalAlloc , therefore return false
-    std::cout << "generalAlloc is not supported in AscendAllocator" << std::endl;
+    // generalAlloc is not supported in AscendAllocator
+    MLLM_WARN("generalAlloc is not supported in AscendAllocator");
     return false;
 }
 
 void AscendAllocator::generalFree(void* ptr) {
-    //we don't support generalFree , therefore do nothing
-    std::cout << "generalFree is not supported in AscendAllocator" << std::endl;
+    // generalFree is not supported in AscendAllocator
+    MLLM_WARN("generalFree is not supported in AscendAllocator");
 }

This also allows removing #include <iostream> from line 6.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6b743b7 and 4b37bb5.

📒 Files selected for processing (7)
  • examples/ascend_add_demo/main.cpp (1 hunks)
  • mllm/backends/ascend/AscendAllocator.cpp (1 hunks)
  • mllm/backends/ascend/AscendCommon.cpp (1 hunks)
  • mllm/backends/ascend/AscendDispatcher.cpp (1 hunks)
  • mllm/backends/ascend/AscendDispatcher.hpp (1 hunks)
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp (1 hunks)
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/AscendCommon.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
🧠 Learnings (3)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/* : No line may end with trailing whitespace.

Applied to files:

  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.

Applied to files:

  • mllm/backends/ascend/AscendDispatcher.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).

Applied to files:

  • mllm/backends/ascend/AscendCommon.cpp
🧬 Code graph analysis (4)
mllm/backends/ascend/AscendDispatcher.cpp (3)
mllm/backends/ascend/AscendDispatcher.hpp (4)
  • AscendDispatcher (21-22)
  • task (24-24)
  • task (26-26)
  • task (28-28)
mllm/engine/DispatcherManager.hpp (1)
  • thread_pool_ (37-37)
mllm/backends/ascend/AscendCommon.hpp (1)
  • instance (57-60)
mllm/backends/ascend/AscendAllocator.cpp (3)
mllm/backends/ascend/AscendAllocator.hpp (2)
  • AscendAllocator (17-17)
  • AscendAllocator (18-18)
mllm/backends/ascend/memory/AscendMemoryManager.cpp (2)
  • getAscendMemoryManager (15-18)
  • getAscendMemoryManager (15-15)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (2)
  • block_id (22-22)
  • block_id (24-24)
mllm/backends/ascend/AscendCommon.cpp (3)
mllm/backends/ascend/AscendCommon.hpp (10)
  • AscendTensorHandle (70-70)
  • AscendTensorHandle (71-71)
  • AscendTensorHandle (72-72)
  • AscendTensorHandle (74-74)
  • AscendTensorHandle (76-76)
  • other (77-77)
  • tensor_ (82-82)
  • tensor_ (83-83)
  • block_id_ (80-80)
  • block_id_ (84-84)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (3)
  • block_id (22-22)
  • block_id (24-24)
  • size (20-20)
mllm/backends/ascend/memory/AscendMemoryManager.cpp (2)
  • getAscendMemoryManager (15-18)
  • getAscendMemoryManager (15-15)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (2)
mllm/backends/ascend/memory/AscendMemoryManager.hpp (4)
  • pool_size (18-18)
  • size (24-24)
  • block_id (26-26)
  • block_id (28-28)
mllm/backends/ascend/memory/AscendMemoryPool.cpp (2)
  • AscendMemoryPool (14-26)
  • AscendMemoryPool (28-36)
🪛 Clang (14.0.6)
examples/ascend_add_demo/main.cpp

[error] 98-98: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 111-111: if with identical then and else branches

(bugprone-branch-clone,-warnings-as-errors)

mllm/backends/ascend/memory/AscendMemoryManager.cpp

[error] 20-20: method 'createMemoryPool' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 22-22: variable 'device_count' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 37-37: variable 'device_id' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 45-45: variable 'device_id' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

mllm/backends/ascend/AscendCommon.cpp

[error] 50-50: 3 adjacent parameters of 'prepareAscendTensor' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 53-53: variable 'expected_elements' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 64-64: variable 'device_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 65-65: variable 'bytes' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 80-80: variable 'ctx' provides global access to a non-const object; consider making the pointed-to data 'const'

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 116-116: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/ascend/AscendDispatcher.hpp

[error] 13-13: constructor does not initialize these fields: MLLM_EMPTY_SCOPE

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

mllm/backends/ascend/memory/AscendMemoryPool.hpp

[error] 15-15: class 'AscendMemoryPool' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

🔇 Additional comments (13)
examples/ascend_add_demo/main.cpp (6)

1-12: LGTM!

The includes are appropriate for an Ascend backend demo, covering ACL operations, tensor management, context execution, and elementwise operations.


17-20: LGTM!

Ascend backend initialization is properly sequenced and wrapped in exception handling.


43-48: LGTM!

Tensor preparation correctly uses the prepareAscendTensor helper and obtains tensor references for subsequent operations.


60-80: LGTM with note on resource management.

The result copying and fp16-to-float conversion is correctly implemented. The error path (lines 68-73) releases x_handle and y_handle but not z_ascend, which appears intentional assuming the Tensor class uses RAII for automatic cleanup.


111-115: LGTM!

The pass/fail reporting logic is correct. Note that the static analysis hint claiming "identical then and else branches" at line 111 is a false positive—the branches clearly differ (PASSED vs FAILED messages).


117-125: LGTM!

Resource cleanup and exception handling are properly implemented. The explicit release of tensor handles before return and the catch block for reporting errors are appropriate.

mllm/backends/ascend/AscendCommon.cpp (2)

19-48: LGTM - AscendTensorHandle RAII implementation.

The move semantics are correctly implemented with proper resource cleanup in release(). The self-assignment check in move assignment and invalidation of the source handle are good practices.


79-101: Thread-safe initialization pattern looks good.

The use of std::once_flag with std::call_once ensures thread-safe one-time initialization of the global ATB context and stream. This addresses the previous review concern about race conditions.

mllm/backends/ascend/AscendAllocator.cpp (1)

23-47: Memory pool allocation implementation looks correct.

The allocation flow properly:

  1. Allocates a block from the memory manager
  2. Gets the pointer for the block
  3. Handles failures by freeing allocated blocks
  4. Tracks the mapping from pointer to block ID

The mutex protection for the map access ensures thread safety.

mllm/backends/ascend/memory/AscendMemoryPool.hpp (1)

20-24: Memory pool API design looks appropriate.

The interface provides the essential block management operations: allocateBlock, freeBlock, and getBlockPtr. Using output parameters for block_id and addr aligns with the C-style ACL APIs being wrapped.

mllm/backends/ascend/memory/AscendMemoryManager.cpp (3)

13-18: LGTM! Singleton pattern correctly implemented.

The function-local static singleton pattern is thread-safe per C++11 and addresses the past review feedback about avoiding file-scope globals.


35-41: LGTM! Proper error handling and defensive initialization.

The function correctly initializes device_id to -1 and validates the ACL call result.


52-65: LGTM! Clean delegation pattern.

The block management methods appropriately delegate to the per-device memory pool.

}
}

void fillAtbTensorDesc(const Tensor& t, atb::TensorDesc& desc) {
Contributor

🛠️ Refactor suggestion | 🟠 Major

Rename short parameter t to satisfy identifier length requirements.

The parameter name t is too short per readability-identifier-length guidelines. Rename to a descriptive identifier.

-void fillAtbTensorDesc(const Tensor& t, atb::TensorDesc& desc) {
+void fillAtbTensorDesc(const Tensor& tensor, atb::TensorDesc& desc) {
   desc.dtype = ACL_FLOAT16; // Currently hardcoded as per demo, can be expanded later
   desc.format = ACL_FORMAT_ND;
 
-  auto shape = t.shape();
+  auto shape = tensor.shape();

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Clang (14.0.6)

[error] 116-116: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/backends/ascend/AscendCommon.cpp around line 116, the function parameter
named `t` is too short; rename it to a descriptive identifier such as `tensor`
(or `inputTensor`) and update all references inside the function body to the new
name; also adjust the function declaration/signature in any corresponding header
or other translation units to match the new parameter name to satisfy the
readability-identifier-length rule.

    MLLM_ACL_CHECK(ret);
    for (size_t i = 0; i < device_count; i++) {
        aclrtSetDevice(i);
Contributor

⚠️ Potential issue | 🔴 Critical

Check the return value of aclrtSetDevice.

If aclrtSetDevice fails, the memory pool will be created with an incorrect device context, leading to device mismatch errors in subsequent operations.

Apply this diff to check the return value:

-        aclrtSetDevice(i);
+        ret = aclrtSetDevice(i);
+        MLLM_ACL_CHECK(ret);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
aclrtSetDevice(i);
ret = aclrtSetDevice(i);
MLLM_ACL_CHECK(ret);
🤖 Prompt for AI Agents
In mllm/backends/ascend/memory/AscendMemoryManager.cpp around line 27, the call
to aclrtSetDevice(i) currently ignores its return value; modify the code to
check the returned aclError (or appropriate status type), handle non-success
results by logging an error and returning or throwing (depending on existing
error handling patterns) before proceeding to create the memory pool, and ensure
the device is only used when aclrtSetDevice reports success to avoid
device-context mismatches.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
examples/ascend_add_demo/main.cpp (1)

52-52: Remove debug print statement.

This debug print "context over" should be removed before merging.

🧹 Nitpick comments (5)
examples/ascend_add_demo/main.cpp (1)

60-88: Fix inconsistent indentation.

Lines 60-88 use 2-space indentation while the surrounding code in the same try block uses 4-space indentation. Align all code within the try block to consistent 4-space indentation.

mllm/backends/ascend/AscendCommon.hpp (4)

19-28: Parenthesize macro arguments for safety.

The macro arguments should be enclosed in parentheses to prevent operator precedence issues when complex expressions are passed.

 #define MLLM_ACL_CHECK(err)                                                                                    \
-  if (err != ACL_SUCCESS) {                                                                                    \
+  if ((err) != ACL_SUCCESS) {                                                                                  \
     MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "ACL error code {}: {}", int(err), aclGetRecentErrMsg()); \
   }
 
 #define MLLM_ATB_CHECK(err)                                                                                    \
-  if (err != atb::NO_ERROR) {                                                                                  \
+  if ((err) != atb::NO_ERROR) {                                                                                \
     MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "ATB error code {}", int(err));                           \
   }

45-51: Consider default-initializing struct members.

AscendDeviceInfo members are uninitialized by default. Adding default values prevents undefined behavior if a field is not explicitly set.

 struct AscendDeviceInfo {
   std::string name;
-  unsigned int id;
-  size_t total_memory;  // bytes
-  size_t free_memory;   // bytes
+  unsigned int id{0};
+  size_t total_memory{0};  // bytes
+  size_t free_memory{0};   // bytes
   std::string soc_version;
 };

81-85: Consider adding [[nodiscard]] to accessor functions.

Functions like valid(), tensor(), and blockId() return values that should typically not be ignored.

-  void release();  // Manually release memory block and invalidate handle
-  bool valid() const { return block_id_ >= 0; }  // Check if handle owns a valid memory block
+  void release();
+  [[nodiscard]] bool valid() const { return block_id_ >= 0; }
 
-  Tensor& tensor() { return tensor_; }  // Access tensor
-  const Tensor& tensor() const { return tensor_; }  // Access tensor (const)
-  int blockId() const { return block_id_; }  // Get memory block ID
+  [[nodiscard]] Tensor& tensor() { return tensor_; }
+  [[nodiscard]] const Tensor& tensor() const { return tensor_; }
+  [[nodiscard]] int blockId() const { return block_id_; }

117-127: Delete copy/move operations for AscendTimer.

Since AscendTimer is an RAII timer that manages timing state, it should not be copied or moved. Explicitly delete these operations to prevent misuse.

 class AscendTimer {
  public:
   explicit AscendTimer(const char* tag, bool sync_before = true, bool sync_after = true);
   ~AscendTimer();
 
+  AscendTimer(const AscendTimer&) = delete;
+  AscendTimer& operator=(const AscendTimer&) = delete;
+  AscendTimer(AscendTimer&&) = delete;
+  AscendTimer& operator=(AscendTimer&&) = delete;
+
  private:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b37bb5 and 939a216.

📒 Files selected for processing (5)
  • docs/ascend_backend/core_design.rst (1 hunks)
  • examples/ascend_add_demo/main.cpp (1 hunks)
  • mllm/backends/ascend/AscendCommon.cpp (1 hunks)
  • mllm/backends/ascend/AscendCommon.hpp (1 hunks)
  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/ascend_backend/core_design.rst
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)


Files:

  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendCommon.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)


Files:

  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendCommon.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)


Files:

  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
  • mllm/backends/ascend/AscendCommon.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendCommon.hpp
🧬 Code graph analysis (4)
mllm/backends/ascend/ops/AscnedElewiseOps.cpp (1)
mllm/backends/ascend/AscendCommon.cpp (6)
  • getGlobalAtbContext (173-195)
  • getGlobalAtbContext (173-173)
  • fillAtbTensorDesc (210-219)
  • fillAtbTensorDesc (210-210)
  • syncGlobalAtbStream (202-208)
  • syncGlobalAtbStream (202-202)
mllm/backends/ascend/AscendCommon.cpp (3)
mllm/backends/ascend/AscendCommon.hpp (12)
  • AscendTensorHandle (71-71)
  • AscendTensorHandle (72-72)
  • AscendTensorHandle (73-73)
  • AscendTensorHandle (75-75)
  • AscendTensorHandle (77-77)
  • other (78-78)
  • tensor_ (83-83)
  • tensor_ (84-84)
  • block_id_ (81-81)
  • block_id_ (85-85)
  • AscendDeviceMetaInfo (56-56)
  • AscendDeviceMetaInfo (63-63)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (3)
  • block_id (22-22)
  • block_id (24-24)
  • size (20-20)
mllm/backends/ascend/memory/AscendMemoryManager.cpp (2)
  • getAscendMemoryManager (15-18)
  • getAscendMemoryManager (15-15)
mllm/backends/ascend/AscendCommon.hpp (1)
mllm/backends/ascend/AscendCommon.cpp (7)
  • AscendTensorHandle (23-24)
  • AscendTensorHandle (26-26)
  • AscendTensorHandle (28-31)
  • other (33-41)
  • other (33-33)
  • AscendTimer (154-162)
  • AscendTimer (164-171)
examples/ascend_add_demo/main.cpp (3)
mllm/backends/ascend/AscendCommon.cpp (6)
  • prepareAscendTensor (54-81)
  • prepareAscendTensor (54-56)
  • verifyAscendTensor (103-142)
  • verifyAscendTensor (103-108)
  • verifyAscendTensor (144-152)
  • verifyAscendTensor (144-149)
mllm/backends/ascend/AscendCommon.hpp (3)
  • prepareAscendTensor (93-95)
  • verifyAscendTensor (101-106)
  • verifyAscendTensor (109-114)
mllm/backends/qnn/op/QNNElewiseOp.hpp (1)
  • kAdd (36-38)
🪛 Clang (14.0.6)
mllm/backends/ascend/AscendCommon.cpp

[error] 54-54: 3 adjacent parameters of 'prepareAscendTensor' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 57-57: variable 'expected_elements' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 68-68: variable 'device_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 69-69: variable 'bytes' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 87-87: variable 'elem_cnt' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 103-103: 3 adjacent parameters of 'verifyAscendTensor' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 103-103: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 122-122: variable name 'ok' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 144-144: 3 adjacent parameters of 'verifyAscendTensor' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 144-144: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 174-174: variable 'ctx' provides global access to a non-const object; consider making the pointed-to data 'const'

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 210-210: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/ascend/AscendCommon.hpp

[error] 19-19: function-like macro 'MLLM_ACL_CHECK' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 20-20: macro argument should be enclosed in parentheses

(bugprone-macro-parentheses,-warnings-as-errors)


[error] 25-25: function-like macro 'MLLM_ATB_CHECK' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 26-26: macro argument should be enclosed in parentheses

(bugprone-macro-parentheses,-warnings-as-errors)


[error] 42-42: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 45-45: constructor does not initialize these fields: name, id, total_memory, free_memory, soc_version

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 54-54: class 'AscendDeviceMetaInfo' defines a copy constructor and a copy assignment operator but does not define a destructor, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 70-70: constructor does not initialize these fields: tensor_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 81-81: function 'valid' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 84-84: function 'tensor' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 85-85: function 'blockId' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 101-101: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 103-103: 1e-2f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 103-103: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 104-104: 1e-2f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 104-104: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 109-109: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 111-111: 1e-2f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 111-111: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 112-112: 1e-2f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 112-112: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 117-117: class 'AscendTimer' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 130-130: function-like macro 'ASCEND_TIME_SCOPE' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 131-131: function-like macro 'ASCEND_TIME_SCOPE_NOSYNC' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)

examples/ascend_add_demo/main.cpp

[error] 62-62: variable 'correct' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 84-84: if with identical then and else branches

(bugprone-branch-clone,-warnings-as-errors)

🔇 Additional comments (6)
examples/ascend_add_demo/main.cpp (1)

14-99: Demo structure looks good.

The demo properly initializes the backend, prepares test data, executes the Add operation, verifies results with tolerances, and handles errors via try/catch. The use of AscendTensorHandle RAII for resource management is appropriate.

mllm/backends/ascend/ops/AscnedElewiseOps.cpp (1)

37-118: Forward implementation looks correct.

The ATB workflow follows the expected pattern: create operation, setup tensors, configure VariantPack, setup workspace, execute, synchronize, and cleanup. Error handling is comprehensive with appropriate MLLM_ERROR_EXIT calls.

mllm/backends/ascend/AscendCommon.cpp (3)

23-52: AscendTensorHandle RAII implementation is correct.

Move semantics properly transfer ownership by resetting block_id_ to -1 in the source. The release() method correctly handles both managed blocks (via memory manager) and unmanaged tensors.


54-81: Note: Returned handle has block_id = -1.

This is intentional since memory is allocated via Tensor::empty/alloc rather than the memory manager. The release() method correctly handles this case by calling tensor.delete_().


173-195: Thread-safe ATB context initialization is correct.

Using std::call_once ensures the context is initialized exactly once, even under concurrent access. The initialization sequence (set device → create context → create stream → set execute stream) follows ATB requirements.

mllm/backends/ascend/AscendCommon.hpp (1)

1-133: Header structure and API documentation are good.

The header provides clear documentation for each function and class. The namespace organization and include structure are appropriate for a backend utility header.

Comment on lines +221 to +282
AscendDeviceMetaInfo::AscendDeviceMetaInfo() {
#ifndef ASCENDC_CPU_DEBUG
// Initialize ACL to query devices
auto ret = aclInit(nullptr);
if (ret != ACL_SUCCESS) {
MLLM_ERROR("Failed to initialize ACL for device enumeration: {}", ret);
return;
}

// Get device count
uint32_t device_count = 0;
ret = aclrtGetDeviceCount(&device_count);
if (ret != ACL_SUCCESS) {
MLLM_ERROR("Failed to get Ascend device count: {}", ret);
aclFinalize();
return;
}

// Collect info for each device
for (uint32_t i = 0; i < device_count; ++i) {
AscendDeviceInfo info;
info.id = i;
info.name = "Ascend Device " + std::to_string(i);

// Set device to query its properties
ret = aclrtSetDevice(i);
if (ret == ACL_SUCCESS) {
// Get memory information
size_t free_mem = 0, total_mem = 0;
ret = aclrtGetMemInfo(ACL_HBM_MEM, &free_mem, &total_mem);
if (ret == ACL_SUCCESS) {
info.total_memory = total_mem;
info.free_memory = free_mem;
} else {
info.total_memory = 0;
info.free_memory = 0;
}

// SOC version - platform specific, set to unknown for now
info.soc_version = "Unknown";
} else {
info.total_memory = 0;
info.free_memory = 0;
info.soc_version = "Unknown";
}

devices.push_back(info);
}

// Finalize ACL after enumeration
aclFinalize();
#else
// In CPU debug mode, add a dummy device
AscendDeviceInfo info;
info.id = 0;
info.name = "Ascend CPU Debug Device";
info.total_memory = 0;
info.free_memory = 0;
info.soc_version = "CPU_DEBUG";
devices.push_back(info);
#endif
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Search for AscendDeviceMetaInfo and getGlobalAtbContext usage
rg -n 'AscendDeviceMetaInfo::instance|getGlobalAtbContext' --type=cpp

Repository: UbiquitousLearning/mllm

Length of output: 513


🏁 Script executed:

#!/bin/bash
# First, locate the AscendCommon files and understand the structure
find . -name "*.cpp" -o -name "*.h" | grep -i ascend | head -20

Repository: UbiquitousLearning/mllm

Length of output: 502


🏁 Script executed:

#!/bin/bash
# Search for usage of both functions
rg -n 'AscendDeviceMetaInfo|getGlobalAtbContext' --type cpp --type hpp

Repository: UbiquitousLearning/mllm

Length of output: 94


🏁 Script executed:

#!/bin/bash
# Look at the file structure and find relevant files
ls -la mllm/backends/ascend/ 2>/dev/null || echo "Path not found, searching..."
find . -path "*/ascend/*" -name "*.cpp" -o -path "*/ascend/*" -name "*.h" | head -20

Repository: UbiquitousLearning/mllm

Length of output: 1585


🏁 Script executed:

#!/bin/bash
# Get the file content to understand the context better
find . -type f \( -name "*.cpp" -o -name "*.h" \) -path "*ascend*" 2>/dev/null | head -5

Repository: UbiquitousLearning/mllm

Length of output: 269


🏁 Script executed:

#!/bin/bash
# Search for the usage pattern
rg 'AscendDeviceMetaInfo::instance' . --type cpp --type hpp -A 3 2>/dev/null | head -40

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Search for getGlobalAtbContext
rg 'getGlobalAtbContext' . --type cpp --type hpp -A 3 2>/dev/null | head -40

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Check where aclFinalize is called
rg 'aclFinalize' . --type cpp --type hpp -B 2 -A 2 2>/dev/null

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

#!/bin/bash
stat mllm/backends/ascend/AscendCommon.cpp 2>/dev/null && echo "File exists" || echo "File not found"
ls -la mllm/backends/ascend/ 2>/dev/null | head -10

Repository: UbiquitousLearning/mllm

Length of output: 667


🏁 Script executed:

#!/bin/bash
# Read AscendCommon.cpp to see the full context of both functions
cat -n mllm/backends/ascend/AscendCommon.cpp | sed -n '150,290p'

Repository: UbiquitousLearning/mllm

Length of output: 4570


🏁 Script executed:

#!/bin/bash
# Read AscendBackend.cpp to see the usage pattern
cat -n mllm/backends/ascend/AscendBackend.cpp

Repository: UbiquitousLearning/mllm

Length of output: 1239


🏁 Script executed:

#!/bin/bash
# Look at the getGlobalAtbContext implementation more closely
rg -A 30 'atb::Context\* getGlobalAtbContext' mllm/backends/ascend/AscendCommon.cpp

Repository: UbiquitousLearning/mllm

Length of output: 783


🏁 Script executed:

#!/bin/bash
# Also check if aclInit and aclFinalize are called anywhere else
rg 'aclInit|aclFinalize|aclrtSetDevice' mllm/backends/ascend/ --type cpp

Repository: UbiquitousLearning/mllm

Length of output: 628


Remove aclFinalize() call from AscendDeviceMetaInfo constructor or defer it until backend shutdown.

Calling aclFinalize() at line 271 after device enumeration terminates the ACL runtime. Since getGlobalAtbContext() (used later in operations) requires aclrtSetDevice() to function, subsequent ACL operations will fail if invoked after this constructor returns. Either:

  1. Remove the aclFinalize() call and let backend shutdown handle cleanup
  2. Ensure getGlobalAtbContext() is initialized before device enumeration
🤖 Prompt for AI Agents
In mllm/backends/ascend/AscendCommon.cpp around lines 221 to 282, the
constructor calls aclFinalize() after enumerating devices which shuts down the
ACL runtime and breaks subsequent ACL calls (e.g.,
aclrtSetDevice()/getGlobalAtbContext()); remove the aclFinalize() call from this
constructor (do not finalize ACL here) and instead ensure ACL is finalized once
during backend shutdown/cleanup; update error paths to avoid finalizing ACL
prematurely and document that ACL initialization lifetime spans device
enumeration through backend shutdown.

Comment on lines +130 to +131
#define ASCEND_TIME_SCOPE(tag) ::mllm::ascend::AscendTimer timer_scope_##__LINE__(tag, true, true)
#define ASCEND_TIME_SCOPE_NOSYNC(tag) ::mllm::ascend::AscendTimer timer_scope_##__LINE__(tag, false, false)

⚠️ Potential issue | 🟡 Minor

Fix __LINE__ concatenation in macros.

__LINE__ requires a two-level macro expansion to work correctly in token concatenation. Currently, the variable name will literally be timer_scope___LINE__ rather than timer_scope_123.

+#define ASCEND_TIME_SCOPE_CONCAT_INNER(a, b) a##b
+#define ASCEND_TIME_SCOPE_CONCAT(a, b) ASCEND_TIME_SCOPE_CONCAT_INNER(a, b)
-#define ASCEND_TIME_SCOPE(tag) ::mllm::ascend::AscendTimer timer_scope_##__LINE__(tag, true, true)
-#define ASCEND_TIME_SCOPE_NOSYNC(tag) ::mllm::ascend::AscendTimer timer_scope_##__LINE__(tag, false, false)
+#define ASCEND_TIME_SCOPE(tag) ::mllm::ascend::AscendTimer ASCEND_TIME_SCOPE_CONCAT(timer_scope_, __LINE__)(tag, true, true)
+#define ASCEND_TIME_SCOPE_NOSYNC(tag) ::mllm::ascend::AscendTimer ASCEND_TIME_SCOPE_CONCAT(timer_scope_, __LINE__)(tag, false, false)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#define ASCEND_TIME_SCOPE(tag) ::mllm::ascend::AscendTimer timer_scope_##__LINE__(tag, true, true)
#define ASCEND_TIME_SCOPE_NOSYNC(tag) ::mllm::ascend::AscendTimer timer_scope_##__LINE__(tag, false, false)
#define ASCEND_TIME_SCOPE_CONCAT_INNER(a, b) a##b
#define ASCEND_TIME_SCOPE_CONCAT(a, b) ASCEND_TIME_SCOPE_CONCAT_INNER(a, b)
#define ASCEND_TIME_SCOPE(tag) ::mllm::ascend::AscendTimer ASCEND_TIME_SCOPE_CONCAT(timer_scope_, __LINE__)(tag, true, true)
#define ASCEND_TIME_SCOPE_NOSYNC(tag) ::mllm::ascend::AscendTimer ASCEND_TIME_SCOPE_CONCAT(timer_scope_, __LINE__)(tag, false, false)
🧰 Tools
🪛 Clang (14.0.6)

[error] 130-130: function-like macro 'ASCEND_TIME_SCOPE' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 131-131: function-like macro 'ASCEND_TIME_SCOPE_NOSYNC' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/backends/ascend/AscendCommon.hpp around lines 130-131, the macros use
token concatenation with __LINE__ directly which prevents proper expansion;
change them to use a two-level helper macro (e.g., a CONCAT/CONCAT_INNER pair)
so __LINE__ is expanded before pasting, then use the outer concatenation macro
to build the variable name (e.g., CONCAT(timer_scope_, __LINE__)) and keep the
rest of the macro arguments the same; update both ASCEND_TIME_SCOPE and
ASCEND_TIME_SCOPE_NOSYNC to call the outer helper to ensure the variable becomes
timer_scope_<lineNumber>.

Comment on lines +1 to +3
// Copyright (c) MLLM Team.
// Licensed under the MIT License.


⚠️ Potential issue | 🟡 Minor

Filename typo: "AscnedElewiseOps.cpp" should be "AscendElewiseOps.cpp".

The filename has a typo - "Ascned" instead of "Ascend". This should be corrected to maintain consistency with the header file name AscendElewiseOps.hpp.

🤖 Prompt for AI Agents
In mllm/backends/ascend/ops/AscnedElewiseOps.cpp lines 1-3, the filename
contains a typo ("AscnedElewiseOps.cpp"); rename the file to
AscendElewiseOps.cpp to match the header AscendElewiseOps.hpp and the project
naming, then update all references (includes, CMakeLists.txt / build files,
source lists, and any #include or forward declarations) to use the corrected
filename; ensure any case-sensitive platforms and CI are accounted for and run a
build to verify there are no remaining broken references.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
mllm/backends/ascend/ops/AscnedElewiseOps.cpp (1)

95-108: Workspace memory leak on error path.

If op->Execute() fails at line 107, the workspace allocated at line 99 is never freed because MLLM_ERROR_EXIT prevents reaching the cleanup code at lines 113-116. This is the same issue flagged in the previous review.

Use RAII or a scope guard pattern to ensure workspace is freed even on error:

struct WorkspaceGuard {
  AscendMemoryManager& mem_mgr_;
  int block_id_;
  WorkspaceGuard(AscendMemoryManager& mgr, int id) : mem_mgr_(mgr), block_id_(id) {}
  ~WorkspaceGuard() { 
    if (block_id_ != -1) mem_mgr_.freeBlock(block_id_); 
  }
  WorkspaceGuard(const WorkspaceGuard&) = delete;
  WorkspaceGuard& operator=(const WorkspaceGuard&) = delete;
};

Then use:

void* workspace = nullptr;
int workspace_block_id = -1;
if (workspaceSize > 0) {
  auto& mem_mgr = getAscendMemoryManager();
  mem_mgr.allocateBlock(static_cast<uint32_t>(workspaceSize), workspace_block_id);
  mem_mgr.getBlockPtr(workspace_block_id, workspace);
}
WorkspaceGuard ws_guard(getAscendMemoryManager(), workspace_block_id);  // Automatic cleanup
🧹 Nitpick comments (1)
mllm/backends/ascend/ops/AscnedElewiseOps.cpp (1)

22-36: Remove commented-out code.

The commented-out allocation code (lines 23-34) should be removed entirely rather than left in the source. If this code serves as documentation of a previous approach, consider adding a brief comment explaining why BaseOp::setup() is used instead, or move the explanation to documentation.

Apply this diff to clean up:

 void AscendAddOp::setup(const std::vector<Tensor>& inputs, std::vector<Tensor>& outputs) {
-  // for (auto& t : outputs) {
-  //   if (!t.isNil()) {
-  //     auto& mem_mgr = getAscendMemoryManager();
-  //     int block_id = -1;
-  //     void* device_ptr = nullptr;
-      
-  //     mem_mgr.allocateBlock(static_cast<uint32_t>(t.bytes()), block_id);
-  //     mem_mgr.getBlockPtr(block_id, device_ptr);
-      
-  //     t.impl()->storage()->ptr_ = device_ptr;
-  //   }
-  // }
   BaseOp::setup(inputs, outputs);
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 939a216 and b487ef4.

📒 Files selected for processing (1)
  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/ascend/ops/AscnedElewiseOps.cpp
🧬 Code graph analysis (1)
mllm/backends/ascend/ops/AscnedElewiseOps.cpp (2)
mllm/backends/ascend/AscendCommon.hpp (3)
  • getGlobalAtbContext (33-33)
  • fillAtbTensorDesc (42-42)
  • syncGlobalAtbStream (39-39)
mllm/backends/ascend/AscendCommon.cpp (6)
  • getGlobalAtbContext (173-195)
  • getGlobalAtbContext (173-173)
  • fillAtbTensorDesc (210-219)
  • fillAtbTensorDesc (210-210)
  • syncGlobalAtbStream (202-208)
  • syncGlobalAtbStream (202-202)
🔇 Additional comments (4)
mllm/backends/ascend/ops/AscnedElewiseOps.cpp (4)

1-16: LGTM: Copyright header and includes are appropriate.

The MIT license header follows project conventions, and the included headers (ATB/ACL libraries, memory manager, common utilities) are appropriate for implementing the Ascend Add operation.


18-21: LGTM: Constructor correctly delegates to base class.

The constructor properly forwards the options to the base aops::AddOp class.


38-51: LGTM: Input validation is thorough and appropriate.

The validation checks ensure:

  • Correct number of inputs/outputs
  • Dtype consistency across tensors
  • No-broadcast constraint for this initial implementation

The use of NYI for unsupported features clearly communicates current limitations.


110-119: Cleanup logic is correct but vulnerable to error paths.

The cleanup sequence (sync stream, free workspace, destroy operation) is correct. However, as flagged in previous comments, this cleanup is bypassed if MLLM_ERROR_EXIT is called earlier. Once RAII guards are added for workspace and operation, this code can remain as the happy-path cleanup with the guards handling error paths.

Comment on lines +53 to +93
atb::infer::ElewiseParam addParam;
addParam.elewiseType = atb::infer::ElewiseParam::ELEWISE_ADD;

atb::Operation* op = nullptr;
auto st = atb::CreateOperation(addParam, &op);
if (st != atb::NO_ERROR || op == nullptr) {
MLLM_ERROR_EXIT(ExitCode::kAscendError, "ATB CreateOperation(ELEWISE_ADD) failed, status={}", static_cast<int>(st));
}

atb::Context* atb_ctx = getGlobalAtbContext();

atb::Tensor atb_x;
atb::Tensor atb_y;
atb::Tensor atb_z;

fillAtbTensorDesc(x, atb_x.desc);
fillAtbTensorDesc(y, atb_y.desc);
fillAtbTensorDesc(z, atb_z.desc);

atb_x.deviceData = reinterpret_cast<uint8_t*>(x.ptr<void>());
atb_x.dataSize = x.bytes();
atb_y.deviceData = reinterpret_cast<uint8_t*>(y.ptr<void>());
atb_y.dataSize = y.bytes();
atb_z.deviceData = reinterpret_cast<uint8_t*>(z.ptr<void>());
atb_z.dataSize = z.bytes();

atb::SVector<atb::Tensor> inTensors;
atb::SVector<atb::Tensor> outTensors;
inTensors.push_back(atb_x);
inTensors.push_back(atb_y);
outTensors.push_back(atb_z);

atb::VariantPack vp;
vp.inTensors = inTensors;
vp.outTensors = outTensors;

uint64_t workspaceSize = 0;
st = op->Setup(vp, workspaceSize, atb_ctx);
if (st != atb::NO_ERROR) {
MLLM_ERROR_EXIT(ExitCode::kAscendError, "ATB AddOp Setup failed, status={}", static_cast<int>(st));
}

⚠️ Potential issue | 🔴 Critical

Resource leak: ATB operation not destroyed on error.

If op->Setup() fails at line 92, MLLM_ERROR_EXIT is called before reaching atb::DestroyOperation(op) at line 118. The ATB operation is leaked.

Consider using RAII or a scope guard to ensure cleanup:

// Add at the top of forward() or create a helper class
struct AtbOperationGuard {
  atb::Operation* op_;
  explicit AtbOperationGuard(atb::Operation* op) : op_(op) {}
  ~AtbOperationGuard() { if (op_) atb::DestroyOperation(op_); }
  AtbOperationGuard(const AtbOperationGuard&) = delete;
  AtbOperationGuard& operator=(const AtbOperationGuard&) = delete;
};

Then use:

auto st = atb::CreateOperation(addParam, &op);
if (st != atb::NO_ERROR || op == nullptr) {
  MLLM_ERROR_EXIT(ExitCode::kAscendError, "ATB CreateOperation(ELEWISE_ADD) failed, status={}", static_cast<int>(st));
}
AtbOperationGuard op_guard(op);  // Automatic cleanup
🤖 Prompt for AI Agents
In mllm/backends/ascend/ops/AscnedElewiseOps.cpp around lines 53-93, the ATB
Operation created by atb::CreateOperation is not cleaned up if op->Setup(...)
fails, causing a resource leak; after successful CreateOperation (and null
check) instantiate a small RAII guard or scope-guard object that holds the
atb::Operation* and calls atb::DestroyOperation in its destructor (delete
copy/move as appropriate), and place this guard immediately after the success
path so any early returns or error exits (like MLLM_ERROR_EXIT) will still
destroy the operation; ensure the guard only runs when op is non-null and that
the normal end-of-function cleanup does not double-destroy (e.g., release or
reset the guard before the existing explicit DestroyOperation call).

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
mllm/backends/ascend/AscendCommon.cpp (1)

220-281: Critical: aclFinalize() call breaks subsequent ACL operations.

Calling aclFinalize() at line 270 terminates the ACL runtime immediately after device enumeration. Since getGlobalAtbContext() (used by operators) requires a valid ACL runtime and calls aclrtSetDevice(), any subsequent ACL operations will fail after this constructor completes.

This issue was previously flagged in past review comments and remains unresolved.

Recommended solution:
Remove the aclFinalize() call from this constructor and defer ACL finalization to backend shutdown or application exit. ACL should remain initialized for the lifetime of the backend.

🔎 Proposed fix
     devices.push_back(info);
   }
 
-  // Finalize ACL after enumeration
-  aclFinalize();
+  // Note: ACL should remain initialized for the lifetime of the backend.
+  // Finalization will be handled during backend shutdown.
 #else
   // In CPU debug mode, add a dummy device
🧹 Nitpick comments (7)
docs/ascend_backend/index.rst (1)

8-9: Consider removing extra trailing empty line.

The file ends with two empty lines. Per coding guidelines, it's recommended to have only a single trailing newline.

🔎 Proposed fix
 
   core_design
-
-
+
tests/ascend/KernelTestHelper.hpp (1)

7-17: Consider removing empty SetUp/TearDown methods if unused.

The SetUp() and TearDown() methods are currently empty. If no per-test setup or cleanup is needed, these methods can be removed entirely, as GoogleTest provides default implementations.

🔎 Proposed simplification
 class KernelTest : public testing::Test {
  public:
   KernelTest() = default;
   ~KernelTest() override = default;
-
-  // If the constructor and destructor are not enough for setting up
-  // and cleaning up each test, you can define the following methods:
-  void SetUp() override {}
-
-  void TearDown() override {}
 };

Note: The static analysis warning about special member functions is a false positive for test fixtures and can be safely ignored.

tests/CMakeLists.txt (1)

14-16: Consider consistency with other backend patterns.

The Ascend backend uses an environment variable check (ENV{ASCEND_HOME_PATH}), while the CUDA backend uses a CMake option (MLLM_BUILD_CUDA_BACKEND). This may be intentional due to SDK location requirements, but consider whether a CMake option with a separate ASCEND_HOME_PATH variable would provide more consistent build configuration.

mllm/backends/ascend/ops/AscendX2XOp.cpp (2)

34-44: Simplify error handling by removing redundant check.

The if (ret != ACL_SUCCESS) check before MLLM_ACL_CHECK(ret) is redundant. The macro should handle error checking internally, so you can call it directly after the memcpy operation.

🔎 Proposed simplification
     auto ret = aclrtMemcpy(
         dst_data, data_size,
         src_data, data_size,
         ACL_MEMCPY_HOST_TO_DEVICE);
-
-    if (ret != ACL_SUCCESS) {
-      MLLM_ACL_CHECK(ret);
-    }
+    MLLM_ACL_CHECK(ret);
 
     syncGlobalAtbStream();

Apply the same simplification to lines 54-61 and 74-81 for consistency.


73-85: Skip synchronization when source and destination pointers are identical.

When src_data == dst_data, the memcpy is correctly skipped, but the function still calls syncGlobalAtbStream() at line 83 (outside the if-block). For consistency and to avoid unnecessary synchronization overhead, move the sync call inside the if-block.

🔎 Proposed fix
     if (src_data != dst_data) {
       auto ret = aclrtMemcpy(
           dst_data, data_size,
           src_data, data_size,
           ACL_MEMCPY_DEVICE_TO_DEVICE);
-
-      if (ret != ACL_SUCCESS) {
-        MLLM_ACL_CHECK(ret);
-      }
-
+      MLLM_ACL_CHECK(ret);
       syncGlobalAtbStream();
     }
     return;
mllm/backends/ascend/AscendCommon.cpp (2)

209-218: Rename short parameter name for better readability.

The parameter name t at line 209 is too short per coding guidelines. Rename it to a more descriptive identifier such as tensor.

Based on coding guidelines, parameter names should be at least 3 characters long for clarity.

🔎 Proposed fix
-void fillAtbTensorDesc(const Tensor& t, atb::TensorDesc& desc) {
+void fillAtbTensorDesc(const Tensor& tensor, atb::TensorDesc& desc) {
   desc.dtype = ACL_FLOAT16; // Currently hardcoded as per demo, can be expanded later
   desc.format = ACL_FORMAT_ND;
 
-  auto shape = t.shape();
+  auto shape = tensor.shape();
   desc.shape.dimNum = static_cast<uint64_t>(shape.size());
   for (uint64_t i = 0; i < desc.shape.dimNum; ++i) {
     desc.shape.dims[i] = static_cast<int64_t>(shape[i]);
   }
 }

Note: This change should also be reflected in the corresponding header file declaration.


121-135: Rename short variable name for better readability.

The variable name ok at line 121 is too short per coding guidelines. Consider renaming it to a more descriptive identifier such as is_valid or all_match.

🔎 Proposed fix
-  bool ok = true;
+  bool is_valid = true;
   for (size_t i = 0; i < actual.size(); ++i) {
     const float diff = std::abs(actual[i] - expected[i]);
     const float thr = atol + rtol * std::abs(expected[i]);
     if (diff > thr) {
-      ok = false;
+      is_valid = false;
       if (verbose) {
         std::cout << "[AscendVerify] idx " << i
                   << " expected " << expected[i]
                   << " got " << actual[i]
                   << " diff " << diff
                   << " thr " << thr << "\n";
       }
     }
   }
 
   if (verbose) {
-    std::cout << (ok ? "[AscendVerify] OK" : "[AscendVerify] FAIL") << "\n";
+    std::cout << (is_valid ? "[AscendVerify] OK" : "[AscendVerify] FAIL") << "\n";
   }
-  return ok;
+  return is_valid;
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b487ef4 and bd8eea4.

📒 Files selected for processing (13)
  • docs/ascend_backend/core_design.rst (1 hunks)
  • docs/ascend_backend/index.rst (1 hunks)
  • docs/index.rst (1 hunks)
  • examples/CMakeLists.txt (1 hunks)
  • mllm/backends/ascend/AscendBackend.cpp (1 hunks)
  • mllm/backends/ascend/AscendCommon.cpp (1 hunks)
  • mllm/backends/ascend/ops/AscendX2XOp.cpp (1 hunks)
  • mllm/backends/ascend/ops/AscendX2XOp.hpp (1 hunks)
  • tests/CMakeLists.txt (1 hunks)
  • tests/ascend/AscendKernelTest.hpp (1 hunks)
  • tests/ascend/CMakeLists.txt (1 hunks)
  • tests/ascend/KernelTest.cpp (1 hunks)
  • tests/ascend/KernelTestHelper.hpp (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • examples/CMakeLists.txt
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/ascend/ops/AscendX2XOp.hpp
  • mllm/backends/ascend/ops/AscendX2XOp.cpp
  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/AscendCommon.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/ascend/ops/AscendX2XOp.hpp
  • mllm/backends/ascend/ops/AscendX2XOp.cpp
  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/AscendCommon.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/ascend/ops/AscendX2XOp.hpp
  • mllm/backends/ascend/ops/AscendX2XOp.cpp
  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/AscendCommon.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/ascend/ops/AscendX2XOp.cpp
  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/AscendCommon.cpp
🧬 Code graph analysis (1)
mllm/backends/ascend/ops/AscendX2XOp.cpp (1)
mllm/backends/ascend/ops/AscendX2XOp.hpp (4)
  • AscendX2XOp (14-14)
  • options (21-23)
  • options (21-21)
  • inputs (16-16)
🪛 Clang (14.0.6)
tests/ascend/AscendKernelTest.hpp

[error] 12-12: class 'AscendKernelTest' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 18-18: method 'AddFloat16Test' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)

tests/ascend/KernelTestHelper.hpp

[error] 7-7: class 'KernelTest' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/ascend/AscendCommon.cpp

[error] 54-54: 3 adjacent parameters of 'prepareAscendTensor' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 57-57: variable 'expected_elements' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 68-68: variable 'device_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 69-69: variable 'bytes' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 92-92: variable 'elem_cnt' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 95-95: variable 'src' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 102-102: 3 adjacent parameters of 'verifyAscendTensor' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 102-102: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 121-121: variable name 'ok' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 143-143: 3 adjacent parameters of 'verifyAscendTensor' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 143-143: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 173-173: variable 'ctx' provides global access to a non-const object; consider making the pointed-to data 'const'

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 209-209: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

tests/ascend/KernelTest.cpp

[error] 16-16: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)


[error] 21-21: 8 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 21-21: 8 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 22-22: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 22-22: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 23-23: 32 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 23-23: 32 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

🔇 Additional comments (8)
docs/index.rst (1)

354-358: LGTM!

The Ascend backend documentation entry is properly integrated into the main documentation index with consistent formatting.

tests/ascend/KernelTest.cpp (2)

16-26: LGTM!

The test systematically validates the Add operation across various tensor shapes, from small ({2,3}) to larger ({32,32}) dimensions. The static analysis warnings about magic numbers are false positives—these values represent test tensor dimensions and are self-documenting in this context.


28-44: LGTM!

The main function properly orchestrates the test lifecycle: initializes GoogleTest, sets up the Ascend backend and runtime context, executes tests, reports memory usage for leak detection, and performs cleanup. The sequence is correct and demonstrates good test hygiene.

tests/ascend/CMakeLists.txt (1)

1-26: LGTM!

The CMake configuration properly sets up the Ascend kernel test target with:

  • Conditional SDK path inclusion when ASCEND_HOME_PATH is defined
  • Correct library dependencies (GoogleTest, runtime, backends, Ascend CL)
  • Appropriate C++20 standard requirement
  • GoogleTest integration via include(GoogleTest)

The configuration follows CMake best practices and aligns with the project's build structure.

tests/ascend/AscendKernelTest.hpp (2)

18-50: Test implementation is sound with minor style consideration.

The test properly validates the Add operation by:

  1. Generating random FP16 inputs on CPU
  2. Computing a reference result with element-wise FP16 addition
  3. Executing the operation on Ascend hardware
  4. Validating results with allClose (tolerance 1e-2f, appropriate for FP16 precision)

Minor note: Line 19 uses using namespace mllm within the method. While marked with NOLINT, consider using explicit mllm:: prefixes to maintain namespace hygiene (purely stylistic for tests).

The static analysis suggestions can be safely ignored:

  • The "special member functions" warning is a false positive for test fixtures
  • Making the method static would prevent it from accessing inherited test fixture state if needed in the future

28-34: FP16 arithmetic is correctly supported.

The mllm_fp16_t type is defined as either float16_t (native ARM floating-point type) or half_float::half (established library type with full operator support). Both implementations properly support the + operator for floating-point addition. The same pattern is used consistently in CPU ElementwiseKernelTest and is known to work correctly.

mllm/backends/ascend/AscendBackend.cpp (1)

14-23: Verify device enumeration success in constructor.

The constructor properly registers operation factories and logs device information during initialization. However, AscendDeviceMetaInfo::instance().devices is not guaranteed to be populated—device enumeration can fail silently if aclInit() or aclrtGetDeviceCount() fails. While errors are logged in the singleton constructor, the Backend constructor does not validate that devices were successfully enumerated. If initialization fails, the devices vector will be empty and the logging loop will silently not execute, leaving users unaware of the failure.

Consider adding a check after accessing devices to ensure enumeration succeeded, or log a warning if no devices were found.

mllm/backends/ascend/AscendCommon.cpp (1)

102-107: Verify the parameter order to prevent accidental swaps.

Static analysis warns that parameters atol, rtol, and verbose at line 102 have similar or convertible types and could be easily swapped. While the current usage appears correct, consider using strongly-typed parameters or adding runtime validation to catch misuse.

You can add a comment documenting the parameter order, or wrap the tolerances in distinct strong types (or pass them via a parameters struct with designated initializers) so arguments cannot be swapped silently:

// Parameters: atol (absolute tolerance), rtol (relative tolerance), verbose (logging flag)
bool verifyAscendTensor(const Tensor& tensor,
                        const std::vector<float>& expected,
                        float atol,
                        float rtol,
                        bool verbose,
                        std::vector<float>* actual_out = nullptr);

MllmRT
MllmCPUBackend
MllmAscendBackend
ascendcl
Collaborator

ascendcl should be linked into MllmAscendBackend

testing::InitGoogleTest(&argc, argv);

// Initialize Ascend backend
mllm::initAscendBackend();
Collaborator

init mllm context before init ascend backend

if(MLLM_BUILD_CUDA_BACKEND)
add_subdirectory(cuda)
endif()
if(DEFINED ENV{ASCEND_HOME_PATH})
Collaborator

if (MLLM_BUILD_ASCEND_BACKEND)

set(ASCEND_CANN_PACKAGE_PATH "/usr/local/Ascend/ascend-toolkit/latest"
CACHE STRING "ASCEND CANN package installation directory"
)
set(SOC_VERSION "Ascend310B1" CACHE STRING "system on chip type")
Collaborator

Make this configurable in mllm's top cmake files

set(SOC_VERSION "Ascend310B1" CACHE STRING "system on chip type")

if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE "Debug" CACHE STRING "Build type Release/Debug (default Debug)" FORCE)
Collaborator

indent should use 2 spaces


if(DEFINED ENV{ASCEND_HOME_PATH})
target_include_directories(MllmAscendBackend PUBLIC $ENV{ASCEND_HOME_PATH}/include)
target_link_directories(MllmAscendBackend PRIVATE $ENV{ASCEND_HOME_PATH}/lib64)
Collaborator

make this PUBLIC


if(DEFINED ENV{ATB_HOME_PATH})
target_include_directories(MllmAscendBackend PUBLIC $ENV{ATB_HOME_PATH}/include)
target_link_directories(MllmAscendBackend PRIVATE $ENV{ATB_HOME_PATH}/lib)
Collaborator

make this PUBLIC

endif()


target_link_libraries(MllmAscendBackend PRIVATE
Collaborator

PUBLIC

@@ -0,0 +1,121 @@
// Copyright (c) MLLM Team.
Collaborator

The file name has typo Ascned -> Ascend

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
mllm/backends/ascend/ops/AscendElewiseOps.cpp (1)

44-48: Resource leak: ATB operation and workspace not destroyed on error paths.

If op->Setup() (line 79) or op->Execute() (line 95) fails, MLLM_ERROR_EXIT is called before atb::DestroyOperation(op) at line 106, leaking the ATB operation. Additionally, if Execute() fails after workspace allocation, the workspace is also leaked.

Consider using RAII for cleanup:

🔎 Proposed fix using scope guards
+#include <utility>  // for std::exchange if needed
+
+namespace {
+struct AtbOpGuard {
+  atb::Operation* op_ = nullptr;
+  explicit AtbOpGuard(atb::Operation* op) : op_(op) {}
+  ~AtbOpGuard() { if (op_) atb::DestroyOperation(op_); }
+  void release() { op_ = nullptr; }
+  AtbOpGuard(const AtbOpGuard&) = delete;
+  AtbOpGuard& operator=(const AtbOpGuard&) = delete;
+};
+}  // namespace
+
 void AscendAddOp::forward(const std::vector<Tensor>& inputs, std::vector<Tensor>& outputs) {
   // ... validation code ...
 
   atb::Operation* op = nullptr;
   auto st = atb::CreateOperation(addParam, &op);
   if (st != atb::NO_ERROR || op == nullptr) {
     MLLM_ERROR_EXIT(ExitCode::kAscendError, "ATB CreateOperation(ELEWISE_ADD) failed, status={}", static_cast<int>(st));
   }
+  AtbOpGuard op_guard(op);
 
   // ... rest of forward() ...
 
   syncGlobalAtbStream();
 
   if (workspace_block_id != -1) {
     auto& mem_mgr = getAscendMemoryManager();
     mem_mgr.freeBlock(workspace_block_id);
   }
 
-  atb::DestroyOperation(op);
+  op_guard.release();
+  atb::DestroyOperation(op);
 }

Also applies to: 77-81, 94-96

🧹 Nitpick comments (2)
CMakeLists.txt (1)

48-51: Consider adding validation for allowed values.

The new cache variables are well-documented. For improved user experience, consider constraining allowed values using set_property:

set(MLLM_RUN_MODE "npu" CACHE STRING "Run mode for mLLM backends: cpu/sim/npu")
set_property(CACHE MLLM_RUN_MODE PROPERTY STRINGS "cpu" "sim" "npu")

set(MLLM_SOC_VERSION "Ascend310B1" CACHE STRING "SOC version for Ascend backend")

This provides dropdown selection in CMake GUIs and catches invalid values early.

tests/ascend/CMakeLists.txt (1)

24-24: Unused include(GoogleTest) without test discovery.

The GoogleTest module is included but gtest_discover_tests() is never called, so the test won't be registered with CTest. Either add discovery or remove the include:

🔎 Option A: Add test discovery
 include(GoogleTest)
+gtest_discover_tests(Mllm-Test-AscendKernel)
🔎 Option B: Remove unused include
-include(GoogleTest)
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 7becc18 and 04e56be.

📒 Files selected for processing (8)
  • CMakeLists.txt
  • mllm/backends/ascend/CMakeLists.txt
  • mllm/backends/ascend/ops/AscendElewiseOps.cpp
  • mllm/backends/cpu/kernels/arm/rmsnorm.cpp
  • mllm/backends/cpu/kernels/arm/softmax.cpp
  • tests/CMakeLists.txt
  • tests/ascend/CMakeLists.txt
  • tests/ascend/KernelTest.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • mllm/backends/ascend/CMakeLists.txt
🧰 Additional context used
🧬 Code graph analysis (2)
mllm/backends/cpu/kernels/arm/rmsnorm.cpp (1)
mllm/backends/cpu/kernels/arm/math.hpp (2)
  • vsquare_mean_fp32 (270-284)
  • vsquare_mean_fp32 (270-270)
mllm/backends/cpu/kernels/arm/softmax.cpp (2)
mllm/nn/Functional.cpp (2)
  • exp (81-81)
  • exp (81-81)
mllm/nn/Functional.hpp (1)
  • exp (115-115)
🪛 Clang (14.0.6)
tests/ascend/KernelTest.cpp

[error] 16-16: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)


[error] 21-21: 8 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 21-21: 8 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 22-22: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 22-22: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 23-23: 32 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 23-23: 32 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

🔇 Additional comments (8)
mllm/backends/cpu/kernels/arm/rmsnorm.cpp (2)

20-20: LGTM! Improved consistency.

Replacing std::sqrtf with std::sqrt aligns this function with the rest of the file (rmsnorm_fp32_inplace at line 71 and rmsnorm_fp16_inplace at line 160 already use std::sqrt). The generic std::sqrt is more idiomatic in modern C++ and produces identical results for float arguments.


109-109: LGTM! Improved consistency.

Replacing std::sqrtf with std::sqrt aligns this function with rmsnorm_fp16_inplace (line 160). This change ensures all four RMS normalization functions in this file consistently use the generic std::sqrt, which is the more idiomatic C++ approach.

mllm/backends/cpu/kernels/arm/softmax.cpp (1)

28-28: Clarify the rationale for this change.

This modification to the CPU ARM softmax kernel appears unrelated to the PR's stated objective of adding an Ascend backend. Could you explain why these changes were made? If this is intended as a consistency improvement alongside the std::sqrt changes, stating that in the PR description would help reviewers.

Also applies to: 115-115, 166-166, 232-232

tests/ascend/KernelTest.cpp (2)

16-26: Test case structure looks good.

The magic numbers flagged by static analysis (8, 16, 32) are test input dimensions representing a range of sizes for FP16 add validation. In test code, explicit literal dimensions are appropriate since they serve as test case data rather than configuration values that need naming.

The test correctly validates multiple dimension pairs for the AddFloat16 operation.


28-44: Initialization and cleanup sequence is correct.

The main() function properly initializes the MLLM context before the Ascend backend, runs tests, reports memory usage, and shuts down the context. This is the correct order of operations.

tests/CMakeLists.txt (1)

14-16: Consistent pattern with existing backends.

The conditional block follows the same pattern as the CUDA backend inclusion, correctly guarding Ascend test compilation behind the MLLM_BUILD_ASCEND_BACKEND option.

mllm/backends/ascend/ops/AscendElewiseOps.cpp (2)

22-24: Setup delegates correctly to base class.

The setup() method properly delegates to BaseOp::setup() for tensor shape inference.


26-39: Good validation and clear NYI markers.

The input validation correctly asserts tensor counts, dtype consistency, and element count matching. The NYI markers clearly document the current limitations (same dtype, no broadcasting).

Comment on lines +85 to +88
if (workspaceSize > 0) {
auto& mem_mgr = getAscendMemoryManager();
mem_mgr.allocateBlock(static_cast<uint32_t>(workspaceSize), workspace_block_id);
mem_mgr.getBlockPtr(workspace_block_id, workspace);
Contributor

⚠️ Potential issue | 🟡 Minor

Narrowing conversion: uint64_t to uint32_t may truncate large workspace sizes.

workspaceSize is uint64_t but allocateBlock takes uint32_t. For large workspaces exceeding 4GB, this truncates silently. Consider either:

  1. Checking if workspaceSize exceeds UINT32_MAX and erroring
  2. Updating allocateBlock to accept uint64_t
🔎 Proposed bounds check
   if (workspaceSize > 0) {
+    if (workspaceSize > static_cast<uint64_t>(UINT32_MAX)) {
+      MLLM_ERROR_EXIT(ExitCode::kAscendError, "Workspace size {} exceeds 32-bit limit", workspaceSize);
+    }
     auto& mem_mgr = getAscendMemoryManager();
     mem_mgr.allocateBlock(static_cast<uint32_t>(workspaceSize), workspace_block_id);
     mem_mgr.getBlockPtr(workspace_block_id, workspace);
   }
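To make the failure mode concrete, here is a self-contained sketch (plain C++, no mllm types; `fitsInPoolApi` and `narrowed` are illustrative names) of how a just-over-4GiB request shrinks under the narrowing cast:

```cpp
#include <cassert>
#include <cstdint>

// Returns true when the request can be expressed losslessly as uint32_t.
bool fitsInPoolApi(uint64_t workspace_size) {
  return workspace_size <= UINT32_MAX;
}

// What the unchecked narrowing cast actually produces: the low 32 bits only.
uint32_t narrowed(uint64_t workspace_size) {
  return static_cast<uint32_t>(workspace_size);
}
```

Gating the cast behind a `fitsInPoolApi`-style check corresponds to option 1 above.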
🤖 Prompt for AI Agents
In mllm/backends/ascend/ops/AscendElewiseOps.cpp around lines 85 to 88,
workspaceSize is a uint64_t but allocateBlock accepts uint32_t which can
truncate values > UINT32_MAX; add a guard that checks if workspaceSize >
UINT32_MAX and handle it (log an error and return/fail or throw) before calling
allocateBlock, or alternatively update the memory manager API to accept uint64_t
and propagate the type change through allocateBlock and its callers; if keeping
allocateBlock as uint32_t, cast only after the bounds check and ensure
workspace_block_id and workspace are handled consistently on the error path.

  float sum = 0.f;
  for (int i = 0; i < len; ++i) {
-   auto tmp = std::expf(X[i * stride] - max_value);
+   auto tmp = std::exp(X[i * stride] - max_value);

⚠️ Potential issue | 🟠 Major

Performance regression: prefer std::expf for single-precision computations.

Replacing std::expf with std::exp introduces unnecessary double-precision computation in the scalar fallback paths. std::exp(float_arg) promotes the float to double, computes the exponential in double precision, then demotes back to float—significantly slower than std::expf which computes directly in single precision.

These scalar paths handle stride≠1, small tensors (len≤16), and tail elements. While the vectorized paths (which use vexpq_fast_f32) are unaffected, this change degrades performance for non-contiguous memory layouts and small inputs with no apparent benefit.

🔎 Proposed fix
-      auto tmp = std::exp(X[i * stride] - max_value);
+      auto tmp = std::expf(X[i * stride] - max_value);

And for the tail loop:

-    float tmp = std::exp(X[i] - max_value);
+    float tmp = std::expf(X[i] - max_value);

Also applies to: 115-115

🤖 Prompt for AI Agents
In mllm/backends/cpu/kernels/arm/softmax.cpp around lines 28 and 115, the scalar
fallback and tail loops use std::exp on float values which promotes to double
and slows single-precision paths; change these calls to std::expf (and any other
scalar float exp usages in those nearby lines) so the exponential is computed in
single precision, keeping vectorized vexpq_fast_f32 unaffected; ensure the
argument types remain float and that <cmath> is available for expf.

  float sum = 0.f;
  for (int i = 0; i < len; ++i) {
-   auto tmp = std::expf(X[i * stride] - max_value);
+   auto tmp = std::exp(X[i * stride] - max_value);

⚠️ Potential issue | 🟠 Major

Performance regression: prefer std::expf for half-precision inputs.

Replacing std::expf with std::exp introduces an unnecessarily expensive promotion path for half-precision (fp16) inputs. With fp16 arguments, std::exp promotes fp16→double (8 bytes), computes in double precision, then the result is cast to fp16. Using std::expf would promote fp16→float (4 bytes), compute in single precision—both faster promotion and faster computation.

This affects the same scalar fallback paths as the fp32 version (stride≠1, len≤16, tail elements), and is inconsistent with the vectorized paths that use vexpq_fast_f16.

🔎 Proposed fix
-      auto tmp = std::exp(X[i * stride] - max_value);
+      auto tmp = std::expf(X[i * stride] - max_value);

And for the tail loop:

-    float tmp = std::exp(X[i] - max_value);
+    float tmp = std::expf(X[i] - max_value);

Also applies to: 232-232

🤖 Prompt for AI Agents
In mllm/backends/cpu/kernels/arm/softmax.cpp around lines 166 and 232, the
scalar fallback uses std::exp which forces fp16 inputs to promote to double and
back; change these calls to std::expf so fp16 promotes to float and computes in
single precision for better performance and consistency with vexpq_fast_f16,
ensuring the surrounding expressions still use float temporaries (or cast to
float) before calling std::expf and then cast the result back to the fp16
storage type as needed.

Comment on lines +8 to +13
target_link_libraries(Mllm-Test-AscendKernel PRIVATE
  gtest_main
  MllmRT
  MllmCPUBackend
  MllmAscendBackend
)

⚠️ Potential issue | 🔴 Critical

Linker conflict: gtest_main provides main() but test has custom main().

KernelTest.cpp defines its own main() function for Ascend context initialization, but linking against gtest_main will also provide a main(). This causes a linker error (duplicate symbol).

Replace gtest_main with gtest:

🔎 Proposed fix
 target_link_libraries(Mllm-Test-AscendKernel PRIVATE
-  gtest_main
+  gtest
   MllmRT
   MllmCPUBackend
   MllmAscendBackend
 )
🤖 Prompt for AI Agents
In tests/ascend/CMakeLists.txt around lines 8 to 13, the target currently links
against gtest_main which provides its own main(), causing a duplicate-symbol
linker error because KernelTest.cpp defines a custom main(); replace gtest_main
with gtest in the target_link_libraries invocation so the test uses the gtest
library without bringing its main(), then reconfigure/rebuild to verify the
duplicate main() linker error is resolved.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (12)
mllm/backends/ascend/AscendCommon.hpp (1)

130-131: Fix __LINE__ concatenation in macros.

The __LINE__ macro requires a two-level expansion to work correctly in token concatenation. As flagged in a previous review, without the helper macros, the variable name will literally be timer_scope___LINE__.
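The standard fix is a two-macro indirection; the names below (`MLLM_CONCAT`, `TIMER_SCOPE_VAR`) are illustrative, not necessarily those in AscendCommon.hpp:

```cpp
// Two-level expansion: the outer macro expands __LINE__ first,
// then the inner macro pastes the tokens.
#define MLLM_CONCAT_IMPL(a, b) a##b
#define MLLM_CONCAT(a, b) MLLM_CONCAT_IMPL(a, b)

// Each use now gets a unique name: timer_scope_12, timer_scope_13, ...
#define TIMER_SCOPE_VAR int MLLM_CONCAT(timer_scope_, __LINE__) = 0

// Direct pasting (a##__LINE__) would emit the literal name
// timer_scope___LINE__, so two uses in one scope would fail to compile.
int demo() {
  TIMER_SCOPE_VAR;
  TIMER_SCOPE_VAR;  // a distinct variable thanks to the line number
  return 0;
}
```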

tests/ascend/CMakeLists.txt (1)

8-13: Linker conflict: gtest_main provides main() but test has custom main().

KernelTest.cpp defines its own main() function for Ascend context initialization, but linking against gtest_main will also provide a main(). This causes a duplicate symbol linker error.

🔎 Proposed fix
 target_link_libraries(Mllm-Test-AscendKernel PRIVATE
-  gtest_main
+  gtest
   MllmRT
   MllmCPUBackend
   MllmAscendBackend
 )
mllm/backends/ascend/memory/MemoryBlock.hpp (1)

11-15: Initialize all struct members to prevent undefined behavior.

The struct members block_id_ and block_size_ are uninitialized. If a MemoryBlock is default-constructed, these fields will contain garbage values, leading to undefined behavior when accessed.

🔎 Proposed fix
 struct MemoryBlock {
-    int64_t block_id_;          
-    size_t block_size_;         
+    int64_t block_id_ = -1;          
+    size_t block_size_ = 0;         
     void* address_ = nullptr;  
 };
mllm/backends/ascend/memory/AscendMemoryPool.cpp (1)

42-71: Alignment and capacity check issues previously flagged.

The issues with alignment math, capacity checking (> vs >=), and missing block_id = -1 on failure were already identified in a previous review comment. Please address those concerns.
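For reference, the usual power-of-two alignment round-up is a one-liner; this is a sketch of the arithmetic, not the pool's actual implementation:

```cpp
#include <cstddef>

// Round size up to the next multiple of align; align must be a power of two.
size_t alignUp(size_t size, size_t align) {
  return (size + align - 1) & ~(align - 1);
}
```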

mllm/backends/ascend/ops/AscendElewiseOps.cpp (2)

44-48: ATB operation leak on error paths previously flagged.

This issue was identified in a previous review. The ATB operation created at line 45 is not destroyed if Setup() (line 78-81) or Execute() (lines 94-96) fails via MLLM_ERROR_EXIT. Consider using RAII as suggested in the prior review.
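One RAII shape that fits is a std::unique_ptr with a custom deleter. `Operation` and `DestroyOperation` below are self-contained stand-ins for the ATB types (assumed, not quoted from the ATB headers):

```cpp
#include <cassert>
#include <memory>

// Stand-ins for atb::Operation / atb::DestroyOperation so this sketch
// compiles on its own.
struct Operation {};
inline int g_destroyed = 0;
void DestroyOperation(Operation* op) { ++g_destroyed; delete op; }

struct OpDeleter {
  void operator()(Operation* op) const {
    if (op != nullptr) DestroyOperation(op);
  }
};
using OpPtr = std::unique_ptr<Operation, OpDeleter>;

// The deleter fires on every exit path, including early error returns,
// so a failed Setup()/Execute() no longer leaks the operation.
bool runOp(bool setup_fails) {
  OpPtr op(new Operation{});
  if (setup_fails) return false;  // op is destroyed here too
  // ... Setup() / Execute() would go here ...
  return true;
}
```

Note that this protects early returns; a hard process exit (as MLLM_ERROR_EXIT may perform) bypasses destructors regardless.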


85-89: Truncation risk on workspace size previously flagged.

The cast from uint64_t workspaceSize to uint32_t was identified in a previous review as a potential truncation issue for workspaces exceeding 4GB.

mllm/backends/ascend/AscendDispatcher.hpp (1)

13-15: MLLM_EMPTY_SCOPE causes uninitialized field warning.

Static analysis reports that the implicit constructor doesn't initialize the field introduced by MLLM_EMPTY_SCOPE. If the options struct is truly empty, consider removing the macro or adding a default constructor.

🔎 Proposed fix
 struct AscendDispatcherOptions {
-  MLLM_EMPTY_SCOPE;
+  // Reserved for future options.
 };
mllm/backends/ascend/memory/AscendMemoryPool.hpp (1)

15-38: Missing copy/move semantics (Rule of Five) for class owning raw device memory.

AscendMemoryPool defines a destructor that frees base_mem_ptr_ but lacks copy/move constructors and assignment operators. Copying this class would cause double-free. Delete or explicitly define these operations.

🔎 Proposed fix
 class AscendMemoryPool {
 public:
     explicit AscendMemoryPool(size_t pool_size);
     ~AscendMemoryPool();

+    // Non-copyable, non-movable (owns device memory)
+    AscendMemoryPool(const AscendMemoryPool&) = delete;
+    AscendMemoryPool& operator=(const AscendMemoryPool&) = delete;
+    AscendMemoryPool(AscendMemoryPool&&) = delete;
+    AscendMemoryPool& operator=(AscendMemoryPool&&) = delete;
+
     void allocateBlock(uint32_t size, int& block_id);
mllm/backends/ascend/memory/AscendMemoryManager.cpp (2)

27-27: Check the return value of aclrtSetDevice.

If aclrtSetDevice fails, the memory pool will be created with an incorrect device context. This could cause device mismatch errors in subsequent operations.

🔎 Proposed fix
-        aclrtSetDevice(i);
+        auto set_ret = aclrtSetDevice(i);
+        MLLM_ACL_CHECK(set_ret);

43-50: Handle negative device ID before casting to size_t.

If getDeviceId() returns -1 (device not set), casting to size_t produces a huge value that passes the bounds check but accesses invalid memory. Check for negative values explicitly.

🔎 Proposed fix
 std::shared_ptr<AscendMemoryPool> &AscendMemoryManager::getMemoryPool()
 {
-    size_t device_id = static_cast<size_t>(getDeviceId());
+    if (memory_pools_.empty()) {
+        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Ascend memory pools not initialized");
+    }
+    const int32_t device_id_signed = getDeviceId();
+    if (device_id_signed < 0) {
+        MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Invalid device id {}", device_id_signed);
+    }
+    const size_t device_id = static_cast<size_t>(device_id_signed);
     if (device_id >= memory_pools_.size()) {
         MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "Invalid device id {}", device_id);
     }
     return memory_pools_[device_id];
 }
mllm/backends/ascend/AscendCommon.cpp (2)

269-270: aclFinalize() in constructor terminates ACL runtime prematurely.

Calling aclFinalize() after device enumeration will shut down the ACL runtime. Subsequent ACL operations (e.g., in getGlobalAtbContext()) will fail. Remove this call and let backend shutdown handle ACL cleanup.

🔎 Proposed fix
     devices.push_back(info);
   }

-  // Finalize ACL after enumeration
-  aclFinalize();
+  // Note: ACL should remain initialized for subsequent operations.
+  // aclFinalize() should only be called during backend shutdown.
 #else

209-218: Rename parameter t to a more descriptive name.

The parameter name t is too short per coding guidelines. Rename to tensor for clarity.

🔎 Proposed fix
-void fillAtbTensorDesc(const Tensor& t, atb::TensorDesc& desc) {
+void fillAtbTensorDesc(const Tensor& tensor, atb::TensorDesc& desc) {
   desc.dtype = ACL_FLOAT16; // Currently hardcoded as per demo, can be expanded later
   desc.format = ACL_FORMAT_ND;

-  auto shape = t.shape();
+  auto shape = tensor.shape();
   desc.shape.dimNum = static_cast<uint64_t>(shape.size());
🧹 Nitpick comments (15)
mllm/backends/ascend/ops/AscendElewiseOps.hpp (1)

12-25: LGTM!

The AscendAddOp and AscendAddOpFactory classes follow the established factory pattern used by other backends. The class hierarchy and method signatures are appropriate.

Consider adding brief docstrings for setup and forward methods to document their expected behavior and any Ascend-specific semantics. As per coding guidelines, public APIs should have clear comments explaining purpose, parameters, and behavior.

mllm/backends/ascend/AscendCommon.hpp (2)

19-28: Parenthesize macro arguments to prevent operator precedence issues.

The err argument in both macros should be wrapped in parentheses to avoid unexpected behavior if a complex expression is passed.

🔎 Proposed fix
 #define MLLM_ACL_CHECK(err)                                                                                    \
-  if (err != ACL_SUCCESS) {                                                                                    \
+  if ((err) != ACL_SUCCESS) {                                                                                  \
     MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "ACL error code {}: {}", int(err), aclGetRecentErrMsg()); \
   }

 #define MLLM_ATB_CHECK(err)                                                                                    \
-  if (err != atb::NO_ERROR) {                                                                                  \
+  if ((err) != atb::NO_ERROR) {                                                                                \
     MLLM_ERROR_EXIT(::mllm::ExitCode::kAscendError, "ATB error code {}", int(err));                           \
   }

54-67: Consider adding move semantics to AscendDeviceMetaInfo.

The class deletes copy operations but doesn't explicitly handle move operations or define a destructor. Since it's a singleton with a std::vector member, the compiler-generated defaults are likely fine, but explicitly defaulting them would silence the static analysis warning and clarify intent.

🔎 Proposed fix
 class AscendDeviceMetaInfo {
  public:
   AscendDeviceMetaInfo();
+  ~AscendDeviceMetaInfo() = default;

   static AscendDeviceMetaInfo& instance() {
     static AscendDeviceMetaInfo instance;
     return instance;
   }

   AscendDeviceMetaInfo(const AscendDeviceMetaInfo&) = delete;
   AscendDeviceMetaInfo& operator=(const AscendDeviceMetaInfo&) = delete;
+  AscendDeviceMetaInfo(AscendDeviceMetaInfo&&) = delete;
+  AscendDeviceMetaInfo& operator=(AscendDeviceMetaInfo&&) = delete;

   std::vector<AscendDeviceInfo> devices;
 };
tests/ascend/CMakeLists.txt (1)

24-24: Missing gtest_discover_tests call.

The include(GoogleTest) is present but there's no gtest_discover_tests() call to register the tests with CTest. This means ctest won't discover and run these tests.

🔎 Proposed fix
 include(GoogleTest)
+gtest_discover_tests(Mllm-Test-AscendKernel)
mllm/backends/ascend/AscendBackend.cpp (1)

14-23: Move lambda outside the loop for efficiency.

The bytes_to_mb lambda is defined inside the loop body, causing it to be recreated on each iteration. Move it before the loop for cleaner code.

🔎 Proposed fix
 AscendBackend::AscendBackend() : Backend(kAscend, createAscendAllocator()) {
   regOpFactory<AscendAddOpFactory>();
   regOpFactory<AscendX2XOpFactory>();
+  const auto bytes_to_mb = [](size_t bytes) { return bytes / (1024.0 * 1024.0); };
   auto& devices = AscendDeviceMetaInfo::instance().devices;
   for (const auto& device : devices) {
-    const auto bytes_to_mb = [](size_t bytes) { return bytes / (1024.0 * 1024.0); };
     MLLM_INFO("Found Ascend device {} (ID: {}, SOC: {}, Memory: {:.2f} MB free / {:.2f} MB total)", device.name,
               device.id, device.soc_version, bytes_to_mb(device.free_memory), bytes_to_mb(device.total_memory));
   }
 }
mllm/backends/ascend/AscendAllocator.hpp (1)

15-18: Consider explicitly deleting copy/move operations.

The class contains a std::mutex which is non-copyable, but explicitly deleting copy/move constructors and assignment operators makes the intent clear and satisfies the Rule of Five.

🔎 Proposed fix
 class AscendAllocator final : public Allocator {
  public:
   AscendAllocator();
   ~AscendAllocator();
+
+  AscendAllocator(const AscendAllocator&) = delete;
+  AscendAllocator& operator=(const AscendAllocator&) = delete;
+  AscendAllocator(AscendAllocator&&) = delete;
+  AscendAllocator& operator=(AscendAllocator&&) = delete;
 
   inline bool ctrlByMemManager() override { return false; }
tests/ascend/AscendKernelTest.hpp (1)

18-50: Improve const-correctness and add failure diagnostics.

Two minor issues:

  1. Line 20: Use const auto& since shapes is passed by const reference.
  2. Lines 45-47: Log which shape failed to aid debugging.
🔎 Proposed fix
   bool AddFloat16Test(const std::vector<mllm::Tensor::shape_t>& shapes) {
     using namespace mllm;  // NOLINT
-    for (auto& shape : shapes) {
+    for (const auto& shape : shapes) {
       // 1. Construct random FP16 inputs on CPU
       Tensor x_cpu = Tensor::random(shape, -3, 3, kFloat16, kCPU);
       Tensor y_cpu = Tensor::random(shape, -3, 3, kFloat16, kCPU);
       // ... existing code ...
       auto result = mllm::test::allClose(z_cpu, ref_cpu, 1e-2f, 1e-2f);
       if (!result.is_close) {
+        // Log failure details for debugging
+        MLLM_ERROR("AddFloat16Test failed for shape, max_diff={}", result.max_diff);
         return false;
       }
     }
     return true;
   }
mllm/backends/ascend/Register.cpp (1)

19-19: Consider making pool size configurable.

The hardcoded 100MB pool size may not be optimal for all use cases. Consider making this configurable via environment variable or initialization options.

🔎 Example
// Could read from environment or options struct
size_t pool_size = getenv("MLLM_ASCEND_POOL_SIZE_MB") 
    ? std::stoul(getenv("MLLM_ASCEND_POOL_SIZE_MB")) * 1024 * 1024
    : 100 * 1024 * 1024;  // Default 100MB
mllm/backends/ascend/memory/AscendMemoryManager.hpp (2)

22-22: Returning reference to shared_ptr is unusual.

getMemoryPool() returns std::shared_ptr<AscendMemoryPool>&. Typically, a shared_ptr is returned by value (for shared ownership) or by const reference (for read-only access). Returning a non-const reference lets callers reset or reassign the manager's internal pointer, silently dropping the pool for every other user.

🔎 Proposed fix
-    std::shared_ptr<AscendMemoryPool> &getMemoryPool();
+    std::shared_ptr<AscendMemoryPool> getMemoryPool();
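A minimal illustration of the hazard, with hypothetical `Pool`/`Manager` names standing in for the real classes:

```cpp
#include <cassert>
#include <memory>

struct Pool { int id = 0; };

struct Manager {
  std::shared_ptr<Pool> pool_ = std::make_shared<Pool>();
  std::shared_ptr<Pool>& getMemoryPoolRef() { return pool_; }  // hazardous
  std::shared_ptr<Pool> getMemoryPool() { return pool_; }      // safe copy
};
```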

24-24: uint32_t size limits allocations to 4GB.

Large model tensors or workspaces may exceed 4GB. Consider using size_t for future-proofing.

mllm/backends/ascend/memory/AscendMemoryPool.cpp (1)

90-104: Set addr to nullptr on failure for safer usage.

When block_id is invalid or not found, addr remains uninitialized/unchanged. Setting it to nullptr on error paths prevents use of stale values.

🔎 Proposed fix
 void AscendMemoryPool::getBlockPtr(int block_id, void *&addr) {
     std::unique_lock<std::mutex> lock(block_mutex_);
 
     if (block_id < 0) {
         MLLM_INFO("Invalid block id {} to get ptr", block_id);
+        addr = nullptr;
         return;
     }
 
     auto it = used_blocks_.find(block_id);
     if (it != used_blocks_.end()) {
         addr = it->second.address_;
     } else {
+        addr = nullptr;
         MLLM_ERROR("Get block address error, block id {}", block_id);
     }
 }
mllm/backends/ascend/ops/AscendX2XOp.cpp (1)

39-41: Redundant error check before MLLM_ACL_CHECK.

The pattern if (ret != ACL_SUCCESS) { MLLM_ACL_CHECK(ret); } is redundant—MLLM_ACL_CHECK already handles the success case internally. Simplify to just MLLM_ACL_CHECK(ret); directly after the aclrtMemcpy call.

🔎 Proposed fix
     auto ret = aclrtMemcpy(
         dst_data, data_size,
         src_data, data_size,
         ACL_MEMCPY_HOST_TO_DEVICE);
-
-    if (ret != ACL_SUCCESS) {
-      MLLM_ACL_CHECK(ret);
-    }
+    MLLM_ACL_CHECK(ret);

     syncGlobalAtbStream();

Apply the same pattern to the other two cases (lines 59-61 and 79-81).

Also applies to: 59-61, 79-81

mllm/backends/ascend/AscendAllocator.cpp (1)

82-91: Inconsistent logging: use MLLM_WARN instead of std::cout.

The rest of this file uses structured logging (MLLM_INFO, MLLM_WARN, MLLM_ERROR), but generalAlloc and generalFree use raw std::cout. For consistency and to enable log filtering, use the logging macros.

🔎 Proposed fix
 bool AscendAllocator::generalAlloc(void** ptr, size_t cap, size_t align) {
-    //we don't support generalAlloc , therefore return false
-    std::cout << "generalAlloc is not supported in AscendAllocator" << std::endl;
+    // generalAlloc is not supported
+    MLLM_WARN("generalAlloc is not supported in AscendAllocator");
     return false;
 }

 void AscendAllocator::generalFree(void* ptr) {
-    //we don't support generalFree , therefore do nothing
-    std::cout << "generalFree is not supported in AscendAllocator" << std::endl;
+    // generalFree is not supported
+    MLLM_WARN("generalFree is not supported in AscendAllocator");
 }
mllm/backends/ascend/AscendCommon.cpp (1)

90-90: const_cast on tensor parameter is a code smell.

The const_cast to call .to(kCPU) suggests the method might have side effects. Consider accepting the tensor by non-const reference if mutation is expected, or document why this is safe.

mllm/backends/ascend/CMakeLists.txt (1)

14-23: Using file(GLOB ...) for sources can miss newly added files.

CMake's glob doesn't re-run automatically when files are added. Consider either explicitly listing sources or ensuring developers know to re-run CMake when adding new files. This is acceptable for development but worth documenting.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 04e56be and 2fa8eed.

📒 Files selected for processing (33)
  • CMakeLists.txt
  • docs/ascend_backend/core_design.rst
  • docs/ascend_backend/index.rst
  • docs/index.rst
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/AscendAllocator.hpp
  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/AscendCommon.hpp
  • mllm/backends/ascend/AscendDispatcher.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/backends/ascend/CMakeLists.txt
  • mllm/backends/ascend/Register.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.hpp
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
  • mllm/backends/ascend/memory/MemoryBlock.hpp
  • mllm/backends/ascend/ops/AscendElewiseOps.cpp
  • mllm/backends/ascend/ops/AscendElewiseOps.hpp
  • mllm/backends/ascend/ops/AscendX2XOp.cpp
  • mllm/backends/ascend/ops/AscendX2XOp.hpp
  • mllm/backends/cpu/kernels/arm/rmsnorm.cpp
  • mllm/backends/cpu/kernels/arm/softmax.cpp
  • mllm/core/DeviceTypes.hpp
  • mllm/engine/Dispatcher.hpp
  • mllm/utils/Common.hpp
  • tasks/build_arm_ascend.yaml
  • tests/CMakeLists.txt
  • tests/ascend/AscendKernelTest.hpp
  • tests/ascend/CMakeLists.txt
  • tests/ascend/KernelTest.cpp
  • tests/ascend/KernelTestHelper.hpp
✅ Files skipped from review due to trivial changes (3)
  • mllm/backends/cpu/kernels/arm/rmsnorm.cpp
  • tasks/build_arm_ascend.yaml
  • docs/ascend_backend/core_design.rst
🚧 Files skipped from review as they are similar to previous changes (7)
  • docs/index.rst
  • mllm/backends/ascend/AscendDispatcher.cpp
  • tests/CMakeLists.txt
  • docs/ascend_backend/index.rst
  • mllm/backends/cpu/kernels/arm/softmax.cpp
  • mllm/backends/ascend/ops/AscendX2XOp.hpp
  • CMakeLists.txt
🧰 Additional context used
📓 Path-based instructions (5)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/core/DeviceTypes.hpp
  • mllm/backends/ascend/Register.cpp
  • mllm/backends/ascend/ops/AscendX2XOp.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.hpp
  • mllm/backends/ascend/ops/AscendElewiseOps.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/utils/Common.hpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/CMakeLists.txt
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/ops/AscendElewiseOps.hpp
  • mllm/backends/ascend/AscendCommon.hpp
  • mllm/engine/Dispatcher.hpp
  • mllm/backends/ascend/AscendAllocator.hpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp
  • mllm/backends/ascend/memory/MemoryBlock.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/core/DeviceTypes.hpp
  • mllm/backends/ascend/Register.cpp
  • mllm/backends/ascend/ops/AscendX2XOp.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.hpp
  • mllm/backends/ascend/ops/AscendElewiseOps.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/utils/Common.hpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/ops/AscendElewiseOps.hpp
  • mllm/backends/ascend/AscendCommon.hpp
  • mllm/engine/Dispatcher.hpp
  • mllm/backends/ascend/AscendAllocator.hpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp
  • mllm/backends/ascend/memory/MemoryBlock.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/backends/ascend/Register.cpp
  • mllm/backends/ascend/ops/AscendX2XOp.cpp
  • mllm/backends/ascend/ops/AscendElewiseOps.cpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/ascend/AscendBackend.cpp
  • mllm/core/DeviceTypes.hpp
  • mllm/backends/ascend/Register.cpp
  • mllm/backends/ascend/ops/AscendX2XOp.cpp
  • mllm/backends/ascend/memory/AscendMemoryManager.hpp
  • mllm/backends/ascend/ops/AscendElewiseOps.cpp
  • mllm/backends/ascend/AscendDispatcher.hpp
  • mllm/utils/Common.hpp
  • mllm/backends/ascend/AscendAllocator.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
  • mllm/backends/ascend/AscendCommon.cpp
  • mllm/backends/ascend/ops/AscendElewiseOps.hpp
  • mllm/backends/ascend/AscendCommon.hpp
  • mllm/engine/Dispatcher.hpp
  • mllm/backends/ascend/AscendAllocator.hpp
  • mllm/backends/ascend/memory/AscendMemoryManager.cpp
  • mllm/backends/ascend/memory/AscendMemoryPool.cpp
  • mllm/backends/ascend/memory/MemoryBlock.hpp
{mllm,mllm-cli,pymllm}/**/*.{toml,json,txt,md}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

When suggesting new dependencies, evaluate their maturity, licensing, and maintenance status. Prefer well-established and actively maintained libraries.

Files:

  • mllm/backends/ascend/CMakeLists.txt
🧠 Learnings (5)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Use named constants instead of magic numbers.

Applied to files:

  • mllm/backends/ascend/CMakeLists.txt
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Applied to files:

  • mllm/backends/ascend/CMakeLists.txt
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : Encourage consistent coding style and patterns with the existing codebase.

Applied to files:

  • mllm/backends/ascend/CMakeLists.txt
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Avoid unnecessary object creation in loops or hot paths.

Applied to files:

  • mllm/backends/ascend/CMakeLists.txt
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/* : No line may end with trailing whitespace.

Applied to files:

  • mllm/backends/ascend/memory/AscendMemoryPool.hpp
🧬 Code graph analysis (10)
mllm/backends/ascend/AscendBackend.cpp (4)
mllm/backends/ascend/AscendBackend.hpp (1)
  • AscendBackend (13-13)
mllm/backends/ascend/AscendAllocator.cpp (2)
  • createAscendAllocator (107-107)
  • createAscendAllocator (107-107)
mllm/backends/ascend/AscendAllocator.hpp (1)
  • createAscendAllocator (46-46)
mllm/backends/ascend/AscendCommon.hpp (1)
  • instance (58-61)
mllm/backends/ascend/Register.cpp (5)
mllm/backends/ascend/AscendCommon.hpp (1)
  • instance (58-61)
mllm/backends/ascend/memory/AscendMemoryManager.hpp (1)
  • pool_size (18-18)
mllm/engine/DispatcherManager.hpp (1)
  • dispatcher (49-49)
mllm/backends/ascend/AscendDispatcher.cpp (2)
  • createAscendDispatcher (83-86)
  • createAscendDispatcher (83-84)
mllm/backends/ascend/AscendDispatcher.hpp (1)
  • createAscendDispatcher (36-37)
mllm/backends/ascend/ops/AscendX2XOp.cpp (1)
mllm/backends/ascend/ops/AscendX2XOp.hpp (4)
  • AscendX2XOp (14-14)
  • options (21-23)
  • options (21-21)
  • inputs (16-16)
mllm/backends/ascend/memory/AscendMemoryManager.hpp (1)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (3)
  • size (20-20)
  • block_id (22-22)
  • block_id (24-24)
mllm/backends/ascend/AscendAllocator.cpp (2)
mllm/backends/ascend/memory/AscendMemoryManager.cpp (2)
  • getAscendMemoryManager (15-18)
  • getAscendMemoryManager (15-15)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (2)
  • block_id (22-22)
  • block_id (24-24)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (2)
mllm/backends/ascend/memory/AscendMemoryManager.hpp (4)
  • pool_size (18-18)
  • size (24-24)
  • block_id (26-26)
  • block_id (28-28)
mllm/backends/ascend/memory/AscendMemoryPool.cpp (2)
  • AscendMemoryPool (14-26)
  • AscendMemoryPool (28-36)
mllm/backends/ascend/AscendCommon.cpp (3)
mllm/backends/ascend/AscendCommon.hpp (6)
  • AscendTensorHandle (71-71)
  • AscendTensorHandle (72-72)
  • AscendTensorHandle (73-73)
  • AscendTensorHandle (75-75)
  • AscendTensorHandle (77-77)
  • other (78-78)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (3)
  • block_id (22-22)
  • block_id (24-24)
  • size (20-20)
mllm/backends/ascend/memory/AscendMemoryManager.cpp (2)
  • getAscendMemoryManager (15-18)
  • getAscendMemoryManager (15-15)
mllm/backends/ascend/AscendCommon.hpp (1)
mllm/backends/ascend/AscendCommon.cpp (7)
  • AscendTensorHandle (23-24)
  • AscendTensorHandle (26-26)
  • AscendTensorHandle (28-31)
  • other (33-41)
  • other (33-33)
  • AscendTimer (153-161)
  • AscendTimer (163-170)
mllm/backends/ascend/AscendAllocator.hpp (1)
mllm/backends/ascend/AscendAllocator.cpp (2)
  • AscendAllocator (11-13)
  • AscendAllocator (15-21)
mllm/backends/ascend/memory/AscendMemoryManager.cpp (3)
mllm/backends/ascend/memory/AscendMemoryManager.hpp (5)
  • AscendMemoryManager (16-16)
  • pool_size (18-18)
  • size (24-24)
  • block_id (26-26)
  • block_id (28-28)
mllm/backends/ascend/memory/AscendMemoryPool.cpp (6)
  • allocateBlock (42-71)
  • allocateBlock (42-42)
  • freeBlock (73-88)
  • freeBlock (73-73)
  • getBlockPtr (90-104)
  • getBlockPtr (90-90)
mllm/backends/ascend/memory/AscendMemoryPool.hpp (3)
  • size (20-20)
  • block_id (22-22)
  • block_id (24-24)
🪛 Clang (14.0.6)
mllm/backends/ascend/Register.cpp

[error] 19-19: variable 'pool_size' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

tests/ascend/KernelTestHelper.hpp

[error] 7-7: class 'KernelTest' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/ascend/AscendDispatcher.hpp

[error] 13-13: constructor does not initialize these fields: MLLM_EMPTY_SCOPE

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

tests/ascend/AscendKernelTest.hpp

[error] 12-12: class 'AscendKernelTest' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 18-18: method 'AddFloat16Test' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)

mllm/backends/ascend/memory/AscendMemoryPool.hpp

[error] 15-15: class 'AscendMemoryPool' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/ascend/AscendCommon.cpp

[error] 54-54: 3 adjacent parameters of 'prepareAscendTensor' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 57-57: variable 'expected_elements' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 68-68: variable 'device_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 69-69: variable 'bytes' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 92-92: variable 'elem_cnt' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 95-95: variable 'src' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 102-102: 3 adjacent parameters of 'verifyAscendTensor' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 102-102: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 121-121: variable name 'ok' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 143-143: 3 adjacent parameters of 'verifyAscendTensor' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 143-143: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 173-173: variable 'ctx' provides global access to a non-const object; consider making the pointed-to data 'const'

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 209-209: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

tests/ascend/KernelTest.cpp

[error] 16-16: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)


[error] 21-21: 8 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 21-21: 8 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 22-22: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 22-22: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 23-23: 32 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 23-23: 32 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

mllm/backends/ascend/AscendCommon.hpp

[error] 19-19: function-like macro 'MLLM_ACL_CHECK' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 20-20: macro argument should be enclosed in parentheses

(bugprone-macro-parentheses,-warnings-as-errors)


[error] 25-25: function-like macro 'MLLM_ATB_CHECK' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 26-26: macro argument should be enclosed in parentheses

(bugprone-macro-parentheses,-warnings-as-errors)


[error] 42-42: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 45-45: constructor does not initialize these fields: name, id, total_memory, free_memory, soc_version

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 54-54: class 'AscendDeviceMetaInfo' defines a copy constructor and a copy assignment operator but does not define a destructor, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 70-70: constructor does not initialize these fields: tensor_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 81-81: function 'valid' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 84-84: function 'tensor' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 85-85: function 'blockId' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 101-101: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 103-103: 1e-2f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 103-103: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 104-104: 1e-2f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 104-104: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 109-109: parameter name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 111-111: 1e-2f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 111-111: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 112-112: 1e-2f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 112-112: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 117-117: class 'AscendTimer' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 130-130: function-like macro 'ASCEND_TIME_SCOPE' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 131-131: function-like macro 'ASCEND_TIME_SCOPE_NOSYNC' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)

mllm/engine/Dispatcher.hpp

[error] 28-28: invalid case style for global constant 'ascend_dispatcher_id'

(readability-identifier-naming,-warnings-as-errors)

mllm/backends/ascend/AscendAllocator.hpp

[error] 15-15: class 'AscendAllocator' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/ascend/memory/AscendMemoryManager.cpp

[error] 20-20: method 'createMemoryPool' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 22-22: variable 'device_count' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 37-37: variable 'device_id' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 45-45: variable 'device_id' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

mllm/backends/ascend/memory/AscendMemoryPool.cpp

[error] 14-14: constructor does not initialize these fields: block_mutex_, used_blocks_, free_blocks_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 42-42: method 'allocateBlock' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 45-45: variable 'align_size' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 59-59: variable 'cur_mem_ptr_align' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 73-73: method 'freeBlock' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 81-81: variable name 'it' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 90-90: method 'getBlockPtr' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 98-98: variable name 'it' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/ascend/memory/MemoryBlock.hpp

[error] 11-11: constructor does not initialize these fields: block_id_, block_size_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

🔇 Additional comments (17)
mllm/utils/Common.hpp (1)

34-34: LGTM!

The new kAscendError enum value follows the existing pattern for backend-specific error codes and is correctly placed alongside other backend errors.

mllm/core/DeviceTypes.hpp (1)

36-37: LGTM!

The string conversion functions are correctly extended to support kAscend and kAscendHost device types, following the established pattern for other backends.

Also applies to: 52-55

mllm/engine/Dispatcher.hpp (1)

28-28: LGTM!

The new ascend_dispatcher_id constant follows the established naming convention and initialization pattern used by other dispatcher IDs in this class.

tests/ascend/KernelTest.cpp (2)

28-44: LGTM!

The test harness correctly initializes the mllm context before the Ascend backend and properly cleans up resources after test execution. The initialization order addresses the concern raised in the previous review.


16-26: LGTM!

The test covers a good range of tensor sizes for the FP16 Add operation. The inline test data is appropriate for unit tests.

tests/ascend/KernelTestHelper.hpp (1)

7-17: LGTM!

The test fixture scaffolding is appropriate. The static analysis warning about special member functions is a false positive for GoogleTest fixtures—test instances are managed by the framework and are not copied or moved.

mllm/backends/ascend/memory/AscendMemoryPool.cpp (1)

73-88: LGTM: freeBlock implementation is correct.

The double-free detection and logging are appropriate. The early return for invalid block_id < 0 prevents unnecessary map lookups.
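The double-free guard described here can be sketched as follows — a hypothetical `BlockPool`, not the real `AscendMemoryPool`: freeing an id that is not currently in the used map is rejected rather than corrupting the free list, and negative ids are rejected before any lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Illustrative pool: tracks live blocks by id so a second free of the
// same id (or an invalid id) can be detected and ignored.
class BlockPool {
 public:
  int allocate(std::size_t size) {
    int id = next_id_++;
    used_[id] = size;
    return id;
  }

  bool freeBlock(int block_id) {
    if (block_id < 0) return false;       // early return avoids the map lookup
    auto iter = used_.find(block_id);
    if (iter == used_.end()) return false;  // double free or unknown id
    used_.erase(iter);
    return true;
  }

 private:
  int next_id_ = 0;
  std::unordered_map<int, std::size_t> used_;
};
```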

mllm/backends/ascend/ops/AscendElewiseOps.cpp (2)

26-40: LGTM: Input validation and constraint checks are appropriate.

The assertions for input/output sizes and the dtype/numel validation provide clear error messages for unsupported cases.


98-106: LGTM: Resource cleanup path is correct for the success case.

Stream synchronization, workspace deallocation, and ATB operation destruction are handled properly in the normal execution path.

mllm/backends/ascend/ops/AscendX2XOp.cpp (1)

17-91: Well-structured device transfer implementation.

The forward method correctly handles all three device transfer cases (CPU→Ascend, Ascend→CPU, Ascend→Ascend) with appropriate ACL memcpy directions and synchronization. The early-return pattern keeps the logic clear.
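The three transfer cases can be summarized with a small dispatch sketch — the enum and kind names here are illustrative stand-ins for the ACL memcpy kinds (e.g. `ACL_MEMCPY_HOST_TO_DEVICE`), not the actual op code:

```cpp
#include <cassert>
#include <string>

// Illustrative device tags; the real op inspects tensor device types.
enum class Device { CPU, Ascend };

// Maps a (src, dst) pair to the memcpy direction the transfer would use.
std::string memcpyKind(Device src, Device dst) {
  if (src == Device::CPU && dst == Device::Ascend) return "HOST_TO_DEVICE";
  if (src == Device::Ascend && dst == Device::CPU) return "DEVICE_TO_HOST";
  if (src == Device::Ascend && dst == Device::Ascend) return "DEVICE_TO_DEVICE";
  return "HOST_TO_HOST";
}
```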

mllm/backends/ascend/AscendAllocator.cpp (1)

23-47: Good memory pool integration with proper error handling.

The allocation flow correctly handles failures at each step: block allocation failure, pointer retrieval failure (with cleanup via freeBlock), and successful mapping. Thread-safe access to the block map is ensured.

mllm/backends/ascend/AscendDispatcher.hpp (1)

17-37: Dispatcher class structure looks good.

The AscendDispatcher class properly inherits from Dispatcher, uses the shared pointer pattern consistently, and the factory function follows the established codebase patterns.

mllm/backends/ascend/memory/AscendMemoryManager.cpp (1)

52-65: Delegation methods are clean and correct.

The allocateBlock, freeBlock, and getBlockPtr methods correctly delegate to the per-device memory pool obtained via getMemoryPool().

mllm/backends/ascend/AscendCommon.cpp (2)

172-194: Thread-safe ATB context initialization is correctly implemented.

The use of std::once_flag with std::call_once ensures the ATB context and stream are initialized exactly once in a thread-safe manner. This addresses the previous race condition concern.
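A minimal sketch of the once-only initialization pattern, with `FakeContext` and `getGlobalContext` as illustrative stand-ins for the real ATB context API: `std::call_once` guarantees the lambda body runs exactly once even when the accessor is entered concurrently.

```cpp
#include <cassert>
#include <mutex>

// Stand-in for the global ATB context; the real one holds the ACL stream.
struct FakeContext {
  bool ready = false;
};

static std::once_flag g_ctx_once;
static FakeContext g_ctx;

FakeContext& getGlobalContext() {
  std::call_once(g_ctx_once, [] {
    // the real code would create the ATB context and ACL stream here
    g_ctx.ready = true;
  });
  return g_ctx;
}
```

Repeated callers all observe the same initialized instance, which is what makes the lazy-init accessor safe to call from any thread.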


23-52: Well-implemented RAII handle with proper move semantics.

AscendTensorHandle correctly implements move constructor/assignment with proper resource transfer (setting other.block_id_ = -1), and the destructor properly releases resources via the release() helper.
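The move-semantics pattern described here can be sketched with a hypothetical `Handle` in place of `AscendTensorHandle`: ownership is tracked by `block_id_`, and the moved-from handle is neutralized with `-1` so its destructor does not double-release.

```cpp
#include <cassert>
#include <utility>

// Illustrative RAII handle over a pool block id.
class Handle {
 public:
  explicit Handle(int block_id) : block_id_(block_id) {}
  ~Handle() { release(); }

  Handle(const Handle&) = delete;
  Handle& operator=(const Handle&) = delete;

  Handle(Handle&& other) noexcept : block_id_(other.block_id_) {
    other.block_id_ = -1;  // moved-from handle no longer owns the block
  }

  Handle& operator=(Handle&& other) noexcept {
    if (this != &other) {
      release();
      block_id_ = other.block_id_;
      other.block_id_ = -1;
    }
    return *this;
  }

  int blockId() const { return block_id_; }
  bool valid() const { return block_id_ >= 0; }

 private:
  void release() {
    if (block_id_ >= 0) {
      // the real handle would return the block to the memory pool here
      block_id_ = -1;
    }
  }

  int block_id_ = -1;
};
```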

mllm/backends/ascend/CMakeLists.txt (2)

27-39: Environment-based path handling is appropriate.

The warnings when ASCEND_HOME_PATH or ATB_HOME_PATH are not set provide good feedback to developers. This follows the established pattern for optional SDK paths.


42-48: PUBLIC linkage is correctly applied.

All dependencies (ascendcl, opapi, nnopbase, atb, MllmRT) are linked with PUBLIC visibility, allowing consumers of MllmAscendBackend to transitively access these libraries. This addresses previous review feedback.


constexpr size_t POOL_SIZE = 104857600; // 100 MiB,

AscendMemoryPool::AscendMemoryPool(size_t pool_size = POOL_SIZE) {
Contributor

⚠️ Potential issue | 🟡 Minor

Default parameter should be in the declaration, not the definition.

Default arguments belong in the declaration (header), not the out-of-line definition: callers only see the header, so a default that appears only in the .cpp file is invisible to them, and specifying the same default in both places is ill-formed.

🔎 Proposed fix

In AscendMemoryPool.hpp, add the default parameter:

explicit AscendMemoryPool(size_t pool_size = 104857600);  // 100 MiB

In AscendMemoryPool.cpp, remove the default:

-AscendMemoryPool::AscendMemoryPool(size_t pool_size = POOL_SIZE) {
+AscendMemoryPool::AscendMemoryPool(size_t pool_size) {
🧰 Tools
🪛 Clang (14.0.6)

[error] 14-14: constructor does not initialize these fields: block_mutex_, used_blocks_, free_blocks_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/backends/ascend/memory/AscendMemoryPool.cpp around line 14, the
constructor definition currently includes a default parameter which must be
placed in the declaration instead; remove the "= POOL_SIZE" from the definition,
add the default to the declaration in AscendMemoryPool.hpp (e.g. explicit
AscendMemoryPool(size_t pool_size = POOL_SIZE); or the numeric value), and
ensure both header and cpp signatures match (no default in .cpp) before
rebuilding.
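The declaration/definition split can be illustrated with a minimal stand-in class (hypothetical `Pool`, not the real `AscendMemoryPool`): the default argument lives on the declared signature, and the out-of-line definition must not repeat it.

```cpp
#include <cassert>
#include <cstddef>

// Named constant instead of a magic number, as the repo guidelines require.
constexpr std::size_t kDefaultPoolSize = 100 * 1024 * 1024;  // 100 MiB

class Pool {
 public:
  // Default argument goes on the declaration only.
  explicit Pool(std::size_t pool_size = kDefaultPoolSize);
  std::size_t size() const { return size_; }

 private:
  std::size_t size_;
};

// Out-of-line definition: repeating "= kDefaultPoolSize" here would be
// ill-formed, so the parameter appears without a default.
Pool::Pool(std::size_t pool_size) : size_(pool_size) {}
```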

Comment on lines +18 to +21
// 1. Create memory pool
size_t pool_size = 100 * 1024 * 1024; // 100MB, can be adjusted as needed
ascend::getAscendMemoryManager().createMemoryPool(pool_size);
MLLM_INFO("Ascend memory pool initialized");
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find and examine the createMemoryPool implementation
find . -type f -name "*.cpp" -o -name "*.h" | xargs grep -l "createMemoryPool" | head -20

Repository: UbiquitousLearning/mllm

Length of output: 156


🏁 Script executed:

# Check the memory manager header file
rg -n "createMemoryPool" --type=cpp --type=hpp -A 3 -B 1

Repository: UbiquitousLearning/mllm

Length of output: 94


🏁 Script executed:

# Examine the full Register.cpp file context around lines 18-21
rg -n "initAscendBackend|createMemoryPool" mllm/backends/ascend/Register.cpp -A 10 -B 5

Repository: UbiquitousLearning/mllm

Length of output: 814


🏁 Script executed:

# Find and examine the AscendMemoryManager header and implementation
find . -path "*/ascend/memory/AscendMemoryManager*" -type f

Repository: UbiquitousLearning/mllm

Length of output: 174


🏁 Script executed:

# Look at the createMemoryPool method signature and implementation
rg -n "createMemoryPool" mllm/backends/ascend/memory/ -A 10 -B 2

Repository: UbiquitousLearning/mllm

Length of output: 2196


🏁 Script executed:

# Check if there's any return value or exception handling expected
rg -n "class.*MemoryManager|void createMemoryPool|bool createMemoryPool|int createMemoryPool" mllm/backends/ascend/memory/ -A 5

Repository: UbiquitousLearning/mllm

Length of output: 813


🏁 Script executed:

# Check what MLLM_ACL_CHECK does - it seems to be an error handling macro
rg -n "MLLM_ACL_CHECK" mllm/ -B 2 -A 2 | head -50

Repository: UbiquitousLearning/mllm

Length of output: 3108


🏁 Script executed:

# Find the definition of MLLM_ACL_CHECK
rg -n "#define MLLM_ACL_CHECK" mllm/ -A 5

Repository: UbiquitousLearning/mllm

Length of output: 690


🏁 Script executed:

# Also check the rest of the createMemoryPool implementation
cat -n mllm/backends/ascend/memory/AscendMemoryManager.cpp | sed -n '20,40p'

Repository: UbiquitousLearning/mllm

Length of output: 843


🏁 Script executed:

# Check if aclrtSetDevice has error handling elsewhere or if it's documented
rg -n "aclrtSetDevice" mllm/backends/ascend/ -B 2 -A 2 | head -40

Repository: UbiquitousLearning/mllm

Length of output: 1597


🏁 Script executed:

# Check AscendMemoryPool constructor to see if it can fail
rg -n "class AscendMemoryPool|AscendMemoryPool::" mllm/backends/ascend/memory/AscendMemoryPool -A 10 | head -50

Repository: UbiquitousLearning/mllm

Length of output: 151


🏁 Script executed:

# Find AscendMemoryPool files
find . -name "*AscendMemoryPool*" -type f

Repository: UbiquitousLearning/mllm

Length of output: 168


🏁 Script executed:

# Check the AscendMemoryPool header
rg -n "class AscendMemoryPool" mllm/ -A 15

Repository: UbiquitousLearning/mllm

Length of output: 1253


🏁 Script executed:

# Check the AscendMemoryPool constructor implementation
cat -n mllm/backends/ascend/memory/AscendMemoryPool.cpp | head -60

Repository: UbiquitousLearning/mllm

Length of output: 2483


🏁 Script executed:

# Also check if there are any try-catch or error handling in the constructor
rg -n "AscendMemoryPool::AscendMemoryPool" mllm/backends/ascend/memory/ -A 20

Repository: UbiquitousLearning/mllm

Length of output: 1906


🏁 Script executed:

# Check if there's any pattern of validating memory pool after creation in Register.cpp
rg -n "getMemoryPool\|memory_pool" mllm/backends/ascend/Register.cpp -A 3 -B 3

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Check how other backends handle initialization errors in their register functions
rg -n "void init.*Backend\(\)" mllm/backends/ -A 15 | head -80

Repository: UbiquitousLearning/mllm

Length of output: 4676


Add error handling and validation for memory pool initialization.

The createMemoryPool() call lacks error validation in Register.cpp. Additionally, the implementation has multiple error-handling gaps:

  • aclrtSetDevice(i) in AscendMemoryManager::createMemoryPool() is not checked for errors
  • AscendMemoryPool constructor fails silently (logs error but returns with null pointers) instead of throwing, leaving the pool in a broken state
  • Register.cpp should validate successful pool creation before proceeding

Consider either adding error checks with MLLM_ACL_CHECK for aclrtSetDevice, or having AscendMemoryPool throw an exception on allocation failure so the initialization chain fails explicitly.

🧰 Tools
🪛 Clang (14.0.6)

[error] 19-19: variable 'pool_size' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/backends/ascend/Register.cpp around lines 18-21, the call to
ascend::getAscendMemoryManager().createMemoryPool(pool_size) is unvalidated;
update this block to check the return or thrown exception from createMemoryPool
and abort/init-fail with a clear MLLM_ERROR/MLLM_ACL_CHECK if pool creation
fails. In the backend, ensure AscendMemoryManager::createMemoryPool checks and
handles aclrtSetDevice return codes (use MLLM_ACL_CHECK or equivalent) and
change AscendMemoryPool construction to signal failure explicitly (either throw
an exception on allocation/initialization errors or return a nullable/Status
result) so createMemoryPool can detect failure and propagate it back to
Register.cpp for proper logging and process exit.
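The fail-fast pattern suggested above can be sketched as follows. `aclrtSetDevice` is stubbed here so the snippet is self-contained, and `CHECK_ACL` is a hypothetical stand-in for an `MLLM_ACL_CHECK`-style macro; the real ACL API lives in `acl/acl.h` and returns an `aclError` code.

```cpp
#include <cassert>
#include <cstdio>

using aclError = int;
constexpr aclError ACL_SUCCESS = 0;

// Stub: the real aclrtSetDevice binds the calling thread to a device
// and returns an ACL error code.
static aclError aclrtSetDevice(int device_id) {
  return device_id >= 0 ? ACL_SUCCESS : 1;
}

// Illustrative check macro: log the failing call and propagate failure
// to the caller instead of continuing with a broken pool.
#define CHECK_ACL(expr)                                       \
  do {                                                        \
    aclError ret_ = (expr);                                   \
    if (ret_ != ACL_SUCCESS) {                                \
      std::fprintf(stderr, "ACL call failed: %s (code %d)\n", \
                   #expr, ret_);                              \
      return false;                                           \
    }                                                         \
  } while (0)

// createMemoryPool-style init that reports failure explicitly so the
// registration code can abort instead of proceeding with null pointers.
bool createMemoryPoolChecked(int device_id) {
  CHECK_ACL(aclrtSetDevice(device_id));
  // ... allocate the device pool; throw or return false on failure ...
  return true;
}
```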

Collaborator

@chenghuaWang chenghuaWang left a comment

LGTM

@chenghuaWang chenghuaWang merged commit 4b414f6 into UbiquitousLearning:main Dec 23, 2025
5 checks passed
2 participants