feat(Ascend): Add some new Ascend Ops #621
chenghuaWang merged 4 commits into UbiquitousLearning:main from
Conversation
📝 Walkthrough

Extends the Ascend backend with multiple new ops (SiLU, Linear, RMSNorm, View, MatMul, Softmax, Sub, Mul), adds a tensor-to-ATB filler utility, adjusts the ACL init flow during device enumeration, registers factories for the new ops, exposes functional wrappers (linear, rmsNorm), and adds extensive FP16 tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant App as Application
    participant Op as Ascend Op
    participant ATB as ATB Library
    participant Mem as Ascend Memory Manager
    participant Stream as ATB Stream
    App->>Op: forward(inputs, outputs)
    activate Op
    Op->>Op: validate dtype/shape
    Op->>ATB: create operation (e.g., Linear/MatMul/Softmax)
    ATB-->>Op: op handle
    Op->>Op: fill ATB tensors (fillAtbTensor)
    opt workspace needed
        Op->>Mem: allocate(workspace_size)
        Mem-->>Op: workspace_ptr
    end
    Op->>ATB: Setup(VariantPack)
    ATB-->>Op: setup OK
    Op->>ATB: Execute(VariantPack)
    ATB->>Stream: schedule compute
    Stream-->>ATB: done
    ATB-->>Op: execute OK
    Op->>Stream: synchronize
    Stream-->>Op: sync complete
    opt workspace allocated
        Op->>Mem: free(workspace_ptr)
    end
    Op->>ATB: destroy operation
    Op-->>App: forward complete
    deactivate Op
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 15
🤖 Fix all issues with AI agents
In `@mllm/backends/ascend/AscendCommon.cpp`:
- Around line 220-225: Add defensive runtime validation at the start of
fillAtbTensor: verify the input Tensor t has FP16 dtype and is on the Ascend
device (e.g., check t.dtype() == DType::FP16 and t.device/type indicates
Ascend), and ensure t.ptr<void>() is non-null; if any check fails, throw or
return a clear error (e.g., throw std::invalid_argument with a message
identifying fillAtbTensor), then proceed to call fillAtbTensorDesc(t,
atb_tensor.desc) and set atb_tensor.deviceData =
reinterpret_cast<uint8_t*>(t.ptr<void>()) and atb_tensor.dataSize = t.bytes();
this adds defense-in-depth for callers like RMSNormOp that may pass invalid
tensors.
In `@mllm/backends/ascend/ops/AscendElewiseOps.cpp`:
- Around line 168-172: AscendSubOp and AscendMulOp are truncating uint64_t
workspaceSize to uint32_t when calling getAscendMemoryManager().allocateBlock;
add the same guard used in AscendMatMulOp to check if workspaceSize > UINT32_MAX
(or otherwise unsafe to cast) before allocating, and handle the error path
(log/throw or set an error status) instead of silently truncating; update the
allocation calls to cast only after the guard and reference workspace_block_id,
getAscendMemoryManager(), and allocateBlock to locate the change.
In `@mllm/backends/ascend/ops/AscendLinearOp.cpp`:
- Around line 22-29: AscendLinearOp::reshape uses inputs[1] (weight) when
options().isRedirect() without validating inputs size; add a guard at the top of
AscendLinearOp::reshape to verify inputs.size() >= 2 before accessing inputs[1],
and handle the error path (e.g. log/throw or set outputs appropriately) so the
function does not read out-of-bounds; ensure the same validation covers
accessing inputs[0] as well to avoid further OOB access.
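A standalone sketch of the guard described above. The `Tensor` stand-in and the `reshapeGuard` helper name are illustrative only; the real op would use the project's actual `Tensor` type and its `MLLM_RT_ASSERT`-style macros:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for the real mllm Tensor type; illustrative only.
struct Tensor {
  std::vector<int> shape;
};

// Hypothetical guard mirroring the suggested fix: validate the input count
// before touching inputs[0] (and inputs[1] in the redirect path).
void reshapeGuard(const std::vector<Tensor>& inputs, bool is_redirect) {
  const std::size_t required = is_redirect ? 2 : 1;
  if (inputs.size() < required) {
    throw std::runtime_error("AscendLinearOp::reshape expects at least "
                             + std::to_string(required) + " inputs, got "
                             + std::to_string(inputs.size()));
  }
  // Safe to index inputs[0] (and inputs[1] when is_redirect) from here on.
}
```

Calling this at the top of `reshape` turns the out-of-bounds read into a clear, early failure.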
- Around line 38-56: The function AscendLinearOp::forward dereferences
outputs[0] without confirming outputs has at least one element; add a check
using MLLM_RT_ASSERT (or similar) to assert outputs.size() == 1 (or
outputs.size() >= 1 as appropriate for this op) before using outputs[0], and
fail early with a clear message; update any existing precondition comments to
reflect the requirement so callers and tests are consistent with the new
assertion.
In `@mllm/backends/ascend/ops/AscendLinearOp.hpp`:
- Around line 12-25: Add brief Doxygen-style API comments for the public classes
and methods: document AscendLinearOp (purpose as Ascend backend implementation
of aops::LinearOp), its constructor AscendLinearOp(const aops::LinearOpOptions&
options) (describe options parameter), and the public overrides setup, forward,
and reshape (describe inputs, outputs and expected behavior/side-effects). Also
document AscendLinearOpFactory (purpose to create AscendLinearOp instances) and
its createOpImpl method (describe returned shared_ptr and parameter). Keep
comments concise, mention parameter names and return type and any thrown
conditions.
In `@mllm/backends/ascend/ops/AscendMatMulOp.cpp`:
- Around line 120-126: The code currently casts workspaceSize (uint64_t) to
uint32_t when calling getAscendMemoryManager().allocateBlock, risking truncation
for sizes > UINT32_MAX; update the block-allocation logic in AscendMatMulOp.cpp
to first check workspaceSize against UINT32_MAX and handle overflow explicitly
(e.g., log error/throw or return a failure status) instead of silently
static_casting, then only call
mem_mgr.allocateBlock(static_cast<uint32_t>(workspaceSize), workspace_block_id)
when safe; keep references to workspaceSize, getAscendMemoryManager(),
allocateBlock(...), workspace_block_id, and getBlockPtr(...) so the check and
error path are added around the existing allocation calls.
- Around line 29-42: The helper fillAtbTensorWithShape currently hardcodes
desc.dtype = ACL_FLOAT16 which breaks non-FP16 tensors; update
fillAtbTensorWithShape to derive the ACL dtype from the input Tensor t (or call
the existing fillAtbTensorDesc helper used by AscendAddOp/AscendSoftmaxOp) and
set desc.dtype accordingly, then recompute atb_tensor.dataSize and assign
deviceData as before; also add validation to throw/log an error if t's dtype
cannot be mapped to a supported ACL dtype so unsupported types fail fast instead
of producing silent errors.
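A minimal sketch of the dtype mapping the fix calls for. The enum names and values below are stand-ins (not the real constants from the ACL headers or mllm's `DataTypes.hpp`); the point is deriving the descriptor dtype from the tensor instead of hardcoding FP16, and failing fast on anything unmapped:

```cpp
#include <cassert>
#include <stdexcept>

// Stand-ins for the mllm and ACL dtype enums; illustrative only.
enum class MllmDType { kFloat16, kFloat32, kInt8 };
enum class AclDType { ACL_FLOAT16, ACL_FLOAT, ACL_INT8 };

// Hypothetical helper in the spirit of the suggested fix: map the tensor's
// dtype to an ACL dtype, throwing on unsupported types rather than silently
// producing a wrong descriptor.
AclDType toAclDType(MllmDType dtype) {
  switch (dtype) {
    case MllmDType::kFloat16: return AclDType::ACL_FLOAT16;
    case MllmDType::kFloat32: return AclDType::ACL_FLOAT;
    case MllmDType::kInt8:    return AclDType::ACL_INT8;
  }
  throw std::invalid_argument("fillAtbTensorWithShape: unsupported dtype");
}
```

`fillAtbTensorWithShape` would then set `desc.dtype = toAclDType(t.dtype())` (or reuse the existing `fillAtbTensorDesc` helper) before computing `dataSize`.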
In `@mllm/backends/ascend/ops/AscendRMSNormOp.cpp`:
- Around line 27-33: Reinstate input-count validation at the top of
AscendRMSNormOp::forward to ensure inputs.size() is 1 or 2 before any indexing;
replace the commented-out check with an active assertion (e.g., MLLM_RT_ASSERT
or similar used in this file) that yields a clear error message like
"AscendRMSNormOp expects 1 or 2 inputs" so inputs[0] and the conditional access
of inputs[1] are safe.
In `@mllm/backends/ascend/ops/AscendRMSNormOp.hpp`:
- Around line 12-24: Add concise API documentation comments for the public
classes and methods: document the purpose of the AscendRMSNormOp class, its
constructor AscendRMSNormOp(const aops::RMSNormOpOptions& options), and its
public methods setup(const std::vector<Tensor>& inputs, std::vector<Tensor>&
outputs) and forward(const std::vector<Tensor>& inputs, std::vector<Tensor>&
outputs) (describe expected inputs, outputs, and behaviors); also document the
AscendRMSNormOpFactory class and its createOpImpl(const aops::RMSNormOpOptions&
options) method (describe what it returns and when to use it). Keep comments
brief, use Doxygen-style or single-line doc comments above each public
declaration, and include parameter/return descriptions for the constructor and
createOpImpl where applicable.
In `@mllm/backends/ascend/ops/AscendSiLUOp.hpp`:
- Around line 12-24: Add concise API doc comments for the public classes and
their constructor/methods: document AscendSiLUOp (purpose: Ascend backend SiLU
activation operator), its constructor AscendSiLUOp(const aops::SiLUOpOptions&
options) (describe options and ownership), and its public methods setup(const
std::vector<Tensor>& inputs, std::vector<Tensor>& outputs) and forward(const
std::vector<Tensor>& inputs, std::vector<Tensor>& outputs) (describe expected
inputs, outputs, and possible error conditions). Also add a brief comment for
AscendSiLUOpFactory explaining it creates AscendSiLUOp instances for
OpTypes::kSiLU via createOpImpl. Keep comments short and in Doxygen-style or
single-line form above each declaration.
In `@mllm/backends/ascend/ops/AscendSoftmaxOp.cpp`:
- Around line 55-63: After converting options_.axis to a positive index in
AscendSoftmaxOp.cpp, validate that the resulting axis is within [0, x.rank()-1];
if x.rank() is zero or axis is out of range, return or throw a clear error
(e.g., std::invalid_argument or the project’s status/error reporting mechanism)
before pushing to softmaxParam.axes. Update the block that sets axis (uses
options_.axis and x.rank()) to perform this bounds check and produce a
descriptive error mentioning the invalid axis and the input rank instead of
allowing an out-of-range axis to be used.
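The normalize-then-validate step can be sketched as a small standalone helper; `normalizeAxis` is a hypothetical name, and the real code would report errors through the project's own mechanism rather than a bare exception:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Convert a possibly negative axis to a positive index and verify it lies
// within [0, rank - 1], as the suggested fix requires.
int normalizeAxis(int axis, int rank) {
  if (rank <= 0) {
    throw std::invalid_argument("softmax: input rank must be positive, got "
                                + std::to_string(rank));
  }
  const int pos = axis < 0 ? axis + rank : axis;
  if (pos < 0 || pos >= rank) {
    throw std::invalid_argument("softmax: axis " + std::to_string(axis)
                                + " out of range for input rank "
                                + std::to_string(rank));
  }
  return pos;
}
```

The validated index is then the only value pushed into `softmaxParam.axes`.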
In `@mllm/nn/Functional.hpp`:
- Line 23: Add short API doc comments to the new public functional wrappers
(e.g., the function linear(const Tensor& x, const Tensor& weight, const Tensor&
bias = Tensor()) and the other functional wrappers around line 136) that state
the function purpose, describe each parameter (clearly mark optional parameters
like bias, epsilon, add_unit_offset), specify the return value (what Tensor
shape/semantics to expect), and outline error behavior (e.g., mismatched shapes,
null/empty tensors). Keep each doc comment brief (one to three sentences) and
attach it immediately above the function declaration so tools and readers can
discover the API easily.
In `@tests/ascend/AscendAttentionKernelTest.hpp`:
- Around line 224-239: The causal mask generation in the block that creates
mask_cpu when use_mask is true uses offset = S_kv - S_q and will produce
incorrect masking when S_kv < S_q (negative offset) causing entire rows to
become -inf; update the logic in the mask_cpu creation (the loops that write
mask_ptr) to skip creating the upper-triangular mask when S_kv < S_q (e.g.,
guard early and leave mask as zeros or compute only when S_kv >= S_q), and apply
the same guard to the GQA mask block referenced at the GQA mask generation
(mirror the check around its mask write loops) so no rows are fully masked when
S_kv < S_q.
In `@tests/ascend/AscendLinearKernelTest.hpp`:
- Around line 24-90: Add a guard at the start of LinearFloat16Test’s per-case
loop to validate input_shape is non-empty and that input_shape.back() ==
in_channels (and that in_channels>0 and out_channels>0) to prevent out-of-bounds
reads when computing the CPU reference; if the check fails, log a clear message
(using the existing "[LinearTest]" prefix) and return false (or skip the case as
desired) so malformed test cases do not cause invalid memory access in the loops
that use x_ptr and w_ptr.
In `@tests/ascend/AscendSoftmaxKernelTest.hpp`:
- Around line 41-110: The index math for middle-axis softmax is incorrect (it
multiplies by shape[pos_axis] twice); replace the repeated complex branches
computing idx inside the max/exp/result loops with a single flattened-index
expression using outer, axis_size and inner_size: compute idx = (outer *
axis_size + i) * inner_size + inner (use this in the three places where idx is
computed), instead of the current conditional expressions that use strides and
shape[pos_axis]; this fixes OOB and ensures correct indexing for 0 < pos_axis <
ndim - 1 when using outer_size, axis_size and inner_size.
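The corrected flattened-index formula can be checked against a naive row-major offset computation; this standalone sketch is independent of the test file's actual variables:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flattened index for element (outer, i, inner) when the tensor is viewed as
// [outer_size, axis_size, inner_size] around the softmax axis — the single
// expression the fix asks to use in all three loops.
std::size_t flatIndex(std::size_t outer, std::size_t i, std::size_t inner,
                      std::size_t axis_size, std::size_t inner_size) {
  return (outer * axis_size + i) * inner_size + inner;
}

// Naive reference: multi-dimensional index -> row-major offset.
std::size_t rowMajorOffset(const std::vector<std::size_t>& idx,
                           const std::vector<std::size_t>& shape) {
  std::size_t off = 0;
  for (std::size_t d = 0; d < shape.size(); ++d) off = off * shape[d] + idx[d];
  return off;
}
```

For a middle axis (`0 < pos_axis < ndim - 1`), viewing the shape as `[outer_size, axis_size, inner_size]` makes the two agree for every element, which is exactly what the conditional stride expressions failed to do.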
🧹 Nitpick comments (5)
mllm/backends/ascend/ops/AscendElewiseOps.hpp (1)

27-56: Consider adding brief documentation for the public API classes.

The coding guidelines recommend that public APIs have clear docstrings or comments. While the existing AscendAddOp also lacks documentation (so this is consistent), adding a brief comment describing each op's purpose would improve maintainability.

📝 Optional: Add brief class documentation

```diff
+/// Ascend backend implementation of element-wise subtraction.
 class AscendSubOp final : public aops::SubOp {
@@
+/// Ascend backend implementation of element-wise multiplication.
 class AscendMulOp final : public aops::MulOp {
```

mllm/backends/ascend/AscendCommon.hpp (1)

44-45: Document parameter/ownership expectations for fillAtbTensor.

The current comment is brief but doesn't describe parameter expectations or ownership/lifetime, which is important for a public API.

📝 Suggested doc comment expansion

```diff
-// Setup ATB Tensor with correct dataSize calculated by ATB Utils
+// Setup ATB Tensor with correct dataSize calculated by ATB Utils.
+// @param t Source tensor whose storage is referenced.
+// @param atb_tensor ATB tensor to populate (metadata + deviceData).
+// @note Caller must ensure `t` remains alive while `atb_tensor` is in use.
 void fillAtbTensor(const Tensor& t, atb::Tensor& atb_tensor);
```

As per coding guidelines, ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
mllm/backends/ascend/ops/AscendSoftmaxOp.hpp (1)

12-24: Add brief API docs for AscendSoftmaxOp and its factory.

These new public types are introduced without any class-level documentation; a short purpose/usage comment would satisfy the API doc requirement. As per coding guidelines, ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.

✍️ Suggested doc comments

```diff
+// Ascend backend implementation of Softmax.
 class AscendSoftmaxOp final : public aops::SoftmaxOp {
@@
-class AscendSoftmaxOpFactory final : public TypedOpFactory<OpTypes::kSoftmax, aops::SoftmaxOpOptions> {
+// Factory for creating AscendSoftmaxOp instances.
+class AscendSoftmaxOpFactory final : public TypedOpFactory<OpTypes::kSoftmax, aops::SoftmaxOpOptions> {
```

mllm/backends/ascend/ops/AscendMatMulOp.hpp (1)
12-23: Add brief API docs for AscendMatMulOp and its factory.

These new public types are introduced without any class-level documentation; a short purpose/usage comment would satisfy the API doc requirement. As per coding guidelines, ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.

✍️ Suggested doc comments

```diff
+// Ascend backend implementation of MatMul.
 class AscendMatMulOp final : public aops::MatMulOp {
@@
-class AscendMatMulOpFactory : public TypedOpFactory<OpTypes::kMatMul, aops::MatMulOpOptions> {
+// Factory for creating AscendMatMulOp instances.
+class AscendMatMulOpFactory : public TypedOpFactory<OpTypes::kMatMul, aops::MatMulOpOptions> {
```

mllm/backends/ascend/ops/AscendElewiseOps.cpp (1)
106-189: LGTM, but significant code duplication with AscendAddOp.

The AscendSubOp implementation is correct and follows the established ATB workflow pattern. However, this forward function is nearly identical to AscendAddOp::forward (lines 26-104), differing only in:

- the elewiseType enum value
- error message strings

Consider extracting the common logic into a shared helper function parameterized by the elewise type:

Refactor suggestion

```cpp
namespace {
void executeElewiseOp(atb::infer::ElewiseParam::ElewiseType type, const char* opName,
                      const std::vector<Tensor>& inputs, std::vector<Tensor>& outputs) {
  // Common implementation here...
}
}  // namespace

void AscendAddOp::forward(...) {
  executeElewiseOp(atb::infer::ElewiseParam::ELEWISE_ADD, "AddOp", inputs, outputs);
}

void AscendSubOp::forward(...) {
  executeElewiseOp(atb::infer::ElewiseParam::ELEWISE_SUB, "SubOp", inputs, outputs);
}
```
```cpp
void fillAtbTensor(const Tensor& t, atb::Tensor& atb_tensor) {
  fillAtbTensorDesc(t, atb_tensor.desc);
  atb_tensor.deviceData = reinterpret_cast<uint8_t*>(t.ptr<void>());
  // Use MLLM tensor's actual bytes as dataSize to match allocated memory
  atb_tensor.dataSize = t.bytes();
}
```
🧩 Analysis chain

🏁 Scripts executed against UbiquitousLearning/mllm during verification:

```shell
cat -n mllm/backends/ascend/AscendCommon.cpp | sed -n '215,230p'

# Search for fillAtbTensorDesc definition to understand what it does
rg -n "fillAtbTensorDesc" mllm/backends/ascend/ -A 5

# Find all call sites of fillAtbTensor
rg -n "fillAtbTensor\(" mllm/backends/ascend/ -B 2 -A 2

# Check if there's a Tensor class definition to understand dtype() and device() methods
rg -n "class Tensor" -t h -t hpp

# Search for Tensor class definition
rg -n "class Tensor" -t c -t h --max-count=5

# Search for MLLM_RT_ASSERT macro definition
rg -n "MLLM_RT_ASSERT" -t c -t h --max-count=5

# Check if Tensor has device() and dtype() methods
rg -n "\.device\(\)|\.dtype\(\)" mllm/backends/ascend/ -B 1 -A 1 | head -30

# Look for Tensor definition more broadly
fd "Tensor" mllm --type f -e h | head -5

# Check Tensor class definition for device() and dtype() methods
cat -n mllm/core/Tensor.hpp | sed -n '53,150p'

# Look at one of the call sites more closely to see if there's validation before fillAtbTensor
cat -n mllm/backends/ascend/ops/AscendSoftmaxOp.cpp | sed -n '30,85p'

# Check AscendRMSNormOp for dtype/device validation before fillAtbTensor
cat -n mllm/backends/ascend/ops/AscendRMSNormOp.cpp | sed -n '50,70p'

# Check AscendLinearOp for dtype validation before fillAtbTensor
cat -n mllm/backends/ascend/ops/AscendLinearOp.cpp | sed -n '100,125p'

# Check AscendRMSNormOp for any validation
cat -n mllm/backends/ascend/ops/AscendRMSNormOp.cpp | sed -n '1,70p'

# Search for dtype() and device() method definitions in Tensor class
cat -n mllm/core/Tensor.hpp | grep -A 1 "dtype()\|device()" | head -20

# Check AscendLinearOp more thoroughly for dtype validation
cat -n mllm/backends/ascend/ops/AscendLinearOp.cpp | sed -n '1,100p'

# Check what MLLM_TYPE_F16 is and its relationship with kFloat16
rg -n "MLLM_TYPE_F16|kFloat16" mllm/core/DataTypes.hpp -A 1 | head -20
```
Add input validation to fillAtbTensor.
fillAtbTensor hardcodes FP16 in the descriptor and assumes a device pointer without validating the input tensor. While some call sites (SoftmaxOp, LinearOp) validate dtype before calling this function, others (RMSNormOp) do not. Add runtime checks to ensure tensors are FP16 and on the Ascend device, providing defense-in-depth against invalid tensor usage.
Suggested validation
```diff
 void fillAtbTensor(const Tensor& t, atb::Tensor& atb_tensor) {
+  MLLM_RT_ASSERT(t.device() == kAscend);
+  MLLM_RT_ASSERT(t.dtype() == MLLM_TYPE_F16);
+  MLLM_RT_ASSERT(!t.isNil());
   fillAtbTensorDesc(t, atb_tensor.desc);
-  atb_tensor.deviceData = reinterpret_cast<uint8_t*>(t.ptr<void>());
+  auto* ptr = t.ptr<void>();
+  MLLM_RT_ASSERT(ptr != nullptr);
+  atb_tensor.deviceData = reinterpret_cast<uint8_t*>(ptr);
   // Use MLLM tensor's actual bytes as dataSize to match allocated memory
   atb_tensor.dataSize = t.bytes();
 }
```

🤖 Prompt for AI Agents
In `@mllm/backends/ascend/AscendCommon.cpp` around lines 220 - 225, Add defensive
runtime validation at the start of fillAtbTensor: verify the input Tensor t has
FP16 dtype and is on the Ascend device (e.g., check t.dtype() == DType::FP16 and
t.device/type indicates Ascend), and ensure t.ptr<void>() is non-null; if any
check fails, throw or return a clear error (e.g., throw std::invalid_argument
with a message identifying fillAtbTensor), then proceed to call
fillAtbTensorDesc(t, atb_tensor.desc) and set atb_tensor.deviceData =
reinterpret_cast<uint8_t*>(t.ptr<void>()) and atb_tensor.dataSize = t.bytes();
this adds defense-in-depth for callers like RMSNormOp that may pass invalid
tensors.
```cpp
if (workspaceSize > 0) {
  auto& mem_mgr = getAscendMemoryManager();
  mem_mgr.allocateBlock(static_cast<uint32_t>(workspaceSize), workspace_block_id);
  mem_mgr.getBlockPtr(workspace_block_id, workspace);
}
```
Same workspaceSize truncation concern as in AscendMatMulOp.
Both AscendSubOp and AscendMulOp cast uint64_t workspaceSize to uint32_t when calling allocateBlock. Consider adding the same guard suggested for AscendMatMulOp for consistency across all Ascend ops.
Also applies to: 253-257
🤖 Prompt for AI Agents
In `@mllm/backends/ascend/ops/AscendElewiseOps.cpp` around lines 168 - 172,
AscendSubOp and AscendMulOp are truncating uint64_t workspaceSize to uint32_t
when calling getAscendMemoryManager().allocateBlock; add the same guard used in
AscendMatMulOp to check if workspaceSize > UINT32_MAX (or otherwise unsafe to
cast) before allocating, and handle the error path (log/throw or set an error
status) instead of silently truncating; update the allocation calls to cast only
after the guard and reference workspace_block_id, getAscendMemoryManager(), and
allocateBlock to locate the change.
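The overflow guard shared by the MatMul and element-wise comments can be factored into one checked conversion; `checkedWorkspaceSize` is a hypothetical helper name, and the real code would route the error through the project's logging/status mechanism:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Refuse to allocate when the 64-bit ATB workspace size cannot be represented
// in the 32-bit allocator API, instead of silently truncating.
uint32_t checkedWorkspaceSize(uint64_t workspaceSize) {
  if (workspaceSize > std::numeric_limits<uint32_t>::max()) {
    throw std::runtime_error(
        "ATB workspace size exceeds uint32_t range; refusing to truncate");
  }
  return static_cast<uint32_t>(workspaceSize);
}
```

Each op would then call `allocateBlock(checkedWorkspaceSize(workspaceSize), workspace_block_id)`, so the cast only happens after the guard.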
```cpp
class AscendLinearOp final : public aops::LinearOp {
 public:
  explicit AscendLinearOp(const aops::LinearOpOptions& options);

  void setup(const std::vector<Tensor>& inputs, std::vector<Tensor>& outputs) override;
  void forward(const std::vector<Tensor>& inputs, std::vector<Tensor>& outputs) override;
  void reshape(const std::vector<Tensor>& inputs, std::vector<Tensor>& outputs) override;
};

class AscendLinearOpFactory final : public TypedOpFactory<OpTypes::kLinear, aops::LinearOpOptions> {
 public:
  std::shared_ptr<BaseOp> createOpImpl(const aops::LinearOpOptions& options) override {
    return std::make_shared<AscendLinearOp>(options);
  }
```
Add brief API docs for the public Ascend Linear types.
These public declarations should document purpose and parameters to meet the API comment requirement.
📝 Suggested doc comments
```diff
 class AscendLinearOp final : public aops::LinearOp {
  public:
+  /// Create an Ascend Linear op.
+  /// @param options Linear op configuration options.
   explicit AscendLinearOp(const aops::LinearOpOptions& options);

+  /// Prepare output metadata for the operation.
   void setup(const std::vector<Tensor>& inputs, std::vector<Tensor>& outputs) override;
+  /// Execute Linear on inputs and write to outputs.
   void forward(const std::vector<Tensor>& inputs, std::vector<Tensor>& outputs) override;
+  /// Reshape output tensors as needed for the operation.
   void reshape(const std::vector<Tensor>& inputs, std::vector<Tensor>& outputs) override;
 };

+/// Factory for AscendLinearOp.
 class AscendLinearOpFactory final : public TypedOpFactory<OpTypes::kLinear, aops::LinearOpOptions> {
```

As per coding guidelines, ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
🤖 Prompt for AI Agents
In `@mllm/backends/ascend/ops/AscendLinearOp.hpp` around lines 12 - 25, Add brief
Doxygen-style API comments for the public classes and methods: document
AscendLinearOp (purpose as Ascend backend implementation of aops::LinearOp), its
constructor AscendLinearOp(const aops::LinearOpOptions& options) (describe
options parameter), and the public overrides setup, forward, and reshape
(describe inputs, outputs and expected behavior/side-effects). Also document
AscendLinearOpFactory (purpose to create AscendLinearOp instances) and its
createOpImpl method (describe returned shared_ptr and parameter). Keep comments
concise, mention parameter names and return type and any thrown conditions.
```cpp
Tensor matmul(const Tensor& A, const Tensor& B, bool transpose_A = false, bool transpose_B = false,
              aops::MatMulOpType type = aops::MatMulOpType::kDefault);

Tensor linear(const Tensor& x, const Tensor& weight, const Tensor& bias = Tensor());
```
Add brief API docs for the new functional wrappers.
These are new public APIs; please add short doc comments describing purpose, parameters (including optional bias/epsilon/add_unit_offset), return value, and error behavior.
✏️ Suggested doc comment scaffold

```diff
+/// Linear: y = x * weight (+ bias).
+/// @param x Input tensor.
+/// @param weight Weight matrix.
+/// @param bias Optional bias (nil => no bias).
+/// @return Output tensor.
 Tensor linear(const Tensor& x, const Tensor& weight, const Tensor& bias = Tensor());

+/// RMSNorm: y = x * weight / sqrt(mean(x^2) + epsilon).
+/// @param x Input tensor.
+/// @param weight Scale tensor.
+/// @param epsilon Numerical stability term.
+/// @param add_unit_offset Whether to add unit offset.
+/// @return Output tensor.
 Tensor rmsNorm(const Tensor& x, const Tensor& weight, float epsilon = 1e-5f, bool add_unit_offset = false);
```

Also applies to: 136-136
🤖 Prompt for AI Agents
In `@mllm/nn/Functional.hpp` at line 23, Add short API doc comments to the new
public functional wrappers (e.g., the function linear(const Tensor& x, const
Tensor& weight, const Tensor& bias = Tensor()) and the other functional wrappers
around line 136) that state the function purpose, describe each parameter
(clearly mark optional parameters like bias, epsilon, add_unit_offset), specify
the return value (what Tensor shape/semantics to expect), and outline error
behavior (e.g., mismatched shapes, null/empty tensors). Keep each doc comment
brief (one to three sentences) and attach it immediately above the function
declaration so tools and readers can discover the API easily.
```cpp
// 2. Create causal mask if needed
// Causal mask: mask[i, j] = 0 if j <= i, else -inf (large negative value)
Tensor mask_cpu;
if (use_mask) {
  mask_cpu = Tensor::zeros({1, 1, S_q, S_kv}, kFloat16, kCPU);
  auto* mask_ptr = mask_cpu.ptr<mllm_fp16_t>();

  // Fill causal mask: upper triangular part is masked (-inf)
  for (int32_t i = 0; i < S_q; ++i) {
    for (int32_t j = 0; j < S_kv; ++j) {
      int32_t offset = S_kv - S_q;
      if (j > i + offset) {
        mask_ptr[i * S_kv + j] = MLLM_FP32_TO_FP16(-10000.0f);
      }
    }
  }
```
Guard causal mask generation when S_kv < S_q.
With offset = S_kv - S_q, a negative offset can mask entire rows, leading to all -inf scores and NaNs. Add a guard when use_mask is true (and mirror the same guard in the GQA mask block at lines 339-352).
✅ Suggested guard
```diff
 if (use_mask) {
+  MLLM_RT_ASSERT_EQ(S_kv >= S_q, true);
   mask_cpu = Tensor::zeros({1, 1, S_q, S_kv}, kFloat16, kCPU);
   auto* mask_ptr = mask_cpu.ptr<mllm_fp16_t>();
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
// 2. Create causal mask if needed
// Causal mask: mask[i, j] = 0 if j <= i, else -inf (large negative value)
Tensor mask_cpu;
if (use_mask) {
  MLLM_RT_ASSERT_EQ(S_kv >= S_q, true);
  mask_cpu = Tensor::zeros({1, 1, S_q, S_kv}, kFloat16, kCPU);
  auto* mask_ptr = mask_cpu.ptr<mllm_fp16_t>();

  // Fill causal mask: upper triangular part is masked (-inf)
  for (int32_t i = 0; i < S_q; ++i) {
    for (int32_t j = 0; j < S_kv; ++j) {
      int32_t offset = S_kv - S_q;
      if (j > i + offset) {
        mask_ptr[i * S_kv + j] = MLLM_FP32_TO_FP16(-10000.0f);
      }
    }
  }
```
🤖 Prompt for AI Agents
In `@tests/ascend/AscendAttentionKernelTest.hpp` around lines 224 - 239, The
causal mask generation in the block that creates mask_cpu when use_mask is true
uses offset = S_kv - S_q and will produce incorrect masking when S_kv < S_q
(negative offset) causing entire rows to become -inf; update the logic in the
mask_cpu creation (the loops that write mask_ptr) to skip creating the
upper-triangular mask when S_kv < S_q (e.g., guard early and leave mask as zeros
or compute only when S_kv >= S_q), and apply the same guard to the GQA mask
block referenced at the GQA mask generation (mirror the check around its mask
write loops) so no rows are fully masked when S_kv < S_q.
```cpp
bool LinearFloat16Test(const std::vector<std::tuple<mllm::Tensor::shape_t, int, int>>& test_cases) {
  using namespace mllm;  // NOLINT
  for (auto& test_case : test_cases) {
    auto input_shape = std::get<0>(test_case);
    int in_channels = std::get<1>(test_case);
    int out_channels = std::get<2>(test_case);

    std::cout << "[LinearTest] Testing shape=[";
    for (size_t i = 0; i < input_shape.size(); ++i) {
      std::cout << input_shape[i] << (i < input_shape.size() - 1 ? ", " : "");
    }
    std::cout << "], in=" << in_channels << ", out=" << out_channels << std::endl;

    // 1. Construct random FP16 inputs on CPU
    // x: [M, K] where K = in_channels
    Tensor x_cpu = Tensor::random(input_shape, -1, 1, kFloat16, kCPU);

    // Weight shape for ATB: [K, N] where K=in_channels, N=out_channels
    Tensor weight_cpu = Tensor::random({in_channels, out_channels}, -0.5, 0.5, kFloat16, kCPU);

    // 2. Compute reference result on CPU
    // y = x @ weight, where x is [M, K], weight is [K, N], output is [M, N]
    auto output_shape = input_shape;
    output_shape[output_shape.size() - 1] = out_channels;
    Tensor ref_cpu = Tensor::zeros(output_shape, kFloat16, kCPU);

    {
      auto* x_ptr = x_cpu.ptr<mllm_fp16_t>();
      auto* w_ptr = weight_cpu.ptr<mllm_fp16_t>();
      auto* r_ptr = ref_cpu.ptr<mllm_fp16_t>();

      size_t batch_size = 1;
      for (size_t i = 0; i < input_shape.size() - 1; ++i) {
        batch_size *= input_shape[i];
      }

      for (size_t b = 0; b < batch_size; ++b) {
        for (int o = 0; o < out_channels; ++o) {
          float sum = 0.0f;
          for (int i = 0; i < in_channels; ++i) {
            float x_val = MLLM_FP16_TO_FP32(x_ptr[b * in_channels + i]);
            float w_val = MLLM_FP16_TO_FP32(w_ptr[i * out_channels + o]);  // weight is [K, N]
            sum += x_val * w_val;
          }
          r_ptr[b * out_channels + o] = MLLM_FP32_TO_FP16(sum);
        }
      }
    }

    // 3. Move inputs to Ascend and run Linear via matmul
    auto x_ascend = x_cpu.to(kAscend);
    auto weight_ascend = weight_cpu.to(kAscend);

    // Use matmul: y = x @ weight
    auto y_ascend = nn::functional::matmul(x_ascend, weight_ascend, false, false);

    // 4. Move result back to CPU and compare with reference
    auto y_cpu = y_ascend.to(kCPU);
    auto result = mllm::test::allClose(y_cpu, ref_cpu, 1e-2f, 1e-2f);
    if (!result.is_close) {
      std::cout << "[LinearTest] FAILED!" << std::endl;
      return false;
    }
    std::cout << "[LinearTest] PASSED" << std::endl;
  }
  return true;
}
```
Guard test cases against shape/channel mismatches.
The reference math indexes using in_channels and assumes input_shape.back() == in_channels; add an early check to avoid OOB access on malformed cases.
🧪 Suggested guard

```diff
 int in_channels = std::get<1>(test_case);
 int out_channels = std::get<2>(test_case);
+
+ if (input_shape.empty() || input_shape.back() != static_cast<size_t>(in_channels)) {
+   std::cout << "[LinearTest] Invalid test case: last dim != in_channels" << std::endl;
+   return false;
+ }
@@
 int in_channels = std::get<1>(test_case);
 int out_channels = std::get<2>(test_case);
+
+ if (input_shape.empty() || input_shape.back() != static_cast<size_t>(in_channels)) {
+   std::cout << "[LinearWithBiasTest] Invalid test case: last dim != in_channels" << std::endl;
+   return false;
+ }
```

Also applies to: 93-163
🤖 Prompt for AI Agents
In `@tests/ascend/AscendLinearKernelTest.hpp` around lines 24 - 90, Add a guard at
the start of LinearFloat16Test’s per-case loop to validate input_shape is
non-empty and that input_shape.back() == in_channels (and that in_channels>0 and
out_channels>0) to prevent out-of-bounds reads when computing the CPU reference;
if the check fails, log a clear message (using the existing "[LinearTest]"
prefix) and return false (or skip the case as desired) so malformed test cases
do not cause invalid memory access in the loops that use x_ptr and w_ptr.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@mllm/backends/ascend/ops/AscendLinearOp.cpp`:
- Around line 103-167: The Setup and Execute failure paths leak resources:
ensure the ATB operation and any allocated workspace block are cleaned up before
calling MLLM_ERROR_EXIT; specifically, when st != atb::NO_ERROR after op->Setup
or op->Execute, call atb::DestroyOperation(op) and if workspace_block_id != -1
call getAscendMemoryManager().freeBlock(workspace_block_id) (and null out
workspace if needed) before invoking MLLM_ERROR_EXIT — or wrap op and workspace
allocation in an RAII/scope-guard so atb::DestroyOperation(op) and
mem_mgr.freeBlock(...) run on all exit paths.
In `@mllm/backends/ascend/ops/AscendRMSNormOp.cpp`:
- Around line 31-43: The fallback to weight_ needs explicit validation: before
calling fillAtbTensor(...) validate that weight (the variable resolved in the
snippet as weight and weight_for_atb) is initialized/non-empty when
inputs.size()==1, that its dtype matches x.dtype(), and that its length equals
the expected size (typically x.size(x.dim()-1) or the last dimension length) —
if any check fails, return the same NYI/error path used for x/y mismatches;
perform these checks immediately after resolving weight and before using
weight_for_atb.
- Around line 1-18: In forward(), add validation for the weight tensor whenever
it is used (both when inputs.size() == 1 using member weight_ and when
inputs.size() == 2 using the second input): ensure the weight is 1D, its length
equals x.shape().back() (the last dimension of input x), and its dtype equals
x.dtype(); return or error if these checks fail. Replace the raw atb::Operation*
op with an RAII wrapper (e.g., std::unique_ptr<atb::Operation, custom_deleter>
or a small scope-guard class) so the operation is always destroyed via its
destroy function on all exit paths (including where MLLM_ERROR_EXIT is called)
to guarantee proper cleanup in forward().
In `@mllm/backends/ascend/ops/AscendSoftmaxOp.cpp`:
- Around line 94-107: The code reads workspaceSize (uint64_t) from op->Setup
then passes it to mem_mgr.allocateBlock as a uint32_t, risking truncation for
sizes > UINT32_MAX; add an explicit guard after op->Setup to check if
workspaceSize > UINT32_MAX and handle it (return/log an error via
MLLM_ERROR_EXIT or switch to a 64-bit allocation path), avoiding calling
mem_mgr.allocateBlock/getBlockPtr with a truncated size and leaving
workspace_block_id invalid; reference symbols: workspaceSize, op->Setup,
allocateBlock, getBlockPtr, workspace_block_id.
In `@tests/ascend/AscendSoftmaxKernelTest.hpp`:
- Around line 34-47: The code computes pos_axis and then indexes shape[pos_axis]
without validating axis or ndim; add a guard at the top of the block (before
computing pos_axis or accessing shape) to ensure ndim > 0 and axis is within
[-ndim, ndim-1], and fail fast if not (e.g., assert/EXPECT/throw
std::invalid_argument) so that variables pos_axis, outer_size, and axis_size are
only computed when shape[pos_axis] is safe to access.
🧹 Nitpick comments (2)
tests/ascend/AscendSoftmaxKernelTest.hpp (1)
67-75: Reuse `exp_vals` to avoid per-slice allocations.

Allocating `exp_vals` inside the inner loop adds avoidable overhead for large shapes. Reuse a single buffer per (shape, axis).

♻️ Suggested refactor

```diff
-  for (size_t outer = 0; outer < outer_size; ++outer) {
+  std::vector<float> exp_vals(axis_size);
+  for (size_t outer = 0; outer < outer_size; ++outer) {
     for (size_t inner = 0; inner < inner_size; ++inner) {
       auto idx_at = [&](size_t i) -> size_t { return (outer * axis_size + i) * inner_size + inner; };
@@
-      std::vector<float> exp_vals(axis_size);
       for (size_t i = 0; i < axis_size; ++i) {
         size_t idx = idx_at(i);
         float val = MLLM_FP16_TO_FP32(x_ptr[idx]);
         exp_vals[i] = std::exp(val - max_val);
         sum_exp += exp_vals[i];
       }
```

As per coding guidelines, avoid unnecessary object creation in loops or hot paths.
mllm/backends/ascend/ops/AscendRMSNormOp.cpp (1)
49-104: Consider RAII guards for ATB op/workspace cleanup on early exit.

If `MLLM_ERROR_EXIT` behavior ever changes from process-exit to exception/return, `op` and workspace could leak. A small RAII guard keeps cleanup correct and future-proof.

♻️ Suggested RAII pattern

```diff
-  atb::Operation* op = nullptr;
-  auto st = atb::CreateOperation(rmsNormParam, &op);
+  atb::Operation* op = nullptr;
+  auto st = atb::CreateOperation(rmsNormParam, &op);
   if (st != atb::NO_ERROR || op == nullptr) {
     MLLM_ERROR_EXIT(ExitCode::kAscendError, "ATB CreateOperation(RMS_NORM) failed, status={}", static_cast<int>(st));
   }
+  auto op_guard = std::unique_ptr<atb::Operation, decltype(&atb::DestroyOperation)>(op, &atb::DestroyOperation);
   atb::Context* atb_ctx = getGlobalAtbContext();
@@
-  void* workspace = nullptr;
-  int workspace_block_id = -1;
+  void* workspace = nullptr;
+  int workspace_block_id = -1;
+  auto& mem_mgr = getAscendMemoryManager();
+  struct WorkspaceGuard {
+    AscendMemoryManager* mgr;
+    int id;
+    ~WorkspaceGuard() { if (mgr && id != -1) mgr->freeBlock(id); }
+  } ws_guard{&mem_mgr, -1};
   if (workspaceSize > 0) {
-    auto& mem_mgr = getAscendMemoryManager();
     mem_mgr.allocateBlock(static_cast<uint32_t>(workspaceSize), workspace_block_id);
     mem_mgr.getBlockPtr(workspace_block_id, workspace);
+    ws_guard.id = workspace_block_id;
   }
@@
-  if (workspace_block_id != -1) {
-    auto& mem_mgr = getAscendMemoryManager();
-    mem_mgr.freeBlock(workspace_block_id);
-  }
-
-  atb::DestroyOperation(op);
+  // op and workspace are released by guards
```

As per coding guidelines, check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
```cpp
  atb::Operation* op = nullptr;
  auto st = atb::CreateOperation(linearParam, &op);
  if (st != atb::NO_ERROR || op == nullptr) {
    MLLM_ERROR_EXIT(ExitCode::kAscendError, "ATB CreateOperation(Linear) failed, status={}", static_cast<int>(st));
  }

  atb::Context* atb_ctx = getGlobalAtbContext();

  atb::Tensor atb_x;
  atb::Tensor atb_weight;
  atb::Tensor atb_y;
  atb::Tensor atb_bias;

  fillAtbTensor(x, atb_x);
  fillAtbTensor(*weight_ptr, atb_weight);
  fillAtbTensor(y, atb_y);

  atb::SVector<atb::Tensor> inTensors;
  atb::SVector<atb::Tensor> outTensors;
  inTensors.push_back(atb_x);
  inTensors.push_back(atb_weight);

  if (bias_ptr != nullptr) {
    fillAtbTensor(*bias_ptr, atb_bias);
    inTensors.push_back(atb_bias);
  }

  outTensors.push_back(atb_y);

  atb::VariantPack vp;
  vp.inTensors = inTensors;
  vp.outTensors = outTensors;

  uint64_t workspaceSize = 0;
  st = op->Setup(vp, workspaceSize, atb_ctx);
  if (st != atb::NO_ERROR) {
    MLLM_ERROR_EXIT(ExitCode::kAscendError, "ATB LinearOp Setup failed, status={}", static_cast<int>(st));
  }

  void* workspace = nullptr;
  int workspace_block_id = -1;
  if (workspaceSize > 0) {
    auto& mem_mgr = getAscendMemoryManager();
    mem_mgr.allocateBlock(static_cast<uint32_t>(workspaceSize), workspace_block_id);
    mem_mgr.getBlockPtr(workspace_block_id, workspace);
  }

  {
    ASCEND_TIME_SCOPE("AscendLinearOp::forward");
    st = op->Execute(vp, reinterpret_cast<uint8_t*>(workspace), workspaceSize, atb_ctx);
  }

  if (st != atb::NO_ERROR) {
    MLLM_ERROR_EXIT(ExitCode::kAscendError, "ATB LinearOp Execute failed, status={}", static_cast<int>(st));
  }

  syncGlobalAtbStream();

  if (workspace_block_id != -1) {
    auto& mem_mgr = getAscendMemoryManager();
    mem_mgr.freeBlock(workspace_block_id);
  }

  atb::DestroyOperation(op);
}
```
Resource leak on error paths: ATB operation and workspace not cleaned up.
If op->Setup fails (line 138-140) or op->Execute fails (line 155-157), MLLM_ERROR_EXIT is called without first destroying the ATB operation or freeing the workspace memory. This leaks resources on error paths.
🛡️ Suggested fix using cleanup before exit

```diff
   st = op->Setup(vp, workspaceSize, atb_ctx);
   if (st != atb::NO_ERROR) {
+    atb::DestroyOperation(op);
     MLLM_ERROR_EXIT(ExitCode::kAscendError, "ATB LinearOp Setup failed, status={}", static_cast<int>(st));
   }
   void* workspace = nullptr;
   int workspace_block_id = -1;
   if (workspaceSize > 0) {
     auto& mem_mgr = getAscendMemoryManager();
     mem_mgr.allocateBlock(static_cast<uint32_t>(workspaceSize), workspace_block_id);
     mem_mgr.getBlockPtr(workspace_block_id, workspace);
   }
   {
     ASCEND_TIME_SCOPE("AscendLinearOp::forward");
     st = op->Execute(vp, reinterpret_cast<uint8_t*>(workspace), workspaceSize, atb_ctx);
   }
   if (st != atb::NO_ERROR) {
+    if (workspace_block_id != -1) {
+      getAscendMemoryManager().freeBlock(workspace_block_id);
+    }
+    atb::DestroyOperation(op);
     MLLM_ERROR_EXIT(ExitCode::kAscendError, "ATB LinearOp Execute failed, status={}", static_cast<int>(st));
   }
```

Alternatively, consider using an RAII wrapper or a scope guard to ensure cleanup on all exit paths.
As per coding guidelines: Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
🤖 Prompt for AI Agents
In `@mllm/backends/ascend/ops/AscendLinearOp.cpp` around lines 103 - 167, The
Setup and Execute failure paths leak resources: ensure the ATB operation and any
allocated workspace block are cleaned up before calling MLLM_ERROR_EXIT;
specifically, when st != atb::NO_ERROR after op->Setup or op->Execute, call
atb::DestroyOperation(op) and if workspace_block_id != -1 call
getAscendMemoryManager().freeBlock(workspace_block_id) (and null out workspace
if needed) before invoking MLLM_ERROR_EXIT — or wrap op and workspace allocation
in an RAII/scope-guard so atb::DestroyOperation(op) and mem_mgr.freeBlock(...)
run on all exit paths.
```cpp
// Copyright (c) MLLM Team.
// Licensed under the MIT License.

#include "mllm/backends/ascend/ops/AscendRMSNormOp.hpp"

#include <acl/acl.h>
#include <iostream>
#include <atb/atb_infer.h>
#include <atb/types.h>
#include <atb/utils.h>
#include <atb/infer_op_params.h>

#include "mllm/utils/Common.hpp"
#include "mllm/core/DataTypes.hpp"
#include "mllm/core/Tensor.hpp"
#include "mllm/backends/ascend/memory/AscendMemoryManager.hpp"
#include "mllm/backends/ascend/AscendCommon.hpp"
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
python - <<'PY'
from pathlib import Path
path = Path("mllm/backends/ascend/ops/AscendRMSNormOp.cpp")
data = path.read_bytes()
errors = []
if data.startswith(b"\xef\xbb\xbf"):
    errors.append("UTF-8 BOM detected")
bad = [b for b in data if (b < 0x20 and b not in (0x09, 0x0A)) or (0x7F <= b <= 0x9F)]
if bad:
    errors.append(f"Disallowed control codes: {sorted(set(bad))}")
if b"\r" in data:
    errors.append("CR characters detected (not LF-only)")
lines = data.split(b"\n")
tw = [i + 1 for i, l in enumerate(lines) if l.rstrip(b" \t") != l]
if tw:
    errors.append(f"Trailing whitespace on lines: {tw[:10]}")
print("OK" if not errors else "FAIL: " + "; ".join(errors))
PY
```

Repository: UbiquitousLearning/mllm
Length of output: 69
🏁 Script executed:
```bash
cat -n mllm/backends/ascend/ops/AscendRMSNormOp.cpp
```

Repository: UbiquitousLearning/mllm
Length of output: 4094
Encoding compliant; add weight tensor validation and use RAII for operation cleanup.
Encoding and whitespace constraints verified as compliant.
However, the forward() method has two issues:
- Missing weight tensor validation: When `inputs.size() == 1`, the code uses the `weight_` member variable without validating its shape or dtype compatibility with input `x`. Similarly, when `inputs.size() == 2`, the weight tensor is not validated. Per guidelines, inputs to critical functions must be validated. Add checks for weight shape (must be 1D and match x's last dimension) and dtype (must match x's dtype).
Resource cleanup not following RAII: The
atb::Operation* opis created at line 50 but only destroyed at line 103 in success paths. IfMLLM_ERROR_EXITis called at lines 52, 78, or 93, the operation is not properly destroyed. Wrap the operation in a smart pointer (e.g., custom deleter or RAII wrapper) to ensure cleanup even during early exit or exceptions.
🧰 Tools
🪛 Clang (14.0.6)
[error] 4-4: 'mllm/backends/ascend/ops/AscendRMSNormOp.hpp' file not found
(clang-diagnostic-error)
🤖 Prompt for AI Agents
In `@mllm/backends/ascend/ops/AscendRMSNormOp.cpp` around lines 1 - 18, In
forward(), add validation for the weight tensor whenever it is used (both when
inputs.size() == 1 using member weight_ and when inputs.size() == 2 using the
second input): ensure the weight is 1D, its length equals x.shape().back() (the
last dimension of input x), and its dtype equals x.dtype(); return or error if
these checks fail. Replace the raw atb::Operation* op with an RAII wrapper
(e.g., std::unique_ptr<atb::Operation, custom_deleter> or a small scope-guard
class) so the operation is always destroyed via its destroy function on all exit
paths (including where MLLM_ERROR_EXIT is called) to guarantee proper cleanup in
forward().
```cpp
  const auto& x = inputs[0];
  const auto& weight = (inputs.size() == 2) ? inputs[1] : weight_;
  auto& y = outputs[0];

  const Tensor& weight_for_atb = weight;

  if (x.dtype() != y.dtype()) {
    NYI("AscendRMSNormOp currently requires x/y have same dtype");
  }
  if (x.numel() != y.numel()) {
    NYI("AscendRMSNormOp requires x/y have same numel");
  }
```
Add explicit validation for weight when using the fallback path.
Line 32 falls back to weight_ when only one input is provided, but there’s no guard that weight_ is initialized and compatible with x (dtype and expected length). This can lead to ATB failures or incorrect results. Please add checks before fillAtbTensor(...) to ensure the weight is present and matches the expected size (typically the last dimension of x) and dtype.
As per coding guidelines, Validate inputs for public APIs and critical internal functions.
🤖 Prompt for AI Agents
In `@mllm/backends/ascend/ops/AscendRMSNormOp.cpp` around lines 31 - 43, The
fallback to weight_ needs explicit validation: before calling fillAtbTensor(...)
validate that weight (the variable resolved in the snippet as weight and
weight_for_atb) is initialized/non-empty when inputs.size()==1, that its dtype
matches x.dtype(), and that its length equals the expected size (typically
x.size(x.dim()-1) or the last dimension length) — if any check fails, return the
same NYI/error path used for x/y mismatches; perform these checks immediately
after resolving weight and before using weight_for_atb.
```cpp
  uint64_t workspaceSize = 0;
  st = op->Setup(vp, workspaceSize, atb_ctx);
  if (st != atb::NO_ERROR) {
    MLLM_ERROR_EXIT(ExitCode::kAscendError,
                    "ATB SoftmaxOp Setup failed, status={}",
                    static_cast<int>(st));
  }

  void* workspace = nullptr;
  int workspace_block_id = -1;
  if (workspaceSize > 0) {
    auto& mem_mgr = getAscendMemoryManager();
    mem_mgr.allocateBlock(static_cast<uint32_t>(workspaceSize), workspace_block_id);
    mem_mgr.getBlockPtr(workspace_block_id, workspace);
```
Guard against workspace size truncation.
workspaceSize is uint64_t, but allocation truncates to uint32_t. If ATB returns >4 GB, you’ll under-allocate and risk overwrite when Execute uses the full size. Add an explicit guard (or upgrade the allocator to 64-bit).
🛠️ Proposed fix

```diff
 #include <atb/types.h>
 #include <atb/utils.h>
 #include <atb/infer_op_params.h>
+#include <limits>
@@
   if (workspaceSize > 0) {
+    if (workspaceSize > std::numeric_limits<uint32_t>::max()) {
+      MLLM_ERROR_EXIT(ExitCode::kAscendError,
+                      "AscendSoftmaxOp: workspaceSize {} exceeds 32-bit limit",
+                      workspaceSize);
+    }
     auto& mem_mgr = getAscendMemoryManager();
     mem_mgr.allocateBlock(static_cast<uint32_t>(workspaceSize), workspace_block_id);
     mem_mgr.getBlockPtr(workspace_block_id, workspace);
   }
```
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
  uint64_t workspaceSize = 0;
  st = op->Setup(vp, workspaceSize, atb_ctx);
  if (st != atb::NO_ERROR) {
    MLLM_ERROR_EXIT(ExitCode::kAscendError,
                    "ATB SoftmaxOp Setup failed, status={}",
                    static_cast<int>(st));
  }

  void* workspace = nullptr;
  int workspace_block_id = -1;
  if (workspaceSize > 0) {
    if (workspaceSize > std::numeric_limits<uint32_t>::max()) {
      MLLM_ERROR_EXIT(ExitCode::kAscendError,
                      "AscendSoftmaxOp: workspaceSize {} exceeds 32-bit limit",
                      workspaceSize);
    }
    auto& mem_mgr = getAscendMemoryManager();
    mem_mgr.allocateBlock(static_cast<uint32_t>(workspaceSize), workspace_block_id);
    mem_mgr.getBlockPtr(workspace_block_id, workspace);
```
🤖 Prompt for AI Agents
In `@mllm/backends/ascend/ops/AscendSoftmaxOp.cpp` around lines 94 - 107, The code
reads workspaceSize (uint64_t) from op->Setup then passes it to
mem_mgr.allocateBlock as a uint32_t, risking truncation for sizes > UINT32_MAX;
add an explicit guard after op->Setup to check if workspaceSize > UINT32_MAX and
handle it (return/log an error via MLLM_ERROR_EXIT or switch to a 64-bit
allocation path), avoiding calling mem_mgr.allocateBlock/getBlockPtr with a
truncated size and leaving workspace_block_id invalid; reference symbols:
workspaceSize, op->Setup, allocateBlock, getBlockPtr, workspace_block_id.
```cpp
  // Convert axis to positive index
  int ndim = static_cast<int>(shape.size());
  int pos_axis = axis;
  if (pos_axis < 0) {
    pos_axis = ndim + pos_axis;
  }

  size_t outer_size = 1;
  for (int i = 0; i < pos_axis; ++i) {
    outer_size *= shape[i];
  }

  size_t axis_size = shape[pos_axis];
```
Validate axis range before indexing shape[pos_axis].
If axis is out of range (e.g., axis >= ndim or axis < -ndim) or ndim == 0, shape[pos_axis] is OOB and the test will crash. Add a guard and fail fast.
🛠️ Proposed fix

```diff
 int ndim = static_cast<int>(shape.size());
 int pos_axis = axis;
 if (pos_axis < 0) {
   pos_axis = ndim + pos_axis;
 }
+if (ndim == 0 || pos_axis < 0 || pos_axis >= ndim) {
+  return false;
+}
```

As per coding guidelines, validate inputs for public APIs and critical internal functions.
🤖 Prompt for AI Agents
In `@tests/ascend/AscendSoftmaxKernelTest.hpp` around lines 34 - 47, The code
computes pos_axis and then indexes shape[pos_axis] without validating axis or
ndim; add a guard at the top of the block (before computing pos_axis or
accessing shape) to ensure ndim > 0 and axis is within [-ndim, ndim-1], and fail
fast if not (e.g., assert/EXPECT/throw std::invalid_argument) so that variables
pos_axis, outer_size, and axis_size are only computed when shape[pos_axis] is
safe to access.
Please check Guidelines for Contributing.
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.