Sparse test by XucSh · Pull Request #7 · XucSh/Mooncake

XucSh · 2026-01-17T07:12:49Z

Description

Type of Change

Types
- Bug fix
- New feature
  - Transfer Engine
  - Mooncake Store
  - Mooncake EP
  - Integration
  - P2P Store
  - Python Wheel
- Breaking change
- CI/CD
- Documentation update
- Other

How Has This Been Tested?

Checklist

I have performed a self-review of my own code.
I have updated the documentation.
I have added tests to prove my changes are effective.

Summary by CodeRabbit

Bug Fixes
- Relaxed tensor validation in `store_py.cpp`: `PyTensorInfo::valid()` now always returns true, and the `sizeof(TensorMetadata)` boundary checks in `buffer_to_tensor()` use `<` instead of `<=`.

Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>

coderabbitai · 2026-01-17T07:13:04Z

Walkthrough

This change modifies tensor validation logic in store_py.cpp by relaxing boundary condition checks from <= to < for metadata size comparisons and making PyTensorInfo::valid() unconditionally return true instead of checking tensor size. This alters error handling and control flow around tensor data boundary validation.
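
The effect of moving from `<=` to `<` can be sketched in isolation. The snippet below is a minimal illustration, assuming the check compares the total buffer length against the header size; `DemoMetadata`, `rejected_before`, `rejected_after`, and `payload_size` are hypothetical names, and the real `TensorMetadata` layout is not shown in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for TensorMetadata; the real layout may differ.
struct DemoMetadata {
    std::int32_t dtype;
    std::int32_t ndim;
    std::uint64_t shape[4];
};

// Assumed old-style check: a buffer holding only the header is rejected.
constexpr bool rejected_before(std::size_t total_length) {
    return total_length <= sizeof(DemoMetadata);
}

// Assumed new-style check: only buffers shorter than the header are rejected.
constexpr bool rejected_after(std::size_t total_length) {
    return total_length < sizeof(DemoMetadata);
}

// Payload bytes following the header; call only for accepted buffers.
constexpr std::size_t payload_size(std::size_t total_length) {
    return total_length - sizeof(DemoMetadata);
}
```

Under these assumptions, the only input whose outcome changes is a buffer of exactly `sizeof(DemoMetadata)` bytes: it is now accepted and yields a zero-byte payload.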

Changes

| Cohort / File(s) | Summary |
|---|---|
| Tensor Validation Logic `mooncake-integration/store/store_py.cpp` | Modified `PyTensorInfo::valid()` to always return true; changed boundary condition checks from `<=` to `<` for `sizeof(TensorMetadata)` comparisons in `buffer_to_tensor()`, relaxing validation thresholds; removed tensor size as validity gate while preserving UNKNOWN dtype error handling. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A tensor dances through the gate,
Where boundaries now are loose, not strait!
Valid() claims truth absolute and clear,
Metadata checks relax their sternest fear—
Size and shape in harmony appear! 🐇

🚥 Pre-merge checks | ❌ 3

❌ Failed checks (3 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The title 'Sparse test' does not match the actual changes, which are modifications to tensor validity checks and boundary conditions in store_py.cpp, not testing-related code. | Update the title to reflect the actual changes, such as 'Fix tensor boundary checks and validity logic in buffer_to_tensor' to accurately describe the modifications. |
| Description check | ⚠️ Warning | The PR description is entirely empty: all template sections lack content, including the description, type of change selection, testing approach, and checklist completion. | Provide a clear description of changes, select the appropriate change type, explain how changes were tested, and complete the developer checklist items. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

mooncake-integration/store/store_py.cpp (2)

18-25: Restore validity gating to prevent invalid tensors from being stored.

extract_tensor_info returns a zeroed struct on errors (unsupported dtype, invalid dims, exceptions). With valid() hardcoded to true, put_tensor_impl and batch paths will store invalid metadata and a null data pointer instead of failing fast.

🔧 Suggested fix

 struct PyTensorInfo {
     uintptr_t data_ptr;
     size_t tensor_size;
     TensorMetadata metadata;
+    bool ok{false};

     // Check validity
-    bool valid() const { return true; }
+    bool valid() const { return ok; }
 };

 PyTensorInfo extract_tensor_info(const py::object &tensor,
                                  const std::string &key_name = "") {
     PyTensorInfo info = {
         0,
         0,
         {},
     };
@@
         if (dtype_enum == TensorDtype::UNKNOWN) {
             LOG(ERROR) << "Unsupported tensor dtype"
                        << (key_name.empty() ? "" : " for " + key_name);
-            return {0, 0, {}};
+            return info;
         }
@@
         if (ndim > 4) {
             LOG(ERROR) << "Tensor has more than 4 dimensions: " << ndim;
-            return {0, 0, {}};
+            return info;
         }
@@
     } catch (const std::exception &e) {
         LOG(ERROR) << "Error extracting tensor info: " << e.what();
-        return {0, 0, {}};
+        return info;
     }
-
-    return info;
+    info.ok = true;
+    return info;
 }
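
The suggested fix separates "extraction succeeded" from "tensor has bytes", so an empty tensor can be valid while a failed extraction is not. A self-contained sketch of that idea follows; `DemoTensorInfo`, `extract_demo`, and `put_demo` are hypothetical simplifications and not the actual Mooncake functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified version of the struct in the suggested fix.
struct DemoTensorInfo {
    std::uintptr_t data_ptr{0};
    std::size_t tensor_size{0};
    bool ok{false};  // set only after extraction fully succeeds

    bool valid() const { return ok; }
};

// Hypothetical extractor: rejects more than 4 dimensions, as the diff does.
DemoTensorInfo extract_demo(const void *data, std::size_t size, int ndim) {
    DemoTensorInfo info;
    if (ndim > 4) {
        return info;  // ok stays false, so valid() reports the failure
    }
    info.data_ptr = reinterpret_cast<std::uintptr_t>(data);
    info.tensor_size = size;
    info.ok = true;
    return info;
}

// Hypothetical caller: returns 0 on success, -1 when the info is invalid.
int put_demo(const DemoTensorInfo &info) {
    if (!info.valid()) {
        return -1;  // fail fast instead of storing zeroed metadata
    }
    return 0;
}
```

In this sketch a zero-byte tensor with an acceptable rank passes `valid()`, while an extraction that bailed out early is still rejected by the caller.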

94-116: Guard metadata-only buffers to prevent shape/data mismatches and silent failures.

When total_length == sizeof(TensorMetadata), the resulting tensor_size becomes 0. If the metadata declares a non-empty shape (e.g., shape = [10, 20]), the subsequent reshape at line 163 will fail silently—the exception is caught and only logged, causing the caller to receive pybind11::none() without clear indication that data is missing.

This should only be accepted for truly empty tensors: ndim == 0 or at least one shape dimension is 0.

Suggested validation

    TensorDtype dtype_enum = static_cast<TensorDtype>(metadata.dtype);
    size_t tensor_size = total_length - sizeof(TensorMetadata);

    if (dtype_enum == TensorDtype::UNKNOWN) {
        if (take_ownership) {
            delete[] exported_data;
        }
        LOG(ERROR) << "Unknown tensor dtype";
        return pybind11::none();
    }
+
+    // Reject metadata-only buffers unless the tensor is truly empty
+    if (tensor_size == 0 && metadata.ndim > 0) {
+        bool empty_expected = false;
+        for (int i = 0; i < metadata.ndim; ++i) {
+            if (metadata.shape[i] == 0) {
+                empty_expected = true;
+                break;
+            }
+        }
+        if (!empty_expected) {
+            if (take_ownership) {
+                delete[] exported_data;
+            }
+            LOG(ERROR)
+                << "Invalid tensor metadata: zero data for non-empty tensor";
+            return pybind11::none();
+        }
+    }

Also applies to: 129-138
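
The same guard could be factored into a helper so both affected ranges share one check. This is only a sketch of the rule stated in the review comment (a zero-byte payload is accepted when `ndim == 0` or some dimension is 0); `DemoMetadata`, `shape_is_empty`, and `payload_consistent` are hypothetical names, and the four-entry shape array is an assumption based on the `ndim > 4` rejection above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the shape-related fields of TensorMetadata.
struct DemoMetadata {
    std::int32_t ndim;
    std::uint64_t shape[4];
};

// True when at least one declared dimension is 0.
bool shape_is_empty(const DemoMetadata &m) {
    for (int i = 0; i < m.ndim && i < 4; ++i) {
        if (m.shape[i] == 0) {
            return true;
        }
    }
    return false;
}

// Mirrors the review's rule for metadata-only buffers.
// Non-zero payloads are not checked against the shape in this sketch.
bool payload_consistent(const DemoMetadata &m, std::size_t tensor_size) {
    if (tensor_size != 0) {
        return true;
    }
    return m.ndim == 0 || shape_is_empty(m);
}
```

A caller would log an error and return `pybind11::none()` when `payload_consistent` is false, freeing `exported_data` first if it owns the buffer.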

fix for sparse tensor

f65aaa6

Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>

github-actions bot added the run-ci label Jan 17, 2026

coderabbitai bot reviewed Jan 17, 2026

XucSh wants to merge 1 commit into `main` from `sparse`
