Conversation
```cpp
// Heuristic thresholds: only pack "small" tensors.
constexpr int64_t kPackMaxBytesPerTensor = 256 * 1024; // 256KB
```
Do we need to limit the total packed size as well as the per-tensor size?
For example, a 32MB limit per packed buffer.
Good point.
For the per-tensor size, the threshold focuses the packing on small tensors, which benefit the most from it while adding little copying overhead on the CPU.
A very large total size could lead to problems such as allocation failures or higher allocation overhead. I adjusted the implementation so that multiple chunks are allocated if needed (32MB each by default, configurable); if more than 32MB is needed, multiple chunks are used.
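The chunking scheme described above can be sketched as follows. This is a pure-Python illustration only: the chunk size, the greedy first-fit policy, and all names are assumptions for the sketch, not the actual implementation.

```python
# Illustrative sketch (not the actual implementation): greedily assign
# tensor byte sizes to fixed-size staging chunks, opening a new chunk
# whenever the current one would overflow.

CHUNK_BYTES = 32 * 1024 * 1024  # assumed 32MB default, configurable

def plan_chunks(tensor_sizes, chunk_bytes=CHUNK_BYTES):
    """Return a list of chunks; each chunk is a list of (tensor_index, offset)."""
    chunks, current, used = [], [], 0
    for i, size in enumerate(tensor_sizes):
        if size > chunk_bytes:
            raise ValueError("tensor larger than one chunk")
        if used + size > chunk_bytes:  # current chunk would overflow
            chunks.append(current)
            current, used = [], 0
        current.append((i, used))  # record the offset within the chunk
        used += size
    if current:
        chunks.append(current)
    return chunks

# Three 16MB tensors do not fit into one 32MB chunk, so two chunks are used.
plan = plan_chunks([16 * 1024 * 1024] * 3)
print(len(plan))  # -> 2
```

Each chunk can then be allocated and copied independently, which keeps individual allocations bounded regardless of the total packed size.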
```rst
Results
-------

.. list-table:: Runtime and Speedup (mean +/- std over 10 runs)
```
can we add a performance compare with pytorch nested tensor?
The use-cases are different: With the copier, we can have more general input structures (e.g. dicts/lists/tuples containing tensors (typical for meta-data); individual inputs can have different dtypes and be on different devices).
However, the underlying implementation has similarities: a nested tensor also uses a single memory buffer. I made a small evaluation comparing the copy runtime against both an already created nested tensor and against copying by first creating a nested tensor from a list and then copying it (without splitting it back into a list). The results (copying 500 tensors with 32-1024 entries each; pinned memory is used when creating the nested tensor):
multi_tensor_copier: 0.388 ms
nested tensor (from list): 1.071 ms
nested tensor (pre-built): 0.158 ms
So, if lists are used, the multi_tensor_copier copy is faster, but using a pre-built nested tensor directly is faster still.
I would say that this is expected, as a nested tensor is already in a format similar to what the copier uses internally (and has to convert to and from).
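To make the comparison concrete, here is a hedged pure-Python sketch of the shared underlying idea: many small buffers are packed into one contiguous buffer, copied once, and split back by recorded offsets. All names and the bytes-based mechanics are invented for illustration; the real copier operates on device tensors, not Python bytes.

```python
# Illustrative sketch of the pack / copy-once / split idea behind the
# copier (and behind nested tensors). Names are hypothetical.

def pack(buffers):
    """Concatenate buffers into one contiguous buffer, recording offsets."""
    offsets, total = [], 0
    for b in buffers:
        offsets.append((total, len(b)))
        total += len(b)
    packed = bytearray(total)
    for b, (off, n) in zip(buffers, offsets):
        packed[off:off + n] = b  # stage each piece into the big buffer
    return bytes(packed), offsets

def unpack(packed, offsets):
    """Split a packed buffer back into the original pieces."""
    return [packed[off:off + n] for off, n in offsets]

srcs = [b"meta", b"data", b"tensors"]
packed, offs = pack(srcs)
# A single bulk copy of `packed` now replaces three small copies.
assert unpack(packed, offs) == srcs
```

A pre-built nested tensor skips the pack/unpack steps entirely, which is why it comes out fastest in the measurements above.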
Signed-off-by: Roman Schaffert <rschaffert@nvidia.com>
Thank you for the insightful comments @xupinjie! I prepared a new version. Apart from the changes related to your comments, I also reworked how streams are handled. Previously, some copy directions were not synchronized properly, and the way multiple streams were used was not meaningful (as only copy operations are involved).

That is great! Merged.
Description
Added the Multi-Tensor Copier functionality, along with the corresponding documentation, an example, and a simple evaluation.
Type of Change
Please select (at least one):
Testing
Checklist for testing:
scripts/run_tests.sh
Documentation, Examples, Tutorials, Demos
Checklist for documentation:
Code Quality
Checklist for dependencies:
pyproject.toml if/as needed
Related Issues / Context
If applicable, link related issues, discussions etc.
DCO / Sign-Off
Please refer to the section on Signing Your Work & Developer Certificate of Origin (DCO)
in the Contribution Guide before submitting your contribution.
References
For additional details, please refer to the Contribution Guide.
The following guides are available (referenced in the Contribution Guide for further details):
docs/guides/CONTRIBUTION_GUIDE.md
docs/guides/DEVELOPMENT_GUIDE.md
docs/guides/DOCUMENTATION_SETUP_GUIDE.md
docs/guides/FORMATTING_GUIDE.md
Please also refer to the summary checklist in the Contribution Guide,
which is a guideline for what to consider when submitting your contribution and covers the same topics as the checklists above.