Use vectorcall for all-positional-argument calls #5896

swolchok · 2025-11-14T06:50:54Z

If a handle or object is called with only positional arguments, it is straightforward to use PyObject_Vectorcall instead of PyObject_CallObject.

Benchmarked by adding a trivial function to pybind11_benchmark:

    m.def("call_func_with_int", [](py::object func) {
      return func(py::cast(1));
    });

and then running `python -m timeit --setup 'from pybind11_benchmark import call_func_with_int; f = lambda x: x + 1' 'call_func_with_int(f)'`.

Before, on an M4 Mac: 57.6 nsec per loop
After, on an M4 Mac: 48.4 nsec per loop

For comparison, the included collatz benchmark takes 33.1 nsec per loop, calling `f(1)` directly takes 17.8 nsec per loop, and simply running `pass` takes 4.19 nsec per loop.
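Working through the reported M4 timings gives the size of the improvement. The last line treats the direct `f(1)` call as a rough baseline for "calling the lambda itself", which assumes the costs simply add and is only an approximation.

```math
\begin{aligned}
\Delta t &= 57.6 - 48.4 = 9.2\ \text{ns per call} \\
\frac{9.2}{57.6} &\approx 0.160 \quad (\approx 16\%\ \text{less time per call}) \\
\frac{57.6}{48.4} &\approx 1.19\times\ \text{speedup} \\
t_{\text{extra}} &= t_{\text{call\_func\_with\_int}(f)} - t_{f(1)}:\quad 57.6 - 17.8 = 39.8\ \text{ns} \;\to\; 48.4 - 17.8 = 30.6\ \text{ns} \quad (\approx 23\%\ \text{lower})
\end{aligned}
```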

Suggested changelog entry:

Use vectorcall for simple C++-to-Python calls (only positional arguments, no tuple expansion).



swolchok · 2025-11-14T22:08:58Z

include/pybind11/cast.h

+    // Disable warnings about useless comparisons when N == 0.
+    PYBIND11_WARNING_PUSH
+    PYBIND11_WARNING_DISABLE_GCC("-Wtype-limits")
+    PYBIND11_WARNING_DISABLE_INTEL(186)


Not sure why suppressing the ICC warning didn't work :(
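The diff comment refers to comparisons that become trivially false when `N == 0`. Below is a sketch using hypothetical functions, not the PR's code, showing where such a comparison comes from. The loop's `i < N` turns into `i < 0` on an unsigned index, which is the kind of always-false comparison `-Wtype-limits` is designed to flag. It also shows one way to avoid emitting the comparison at all. Note that the fold expression requires C++17, whereas pybind11 also builds as C++11 and C++14 (see the commit list), so this is an alternative design, not a drop-in fix.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Loop form: when N == 0 the condition `i < N` is `i < 0` for an unsigned i,
// an always-false comparison of the kind -Wtype-limits warns about.
template <std::size_t N>
int sum_loop(const std::array<int, N> &a) {
    int total = 0;
    for (std::size_t i = 0; i < N; ++i) {
        total += a[i];
    }
    return total;
}

// Pack-expansion form: the indices are expanded at compile time, so no
// comparison is emitted and an empty pack simply yields 0 (C++17 fold).
template <std::size_t N, std::size_t... Is>
int sum_fold_impl([[maybe_unused]] const std::array<int, N> &a,
                  std::index_sequence<Is...>) {
    return (0 + ... + a[Is]);
}

template <std::size_t N>
int sum_fold(const std::array<int, N> &a) {
    return sum_fold_impl(a, std::make_index_sequence<N>{});
}
```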

swolchok · 2025-11-17T18:25:28Z

FWIW, I attempted to extend this to use vectorcall for the unpacking/kwargs/etc. cases, but my naive, straightforward attempt added enough overhead to kwarg handling that it didn't actually improve performance. (The tuple-unpacking case in particular is bottlenecked on args_proxy, which as currently written has to use slow PyIter-based iteration instead of PyTuple_GET_ITEM, so vectorcall couldn't help there either, even though that case doesn't involve kwargs.)

swolchok added 6 commits November 13, 2025 15:51

Make simple_collector non-copyable and non-movable

ddcc5d2

Restore PyObject_CallObject compatibility path for old Python versions

9120b9c

Fix the fix for Python 3.8. Allow moving of simple_collector for C++11 and 14.

d5a88c3

suppress -Wtype-limits

7e66762

suppress intel version of -Wtype-limits

f6aaf68

swolchok commented Nov 14, 2025


swolchok added 3 commits November 14, 2025 14:12

Try putting the suppression at class level

7529fbe

try suppressing for each loop

6c1b2f7

Suppress for NVCC as well

278f081

gentlegiantJGC mentioned this pull request Nov 15, 2025

Don't allow keep_alive or call_guard on properties #5533

Merged
