rwgk (Collaborator) commented Sep 14, 2025

Description

Closes #5786.

This PR fixes two independent bugs in the smart_holder machinery, both exposed by new tests involving virtual inheritance.

1. Fix in pybind11/detail/type_caster_base.h

The return-value caster for std::shared_ptr<T> was incorrectly using src.get() (the most-derived pointer) instead of st.first (the adjusted subobject pointer appropriate for the registered tinfo).
On MSVC with virtual inheritance, this mismatch led to passing an Animal* (see below) to register_instance when binding Tiger, which caused the virtual-base offset walker to dereference a bogus vbptr.
This is the bug reported in #5786. Using st.first ensures the correct subobject is registered.
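
To illustrate why the choice of pointer matters, here is a minimal sketch using hypothetical stand-ins for the Animal, Cat, and Tiger classes from #5786. It is not code from this PR. It relies only on standard C++: converting a pointer to a virtual base may adjust the address in an implementation-defined way, and `dynamic_cast<const void *>` recovers the address of the most-derived object. The helper names are made up for illustration and are not pybind11 API.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the hierarchy in #5786 (not the PR's test code).
struct Animal {
    virtual ~Animal() = default;
    int animal_tag = 1;
};
struct Cat : virtual Animal {
    int cat_tag = 2;
};
struct Tiger : Cat {
    int tiger_tag = 3;
};

// Address of the most-derived object that `base` is a subobject of.
// For a polymorphic type, dynamic_cast to (const) void* is defined to yield it.
inline const void *most_derived_address(const Animal *base) {
    assert(base != nullptr);
    return dynamic_cast<const void *>(base);
}

// Whether the Animal subobject starts at a different address than the Tiger
// object. The answer is implementation-defined; it is typically true when the
// virtual base is placed after the derived-class data.
inline bool animal_subobject_is_offset(const Tiger &t) {
    const Animal *as_animal = &t; // implicit conversion adjusts the pointer
    return static_cast<const void *>(as_animal) != static_cast<const void *>(&t);
}
```

If `animal_subobject_is_offset` returns true for a given compiler, an `Animal *` and a `Tiger *` to the same object hold different addresses, so code that records one where the other is expected ends up looking at the wrong bytes.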

2. Fix in pybind11/detail/struct_smart_holder.h

smart_holder::from_unique_ptr was storing the subobject pointer as the owned resource in its control block. Under MSVC, destroying through that misaligned pointer caused NX faults in the virtual destructor path.
The fix mirrors the shared_ptr path: always own the real T* object start with the deleter, and, if needed, create an aliasing shared_ptr<void> at the subobject pointer for registration/identity.
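
The ownership split described above can be sketched with the standard aliasing constructor of `std::shared_ptr`. This is a simplified illustration under stated assumptions, not the code in `struct_smart_holder.h`: `adopt_and_alias` is a hypothetical helper, only the default deleter is handled, and pybind11's guarded deleter is not modeled. The Diamond hierarchy is a stand-in loosely modeled on the new test.

```cpp
#include <cassert>
#include <memory>
#include <utility>

static int g_diamond_dtor_calls = 0; // counts full-object destructions

struct VBase {
    virtual ~VBase() = default;
    int vbase_tag = 42; // ensure the virtual base is not empty
};
struct Left : virtual VBase { char pad_l[4] = {}; };
struct Right : virtual VBase { char pad_r[24] = {}; };
struct Diamond : Left, Right {
    ~Diamond() override { ++g_diamond_dtor_calls; }
};

// Hypothetical helper (not pybind11 API): take ownership of *unq and return a
// type-erased handle. The control block always stores the real T* and its
// deleter; `subobject`, if non-null, only changes what get() reports.
template <typename T>
std::shared_ptr<void> adopt_and_alias(std::unique_ptr<T> unq, void *subobject) {
    assert(unq != nullptr);
    // Standard unique_ptr -> shared_ptr conversion: ownership moves exactly once.
    std::shared_ptr<T> owner(std::move(unq));
    if (subobject == nullptr) {
        return std::static_pointer_cast<void>(owner);
    }
    // Aliasing constructor: shares owner's control block, points at subobject.
    return std::shared_ptr<void>(owner, subobject);
}
```

With a `Diamond` object, passing the address of its `VBase` subobject as `subobject` gives a handle whose `get()` equals that address, while destruction still goes through the original `Diamond *` and runs exactly once when the last handle is dropped.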

Tests

  • The original Animal–Cat–Tiger reproducer from #5786 ([BUG]: Using smart_holder in virtual inheritance relationship error) is kept as a targeted regression test. It fails reliably only on MSVC, which is why it is important to keep it.
  • A new Diamond virtual-inheritance test (test_class_sh_mi_thunks) was added. It was designed to also fail on Linux/Clang/GCC without the fix (not just on MSVC). Indeed, the diamond case fails across all ci.yml jobs and several tests-cibw.yml jobs pre-fix, and passes after the fix (see comment).
  • The diamond test was trivially extended to exercise smart_holder_from_unique_ptr, which immediately exposed the second bug. That test produced 20 MSVC failures pre-fix (comment), and passes after the fix.
  • A pseudo-test test_virtual_base_at_offset_0 is included to make it explicit when a compiler/layout places the virtual base at offset 0. This isn’t a true skip, but a way to document in the pytest summary when the test can’t exercise the MI/VI code path. In practice, this skip has never been triggered in any GitHub Actions job.
  • Debug-only asserts remain in pybind11/detail/class.h. They were invaluable during debugging and experimentation, have no impact on non-debug builds, and may be useful again if MI/VI edge cases resurface.

Context

The changes here are in the same spirit as PR #4380, which fixed earlier oversights in smart_holder MI handling. As noted there, MI use cases are uncommon in downstream projects (and even discouraged in some organizations), so coverage has historically been thin. The new tests give us much better confidence that both shared_ptr and unique_ptr adoption paths are now robust under virtual and multiple inheritance.


Full ChatGPT conversation guiding the work on the fixes (very long and involved): https://chatgpt.com/share/68c7950a-61b8-8008-8678-53b96643ce7e

A lot of the credit for the fixes goes to ChatGPT 5 Pro.

Suggested changelog entry:

Fixed two smart_holder bugs in shared_ptr and unique_ptr adoption with multiple/virtual inheritance:

```
/__w/pybind11/pybind11/tests/test_class_sh_mi_thunks.cpp:44:13: error: prefer using 'override' or (rarely) 'final' instead of 'virtual' [modernize-use-override,-warnings-as-errors]
   44 |     virtual ~Left() = default;
      |     ~~~~~~~ ^
      |                     override
/__w/pybind11/pybind11/tests/test_class_sh_mi_thunks.cpp:48:13: error: prefer using 'override' or (rarely) 'final' instead of 'virtual' [modernize-use-override,-warnings-as-errors]
   48 |     virtual ~Right() = default;
      |     ~~~~~~~ ^
      |                      override
```

rwgk (Collaborator, Author) commented Sep 14, 2025

CI results @ commit 5f77da3:

66 failing, 2 skipped, 17 successful checks

CI: https://github.com/pybind/pybind11/actions/runs/17707681525?pr=5836

gh_pr_checks_2025-09-14+002918.txt

rwgk (Collaborator, Author) commented Sep 14, 2025

CI results @ commit 3620ceb:

20 failing, 2 skipped, 63 successful checks

CI: https://github.com/pybind/pybind11/actions/runs/17713020523?pr=5836

gh_pr_checks_2025-09-14+091113.txt

rwgk added 10 commits September 14, 2025 09:16
ChatGPT:

* shared_ptr’s ctor can throw (control-block alloc). Using get() keeps unique_ptr owning the memory if that happens, so no leak.

* Only after the shared_ptr is successfully constructed do you release(), transferring ownership exactly once.
…ampoline code (which often uses the term "alias", too)
```
/__w/pybind11/pybind11/tests/test_class_sh_mi_thunks.cpp:67:5: error: 'auto ptr' can be declared as 'auto *ptr' [readability-qualified-auto,-warnings-as-errors]
   67 |     auto ptr = new Diamond;
      |     ^~~~
      |     auto *
```
rwgk changed the title from "[WIP] Add diamond virtual-inheritance test case" to "Fix smart_holder multiple/virtual inheritance bugs in shared_ptr and unique_ptr to-Python conversions" on Sep 15, 2025

rwgk (Collaborator, Author) commented Sep 15, 2025

@henryiii This PR is ready for review.

@MannixYang If you get a chance to try this out, please report back.

MannixYang commented

@rwgk Thank you very much indeed. Yes, I will give it a try. It should be within the next two days.

MannixYang commented

@rwgk Great! This is very useful. My project can now run. Thank you again.

rwgk (Collaborator, Author) commented Sep 15, 2025

> @rwgk Great! This is very useful. My project can now run. Thank you again.

Thanks a lot for confirming!

rwgk (Collaborator, Author) commented Sep 27, 2025

Hi @iwanders, is there a chance that you could help review this PR? I just remembered that you're familiar with the affected code, from your work on #4921 and #4924. The production code changes in this PR are small/surgical.

iwanders (Contributor) left a comment

> Hi @iwanders, is there a chance that you could help review this PR?

👍 definitely, I did go through it and made two minor comments. The code around the handling of deleters looks good to me.

```
// Relinquish ownership only after successful construction of owner
(void) unq_ptr.release();

// Publish either the subobject alias (for identity/VI) or the full object.
```

Contributor

Personally, I'd elaborate a bit in this comment. It feels a bit like magic to me why this is necessary and why it fixes a bug under MSVC. I think I understand the problem to be that the shared pointer 'owner' must always point to the start of the entire object (and the deleter operates on that), but that the actual smart holder should 'hold' the subobject, which may not be at the start of the object itself?

Collaborator Author

Done: commit 31886ec

```
int vbase_tag = 42; // ensure it's not empty
};

// Left/right add some weight to steer layout differences across compilers
```

iwanders (Contributor) commented Sep 28, 2025

I'm not sure this matters, but Left and Right are pretty similar: a long long is at least 64 bits (8 bytes on common platforms), and the 7-element char array is likely going to be padded to 8 bytes, which would make them the same size. To me the comment contradicts the code a bit in that sense. Maybe just having any data member is enough, but if the sizes ought to be different I'd make it very clear, say char pad_l[4] and char pad_r[16].

Collaborator Author

Done: commit ba89e1d

Here is what ChatGPT is saying:

Good catch — the original comment was misleading, since char[7] and long long can end up the same size after padding. The real intent is simply to make the virtual bases non-empty and, ideally, asymmetrically sized/aligned so compilers are more likely to give us non-zero subobject offsets. I’ve updated the code to use small but clearly asymmetric paddings (char[4] vs char[16], with optional alignment) and clarified the comment to reflect this. The test still works correctly even if a compiler chooses offset 0, since that case is explicitly logged via test_virtual_base_at_offset_0().
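
As a rough illustration of the layout question discussed here, the following sketch measures where each base subobject lands inside a Diamond object. It is not the PR's test code: the hierarchy is a stand-in using the `char[4]` vs `char[16]` paddings mentioned above, and `measure_offsets`, `byte_offset`, and `virtual_base_at_offset_0` are hypothetical helpers. The actual offsets are implementation-defined, so the code reports them rather than assuming specific values.

```cpp
#include <cassert>
#include <cstddef>

struct VBase {
    virtual ~VBase() = default;
    int vbase_tag = 42; // ensure it's not empty
};
struct Left : virtual VBase { char pad_l[4] = {}; };
struct Right : virtual VBase { char pad_r[16] = {}; };
struct Diamond : Left, Right { int self_tag = 99; };

// Byte offsets of the base subobjects relative to the start of the Diamond.
struct SubobjectOffsets {
    std::ptrdiff_t left;
    std::ptrdiff_t right;
    std::ptrdiff_t vbase;
};

inline std::ptrdiff_t byte_offset(const void *sub, const void *whole) {
    return static_cast<const char *>(sub) - static_cast<const char *>(whole);
}

// Hypothetical helper: each implicit conversion below applies whatever pointer
// adjustment the compiler's layout requires.
inline SubobjectOffsets measure_offsets(const Diamond &d) {
    const Left *l = &d;
    const Right *r = &d;
    const VBase *v = &d; // unambiguous because VBase is a virtual base
    SubobjectOffsets off{byte_offset(l, &d), byte_offset(r, &d), byte_offset(v, &d)};
    assert(off.left != off.right); // two distinct non-empty subobjects
    return off;
}

// True when the layout cannot exercise the pointer-adjustment code path.
inline bool virtual_base_at_offset_0(const Diamond &d) {
    return measure_offsets(d).vbase == 0;
}
```

A check along these lines makes it visible when a given compiler places the virtual base at offset 0, in which case the subobject pointer and the object start coincide and the adjustment logic is not exercised.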


In Ralf's own words: I was just super happy when the ChatGPT-generated test triggered failures on both Linux and Windows [see comment from two weeks ago] and didn't look as closely at the test code as you did. However, now I pushed ChatGPT harder, to "really bend things out of shape", but it assured me that the new code is as good as we can make it.

I'll rerun the CI. When it's done I'll look for test_virtual_base_at_offset_0() skip messages again. (I think I did that before, although I forgot to log that here.)

Contributor

> I was just super happy when the ChatGPT-generated test triggered failures on both Linux and Windows

Agreed, that is pretty impressive and definitely speeds things up. Thanks for sharing your ChatGPT conversation btw, I’ve bookmarked that to read through when I’m not on the road. I don’t have much experience using LLMs, so I expect it’ll be educational for me.

rwgk (Collaborator, Author) commented Sep 29, 2025

> > I was just super happy when the ChatGPT-generated test triggered failures on both Linux and Windows

> Agreed, that is pretty impressive and definitely speeds things up. Thanks for sharing your ChatGPT conversation btw, I’ve bookmarked that to read through when I’m not on the road. I don’t have much experience using LLMs, so I expect it’ll be educational for me.

My time budget for pybind11 maintenance work has become super tiny. I think without ChatGPT I'd have to give up.

Thanks for your review, this was super helpful.

A couple minutes ago I replaced the pytest.skip() with an assert, to guarantee test quality in the future. It'd be great if you could take a final look and formally approve if it looks good to you.

```
       "D:\a\pybind11\pybind11\build\ALL_BUILD.vcxproj" (default target) (1) ->
       "D:\a\pybind11\pybind11\build\tests\pybind11_tests.vcxproj" (default target) (13) ->
       (ClCompile target) ->
         D:\a\pybind11\pybind11\tests\test_class_sh_mi_thunks.cpp(70,17): warning C4316: 'test_class_sh_mi_thunks::Diamond': object allocated on the heap may not be aligned 16 [D:\a\pybind11\pybind11\build\tests\pybind11_tests.vcxproj]
         D:\a\pybind11\pybind11\tests\test_class_sh_mi_thunks.cpp(80,43): warning C4316: 'test_class_sh_mi_thunks::Diamond': object allocated on the heap may not be aligned 16 [D:\a\pybind11\pybind11\build\tests\pybind11_tests.vcxproj]
         C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.44.35207\include\memory(2913,46): warning C4316: 'std::_Ref_count_obj2<_Ty>': object allocated on the heap may not be aligned 16 [D:\a\pybind11\pybind11\build\tests\pybind11_tests.vcxproj]
       C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.44.35207\include\memory(2913,46): warning C4316:         with [D:\a\pybind11\pybind11\build\tests\pybind11_tests.vcxproj]
       C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.44.35207\include\memory(2913,46): warning C4316:         [ [D:\a\pybind11\pybind11\build\tests\pybind11_tests.vcxproj]
       C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.44.35207\include\memory(2913,46): warning C4316:             _Ty=test_class_sh_mi_thunks::Diamond [D:\a\pybind11\pybind11\build\tests\pybind11_tests.vcxproj]
       C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.44.35207\include\memory(2913,46): warning C4316:         ] [D:\a\pybind11\pybind11\build\tests\pybind11_tests.vcxproj]
         D:\a\pybind11\pybind11\include\pybind11\detail\init.h(77,21): warning C4316: 'test_class_sh_mi_thunks::Diamond': object allocated on the heap may not be aligned 16 [D:\a\pybind11\pybind11\build\tests\pybind11_tests.vcxproj]
```

The warning came from alignas(16) making Diamond over-aligned, while regular new/make_shared aren’t guaranteed to return 16-byte aligned memory on MSVC (hence C4316). I’ve removed the explicit alignment and switched to asymmetric payload sizes (char[4] vs char[24]), which still nudges MI layout without relying on over-alignment. This keeps the test goal and eliminates the warning across all MSVC builds. If we ever want to stress over-alignment explicitly, we can add aligned operator new/delete under __cpp_aligned_new, but that’s more than we need here.

rwgk (Collaborator, Author) commented Sep 29, 2025

Checking for test_virtual_base_at_offset_0() SKIPPED messages

CI workflow

https://github.com/pybind/pybind11/actions/runs/18080769590

```
$ grep test_class_sh_mi_thunks.py *.txt | wc -l
691
$ grep SKIPPED *.txt | wc -l
3404
$ grep test_class_sh_mi_thunks.py *.txt | grep SKIPPED | wc -l
0
```

CIBW workflow

https://github.com/pybind/pybind11/actions/runs/18080769566

```
$ grep test_class_sh_mi_thunks.py *.txt | wc -l
4
$ grep SKIPPED *.txt | wc -l
135
$ grep test_class_sh_mi_thunks.py *.txt | grep SKIPPED | wc -l
0
```

…set_0() and replace pytest.skip() with assert. Add helpful comment for future maintainers.