-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8371637: allocateNativeInternal sometimes return incorrectly aligned memory #28235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…memory jdk.internal.foreign.SegmentFactories::allocateNativeInternal assumes that the underlying implementation of malloc aligns allocations on 16 byte boundaries for 64 bit platforms, and 8 byte boundaries on 32 bit platforms. So for any allocation where the requested alignment is less than or equal to this default alignment it makes no adjustment. However, this assumption does not hold for all allocators. Specifically jemallc, used by libc on FreeBSD will align small allocations on 8 or 4 byte boundaries, respectively. This causes allocateNativeInternal to sometimes return memory that is not properly aligned when the requested alignment is exactly 16 bytes. To make sure we honour the requested alignment when it exaclty matches the quantum as defined by MAX_MALLOC_ALIGN, this patch ensures that we adjust the alignment also in this case. This should make no difference for platforms where malloc allready aligns on the quantum, except for a few unnecessary trivial calculations. This work was sponsored by: The FreeBSD Foundation
|
👋 Welcome back haraldei! A progress list of the required criteria for merging this PR into |
|
❗ This change is not yet ready to be integrated. |
Webrevs
|
|
Are you saying that jemalloc aligns allocations that are exactly 16 bytes on an 8 byte boundary? How does this work when you want to allocate space for a single 16-byte size, 16-byte aligned value? (e.g. long double? I think FreeBSD uses the SysV ABI, right?) |
No. A 16 byte allocation will fall on a 16 byte boundary. But a 8 byte or smaller allocation will fall on a 8 byte boundary. And since the original code would not adjust the size if |
|
Thanks, I think I get it. What happens if you try to allocate 4 bytes of memory and request 8 byte alignment? Won't the result only be 4 byte aligned? I think the assumption of the current optimization is wrong: that malloc always returns memory aligned to a constant |
No, it will still be 8 byte aligned. See the implementation notes in the jemalloc man page:
Yes, that is the root of the problem.
Having something like
Yeah, this would definitely make the code clearer! I spent quite some time trying to understand where this assumption around |
|
Would it make sense to add |
|
From a logical point of view, what we'd need would be a couple of extra constants:
Then, we have three cases:
The problem is: how do we discover these constants?
Yeah, that would be nice -- but I noticed that
So, allowing this would require some discussion. Also, going down this path will likely require its own |
Note: we can't just do a "trial allocation" of 4 bytes and see what comes out -- we might just be "lucky" and see an 8-byte aligned address, even though the underlying allocator is aligning at 4 bytes... |
If we can't establish a min alignment, we could at least have some way to determine the max alignment (I'd say probably 16 is a good number because of system ABI?), and then just use two rules:
This should still deliver the kind of compaction we were aiming for with the optimization, but hopefully get there in a more portable way? |
|
@mcimadamore Thanks for the input! I will have to think more about this to be sure I see it clearly. I was made aware by @bsdkurt that my original proposal here is flawed. It will work the same for platforms with a constant Currently I think @JornVernee's suggestion looks the most promising. It allows for flexibility in determining the underlying architecture's alignment preferences base on the size of the allocation. |
I think so, but for this case we would also need |
OpenBSD's malloc smallest arena is 16 bytes so it matches the current assumption |
This seems reasonable to me. While the current code's constant is named
Doesn't this assume that all malloc implementations follow power of 2 pattern of arena sizes: 8, 16, 32, 64 and pointer alignments between min and max? malloc could also be implemented skipping some of those intermediate sizes. e.g. 16, 64, 256.
|
|
I think what Maurizio is suggesting is probably the most flexible. We can assume that e.g. a 4 byte allocation is at least 4 byte aligned, and an 8 byte allocation is also at least 8 bytes aligned (which implies 4 byte alignment as well), up to a value equal to
If an 8 byte value is allocated in a 16 byte arena, I assume it is 16 byte aligned, which implies 8 byte alignment. |
I see now. This makes sense to me. Thank you for explaining it. |
This work was sponsored by: The FreeBSD Foundation
Introducing a helper function as suggested by JornVernee to decide on the proper alignment based on the segment size. This work was sponsored by: The FreeBSD Foundation Co-authored-by: JornVernee
|
I've pushed a new version now, by adding a helper function as suggested by @JornVernee, but if you want I can have another go with @mcimadamore's suggestion as well. Also I extended the alignedAccess test to allocate a second segment and fill it, to detect if the first segment causes out of bounds access due to the alignment. Let me know if I should add this as a separate test instead. |
For what it's worth, I think the described behavior is non-conforming to the C "The pointer returned if the allocation succeeds is suitably aligned so that (That's from C11 7.22.3/1. C99 and C17 have the same wording. I can't find my DR75 reiterated that the malloc result must be suitably aligned for any A consequence of the pre-C23 behavior is that is always valid. C23 permits that to be UB. (You aren't allowed to create C23 added the phrase "and size less than or equal to the size requested" after I would not be surprised if HotSpot also has code that assumes the result from |
That is because NMT doesn't support allocations with alignment Supporting |
That may be, but it's nevertheless the behaviour of the allocator used by libc on FreeBSD. It's also something that will only affect very small allocations (8 bytes or less on a 64bit system.) |
|
My 0.02$ -- regardless of what Hotspot code might do today, I think it would be preferrable to make the FFM allocation code a bit more flexible (along the lines described above). Whether fully supported by the Java runtime or not, to my eyes it just seem that, at least in some configurations, the API doesn't do what it says on the tin. |
|
The fix is simple and pragmatic. The main difference between this and what I described is that by singling out FreeBSD, we won't be able to support cases where e.g. a developer runs on Linux, but using LD_PRELOAD to use a different allocator like jemalloc? It's a cornery case, but one I've seen from time to time to take advantage of some of the "hardening" features provided by jemalloc (and diagnose memory issues). Of course, we can also keep this PR as is, and then address other (more cornery cases) in separate PRs. |
|
@mcimadamore That's a very good point! I'll try to update the patch, and see how it works out. |
|
The VM (which the JDK interfaces with malloc through) guarantees at-most 16 byte alignment from |
Only align up the requested memory if the requested alignment is larget than max alignment provided by malloc, or if the requested size is not a multiple of the alignment size. This work was sponsored by: The FreeBSD Foundation Co-authored-by: mcimadamore
The standard was ambiguous prior to that and https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2293.htm is a good read on that topic. Modified test works with initialized segments and there allocated size is at least 8 bytes. Allocating non-initialized segments of smaller sizes might also help to reveal alignment related bugs. Does it worth adding test like in this PR? |
I think it might be a good idea, yes. |
Note that, since this is just a change on a Java API, I don't think should affect the VM? Then of course if someone runs java in an environment with LD_PRELOAD where jemalloc is used instead of malloc, and all allocation (even ones from native code) aligns differently, I can't speak on how that would impact the JVM. But this change, alone, should not impact the VM in any way? |
|
The latest changes look good -- but there seem to be failures in the test pipelines. |
I'm looking at the test failures now. At least it seems they fail consistently across all platforms. |
jdk.internal.foreign.SegmentFactories::allocateNativeInternalassumes that the underlying implementation of malloc aligns allocations on 16 byte boundaries for 64 bit platforms, and 8 byte boundaries on 32 bit platforms. So for any allocation where the requested alignment is less than or equal to this default alignment it makes no adjustment.However, this assumption does not hold for all allocators. Specifically jemallc, used by libc on FreeBSD will align small allocations on 8 or 4 byte boundaries, respectively. This causes allocateNativeInternal to sometimes return memory that is not properly aligned when the requested alignment is exactly 16 bytes.
To make sure we honour the requested alignment when it exaclty matches the quantum as defined by MAX_MALLOC_ALIGN, this patch ensures that we adjust the alignment also in this case.
This should make no difference for platforms where malloc allready aligns on the quantum, except for a few unnecessary trivial calculations.
This work was sponsored by: The FreeBSD Foundation
Progress
Issue
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28235/head:pull/28235$ git checkout pull/28235Update a local copy of the PR:
$ git checkout pull/28235$ git pull https://git.openjdk.org/jdk.git pull/28235/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 28235View PR using the GUI difftool:
$ git pr show -t 28235Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28235.diff
Using Webrev
Link to Webrev Comment