8371637: allocateNativeInternal sometimes return incorrectly aligned memory #28235

snake66 · 2025-11-11T14:11:28Z

jdk.internal.foreign.SegmentFactories::allocateNativeInternal assumes that the underlying implementation of malloc aligns allocations on 16 byte boundaries for 64 bit platforms, and 8 byte boundaries on 32 bit platforms. So for any allocation where the requested alignment is less than or equal to this default alignment it makes no adjustment.

However, this assumption does not hold for all allocators. Specifically, jemalloc, used by libc on FreeBSD, will align small allocations on 8 or 4 byte boundaries, respectively. This causes allocateNativeInternal to sometimes return memory that is not properly aligned when the requested alignment is exactly 16 bytes.

To make sure we honour the requested alignment when it exactly matches the quantum as defined by MAX_MALLOC_ALIGN, this patch ensures that we adjust the alignment also in this case.

This should make no difference for platforms where malloc already aligns on the quantum, except for a few unnecessary trivial calculations.

This work was sponsored by: The FreeBSD Foundation

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8371637: allocateNativeInternal sometimes return incorrectly aligned memory (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28235/head:pull/28235
$ git checkout pull/28235

Update a local copy of the PR:
$ git checkout pull/28235
$ git pull https://git.openjdk.org/jdk.git pull/28235/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28235

View PR using the GUI difftool:
$ git pr show -t 28235

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28235.diff

Using Webrev

Link to Webrev Comment


bridgekeeper · 2025-11-11T14:12:52Z

👋 Welcome back haraldei! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-11-11T14:13:35Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2025-11-11T14:14:26Z

@snake66 The following label will be automatically applied to this pull request:

core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-11-11T14:17:55Z

Webrevs

JornVernee · 2025-11-11T15:32:14Z

Are you saying that jemalloc aligns allocations that are exactly 16 bytes on an 8 byte boundary? How does this work when you want to allocate space for a single 16-byte size, 16-byte aligned value? (e.g. long double? I think FreeBSD uses the SysV ABI, right?)

snake66 · 2025-11-11T15:51:45Z

Are you saying that jemalloc aligns allocations that are exactly 16 bytes on an 8 byte boundary?

No. A 16 byte allocation will fall on a 16 byte boundary. But an 8 byte or smaller allocation will fall on an 8 byte boundary. And since the original code would not adjust the size if byteAlignment was exactly 16, this would mean that the requested alignment was not honoured. This caused the TestMemoryAlignment test to fail intermittently for exactly 16 byte alignment requests. For all other valid values of byteAlignment the alignment would be as expected.

JornVernee · 2025-11-11T16:24:23Z

Thanks, I think I get it. What happens if you try to allocate 4 bytes of memory and request 8 byte alignment? Won't the result only be 4 byte aligned?

I think the assumption of the current optimization is wrong: that malloc always returns memory aligned to a constant MAX_MALLOC_ALIGN, and instead it depends on the size of the allocation, and the underlying allocator. I think ideally we'd be able to ask the allocator what the alignment of the memory it will return of a certain size is (or ask it to do an aligned allocation). We could also use a helper method that returns the alignment for a certain allocation size:

private static final boolean IS_FREEBSD = System.getProperty("os.name").equals(...);

private long alignmentForSize(long size) {
    if (IS_FREEBSD) {
        ...
    } else {
        return Unsafe.ADDRESS_SIZE == 4 ? 8 : 16;
    }
}

snake66 · 2025-11-11T17:58:11Z

Thanks, I think I get it. What happens if you try to allocate 4 bytes of memory and request 8 byte alignment? Won't the result only be 4 byte aligned?

No, it will still be 8 byte aligned. See the implementation notes in the jemalloc man page:

Allocation requests that are no more than half the quantum (8 or 16, depending on architecture) are rounded up to the nearest power of two that is at least sizeof(double). All other object size classes are multiples of the quantum, (...)

I think the assumption of the current optimization is wrong: that malloc always returns memory aligned to a constant MAX_MALLOC_ALIGN, and instead it depends on the size of the allocation, and the underlying allocator.

Yes, that is the root of the problem.

I think ideally we'd be able to ask the allocator what the alignment of the memory it will return of a certain size is (or ask it to do an aligned allocation).

Having something like os::posix_memalign() could eliminate the problem completely, and probably simplify the code in allocateNativeInternal quite a bit.

We could also use a helper method that returns the alignment for a certain allocation size:

private static final boolean IS_FREEBSD = System.getProperty("os.name").equals(...);

private long alignmentForSize(long size) {
    if (IS_FREEBSD) {
        ...
    } else {
        return Unsafe.ADDRESS_SIZE == 4 ? 8 : 16;
    }
}

Yeah, this would definitely make the code clearer! I spent quite some time trying to understand where this assumption around MAX_MALLOC_ALIGN came from :)
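To make the man page excerpt quoted above concrete, here is a minimal sketch that applies only the two rules stated in that excerpt. The class and method names are hypothetical, the excerpt is truncated, and real jemalloc has more size classes than this models. The `impliedAlignment` helper additionally assumes that blocks are placed at multiples of their size class, which is an assumption of this sketch and not something stated in the thread.

```java
final class SizeClassSketch {
    // Assumption: sizeof(double) is 8 on the platforms discussed here.
    static final long SIZEOF_DOUBLE = 8;

    // Rounds a request up following the two rules in the quoted excerpt only.
    static long sizeClass(long size, long quantum) {
        if (size <= 0 || Long.bitCount(quantum) != 1) {
            throw new IllegalArgumentException("size must be positive, quantum a power of two");
        }
        if (size <= quantum / 2) {
            // Nearest power of two >= size, but never below sizeof(double).
            long pow2 = Long.highestOneBit(size);
            if (pow2 < size) {
                pow2 <<= 1;
            }
            return Math.max(pow2, SIZEOF_DOUBLE);
        }
        // "All other object size classes are multiples of the quantum".
        return (size + quantum - 1) / quantum * quantum;
    }

    // Assumption of this sketch: a block starts at a multiple of its size class,
    // so its alignment is the largest power of two dividing the class, capped at the quantum.
    static long impliedAlignment(long size, long quantum) {
        return Math.min(Long.lowestOneBit(sizeClass(size, quantum)), quantum);
    }

    public static void main(String[] args) {
        System.out.println(sizeClass(4, 16));         // 4-byte request, 16-byte quantum
        System.out.println(impliedAlignment(4, 16));  // alignment implied for that request
        System.out.println(impliedAlignment(16, 16)); // alignment implied for a 16-byte request
    }
}
```

Under these assumptions a 4 byte request with a 16 byte quantum lands in an 8 byte class, which matches the answer above that such an allocation is still 8 byte aligned but not necessarily 16 byte aligned.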

minborg · 2025-11-12T08:27:32Z

Would it make sense to add OperatingSystem.BSD to consolidate any such predicates?

mcimadamore · 2025-11-12T11:19:51Z

From a logical point of view, what we'd need would be a couple of extra constants:

MIN_ALIGN, this is the minimum alignment provided by the allocator/OS/platform combo
MAX_ALIGN, this is the maximum alignment provided by the allocator/OS/platform combo

Then, we have three cases:

if the requested alignment A is A <= MIN_ALIGN, we can just allocate and don't adjust for alignment
if the requested alignment A is MIN_ALIGN < A <= MAX_ALIGN and the requested size is a multiple of the alignment, also just allocate and don't adjust for alignment
otherwise, allocate a bigger segment and manually align the result

The problem is: how do we discover these constants?
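As a rough illustration of the three cases listed above, the decision could be written as a small classifier. This is only a sketch: `AlignmentCases`, `Strategy` and `choose` are hypothetical names, and `minAlign` and `maxAlign` are passed in as parameters precisely because the thread has not settled how to discover them.

```java
final class AlignmentCases {
    enum Strategy {
        ALLOCATE_DIRECTLY,               // case 1: A <= MIN_ALIGN
        ALLOCATE_DIRECTLY_SIZE_MULTIPLE, // case 2: MIN_ALIGN < A <= MAX_ALIGN and size % A == 0
        OVER_ALLOCATE_AND_ALIGN          // case 3: everything else
    }

    // minAlign and maxAlign stand in for the MIN_ALIGN / MAX_ALIGN constants from the discussion.
    static Strategy choose(long size, long alignment, long minAlign, long maxAlign) {
        if (alignment <= minAlign) {
            return Strategy.ALLOCATE_DIRECTLY;
        }
        if (alignment <= maxAlign && size % alignment == 0) {
            return Strategy.ALLOCATE_DIRECTLY_SIZE_MULTIPLE;
        }
        return Strategy.OVER_ALLOCATE_AND_ALIGN;
    }

    public static void main(String[] args) {
        // Illustrative values only: minAlign = 8 and maxAlign = 16 are assumptions.
        System.out.println(choose(4, 8, 8, 16));
        System.out.println(choose(16, 16, 8, 16));
        System.out.println(choose(8, 16, 8, 16));
    }
}
```

With the assumed values, an 8 byte request with 16 byte alignment falls into the third case, which is the combination that failed on FreeBSD.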

Having something like os::posix_memalign() could eliminate the problem completely, and probably simplify the code in allocateNativeInternal quite a bit.

Yeah, that would be nice -- but I noticed that posix_memalign is currently not allowed in hotspot code:

jdk/src/hotspot/os/posix/forbiddenFunctions_posix.hpp

Line 45 in 400a83d

    
           FORBID_C_FUNCTION(int posix_memalign(void**, size_t, size_t), noexcept, "don't use");

So, allowing this would require some discussion. Also, going down this path will likely require its own Unsafe primitive, and intrinsics, plus potential tweaks to support NMT. So, not straightforward to pull off.

mcimadamore · 2025-11-12T11:21:40Z

The problem is: how do we discover these constants?

Note: we can't just do a "trial allocation" of 4 bytes and see what comes out -- we might just be "lucky" and see an 8-byte aligned address, even though the underlying allocator is aligning at 4 bytes...

mcimadamore · 2025-11-12T11:27:06Z

Then, we have three cases:

* if the requested alignment `A` is `A <= MIN_ALIGN`, we can just allocate and don't adjust for alignment

* if the requested alignment `A` is `MIN_ALIGN < A <= MAX_ALIGN` and the requested size is a multiple of the alignment, also just allocate and don't adjust for alignment

* otherwise, allocate a bigger segment and manually align the result

If we can't establish a min alignment, we could at least have some way to determine the max alignment (I'd say probably 16 is a good number because of system ABI?), and then just use two rules:

if the requested alignment A is A <= MAX_ALIGN and the requested size is a multiple of the alignment, also just allocate and don't adjust for alignment
otherwise, allocate a bigger segment and manually align the result

This should still deliver the kind of compaction we were aiming for with the optimization, but hopefully get there in a more portable way?
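The simplified two-rule variant might look roughly like the following. This is a sketch, not the code in SegmentFactories: the names are hypothetical, `MAX_ALIGN = 16` is the assumed value mentioned above, and padding by `alignment - 1` bytes is a common over-allocation technique rather than something confirmed by the thread. Overflow checks are omitted for brevity.

```java
final class TwoRuleSketch {
    // Assumption: 16, as suggested in the comment above.
    static final long MAX_ALIGN = 16;

    // Rule 1: alignment <= MAX_ALIGN and size is a multiple of the alignment -> allocate as is.
    // Rule 2: otherwise -> allocate a bigger block and align manually.
    static boolean needsManualAlignment(long size, long alignment) {
        return !(alignment <= MAX_ALIGN && size % alignment == 0);
    }

    // Number of bytes to request from the allocator under the two rules.
    // Padding by (alignment - 1) guarantees an aligned start exists inside the block.
    static long allocationSize(long size, long alignment) {
        return needsManualAlignment(size, alignment) ? size + alignment - 1 : size;
    }

    public static void main(String[] args) {
        System.out.println(allocationSize(16, 16)); // size is a multiple of the alignment
        System.out.println(allocationSize(8, 16));  // size is not a multiple, so padding is added
    }
}
```

The design point is that small, tightly packed allocations (for example 4 bytes with 4 byte alignment) still avoid any padding, while the 8 byte, 16 byte aligned case no longer relies on what the allocator happens to return.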

snake66 · 2025-11-12T12:07:44Z

@mcimadamore Thanks for the input! I will have to think more about this to be sure I see it clearly.

I was made aware by @bsdkurt that my original proposal here is flawed. It will work the same for platforms with a constant MAX_MALLOC_ALIGN of 16, but for the FreeBSD/jemalloc case, it still allocates only 8 bytes, but may cause access outside of the allocated memory. The tests pass because the out of bounds accessed memory is not allocated or overwritten by somebody else.

Currently I think @JornVernee's suggestion looks the most promising. It allows for flexibility in determining the underlying architecture's alignment preferences based on the size of the allocation.
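The flaw described above can be shown with plain address arithmetic. The sketch below uses a hypothetical 8 byte aligned address; `alignUp` and `fitsInside` are illustrative helpers and not JDK methods.

```java
final class OutOfBoundsSketch {
    // Rounds address up to the next multiple of alignment (alignment must be a power of two).
    static long alignUp(long address, long alignment) {
        return (address + alignment - 1) & -alignment;
    }

    // True if [segStart, segStart + segSize) lies entirely inside [allocStart, allocStart + allocSize).
    static boolean fitsInside(long allocStart, long allocSize, long segStart, long segSize) {
        return segStart >= allocStart && segStart + segSize <= allocStart + allocSize;
    }

    public static void main(String[] args) {
        long allocStart = 0x1008;                 // hypothetical address, 8 byte aligned only
        long segStart = alignUp(allocStart, 16);  // 0x1010

        // Aligning the start without growing the 8 byte allocation:
        System.out.println(fitsInside(allocStart, 8, segStart, 8));          // prints false

        // Same alignment step, but with the allocation padded by (16 - 1) bytes:
        System.out.println(fitsInside(allocStart, 8 + 15, segStart, 8));     // prints true
    }
}
```

In the first case the aligned segment covers 0x1010 to 0x1018 while the allocation ends at 0x1010, so every access through the segment is out of bounds even though the address itself looks correctly aligned.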

snake66 · 2025-11-12T12:15:56Z

Would it make sense to add OperatingSystem.BSD to consolidate any such predicates?

I think so, but for this case we would also need OperatingSystem.FreeBSD, as I am uncertain if OpenBSD has the same issue. (NetBSD seems to also use jemalloc, and should behave like FreeBSD, though.)

bsdkurt · 2025-11-12T15:40:27Z

Would it make sense to add OperatingSystem.BSD to consolidate any such predicates?

I think so, but for this case we would also need OperatingSystem.FreeBSD, as I am uncertain if OpenBSD has the same issue. (NetBSD seems to also use jemalloc, and should behave like FreeBSD, though.)

OpenBSD's malloc smallest arena is 16 bytes so it matches the current assumption MAX_MALLOC_ALIGN makes and does not exhibit a problem with the tests.

bsdkurt · 2025-11-12T16:05:39Z

From a logical point of view, what we'd need would be a couple of extra constants:

MIN_ALIGN, this is the minimum alignment provided by the allocator/OS/platform combo

MAX_ALIGN, this is the maximum alignment provided by the allocator/OS/platform combo

Then, we have three cases:

if the requested alignment A is A <= MIN_ALIGN, we can just allocate and don't adjust for alignment

This seems reasonable to me. While the current code's constant is named MAX_MALLOC_ALIGN, I believe in practice it is really the MIN_ALIGN and probably should be renamed. It seems to me the current code is written as if it is the MIN_ALIGN.

if the requested alignment A is MIN_ALIGN < A <= MAX_ALIGN and the requested size is a multiple of the alignment, also just allocate and don't adjust for alignment

Doesn't this assume that all malloc implementations follow power of 2 pattern of arena sizes: 8, 16, 32, 64 and pointer alignments between min and max? malloc could also be implemented skipping some of those intermediate sizes. e.g. 16, 64, 256.

otherwise, allocate a bigger segment and manually align the result

JornVernee · 2025-11-13T14:59:53Z

I think what Maurizio is suggesting is probably the most flexible. We can assume that e.g. a 4 byte allocation is at least 4 byte aligned, and an 8 byte allocation is also at least 8 bytes aligned (which implies 4 byte alignment as well), up to a value equal to alignof(max_align_t), which we currently assume to be 16 (though, we could have a native method that actually returns alignof(max_align_t)).

Doesn't this assume that all malloc implementations follow power of 2 pattern of arena sizes: 8, 16, 32, 64 and pointer alignments between min and max? malloc could also be implemented skipping some of those intermediate sizes. e.g. 16, 64, 256.

If an 8 byte value is allocated in a 16 byte arena, I assume it is 16 byte aligned, which implies 8 byte alignment.
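One possible reading of this assumption, expressed as code: the alignment that can be relied on grows with the allocation size up to a cap. The names below are hypothetical, the cap of 16 is the currently assumed alignof(max_align_t), and treating non power of two sizes via the largest power of two not exceeding the size is an assumption of this sketch that the thread does not spell out.

```java
final class AssumedAlignment {
    // Assumption: alignof(max_align_t) == 16, as currently assumed in the discussion.
    static final long MAX_ALIGN = 16;

    // Alignment assumed for an allocation of the given size:
    // 4 bytes -> 4, 8 bytes -> 8, 16 bytes or more -> 16.
    static long assumedAlignment(long size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        return Math.min(Long.highestOneBit(size), MAX_ALIGN);
    }

    // True if a plain allocation of this size can be used without manual alignment.
    static boolean canSkipAdjustment(long size, long alignment) {
        return alignment <= assumedAlignment(size);
    }

    public static void main(String[] args) {
        System.out.println(assumedAlignment(4));
        System.out.println(assumedAlignment(8));
        System.out.println(canSkipAdjustment(8, 16));
    }
}
```

Because a block that is 16 byte aligned is also 8 and 4 byte aligned, an allocator that skips intermediate arena sizes only makes this estimate conservative, not wrong.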

bsdkurt · 2025-11-13T17:02:39Z

I think what Maurizio is suggesting is probably the most flexible. We can assume that e.g. a 4 byte allocation is at least 4 byte aligned, and an 8 byte allocation is also at least 8 bytes aligned (which implies 4 byte alignment as well), up to a value equal to alignof(max_align_t), which we currently assume to be 16 (though, we could have a native method that actually returns alignof(max_align_t)).

I see now. This makes sense to me. Thank you for explaining it.


snake66 · 2025-11-13T19:18:25Z

I've pushed a new version now, by adding a helper function as suggested by @JornVernee, but if you want I can have another go with @mcimadamore's suggestion as well.

Also I extended the alignedAccess test to allocate a second segment and fill it, to detect if the first segment causes out of bounds access due to the alignment. Let me know if I should add this as a separate test instead.
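The extended test itself is not shown in this thread. As a hypothetical sketch of the property it is meant to protect, two native segments should never cover overlapping address ranges, which can be checked with a simple interval test. The class and method names are illustrative only.

```java
final class OverlapCheck {
    // True if the half-open ranges [startA, startA + sizeA) and [startB, startB + sizeB) overlap.
    static boolean overlaps(long startA, long sizeA, long startB, long sizeB) {
        return startA < startB + sizeB && startB < startA + sizeA;
    }

    public static void main(String[] args) {
        // Hypothetical addresses: a first segment shifted to 0x1010 for alignment,
        // and a second 8 byte allocation that the allocator also placed at 0x1010.
        System.out.println(overlaps(0x1010, 8, 0x1010, 8)); // prints true

        // The same two 8 byte blocks when the first one stays inside its own allocation.
        System.out.println(overlaps(0x1008, 8, 0x1010, 8)); // prints false
    }
}
```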

kimbarrett · 2025-11-13T21:18:58Z

However, this assumption does not hold for all allocators. Specifically
jemalloc, used by libc on FreeBSD will align small allocations on 8 or 4 byte
boundaries, respectively.

For what it's worth, I think the described behavior is non-conforming to the C
standards before C23. Before C23, the description of the allocation functions
all say

"The pointer returned if the allocation succeeds is suitably aligned so that
it may be assigned to a pointer to any type of object with a fundamental
alignment requirement and then used to access such an object or an array of
such objects in the space allocated ... "

(That's from C11 7.22.3/1. C99 and C17 have the same wording. I can't find my
copy of C89 right now, but expect it's pretty much the same.)

DR75 reiterated that the malloc result must be suitably aligned for any
(emphasis in the DR) type.
https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_075.html

A consequence of the pre-C23 behavior is that

max_align_t* p = malloc(1);

is always valid. C23 permits that to be UB. (You aren't allowed to create
misaligned pointers.)

C23 added the phrase "and size less than or equal to the size requested" after
"fundamental alignment requirement". I think that's sufficient to permit the
described behavior. But we're not using C23 (yet), we're using C11.

I would not be surprised if HotSpot also has code that assumes the result from
malloc and friends is always aligned to at least max_align_t's alignment.

jdksjolen · 2025-11-14T08:12:21Z

Yeah, that would be nice -- but I noticed that posix_memalign is currently not allowed in hotspot code:

That is because NMT doesn't support allocations with alignment >16 bytes at the moment. In Hotspot, we've been working with the assumption of 16 byte alignment, due to the C standard (as per Kim's comment).

Supporting posix_memalign isn't impossible, it's simply not been a priority.

snake66 · 2025-11-14T09:44:19Z

@kimbarrett

For what it's worth, I think the described behavior is non-conforming to the C standards before C23

That may be, but it's nevertheless the behaviour of the allocator used by libc on FreeBSD. It's also something that will only affect very small allocations (8 bytes or less on a 64bit system.)

mcimadamore · 2025-11-14T10:11:14Z

My 0.02$ -- regardless of what Hotspot code might do today, I think it would be preferable to make the FFM allocation code a bit more flexible (along the lines described above). Whether fully supported by the Java runtime or not, to my eyes it just seems that, at least in some configurations, the API doesn't do what it says on the tin.

mcimadamore · 2025-11-14T10:19:09Z

The fix is simple and pragmatic. The main difference between this and what I described is that by singling out FreeBSD, we won't be able to support cases where e.g. a developer runs on Linux, but using LD_PRELOAD to use a different allocator like jemalloc? It's a cornery case, but one I've seen from time to time to take advantage of some of the "hardening" features provided by jemalloc (and diagnose memory issues). Of course, we can also keep this PR as is, and then address other (more cornery cases) in separate PRs.

snake66 · 2025-11-14T10:46:14Z

@mcimadamore That's a very good point! I'll try to update the patch, and see how it works out.

jdksjolen · 2025-11-14T10:54:39Z

The VM (which the JDK interfaces with malloc through) guarantees at-most 16 byte alignment from malloc, so any alignment less than that is also going to be fine. We should, as always, test this, but I don't think that anything will break on the VM side with this change.

Only align up the requested memory if the requested alignment is larger than the max alignment provided by malloc, or if the requested size is not a multiple of the alignment size. This work was sponsored by: The FreeBSD Foundation Co-authored-by: mcimadamore

Yqwed · 2025-11-14T13:31:15Z

For what it's worth, I think the described behavior is non-conforming to the C standards before C23

The standard was ambiguous prior to that and https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2293.htm is a good read on that topic.

The modified test works with initialized segments, and their allocated size is at least 8 bytes. Allocating non-initialized segments of smaller sizes might also help to reveal alignment related bugs. Is it worth adding a test like

        int[] alignments = {2, 4, 8, 16};

        try (Arena arena = Arena.ofConfined()) {
            for (int alignment : alignments) {
                var seg = arena.allocateFrom(JAVA_BYTE.withByteAlignment(alignment), (byte) 0);
                assertTrue(seg.address() % alignment == 0);
            }
        }

in this PR?

mcimadamore · 2025-11-14T14:45:21Z

For what it's worth, I think the described behavior is non-conforming to the C standards before C23

The standard was ambiguous prior to that and https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2293.htm is a good read on that topic.

The modified test works with initialized segments, and their allocated size is at least 8 bytes. Allocating non-initialized segments of smaller sizes might also help to reveal alignment related bugs. Is it worth adding a test like
        int[] alignments = {2, 4, 8, 16};

        try (Arena arena = Arena.ofConfined()) {
            for (int alignment : alignments) {
                var seg = arena.allocateFrom(JAVA_BYTE.withByteAlignment(alignment), (byte) 0);
                assertTrue(seg.address() % alignment == 0);
            }
        }
in this PR?

I think it might be a good idea, yes.

mcimadamore · 2025-11-14T14:47:51Z

The VM (which the JDK interfaces with malloc through) guarantees at-most 16 byte alignment from malloc, so any alignment less than that is also going to be fine. We should, as always, test this, but I don't think that anything will break on the VM side with this change.

Note that, since this is just a change on a Java API, I don't think it should affect the VM? Then of course if someone runs java in an environment with LD_PRELOAD where jemalloc is used instead of malloc, and all allocations (even ones from native code) align differently, I can't speak on how that would impact the JVM.

But this change, alone, should not impact the VM in any way?

mcimadamore · 2025-11-14T14:49:31Z

The latest changes look good -- but there seem to be failures in the test pipelines.

snake66 · 2025-11-14T15:05:16Z

The latest changes look good -- but there seem to be failures in the test pipelines.

I'm looking at the test failures now. At least it seems they fail consistently across all platforms.

openjdk bot added the core-libs core-libs-dev@openjdk.org label Nov 11, 2025

openjdk bot added the rfr Pull request is ready for review label Nov 11, 2025

snake66 added 2 commits November 13, 2025 19:57

Test that native segments don't overlap

2090700

This work was sponsored by: The FreeBSD Foundation

Second try to fix alignment for native segments

2b8266f

Introducing a helper function as suggested by JornVernee to decide on the proper alignment based on the segment size. This work was sponsored by: The FreeBSD Foundation Co-authored-by: JornVernee
