
@acidicoala
Collaborator

Greetings

Hello, @stevemk14ebr. I hope you're doing well and having fun building cool things.

Preamble

As stated in its documentation, Polyhook2 aims to support Linux. However, over the years the Linux code base wasn't maintained and tested as thoroughly as the Windows code base, so several issues slowly crept in and needed resolving. Since I am adding Linux builds to my own projects, I have an opportunity to test-drive Polyhook2's Linux support under real conditions, in addition to the synthetic tests. Because this PR is already quite large, I have tried to keep the scope of changes to a minimum. I will go through the most important changes one by one. In the final section I will also outline areas for future improvement that warrant a separate discussion once this PR has been merged.

Without further ado, let us look at the changes.

Implemented fixes

Update Catch.hpp to latest v2

When working on Linux support, it is of paramount importance to run the unit tests. However, with the Clang compiler the Catch.hpp that the project was using produced build errors. In particular, the error message stated:

Catch.hpp:10881:33: error: variable length array declaration not allowed at file scope

This issue was easy to fix by replacing it with the latest v2 header from the official repo: https://github.com/catchorg/Catch2/releases/download/v2.13.10/catch.hpp

.clang-format config file

Working on the code base right now presents considerable challenges, and formatting is one of them. With no standardized formatting configuration in place, IDEs format the code according to their local settings, which results in inconsistent formatting on one hand and unintentional reformatting on the other (one contributor formats with spaces, another with tabs). To start remedying this, I introduced a new .clang-format config, which is read by the clang-format utility that comes bundled with the Clang compiler and with IDEs like CLion. I spent some time tuning the config to match the existing code-base style as closely as possible, but it obviously won't be 100% perfect. However, I did not format the entire project with it, because that would make reviewing this PR a nightmare. The reformatting will be done in a separate PR instead, where we can fine-tune the config to your tastes. I can personally adapt to any style; what matters most to me is consistency. I ask you to overlook the specifics of the style for now, as there will be plenty of opportunities to tinker with it once the main issues of this PR have been resolved.

In places we don't want clang-format to touch, we can wrap the code in comments that turn it off:

// clang-format off

void wont_be_formatted();

// clang-format on

New TestUtils.hpp header with Linux macros

Many tests use the HOOK_CALLBACK macro to define hooked functions. However, its implementation caused build errors on Linux. The crux of the issue was that it relied on template specializations of make_callback to support various calling conventions. 32-bit Windows has multiple calling conventions, and the convention forms part of the function's type, so it can be used to select function overloads / template specializations. On Linux, however, there is a single standard calling convention in 32-bit mode and a single one in 64-bit mode. The __attribute__((convention)) specifiers were simply ignored by the compiler, resulting in identical specializations, which led to redefinition errors.

To fix the build I disabled all but one specialization on Linux, but this wasn't the only issue on Linux. Even though calling-convention specifiers are ignored there, noexcept does form part of the function type (since C++17). Although noexcept was not explicitly present in the make_callback definitions, the compiler treated them as noexcept(false). Hence, errors occurred when trying to use it with functions like malloc, which has the following signature on Linux:

void *malloc(size_t __size) noexcept(true) 

Because I wanted to limit the scope of this already large change list, I opted to create a new temporary macro for Linux tests, which solves the noexcept issue using C++20 features. In the future I want to reconcile it with HOOK_CALLBACK so that a single macro supports both Linux and Windows, but for now I ask you to overlook this duplication.
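To illustrate the noexcept problem independently of PolyHook: since C++17 a specialization written for `R(Args...)` does not match `R(Args...) noexcept`, but the noexcept-specifier can itself be deduced as a `bool`. The `fn_traits` template below is a hypothetical helper for illustration only; it is not the actual PLH_TEST_CALLBACK implementation, which lives in the linked TestUtils.hpp.

```cpp
#include <cstddef>

// Hypothetical helper for illustration; not part of PolyHook.
// The primary template is declared but never defined.
template <typename Fn>
struct fn_traits;

// One partial specialization matches both plain and noexcept function types,
// because the noexcept-specifier is deduced into NE.
// (Since C++17, R(Args...) and R(Args...) noexcept are distinct types.)
template <bool NE, typename R, typename... Args>
struct fn_traits<R(Args...) noexcept(NE)> {
    using return_type = R;
    static constexpr bool is_noexcept = NE;
    static constexpr std::size_t arity = sizeof...(Args);
};
```

With only an `R(Args...)` specialization, `fn_traits<decltype(malloc)>` would be an incomplete type on glibc, where `malloc` is declared `noexcept(true)` in C++ mode.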

You can check out the macro implementation here: https://github.com/acidicoala/PolyHook_2_0/blob/2cd24dd7ac1e537315d53a4ab4765a0368c46e9f/UnitTests/TestUtils.hpp#L6-L31

I placed this new macro in a new TestUtils.hpp header in UnitTests. I don't think such test utilities should be exposed as part of the public API, since we need to be able to modify them at will to suit the project's testing needs, and doing so might break projects that ended up relying on such exported helpers. In the future I think all utilities from polyhook2/Tests should be moved to UnitTests, but that is a discussion for another day.

In the new PLH_TEST_CALLBACK I also incorporated StackCanary and effects.PeakEffect().trigger() calls, since they were present in every hooked function. With the help of a small PLH_TEST_DETOUR_CALLBACK macro, the definition of hooked functions went from this:

uint64_t hookMallocTramp = NULL;
HOOK_CALLBACK(&malloc, h_hookMalloc, { // NOLINT(cert-err58-cpp)
    PLH::StackCanary canary;
    volatile int i = 0;
    PH_UNUSED(i);
    effects.PeakEffect().trigger();

    return PLH::FnCast(hookMallocTramp, &malloc)(_args...);
});

To this:

PLH_TEST_DETOUR_CALLBACK(malloc);

I do intend to bring these improvements to Windows tests as well, but in another PR.

Run tests in GitHub Actions CI

The current CI workflow only checks whether the project builds successfully for windows-msvc, linux-gcc, and linux-clang. Given the project's complexity, however, I think it is important to run the tests regularly as well. To that end, I reworked the CI workflow to not only build PolyHook2 as a dynamic library, but also to build and run the corresponding tests. You can see the resulting action runs here: https://github.com/acidicoala/PolyHook_2_0/actions/runs/17800985282

The tests currently fail for 32-bit builds because of non-trivial issues that I elaborate on later in this post, but overall the setup works well. The windows-clang combination is excluded for now, though I see no reason why Polyhook2 shouldn't support it as well; this is again work left for the future. I plan to move from MSVC to Clang for Windows builds in my own projects, and when that time comes I will work on adding Clang support for Windows in PolyHook2 as well.

Fix scheme retry logic

During my tests of x64 detours, I noticed that Polyhook2 might choose a supported scheme, like INPLACE, and succeed in generating the trampoline. But if the function prologue then turned out to be too small to fit the generated hook instructions, Polyhook2 would simply fail without trying other schemes, like INPLACE_SHORT, which would have succeeded. To fix this I created a function fitHookInstsIntoPrologue that is used when checking each detour scheme.
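The retry idea can be sketched as a loop that accepts a scheme only when trampoline generation succeeds and the resulting hook also fits into the prologue. Everything here (the `Scheme` enum, `pick_scheme`, the size callback and the byte counts in the test) is a hypothetical simplification, not PolyHook's actual fitHookInstsIntoPrologue or x64Detour code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-ins; PolyHook's real scheme enum has more members.
enum class Scheme { INPLACE, INPLACE_SHORT };

struct SchemeChoice {
    Scheme scheme;
    std::size_t hookSize; // bytes the hook instructions occupy in the prologue
};

// Tries schemes in order of preference. makeHook(s) returns the size of the
// hook instructions for scheme s, or std::nullopt if the scheme is unsupported.
// A scheme is accepted only if its hook also fits into the prologue; otherwise
// the loop moves on to the next scheme instead of failing outright.
template <typename MakeHookFn>
std::optional<SchemeChoice> pick_scheme(const std::vector<Scheme>& order,
                                        std::size_t prologueCapacity,
                                        MakeHookFn makeHook) {
    for (Scheme s : order) {
        const std::optional<std::size_t> size = makeHook(s);
        if (!size) {
            continue; // scheme not supported here
        }
        if (*size > prologueCapacity) {
            continue; // hook too large for the prologue: try the next scheme
        }
        return SchemeChoice{s, *size};
    }
    return std::nullopt; // no scheme fits
}
```

In the test below the sizes are made up: INPLACE needs 16 bytes and INPLACE_SHORT needs 5, so with an 8-byte prologue the loop falls through to INPLACE_SHORT rather than giving up after INPLACE.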

Fix Static Initialization Order Fiasco in x64 detours

Defining and initializing complex types at translation-unit scope happened to work with MSVC, but with Clang it sometimes led to the Static Initialization Order Fiasco, since the relative initialization order of globals in different translation units is unspecified. I was not able to reproduce this issue in the unit tests, but it was reliably reproducible in my project that uses PolyHook2 with the Clang compiler. I solved the problem with the standard Construct On First Use Idiom.
Before:

const static std::map<ZydisRegisterClass, ZydisRegister> class_to_reg{ // NOLINT(cert-err58-cpp)
    {ZYDIS_REGCLASS_GPR64, ZYDIS_REGISTER_RAX},
    {ZYDIS_REGCLASS_GPR32, ZYDIS_REGISTER_EAX},
    {ZYDIS_REGCLASS_GPR16, ZYDIS_REGISTER_AX},
    {ZYDIS_REGCLASS_GPR8,  ZYDIS_REGISTER_AL},
};

After:

const auto& get_class_to_reg() {
	const static std::map<ZydisRegisterClass, ZydisRegister> class_to_reg{
		{ZYDIS_REGCLASS_GPR64, ZYDIS_REGISTER_RAX},
		{ZYDIS_REGCLASS_GPR32, ZYDIS_REGISTER_EAX},
		{ZYDIS_REGCLASS_GPR16, ZYDIS_REGISTER_AX},
		{ZYDIS_REGCLASS_GPR8,  ZYDIS_REGISTER_AL},
	};

	return class_to_reg;
}

Only the global variables in x64Detours.cpp were fixed. To keep the scope small I did not go hunting for other globals across the project, but it is worth noting that any such globals may lead to the same errors. Fixing this problem completely is left for a future PR.
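A minimal sketch of why the getter helps: another global's initializer can call it safely, because the map is constructed on the first call rather than at some unspecified point during static initialization. Within a single file the order is actually well defined, so this only illustrates the pattern; the real fiasco needs the two globals to live in different translation units. The names `get_names` and `g_name_count` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Construct On First Use: the map is built the first time this function runs
// (thread-safe since C++11), never earlier.
const std::map<int, std::string>& get_names() {
    static const std::map<int, std::string> names{{1, "one"}, {2, "two"}};
    return names;
}

// A dynamically initialized global that depends on the map. Had the map been a
// namespace-scope global defined in another .cpp file, it might not have been
// constructed yet when this initializer runs.
const std::size_t g_name_count = get_names().size();
```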

Fix safe_mem_read and mem_protect

The previous Linux implementations assumed that the memory slice in question lies within the bounds of the memory mapping (memory page) that contains the starting address. In my tests this was consistently not the case: I had a trampoline allocated right at the end of a heap mapping, so it spanned multiple mappings. Because of that, the memory protector did not apply its permissions to the entire length of the given memory slice, which eventually led to segfaults. I fixed this by applying memory protections not only to the page containing the starting address, but to all pages that the slice spans. In practice that was only two, but in the future we should add unit test cases for memory slices that span more than two mappings.

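I suppose the same kind of fix ideally needs to be applied to Windows as well. My guess is that we haven't seen such issues on Windows because allocations there tend to be coarser (VirtualAlloc reserves address space at a 64 KiB granularity, even though the page size is 4 KiB on x86/x64), whereas on Linux a mapping can be as small as a single 0x1000-byte page.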
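The gist of the fix can be sketched with POSIX `mprotect`: round the start of the slice down and its end up to page boundaries, then protect the whole span rather than just the first page. This is a simplified illustration with hypothetical names (`page_span`, `protect_slice`), not PolyHook's actual memory-protector code; it also does not remember or restore the previous permissions, which a RAII protector would need to do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Page-aligned range covering [addr, addr + len). `page` must be a power of two.
struct PageSpan {
    std::uintptr_t begin;
    std::size_t length;
};

inline PageSpan page_span(std::uintptr_t addr, std::size_t len, std::size_t page) {
    const std::uintptr_t mask = ~(static_cast<std::uintptr_t>(page) - 1);
    const std::uintptr_t begin = addr & mask;                   // round start down
    const std::uintptr_t end = (addr + len + page - 1) & mask;  // round end up
    return {begin, static_cast<std::size_t>(end - begin)};
}

// Applies `prot` to every page the slice touches, not only the first one.
// mprotect fails (ENOMEM) if any page in the range is not mapped.
inline bool protect_slice(std::uintptr_t addr, std::size_t len, int prot) {
    const auto page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const PageSpan span = page_span(addr, len, page);
    return mprotect(reinterpret_cast<void*>(span.begin), span.length, prot) == 0;
}
```

For example, a 16-byte slice starting 8 bytes before a page boundary yields a two-page span, which is exactly the trampoline situation described above.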

Fix detour unhooking

When creating a detour hook instance, Polyhook2 saves the m_userTrampVar passed to the constructor so that it can write the trampoline address to it during hook(). But Polyhook2 also attempted to write null to this variable during unhooking/instance destruction, which required m_userTrampVar to remain valid until the point of destruction. The first problem is that this requirement was not documented anywhere. The second, and more important, problem is that PolyHook2 shouldn't be managing m_userTrampVar after hooking in the first place. For instance, a user might use a local stack variable to receive the trampoline address during hooking and then store that address elsewhere, like in a std::map. During unhooking, Polyhook2 would then write null to the address that used to hold that local stack variable. This silently corrupted the stack, which later resulted in segfaults. Given such use cases, I think Polyhook2 should not touch m_userTrampVar after it has done its hooking. It should be the caller's responsibility to assign nullptr to the trampoline variable after unhooking, if they wish to do so. In cases where trampolines are stored in a structure like a map, this might be unnecessary, since the variable holding them will be deallocated anyway. Therefore, I simply commented out this code section in ADetour.cpp:

if (m_userTrampVar != nullptr) {
    *m_userTrampVar = NULL;
}
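A minimal sketch of the use case above, with a hypothetical `DetourSketch` class standing in for a detour (it is not PolyHook's ADetour): the trampoline address is received in a local variable and then copied into a map. Because `unhook()` never writes through the saved pointer, the dangling address of `tramp` is never dereferenced; with the old behavior, unhooking would have written null into a stack slot that by then belongs to an unrelated frame.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical, heavily simplified detour object for illustration.
class DetourSketch {
public:
    explicit DetourSketch(std::uint64_t* userTrampVar) : m_userTrampVar(userTrampVar) {}

    bool hook() {
        m_trampoline = 0x1000; // pretend a trampoline was generated here
        if (m_userTrampVar != nullptr) {
            *m_userTrampVar = m_trampoline; // the only write to the user's variable
        }
        return true;
    }

    // Deliberately leaves *m_userTrampVar alone: by now it may point to a
    // stack slot that no longer belongs to the caller.
    bool unhook() {
        m_trampoline = 0;
        return true;
    }

private:
    std::uint64_t* m_userTrampVar;
    std::uint64_t m_trampoline = 0;
};

std::map<std::string, std::uint64_t> g_trampolines;

// The caller receives the trampoline in a local and copies it into a map.
std::unique_ptr<DetourSketch> install(const std::string& name) {
    std::uint64_t tramp = 0; // local stack variable, dead once install() returns
    auto detour = std::make_unique<DetourSketch>(&tramp);
    detour->hook();
    g_trampolines[name] = tramp;
    return detour;
}
```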

Remaining issues

The fixes above enabled me to build and run Polyhook2 on 64-bit Linux using Clang. But there are certain issues on 32-bit that I'm not sure how to solve, and they may not be limited to Linux. Hence, I would appreciate your opinion and advice on the following matters.

Issue A: Invalid makex86Jmp

Note

This section is just my speculation based on my limited understanding. I might be completely wrong here, so please take it with a grain of salt.

When making a detour, we need to create a jump from the trampoline back to the prologue (jmpToProl). On 32-bit this is done by makex86Jmp, which uses the jmp instruction with the 0xE9 opcode. But this instruction uses relative addressing and can encode offsets from -2 GB to +2 GB. What if the distance from the trampoline to the prologue is larger than 2 GB? That's exactly what happens in my Linux tests, and I think it leads to the generation of invalid trampoline jumps.

For example, take a look at this run: https://github.com/acidicoala/PolyHook_2_0/actions/runs/17800985282/job/50600426921
At the end of the Run tests step we can see the following logs:

 [+] Info: m_fnAddress: 0x00000000f16c95f0

[+] Info: Original function:
f16c95f0 [1]: 55                                      push ebp
f16c95f1 [1]: 57                                      push edi
f16c95f2 [5]: e8 6a 9f 0d 00                          call 0xF17A3561 -> f17a3561
f16c95f7 [6]: 81 c7 3d c8 18 00                       add edi, 0x18C83D
f16c95fd [1]: 56                                      push esi
f16c95fe [1]: 53                                      push ebx
f16c95ff [3]: 83 ec 1c                                sub esp, 0x1C
f16c9602 [4]: 8b 74 24 30                             mov esi, dword ptr ss:[esp+0x30]
f16c9606 [7]: 80 bf 04 3c 00 00 00                    cmp byte ptr ds:[edi+0x3C04], 0x00
f16c960d [6]: 0f 84 ad 01 00 00                       jz 0xF16C97C0 -> f16c97c0
f16c9613 [2]: 85 f6                                   test esi, esi
f16c9615 [6]: 0f 88 b2 01 00 00                       js 0xF16C97CD -> f16c97cd
f16c961b [6]: 8b af 78 00 00 00                       mov ebp, dword ptr ds:[edi+0x78]
f16c9621 [3]: 8d 5e 13                                lea ebx, dword ptr ds:[esi+0x13]
f16c9624 [3]: 83 e3 f0                                and ebx, 0xFFFFFFF0
f16c9627 [4]: 65 8b 55 00                             mov edx, dword ptr gs:[ebp]
f16c962b [3]: 83 eb 01                                sub ebx, 0x01
f16c962e [3]: c1 eb 04                                shr ebx, 0x04
f16c9631 [2]: 85 d2                                   test edx, edx
f16c9633 [6]: 0f 84 a7 00 00 00                       jz 0xF16C96E0 -> f16c96e0
f16c9639 [6]: 3b 9f 68 03 00 00                       cmp ebx, dword ptr ds:[edi+0x368]
f16c963f [6]: 0f 82 cb 00 00 00                       jb 0xF16C9710 -> f16c9710
f16c9645 [6]: 65 a1 0c 00 00 00                       mov eax, dword ptr gs:[0x0000000C]
f16c964b [2]: 85 c0                                   test eax, eax
f16c964d [6]: 0f 84 fd 00 00 00                       jz 0xF16C9750 -> f16c9750


[+] Info: Prologue to overwrite:
f16c95f0 [1]: 55                                      push ebp
f16c95f1 [1]: 57                                      push edi
f16c95f2 [5]: e8 6a 9f 0d 00                          call 0xF17A3561 -> f17a3561


[+] Info: Trampoline:
6167f9a0 [1]: 55                                      push ebp
6167f9a1 [1]: 57                                      push edi
6167f9a2 [5]: e8 05 00 00 00                          call 0x6167F9AC -> 6167f9ac
6167f9a7 [5]: e9 4b 9c 04 90                          jmp 0xF16C95F7 -> fffffffff16c95f7



[+] Info: Trampoline Jmp Tbl:
6167f9ac [5]: e9 b0 3b 12 90                          jmp 0x00000000f17a3561



[+] Info: Hook instructions:
f16c95f0 [5]: e9 6b 5e 09 6e                          jmp 0x000000005f75f460


/home/runner/work/_temp/452c25c0-e1d3-426e-86d3-529598ceb77f.sh: line 1:  3931 Segmentation fault      (core dumped)

When debugging locally, it crashed on the line

6167f9a7 [5]: e9 4b 9c 04 90                          jmp 0xF16C95F7 -> fffffffff16c95f7

Did I identify the cause correctly? If so, what should the solution be? Should makex86Jmp be implemented like makex64MinimumJump, which writes the destination address at the bottom of the trampoline and uses an indirect jmp? If so, should this be a single function, like makeIndirectJmp, for both x86 and x64?

Issue B: Invalid instruction.getDestination()

Take a look at this test run: https://github.com/acidicoala/PolyHook_2_0/actions/runs/17800985282/job/50600427010

It fails on the test case Test Disassemblers x86 FF25, where x86ASM_FF25 is defined as:

std::vector<uint8_t> x86ASM_FF25 = {
	0xFF, 0x25, 0x00, 0x00, 0x00, 0x00, // this displacement is re-written at test time since it's absolute in x86
	0xAB, 0x00, 0x00, 0xAA
};

The failing condition is:

REQUIRE(instruction.getDestination() == 0xaa0000ab);

In particular:

D:\a\PolyHook_2_0\PolyHook_2_0\UnitTests\windows\TestDisassembler.cpp(284): FAILED:
  REQUIRE( instruction.getDestination() == 0xaa0000ab )
with expansion:
  18446744072266711211 (0xffffffffaa0000ab)
  ==
  2852126891 (0xaa0000ab)

Looking into this further, I traced the issue to ZydisDisassembler::setDisplacementFields. At one point, it reads the absolute displacement like this:

inst.setAbsoluteDisplacement(zydisInst->raw.disp.value);

The raw.disp.value variable is of type ZyanI64, a signed 64-bit integer, and it already holds the value 0xffffffffaa0000ab. My guess is that Zydis read the 32-bit displacement as a signed value and sign-extended it: 0xAA0000AB is larger than 0x7FFFFFFF, so as a signed 32-bit integer it is negative, and the 64-bit result is simply the extension of that negative number.

Now, this is an indirect jump instruction, so we do expect the destination to be 0xAA0000AB. So how do we deal with this issue? Do we just force the upper 32 bits to 0 and call it a day, i.e.:

inst.setAbsoluteDisplacement(zydisInst->raw.disp.value & 0x00000000FFFFFFFF);

Or am I missing something else here? What do you think a proper solution should be here?
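The sign-extension effect itself is easy to reproduce in isolation. The sketch below uses only standard C++ and the 0xAA0000AB value from the failing assertion; it does not call Zydis, and the constant names are illustrative.

```cpp
#include <cstdint>

// A 32-bit value with bit 31 set, taken from the failing assertion above.
constexpr std::uint32_t kRaw = 0xAA0000ABu;

// Treating the field as signed and widening it to 64 bits copies bit 31 into
// the upper half (sign extension). The int32 conversion wraps on mainstream
// compilers and is guaranteed to since C++20.
constexpr std::int64_t kSignExtended = static_cast<std::int32_t>(kRaw);

// Widening the unsigned value zero-extends it, which is what a 32-bit
// absolute address needs.
constexpr std::uint64_t kZeroExtended = kRaw;

// Masking the sign-extended value back to 32 bits recovers the zero-extended one.
constexpr std::uint64_t kMasked =
    static_cast<std::uint64_t>(kSignExtended) & 0x00000000FFFFFFFFull;
```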

Future discussion

We've come to the end of the PR notes, but there is much more that needs to be done to bring out the best in Polyhook2. While working on this PR, many TODOs naturally manifested themselves. I will briefly outline them here for bookkeeping, but they are not in the scope of this PR. Once all the issues in this PR are fixed, I might start tackling them one by one in their own PRs. Each of them requires extensive discussion, so I won't start on them without having planned them out with you first.

  • Release lifecycle
    • Semver
    • Replace submodules with CPM
  • Refactor implementation:
    • Refactor private static functions into an anonymous namespace
    • Refactor implementations into explicit pure and side-effect functions
    • Do not use global static values (leads to C++ static initialization order fiasco)
  • Tests:
    • Catch v3
    • Deduplicate tests
    • Test linux-gcc (Current macro helper for callbacks relies on non-standard clang extensions)
    • Move test headers/sources to UnitTests
    • Test specific schemes in addition to the recommended one
    • Test different hooking functions with explicit calling conventions on windows
    • Add optional diagnostic data to hooks, like instructions that were translated
      • Reason: Ensure during translations tests that translation did actually occur
  • Logging:
    • Use lambda callback instead of a class (less boilerplate)
    • Don't add newline at the end of log lines (should be up to the consumer)
  • Misc:
    • Adopt a consistent naming convention (right now it's a mix of camelCase, snake_case, and Hungarian)
    • Create a glossary
    • Goal: Zero clang-tidy warnings

Long-term:

  • Windows-clang support
  • Generate test hooks with asmjit?


@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @acidicoala, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances Polyhook2's compatibility and reliability on Linux, with a particular focus on resolving issues impacting 32-bit test environments. The changes aim to improve the overall development experience by standardizing code formatting and integrating comprehensive unit testing into the continuous integration pipeline. Key areas of improvement include more resilient detour placement, robust memory management, and addressing C++ initialization pitfalls, laying a stronger foundation for future development and cross-platform support.

Highlights

  • Linux Support & 32-bit Fixes: Addressed several long-standing issues affecting Polyhook2's stability and functionality on Linux, particularly for 32-bit architectures.
  • Build System & CI Improvements: Integrated a .clang-format configuration for consistent code style and updated the GitHub Actions CI workflow to build and run unit tests for Linux, enhancing code quality and test coverage.
  • Detour Logic Enhancements: Improved the scheme retry logic for x64 detours to ensure more robust hook placement and fixed issues related to memory protection and unhooking behavior.
  • Static Initialization Order Fiasco Mitigation: Implemented the 'Construct On First Use' idiom to resolve static initialization order fiasco issues encountered with Clang on Linux.
  • Test Utilities Refinement: Introduced a new TestUtils.hpp header with simplified macros for defining hooked functions, streamlining test code and improving readability.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This is an excellent and substantial pull request that significantly improves Linux support, fixes several critical bugs, and enhances the overall code quality and test infrastructure of Polyhook2. The detailed pull request description is very helpful. I've found a few issues, including a critical bug in a new test utility function and some potential improvements in memory region handling on Linux. I've also provided feedback on some of the questions you raised in the description. Overall, this is a fantastic contribution.

@stevemk14ebr
Owner

Wow awesome work, thank you for spending the time to work on this! And I appreciate you suffering through the AI bot, I was just trying it out, if it's being dumb just ignore it. I will need to ask for some time to review in depth, but I will get to it.

@acidicoala
Collaborator Author

Wow awesome work, thank you for spending the time to work on this! And I appreciate you suffering through the AI bot, I was just trying it out, if it's being dumb just ignore it. I will need to ask for some time to review in depth, but I will get to it.

Hey, thanks for the swift reply! I understand that reviewing such a PR will take quite some time, and that is exactly what I would expect. There is no need to rush such things; we're dealing with a rather complex piece of software here, so it's OK if the review takes even several days.

Regarding the bot: It helped me catch an unused function in the test utilities 😅 The remaining comments mostly just paraphrased what I had already described. Overall it had a slightly positive impact, so I suppose it's worth using it, but not relying on it, as it incorrectly states certain facts, like $ in identifiers not being supported by MSVC, when it certainly is.

@stevemk14ebr
Owner

stevemk14ebr commented Sep 27, 2025

The sign extension issue with FF 25 is due to the instruction width varying on x64 vs x86 and me not masking it correctly (and yes reading then sign extending).

Fix must be done in the instruction lib so that it still works for x64, we branch by which mode we're decoding as.

	uint64_t getDestination() const {
		uint64_t dest = isDisplacementRelative() ? getRelativeDestination() : getAbsoluteDestination();

		// ff 25 <disp32> is jmp qword ptr [rip + disp32] on x64 but jmp dword ptr [disp32] on x86
		if (m_isIndirect) {
			size_t read = 0;
			if (m_mode == Mode::x64) {
				// *(uint64_t*)dest;
				m_accessor.safe_mem_read(dest, (uint64_t)&dest, sizeof(uint64_t), read);
			} else {
				// *(uint32_t*)dest; clear the sign-extended upper half first, so that after the
				// 4-byte read into the low bytes of dest (little-endian) the upper half stays zero
				dest &= 0x00000000FFFFFFFF;
				m_accessor.safe_mem_read(dest, (uint64_t)&dest, sizeof(uint32_t), read);
			}
		}
		return dest;
	}

For Invalid makex86Jmp that should basically not happen. I am not sure what the issue is exactly sorry.

I've reviewed most of this and it seems reasonable. The PR is quite large, so it's definitely possible I am missing something, but I'm going to merge. In the future please try to submit smaller PRs that focus on one change at a time; this library is too complicated to handle PRs that fix multiple issues at once. Also please keep format-related PRs separate from ones that change logic; changing formatting creates noisy diffs that make it harder to review. Regardless, thanks for the work.

@stevemk14ebr
Owner

TODO: Generate test hooks with asmjit?

This has been a long time goal, would be awesome to have this.

@stevemk14ebr stevemk14ebr merged commit 10529b9 into stevemk14ebr:master Sep 27, 2025
10 of 12 checks passed
@acidicoala
Collaborator Author

Thanks a lot, @stevemk14ebr for reviewing this PR!

In the future please try to submit smaller PRs that focus on one change at a time, this library is too complicated to handle PRs that fix multiple issues at once

Of course, that is indeed how it should be. But there were many issues related to Linux build and Linux tests, so it didn't make much sense to me to submit PRs that don't produce working Linux build or tests. Such PRs won't normally happen if we maintain all parts of the codebase regularly and that's why I added unit tests to the CI process as well.

Also please keep format related PRs separate from ones that change logic, changing formatting creates noisy diffs that make it harder to review.

I also agree here. That's why I didn't actively go out of my way to format unrelated parts of the code. My IDE was formatting bits and pieces here and there, but I was manually reverting such changes most of the time. Some of those changes slipped through, however. Anyway, I intend to address this in the next PR.

The sign extension issue with FF 25 is due to the instruction width varying on x64 vs x86 and me not masking it correctly (and yes reading then sign extending).

I see, as I suspected. Thanks for the fix!

For Invalid makex86Jmp that should basically not happen. I am not sure what the issue is exactly sorry.

That is most unfortunate 😞... This is a major roadblock, since it prevents hooking standard libraries on Linux. On Windows all libraries (user and system) are loaded in the first 2 GB range (0 - 7FFF FFFF), but on Linux it seems that system libraries are mapped into the second 2 GB range (8000 0000 - FFFF FFFF). For example, dlmopen is consistently at e8e685f0. After researching it a bit more, I think it should be possible to reach any memory location with the ±2 GB offset available in E9 jmp instructions. Anyway, I will investigate this further.

@stevemk14ebr
Owner

No worries, i appreciate the contributions a lot! I agree we should fix the e9 jmp bug, unfortunately I don't have time to investigate root cause myself at the moment but if you find leads I would be happy to confirm or go over fixes

@acidicoala acidicoala deleted the fix/linux branch September 28, 2025 16:06
@stevemk14ebr
Owner

stevemk14ebr commented Sep 28, 2025

@acidicoala I looked into the e9 encoding issue more, and your analysis was correct: on Linux the heap and your target are too far apart in the address space to be encoded in the jump from the trampoline back to the original.

Trampoline JMP Location: 0x6167f9a7 <-- trampoline heap
Intended Destination:      0xf16c95f7   <-- loaded linux library code
Required Forward Offset:   0x90049c4b (2.41 GB)

JMP rel32 Encoding Range:
  Max Positive Offset: 0x7fffffff (~+2GB)
  Min Negative Offset: -0x80000000 (~-2GB)

You are on the right track with the fix. The x86 logic always assumes that m_fnAddress, m_callback, and the trampoline are within ±2GB. To be honest, this is not a good assumption; we should use a strategy approach like the x64 hook does. There are a lot of similarities between the two, and x64detour handles these cases much better, since the memory range problem is nearly guaranteed on x64, whereas for x86 we are basically relying on luck with regard to the memory layout. I would constrain the distances of the trampolines by controlling the allocation to ±2GB, and if that fails, try other schemes. makex86Jmp should stay, but the places that use it should be modified, including hook() and makeTrampoline. I'd turn this section

const auto jmpToProl = makex86Jmp(jmpToProlAddr, prologue.front().getAddress() + prolSz);
ZydisDisassembler::writeEncoding(jmpToProl, *this);
const auto makeJmpFn = [=](uint64_t a, Instruction& inst) mutable {
    // move inst to trampoline and point instruction to entry
    auto oldDest = inst.getDestination();
    inst.setAddress(inst.getAddress() + delta);
    inst.setDestination(a);
    return makex86Jmp(a, oldDest);
};
const uint64_t jmpTblStart = jmpToProlAddr + getJmpSize();
trampolineOut = relocateTrampoline(prologue, jmpTblStart, delta, makeJmpFn, instsNeedingReloc, instsNeedingEntry);
from makex86Jmp into an FF 25 jump, with the destination holder in the jump table, the same way x64detour does it, since we have ample space that we control for the trampoline.

Use x64Detour as an example here for the changes necessary. There is possibly a unification possible here between x86detour and x64detour, but that seems like maybe a lot of effort?
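To make the suggested change concrete, here is a sketch of how an FF 25 jump and its destination holder could be encoded in both modes. `make_indirect_jmp`, `make_holder` and `put_le` are hypothetical helpers, not existing PolyHook functions. On x86 the disp32 of FF 25 is the absolute address of the holder, whereas on x64 it is relative to the end of the 6-byte instruction; the holder stores the full destination, so the jump itself has no range limit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Mode { x86, x64 };

// Appends the low `size` bytes of `value` in little-endian order.
static void put_le(std::vector<std::uint8_t>& out, std::uint64_t value, int size) {
    for (int i = 0; i < size; ++i) {
        out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
    }
}

// Encodes `jmp [holder]` (FF 25 disp32) placed at instAddr.
std::vector<std::uint8_t> make_indirect_jmp(Mode mode, std::uint64_t instAddr, std::uint64_t holderAddr) {
    constexpr std::uint64_t kInstSize = 6; // FF 25 + 4-byte displacement
    std::uint64_t disp;
    if (mode == Mode::x86) {
        disp = holderAddr; // absolute address of the holder
    } else {
        // RIP-relative: the holder must lie within ±2 GB of the jump.
        const auto rel = static_cast<std::int64_t>(holderAddr - (instAddr + kInstSize));
        assert(rel >= INT32_MIN && rel <= INT32_MAX);
        disp = static_cast<std::uint64_t>(rel);
    }
    std::vector<std::uint8_t> bytes{0xFF, 0x25};
    put_le(bytes, disp, 4);
    return bytes;
}

// Holder contents: a 4-byte pointer on x86, an 8-byte pointer on x64.
std::vector<std::uint8_t> make_holder(Mode mode, std::uint64_t dest) {
    std::vector<std::uint8_t> bytes;
    put_le(bytes, dest, mode == Mode::x86 ? 4 : 8);
    return bytes;
}
```

In the test below, the trampoline jump address and the prologue destination come from the log in Issue A, while the holder address 0x6167f9b0 is made up for illustration.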

@acidicoala
Collaborator Author

acidicoala commented Sep 29, 2025

@stevemk14ebr, I was wrong to suggest that the issue is related to jump offset limitation. Even though the signed 32-bit offset allows us to encode +- 2GB, it is still possible to reach any address within 32-bit space thanks to arithmetic overflow and underflow.

Allow me to explain this with a simple example (I disregard the 5-byte instruction size to simplify the arithmetic, but it doesn't affect the outcome). Imagine we need to jump from address 0x3000_0000 to address 0xC000_0000. The range of a 32-bit immediate offset is 0x8000_0000 to 0x7FFF_FFFF, i.e. [-2 GB, +2 GB). It is indeed true that the maximum positive offset (0x7FFF_FFFF) is not enough to reach 0xC000_0000 (3_221_225_472) from 0x3000_0000 (805_306_368); we can only get up to 0xAFFF_FFFF. What we can do instead is provide a negative offset so that the address underflows. In this case it would be 0x9000_0000 (-1_879_048_192). In decimal, 805_306_368 + (-1_879_048_192) = -1_073_741_824. Since this underflows, we add 2^32 (4_294_967_296) and get 3_221_225_472, which is exactly 0xC000_0000 in hexadecimal.
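The wrap-around argument can be checked mechanically. In the sketch below, `encode_rel32` and `resolve_rel32` are hypothetical helpers that use unsigned 32-bit arithmetic, which wraps modulo 2^32 just as EIP does in 32-bit mode; the values are taken from the trampoline log in Issue A (jmp at 0x6167f9a7, bytes e9 4b 9c 04 90). The third helper repeats the computation with a sign-extended 64-bit result, which reproduces the fffffffff16c95f7 value printed in that log; whether PolyHook's logger computes it exactly this way is an assumption.

```cpp
#include <cstdint>

// Size of an E9 jmp rel32: 1 opcode byte + 4 displacement bytes.
constexpr std::uint32_t kJmpRel32Size = 5;

// Displacement that makes a jmp at `src` land on `dst` in 32-bit mode.
// Unsigned subtraction wraps modulo 2^32, exactly like EIP.
constexpr std::uint32_t encode_rel32(std::uint32_t src, std::uint32_t dst) {
    return dst - (src + kJmpRel32Size);
}

// Target of a jmp rel32 at `src`, as the CPU computes it in 32-bit mode.
constexpr std::uint32_t resolve_rel32(std::uint32_t src, std::uint32_t rel) {
    return src + kJmpRel32Size + rel;
}

// The same sum with rel32 sign-extended into a 64-bit address and no
// truncation back to 32 bits.
constexpr std::uint64_t resolve_rel32_sext64(std::uint64_t src, std::uint32_t rel) {
    return src + kJmpRel32Size +
           static_cast<std::uint64_t>(static_cast<std::int64_t>(static_cast<std::int32_t>(rel)));
}
```

So the encoded displacement 0x90049c4b is "negative" as a signed value, yet in 32-bit mode it lands exactly on 0xF16C95F7, the instruction right after the overwritten prologue.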

Thus, there is no problem reaching any address in the 32-bit address space with jmp rel32 instructions. I realized this when I temporarily replaced the 5-byte trampoline jumps with 6-byte ones (push imm32; ret), arrived at exactly the same address as before, and encountered exactly the same issue I described in #215. Sorry for misleading you with my initial mistaken conjecture.
