-
-
Notifications
You must be signed in to change notification settings - Fork 252
Improve Linux support, fix 32-bit tests & more #214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @acidicoala, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances Polyhook2's compatibility and reliability on Linux, with a particular focus on resolving issues impacting 32-bit test environments. The changes aim to improve the overall development experience by standardizing code formatting and integrating comprehensive unit testing into the continuous integration pipeline. Key areas of improvement include more resilient detour placement, robust memory management, and addressing C++ initialization pitfalls, laying a stronger foundation for future development and cross-platform support.
Highlights
- Linux Support & 32-bit Fixes: Addressed several long-standing issues affecting Polyhook2's stability and functionality on Linux, particularly for 32-bit architectures.
- Build System & CI Improvements: Integrated a
.clang-formatconfiguration for consistent code style and updated the GitHub Actions CI workflow to build and run unit tests for Linux, enhancing code quality and test coverage. - Detour Logic Enhancements: Improved the scheme retry logic for x64 detours to ensure more robust hook placement and fixed issues related to memory protection and unhooking behavior.
- Static Initialization Order Fiasco Mitigation: Implemented the 'Construct On First Use' idiom to resolve static initialization order fiasco issues encountered with Clang on Linux.
- Test Utilities Refinement: Introduced a new
TestUtils.hppheader with simplified macros for defining hooked functions, streamlining test code and improving readability.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This is an excellent and substantial pull request that significantly improves Linux support, fixes several critical bugs, and enhances the overall code quality and test infrastructure of Polyhook2. The detailed pull request description is very helpful. I've found a few issues, including a critical bug in a new test utility function and some potential improvements in memory region handling on Linux. I've also provided feedback on some of the questions you raised in the description. Overall, this is a fantastic contribution.
|
Wow awesome work, thank you for spending the time to work on this! And I appreciate you suffering through the AI bot, I was just trying it out, if it's being dumb just ignore it. I will need to ask for some time to review in depth, but I will get to it. |
Hey, thanks for the swift reply! I understand that reviewing such a PR will take quite some time, and that is exactly what I would expect. There is no need to rush such things, we're dealing with quite complex piece of software here so it's OK if it takes even several days to review it. Regarding the bot: It helped me catch an unused function in the test utilities 😅 The remaining comments mostly just paraphrased what I had already described. Overall it had a slightly positive impact, so I suppose it's worth using it, but not relying on it, as it incorrectly states certain facts, like |
|
The sign extension issue with FF 25 is due to the instruction width varying on x64 vs x86 and me not masking it correctly (and yes reading then sign extending). Fix must be done in the instruction lib so that it still works for x64, we branch by which mode we're decoding as. For Invalid makex86Jmp that should basically not happen. I am not sure what the issue is exactly sorry. I've reviewed most of this and it seems reasonable, the PR is quite large so definitely possible I am missing something, but I'm going to merge. In the future please try to submit smaller PRs that focus on one change at a time, this library is too complicated to handle PRs that fix multiple issues at once. Also please keep format related PRs seperate from ones that change logic, changing formatting creates noisy diffs that make it harder to review. Regardless thanks for the work. |
|
TODO: Generate test hooks with asmjit? This has been a long time goal, would be awesome to have this. |
|
Thanks a lot, @stevemk14ebr for reviewing this PR!
Of course, that is indeed how it should be. But there were many issues related to Linux build and Linux tests, so it didn't make much sense to me to submit PRs that don't produce working Linux build or tests. Such PRs won't normally happen if we maintain all parts of the codebase regularly and that's why I added unit tests to the CI process as well.
I also agree here. That's why I didn't actively go out of my way to format unrelated parts of the code. My IDE was formatting bits and pieces here and there, but I was manually reverting such changes most of the time. Some of those changes slipped through, however. Anyway, I intend to address this in the next PR.
I see, as I suspected. Thanks for the fix!
That is most unfortunate 😞... This is a major roadblock, since it prevents hooking standard libraries on Linux. On windows all libraries (User and System) are loaded in the first 2GB range ( |
|
No worries, i appreciate the contributions a lot! I agree we should fix the e9 jmp bug, unfortunately I don't have time to investigate root cause myself at the moment but if you find leads I would be happy to confirm or go over fixes |
|
@acidicoala I looked into the e9 encoding issue more, you're analysis was correct the address space on linux for heap and your target are too far apart to be encoded on the jump from the trampoline back to the original. You are on the right idea to fix. The x86 logic is always assuming m_fnAddress, m_callback, and the trampoline are within +-2GB. This is to be honest not a good assumption, we should use a strategy approach like the x64 hook does, there's a lot of similarities between the two and x64detour handles the cases much better as the range problem of memory is nearly gauranteed on x64, we're relying on luck basically for x86 with regards to the memory layout. I would constrain the distances of the trampolines by controlling the allocation to +- 2GB, if that fails try other schemes. makex86jump should stay but places that use it should be modified including in hook() and makeTrampoline. I'd turn this section PolyHook_2_0/sources/x86Detour.cpp Lines 136 to 149 in 10529b9
Use x64Detour as an example here for the changes necessary. There is possibly a unification possible here between x86detour and x64detour, but that seems like maybe a lot of effort? |
|
@stevemk14ebr, I was wrong to suggest that the issue is related to jump offset limitation. Even though the signed 32-bit offset allows us to encode +- 2GB, it is still possible to reach any address within 32-bit space thanks to arithmetic overflow and underflow. Allow me to explain this using a simple example (I disregard instruction size of 5 bytes to simplify arithmetic, but it doesn't affect the logical outcome). So, imagine we need to jump from address Thus, there is no issue with reaching any address in 32-bit architecture using |
Greetings
Hello, @stevemk14ebr. I hope you're doing well and having fun building cool things.
Preamble
As stated in its documentation, Polyhook2 aims to support Linux. However, over the years the Linux code-base wasn't maintained & tested as thoroughly as Windows code-base was, so several issues slowly crept up and needed resolving. As I am adding support for Linux build to my projects, I have an opportunity to test drive Polyhook2's Linux support under real conditions, in addition to synthetic tests. This PR aims to minimize the scope of changes, because it is already quite large. I will be listing the most important changes made one by one. In the ending section I will also outline areas of future improvement that warrant a separate discussion, once this PR has been merged.
Without further ado, let us look at the changes.
Implemented fixes
Update
Catch.hppto latest v2When working on Linux support, it is of paramount importance to run unit tests. However, under Clang compiler the Catch.hpp that was in use by the project was producing build errors. In particular the error message stated:
This issue was quite easy to fix by updating it with the latest header from the official repo: https://github.com/catchorg/Catch2/releases/download/v2.13.10/catch.hpp
.clang-formatconfig fileWorking on the code base right now presents considerable challenges. One of those challenges is formatting. With no standardized formatting configuration in place, IDEs will format the code according to their local settings, which will result in inconsistent formatting on one hand, and unintentional formatting on the other (one side will format with spaces, another side will format with tabs). To start remedying this issue, I introduced a new
.clang-formatconfig, which is used by the clang-format utility that comes bundled with Clang compiler or with IDEs like CLion. I spent a little time trying to set up the config to match the existing code-base style as much as possible, but it obviously won't be 100% perfect. However, I did not format the entire project with it, because that would make reviewing this PR a nightmare. This reformatting will be done in a separate PR instead, where we can fine-tune the config according to your tastes. I personally can adapt to any style, for me the most important issue is consistency. I ask you to overlook the specifics of the style for now, as there will be plenty of opportunities to tinker with style that suites your tastes once the main issues of this PR have been resolved.In places which we don't want clang-format to touch, we can wrap the code with comments that turn it off:
New
TestUtils.hppheader with Linux macrosMany tests make use of the
HOOK_CALLBACKmacro to define hooked functions. However the problem was that its implementation causing build errors on Linux. The crux of the issue was that it relied on template specializations of themake_callbackto support various calling conventions. 32-bit Windows has multiple calling conventions, and those conventions form part of the function's signature. Hence, they can be used to generate function overloads / template specializations. But on Linux there is a single standard convention on 32-bit mode and on 64-bit mode. The__attribute__((convention))specifiers were simply ignored by compiler, resulting in identical specializations, which lead to redefinition build errors.Now, to fix the build I've disabled all but one specialization on Linux, but this wasn't the only issue on Linux. Even though calling convention specifiers are ignored on Linux, the
noexceptis forming a part of the signature. Even thoughnoexceptwas not explicitly present inmake_callbackdefinitions, it was nevertheless implicitly evaluated tonoexcept(false)by compiler. Hence it lead to errors when trying to use it for functions likemalloc, which has the following signature on Linux:Because I wanted to limit the scope of changes in this already large change-list, I opted to create a new temporary macro for Linux tests. It solves the issue of noexcept using C++20 features. In the future I want to reconcile it with
HOOK_CALLBACKso there is only one macro that supports both Linux and Windows, but for now I ask you to overlook this duplication for a while.You can check out the macro implementation here: https://github.com/acidicoala/PolyHook_2_0/blob/2cd24dd7ac1e537315d53a4ab4765a0368c46e9f/UnitTests/TestUtils.hpp#L6-L31
I decided to place this new macro in a new
TestUtils.hppheader, located inUnitTests. I don't think that such test utilities should be exposed as part of public API, since we would need to modify it at will to suit project's testing needs. And this might break builds for projects that ended up relying on such exported helpers. In the future I think all utilities frompolyhook2/Testsshould be move toUnitTests, but this is a discussion for another day.In the new
PLH_TEST_CALLBACKI also incorporated StackCanary andeffects.PeakEffect().trigger()calls, since they were present in every hooked function. With the help of a smallPLH_TEST_DETOUR_CALLBACKmacro, the definition of hooked functions went from this:To this:
PLH_TEST_DETOUR_CALLBACK(malloc);I do intend to bring these improvements to Windows tests as well, but in another PR.
Run tests in GitHub Actions CI
The current CI workflow only checks if the project can build successfully for
windows-msvc,linux-gcc, andlinux-clang. However, given the project's complexity I think it is important to institute regular testing as well. To that end, I reworked the CI workflow to not only build PolyHook2 as dynamic library, but also to build and run corresponding tests. You can see the resulting action runs here: https://github.com/acidicoala/PolyHook_2_0/actions/runs/17800985282The tests are failing at the moment for 32-bit builds because there are non-trivial issues that I will elaborate on further in this post. But overall it works well. The windows-clang combination is excluded for now, but I see no reason why Polyhook2 shouldn't support it as well. But this is again work left for the future. I plan to move from MSVC to Clang compiler for Windows builds in my own projects in the future. When that time comes, I will work on adding Clang support for windows in PolyHook2 as well.
Fix scheme retry logic
During my tests of x64 detours, I noticed that Polyhook2 might chose a scheme that is supported, like
INPLACE, and succeed in generating trampolines. But if it turned out that the generated hook instructions are not large enough to fit into the function prologue, Polyhook2 would just fail without trying other schemes likeINPLACE_SHORTwhich would have succeeded. To fix this I created functionfitHookInstsIntoProloguethat is used when checking a certain detour scheme.Fix Static Initialization Order Fiasco in x64 detours
Defining and initializing complex types in the scope of a translation unit works fine on MSVC. But on Clang this sometimes leads to Static Initialization Order Fiasco. I was not able to reproduce this issue during unit tests, but it was reliably reproducible in my project that was making use of PolyHook2 and Clang compiler. I solved this problem with the standard Construct On First Use Idiom.
Before:
After:
Only the global variables in
x64Detours.cppwere fixed. I did not go hunting for other globals across the project to keep the scope small, but it is worth noting that if such globals do exists, they will lead to errors. Hence, it is left up for a future PR to fix this problem completely.Fix
safe_mem_readandmem_protectThe previous implementations on Linux assumed that the memory slice in question was within the bounds of the memory mapping (memory page) where the starting address is located. But in my tests this was consistently not the case. I had a trampoline that was allocated right at the end of a heap mapping, so it spanned multiple mappings. Because of that, memory protector was not applying its permissions for the entire length of the given memory slice, which eventually led to seg-faults. I fixed this issue by applying memory protections not only for the page with starting address, but for all pages that a given memory slice spans. In practice that was only two, but in the future we should construct unit test cases for memory slices that span more than two mappings.
I suppose the same kind of fix ideally needs to be applied to Windows as well. I guess that we haven't seen such issues on Windows because memory pages on Windows are on average larger that on Linux, where they can be as small as
0x1000.Fix detour unhooking
When creating a detour hook instance, Polyhook2 would save the
m_userTrampVartaken from constructor so that it could write the trampoline address to it during thehook()invocation. But Polyhook2 also attempted to writenullto this trampoline address during unhooking/instance destruction. This in turn required for thism_userTrampVarto have remained valid up to the point of destruction. The first problem with it was that such a requirement was not documented anywhere. The second, and more important problem, is that PolyHook2 shouldn't be managingm_userTrampVarafter it has done hooking in the first place. For instance, during hooking a user might use a variable from local stack to receive the trampoline address and then save that trampoline address elsewhere, like an std::map. But during unhooking Polyhook2 would attempt to writenullto that address that used to be a local stack variable. This in effect silently corrupted the stack which later resulted in seg-faults. Given such use cases, I think that Polyhook2 should not manage m_userTrampVar after it has done its hooking. It should be the responsibility of the calling to assign nullptr to trampoline variable after unhooking if he wishes to do so. In cases where trampolines are stored in a structure like map, this might be unnecessary since the variable holding it will be deallocated anyway. Therefore, I simply commented out this code section inADetour.cpp:Remaining issues
The above-mentioned fixes enabled me to build and run Polyhook2 for Linux on 64-bit using Clang. But there are certain issues on 32-bit that I'm not sure how to solve, and they are not be limited to just Linux. Hence, I was looking forward to your opinion and advice on the following matters.
Issue A: Invalid
makex86JmpNote
This is section is just my speculation from my limited understanding. I might be completely wrong here, so please read this with a grain of salt.
When making a detour, we need to create a jump from trampoline to prologue
jmpToProl. On 32-bit this is done bymakex86Jmp, which usesjmpinstruction with0xE9opcode. But this instruction uses relative addressing mode, and can encode ranges from -2Gb to +2Gb. But what if the distance from trampoline to prologue is larger than 2Gb? That's exactly what happens on my linux tests and I think leads to generation of invalid trampoline jumps.For example, take a look at this run: https://github.com/acidicoala/PolyHook_2_0/actions/runs/17800985282/job/50600426921
At the end of step
Run testswe can see the following logs:When debugging locally, it crashed on the line
Did I correctly identify the cause? If so, what should be the solution? Should
makex86Jmpbe implemented likemakex64MinimumJump, which writes destination address at the bottom of trampoline and uses indirect jmp? If so, should this be a single function likemakeIndirectJmpfor both x86 and x64?Issue B: Invalid instruction.getDestination()
Take a look at this test run: https://github.com/acidicoala/PolyHook_2_0/actions/runs/17800985282/job/50600427010
It fails on the case
Test Disassemblers x86 FF25.x86ASM_FF25is defined as:The failing condition is:
REQUIRE(instruction.getDestination() == 0xaa0000ab);In particular:
Looking into this further, I traced the issue to
ZydisDisassembler::setDisplacementFields. At one point, it reads absolute displacement like this:The
raw.disp.valuevariable is of typeZyanI64, so a signed 64-bit integer. And it already stores value0xffffffffaa0000ab. My guess is that Zydis must have read the 32-bit relative offset as signed, and it extended it using signed extension, because0xAA0000ABis larger than 2GB (larger than 0x7FFFFFFF), which means it's a negative number, so the result is just an extension of that negative number.Now, this is an indirect jump instruction, so we do expect the destination to be
0xAA0000AB. So, how do we deal with this issue? Do we just force the first 32-bits to 0 and call it a day?i.e.
inst.setAbsoluteDisplacement(zydisInst->raw.disp.value & 0x00000000FFFFFFFF);Or am I missing something else here? What do you think a proper solution should be here?
Future discussion
We've come to the end of the PR notes. But there is much more that needs to be done to bring the best out of Polyhook2. While working on this PR, many TODOs have naturally manifested themselves. I will briefly outline them here for the sake of bookkeeping, but they are not included in the scope of this PR. Once all the issues in this PR are fixed, I might start tackling the following features one by one in their own PRs. Each of them require extensive discussion so I won't do them without having planned them out with you first.
Long-term: