Fix x86_64 SIGILL crash: correct TURN patch offsets and use full NOP sled#636
Conversation
…sled Fixes hassio-addons#635. The x86_64 offsets introduced in hassio-addons#631 (0x114D1F, 0x11583D) were 102 bytes off. They were derived from a binary that had md5sum output accidentally prepended during extraction over SSH (cat mixed text and binary output); the 102-byte prefix shifted all derived offsets by the same amount. The corrupted binary was later replaced with a clean extract, but the shifted offsets were retained. The md5 check was added after the fix, so it could not catch that the offsets were still wrong. At the incorrect offsets, the single 0x90 NOP bytes were written into the middle of unrelated instructions (lock xadd, cmp, etc.), producing a malformed instruction stream that crashes with SIGILL when executed. The crash PC 0x115841 reported in hassio-addons#635 is 4 bytes into the broken write at 0x11583D, inside the resulting garbage. aarch64 was not affected: its extraction was clean, offsets are correct, and a 4-byte NOP cleanly replaces the full 'mov w2, #0x1a' instruction. Also: - Use a 5-byte NOP sled on x86_64 to cover the full 'mov edx, 0x1a' instruction. A 1-byte NOP on x86_64 leaves residual immediate bytes (`1a 00 00 00`) that the CPU decodes as garbage instructions (sbb/add on [rax]), causing SIGILL. This matches the aarch64 pattern of NOPing the full instruction. - Add pre-patch and post-patch byte verification so this class of mistake fails the build loudly in the future.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughAdded pre- and post-patch byte verification and adjusted patch offsets/bytes in the Dockerfile that modifies Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
unifi/Dockerfile (1)
64-68: Good post-patch verification.The verification correctly confirms:
- aarch64: "1f2003d5" (ARM64 NOP instruction)
- x86_64: "9090909090" (5-byte NOP sled)
One minor observation: if any
testcommand fails, the error message won't indicate which specific check failed. Since the md5sum check (lines 49-50) guards against library changes, this is acceptable, but for future maintainability you could consider adding briefechostatements before each test to aid debugging.,
♻️ Optional: Add diagnostic output before tests
# Verify patches actually landed + && echo "Verifying patches..." \ && test "$(dd if=${AARCH64_LIB} bs=1 skip=$((0x167214)) count=4 2>/dev/null | od -An -tx1 | tr -d ' \n')" = "1f2003d5" \Or wrap each test in a conditional with descriptive failure messages:
&& { test "$(dd if=${X86_64_LIB} bs=1 skip=$((0x114CB9)) count=5 2>/dev/null | od -An -tx1 | tr -d ' \n')" = "9090909090" \ || { echo "FAIL: x86_64 patch at 0x114CB9 did not apply"; exit 1; }; } \🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unifi/Dockerfile` around lines 64 - 68, The post-patch verification tests using the dd/od checks against ${AARCH64_LIB} and ${X86_64_LIB} (the dd calls that compare bytes at offsets 0x167214, 0x167D20, 0x114CB9, 0x1157D7) currently fail silently with no indication which check failed; update each test to emit a short diagnostic or fail message on error (for example print which library/offset is being checked before running, or wrap each test so a failed test prints "FAIL: <arch> patch at 0x<offset> did not apply" and exits non-zero) so maintainers can quickly identify which specific dd/od check failed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@unifi/Dockerfile`:
- Around line 64-68: The post-patch verification tests using the dd/od checks
against ${AARCH64_LIB} and ${X86_64_LIB} (the dd calls that compare bytes at
offsets 0x167214, 0x167D20, 0x114CB9, 0x1157D7) currently fail silently with no
indication which check failed; update each test to emit a short diagnostic or
fail message on error (for example print which library/offset is being checked
before running, or wrap each test so a failed test prints "FAIL: <arch> patch at
0x<offset> did not apply" and exits non-zero) so maintainers can quickly
identify which specific dd/od check failed.
Reviewer suggestion: wrap the pre/post-patch checks in a helper that reports which architecture, site, and offset failed, rather than a silent test exit. Preserves the same guarantees while making build failures self-explanatory.
There was a problem hiding this comment.
Pull request overview
Updates the UniFi add-on build to prevent an amd64 (x86_64) SIGILL crash by correcting the WebRTC TURN patch locations and making the patch safer via byte-level validation.
Changes:
- Fix x86_64 patch offsets to the correct
mov edx, 0x1ainstruction sites. - Replace the previous 1-byte NOP with a full 5-byte NOP sequence on x86_64.
- Add pre-patch and post-patch byte verification to fail the build if offsets/opcodes don’t match expectations.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
Sorry I thought I had commented yesterday, I was able to start and run these changes after adding your repo to home assistant. I didn't see any crashes but I only went as far as starting the container, I didn't attempt to restore a setup. |
|
I saw your comment (you put it on issue 631 which wasn't the right spot but it worked). I pushed a new fix that should fix your issue @charlesomer and it was merged last night. Give it a shot and if there are still issues let me know. Thanks! |
Fixes #635.
Root cause
The x86_64 patch offsets introduced in #631 (
0x114D1F,0x11583D) were 102 bytes off. They were derived from a corrupted extraction oflibubnt_webrtc_jni.sothat hadmd5sumoutput accidentally prepended to the ELF header. The 102-byte text prefix shifted every derived offset by the same amount.The corruption was caught and the committed binary was re-extracted cleanly, but the shifted offsets were retained. The md5 check was added after that point, so it couldn't catch that the offsets were now wrong. aarch64 was unaffected because its extraction was clean.
At the wrong x86_64 offsets, the single
0x90NOP byte was written into the middle of unrelated instructions (lock xadd,cmp, etc.), producing a malformed instruction stream. The crash PC0x115841reported in #635 is exactly 4 bytes past the broken write at0x11583D, inside the resulting garbage.What this PR changes
Correct x86_64 offsets. The actual
mov edx, 0x1apattern in the pristine library is at0x114CB9and0x1157D7:5-byte NOP sled instead of 1-byte.
mov edx, 0x1aon x86_64 is a 5-byte instruction (ba 1a 00 00 00). A 1-byte NOP leaves the immediate bytes1a 00 00 00in place, which the CPU decodes as garbage instructions (sbb al, [rax]+add [rax], al) and faults with SIGILL whenraxdoesn't point to writable memory. The 5-byte NOP cleanly replaces the whole instruction. The subsequentmov rsi, rbxandcall AppendFieldEmptystill execute, withedxretaining its prior value, matching the behavior on aarch64 (which NOPs the 4-bytemov w2, #0x1abut leaves theblcall intact).Pre-patch and post-patch byte verification. The build now explicitly checks that each target offset contains the expected opcode before applying the patch, and that the NOPs are actually written after. This would have caught the original offset bug at build time.
Testing
I was unable to runtime-reproduce the #635 crash locally (requires a populated HAOS install with remote access enabled). I did verify:
ba 1a 00 00 00(themov edx, 0x1aopcode)It would help to have @charlesomer (the #635 reporter) confirm the fix on their amd64 HAOS install before merging.
Apologies
The original extraction bug and shifted offsets were my mistake in #631. Sorry for the followup.
Summary by CodeRabbit