[lldb][Mach-O] Allow "process metadata" LC_NOTE to supply registers (#144627) #10923

jasonmolenda

The "process metadata" LC_NOTE allows for thread IDs to be specified in a Mach-O corefile. This extends the JSON recognized in that LC_NOTE to allow for additional registers to be supplied on a per-thread basis.

The registers included in a Mach-O corefile LC_THREAD load command can only be one of the register flavors that the kernel (xnu) defines in <mach/arm/thread_status.h> for arm64 -- the general purpose registers, floating point registers, exception registers.

JTAG style corefile producers may have access to many additional registers beyond these that EL0 programs typically use, for instance TCR_EL1 on AArch64, and people developing low level code need access to these registers. This patch defines a format for including these registers for any thread.

The JSON in "process metadata" is a dictionary that must have a `threads` key. The value is an array of entries, one per LC_THREAD in the Mach-O corefile. The number of entries must match the number of LC_THREADs so they can be correctly associated.

Each thread's dictionary must have two keys, `sets` and `registers`. `sets` is an array of register set names. If a register set name matches one from the LC_THREAD core registers, any registers that are defined will be added to that register set. For example, metadata can add a register to the "General Purpose Registers" set that lldb shows users.

`registers` is an array of dictionaries, one per register. Each register must have the keys `name`, `value`, `bitsize`, and `set`. It may provide additional keys, like `alt-name`, that `DynamicRegisterInfo::SetRegisterInfo` recognizes.

This `sets` + `registers` formatting is the same one that the `target.process.python-os-plugin-path` script interface uses; both are parsed by `DynamicRegisterInfo`. The one addition is that in this LC_NOTE metadata, each register must also have a `value` field, with the value provided in big-endian base 10, as usual with JSON.
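
To make the format above concrete, here is a minimal Python sketch that builds one such payload and checks the constraints listed in this description. It follows the prose only: the set name "System Registers", the register value, the `check_metadata` helper, and the assumption that `set` is an index into `sets` (as in the python-os-plugin format) are illustrative, not taken from the patch.

```python
import json

# Hypothetical payload: one entry per LC_THREAD, in the same order.
# Assumption: "set" is an index into this thread's "sets" array.
metadata = {
    "threads": [
        {
            "sets": ["General Purpose Registers", "System Registers"],
            "registers": [
                # Made-up value; json.dumps emits integers in base 10.
                {"name": "tcr_el1", "value": 0x32B5193519, "bitsize": 64, "set": 1},
            ],
        }
    ]
}

REQUIRED_REGISTER_KEYS = {"name", "value", "bitsize", "set"}


def check_metadata(metadata, lc_thread_count):
    """Return a list of problems found; an empty list means the payload
    satisfies the constraints described in the text above."""
    threads = metadata.get("threads")
    if not isinstance(threads, list):
        return ["missing 'threads' array"]
    errors = []
    if len(threads) != lc_thread_count:
        errors.append(
            f"expected {lc_thread_count} thread entries, found {len(threads)}"
        )
    for i, thread in enumerate(threads):
        for key in ("sets", "registers"):
            if key not in thread:
                errors.append(f"thread {i}: missing '{key}'")
        for reg in thread.get("registers", []):
            missing = REQUIRED_REGISTER_KEYS - reg.keys()
            if missing:
                errors.append(
                    f"thread {i}: register missing keys {sorted(missing)}"
                )
    return errors


payload = json.dumps(metadata)
```

The resulting `payload` string is the kind of JSON a corefile producer would place in the LC_NOTE; the real parser may accept or require more than this sketch checks.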

In RegisterContextUnifiedCore, I combine the register sets & registers from the LC_THREAD for a specific thread with the metadata sets & registers for that thread from the LC_NOTE. Even if no LC_NOTE is present, this class ingests the LC_THREAD register contexts and reformats them into its internal stores before returning itself as the RegisterContext, instead of shortcutting and returning the core's native RegisterContext. I could have gone either way with that, but in the end I decided that if the code is correct, we should live on it always.
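
The combining step described above can be sketched in Python as follows. This is an illustration of the idea only, not the C++ in RegisterContextUnifiedCore: the `merge_register_sets` function and the dictionary shapes are hypothetical, and it assumes each register's `set` is an index into its own list of set names.

```python
def merge_register_sets(core_sets, core_regs, meta_sets, meta_regs):
    """Combine LC_THREAD (core) and LC_NOTE (metadata) registers.

    A metadata set whose name matches a core set reuses that set;
    any other metadata set is appended as a new set.
    """
    merged_sets = list(core_sets)

    # Map each metadata set index to its index in the merged list.
    remap = []
    for name in meta_sets:
        if name not in merged_sets:
            merged_sets.append(name)
        remap.append(merged_sets.index(name))

    # Core registers keep their set indices; metadata registers are
    # copied with their set index rewritten to the merged numbering.
    merged_regs = [dict(reg) for reg in core_regs]
    for reg in meta_regs:
        rewritten = dict(reg)
        rewritten["set"] = remap[reg["set"]]
        merged_regs.append(rewritten)
    return merged_sets, merged_regs
```

With no metadata at all (`meta_sets` and `meta_regs` empty), the function still returns a copy of the core sets and registers, which mirrors the decision to always go through the unified context.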

I added a test where we use `process save-core` to create a userland corefile, then use a utility, "add-lcnote", to strip the existing "process metadata" LC_NOTE that lldb put in it and add a new one from a JSON string.

rdar://74358787


(cherry picked from commit a64db49)

Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
(cherry picked from commit a64db49)
@jasonmolenda jasonmolenda requested a review from a team as a code owner June 28, 2025 04:05
@jasonmolenda

@swift-ci test

@jasonmolenda

jasonmolenda commented Jun 28, 2025

macOS PR test bot failed with

[2025-06-28T06:01:41.630Z] Unresolved Tests (3):
[2025-06-28T06:01:41.630Z]   lldb-api :: commands/target/stop-hooks/on-core-load/TestStopHookOnCoreLoad.py
[2025-06-28T06:01:41.630Z]   lldb-api :: macosx/lc-note/additional-registers/TestMetadataRegisters.py
[2025-06-28T06:01:41.630Z]   lldb-api :: macosx/lc-note/kern-ver-str/TestKernVerStrLCNOTE.py

That's a strong enough signal that I'll build this branch locally and debug it; unlikely to be a bot flake issue. May be that my PR depends on a separate patch on llvm.org that isn't on this branch; these failures did not happen on llvm.org main.

@jasonmolenda

Ah, interesting. TestStopHookOnCoreLoad is failing because it uses an Intel corefile, and RegisterContextUnifiedCore tries to detect which registers are actually available in the corefile and only copies those into the unified register file. But the x86 Mach-O corefile RegisterContext has a bug where it assumes every register set is present in the corefile, and it will present incorrect values past the initial GPRs.

registers in RegisterContextUnifiedCore.
@jasonmolenda

@swift-ci test

PR tests pass on the CI; reverting so I can land upstream and cp.

This reverts commit 0211430.
reading, and one bug in the new RegisterContextUnifiedCore class.

The PR I landed a few days ago to allow Mach-O corefiles to augment
their registers with additional per-thread registers in metadata exposed
a few bugs in the x86_64 corefile reader when running under different CI
environments. It also showed a bug in my RegisterContextUnifiedCore
class where I wasn't properly handling lookups of unknown registers
(e.g. the LLDB_GENERIC_RA when debugging an intel target).

The Mach-O x86_64 corefile support would say that it had fpu & exc
registers available in every corefile, regardless of whether they were
actually present. It would only read the bytes for the first register
flavor in the LC_THREAD, the GPRs, but it read them incorrectly, so
sometimes you got more register context than you'd expect. The LC_THREAD
register context specifies a flavor and the number of uint32_t words;
the ObjectFileMachO method would read that number of uint64_t's,
exceeding the GPR register space, but it was followed by FPU and then
EXC register space so it didn't crash. If you had a corefile with GPR
and EXC register bytes, it would be written into the GPR and then FPU
register areas, with zeroes filling out the rest of the context.
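
The size mismatch described above comes down to the unit of the count field. As a rough Python sketch (not the ObjectFileMachO code), an LC_THREAD payload can be walked as a sequence of (flavor, count, data) records where `count` is a number of 32-bit words, so each record's data is `count * 4` bytes. The `iter_thread_states` name and the little-endian default are assumptions for illustration.

```python
import struct


def iter_thread_states(payload, byteorder="<"):
    """Yield (flavor, raw_bytes) for each register flavor in an
    LC_THREAD payload (the bytes after the cmd/cmdsize header)."""
    offset = 0
    while offset + 8 <= len(payload):
        flavor, count = struct.unpack_from(byteorder + "II", payload, offset)
        offset += 8
        # count is in uint32_t words; reading count uint64_t values
        # here would run past this flavor into the next one.
        nbytes = count * 4
        if offset + nbytes > len(payload):
            raise ValueError("LC_THREAD payload is truncated")
        yield flavor, payload[offset:offset + nbytes]
        offset += nbytes
```

Because each flavor is returned separately with its own length, a caller can tell which register sets are actually present instead of assuming GPR, FPU, and EXC always follow one another.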

(cherry picked from commit e94c609)
@jasonmolenda

@swift-ci test

@jasonmolenda

Discussed this with Jonas offline. This is changing how lldb parses all Mach-O corefiles, and we're relatively late in the release/6.2 schedule so if there is an edge case regression we're unlikely to hear a report of it until even later. We're going to take this PR in a future release.

@jasonmolenda jasonmolenda deleted the cp/r74358787-lc-note-to-supply-additional-registers-in-corefile branch July 2, 2025 00:06