Add lazy loading support for code objects#27

Draft
mawad-amd wants to merge 8 commits into main from muhaawad/lazy-addfile

@mawad-amd
Member

Summary

  • Add addFile(path, lazy=true) mode that indexes kernel names from ELF symbols without disassembling.
  • Add ensureKernelLoaded(name) to disassemble a single kernel on demand when it is first dispatched.
  • Enables downstream tools (e.g. Nexus) to defer the expensive disassembly step until a kernel is actually needed.

🤖 Generated with Claude Code

mawad-amd and others added 2 commits April 1, 2026 00:23
When addFile is called with lazy=true, only kernel names are indexed
from the ELF symbol table (.symtab) — no disassembly runs. Disassembly
and source mapping happen on demand when a kernel is first accessed via
getKernel(), getKernelLines(), getInstructionsForLine(), or
getKernelArguments().

Changes:
- kernelDB::addFile: new lazy parameter; indexes via getKernelNamesFromElf
- kernelDB::getKernelNamesFromElf: reads .symtab for STT_FUNC/STT_AMDGPU_HSA_KERNEL symbols
- kernelDB::ensureKernelLoaded: on-demand disassembly + source mapping for lazy kernels
- getKernels() includes lazy kernel names; hasKernel() checks both maps
- Python API: KernelDB(lazy=True), add_file(), scan_code_object(), has_kernel()
- pybind11: expose addFile, scanCodeObject, hasKernel
- pyproject.toml: fix license field format
- Add examples/07_lazy_load/

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
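
For illustration, the indexing step described in this commit could look roughly like the sketch below. It is not the code from this PR: `SymbolEntry` and `indexKernelNames` are hypothetical names, the input is an already-decoded symbol table rather than a real ELF file, and the numeric value used for `STT_AMDGPU_HSA_KERNEL` is an assumption taken from LLVM's ELF definitions. The `.kd` handling mirrors what a later commit in this PR adds, and stripping the suffix is likewise an assumption.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// ELF symbol type codes. STT_OBJECT and STT_FUNC are generic ELF values;
// the AMDGPU-specific value is an assumption based on LLVM's ELF.h.
constexpr std::uint8_t kSttObject = 1;
constexpr std::uint8_t kSttFunc = 2;
constexpr std::uint8_t kSttAmdgpuHsaKernel = 10;

// Hypothetical, already-decoded .symtab entry (not a kernelDB type).
struct SymbolEntry {
    std::string name;   // raw (mangled) symbol name
    std::uint8_t info;  // st_info: binding in the high nibble, type in the low
};

inline bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Collect candidate kernel names from symbol metadata only; no code bytes
// are read and nothing is disassembled.
std::vector<std::string> indexKernelNames(const std::vector<SymbolEntry>& symtab) {
    std::vector<std::string> names;
    std::set<std::string> seen;  // a kernel may appear as both "k" and "k.kd"
    for (const auto& sym : symtab) {
        const std::uint8_t type = sym.info & 0x0f;
        std::string name;
        if (type == kSttFunc || type == kSttAmdgpuHsaKernel) {
            name = sym.name;
        } else if (type == kSttObject && endsWith(sym.name, ".kd")) {
            // Kernel descriptor symbol: drop ".kd" to recover the kernel name.
            name = sym.name.substr(0, sym.name.size() - 3);
        } else {
            continue;  // ordinary data or other symbol kinds
        }
        if (seen.insert(name).second) names.push_back(name);
    }
    return names;
}
```

The point of the sketch is that the index is built from names and type codes alone, which is why the lazy path stays cheap for code objects with many kernels.
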
ensureKernelLoaded() previously called scanCodeObject() which
disassembles every kernel in the .hsaco, defeating lazy loading for
code objects with many kernels. Now it calls scanCodeObjectForKernel()
which uses parseDisassemblyForKernel() to skip non-target kernels,
and scopes DWARF/argument processing to the single loaded kernel.
The code object is not marked as fully scanned so other kernels from
the same .hsaco can still be loaded on demand.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
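
A minimal sketch of the behavior this commit describes, under stated assumptions: `LazyKernelIndex`, `LoadedKernel`, and the `Loader` callback are hypothetical stand-ins, not kernelDB types. The callback plays the role of a single-kernel scan, and loading one kernel leaves its siblings from the same code object pending so they can still be loaded later.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a disassembled kernel.
struct LoadedKernel {
    std::vector<std::string> instructions;
};

class LazyKernelIndex {
public:
    // The loader is expected to disassemble exactly one symbol of one file.
    using Loader = std::function<LoadedKernel(const std::string& file,
                                              const std::string& symbol)>;

    explicit LazyKernelIndex(Loader loader) : loader_(std::move(loader)) {}

    // Index step: remember where each kernel lives, do no disassembly.
    void addLazy(const std::string& file, const std::vector<std::string>& symbols) {
        for (const auto& s : symbols) pending_[s] = file;
    }

    // A kernel is known if it is either already loaded or still pending.
    bool hasKernel(const std::string& name) const {
        return loaded_.count(name) != 0 || pending_.count(name) != 0;
    }

    // Load one kernel on first access; other pending kernels are untouched.
    const LoadedKernel* ensureKernelLoaded(const std::string& name) {
        auto hit = loaded_.find(name);
        if (hit != loaded_.end()) return &hit->second;
        auto lazy = pending_.find(name);
        if (lazy == pending_.end()) return nullptr;  // unknown kernel
        LoadedKernel kernel = loader_(lazy->second, name);
        pending_.erase(lazy);
        return &loaded_.emplace(name, std::move(kernel)).first->second;
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    Loader loader_;
    std::map<std::string, std::string> pending_;  // symbol -> code object path
    std::map<std::string, LoadedKernel> loaded_;
};
```

Tracking state per kernel rather than per code object is the design choice that matters here: a per-file "scanned" flag would either force a full scan or block later on-demand loads.
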
@mawad-amd mawad-amd requested a review from rwvo April 1, 2026 08:46
mawad-amd and others added 6 commits April 1, 2026 04:16
- Add getDisassemblyForSymbol() using --disassemble-symbols instead of
  full -d, so scanCodeObjectForKernel() only disassembles the requested
  kernel
- Fix shell quoting in invokeProgram() to handle kernel names with
  parentheses or other shell metacharacters
- Fix trailing space in disassembly_params ("-d " -> "-d")
- Accept STT_OBJECT .kd symbols in getKernelNamesFromElf()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
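
One common way to make a shell invocation safe for names containing parentheses or other metacharacters is POSIX single-quote escaping, sketched below. Whether `invokeProgram()` uses this exact scheme is not stated in the commit; `shellQuote` and `buildDisassembleCommand` are hypothetical helpers, and the command layout is an assumption.

```cpp
#include <string>

// Wrap an argument in single quotes for a POSIX shell. Inside single quotes
// every character is literal, so only embedded single quotes need care:
// close the quote, emit an escaped quote, then reopen ('\'').
std::string shellQuote(const std::string& arg) {
    std::string out = "'";
    for (char c : arg) {
        if (c == '\'') {
            out += "'\\''";
        } else {
            out += c;
        }
    }
    out += "'";
    return out;
}

// Hypothetical command builder; the tool path and argument order are
// assumptions made for illustration.
std::string buildDisassembleCommand(const std::string& objdump,
                                    const std::string& symbol,
                                    const std::string& file) {
    return shellQuote(objdump) + " --disassemble-symbols=" + shellQuote(symbol) +
           " " + shellQuote(file);
}
```

Avoiding the shell entirely (for example with `posix_spawn` and an argument vector) would sidestep quoting altogether; the sketch only covers the case where a command string is still passed to a shell.
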
The demangled kernel name can contain commas (e.g. C++ template args)
which --disassemble-symbols interprets as multiple symbol names.
Store the raw/mangled ELF symbol name in lazy_kernels_ and pass it
to scanCodeObjectForKernel/getDisassemblyForSymbol instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
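
The comma problem can be reproduced with a plain comma split, as in the sketch below. `splitOnComma` is a hypothetical helper that imitates how a comma-separated option value is broken into entries, and the two names are made-up examples rather than kernels from this repository.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a comma-separated option value into its entries, the way a
// list-valued command-line option is typically parsed.
std::vector<std::string> splitOnComma(const std::string& list) {
    std::vector<std::string> parts;
    std::stringstream stream(list);
    std::string item;
    while (std::getline(stream, item, ',')) parts.push_back(item);
    return parts;
}

// A demangled template instantiation contains a comma between its arguments.
const std::string kDemangledExample = "void gemm<float, 4>(float*)";

// Itanium-mangled names use only letters, digits, and underscores, so a
// mangled name is always a single list entry.
const std::string kMangledExample = "_Z4gemmIfLi4EEvPT_";
```

Splitting the demangled example yields two fragments, neither of which is a real symbol, while the mangled example survives intact. That is the motivation for storing the raw ELF name.
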
The targetKernel parameter passed from scanCodeObjectForKernel is
the raw (mangled) ELF symbol name, but parseDisassemblyForKernel
compares it against demangled canonical names extracted from the
disassembly output.  This mismatch caused every kernel to be
skipped, resulting in 0 parsed instructions.

Normalize targetKernel through demangleName + getKernelName at the
top of both parseDisassemblyForKernel and processKernelsWithAddressMap
so comparisons use the same canonical form on both sides.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
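
The fix amounts to bringing both sides of the comparison to one canonical form before comparing. The sketch below shows the idea with `abi::__cxa_demangle` from `<cxxabi.h>` (available with GCC and Clang). `canonicalKernelName` is a hypothetical stand-in for the `demangleName` + `getKernelName` pair; what those functions actually strip is not shown in this PR, so the rules here (drop `.kd`, demangle, cut at the first parenthesis) are assumptions and deliberately simplistic.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Demangle an Itanium-ABI symbol. Names that do not start with "_Z" are
// returned unchanged, because __cxa_demangle may otherwise try to read
// them as encoded types.
std::string demangleOrSame(const std::string& name) {
    if (name.compare(0, 2, "_Z") != 0) return name;
    int status = 0;
    char* raw = abi::__cxa_demangle(name.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        std::free(raw);
        return name;
    }
    std::string result(raw);
    std::free(raw);  // the buffer is malloc-allocated by the ABI library
    return result;
}

// Hypothetical canonical form used only for comparisons in this sketch.
std::string canonicalKernelName(std::string name) {
    const std::string kd = ".kd";
    if (name.size() > kd.size() &&
        name.compare(name.size() - kd.size(), kd.size(), kd) == 0) {
        name.erase(name.size() - kd.size());
    }
    name = demangleOrSame(name);
    const std::string::size_type paren = name.find('(');
    return paren == std::string::npos ? name : name.substr(0, paren);
}
```

With this, a raw target such as `_Z6kernelPf` and a demangled name such as `kernel(float*)` compare equal after normalization, which is the property the parser needs.
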
getLineType dereferenced line.begin() without checking if the
string was empty, causing a segfault when getBlockMarkers looped
past EOF on disassembly output that contained headers but no
kernel sections.

Also fix getBlockMarkers to return early on EOF instead of
looping forever, and replace the assert(mit != markers.end())
with a graceful skip so release builds don't crash.

Add debug logging to ensureKernelLoaded and scanCodeObjectForKernel
for diagnosing lazy loading issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The disassembly parser searched for '//' comment tokens by
incrementing an index without bounds checking, causing a segfault
when the disassembly line format didn't include address comments.

Add bounds checks to all while-loops that search for '//' tokens
in both getBlockMarkers and parseDisassemblyForKernel. Guard
stoull calls with try/catch to handle unexpected formats.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
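
A sketch of the defensive pattern described above, assuming C++17 and a disassembly line whose trailing comment looks like `// 000000001000: BF810000`. It uses `std::string::find` instead of the index-incrementing token search in the parser, so it is a swapped-in technique rather than the code from this commit; `parseAddressComment` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Extract the hexadecimal address that follows a "//" comment marker.
// Every search result is checked before use, and the numeric conversion is
// guarded, so malformed lines yield std::nullopt instead of a crash.
std::optional<std::uint64_t> parseAddressComment(const std::string& line) {
    const std::size_t slash = line.find("//");
    if (slash == std::string::npos) return std::nullopt;  // no comment at all

    const std::size_t start = line.find_first_not_of(" \t", slash + 2);
    if (start == std::string::npos) return std::nullopt;  // nothing after "//"

    const std::size_t colon = line.find(':', start);
    if (colon == std::string::npos) return std::nullopt;  // unexpected format

    const std::string digits = line.substr(start, colon - start);
    try {
        std::size_t used = 0;
        const std::uint64_t value = std::stoull(digits, &used, 16);
        if (used != digits.size()) return std::nullopt;  // trailing junk
        return value;
    } catch (const std::exception&) {
        // std::invalid_argument or std::out_of_range from std::stoull
        return std::nullopt;
    }
}
```

Returning an optional lets the caller skip a line it cannot interpret, which matches the graceful-skip approach taken elsewhere in this PR.
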
- getBlockMarkers: replace assert(tmp.size()==2) with defensive check
  for _cbranch_ token format; guard against uninitialized iterator
  when no KERNEL line has been seen yet
- getLineType: guard *(--it) against single-character strings
- parseDisassemblyForKernel: replace abort() with graceful skip when
  branch instruction seen without active kernel; guard null
  current_block before addInstruction call

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>