When addFile is called with lazy=true, only kernel names are indexed from the ELF symbol table (.symtab); no disassembly runs. Disassembly and source mapping happen on demand when a kernel is first accessed via getKernel(), getKernelLines(), getInstructionsForLine(), or getKernelArguments().
Changes:
- kernelDB::addFile: new lazy parameter; indexes via getKernelNamesFromElf
- kernelDB::getKernelNamesFromElf: reads .symtab for STT_FUNC/STT_AMDGPU_HSA_KERNEL symbols
- kernelDB::ensureKernelLoaded: on-demand disassembly + source mapping for lazy kernels
- getKernels() includes lazy kernel names; hasKernel() checks both maps
- Python API: KernelDB(lazy=True), add_file(), scan_code_object(), has_kernel()
- pybind11: expose addFile, scanCodeObject, hasKernel
- pyproject.toml: fix license field format
- Add examples/07_lazy_load/
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
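
The lazy flow described in this commit can be pictured with a minimal sketch. It is not the kernelDB implementation: `LazyKernelIndex`, `KernelInfo`, `indexName`, and the `loader` callback are hypothetical stand-ins, and the real code reads names from .symtab and runs a disassembler where this sketch calls `loader`.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for per-kernel data (instructions, line map, arguments).
struct KernelInfo {
    std::string name;
    std::vector<std::string> instructions;
};

class LazyKernelIndex {
public:
    // `loader` stands in for the expensive disassembly + source-mapping step.
    explicit LazyKernelIndex(std::function<KernelInfo(const std::string&)> loader)
        : loader_(std::move(loader)) {}

    // Cheap step: record the name only; nothing is disassembled here.
    void indexName(const std::string& name) { pending_.insert(name); }

    // Checks both containers, so indexed-but-unloaded kernels are visible too.
    bool hasKernel(const std::string& name) const {
        return loaded_.count(name) != 0 || pending_.count(name) != 0;
    }

    // Loads on first access, then serves from the cache.
    // Throws std::out_of_range if the name was never indexed.
    const KernelInfo& getKernel(const std::string& name) {
        ensureKernelLoaded(name);
        return loaded_.at(name);
    }

    std::size_t loadedCount() const { return loaded_.size(); }

private:
    void ensureKernelLoaded(const std::string& name) {
        auto it = pending_.find(name);
        if (it == pending_.end()) return;  // already loaded, or unknown
        loaded_.emplace(name, loader_(name));
        pending_.erase(it);
    }

    std::function<KernelInfo(const std::string&)> loader_;
    std::set<std::string> pending_;             // names indexed but not yet loaded
    std::map<std::string, KernelInfo> loaded_;  // fully loaded kernels
};
```

Keeping the pending names separate from the loaded entries is what lets a `hasKernel()`-style check answer without triggering any load.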
ensureKernelLoaded() previously called scanCodeObject(), which disassembles every kernel in the .hsaco and so defeated lazy loading for code objects with many kernels. It now calls scanCodeObjectForKernel(), which uses parseDisassemblyForKernel() to skip non-target kernels and scopes DWARF/argument processing to the single loaded kernel. The code object is not marked as fully scanned, so other kernels from the same .hsaco can still be loaded on demand.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add getDisassemblyForSymbol() using --disassemble-symbols instead of
full -d, so scanCodeObjectForKernel() only disassembles the requested
kernel
- Fix shell quoting in invokeProgram() to handle kernel names with
parentheses or other shell metacharacters
- Fix trailing space in disassembly_params ("-d " -> "-d")
- Accept STT_OBJECT .kd symbols in getKernelNamesFromElf()
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
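
For the shell-quoting fix, one common approach is to wrap each argument in single quotes before it reaches the shell. The helper below is a sketch of that technique only; `shellQuote` is a hypothetical name, and the PR's actual change inside invokeProgram() may be implemented differently.

```cpp
#include <string>

// Wrap `arg` in single quotes so a POSIX shell passes it through verbatim.
// An embedded single quote is emitted as '\'' (close quote, escaped quote, reopen).
std::string shellQuote(const std::string& arg) {
    std::string out = "'";
    for (char c : arg) {
        if (c == '\'') {
            out += "'\\''";
        } else {
            out += c;
        }
    }
    out += "'";
    return out;
}
```

With this, a name such as `foo(int)` becomes `'foo(int)'`, so the parentheses are no longer interpreted by the shell. Bypassing the shell entirely (for example with fork/exec and an argument vector) would avoid quoting altogether, at the cost of a larger change.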
The demangled kernel name can contain commas (e.g. between C++ template args), and --disassemble-symbols treats commas as separators between symbol names. Store the raw/mangled ELF symbol name in lazy_kernels_ and pass it to scanCodeObjectForKernel/getDisassemblyForSymbol instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The targetKernel parameter passed from scanCodeObjectForKernel is the raw (mangled) ELF symbol name, but parseDisassemblyForKernel compares it against demangled canonical names extracted from the disassembly output. This mismatch caused every kernel to be skipped, resulting in 0 parsed instructions. Normalize targetKernel through demangleName + getKernelName at the top of both parseDisassemblyForKernel and processKernelsWithAddressMap so comparisons use the same canonical form on both sides.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
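
The idea of normalizing both sides before comparing can be sketched as follows. This assumes a GCC or Clang toolchain that provides `abi::__cxa_demangle` in `<cxxabi.h>`. The helpers `demangleOrSelf`, `canonicalName`, and `sameKernel` are hypothetical, and truncating at the parameter list is only an assumed canonical form; what demangleName and getKernelName do in kernelDB is not shown in this PR text.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Demangle an Itanium-ABI symbol; return the input unchanged if it is not mangled.
std::string demangleOrSelf(const std::string& symbol) {
    if (symbol.rfind("_Z", 0) != 0) return symbol;  // not an Itanium-mangled name
    int status = 0;
    char* raw = abi::__cxa_demangle(symbol.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) return symbol;
    std::string result(raw);
    std::free(raw);  // __cxa_demangle allocates with malloc
    return result;
}

// Assumed canonical form: the demangled name with the parameter list removed.
std::string canonicalName(const std::string& symbol) {
    const std::string demangled = demangleOrSelf(symbol);
    const std::string::size_type pos = demangled.find('(');
    return pos == std::string::npos ? demangled : demangled.substr(0, pos);
}

// Compare two names only after both went through the same normalization.
bool sameKernel(const std::string& a, const std::string& b) {
    return canonicalName(a) == canonicalName(b);
}
```

Because both arguments pass through `canonicalName`, it no longer matters whether the caller hands in the mangled ELF symbol or the demangled name from the disassembly.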
getLineType dereferenced line.begin() without checking if the string was empty, causing a segfault when getBlockMarkers looped past EOF on disassembly output that contained headers but no kernel sections. Also fix getBlockMarkers to return early on EOF instead of looping forever, and replace the assert(mit != markers.end()) with a graceful skip so release builds don't crash. Add debug logging to ensureKernelLoaded and scanCodeObjectForKernel for diagnosing lazy loading issues.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The disassembly parser searched for '//' comment tokens by incrementing an index without bounds checking, causing a segfault when the disassembly line format didn't include address comments. Add bounds checks to all while-loops that search for '//' tokens in both getBlockMarkers and parseDisassemblyForKernel. Guard stoull calls with try/catch to handle unexpected formats.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
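
A bounded search for the '//' token combined with a guarded std::stoull call might look like the sketch below. `addressFromLine` is a hypothetical helper, and the assumed line layout (instruction text, then `//`, then a hexadecimal address followed by ':') is an illustration rather than a guarantee about the disassembler's output.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Extract the hexadecimal address that follows a "//" token.
// Returns false (leaving `address` untouched) when the line has no such comment
// or the token after it is not a hexadecimal number.
bool addressFromLine(const std::string& line, std::uint64_t& address) {
    std::istringstream iss(line);
    std::vector<std::string> tokens;
    for (std::string tok; iss >> tok;) tokens.push_back(tok);

    std::size_t i = 0;
    while (i < tokens.size() && tokens[i] != "//") ++i;  // bounded search
    if (i + 1 >= tokens.size()) return false;  // no "//", or nothing after it

    try {
        // stoull stops at the first non-hex character, e.g. a trailing ':'.
        address = std::stoull(tokens[i + 1], nullptr, 16);
        return true;
    } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range
        return false;
    }
}
```

Returning a status instead of asserting lets the caller skip a malformed line and keep parsing the rest of the output.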
- getBlockMarkers: replace assert(tmp.size()==2) with defensive check for _cbranch_ token format; guard against uninitialized iterator when no KERNEL line has been seen yet
- getLineType: guard *(--it) against single-character strings
- parseDisassemblyForKernel: replace abort() with graceful skip when branch instruction seen without active kernel; guard null current_block before addInstruction call
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- addFile(path, lazy=true) mode that indexes kernel names from ELF symbols without disassembling.
- ensureKernelLoaded(name) to disassemble a single kernel on demand when it is first dispatched.
🤖 Generated with Claude Code