Samplex provides statistical instruction-level profiling via rocprofv3 PC sampling. Supports both host_trap (time-based, MI200+) and stochastic (cycle-based with stall reasons, MI300+) methods. Includes CLI, Python API, MCP server, and unit tests following the existing IntelliKit tool patterns (metrix/linex/etc). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Simplify samplex to only use stochastic (hardware-based) sampling. Stochastic provides stall reasons, instruction types, wave counts, and zero sampling skid. Requires MI300+ which is our target anyway. Removes --method and --unit CLI flags. All fields (wave_issued, stall_reason, instruction_type, wave_count) are now always present. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Low intervals (256-1024) cause rocprofv3 to silently drop all samples due to overhead. An interval of 65536 yields ~76K samples on typical workloads while keeping overhead reasonable; users can lower it to 4096 for more samples. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Support both PC sampling methods:
- stochastic (default): cycle-accurate, MI300+, provides stall reasons
- host_trap: time-based, MI200+, broader GPU support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…EADME
- Example script now uses format_text_output() instead of manually iterating over API results and printing custom output
- Moved the full example output from the main README into the example README where it belongs
- Main README now links to the example directory

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Match the pattern used by linex and metrix examples: iterate over results objects and print fields directly. The API data objects have clear names (kernel.name, kernel.issued_pct, hotspot.opcode, etc.) that are readable by both humans and LLMs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The expected output section was still showing the old format_text_output format. Updated to match what the script actually prints: opcodes with [issued=, stalled=] tags, no global breakdown section. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
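The iteration pattern these commits describe can be sketched as follows. This is a minimal illustration, not the samplex implementation: the `Kernel` and `Hotspot` dataclasses, the `hotspots`, `issued`, and `stalled` fields, and the `format_kernel` helper are all hypothetical stand-ins; only `kernel.name`, `kernel.issued_pct`, `hotspot.opcode`, and the `[issued=, stalled=]` tag shape come from the commit messages.

```python
from dataclasses import dataclass, field


@dataclass
class Hotspot:
    # Hypothetical stand-in for a per-instruction result object.
    opcode: str
    issued: int   # assumed: samples where the wave issued
    stalled: int  # assumed: samples where the wave was stalled


@dataclass
class Kernel:
    # Hypothetical stand-in for a per-kernel result object.
    name: str
    issued_pct: float
    hotspots: list = field(default_factory=list)


def format_kernel(kernel):
    """Return printable lines by reading fields directly off the objects."""
    lines = [f"{kernel.name} (issued {kernel.issued_pct:.1f}%)"]
    for hotspot in kernel.hotspots:
        lines.append(
            f"  {hotspot.opcode} [issued={hotspot.issued}, stalled={hotspot.stalled}]"
        )
    return lines


if __name__ == "__main__":
    demo = Kernel("gemm_kernel", 42.5, [Hotspot("s_waitcnt", 10, 30)])
    print("\n".join(format_kernel(demo)))
```

Reading fields directly keeps the example free of a separate formatting layer, which is the trade-off the commit message argues for.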
Stall reasons are already available on each InstructionHotspot via stall_reasons dict. The kernel-level aggregation was redundant processing — let consumers iterate and aggregate if needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
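A consumer that still wants a kernel-level view could aggregate the per-instruction dictionaries itself. The sketch below assumes `stall_reasons` maps a reason name to a sample count; the `InstructionHotspot` dataclass here is a simplified stand-in, and the reason names and the `aggregate_stall_reasons` helper are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class InstructionHotspot:
    # Simplified stand-in; the real class likely carries more fields.
    opcode: str
    stall_reasons: dict = field(default_factory=dict)  # assumed: reason -> sample count


def aggregate_stall_reasons(hotspots):
    """Sum per-instruction stall counts into one kernel-level dict."""
    totals = Counter()
    for hotspot in hotspots:
        totals.update(hotspot.stall_reasons)  # adds counts key by key
    return dict(totals)


if __name__ == "__main__":
    hotspots = [
        InstructionHotspot("s_waitcnt", {"WAITCNT": 30, "OTHER": 2}),
        InstructionHotspot("v_add_f32", {"ALU_DEPENDENCY": 5, "OTHER": 1}),
    ]
    print(aggregate_stall_reasons(hotspots))
```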
- Example now shows empty_instruction_count (holes) when > 0
- MCP server now returns instruction (full text), stall_reasons, instruction_types, and empty_instruction_count per kernel

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- README.md: add samplex to tools table, MCP config, install examples
- AGENTS.md: add samplex to tool descriptions, build commands, testing, package layout, MCP servers, skills, CI sections
- install/tools/install.sh: add samplex to ALL_TOOLS
- install/skills/install.sh: add samplex to TOOLS
- intellikit-ci-test.yml: add samplex to change detection and test matrix
- intellikit-pytest.yml: add samplex to change detection and test matrix

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
rocprofv3's --kernel-include-regex only affects counter-collection and thread-trace data, not PC sampling. Move filtering to the API layer where it's applied as a regex match against kernel names after samples are grouped by dispatch ID. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
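The post-hoc filtering described above might look roughly like this. It is a sketch under stated assumptions: samples are modeled as dicts with a `dispatch_id` key, kernel names are supplied as a separate dispatch-ID-to-name mapping, and the function names are hypothetical rather than taken from the samplex source.

```python
import re
from collections import defaultdict


def group_by_dispatch(samples):
    """Group raw PC samples by their dispatch ID (assumed key: 'dispatch_id')."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample["dispatch_id"]].append(sample)
    return dict(groups)


def filter_by_kernel_name(groups, kernel_names, pattern=None):
    """Keep only dispatches whose kernel name matches the regex.

    kernel_names: assumed mapping of dispatch ID -> kernel name.
    pattern: regex string; None disables filtering.
    """
    if pattern is None:
        return groups
    regex = re.compile(pattern)
    return {
        dispatch_id: group
        for dispatch_id, group in groups.items()
        if regex.search(kernel_names.get(dispatch_id, ""))
    }


if __name__ == "__main__":
    samples = [
        {"dispatch_id": 1, "pc": 0x100},
        {"dispatch_id": 1, "pc": 0x104},
        {"dispatch_id": 2, "pc": 0x200},
    ]
    names = {1: "gemm_kernel", 2: "copy_kernel"}
    kept = filter_by_kernel_name(group_by_dispatch(samples), names, r"gemm")
    print(sorted(kept))
```

Filtering after grouping means every sample is still collected; the regex only controls which kernels appear in the results.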
Summary
samplex), Python API (from samplex import Samplex), and MCP server (samplex-mcp)
What it does
Samplex answers: "Where is my kernel stuck and why?"
Given a GPU application, it runs rocprofv3 PC sampling and reports:
Files
- samplex/pyproject.toml
- samplex/src/samplex/api.py (Samplex.sample())
- samplex/src/samplex/profiler/rocprof_wrapper.py
- samplex/src/samplex/cli/main.py (profile and list-configs commands)
- samplex/src/samplex/mcp/server.py (pc_sample tool)
- samplex/src/samplex/logger.py
- samplex/skill/SKILL.md
- samplex/tests/unit/test_api.py
- samplex/tests/unit/test_rocprof_wrapper.py
Test plan
torch.mm GEMM kernel
🤖 Generated with Claude Code