Add probe QK hook worker for Apple Silicon backend#1
Open
tburleyinfo wants to merge 56 commits into IBM:main from
Conversation
Signed-off-by: Timothy Burley <34224160+tburleyinfo@users.noreply.github.com>
Switch the Metal Granite probe worker to the current self_attn-based capture path and keep the capture data in qkv artifacts that are normalized back into the analyzer's expected q/k view.

Key resolution details:
- keep the Metal worker at the self_attn boundary and compute raw x plus projected q/k/v inside the recording path
- preserve per-sample batch packets in the Metal cache format so batch analysis stays aligned with the non-Metal worker
- restore proper worker teardown via HookLLM disposal so repeated hooked runs in the same Python process do not inherit stale wrapped self_attn modules
- retain the Metal-specific Granite config override in the demo and keep the Metal head set in a dedicated config file
- keep analyzer-side debug support and qkv normalization in place
- add updated housing-analogy documentation plus Graphviz flowcharts for both the housing view and the worker control-flow view

Signed-off-by: Timothy Burley <34224160+tburleyinfo@users.noreply.github.com>
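The qkv-to-q/k normalization described above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: the packet field names (`layer`, `q`, `k`, `v`) and the function name are assumptions about the cache layout.

```python
# Hypothetical sketch (field names are illustrative, not the PR's exact
# code): the Metal worker stores per-sample "qkv" packets, while the
# analyzer consumes the legacy per-layer q/k view; normalization drops
# v and rekeys the packets by layer index.
def normalize_qkv_to_qk(qkv_packets):
    """Collapse Metal-style qkv packets into a legacy {layer: (q, k)} view."""
    qk_view = {}
    for packet in qkv_packets:
        qk_view[packet["layer"]] = (packet["q"], packet["k"])  # v is dropped
    return qk_view
```

The key point is that the analyzer never needs to know the Metal cache format: it only ever sees the legacy per-layer q/k view.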
Delete the Metal-specific Granite attention-tracker config and stop selecting it from the demo so Granite uses the standard config again. Signed-off-by: Timothy Burley <34224160+tburleyinfo@users.noreply.github.com>
Expand and align the inline housing-analogy comments in the Metal worker so they match the current self_attn capture flow, notebook/archive model, and teardown behavior. Signed-off-by: Timothy Burley <34224160+tburleyinfo@users.noreply.github.com>
Move sandbox assets into scripts, markdowns, and visualizations, refresh the Metal worker diagrams, and add more detailed inline comments to the Metal hook installation flow. Signed-off-by: Timothy Burley <34224160+tburleyinfo@users.noreply.github.com>
Add repo ignore rules for Python build artifacts and expand the Metal worker comments around hook installation, Q/K/V capture, cache flushing, and execution flow. Signed-off-by: Timothy Burley <34224160+tburleyinfo@users.noreply.github.com>
Add comments explaining how Metal qkv artifacts are normalized into the legacy qk cache view before the attention analyzer consumes them. Signed-off-by: Timothy Burley <34224160+tburleyinfo@users.noreply.github.com>
Update the attention tracker notebook to set max_model_len=2048 so HookLLM initialization does not fail under MLX auto memory mode on Apple Silicon. Signed-off-by: Timothy Burley <34224160+tburleyinfo@users.noreply.github.com>
This changes the GPU probe worker matching logic to support models whose attention module is exposed as model.layers.<i>.self_attn instead of model.layers.<i>.self_attn.attn.
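The matching change can be sketched with a module-name pattern that accepts both layouts. This is a minimal illustration under assumptions: the regex, function name, and exact module paths are hypothetical, not the worker's actual matching code.

```python
import re

# Hypothetical sketch: accept either "model.layers.<i>.self_attn.attn"
# (the original path) or bare "model.layers.<i>.self_attn"
# (Granite-style). Pattern and names are illustrative.
ATTN_PATTERN = re.compile(r"^model\.layers\.(\d+)\.self_attn(\.attn)?$")

def match_attention_module(name: str):
    """Return the layer index if `name` is an attention module, else None."""
    m = ATTN_PATTERN.match(name)
    return int(m.group(1)) if m else None
```

Making the trailing `.attn` optional is what lets the same probe worker walk both module layouts without a model-specific branch.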
This keeps the existing tuple-based attention hook path and adds a fallback for Granite-style self_attn modules that compute q and k internally. In that fallback, the worker hooks q_proj and k_proj directly and stores their outputs under the same qk cache structure.
This skips the legacy tuple-based q/k capture path when a matched attention module does not expose q and k in its forward-hook input tuple. Granite-style self_attn modules can then rely on the q_proj/k_proj fallback without crashing.
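The q_proj/k_proj fallback amounts to registering forward hooks on the projection sublayers instead of the attention module itself. A minimal sketch, assuming a PyTorch-style module with `q_proj`/`k_proj` attributes; the cache structure and names here are illustrative, not the worker's actual code.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the q_proj/k_proj fallback (cache layout and
# names are illustrative): when a matched self_attn module computes q/k
# internally, hook its projection layers directly and store their
# outputs under the same qk-cache structure the tuple-based path fills.
qk_cache: dict = {}

def install_proj_hooks(attn_module: nn.Module, layer_idx: int):
    """Register forward hooks on q_proj/k_proj; return handles for teardown."""
    def make_hook(key):
        def hook(module, inputs, output):
            qk_cache.setdefault(layer_idx, {})[key] = output.detach()
        return hook

    return [
        attn_module.q_proj.register_forward_hook(make_hook("q")),
        attn_module.k_proj.register_forward_hook(make_hook("k")),
    ]
```

The returned handles matter for the teardown behavior mentioned elsewhere in this PR: calling `handle.remove()` on each one prevents repeated hooked runs in the same process from stacking stale hooks.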
This updates the actsteer, attention-tracker, and core-reranker Colab notebooks to default to RedHatAI/granite-3.1-2b-instruct-quantized.w4a16 and adds a temporary activation-steering config for that checkpoint.