23 changes: 23 additions & 0 deletions kernelgen/.claude/skills/build_nkipykernelgen/SKILL.md
@@ -0,0 +1,23 @@
---
name: build_nkipykernelgen
description: Rebuild NKIPyKernelGen (C++ passes and Python package)
user-invocable: true
---

## Usage

`/build_nkipykernelgen`

## Instructions

Run the build script with `bash` (not `sh`), since it is written for bash (see its `#!/bin/bash` shebang). Use a timeout of 300000ms.

```bash
bash .claude/skills/build_nkipykernelgen/scripts/build.sh
```

Note: Run this from the NKIPyKernelGen repo root.

## Important

`pip install -e .` builds BOTH the C++ passes (nkipy-opt binary) AND the Python package in one step. There is NO need to run cmake separately — the pyproject.toml build system handles the full C++ compilation via cmake internally.
12 changes: 12 additions & 0 deletions kernelgen/.claude/skills/build_nkipykernelgen/scripts/build.sh
@@ -0,0 +1,12 @@
#!/bin/bash
# Rebuild NKIPyKernelGen (C++ passes and Python package).
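# Exit on the first error. pipefail makes a failed `pip install` fail the
# script even though its output is piped through `tail`.
set -eo pipefail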

# Derive repo root from script location: scripts/ -> build_nkipykernelgen/ -> skills/ -> .claude/ -> repo root
REPO_ROOT="$(cd "$(dirname "$0")/../../../.." && pwd)"

cd "$REPO_ROOT"

echo "=== Rebuilding NKIPyKernelGen ==="
pip install -e . 2>&1 | tail -5
echo "=== Build complete ==="
121 changes: 121 additions & 0 deletions kernelgen/.claude/skills/debug_nisa_ir/SKILL.md
@@ -0,0 +1,121 @@
---
name: debug_nisa_ir
description: Debug NISA MLIR that fails BIRSim. Creates a debug case under tests/debug/ with buggy.mlir, kernel.py, iterative fixes, and a README proposing compiler pass changes.
user-invocable: true
---

## Usage

`/debug_nisa_ir <bug_name> [kernel.py path] [buggy NISA MLIR path or inline]`

- `bug_name`: Short snake_case name for the debug case (e.g., `rope_partition_oob`)
- `kernel.py path`: Path to the Python source that was fed into `nkipy_opt`. If omitted, ask the user.
- `buggy NISA MLIR`: Path to the `.mlir` file that `nkipy_opt` produced, or the user may paste it inline. If omitted, ask the user.

## Instructions

You are debugging a NISA-level MLIR kernel that `nkipy_opt` generated but that fails BIRSim verification or produces incorrect numerical results. Follow this systematic workflow.

### Step 1: Set up the debug case directory

Create `tests/debug/<bug_name>/` with:

```
tests/debug/<bug_name>/
  kernel.py      # Copy of the input Python kernel
  buggy.mlir     # The failing NISA MLIR from nkipy_opt
  README.md      # Will be populated in Step 6
```

Copy the user-provided `kernel.py` and `buggy.mlir` into this directory. Ensure `kernel.py` contains a function whose name matches the `sym_name` in the MLIR (this is required by `run_sim.py`).
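The name check described above can be automated before running the simulator. The following is a minimal sketch, not part of `run_sim.py`: the helper names are hypothetical, and it assumes the MLIR spells the symbol either as `func.func @name(` or as a generic-form `sym_name = "name"` attribute.

```python
import ast
import re


def mlir_symbol_names(mlir_text):
    """Collect function symbol names from MLIR text.

    Assumption: names appear as `func.func @name(` (optionally with a
    visibility keyword) or as a generic-form `sym_name = "name"` attribute.
    """
    names = set(re.findall(r'func\.func\s+(?:\w+\s+)?@([\w.$-]+)', mlir_text))
    names |= set(re.findall(r'sym_name\s*=\s*"([^"]+)"', mlir_text))
    return names


def python_function_names(source):
    """Return the names of all functions defined in a Python source string."""
    tree = ast.parse(source)
    return {node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)}


def matching_kernel_names(kernel_source, mlir_text):
    """Names defined in kernel.py that also appear as MLIR symbols."""
    return mlir_symbol_names(mlir_text) & python_function_names(kernel_source)
```

An empty result means `kernel.py` has no function matching any symbol in the MLIR, so one of the two files needs renaming before Step 2.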

### Step 2: Reproduce the failure

Run the buggy MLIR through BIRSim:

```bash
cd tests/debug && source ./run.sh <bug_name>/buggy.mlir
```

Record the exact error output. Common failure modes:
- **BIR verification error**: `Invalid access of N partitions starting at partition M` or `Access pattern out of bounds`
- **BIRSim runtime error**: `NCC_ISIM*` errors (e.g., uninitialized PSUM read)
- **Numerical mismatch**: `SIMULATION FAILED (max_diff=...)` -- BIRSim runs but output doesn't match kernel.py

### Step 3: Analyze the bug

Read the MLIR carefully and identify the root cause. Common patterns:

1. **Multi-partition SBUF with vector engine**: `tensor_tensor_arith` (engine=vector) reading from a loop-indexed partition of a multi-partition SBUF tensor. The vector engine processes all 128 partitions simultaneously and cannot address partition N selectively.

2. **Wrong reshape/transpose lowering**: Column-by-column transposes that conflate head and head_dim dimensions. Often manifests as `<128|2>` tile on a dim of size 2 (OOB), or silent numerical corruption.

3. **Missing accumulate flags**: Matmul K-loops without `psum_accumulate_flags`, causing PSUM overwrite instead of accumulate.

4. **SBUF OOM**: Too many live SBUF tensors. Check if intermediates can be fused or freed earlier.

Focus on understanding:
- Which MLIR lines are problematic (cite line numbers)
- What the pass *intended* to generate vs what it actually generated
- Why the hardware rejects it (BIR rules violated)
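To make pattern 3 concrete, here is a toy pure-Python model of a K-tiled matmul. It is only an illustration of overwrite versus accumulate: `acc` stands in for a PSUM tile, the `accumulate` argument loosely models the effect of `psum_accumulate_flags`, and none of this reflects actual NISA semantics.

```python
def matmul_k_tiled(a, b, k_tile, accumulate):
    """Multiply a (M x K) by b (K x N) one K-tile at a time.

    `acc` is a stand-in for a PSUM tile. With accumulate=False every
    K-tile overwrites it, which models the missing-flag bug.
    """
    m, k, n = len(a), len(b), len(b[0])
    acc = [[0.0] * n for _ in range(m)]
    for k0 in range(0, k, k_tile):
        k1 = min(k0 + k_tile, k)
        # Partial product contributed by this K-tile only.
        partial = [
            [sum(a[i][kk] * b[kk][j] for kk in range(k0, k1)) for j in range(n)]
            for i in range(m)
        ]
        for i in range(m):
            for j in range(n):
                if accumulate:
                    acc[i][j] += partial[i][j]
                else:
                    acc[i][j] = partial[i][j]
    return acc
```

With `a = [[1, 2, 3, 4]]`, a column of ones for `b`, and `k_tile=2`, the accumulating version yields 10 while the overwriting version keeps only the last tile's contribution, 7. A result that looks like "only the last K-tile" is a hint to check the accumulate flags.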

### Step 4: Create iterative fixes

For each fix attempt, create a new MLIR file:

```
fix_<number>_<what_was_fixed>.mlir
```

For example:
- `fix_01_fuse_rope_elementwise.mlir`
- `fix_02_reshape_head_granularity.mlir`

Edit the MLIR by hand to correct the identified issue. Then run:

```bash
cd tests/debug && source ./run.sh <bug_name>/fix_01_<description>.mlir
```

If it still fails, analyze the new error, create another fix file, and iterate. Keep each attempt as a separate file so the progression is visible.
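The naming convention can be kept consistent with a small helper. This is a sketch with a hypothetical function name; it assumes fix files sit directly in the debug case directory and are numbered with two digits, as in the examples above.

```python
import re
from pathlib import Path


def next_fix_name(case_dir, description):
    """Return the next `fix_<NN>_<description>.mlir` name for a debug case."""
    numbers = []
    for path in Path(case_dir).glob("fix_*.mlir"):
        match = re.match(r"fix_(\d+)_", path.name)
        if match:
            numbers.append(int(match.group(1)))
    # Normalize the description to snake_case.
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return f"fix_{max(numbers, default=0) + 1:02d}_{slug}.mlir"
```

For an empty case directory, `next_fix_name(case_dir, "Fuse RoPE elementwise")` gives `fix_01_fuse_rope_elementwise.mlir`.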

### Step 5: Verify the final fix

The last `fix_*.mlir` should produce:

```
BIRSim PASSED
SIMULATION PASSED
```

Confirm that the numerical output matches `kernel.py` within tolerance (atol=1e-2, rtol=1e-2).
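For reference, one common way to combine the two tolerances is the NumPy `allclose` convention, `|actual - expected| <= atol + rtol * |expected|`. Whether `run_sim.py` combines them exactly this way is an assumption; the sketch below only illustrates what the thresholds mean for flat lists of values.

```python
def within_tolerance(actual, expected, atol=1e-2, rtol=1e-2):
    """Element-wise check: |actual - expected| <= atol + rtol * |expected|.

    Assumption: this mirrors the NumPy allclose convention, which may
    differ from the comparison the debug harness actually performs.
    """
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


def max_abs_diff(actual, expected):
    """Largest absolute element-wise difference between the two sequences."""
    return max(abs(a - e) for a, e in zip(actual, expected))
```

Under this convention the allowed error grows with the magnitude of the expected value, so a difference of 0.5 passes against an expected 100.0 but a difference of 0.1 fails against an expected 1.1.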

### Step 6: Write the README

Create `tests/debug/<bug_name>/README.md` documenting:

1. **Overview**: One paragraph summarizing what `buggy.mlir` is (which kernel, what it does) and what goes wrong.

2. **How to reproduce**: The exact `source ./run.sh` commands (run from `tests/debug/`, as in Steps 2 and 4) for the buggy and fixed versions.

3. **Bug analysis**: For each bug found:
   - **Symptom**: The exact error message
   - **Location in MLIR**: Line numbers and what the code does
   - **What happens**: Why the hardware rejects it or produces wrong results
   - **Fix**: What was changed in the MLIR (with code snippets)

4. **Root cause summary**: Table mapping each bug to the compiler pass responsible and whether it causes a compilation error or silent corruption.

5. **Proposed compiler pass fixes**: For each bug, describe:
   - Which pass to fix (e.g., `simplify-linalg`, `linalg-to-nisa`, tiling)
   - The root cause *in the pass* (not just the MLIR symptom)
   - A concrete proposed change (pseudocode or description of the algorithm change)

Use the format from existing debug cases (see `tests/debug/qwen3_layer/README.md` for reference).

### Tips

- The debug harness (`run.sh` / `run_sim.py`) automatically sets up the NKI environment, generates random inputs (seed=42), compiles to NEFF with BIRSim, and compares against `kernel.py`.
- Artifacts (NEFF, BIR) are written to `artifacts_<stem>/` next to each MLIR file (git-ignored).
- When editing MLIR, keep changes minimal and targeted. Change only the ops/loops related to the bug.
- If you're unsure which pass generated a problematic pattern, check the pass pipeline in `nkipy_opt` or ask the user.
28 changes: 28 additions & 0 deletions kernelgen/.claude/skills/run_nkipykernelgen_tests/SKILL.md
@@ -0,0 +1,28 @@
---
name: run_nkipykernelgen_tests
description: Run NKIPyKernelGen tests (without rebuilding)
user-invocable: true
---

## Usage

`/run_nkipykernelgen_tests [scope]`

Where `scope` is: `all` (default), `passes`, `e2e`, or a specific path like `passes/infer_layout` or `e2e/nkipy_tests`.
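The mapping from `scope` to a pytest target follows the `case` statement in `run_tests.sh`. As a quick reference, it can be sketched like this (the function name is hypothetical and exists only for illustration):

```python
def pytest_target(scope="all"):
    """Map a scope argument to the path that run_tests.sh hands to pytest."""
    if scope == "all":
        return "tests/"
    if scope in ("passes", "e2e"):
        return f"tests/{scope}/"
    # Anything else is treated as a path relative to tests/.
    return f"tests/{scope}"
```

So `passes/infer_layout` resolves to `tests/passes/infer_layout`, and omitting the scope runs everything under `tests/`.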

## Instructions

1. Run the script at `.claude/skills/run_nkipykernelgen_tests/scripts/run_tests.sh` with the requested scope as the argument. Invoke it with `bash` (not `sh`) since it relies on bash-only features such as `PIPESTATUS`. Use a timeout of 600000ms.

```bash
bash .claude/skills/run_nkipykernelgen_tests/scripts/run_tests.sh <scope>
```

Note: Run this from the NKIPyKernelGen repo root.

2. The script saves full test output to `/tmp/nkipykernelgen_test_results.txt`. After the script finishes, use the Read tool to read that file for the complete results. This avoids context window issues with long test output.

3. When reporting results, summarize:
   - Total passed/failed/xfailed/xpassed/skipped counts
   - List any unexpected failures (FAILED, not XFAIL)
   - Note any XPASS (unexpected passes) that indicate xfail markers should be removed
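The summary in step 3 can be pulled out of the saved results file mechanically. The sketch below is a hypothetical helper, not part of the skill; it assumes pytest's usual verbose output, that is, a final `=== N failed, M passed ... in 1.23s ===` line, `FAILED <test id>` lines in the short test summary, and `<test id> XPASS` progress lines.

```python
import re


def summarize(output):
    """Extract outcome counts, failed test IDs, and XPASS test IDs.

    Assumption: `output` is the text produced by `pytest -v --tb=short`.
    """
    counts = {}
    summary_lines = [
        line for line in output.splitlines()
        if line.startswith("=") and re.search(r" in [\d.]+s", line)
    ]
    if summary_lines:
        pattern = r"(\d+) (passed|failed|xfailed|xpassed|skipped|errors?)"
        for number, label in re.findall(pattern, summary_lines[-1]):
            counts[label] = int(number)
    failed = re.findall(r"^FAILED (\S+)", output, flags=re.M)
    xpassed = re.findall(r"^(\S+) XPASS", output, flags=re.M)
    return counts, failed, xpassed
```

A typical call would read `/tmp/nkipykernelgen_test_results.txt` with `pathlib.Path(...).read_text()` and pass the text to `summarize`.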
36 changes: 36 additions & 0 deletions kernelgen/.claude/skills/run_nkipykernelgen_tests/scripts/run_tests.sh
@@ -0,0 +1,36 @@
#!/bin/bash
# Run NKIPyKernelGen tests with proper environment setup.
# Usage: run_tests.sh [scope]
# scope: all (default), passes, e2e, or a specific path like passes/infer_layout

SCOPE="${1:-all}"
RESULTS_FILE="/tmp/nkipykernelgen_test_results.txt"

# Derive repo root from script location: scripts/ -> run_nkipykernelgen_tests/ -> skills/ -> .claude/ -> repo root
REPO_ROOT="$(cd "$(dirname "$0")/../../../.." && pwd)"

cd "$REPO_ROOT"

# Run tests, capturing full output to file
echo "=== Running tests (scope: $SCOPE) ==="
echo "Results will be saved to: $RESULTS_FILE"

case "$SCOPE" in
    all)
        python -m pytest tests/ -v --tb=short 2>&1 | tee "$RESULTS_FILE"
        ;;
    passes)
        python -m pytest tests/passes/ -v --tb=short 2>&1 | tee "$RESULTS_FILE"
        ;;
    e2e)
        python -m pytest tests/e2e/ -v --tb=short 2>&1 | tee "$RESULTS_FILE"
        ;;
    *)
        python -m pytest "tests/$SCOPE" -v --tb=short 2>&1 | tee "$RESULTS_FILE"
        ;;
esac
EXIT_CODE=${PIPESTATUS[0]}

echo ""
echo "=== Full results saved to: $RESULTS_FILE ==="
exit $EXIT_CODE
47 changes: 47 additions & 0 deletions kernelgen/.gitignore
@@ -0,0 +1,47 @@
# Override parent nkipy/.gitignore's `lib/` rule so MLIR C++ sources in
# mlir/lib/ are tracked (the parent rule is aimed at Python venv lib/ dirs).
!mlir/lib/
!mlir/lib/**

# Python
__pycache__/
*.py[cod]
*.so

# Distribution / packaging
build/
dist/
*.egg-info/
.eggs/
*.whl

# Built MLIR bindings (generated during build)
nkipy_kernelgen/_mlir/

# Virtual environments
venv/
.env

# Testing
.pytest_cache/
.coverage
tests/**/outputs/
tests/**/artifacts/

# IDE
.vscode/
.idea/

# OS
.DS_Store
Thumbs.db

# Logs
*.log

# LLVM lit test outputs
.lit_test_times.txt
Output/

# Compiler Explorer (cloned repo)
compiler_explorer/compiler-explorer/