dsl-learn/kernel-to-sol

SOL-ExecBench Solutions

This repository collects my solutions and writeups for the NVIDIA SOL-ExecBench benchmark.

Goals

  • Build a structured set of SOL-ExecBench solutions.
  • Provide reproducible implementations with clear code comments.
  • Document transferable GPU kernel optimization patterns.

Writeup Structure

Each problem writeup will typically include:

  • Problem understanding and constraints
  • Baseline implementation
  • Optimized versions (e.g., memory access, parallel strategy, fusion)
  • Performance comparison and key takeaways
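
For the performance comparison, one way to put a measured runtime in context is a roofline-style speed-of-light (SOL) bound. Under this bound, a kernel can finish no faster than the larger of two times: its compute time at peak throughput and its memory time at peak bandwidth. The sketch below assumes that definition, but SOL-ExecBench's own scoring may compute the bound differently. The peak numbers in the example are placeholders, not B200 specifications, and the names `KernelProfile`, `sol_time_s`, and `sol_fraction` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KernelProfile:
    """Work done by one kernel launch (hypothetical bookkeeping type)."""
    flops: float        # useful floating-point operations
    bytes_moved: float  # minimum DRAM traffic (reads + writes), in bytes
    measured_s: float   # measured runtime, in seconds


def sol_time_s(k: KernelProfile, peak_flops: float, peak_bw: float) -> float:
    """Roofline lower bound: the larger of compute-bound and memory-bound time."""
    return max(k.flops / peak_flops, k.bytes_moved / peak_bw)


def sol_fraction(k: KernelProfile, peak_flops: float, peak_bw: float) -> float:
    """Fraction of speed-of-light reached; 1.0 means the kernel hits the bound."""
    return sol_time_s(k, peak_flops, peak_bw) / k.measured_s


if __name__ == "__main__":
    # Example: an elementwise bf16 op (e.g. SiLU) over 2**24 elements,
    # read once and written once (2 bytes each way), with an assumed ~4 FLOPs/element.
    n = 1 << 24
    silu = KernelProfile(flops=4 * n, bytes_moved=2 * 2 * n, measured_s=12e-6)
    # Placeholder peaks, NOT official hardware numbers.
    PEAK_FLOPS, PEAK_BW = 1e15, 8e12
    print(f"SOL time: {sol_time_s(silu, PEAK_FLOPS, PEAK_BW) * 1e6:.2f} us")
    print(f"SOL fraction: {sol_fraction(silu, PEAK_FLOPS, PEAK_BW):.3f}")
```

Reporting the SOL fraction alongside raw times puts memory-bound and compute-bound kernels, and their baseline and optimized versions, on the same scale.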

Status

This repository is a work in progress and will be updated continuously.

Problems

  • 001_attn_bwd: Backward pass through the attention softmax, dropout, and value matmul.
  • 002_vae_conv2d: Fused VAE residual block with Conv3x3, GroupNorm, SiLU, and residual addition.
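
For 001_attn_bwd, the backward math can be checked against a small reference before any kernel is written. The sketch below is a pure-Python illustration, not the repository's implementation. It assumes the forward pass is O = dropout(softmax(S)) V with inverted dropout: a precomputed 0/1 keep mask, scaled by 1/(1 - p). It also assumes that S already includes any 1/sqrt(d) scaling. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax_rows(s: Matrix) -> Matrix:
    """Numerically stable row-wise softmax."""
    out = []
    for row in s:
        mx = max(row)
        e = [math.exp(x - mx) for x in row]
        z = sum(e)
        out.append([x / z for x in e])
    return out


def attn_forward(s: Matrix, mask: Matrix, v: Matrix, p_drop: float):
    """Return P = softmax(S), Pd = dropout(P), and O = Pd @ V."""
    p = softmax_rows(s)
    scale = 1.0 / (1.0 - p_drop)
    pd = [[p[i][j] * mask[i][j] * scale for j in range(len(p[0]))]
          for i in range(len(p))]
    return p, pd, matmul(pd, v)


def attn_backward(p: Matrix, pd: Matrix, mask: Matrix, v: Matrix,
                  d_o: Matrix, p_drop: float):
    """Gradients of the loss w.r.t. the scores S and the values V, given dO."""
    scale = 1.0 / (1.0 - p_drop)
    d_v = matmul(transpose(pd), d_o)   # value matmul backward: dV = Pd^T @ dO
    d_pd = matmul(d_o, transpose(v))   # dPd = dO @ V^T
    d_s = []
    for i in range(len(p)):
        # Dropout backward: reuse the forward pass's mask and scale.
        d_p = [d_pd[i][j] * mask[i][j] * scale for j in range(len(p[i]))]
        # Softmax backward: dS_ij = P_ij * (dP_ij - sum_k P_ik * dP_ik).
        row_dot = sum(pi * dpi for pi, dpi in zip(p[i], d_p))
        d_s.append([p[i][j] * (d_p[j] - row_dot) for j in range(len(p[i]))])
    return d_s, d_v
```

The per-row term sum_k P_ik dP_ik equals the row sum of dO ∘ O. FlashAttention-style backward kernels precompute this value once per row, so they do not need a full row of P and dP before computing dS.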

Claude Skills

The .claude/skills/ directory contains model-invoked skills for this project:

Skill           Triggers when…
new-kernel      Creating a new kernel implementation from a torch reference
b200-tuning     Optimizing for B200/Blackwell performance (tiles, TMA, WGMMA, pipeline)
kernel-testing  Running test.py, diagnosing failures, or using Triton IR debug flags

Reference
