
Add challenge 84: SwiGLU MLP Block (Medium) #221

Open
claude[bot] wants to merge 2 commits into main from add-challenge-84-swiglu-mlp-block

Conversation

Contributor

claude[bot] commented Mar 19, 2026

Summary

  • Adds challenge 84: SwiGLU MLP Block (Medium difficulty)
  • Implements the feedforward network from LLaMA, Mistral, Gemma, and most modern LLMs: output = (SiLU(x × W_gate) ⊙ (x × W_up)) × W_down
  • Distinct from the existing SwiGLU challenge (#54), which implements only the element-wise gate silu(x1) * x2; this challenge includes all three matrix multiplications and is a self-contained inference building block
  • Validated on NVIDIA Tesla T4 via run_challenge.py
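For reference, the block's math can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the repo's reference_impl (which operates on GPU tensors and asserts shape/dtype/device):

```python
import numpy as np

def silu(v):
    # SiLU / swish: v * sigmoid(v)
    return v / (1.0 + np.exp(-v))

def swiglu_mlp(x, w_gate, w_up, w_down):
    # x: [M, d_model]; w_gate, w_up: [d_model, d_ffn]; w_down: [d_ffn, d_model]
    gate = silu(x @ w_gate)      # [M, d_ffn]
    up = x @ w_up                # [M, d_ffn]
    return (gate * up) @ w_down  # [M, d_model]
```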

Why this challenge is interesting

Solvers must implement three chained matrix multiplications with a gated nonlinearity in between. The GPU programming challenges are:

  • Parallelizing across the two independent gate/up projections
  • Managing the [M, d_ffn] intermediate tensors efficiently (memory bandwidth)
  • Optionally fusing the elementwise SiLU + multiply into one of the matmuls to reduce memory round-trips

The performance test uses LLaMA-3 8B dimensions: M = 512, d_model = 4096, d_ffn = 14336.

Checklist

challenge.html

  • Starts with <p> (problem description)
  • <h2> sections: Implementation Requirements, Example, Constraints
  • First example matches generate_example_test() (identity-like matrices, output ≈ [[0.7311, 0], [0, 0.7311]])
  • Examples use <pre> consistently (1D/sequential data)
  • Constraints includes performance test size bullet: M = 512, d_model = 4,096, d_ffn = 14,336
  • SVG visualization included (dark theme, shows gate/up/SiLU/multiply/down dataflow)
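The expected first-example output is easy to sanity-check by hand: with identity-like inputs and weights, each diagonal entry is SiLU(1) = 1/(1 + e^-1) ≈ 0.7311 and the off-diagonals vanish. A quick NumPy check, assuming 2×2 identity matrices throughout (the exact generate_example_test() setup may differ):

```python
import numpy as np

silu = lambda v: v / (1.0 + np.exp(-v))

x = np.eye(2)  # identity-like input
w = np.eye(2)  # identity weights used for gate, up, and down
out = (silu(x @ w) * (x @ w)) @ w
# diagonal entries: silu(1) = 1/(1 + e**-1) ≈ 0.7311
# off-diagonal entries: silu(0) * 0 = 0
```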

challenge.py

  • class Challenge inherits ChallengeBase
  • __init__ with name, atol, rtol, num_gpus, access_tier
  • reference_impl has assertions on shape, dtype, device
  • All 6 methods present
  • generate_functional_test returns 10 cases: edge (1, 2 rows), zero, power-of-2 (16×32, 64×64), non-power-of-2 (30, 100, 255), realistic (128×256, 256×512)
  • generate_performance_test fits 5× in 16GB VRAM (weights ≈ 670MB × 5 = 3.4GB)
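The VRAM figure checks out arithmetically, assuming float32 weights (the dtype here is an assumption, not stated in the PR):

```python
d_model, d_ffn = 4096, 14336
bytes_per_elem = 4  # assuming float32 weights
# three weight matrices: W_gate, W_up (d_model x d_ffn) and W_down (d_ffn x d_model)
per_instance = 3 * d_model * d_ffn * bytes_per_elem
print(per_instance / 2**20)      # 672.0 MiB per instance (~670 MB)
print(5 * per_instance / 2**30)  # 3.28125 GiB for five instances
```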

Starter files

  • All 6 present: .cu, .pytorch.py, .triton.py, .jax.py, .cute.py, .mojo
  • Exactly 1 parameter description comment per file
  • CUDA/Mojo: "are device pointers" (medium, no parenthetical)
  • Python frameworks: "are tensors on the GPU"; JAX also has # return output tensor directly
  • Starters compile but produce no output

General

  • Directory: 84_swiglu_mlp_block
  • Linting passes: pre-commit run --all-files

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix overlapping text in SVG: separate gate/up branch labels, add tensor
shape annotations at each stage, color-code converging arrows. Convert
example from <pre> to LaTeX bmatrix with proper math notation for each
intermediate computation step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@shxjames
Contributor

[Screenshots attached: 2026-03-26 at 21:49:51 and 21:49:43]
