
Add challenge 87: Speculative Decoding Verification (Medium)#226

Open
claude[bot] wants to merge 2 commits into main from add-challenge-87-speculative-decoding

Conversation


claude[bot] (Contributor) commented Mar 26, 2026

Summary

  • Adds Challenge 87: Speculative Decoding Verification (Medium difficulty)
  • Implements the core token acceptance/rejection step used in every modern LLM serving framework (vLLM, TensorRT-LLM, etc.)
  • Given a batch of B draft sequences with T candidate tokens each, the solver must: accept each token whose target/draft probability ratio min(1, q/p) passes a uniform random test; resample from the adjusted distribution clamp(q − p, 0) on the first rejection; or sample a bonus token if all T tokens are accepted
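
The accept/resample/bonus logic above can be sketched as a sequential reference in plain PyTorch. This is a hedged sketch, not the challenge's actual starter or reference code; the function name and tensor shapes (here q carries T+1 target positions so a bonus token can be drawn) are illustrative assumptions:

```python
import torch

def verify_speculative(draft_tokens, p, q, uniform_samples):
    """Reference (sequential) speculative-decoding verification sketch.

    draft_tokens:    (B, T) int64 tokens proposed by the draft model
    p:               (B, T, V) draft-model probabilities
    q:               (B, T+1, V) target-model probabilities (extra position
                     is assumed here so a bonus token can be sampled)
    uniform_samples: (B, T) U[0,1) numbers for the accept/reject test
    Returns (num_accepted, next_token), each of shape (B,).
    """
    B, T, V = p.shape
    num_accepted = torch.zeros(B, dtype=torch.long)
    next_token = torch.zeros(B, dtype=torch.long)
    for b in range(B):
        rejected = False
        for t in range(T):
            tok = draft_tokens[b, t]
            # Accept the drafted token with probability min(1, q/p).
            if uniform_samples[b, t] < torch.clamp(q[b, t, tok] / p[b, t, tok], max=1.0):
                num_accepted[b] += 1
            else:
                # First rejection: resample from the adjusted distribution
                # clamp(q - p, 0), renormalized. This has positive mass
                # whenever a rejection is possible, since p and q both sum to 1.
                adj = torch.clamp(q[b, t] - p[b, t], min=0.0)
                next_token[b] = torch.multinomial(adj / adj.sum(), 1)[0]
                rejected = True
                break
        if not rejected:
            # All T draft tokens accepted: sample a bonus token from the
            # target distribution at the extra position.
            next_token[b] = torch.multinomial(q[b, T], 1)[0]
    return num_accepted, next_token
```

The per-batch, per-position loop is exactly what a GPU solution must avoid; it is useful mainly as a correctness oracle.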

GPU learning moments:

  • The acceptance chain has a sequential dependency (each position depends on the previous), so naive per-position parallelism doesn't work — solvers must think about parallel scan for finding the first rejection
  • The inverse-CDF resampling step is O(V) and naturally parallelizable across the vocabulary dimension
  • Batch dimension B provides the main axis for parallelism across sequences
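
The scan idea in the first bullet can be sketched in a few lines of PyTorch: a cumulative product over the per-position accept mask yields the accepted-prefix length with no sequential loop. This is a hedged illustration (the function and tensor names are invented, not from the starter code):

```python
import torch

def count_accepted(p_tok, q_tok, u):
    """Count accepted draft tokens per sequence, fully vectorized.

    p_tok, q_tok: (B, T) draft/target probabilities of the drafted tokens
    u:            (B, T) uniform samples in [0, 1)
    Returns (B,) number of tokens accepted before the first rejection.
    """
    accept = u < torch.clamp(q_tok / p_tok, max=1.0)   # (B, T) accept mask
    # cumprod turns the mask into 1s up to the first 0, then all 0s,
    # so its row sum is exactly the accepted-prefix length.
    return torch.cumprod(accept.long(), dim=-1).sum(dim=-1)
```

In a CUDA kernel the same effect comes from a warp- or block-level scan over the T positions.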

Test cases: 10 functional tests covering single-token edge cases, all-accept (bonus token), forced first-rejection, mixed acceptance, varying V (4–1,000), and a realistic batch. Performance test: B=64, T=8, V=32,768.
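
The O(V) inverse-CDF resampling from the learning moments can likewise be vectorized with a prefix sum plus a per-row binary search. A minimal sketch (function name and shapes are illustrative, not the challenge's API):

```python
import torch

def inverse_cdf_sample(probs, u):
    """Sample one index per row via inverse-CDF.

    probs: (B, V) unnormalized non-negative weights, e.g. clamp(q - p, 0)
    u:     (B,) uniform samples; should be in (0, 1) so leading
           zero-weight entries are never selected
    Returns (B,) sampled indices.
    """
    cdf = torch.cumsum(probs, dim=-1)      # parallel prefix sum over V
    total = cdf[:, -1:]                    # per-row normalizer
    # First index where cdf >= u * total (batched binary search).
    return torch.searchsorted(cdf, u[:, None] * total).squeeze(-1)
```

The cumsum is the naturally parallel part across the vocabulary dimension; on GPU it maps to a block-wide prefix sum followed by a log(V) search.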

Test plan

  • All 6 starter files present (.cu, .pytorch.py, .triton.py, .jax.py, .cute.py, .mojo)
  • pre-commit run --all-files passes (Black, isort, flake8, clang-format)
  • Validated with run_challenge.py --action submit — all functional + performance tests pass on NVIDIA Tesla T4
  • Checklist in CLAUDE.md verified: HTML starts with <p>, <h2> sections, example matches generate_example_test(), SVG visualization included, performance bullet matches generate_performance_test()

🤖 Generated with Claude Code

Implements the token acceptance/rejection step from speculative decoding:
given B draft sequences with T candidate tokens each, determine which
tokens to accept (based on min(1, q/p) acceptance probability), resample
a replacement from the adjusted distribution clamp(q-p, 0) on the first
rejection, or sample a bonus token if all T draft tokens are accepted.

Key GPU learning moments:
- Sequential acceptance chain with inherent data dependency across positions
- Parallel reduction to find first rejection across the batch dimension
- O(V) inverse-CDF sampling via prefix sum over vocabulary

Performance test: B=64, T=8, V=32,768 (Mistral/LLaMA-2 vocab size)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Redesign SVG with wider boxes to prevent text overflow, add probs row
showing p(t) and q(t) values per position. Convert description to
notation list + numbered algorithm steps with equations. Convert example
from <pre> to LaTeX. Fix \texttt{uniform\_samples} → plain underscore.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@shxjames
Contributor

[Three screenshots of the rendered challenge page, taken 2026-03-26 at 23:08]
