
Add challenge 87: Speculative Decoding Verification (Medium)#226

Open
claude[bot] wants to merge 2 commits into main from add-challenge-87-speculative-decoding

Conversation


claude[bot] (Contributor) commented Mar 26, 2026

Summary

  • Adds Challenge 87: Speculative Decoding Verification (Medium difficulty)
  • Implements the core token acceptance/rejection step used in every modern LLM serving framework (vLLM, TensorRT-LLM, etc.)
  • Given a batch of B draft sequences with T candidate tokens each, the solver must: accept each token whose target/draft probability ratio min(1, q/p) passes a uniform random test; resample from the adjusted distribution clamp(q − p, 0) on the first rejection; or sample a bonus token if all T tokens are accepted
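
The accept/resample/bonus logic above can be sketched as a sequential reference in plain PyTorch. This is a hedged sketch, not the challenge's actual starter or reference code; the function name and tensor shapes (here q carries T+1 target positions so a bonus token can be drawn) are illustrative assumptions:

```python
import torch

def verify_speculative(draft_tokens, p, q, uniform_samples):
    """Reference (sequential) speculative-decoding verification sketch.

    draft_tokens:    (B, T) int64 tokens proposed by the draft model
    p:               (B, T, V) draft-model probabilities
    q:               (B, T+1, V) target-model probabilities (extra position
                     is assumed here so a bonus token can be sampled)
    uniform_samples: (B, T) U[0,1) numbers for the accept/reject test
    Returns (num_accepted, next_token), each of shape (B,).
    """
    B, T, V = p.shape
    num_accepted = torch.zeros(B, dtype=torch.long)
    next_token = torch.zeros(B, dtype=torch.long)
    for b in range(B):
        rejected = False
        for t in range(T):
            tok = draft_tokens[b, t]
            # Accept the drafted token with probability min(1, q/p).
            if uniform_samples[b, t] < torch.clamp(q[b, t, tok] / p[b, t, tok], max=1.0):
                num_accepted[b] += 1
            else:
                # First rejection: resample from the adjusted distribution
                # clamp(q - p, 0), renormalized. This has positive mass
                # whenever a rejection is possible, since p and q both sum to 1.
                adj = torch.clamp(q[b, t] - p[b, t], min=0.0)
                next_token[b] = torch.multinomial(adj / adj.sum(), 1)[0]
                rejected = True
                break
        if not rejected:
            # All T draft tokens accepted: sample a bonus token from the
            # target distribution at the extra position.
            next_token[b] = torch.multinomial(q[b, T], 1)[0]
    return num_accepted, next_token
```

The per-batch, per-position loop is exactly what a GPU solution must avoid; it is useful mainly as a correctness oracle.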

GPU learning moments:

  • The acceptance chain has a sequential dependency (each position depends on the previous), so naive per-position parallelism doesn't work — solvers must think about parallel scan for finding the first rejection
  • The inverse-CDF resampling step is O(V) and naturally parallelizable across the vocabulary dimension
  • Batch dimension B provides the main axis for parallelism across sequences
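
The scan idea in the first bullet can be sketched in a few lines of PyTorch: a cumulative product over the per-position accept mask yields the accepted-prefix length with no sequential loop. This is a hedged illustration (the function and tensor names are invented, not from the starter code):

```python
import torch

def count_accepted(p_tok, q_tok, u):
    """Count accepted draft tokens per sequence, fully vectorized.

    p_tok, q_tok: (B, T) draft/target probabilities of the drafted tokens
    u:            (B, T) uniform samples in [0, 1)
    Returns (B,) number of tokens accepted before the first rejection.
    """
    accept = u < torch.clamp(q_tok / p_tok, max=1.0)   # (B, T) accept mask
    # cumprod turns the mask into 1s up to the first 0, then all 0s,
    # so its row sum is exactly the accepted-prefix length.
    return torch.cumprod(accept.long(), dim=-1).sum(dim=-1)
```

In a CUDA kernel the same effect comes from a warp- or block-level scan over the T positions.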

Test cases: 10 functional tests covering single-token edge cases, all-accept (bonus token), forced first-rejection, mixed acceptance, varying V (4–1,000), and a realistic batch. Performance test: B=64, T=8, V=32,768.
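
The O(V) inverse-CDF resampling from the learning moments can likewise be vectorized with a prefix sum plus a per-row binary search. A minimal sketch (function name and shapes are illustrative, not the challenge's API):

```python
import torch

def inverse_cdf_sample(probs, u):
    """Sample one index per row via inverse-CDF.

    probs: (B, V) unnormalized non-negative weights, e.g. clamp(q - p, 0)
    u:     (B,) uniform samples; should be in (0, 1) so leading
           zero-weight entries are never selected
    Returns (B,) sampled indices.
    """
    cdf = torch.cumsum(probs, dim=-1)      # parallel prefix sum over V
    total = cdf[:, -1:]                    # per-row normalizer
    # First index where cdf >= u * total (batched binary search).
    return torch.searchsorted(cdf, u[:, None] * total).squeeze(-1)
```

The cumsum is the naturally parallel part across the vocabulary dimension; on GPU it maps to a block-wide prefix sum followed by a log(V) search.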

Test plan

  • All 6 starter files present (.cu, .pytorch.py, .triton.py, .jax.py, .cute.py, .mojo)
  • pre-commit run --all-files passes (Black, isort, flake8, clang-format)
  • Validated with run_challenge.py --action submit — all functional + performance tests pass on NVIDIA Tesla T4
  • Checklist in CLAUDE.md verified: HTML starts with <p>, <h2> sections, example matches generate_example_test(), SVG visualization included, performance bullet matches generate_performance_test()

🤖 Generated with Claude Code

Implements the token acceptance/rejection step from speculative decoding:
given B draft sequences with T candidate tokens each, determine which
tokens to accept (based on min(1, q/p) acceptance probability), resample
a replacement from the adjusted distribution clamp(q-p, 0) on the first
rejection, or sample a bonus token if all T draft tokens are accepted.

Key GPU learning moments:
- Sequential acceptance chain with inherent data dependency across positions
- Parallel reduction to find first rejection across the batch dimension
- O(V) inverse-CDF sampling via prefix sum over vocabulary

Performance test: B=64, T=8, V=32,768 (Mistral/LLaMA-2 vocab size)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Redesign SVG with wider boxes to prevent text overflow, add probs row
showing p(t) and q(t) values per position. Convert description to
notation list + numbered algorithm steps with equations. Convert example
from <pre> to LaTeX. Fix \texttt{uniform\_samples} → plain underscore.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@shxjames
Contributor

[Three screenshots of the rendered challenge page, taken 2026-03-26 at 23:08]
