
refactor(nn): modularize nn.cu into separate .inl files (#133)#137

Merged
m96-chan merged 2 commits into main from feature/issue-133-nn-modular
Dec 30, 2025

Conversation

@m96-chan
Owner

Summary

  • Split monolithic nn.cu (2673 lines) into modular .inl files matching binding structure
  • Created 9 subdirectories: activation/, norm/, rope/, linear/, attention/, tensor/, embedding/, elementwise/, cast/
  • Maintains single translation unit compilation to avoid LNK2005 duplicate symbol errors

Files Changed

| Directory | Files |
| --- | --- |
| activation/ | gelu.inl, silu.inl, sigmoid.inl, tanh.inl |
| norm/ | layernorm.inl, rmsnorm.inl |
| rope/ | rope_inplace.inl |
| linear/ | linear_bias.inl (includes softmax) |
| attention/ | sdpa_causal.inl |
| tensor/ | tensor.inl (transpose, reshape, concat, split) |
| embedding/ | embedding.inl (lookup, kv_cache ops) |
| elementwise/ | inplace.inl (add, mul, copy) |
| cast/ | cast.inl (f32<->bf16/f16) |

Key Implementation Details

  1. `nn.cu` as aggregator: includes all `.inl` files so they compile as a single translation unit
  2. `PYGPUKIT_IMPLEMENT_NN_KERNELS`: conditional compilation guard for kernel definitions
  3. Namespace handling: all `.inl` files use `using namespace nn;` for kernel access
  4. CMakeLists.txt: simplified to compile only `ops/nn/nn.cu`

Test plan

  • Build passes (SM 120a, CUDA 13.1)
  • 238 pytest tests pass
  • Key NN ops verified: GELU, SiLU, RMSNorm, LayerNorm, Transpose, Softmax
  • Pre-commit checks pass (Ruff lint, Ruff format, Mypy)

🤖 Generated with Claude Code

m96-chan and others added 2 commits December 30, 2025 16:39
Split the monolithic ops_bindings.cpp (~3000 lines) into 39 organized
binding files for better maintainability and navigation.

Directory structure:
- elementwise/: binary, inplace, compare operations
- unary/: math, trig operations
- reduction/: basic, argmax, softmax operations
- tensor/: cast, transpose, reshape, repeat operations
- embedding/: lookup, kv_cache operations
- nn/: activation, norm, attention, rope operations
- gemm/: generic, fp8, nvf4, grouped, int operations
- gemv/: generic, fp8, nvf4 operations
- sampling/: basic, topk, seed operations
- Other: quantize, paged_attention, continuous_batching, audio, cublaslt, moe

Changes:
- ops_bindings.cpp reduced from ~3000 to ~77 lines (init calls only)
- bindings_common.hpp with shared includes and forward declarations
- CMakeLists.txt updated with all new source files
- Build verified: 238 tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Split the monolithic nn.cu (2673 lines) into modular files matching
the binding structure from Issue #131:

- activation/: gelu.inl, silu.inl, sigmoid.inl, tanh.inl
- norm/: layernorm.inl, rmsnorm.inl
- rope/: rope_inplace.inl
- linear/: linear_bias.inl (+ softmax)
- attention/: sdpa_causal.inl
- tensor/: tensor.inl (transpose, reshape, concat, split)
- embedding/: embedding.inl (lookup, kv_cache ops)
- elementwise/: inplace.inl (add, mul, copy)
- cast/: cast.inl (f32<->bf16/f16)

Key changes:
- nn.cu now aggregates all .inl files as single translation unit
- Avoids LNK2005 duplicate symbol errors from CUDA kernels
- activation_kernels.cuh uses PYGPUKIT_IMPLEMENT_NN_KERNELS guard
- All .inl files use 'using namespace nn;' for kernel access

Build: PASS (SM 120a, CUDA 13.1)
Tests: 238 passed, 6/6 key NN ops verified

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@m96-chan m96-chan merged commit 031a9c6 into main Dec 30, 2025
13 checks passed
