I work on LLM inference at the engine and runtime level, focusing on performance, memory efficiency, and predictable behavior in production environments.
My experience covers optimizing inference across CPU and GPU backends, with hands-on work in CUDA, cuBLAS, and cuBLASLt, including custom kernels for transformer workloads. That work centers on practical improvements: quantization-aware execution, efficient KV-cache management, memory allocation strategies, and execution paths tuned to specific model architectures and hardware constraints.
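
To make the KV-cache side of that concrete, here is a minimal sketch of a paged block allocator of the kind such runtimes use; the class name, block size, and layout are illustrative assumptions, not the interface of any particular engine.

```cpp
// Minimal sketch of a paged KV-cache block allocator (hypothetical names,
// simplified layout). Each sequence maps logical token positions to
// fixed-size physical blocks, so memory grows in block-sized steps instead
// of requiring one contiguous allocation per sequence.
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <unordered_map>
#include <utility>
#include <vector>

constexpr uint32_t kBlockTokens = 16;  // tokens per KV block (assumed)

class KvBlockAllocator {
 public:
  explicit KvBlockAllocator(uint32_t num_blocks) {
    free_blocks_.reserve(num_blocks);
    for (uint32_t b = 0; b < num_blocks; ++b) free_blocks_.push_back(b);
  }

  // Ensure the sequence has enough blocks to hold `token_count` tokens.
  void reserve(uint64_t seq_id, uint32_t token_count) {
    auto& table = block_tables_[seq_id];
    uint32_t needed = (token_count + kBlockTokens - 1) / kBlockTokens;
    while (table.size() < needed) {
      if (free_blocks_.empty()) throw std::runtime_error("KV cache exhausted");
      table.push_back(free_blocks_.back());
      free_blocks_.pop_back();
    }
  }

  // Physical block and in-block offset for a logical token position.
  std::pair<uint32_t, uint32_t> locate(uint64_t seq_id, uint32_t pos) const {
    const auto& table = block_tables_.at(seq_id);
    return {table.at(pos / kBlockTokens), pos % kBlockTokens};
  }

  // Return all blocks of a finished sequence to the free pool.
  void release(uint64_t seq_id) {
    auto it = block_tables_.find(seq_id);
    if (it == block_tables_.end()) return;
    for (uint32_t b : it->second) free_blocks_.push_back(b);
    block_tables_.erase(it);
  }

 private:
  std::vector<uint32_t> free_blocks_;
  std::unordered_map<uint64_t, std::vector<uint32_t>> block_tables_;
};

int main() {
  KvBlockAllocator alloc(/*num_blocks=*/256);
  alloc.reserve(/*seq_id=*/1, /*token_count=*/40);  // 3 blocks of 16 tokens
  auto [block, offset] = alloc.locate(1, 39);
  std::cout << "token 39 -> block " << block << ", offset " << offset << "\n";
  alloc.release(1);
  return 0;
}
```

Fixed-size blocks let memory for a growing sequence be allocated incrementally and returned to a shared pool when the sequence finishes, which keeps fragmentation and peak usage predictable.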
I build and adapt local, cloud-independent inference systems, customizing runtimes for different model families and deployment requirements rather than relying on fixed abstractions. The goal is stable, efficient inference that makes full use of available hardware under real operational conditions.
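
As a hypothetical illustration of what avoiding fixed abstractions means in practice, below is a small sketch of per-model execution-path selection; the path names, model fields, and hardware thresholds are invented for the example and do not describe a specific runtime.

```cpp
// Hypothetical sketch of per-model execution-path selection: the runtime
// inspects the model family and available hardware and picks a matching
// kernel path instead of routing everything through one generic code path.
#include <iostream>
#include <string>

enum class AttentionPath { kGenericFp16, kFusedGqa, kCpuInt8 };

struct DeviceInfo {
  bool has_gpu;
  int sm_major;  // GPU compute capability, major version
};

struct ModelInfo {
  std::string family;  // e.g. "llama" (illustrative value)
  bool uses_gqa;       // grouped-query attention
  bool weights_int8;   // quantized weights
};

// Choose an execution path; the rules and thresholds here are illustrative.
AttentionPath SelectAttentionPath(const ModelInfo& m, const DeviceInfo& d) {
  if (!d.has_gpu && m.weights_int8) return AttentionPath::kCpuInt8;
  if (d.has_gpu && d.sm_major >= 8 && m.uses_gqa) return AttentionPath::kFusedGqa;
  return AttentionPath::kGenericFp16;
}

int main() {
  ModelInfo model{"llama", /*uses_gqa=*/true, /*weights_int8=*/false};
  DeviceInfo device{/*has_gpu=*/true, /*sm_major=*/9};
  AttentionPath path = SelectAttentionPath(model, device);
  std::cout << "selected path: " << static_cast<int>(path) << "\n";
  return 0;
}
```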



