📣 Focusing: 404 - Distractions not found 🐛💻

hkimw/README.md

What I work on

| Area | Focus |
| --- | --- |
| AI Hardware | FPGA NPU, systolic array datapaths, custom ISA, memory hierarchy |
| LLM Inference | Transformer decode bottlenecks, KV-cache, GEMM/GEMV, quantization |
| Systems | C/C++, Python runtimes, queues, profiling, reproducible benchmarks |
| Research Writing | Paper notes, architecture diagrams, experiment logs, technical reports |
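
As a back-of-envelope illustration of the KV-cache focus above (a sketch with hypothetical Llama-2-7B-like shapes, not code from any repository listed here), a decoder's per-sequence cache footprint can be estimated as:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Per-sequence KV-cache footprint: one K and one V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

# Illustrative shapes: 32 layers, 32 KV heads, head_dim 128, FP16, 4k context
print(kv_cache_bytes(32, 32, 128, 4096) / 2**20, "MiB")  # 2048.0 MiB
```

At 4k context the cache alone rivals the quantized weights in size, which is why KV-cache layout and scheduling show up throughout the projects below.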

Featured work

- INT4 quantized FPGA NPU for LLM inference on Xilinx KV260.
- Lightweight LLM inference engine with INT4 / INT16 quantization.
- Tauri + React inference visualization and trace inspector.
- Research notebook and ISA documentation for the pccx project family.


Current direction

I am building a research-oriented AI systems portfolio around edge LLM inference, where model graphs meet memory bandwidth, runtime queues, quantization, and hardware limits.

  - Main stack: SystemVerilog / FPGA / C++ / Python / TypeScript
  - Main research theme: memory-bound Transformer inference
  - Main project family: pccx / pccx-lab / llm-bottleneck-lab
  - Homepage: technical notebook + project portfolio + paper notes
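
A minimal sketch of why decode is memory-bound: each generated token must stream every weight byte (plus the growing KV-cache) across the memory bus, so bandwidth sets a hard latency floor. The numbers below are illustrative assumptions, not measurements from any of these projects:

```python
def decode_floor_s(weight_bytes, kv_bytes, mem_bw_bytes_s):
    """Bandwidth lower bound on per-token decode latency: during decode,
    every weight byte and KV-cache byte crosses the memory bus at least once."""
    return (weight_bytes + kv_bytes) / mem_bw_bytes_s

# Assumed figures: 7B params in INT4 (~3.5 GB), 2 GiB FP16 KV-cache,
# ~20 GB/s DDR4 (a rough edge-class number, not a measured KV260 value)
t = decode_floor_s(3.5e9, 2 * 2**30, 20e9)
print(f"{t * 1e3:.0f} ms/token, {1 / t:.1f} tok/s ceiling")
```

Compute barely appears in this bound: at batch size 1 the GEMV arithmetic intensity is roughly 2 FLOPs per weight byte, far below any accelerator's balance point, so quantization pays off by shrinking the bytes, not the FLOPs.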

Tech stack

SystemVerilog · FPGA · C++ · Python · TypeScript

About this website repository

This repository also contains my personal website and research notebook, built with Docusaurus and styled as a quiet, text-first engineer's notebook.

```shell
npm run start   # local dev server with hot reload
npm run build   # static production build
```

The website is used as a technical portfolio for AI systems, FPGA acceleration, LLM inference experiments, research notes, and project documentation.

Pinned

  1. Driver-drowsiness-detection

    Driver Drowsiness Detection with YOLOv8 and Facial Features. Combat driver fatigue with this deep-learning-powered system that uses YOLOv8 to detect open and closed eyes, accurately assessing dr…

    Python · 8 stars · 1 fork

  2. pccx-FPGA-NPU-LLM-kv260

    Bare-metal FPGA implementation of the pccx NPU for LLM inference on Kria KV260: SystemVerilog RTL, W4A8 quantization, GEMM/GEMV datapaths, KV-cache scheduling, and driver code.

    SystemVerilog · 7 stars · 1 fork

  3. llm-bottleneck-lab

    Measure and visualize why LLM inference is slow: bottleneck analysis, model dissection, KV-cache, GEMM/GEMV, quantization, and memory-bound decoding.

    Makefile · 1 star · 1 fork

  4. pccx

    PCCX is an open NPU architecture for memory-bound Transformer inference on edge FPGAs, focused on GEMM/GEMV, KV-cache, W4A8 quantization, and custom ISA scheduling.

    SystemVerilog · 2 stars · 2 forks

  5. pccx-lab

    Visual pre-RTL bottleneck profiler for pccx NPU: Rust/Tauri GUI, UVM co-simulation, trace reports, and LLM-assisted testbench generation.

    TypeScript · 1 star
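
The W4A8 scheme the pccx repositories mention can be modeled in a few lines of NumPy. This is an illustrative sketch of symmetric integer quantization with one floating-point rescale after the integer GEMV, not the project's actual quantizer or RTL datapath:

```python
import numpy as np

def quantize_symmetric(x, n_bits):
    """Symmetric quantization: scale maps max |x| onto the signed int range."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)  # toy weight matrix
x = rng.standard_normal(64).astype(np.float32)        # toy activation vector

qw, sw = quantize_symmetric(w, 4)  # W4: weights land in [-8, 7]
qx, sx = quantize_symmetric(x, 8)  # A8: activations land in [-128, 127]

# Integer GEMV with a single float rescale at the end, as a W4A8 datapath would
y = (qw @ qx).astype(np.float32) * (sw * sx)
err = np.abs(y - w @ x).max() / np.abs(w @ x).max()
```

The hardware appeal is that the inner loop is pure integer multiply-accumulate; the float scales `sw * sx` fold into one multiplier per output, which is what makes 4-bit weights cheap on FPGA DSP slices.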