# ReArch Group Paper Reading List
| Date | Paper Title | Presenter | Notes |
| ---- | ----------- | --------- | ----- |
| 07.26 | TEE-SGX Introduction | Zhengyi Li | Slides |
| 08.15 | Accelerating Mixture-of-Experts Model Inference | Shuwen Lu | Slides |
| 08.22 | Accelerating Stable Diffusion-based Video Generation | Yuge Cheng | Slides |
| 09.05 | TCP: A Tensor Contraction Processor for AI Workloads | Weiming Hu | Slides |
| 10.11 | dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving | Haoyan Zhang | Slides |
| 10.18 | Dataflow Chips and a Compiler | Renyang Guan | Slides |
| 11.01 | ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development | Gonglin Xu | Slides |
| 11.15 | Open-Sora Architecture and Its Computational Reuse | Haosong Liu | Slides |
| 11.22 | LLM Quantization | Wenxuan Miao | Slides |
| 11.29 | Survey: Large-scale 3DGS | Zheng Liu | Slides |
| 12.05 | Enhance Efficiency: 3D Gaussian Splatting for Speed and Memory Optimization | Xiaotong Huang | Slides |
| 12.13 | Stealing Part of a Production Language Model | Zhengyi Li | Slides |
| 12.20 | Communication-Compute Co-Optimization in Distributed Training | Yijia Diao | Slides |
| 12.27 | Byte Latent Transformer: Patches Scale Better Than Tokens | Shuyong Bao | Slides |
| 01.03 | Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators | Renyang Guan | Slides |
| Date | Paper Title | Presenter | Notes |
| ---- | ----------- | --------- | ----- |
| 01.17 | HybridFlow: A Flexible and Efficient RLHF Framework | Gonglin Xu | Slides |
| 02.28 | Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion | Ziyu Huang | Slides |
| 03.14 | Auto-Vectorization in Compilers: Leveraging SIMD for High-Performance Computing | Shihan Fang | Slides |
| 03.21 | Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention | Xing Ma | Slides |
| 03.28 | Taming Load Balancing in Distributed LLM Training | Jiale Xu | Slides |
| 04.11 | SparseAttn for Video Generation | Yulin Sun | Slides |
| 04.18 | Towards End-to-End Optimization of LLM-based Applications with Ayo | Jiawei Huang | Slides |
| 04.25 | FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models | Xiaotong Huang | Slides |
| 05.23 | EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models | Yuge Chen | Slides |
| 05.30 | Modern Programming Model for Writing Kernels on GPUs | Xinhao Luo | Slides |
| 06.06 | Prefix Sharing LLM Inference and SGLang | Yitong Ding | Slides |
| 06.13 | Ditto: Accelerating Diffusion Model via Temporal Value Similarity; CMC: Video Transformer Acceleration via CODEC Assisted Matrix Condensing | Haosong Liu | Slides |
| 06.21-25 | ISCA conference notes | | Link |
| 06.27 | Speeding up LLM and GEMM | Wenxuan Miao | Slides |
| 07.25 | Modeling and Simulation | Weiming Hu | Slides |
| Date | Paper Title | Presenter | Notes |
| ---- | ----------- | --------- | ----- |
| 09.18 | Fine-Grained Communication-Computation Overlap | Ziyu Huang | Slides |
| 09.26 | A Sample-Free Compilation Framework for Efficient Dynamic Tensor Computation | Yangjie Zhou | Slides |
| 10.17 | Hydra: Harnessing Expert Popularity for Efficient Mixture-of-Expert Inference on Chiplet System | Gonglin Xu | Slides |
| 10.24 | LLM for CUDA Codegen | Ma Xing | Slides |
| 11.14 | A Survey of 3DGS SLAM | Xiaotong Huang | Slides |
| 11.21 | Survey of Vision-Language-Action (VLA) Models | Zheng Liu | Slides |
| 11.28 | Modern DSL Compilation Workflow | Xinhao Luo | Slides |
| Date | Paper Title | Presenter | Notes |
| ---- | ----------- | --------- | ----- |
| 03.06 | Breaking the Layer Barrier: Remodeling Private Transformer Inference with Hybrid CKKS and MPC | Zhengyi Li | Slides |
| 03.13 | Where LLMs Fit, and Where We Still Matter | Yijia Diao | Slides, Paper |
| 03.27 | GPU Profiling for Optimization | Wu Sun | Slides |
| Link |
| ---- |
| List Contributed by Zihan Liu |
| List Contributed by Jingwen Leng |
| List Contributed by Shuwen Lu |
| Quantization, Data Type, Compression, Acceleration |
| List Contributed by Weiming Hu |
| List Contributed by Ma Xing and Yangjie Zhou |
| Reading List From Other Groups |