# ReArch Group Paper Reading List
| Date | Paper Title | Presenter | Notes |
| ---- | ----------- | --------- | ----- |
| 07.26 | TEE-SGX Introduction | Zhengyi Li | Slides |
| 08.15 | Accelerating Mixture-of-Experts Model Inference | Shuwen Lu | Slides |
| 08.22 | Accelerating Stable Diffusion-based Video Generation | Yuge Cheng | Slides |
| 09.05 | TCP: A Tensor Contraction Processor for AI Workloads | Weiming Hu | Slides |
| 10.11 | dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving | Haoyan Zhang | Slides |
| 10.18 | Dataflow Chips and a Compiler | Renyang Guan | Slides |
| 11.01 | ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development | Gonglin Xu | Slides |
| 11.15 | Open-Sora Architecture and Its Computational Reuse | Haosong Liu | Slides |
| 11.22 | LLM Quantization | Wenxuan Miao | Slides |
| 11.29 | Survey: Large-scale 3DGS | Zheng Liu | Slides |
| 12.05 | Enhance Efficiency: 3D Gaussian Splatting for Speed and Memory Optimization | Xiaotong Huang | Slides |
| 12.13 | Stealing Part of a Production Language Model | Zhengyi Li | Slides |
| 12.20 | Communication-Compute Co-Optimization in Distributed Training | Yijia Diao | Slides |
| 12.27 | Byte Latent Transformer: Patches Scale Better Than Tokens | Shuyong Bao | Slides |
| 01.03 | Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators | Renyang Guan | Slides |
| Date | Paper Title | Presenter | Notes |
| ---- | ----------- | --------- | ----- |
| 01.17 | HybridFlow: A Flexible and Efficient RLHF Framework | Gonglin Xu | Slides |
| 02.28 | Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion | Ziyu Huang | Slides |
| 03.14 | Auto-Vectorization in Compilers: Leveraging SIMD for High-Performance Computing | Shihan Fang | Slides |
| 03.21 | Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention | Xing Ma | Slides |
| 03.28 | Taming Load Balancing in Distributed LLM Training | Jiale Xu | Slides |
| 04.11 | SparseAttn for Video Generation | Yulin Sun | Slides |
| 04.18 | Towards End-to-End Optimization of LLM-based Applications with Ayo | Jiawei Huang | Slides |
| 04.25 | FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models | Xiaotong Huang | Slides |
| 05.23 | EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models | Yuge Chen | Slides |
| 05.30 | Modern Programming Model for Writing Kernels on GPUs | Xinhao Luo | Slides |
| 06.06 | Prefix Sharing LLM Inference and SGLang | Yitong Ding | Slides |
| 06.13 | Ditto: Accelerating Diffusion Model via Temporal Value Similarity; CMC: Video Transformer Acceleration via CODEC Assisted Matrix Condensing | Haosong Liu | Slides |
| 06.21-25 | ISCA conference notes | | Link |
| 06.27 | Speeding up LLM and GEMM | Wenxuan Miao | Slides |
| 07.25 | Modeling and Simulation | Weiming Hu | Slides |
| Date | Paper Title | Presenter | Notes |
| ---- | ----------- | --------- | ----- |
| 09.18 | Fine-Grained Communication-Computation Overlap | Ziyu Huang | Slides |
| 09.26 | A Sample-Free Compilation Framework for Efficient Dynamic Tensor Computation | Yangjie Zhou | Slides |
| 10.17 | Hydra: Harnessing Expert Popularity for Efficient Mixture-of-Expert Inference on Chiplet System | Gonglin Xu | Slides |
| 10.24 | LLM for CUDA Codegen | Ma Xing | Slides |
| 11.14 | A Survey of 3DGS SLAM | Xiaotong Huang | Slides |
| 11.21 | Survey of Vision-Language-Action (VLA) Models | Zheng Liu | Slides |
| 11.28 | Modern DSL Compilation Workflow | Xinhao Luo | Slides |
| Date | Paper Title | Presenter | Notes |
| ---- | ----------- | --------- | ----- |
| 03.06 | Breaking the Layer Barrier: Remodeling Private Transformer Inference with Hybrid CKKS and MPC | Zhengyi Li | Slides |
| 03.13 | Where LLMs Fit, and Where We Still Matter | Yijia Diao | Slides, Paper |
| 03.27 | GPU Profiling for Optimization | Wu Sun | Slides |
| Link |
| ---- |
| List Contributed by Zihan Liu |
| List Contributed by Jingwen Leng |
| List Contributed by Shuwen Lu |
| Quantization, Data Type, Compression, Acceleration |
| List Contributed by Weiming Hu |
| List Contributed by Ma Xing and Yangjie Zhou |
| Reading List From Other Groups |