Overview
The remarkable success of machine learning (ML) algorithms has led to the deployment of industrial ML accelerators across cloud, mobile, edge, and wearable platforms, for applications ranging from computer vision and speech processing to recommendations and graph learning. This course is intended to give students a solid understanding of such accelerator system designs, including how the various hardware and software components affect costs such as latency, energy, area, throughput, power, storage, and inference accuracy for ML tasks. The course will start with accelerator architectures for computer vision (convolutional and feed-forward neural networks) and culminate in cutting-edge advances such as accelerator systems for federated, on-device, and graph learning, along with industrial case studies. The course will also cover important topics in ML accelerator system design such as execution cost modeling, mapping and hardware exploration, compilers and ISAs for ML accelerators, multi-chip/multi-workload designs, accelerator-aware neural architecture search, and the reliability and security of ML accelerators.
Learning Outcomes
This course will help students gain a solid understanding of the fundamentals of designing machine learning accelerators and of relevant cutting-edge topics. After completing this course, students will be able to understand:
- The role and importance of machine learning accelerators.
- Implementation of machine learning hardware, including various computational, NoC (network-on-chip) and memory configurations.
- Libraries and frameworks for designing and exploring machine learning accelerators.
- The efficiency challenges brought by large and deep models and the need for tensor compression.
- Architectural implications of compression techniques and impact on inference accuracy.
- Characterization of accelerators under various deployment scenarios (cloud vs. edge).
- How accelerators for training machine learning models work.
- How accelerators differ across domains such as computer vision, language processing, graph learning, and recommendation systems.
- The need for reconfigurability in architectural components to support various workloads and a range of specializations, and case studies of designs that enable it.
- The programming challenges posed by machine learning accelerators, especially with algorithmic and hardware specializations such as sparsity and mixed-precision computation.
- The need for accelerator-aware neural architecture search.
- Various runtime optimizations for efficient inference on accelerators.
- Recent accelerator systems developed by industry and their trade-offs.
Prerequisites
A background in Computer Architecture (CSE 420 or CSE 520) and Machine Learning (e.g., courses like CSE 571, CSE 574, CSE 575, CSE 576, EEE 598/591: Machine/Deep Learning) will be advantageous.
Assessment
Students will be evaluated on their participation in class discussions and presentations, the quality of their paper reviews, and a topical survey paper.
Grade components:
- Class participation – 20%
- Paper critiques and presentations – 30%
- Topical survey paper – 50%
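As a rough illustration of how the weights above combine, the sketch below computes a weighted final score. The syllabus only lists the percentages; scoring each component on a 0-100 scale and combining them linearly is an assumption, and the names `WEIGHTS`, `final_score`, and the example scores are hypothetical.

```python
# Grade weights as listed above (percent of the final grade).
WEIGHTS = {
    "participation": 20,
    "critiques_and_presentations": 30,
    "survey_paper": 50,
}


def final_score(scores):
    """Weighted average of component scores.

    Assumption: each component is scored on a 0-100 scale and the
    components are combined linearly using the listed percentages.
    """
    assert sum(WEIGHTS.values()) == 100, "weights should sum to 100%"
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical component scores, for illustration only.
example = {
    "participation": 90,
    "critiques_and_presentations": 80,
    "survey_paper": 85,
}
print(final_score(example))  # 84.5
```

Because the survey paper carries half the weight, a 10-point change in its score moves the final score by 5 points under this assumption, compared with 2 points for class participation.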
Class Participation
In each class, preselected students will present the assigned paper. Students will then discuss the paper in groups, followed by a full-class discussion and a short lecture introducing the topic of the next class.
Paper Review
Each week, 2–3 papers will be assigned for review. Each student will write a review that summarizes the paper, discusses its strengths and weaknesses, and points out potential directions for future research.
Topical Survey Paper
Students will work on a survey related to ML accelerators in groups of 1–3. Each group will prepare a survey paper and slides to present its work.
Tentative Schedule
- Week 1 topic: Overview of Machine Learning and Deep Learning Models
- Week 2 topic: Introduction to DNN Accelerators (CNNs, GEMMs/MLPs) and Their Mappings (dataflow orchestration); NoCs and Memory Designs for DNN Accelerators
- Week 3 topic: Accelerators for NLP (RNN, LSTM, and Transformers) and Neuromorphic Architectures
- Week 4 topic: Design Libraries, Programming Languages, and Compilers for Generating Accelerators, Execution Cost Modeling, Mapping Optimizations, and Hardware/Software Design Space Exploration
- Week 5 topic: Compilers for ML Accelerators, Machine Code Generation; Automated Cost Model Generation with Machine/Deep Learning
- Week 6 topic: Accelerators for Training; Quantization (precision, value similarity)
- Week 7 topic: Sparse ML Accelerators (unstructured sparsity, structured sparsity and model/accelerator codesign)
- Week 8 topic: Accelerator-aware Neural-architecture Search and Accelerator/Model Codesigns
- Week 9 topic: Accelerator Design for Multiple Workloads, Multi-chip Accelerator Designs
- Week 10 topic: ML Accelerators for Near-data Processing and In-memory Computations, Emerging Technologies such as Photonics
- Week 11 topic: Accelerators for Recommendation Systems and Graph Learning
- Week 12 topic: Full-Stack System Infrastructures and Workload Characterizations for Various Deployment Scenarios
- Week 13 topic: Federated and On-device Learning on ML Accelerators; Runtime Optimizations
- Week 14 topic: Industry Case Studies (Established Startups)
- Week 15 topic: More Industry Case Studies
- Week 16 topic: Reliability and Security of ML Accelerators
Reference Books
- Efficient Processing of Deep Neural Networks (Synthesis Lectures on Computer Architecture). Authors: Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer. Publisher: Morgan and Claypool. ISBN: 978-1-68-173831-4
- Deep Learning Systems: Algorithms, Compilers, and Processors for Large-Scale Production (Synthesis Lectures on Computer Architecture). Author: Andres Rodriguez. Publisher: Morgan and Claypool. ISBN: 978-1-68-173966-3. EBook: deeplearningsystems.ai