## Processing-in-Memory
### Applicable Workloads

- Fully connected layers have a large weight matrix
- Weight matrix does not fit into the on-chip cache
- No data reuse in the matrix
---
preload: false
clicks: 1
---

## Processing-in-Memory
### Applicable Workloads

- Convolutional layers have a small filter matrix
- Filter matrix fits into the on-chip cache
- Extensive data reuse in the matrix
---

## Processing-in-Memory
### Applicable Workloads



### Suitable candidates for PIM:
- Fully connected layers in multilayer perceptrons (MLPs)
- Layers in recurrent neural networks (RNNs)
### Less suitable candidates for PIM:
- Convolutional neural networks (CNNs)
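
The difference in data reuse behind this classification can be made concrete with a rough count of multiply-accumulate operations (MACs) per stored weight. The following is a minimal sketch, not a measurement: the function names and the layer sizes are hypothetical, and a batch size of 1 is assumed for the fully connected case.

```python
def fc_reuse(in_features, out_features, batch=1):
    """MACs performed per weight element in a fully connected layer."""
    weights = in_features * out_features
    macs = batch * in_features * out_features
    return macs / weights


def conv_reuse(out_h, out_w, k, c_in, c_out, batch=1):
    """MACs performed per weight element in a k x k convolutional layer."""
    weights = k * k * c_in * c_out
    macs = batch * out_h * out_w * k * k * c_in * c_out
    return macs / weights


# Hypothetical layer sizes, chosen only for illustration.
print(fc_reuse(4096, 4096))            # 1.0: every weight is used once
print(conv_reuse(56, 56, 3, 64, 64))   # 3136.0: every weight is used per output pixel
```

With a batch size of 1, each weight of the fully connected layer is fetched for a single MAC, so the layer is limited by memory bandwidth. The convolution filter is reused at every output position, so it benefits from caching rather than from PIM.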
---

## Processing-in-Memory
### Architectures


- Inside the memory subarray
- Near the subarray in the primary sense amplifier (PSA) output region
- Near the bank in its peripheral region
- In the I/O region of the memory




The nearer the computation is to the memory cells, the higher the achievable bandwidth!
Sudarshan et al. “A Critical Assessment of DRAM-PIM Architectures - Trends, Challenges and Solutions”, 2022.

---

## Processing-in-Memory
### Samsung's PIM-HBM

- Real-world PIM implementation based on HBM2
- PIM units embedded at the bank level

Lee et al. “Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product”, 2021.

---

## Processing-in-Memory
### Samsung's PIM-HBM | Processing Unit

- Two 16-wide 16-bit FPUs
- Register files and control unit

#### Instructions:
- Control: NOP, JUMP, EXIT
- Data: MOV (ReLU), FILL
- Arithmetic: ADD, MUL, MAC, MAD

Lee et al. “Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product”, 2021.

---

## Processing-in-Memory
### Samsung's PIM-HBM | GEMV Operation
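
A GEMV built from lane-wise MAC operations can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual PIM-HBM implementation: the helper names `mac` and `gemv` and the per-lane accumulator are hypothetical, and the sketch models neither 16-bit rounding, the bank-level data layout, nor where the final reduction takes place. Only the lane count of 16 is taken from the processing unit description.

```python
LANES = 16  # lane count of one FPU, as given for the processing unit


def mac(acc, a, b):
    """Lane-wise multiply-accumulate: acc[i] + a[i] * b[i] for every lane."""
    return [acc_i + a_i * b_i for acc_i, a_i, b_i in zip(acc, a, b)]


def gemv(weights, x):
    """Compute y = W x row by row, consuming LANES columns per MAC."""
    cols = len(x)
    assert cols % LANES == 0, "sketch assumes a row length divisible by LANES"
    y = []
    for row in weights:
        acc = [0.0] * LANES  # hypothetical per-lane accumulator register
        for c in range(0, cols, LANES):
            acc = mac(acc, row[c:c + LANES], x[c:c + LANES])
        y.append(sum(acc))  # cross-lane reduction of the partial sums
    return y


# Tiny example: a 2 x 32 matrix of ones times a vector of ones.
W = [[1.0] * 32 for _ in range(2)]
x = [1.0] * 32
print(gemv(W, x))  # [32.0, 32.0]
```

Each matrix element is read exactly once and used in a single MAC, which is why this operation gains from performing the multiply-accumulate close to where the weights are stored.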
---

## Processing-in-Memory
### Samsung's PIM-HBM | GEMV Operation
Kang et al. “An FPGA-Based RNN-T Inference Accelerator with PIM-HBM”, 2022.

---

## Processing-in-Memory
### Research




- To analyze the performance gains of PIM, simulations are needed
- Research should not only focus on hardware but also explore the programmability
- In the following, a virtual prototype of PIM-HBM is implemented