Processing-in-Memory

Applicable Workloads


  • Fully connected layers have a large weight matrix
    • The weight matrix does not fit in the on-chip cache
    • No data reuse in the matrix (each weight is used once per input vector)


Processing-in-Memory

Applicable Workloads


  • Convolutional layers have a small filter matrix
    • The filter matrix fits in the on-chip cache
    • Extensive data reuse in the matrix


Processing-in-Memory

Applicable Workloads





Suitable candidates for PIM:


  • Fully connected layers in multilayer perceptrons (MLPs)
  • Layers in recurrent neural networks (RNNs)

Less suitable candidates for PIM:


  • Convolutional neural networks (CNNs)
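
The reuse argument can be made concrete with a rough estimate of multiply-accumulate operations (MACs) per byte of weights fetched. The sketch below is only illustrative: the layer shapes, the 2-byte weight size, and both function names are assumptions chosen for the example, not figures from the slides.

```python
def fc_macs_per_weight_byte(rows, cols, bytes_per_weight=2):
    """MACs per weight byte for one matrix-vector product (batch size 1)."""
    macs = rows * cols                          # every weight is used exactly once
    weight_bytes = rows * cols * bytes_per_weight
    return macs / weight_bytes


def conv_macs_per_weight_byte(out_h, out_w, in_ch, out_ch, k, bytes_per_weight=2):
    """MACs per weight byte for one convolutional layer (batch size 1)."""
    # Each filter weight is reused at every output position.
    macs = out_h * out_w * in_ch * out_ch * k * k
    weight_bytes = in_ch * out_ch * k * k * bytes_per_weight
    return macs / weight_bytes


# Hypothetical layer shapes, chosen only for illustration.
fc_intensity = fc_macs_per_weight_byte(4096, 4096)
conv_intensity = conv_macs_per_weight_byte(56, 56, 64, 64, 3)

print(f"fully connected: {fc_intensity} MACs per weight byte")
print(f"convolutional:   {conv_intensity} MACs per weight byte")
```

Under these assumptions the fully connected layer performs half a MAC per weight byte regardless of its size, while the convolutional layer's ratio grows with the output feature map. This is why the former is limited by memory bandwidth and the latter is not.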

Processing-in-Memory

Architectures




  • Inside the memory subarray
  • Near the subarray in the primary sense amplifier (PSA) output region
  • Near the bank in its peripheral region
  • In the I/O region of the memory




The nearer the computation is to the memory cells, the higher the achievable bandwidth!
Sudarshan et al. "A Critical Assessment of DRAM-PIM Architectures - Trends, Challenges and Solutions", 2022.

Processing-in-Memory

Samsung's PIM-HBM



  • Real-world PIM implementation based on HBM2
  • PIM units embedded at the bank level

Lee et al. "Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product", 2021.

Processing-in-Memory

Samsung's PIM-HBM | Processing Unit



  • Two 16-wide 16-bit FPUs
  • Register files and control unit

Instructions:

  • Control: NOP, JUMP, EXIT
  • Data: MOV (ReLU), FILL
  • Arithmetic: ADD, MUL, MAC, MAD
Lee et al. "Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product", 2021.
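
To give a feel for how such an instruction set operates on 16-wide vectors, here is a toy Python model of a single processing unit. It is a sketch under assumptions: the class `PimUnit`, the number of registers, the operand order, and the exact semantics of each instruction (MAC accumulating into the destination, MAD computing a*b+c, MOV optionally applying ReLU) are hypothetical and not taken from the cited paper. Python floats also stand in for 16-bit values.

```python
LANES = 16  # SIMD width, from "16-wide" on the slide


class PimUnit:
    """Toy model of one processing unit. Register layout is hypothetical."""

    def __init__(self, num_regs=8):
        self.regs = [[0.0] * LANES for _ in range(num_regs)]

    def fill(self, dst, values):
        # FILL: load a 16-element vector into a register.
        assert len(values) == LANES
        self.regs[dst] = list(values)

    def mov(self, dst, src, relu=False):
        # MOV: copy a register, optionally applying ReLU lane by lane.
        v = self.regs[src]
        self.regs[dst] = [max(0.0, x) for x in v] if relu else list(v)

    def add(self, dst, a, b):
        self.regs[dst] = [x + y for x, y in zip(self.regs[a], self.regs[b])]

    def mul(self, dst, a, b):
        self.regs[dst] = [x * y for x, y in zip(self.regs[a], self.regs[b])]

    def mac(self, dst, a, b):
        # MAC (assumed semantics): dst += a * b
        self.regs[dst] = [d + x * y for d, x, y in
                          zip(self.regs[dst], self.regs[a], self.regs[b])]

    def mad(self, dst, a, b, c):
        # MAD (assumed semantics): dst = a * b + c
        self.regs[dst] = [x * y + z for x, y, z in
                          zip(self.regs[a], self.regs[b], self.regs[c])]


pu = PimUnit()
pu.fill(0, [1.0] * LANES)
pu.fill(1, [2.0] * LANES)
pu.mac(2, 0, 1)                 # register 2 accumulates 1.0 * 2.0
pu.mac(2, 0, 1)                 # and again
pu.fill(3, [-1.0, 1.0] * (LANES // 2))
pu.mov(4, 3, relu=True)         # negative lanes are clamped to zero
```

After the two MAC steps, every lane of register 2 holds the accumulated product, which is the basic building block of a dot product.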

Processing-in-Memory

Samsung's PIM-HBM | GEMV Operation




Kang et al. "An FPGA-Based RNN-T Inference Accelerator with PIM-HBM", 2022.
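
As a rough illustration of how a GEMV could be spread over bank-level units, the sketch below assigns matrix rows to banks and lets each bank accumulate 16-element chunks lane by lane before a final reduction. This is one possible mapping written for clarity, not necessarily the data layout shown in the cited work. The function name `gemv_banked`, the round-robin row assignment, and the bank count are assumptions.

```python
LANES = 16  # SIMD width of one processing unit, from the slides


def gemv_banked(matrix, vector, num_banks=4):
    """Compute matrix @ vector with rows distributed over num_banks units."""
    rows = len(matrix)
    result = [0.0] * rows
    for bank in range(num_banks):
        # Hypothetical mapping: rows are assigned to banks round-robin.
        for r in range(bank, rows, num_banks):
            acc = [0.0] * LANES  # lane-wise accumulator register
            for c0 in range(0, len(vector), LANES):
                w_chunk = matrix[r][c0:c0 + LANES]   # weights read from the bank
                x_chunk = vector[c0:c0 + LANES]      # input chunk sent by the host
                for lane, (w, x) in enumerate(zip(w_chunk, x_chunk)):
                    acc[lane] += w * x               # one MAC per lane
            result[r] = sum(acc)                     # final reduction across lanes
    return result


# Small integer-valued example so the comparison below is exact.
matrix = [[float((r + c) % 5) for c in range(40)] for r in range(6)]
vector = [float(c % 3) for c in range(40)]
banked = gemv_banked(matrix, vector)
reference = [sum(w * x for w, x in zip(row, vector)) for row in matrix]
```

In real hardware the banks would work in parallel. The outer loop here runs them one after another only to keep the sketch simple.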

Processing-in-Memory

Research






  • Simulations are needed to analyze the performance gains of PIM
  • Research should focus not only on hardware but also on programmability

  • In the following, a virtual prototype of PIM-HBM is implemented