Processing-in-Memory
Applicable Workloads
- Fully connected layers have a large weight matrix
- Weight matrix does not fit onto on-chip cache
- No data reuse in the matrix
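The bullet points above can be illustrated with a minimal NumPy sketch (layer sizes are illustrative, not taken from the slides): a fully connected layer is a matrix-vector product (GEMV) in which every weight is read exactly once, so there is nothing for an on-chip cache to reuse and the operation is memory-bound.

```python
import numpy as np

# Illustrative fully connected layer: a GEMV y = W @ x.
n_in, n_out = 4096, 4096
W = np.random.rand(n_out, n_in).astype(np.float32)  # weight matrix, ~64 MiB
x = np.random.rand(n_in).astype(np.float32)

# Each weight W[i, j] is touched exactly once -- no reuse a cache could exploit.
y = W @ x

# Roughly 2 FLOPs (multiply + add) per 4-byte weight loaded from memory:
flops = 2 * n_out * n_in
intensity = flops / W.nbytes  # arithmetic intensity = 0.5 FLOP/byte
```

At ~0.5 FLOP per byte, throughput is limited by memory bandwidth rather than compute, which is exactly the regime PIM targets.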
Processing-in-Memory
Architectures
Possible placements of compute logic¹:
- Inside the memory subarray
- In the primary sense amplifier (PSA) region near a subarray
- Outside the bank in its peripheral region
- In the I/O region of the memory
The nearer the computation is to the memory cells, the higher the achievable bandwidth!
Sudarshan et al. „A Critical Assessment of DRAM-PIM Architectures - Trends, Challenges and Solutions“, 2022.
layout: figure figureUrl: /hbm-pim.svg figureCaption: Architecture of PIM-HBM figureFootnoteNumber: 1
Processing-in-Memory
Samsung's HBM-PIM
Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology : Industrial Product“, 2021.
layout: figure figureUrl: /pu.svg figureCaption: Architecture of a PIM processing unit figureFootnoteNumber: 1
Processing-in-Memory
Samsung's HBM-PIM
Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology : Industrial Product“, 2021.
layout: figure figureUrl: /gemv.svg figureCaption: Procedure to perform a (128×8)×(128) GEMV operation figureFootnoteNumber: 1
Processing-in-Memory
Samsung's HBM-PIM
Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology : Industrial Product“, 2021.
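A hedged sketch of how such a GEMV can be decomposed into bank-local partial products (the chunking below is illustrative, not Samsung's exact mapping; the (128×8) weight matrix is interpreted as producing an 8-element output from a 128-element input vector):

```python
import numpy as np

# Illustrative decomposition of a (128 x 8) x (128) GEMV across PIM units.
# The 128-element input vector is split into chunks; each chunk is multiplied
# against the matching slice of the weight matrix, and the partial results
# are accumulated -- mirroring per-bank multiply-accumulate units.
rng = np.random.default_rng(0)
W = rng.random((8, 128), dtype=np.float32)  # weights: 8 outputs x 128 inputs
x = rng.random(128, dtype=np.float32)       # input vector

n_units = 8                  # assumed number of PIM units (illustrative)
chunk = 128 // n_units
y = np.zeros(8, dtype=np.float32)
for u in range(n_units):
    s = slice(u * chunk, (u + 1) * chunk)
    y += W[:, s] @ x[s]      # bank-local partial GEMV, then accumulate
```

The accumulated result matches the monolithic `W @ x`; the point is that each partial product only needs the weights stored in its own bank.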
layout: figure figureUrl: /layout.svg figureCaption: Mapping of the weight matrix onto the memory banks
Processing-in-Memory
Samsung's HBM-PIM
Processing-in-Memory
Research
- Simulation models are needed
- Research should not only focus on hardware but also explore the software side!
- That is why I am building a virtual prototype