---
layout: figure
figureUrl: /dnn.svg
figureCaption: A fully connected DNN layer
figureFootnoteNumber: 1
---

## Processing-in-Memory
### Applicable Workloads
<hr/>

<Footnotes separator>
<Footnote :number=1>
He et al. “Newton: A DRAM-maker’s Accelerator-in-Memory (AiM) Architecture for Machine Learning”, 2020.
</Footnote>
</Footnotes>

<!--
- Workload must be memory-bound

- memory-bound:
  - fully-connected layers
  - layers of recurrent neural networks (RNNs)

- not memory-bound:
  - convolutional layers
    - data reuse
-->

---
preload: false
clicks: 1
---

## Processing-in-Memory
### Applicable Workloads
<hr/>

<br>

- Convolutional layers have a high degree of data reuse
- The small filter matrix fits into the on-chip cache

<br>

<Transform :scale="1.4">
<div class="absolute left-175px top-1px">
<img src="/cnn_input.svg">
</div>
<div v-if="$slidev.nav.clicks === 0" class="absolute left-175px">
<img src="/cnn_filter.svg">
</div>
<div v-if="$slidev.nav.clicks === 1"
v-motion
:initial="{ x: 175, y: 0}"
:enter="{ x: 335, y: 0, transition: { duration: 5000 }}">
<img src="/cnn_filter.svg">
</div>
</Transform>
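
---

## Processing-in-Memory
### Applicable Workloads
<hr/>

Whether a layer is memory-bound can be estimated from its arithmetic intensity, i.e. the number of operations per byte moved. The following is a rough sketch, assuming FP16 operands, a batch size of 1, a single channel, and that every operand is transferred exactly once; the function names and layer sizes are illustrative and not taken from the cited papers.

```python
BYTES_PER_ELEMENT = 2  # assumption: FP16 operands


def fc_intensity(inputs: int, outputs: int) -> float:
    """FLOPs per byte for a fully connected layer (GEMV, batch size 1)."""
    flops = 2 * inputs * outputs  # one multiply and one add per weight
    elements = inputs * outputs + inputs + outputs  # weights, input, output
    return flops / (elements * BYTES_PER_ELEMENT)


def conv_intensity(height: int, width: int, kernel: int) -> float:
    """FLOPs per byte for a single-channel 2D convolution without padding."""
    out_h, out_w = height - kernel + 1, width - kernel + 1
    flops = 2 * kernel * kernel * out_h * out_w  # filter reused for every output
    elements = height * width + kernel * kernel + out_h * out_w
    return flops / (elements * BYTES_PER_ELEMENT)


print(f"FC 1024x1024:      {fc_intensity(1024, 1024):.2f} FLOP/byte")
print(f"Conv 224x224, 3x3: {conv_intensity(224, 224, 3):.2f} FLOP/byte")
```

Under these assumptions the fully connected layer stays below 1 FLOP/byte for any size, because every weight is used only once, while the convolution reuses its small filter for every output element.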

---

## Processing-in-Memory
### Architectures
<hr/>

<br>
<br>

Possible placements of compute logic<sup>1</sup>:

<v-clicks>

- Inside the memory subarray
- In the primary sense amplifier (PSA) region near a subarray
- Outside the bank in its peripheral region
- In the I/O region of the memory

</v-clicks>

<br>

<div v-click class="text-xl"> The closer the computation is to the memory cells, the higher the achievable bandwidth! </div>

<Footnotes separator>
<Footnote :number=1>
Sudarshan et al. “A Critical Assessment of DRAM-PIM Architectures - Trends, Challenges and Solutions”, 2022.
</Footnote>
</Footnotes>

---
layout: figure
figureUrl: /hbm-pim.svg
figureCaption: Architecture of HBM-PIM
figureFootnoteNumber: 1
---

## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>

<Footnotes separator>
<Footnote :number=1>
Lee et al. “Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology : Industrial Product”, 2021.
</Footnote>
</Footnotes>

<!--
- Real-world PIM implementation based on HBM2
- SIMD FPUs are 16-wide, i.e., there are 16 FPUs
- Three execution modes
  - Single-Bank (SB)
  - All-Bank (AB)
  - All-Bank-PIM (AB-PIM)
-->

---
layout: figure
figureUrl: /pu.svg
figureCaption: Architecture of a PIM processing unit
figureFootnoteNumber: 1
---

## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>

<Footnotes separator>
<Footnote :number=1>
Lee et al. “Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology : Industrial Product”, 2021.
</Footnote>
</Footnotes>

<!--
- Control unit executes RISC instructions
- Two SIMD FPUs
  - ADD
  - MUL

- CRF (command register file): 32 32-bit entries (32 instructions)
- GRF (general register file): 16 256-bit entries
- SRF (scalar register file): 16 16-bit entries

- One instruction is executed whenever an RD or WR command is issued
-->
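
---

## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>

The register sizes and the command-triggered execution of the processing unit can be captured in a toy model. This is only a sketch: the class `ProcessingUnit`, its methods, and the string encoding of the instructions are hypothetical, the real instruction set is not modeled, and treating commands other than RD and WR as no-ops is an assumption.

```python
from dataclasses import dataclass, field

CRF_ENTRIES = 32  # 32-bit entries, one instruction each
GRF_ENTRIES = 16  # 256-bit entries
SRF_ENTRIES = 16  # 16-bit entries
SIMD_LANES = 16   # width of the SIMD FPUs


@dataclass
class ProcessingUnit:
    crf: list = field(default_factory=list)    # instruction buffer
    pc: int = 0                                # index of the next instruction
    trace: list = field(default_factory=list)  # executed instructions

    def load_program(self, instructions):
        if len(instructions) > CRF_ENTRIES:
            raise ValueError("program does not fit into the CRF")
        self.crf = list(instructions)
        self.pc = 0

    def on_memory_command(self, command):
        """Execute exactly one instruction per RD or WR command."""
        if command not in ("RD", "WR"):
            return  # assumption: other commands do not trigger execution
        if self.pc < len(self.crf):
            self.trace.append(self.crf[self.pc])
            self.pc += 1


pu = ProcessingUnit()
pu.load_program(["MUL", "ADD"])
for command in ["ACT", "RD", "RD", "PRE"]:
    pu.on_memory_command(command)
print(pu.trace)           # ['MUL', 'ADD']
print(256 // SIMD_LANES)  # 16 bits per lane of a GRF entry
```

In this model the host controls the progress of the PIM program purely through the memory commands it issues.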

---
layout: figure
figureUrl: /gemv.svg
figureCaption: Procedure to perform a (128×8)×(128) GEMV operation
figureFootnoteNumber: 1
---

## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>

<Footnotes separator>
<Footnote :number=1>
Lee et al. “Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology : Industrial Product”, 2021.
</Footnote>
</Footnotes>
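
---

## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>

A GEMV on 16-wide SIMD units can be sketched as lane-wise multiply-accumulate steps followed by a reduction across the lanes. The code below illustrates this general scheme under assumptions; it is not the exact command sequence of the figure. It reads the caption as 128 inputs and 8 outputs, and the function name and test data are illustrative.

```python
SIMD_LANES = 16  # width of the SIMD FPUs


def gemv_simd(weights, vector):
    """Compute weights @ vector using lane-wise partial sums.

    The vector length must be a multiple of SIMD_LANES.
    """
    result = []
    for row in weights:
        partial = [0.0] * SIMD_LANES  # one accumulator register
        for base in range(0, len(vector), SIMD_LANES):
            for lane in range(SIMD_LANES):  # one SIMD MUL + ADD step
                partial[lane] += row[base + lane] * vector[base + lane]
        result.append(sum(partial))  # final reduction across the lanes
    return result


rows, cols = 8, 128
weights = [[float(r + 1)] * cols for r in range(rows)]
vector = [1.0] * cols
print(gemv_simd(weights, vector))
```

Each row needs 128 / 16 = 8 SIMD steps before its 16 partial sums are reduced to one output element.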

---
layout: figure
figureUrl: /layout.svg
figureCaption: Mapping of the weight matrix onto the memory banks
---

## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>

<!--
- Data layout in program and address mapping must match
-->
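
---

## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>

Why the data layout has to match the address mapping can be illustrated with a made-up mapping. All field sizes below are purely hypothetical (32-byte columns, 16 banks, bank bits directly above the column offset); real memory controllers use different and often configurable mappings.

```python
COLUMN_BYTES = 32  # hypothetical: bytes per column access
BANKS = 16         # hypothetical: number of banks
COLUMNS = 32       # hypothetical: columns per row


def decode(address):
    """Split a physical address into (bank, row, column)."""
    block = address // COLUMN_BYTES
    bank = block % BANKS  # consecutive blocks rotate through the banks
    column = (block // BANKS) % COLUMNS
    row = block // (BANKS * COLUMNS)
    return bank, row, column


# With this mapping, a contiguous array is striped across the banks:
for index in range(4):
    print(index, decode(index * COLUMN_BYTES))
```

Under this mapping, the program would have to arrange the weights so that the 32-byte block with index `i` holds the data intended for bank `i % 16`.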

---

## Processing-in-Memory
### Research
<hr/>

Simulation models are needed.

Research should not only focus on hardware but also explore the software side!

That is why I am building a virtual prototype.