---
layout: figure
figureUrl: /dnn.svg
figureCaption: A fully connected DNN layer
figureFootnoteNumber: 1
---
## Processing-in-Memory
### Applicable Workloads
<hr/>
<Footnotes separator>
<Footnote :number=1>
He et al. „Newton: A DRAM-maker's Accelerator-in-Memory (AiM) Architecture for Machine Learning“, 2020.
</Footnote>
</Footnotes>
<!--
- Workload must be memory-bound
- Memory-bound:
  - fully connected layers
  - layers of recurrent neural networks (RNNs)
- Not memory-bound:
  - convolutional layers
  - reason: high data reuse
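
One rough way to see why fully connected layers are memory-bound while convolutional layers are not is to compare their arithmetic intensity, i.e., FLOPs per byte of weights fetched from DRAM. This Python sketch assumes FP16 weights, counts only weight traffic (activations are ignored), and uses hypothetical layer sizes; the function names are illustrative.

```python
# Hypothetical sketch: arithmetic intensity = FLOPs per byte of weights read from DRAM
BYTES_PER_WEIGHT = 2  # assumption: FP16 weights; activation traffic is ignored


def fc_intensity(in_features: int, out_features: int, batch: int = 1) -> float:
    """Fully connected layer y = W x: each weight is used once per input vector."""
    flops = 2 * in_features * out_features * batch  # one MUL + one ADD per weight and input
    weight_bytes = in_features * out_features * BYTES_PER_WEIGHT
    return flops / weight_bytes


def conv_intensity(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> float:
    """k x k convolution: each filter weight is reused at every output position."""
    flops = 2 * c_in * c_out * k * k * h_out * w_out
    weight_bytes = c_in * c_out * k * k * BYTES_PER_WEIGHT
    return flops / weight_bytes


print(fc_intensity(4096, 4096))           # 1.0
print(conv_intensity(64, 64, 3, 56, 56))  # 3136.0
```

With batch size 1, every weight of the fully connected layer takes part in exactly one multiply-add, so the intensity stays at 1 FLOP/byte regardless of the layer size, whereas each convolution weight is reused at every output position.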
-->
---
preload: false
clicks: 1
---
## Processing-in-Memory
### Applicable Workloads
<hr/>
<br>
- Convolutional layers exhibit extensive data reuse
- The small filter matrix fits into the on-chip cache
<br>
<Transform :scale="1.4">
<div class="absolute left-175px top-1px">
<img src="/cnn_input.svg">
</div>
<div v-if="$slidev.nav.clicks === 0" class="absolute left-175px">
<img src="/cnn_filter.svg">
</div>
<div v-if="$slidev.nav.clicks === 1"
v-motion
:initial="{ x: 175, y: 0}"
:enter="{ x: 335, y: 0, transition: { duration: 5000 }}">
<img src="/cnn_filter.svg">
</div>
</Transform>
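<!--
The reuse illustrated by the sliding filter can be counted directly. This naive Python sketch computes a "valid" cross-correlation (as used in DNN frameworks) on a hypothetical 6×6 input with a 3×3 filter and records how often each filter weight is read; the function name and sizes are illustrative assumptions.

```python
def conv2d_valid(image, kernel):
    """Naive 2D 'valid' cross-correlation that also counts reads of each filter weight."""
    k = len(kernel)
    h_out = len(image) - k + 1
    w_out = len(image[0]) - k + 1
    reads = [[0] * k for _ in range(k)]  # how often each filter weight is read
    out = [[0] * w_out for _ in range(h_out)]
    for i in range(h_out):
        for j in range(w_out):
            acc = 0
            for di in range(k):
                for dj in range(k):
                    acc += image[i + di][j + dj] * kernel[di][dj]
                    reads[di][dj] += 1
            out[i][j] = acc
    return out, reads


image = [[1] * 6 for _ in range(6)]
kernel = [[1] * 3 for _ in range(3)]
out, reads = conv2d_valid(image, kernel)
print(reads[0][0])  # 16: each weight is reused at all 4x4 output positions
```

Because the nine filter weights are read 16 times each, keeping them in an on-chip cache avoids repeated DRAM accesses, which is why such layers benefit little from PIM.
-->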
---
## Processing-in-Memory
### Architectures
<hr/>
<br>
<br>
Possible placements of compute logic<sup>1</sup>:
<v-clicks>
- Inside the memory subarray
- In the primary sense amplifier (PSA) region near a subarray
- Outside the bank in its peripheral region
- In the I/O region of the memory
</v-clicks>
<br>
<div v-click class="text-xl"> The nearer the computation is to the memory cells, the higher the achievable bandwidth! </div>
<Footnotes separator>
<Footnote :number=1>
Sudarshan et al. „A Critical Assessment of DRAM-PIM Architectures - Trends, Challenges and Solutions“, 2022.
</Footnote>
</Footnotes>
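<!--
A simplified model of why proximity raises bandwidth: placements closer to the cells allow more compute units to access memory in parallel instead of sharing one I/O path. This Python sketch assumes identical access width and rate at every level; all parameter values (including four subarrays per bank) are hypothetical and only show the scaling, and the model ignores that closer placements may also access wider data per operation.

```python
def aggregate_bandwidth(parallel_units: int, bytes_per_access: int, accesses_per_second: float) -> float:
    """Peak bandwidth if all units access their own memory region concurrently."""
    return parallel_units * bytes_per_access * accesses_per_second


# Hypothetical parameters, chosen only to show the scaling
ACCESS_BYTES = 32  # one 256-bit access
ACCESS_RATE = 1e9  # accesses per second per unit

io_level = aggregate_bandwidth(1, ACCESS_BYTES, ACCESS_RATE)             # one shared I/O path
bank_level = aggregate_bandwidth(16, ACCESS_BYTES, ACCESS_RATE)          # one unit per bank, 16 banks
subarray_level = aggregate_bandwidth(16 * 4, ACCESS_BYTES, ACCESS_RATE)  # 4 subarrays per bank
print(bank_level / io_level)      # 16.0
print(subarray_level / io_level)  # 64.0
```
-->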
---
layout: figure
figureUrl: /hbm-pim.svg
figureCaption: Architecture of PIM-HBM
figureFootnoteNumber: 1
---
## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>
<Footnotes separator>
<Footnote :number=1>
Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product“, 2021.
</Footnote>
</Footnotes>
<!--
- Real-world PIM implementation based on HBM2
- SIMD FPUs are 16-wide, i.e., each processes 16 FP16 values in parallel
- Three execution modes:
  - Single-Bank (SB)
  - All-Bank (AB)
  - All-Bank-PIM (AB-PIM)
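
The 16-wide SIMD datapath can be pictured as one 256-bit register entry holding 16 FP16 lanes. This Python sketch uses the standard `struct` module's half-precision format (`e`) to pack such an entry; `simd_mac` is a hypothetical helper, and the arithmetic is done in double precision and only rounded to FP16 when packed, unlike real FP16 hardware.

```python
import struct

LANES = 16  # 16-wide SIMD: 16 x 16 bit = one 256-bit register entry


def pack_fp16(values):
    """Pack 16 Python floats into a 32-byte (256-bit) entry of FP16 lanes."""
    assert len(values) == LANES
    return struct.pack(f"<{LANES}e", *values)


def unpack_fp16(entry):
    """Unpack a 256-bit entry back into 16 Python floats."""
    return list(struct.unpack(f"<{LANES}e", entry))


def simd_mac(acc, a, b):
    """Hypothetical lane-wise multiply-accumulate: acc + a * b, rounded to FP16."""
    lanes = zip(unpack_fp16(acc), unpack_fp16(a), unpack_fp16(b))
    return pack_fp16([s + x * y for s, x, y in lanes])


acc = pack_fp16([0.0] * LANES)
a = pack_fp16([float(i) for i in range(LANES)])
b = pack_fp16([0.5] * LANES)
acc = simd_mac(acc, a, b)
print(len(acc) * 8)          # 256
print(unpack_fp16(acc)[:4])  # [0.0, 0.5, 1.0, 1.5]
```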
-->
---
layout: figure
figureUrl: /pu.svg
figureCaption: Architecture of a PIM processing unit
figureFootnoteNumber: 1
---
## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>
<Footnotes separator>
<Footnote :number=1>
Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product“, 2021.
</Footnote>
</Footnotes>
<!--
- Control unit executes RISC instructions
- Two SIMD FPUs:
  - ADD
  - MUL
- CRF (command register file): 32 × 32-bit entries (32 instructions)
- GRF (general register file): 16 × 256-bit entries
- SRF (scalar register file): 16 × 16-bit entries
- One instruction is executed each time a RD or WR command is issued
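
The register file sizes above translate into byte capacities, and command-triggered execution can be sketched as a small state machine. In this Python sketch, `PimUnitModel` is a hypothetical, heavily simplified model: it ignores control flow, operands, and timing, and assumes that commands other than RD and WR (e.g. ACT, PRE) never trigger an instruction.

```python
from dataclasses import dataclass, field

# Register files as listed above: name -> (entries, bits per entry)
REGISTER_FILES = {"CRF": (32, 32), "GRF": (16, 256), "SRF": (16, 16)}


def capacity_bytes(name: str) -> int:
    entries, bits = REGISTER_FILES[name]
    return entries * bits // 8


@dataclass
class PimUnitModel:
    """Hypothetical model: each RD or WR command executes the next CRF instruction."""
    crf: list
    pc: int = 0
    trace: list = field(default_factory=list)

    def on_command(self, cmd: str) -> None:
        if cmd not in ("RD", "WR"):
            return  # assumption: other commands (e.g. ACT, PRE) trigger nothing
        if self.pc < len(self.crf):
            self.trace.append((cmd, self.crf[self.pc]))
            self.pc += 1


unit = PimUnitModel(crf=["MUL", "ADD"])
for cmd in ["ACT", "RD", "WR", "PRE"]:
    unit.on_command(cmd)

print({n: capacity_bytes(n) for n in REGISTER_FILES})  # {'CRF': 128, 'GRF': 512, 'SRF': 32}
print(unit.trace)  # [('RD', 'MUL'), ('WR', 'ADD')]
```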
-->
---
layout: figure
figureUrl: /gemv.svg
figureCaption: Procedure to perform a (128×8)×(128) GEMV operation
figureFootnoteNumber: 1
---
## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>
<Footnotes separator>
<Footnote :number=1>
Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product“, 2021.
</Footnote>
</Footnotes>
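<!--
A GEMV can be split across PIM units in several ways; this Python sketch shows one hypothetical partitioning, not necessarily the exact procedure in the figure. Output rows are assigned round-robin to units, each unit accumulates 16-lane partial sums (one multiply-accumulate per 16-element chunk of the input vector), and the host reduces the lanes at the end. `pim_gemv` and the matrix sizes are illustrative assumptions.

```python
LANES = 16  # SIMD width of one PIM unit


def pim_gemv(W, x, num_units):
    """Hypothetical sketch of y = W @ x split across PIM units."""
    rows, cols = len(W), len(x)
    assert cols % LANES == 0
    y = [0.0] * rows
    for unit in range(num_units):
        for r in range(unit, rows, num_units):  # rows owned by this unit
            acc = [0.0] * LANES                  # one register entry of partial sums
            for c0 in range(0, cols, LANES):     # one MAC per 16-element chunk
                for lane in range(LANES):
                    acc[lane] += W[r][c0 + lane] * x[c0 + lane]
            y[r] = sum(acc)                      # host-side lane reduction
    return y


W = [[1.0] * 128 for _ in range(8)]
x = [0.5] * 128
print(pim_gemv(W, x, num_units=4))  # eight entries, all 64.0
```
-->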
---
layout: figure
figureUrl: /layout.svg
figureCaption: Mapping of the weight matrix onto the memory banks
---
## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>
<!--
- The data layout in the program and the address mapping must match
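
Why layout and address mapping must match: a PIM unit can only operate on data stored in its own bank. This Python sketch assumes a hypothetical address mapping (row | bank | column | byte offset, with 16 banks, 32-byte columns, and 32 columns per DRAM row) and a row-major FP16 matrix; all constants and function names are made up for illustration.

```python
NUM_BANKS = 16        # hypothetical
COLUMN_BYTES = 32     # one 256-bit column access
COLUMNS_PER_ROW = 32  # hypothetical: 32 columns = 1 KiB per DRAM row


def decode(addr: int):
    """Hypothetical address mapping, MSB to LSB: row | bank | column | byte offset."""
    offset = addr % COLUMN_BYTES
    addr //= COLUMN_BYTES
    column = addr % COLUMNS_PER_ROW
    addr //= COLUMNS_PER_ROW
    bank = addr % NUM_BANKS
    row = addr // NUM_BANKS
    return bank, row, column, offset


def element_address(r: int, c: int, cols: int, elem_bytes: int = 2) -> int:
    """Program-side layout: row-major FP16 matrix with `cols` columns."""
    return (r * cols + c) * elem_bytes


def banks_of_row(r: int, cols: int) -> set:
    """Set of banks that hold matrix row r."""
    return {decode(element_address(r, c, cols))[0] for c in range(cols)}


print(banks_of_row(1, cols=512))  # {1}: the row fills exactly one DRAM row in bank 1
print(banks_of_row(1, cols=500))  # {0, 1}: the row straddles two banks
```

With 512 FP16 columns, each matrix row fills exactly one DRAM row and lands in a single bank; with 500 columns, rows straddle bank boundaries, so a bank-local PIM unit would no longer see complete rows.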
-->
---
## Processing-in-Memory
### Research
<hr/>
- Simulation models are needed
- Research should not only focus on the hardware but also explore the software side!
- Therefore, I am building a virtual prototype