---
layout: figure
figureUrl: /dnn.svg
figureCaption: A fully connected DNN layer
figureFootnoteNumber: 1
---

## Processing-in-Memory
### Applicable Workloads
<hr/>

<Footnotes separator>
  <Footnote :number=1>
  He et al. „Newton: A DRAM-maker’s Accelerator-in-Memory (AiM) Architecture for Machine Learning“, 2020.
</Footnote>
</Footnotes>

<!--
- Workload must be memory-bound

- memory-bound:
  - fully-connected layers
  - layers of recurrent neural networks (RNNs)

- not memory-bound:
  - convolutional layers
  	- data reuse
-->
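
---

## Processing-in-Memory
### Applicable Workloads
<hr/>

A back-of-the-envelope sketch of why fully connected layers are memory-bound while convolutional layers are not. It assumes FP16 weights (2 bytes), counts only weight traffic, and assumes every weight is fetched from DRAM exactly once. The layer sizes are arbitrary examples.

```python
BYTES_PER_WEIGHT = 2  # assumption: FP16 weights


def fc_intensity(n_in: int, n_out: int, batch: int = 1) -> float:
    """FLOPs per weight byte of y = W x for an (n_out x n_in) weight matrix."""
    flops = 2 * n_in * n_out * batch          # one multiply and one add per weight and sample
    weight_bytes = n_in * n_out * BYTES_PER_WEIGHT
    return flops / weight_bytes


def conv_intensity(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> float:
    """FLOPs per weight byte of a k x k convolution producing an h_out x w_out map."""
    weights = c_in * c_out * k * k
    flops = 2 * weights * h_out * w_out       # each weight is used at every output position
    return flops / (weights * BYTES_PER_WEIGHT)


print(fc_intensity(4096, 4096))            # -> 1.0
print(conv_intensity(64, 64, 3, 56, 56))   # -> 3136.0
```

At batch size 1, the fully connected layer performs only one FLOP per weight byte, so DRAM bandwidth rather than compute limits its throughput.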

---
preload: false
clicks: 1
---

## Processing-in-Memory
### Applicable Workloads
<hr/>

<br>

- Convolutional layers exhibit a high degree of data reuse
- The small filter matrix fits into the on-chip cache

<br>

<Transform :scale="1.4">
  <div class="absolute left-175px top-1px">
    <img src="/cnn_input.svg">
  </div>
  <div v-if="$slidev.nav.clicks === 0" class="absolute left-175px">
    <img src="/cnn_filter.svg">
  </div>
  <div v-if="$slidev.nav.clicks === 1"
  v-motion
  :initial="{ x: 175, y: 0}"
  :enter="{ x: 335, y: 0, transition: { duration: 5000 }}">
    <img src="/cnn_filter.svg">
  </div>
</Transform>
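
---

## Processing-in-Memory
### Applicable Workloads
<hr/>

The following sketch counts how often each filter weight is used while a 3×3 filter slides over a 32×32 input with stride 1 and no padding, similar to the animation on the previous slide. The sizes are hypothetical examples.

```python
def filter_weight_uses(h: int, w: int, k: int, stride: int = 1) -> list[list[int]]:
    """Count how often each weight of a k x k filter is read when the filter
    slides over an h x w input (no padding)."""
    uses = [[0] * k for _ in range(k)]
    for top in range(0, h - k + 1, stride):          # vertical filter positions
        for left in range(0, w - k + 1, stride):     # horizontal filter positions
            for i in range(k):
                for j in range(k):
                    uses[i][j] += 1                   # weight (i, j) meets input[top + i][left + j]
    return uses


uses = filter_weight_uses(h=32, w=32, k=3)
print(uses[0][0])  # -> 900 (one use per output position, 30 x 30)
```

Each of the 9 weights is reused 900 times, so once the filter (18 bytes in FP16) is held on-chip, it costs almost no DRAM bandwidth.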

---

## Processing-in-Memory
### Architectures
<hr/>

<br>
<br>

Possible placements of compute logic<sup>1</sup>:

<v-clicks>

- Inside the memory subarray
- In the primary sense amplifier (PSA) region near a subarray
- Outside the bank in its peripheral region
- In the I/O region of the memory

</v-clicks>

<br>

<div v-click class="text-xl"> The nearer the computation is to the memory cells, the higher the achievable bandwidth! </div>

<Footnotes separator>
  <Footnote :number=1>
  Sudarshan et al. „A Critical Assessment of DRAM-PIM Architectures - Trends, Challenges and Solutions“, 2022.
</Footnote>
</Footnotes>

---
layout: figure
figureUrl: /hbm-pim.svg
figureCaption: Architecture of HBM-PIM
figureFootnoteNumber: 1
---

## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>

<Footnotes separator>
  <Footnote :number=1>
  Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product“, 2021.
</Footnote>
</Footnotes>

<!--
- Real-world PIM implementation based on HBM2
- SIMD FPUs are 16-wide, i.e., each FPU processes 16 FP16 values in parallel
- Three execution modes
    - Single-Bank (SB)
    - All-Bank (AB)
    - All-Bank-PIM (AB-PIM)
-->
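
---

## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>

A simplified model of how a column command (RD/WR) is distributed in the three execution modes, inferred from the mode names. `NUM_BANKS = 16` is an illustrative assumption, and the switching between modes is not modeled.

```python
from enum import Enum


class Mode(Enum):
    SB = "Single-Bank"
    AB = "All-Bank"
    AB_PIM = "All-Bank-PIM"


NUM_BANKS = 16  # assumption: illustrative number of banks reached by one command


def dispatch(mode: Mode, target_bank: int) -> tuple[list[int], bool]:
    """Return (banks serving a RD/WR command, whether the PUs execute an instruction)."""
    if mode is Mode.SB:
        return [target_bank], False          # behaves like a regular DRAM access
    banks = list(range(NUM_BANKS))           # one command reaches all banks in parallel
    return banks, mode is Mode.AB_PIM        # only AB-PIM triggers PIM instructions


print(dispatch(Mode.SB, 3))                  # -> ([3], False)
print(len(dispatch(Mode.AB_PIM, 3)[0]))      # -> 16
```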

---
layout: figure
figureUrl: /pu.svg
figureCaption: Architecture of a PIM processing unit
figureFootnoteNumber: 1
---

## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>

<Footnotes separator>
  <Footnote :number=1>
  Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product“, 2021.
</Footnote>
</Footnotes>

<!--
- Control unit executes RISC instructions
- Two SIMD FPUs
  - ADD
  - MUL

- CRF: 32 32-bit entries (32 instructions)
- GRF: 16 256-bit entries
- SRF: 16 16-bit entries

- One instruction is executed each time a RD or WR command is issued
-->
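
---

## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>

A toy model of a PU, assuming a hypothetical tuple instruction format `(op, dst, src0, src1)` instead of the real 32-bit CRF encoding. It only covers the ADD and MUL operations of the two SIMD FPUs, uses Python floats instead of FP16, and omits the SRF. The point it illustrates: the program counter only advances when a RD or WR command arrives.

```python
LANES = 16  # one 256-bit GRF entry holds 16 x 16-bit values


class ProcessingUnit:
    """Toy PU: exactly one CRF instruction executes per RD/WR command.
    The instruction format (op, dst, src0, src1) is hypothetical."""

    def __init__(self, program):
        if len(program) > 32:
            raise ValueError("CRF holds at most 32 instructions")
        self.crf = list(program)
        self.grf = [[0.0] * LANES for _ in range(16)]   # GRF: 16 entries x 16 lanes
        self.pc = 0

    def on_column_command(self, bank_data):
        """Called for every RD or WR command; executes one instruction."""
        if self.pc >= len(self.crf):
            return                                       # program already finished
        op, dst, src0, src1 = self.crf[self.pc]
        a = bank_data if src0 == "BANK" else self.grf[src0]
        b = bank_data if src1 == "BANK" else self.grf[src1]
        if op == "MUL":
            self.grf[dst] = [x * y for x, y in zip(a, b)]
        elif op == "ADD":
            self.grf[dst] = [x + y for x, y in zip(a, b)]
        else:
            raise ValueError(f"unsupported op {op}")
        self.pc += 1


pu = ProcessingUnit([("MUL", 0, "BANK", 8), ("ADD", 1, 1, 0)])
pu.grf[8] = [2.0] * LANES           # e.g. input values written beforehand
pu.on_column_command([3.0] * LANES)  # RD #1: GRF[0] = bank data * GRF[8]
pu.on_column_command([0.0] * LANES)  # RD #2: GRF[1] = GRF[1] + GRF[0]
print(pu.grf[1][0])                  # -> 6.0
```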

---
layout: figure
figureUrl: /gemv.svg
figureCaption: Procedure to perform a (128×8)×(128) GEMV operation
figureFootnoteNumber: 1
---

## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>

<Footnotes separator>
  <Footnote :number=1>
  Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product“, 2021.
</Footnote>
</Footnotes>
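
---

## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>

A functional sketch of a GEMV on 16-wide SIMD units: lane-wise multiply-accumulate over 16-element chunks of the input, followed by a reduction of the 16 partial sums. The 8×128 weight matrix is an assumed reading of the figure's dimensions, and performing the final cross-lane reduction on the host is an assumption of this sketch, not necessarily the exact procedure of the paper.

```python
LANES = 16  # width of the SIMD FPUs


def gemv_simd(weights, x):
    """y = W x, computed with lane-wise multiply-accumulate over
    16-element chunks, then one reduction per output row."""
    n_in = len(x)
    assert n_in % LANES == 0, "input length must be a multiple of the SIMD width"
    y = []
    for row in weights:
        acc = [0.0] * LANES                        # one partial sum per lane
        for start in range(0, n_in, LANES):
            for lane in range(LANES):              # parallel in hardware
                acc[lane] += row[start + lane] * x[start + lane]
        y.append(sum(acc))                         # cross-lane reduction (assumed on the host)
    return y


W = [[1.0] * 128 for _ in range(8)]   # 8 output rows, 128 inputs
x = [float(i) for i in range(128)]
print(gemv_simd(W, x)[0])             # -> 8128.0
```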

---
layout: figure
figureUrl: /layout.svg
figureCaption: Mapping of the weight matrix onto the memory banks
---

## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>

<!--
- The data layout in the program and the memory controller's address mapping must match
-->
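
---

## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>

A toy illustration of why the data layout and the address mapping must match. Both controller mappings (`bank_linear` and a hypothetical XOR-hashed `bank_xor`) and the layout function `weight_address` are invented for this example. The real HBM-PIM address mapping is not shown, and the sharing of one PU between two banks is ignored.

```python
BURST_BYTES = 32    # one column access: 256 bits = 16 FP16 values
NUM_BANKS = 16      # hypothetical
CHUNKS_PER_ROW = 8  # 128 FP16 weights per matrix row / 16 per burst


def bank_linear(addr: int) -> int:
    """Toy controller mapping: bank = low bits of the burst index."""
    return (addr // BURST_BYTES) % NUM_BANKS


def bank_xor(addr: int) -> int:
    """Toy controller mapping with XOR bank hashing (hypothetical)."""
    burst = addr // BURST_BYTES
    return (burst ^ (burst // NUM_BANKS)) % NUM_BANKS


def weight_address(out_row: int, chunk: int) -> int:
    """Program-side layout: every chunk of matrix row out_row goes to bank
    out_row % NUM_BANKS, assuming the controller uses bank_linear."""
    group = out_row // NUM_BANKS
    burst = (group * CHUNKS_PER_ROW + chunk) * NUM_BANKS + out_row % NUM_BANKS
    return burst * BURST_BYTES


row3 = [weight_address(3, c) for c in range(CHUNKS_PER_ROW)]
print({bank_linear(a) for a in row3})       # -> {3}
print(len({bank_xor(a) for a in row3}))     # -> 8
```

Under the linear mapping, all 128 weights of matrix row 3 reach bank 3. If the controller hashed the bank bits instead, the same addresses would scatter the row over 8 banks, and the PU next to bank 3 would operate on the wrong data.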

---

## Processing-in-Memory
### Research
<hr/>

- Simulation models are needed
- Research should not only focus on the hardware but also explore the software side!

Therefore, I am building a virtual prototype.