Add more slides and images

2024-04-03 22:45:37 +02:00
parent fb8c674f2a
commit a7d5b77dcd
19 changed files with 20783 additions and 6 deletions

slides/implementation.md Normal file

@@ -0,0 +1,68 @@
---
layout: figure
figureUrl: /dramsys.svg
figureCaption: The PIM-HBM model integrated into DRAMSys
---
## Virtual Prototype
### Processing Units
<hr/>
---
layout: figure-side
figureUrl: /data_structures.svg
figureCaption: The PIM-HBM model integrated into DRAMSys
---
## Virtual Prototype
### Software Library
<hr/>
<br>
<br>
- Software support library written in Rust
- Provides data structures for PIM-HBM
- Adhering to special memory layout requirements
- Executes programmed microkernels
---
layout: figure-side
figureUrl: /bare_metal.svg
---
## Virtual Prototype
### Platform
<hr/>
<br>
<br>
- Bare-metal kernel executes on ARM processor model
- Custom page table configuration
- Non-PIM DRAM region mapped as cacheable memory
- PIM DRAM region mapped as non-cacheable memory
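The mapping policy above can be sketched as a lookup from a physical address to its memory attributes. The region names, base addresses, and sizes below are hypothetical placeholders, not the prototype's actual address map. Only the cacheable/non-cacheable split is taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    base: int        # first byte of the region (hypothetical value)
    size: int        # region size in bytes (hypothetical value)
    cacheable: bool  # attribute that would go into the page table entry


# Hypothetical address map, chosen for illustration only.
REGIONS = (
    Region("dram", base=0x8000_0000, size=0x4000_0000, cacheable=True),
    Region("pim_dram", base=0xC000_0000, size=0x4000_0000, cacheable=False),
)


def lookup(addr: int) -> Region:
    """Return the region that contains addr, or raise if it is unmapped."""
    for region in REGIONS:
        if region.base <= addr < region.base + region.size:
            return region
    raise ValueError(f"address {addr:#x} is not mapped")
```

A plausible reason for the non-cacheable mapping is that every load and store to the PIM region should reach the DRAM as a RD or WR command instead of being served from a cache.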
---
<hr/>
<br>
<br>
GEMV Microkernel
```asm{none|1-8|9,10|11|all}{lines:true}
MOV GRF_A #0, BANK
MOV GRF_A #1, BANK
MOV GRF_A #2, BANK
MOV GRF_A #3, BANK
MOV GRF_A #4, BANK
MOV GRF_A #5, BANK
MOV GRF_A #6, BANK
MOV GRF_A #7, BANK
MAC(AAM) GRF_B, BANK, GRF_A
JUMP -1, 7
FILL BANK, GRF_B #0
EXIT
```
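
The microkernel above can be read as the following host-side emulation for a single processing unit. This is a simplified sketch under stated assumptions, not the model's implementation: `GRF_B #0` is assumed to start at zero, address-aligned mode (AAM) is assumed to select `GRF_A #i` on the i-th `MAC`, `JUMP -1, 7` is assumed to repeat the preceding instruction seven more times, and Python floats stand in for the hardware number format. The function name and the `bank_reads` parameter are hypothetical.

```python
LANES = 16  # SIMD width of one processing unit, per the slides


def gemv_microkernel(bank_reads):
    """Emulate the 12-instruction kernel on one processing unit.

    bank_reads yields one 16-element list per bank access, in issue
    order: 8 input-vector chunks followed by 8 weight chunks.
    Returns the 16-lane word that FILL would write back to the bank.
    """
    reads = iter(bank_reads)
    # MOV GRF_A #0..#7, BANK: load eight input-vector chunks.
    grf_a = [next(reads) for _ in range(8)]
    # Assumed accumulator GRF_B #0, assumed to be zero-initialized.
    grf_b0 = [0.0] * LANES
    # MAC executes once, then JUMP -1, 7 repeats it seven more times.
    for i in range(8):
        weights = next(reads)
        grf_b0 = [acc + w * x for acc, w, x in zip(grf_b0, weights, grf_a[i])]
    # FILL BANK, GRF_B #0: the accumulated word goes back to the bank.
    return grf_b0
```

For example, feeding eight input chunks of all ones followed by weight chunks whose values are 0 to 7 leaves 28.0 in every lane.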


@@ -1,6 +1,6 @@
---
layout: figure
figureUrl: world_energy.svg
figureUrl: /world_energy.svg
figureCaption: Total energy of computing
figureFootnoteNumber: 1
---
@@ -17,7 +17,7 @@ figureFootnoteNumber: 1
---
layout: figure
figureUrl: gpt.svg
figureUrl: /gpt.svg
figureCaption: Roofline model of GPT revisions
figureFootnoteNumber: 1
---


@@ -1,6 +1,6 @@
---
layout: figure
figureUrl: dnn.svg
figureUrl: /dnn.svg
figureCaption: A fully connected DNN layer
figureFootnoteNumber: 1
---
@@ -37,11 +37,107 @@ Possible placements of compute logic<sup>1</sup>:
<br>
<div v-click class="text-xl"> The nearer the computation is to the memory array, the higher the achievable bandwidth! </div>
<div v-click class="text-xl"> The nearer the computation is to the memory cells, the higher the achievable bandwidth! </div>
<Footnotes separator>
<Footnote :number=1>
Sudarshan et al. „A Critical Assessment of DRAM-PIM Architectures - Trends, Challenges and Solutions“, 2022.
</Footnote>
</Footnotes>
</Footnotes>
---
layout: figure
figureUrl: /hbm-pim.svg
figureCaption: Architecture of PIM-HBM
figureFootnoteNumber: 1
---
## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>
<Footnotes separator>
<Footnote :number=1>
Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product“, 2021.
</Footnote>
</Footnotes>
<!--
- Real-world PIM implementation based on HBM2
- SIMD FPUs are 16-wide, i.e., there are 16 FPU units
- Three execution modes
- Single-Bank (SB)
- All-Bank (AB)
- All-Bank-PIM (AB-PIM)
-->
---
layout: figure
figureUrl: /pu.svg
figureCaption: Architecture of a PIM processing unit
figureFootnoteNumber: 1
---
## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>
<Footnotes separator>
<Footnote :number=1>
Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product“, 2021.
</Footnote>
</Footnotes>
<!--
- Control unit executes RISC instructions
- Two SIMD FPUs
- ADD
- MUL
- CRF: 32 32-bit entries (32 instructions)
- GRF: 16 256-bit entries
- SRF: 16 16-bit entries
- One instruction is executed when RD or WR command is issued
-->
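
The register file figures in the notes above translate into the following capacities. This is a small worked sketch. The interpretation of a 256-bit GRF entry as 16 half-precision lanes is an assumption derived from the 16-wide SIMD FPUs, and the names `REGISTER_FILES` and `size_bytes` are hypothetical.

```python
import struct

# (entries, bits per entry), as listed in the slide notes.
REGISTER_FILES = {
    "CRF": (32, 32),   # command registers: one 32-bit instruction each
    "GRF": (16, 256),  # general registers: one SIMD vector each
    "SRF": (16, 16),   # scalar registers
}


def size_bytes(name: str) -> int:
    """Total capacity of one register file in bytes."""
    entries, bits = REGISTER_FILES[name]
    return entries * bits // 8


# One GRF entry packed as 16 half-precision values
# ('e' is the struct format code for IEEE 754 binary16).
grf_entry = struct.pack("<16e", *([1.0] * 16))
```

Under these assumptions the CRF holds 128 bytes, the GRF 512 bytes, and the SRF 32 bytes, and one packed GRF entry occupies exactly 256 bits.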
---
layout: figure
figureUrl: /gemv.svg
figureCaption: Procedure to perform a (128×8)×(128) GEMV operation
figureFootnoteNumber: 1
---
## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>
<Footnotes separator>
<Footnote :number=1>
Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product“, 2021.
</Footnote>
</Footnotes>
---
layout: figure
figureUrl: /layout.svg
figureCaption: Mapping of the weight matrix onto the memory banks
---
## Processing-in-Memory
### Samsung's HBM-PIM
<hr/>
<!--
- Data layout in program and address mapping must match
-->
---
## Processing-in-Memory
### Research
<hr/>
- Simulation models are needed
- Research should not only focus on hardware but also explore the software side!

slides/simulations.md Normal file
View File

@@ -0,0 +1,38 @@
## Simulations
### Microbenchmarks
<hr/>
<br>
<br>
<div class="grid grid-cols-2 gap-4">
<div>
- Vector benchmarks (BLAS level 1)
- VADD: $z = x + y$
- VMUL: $z = x \cdot y$
- HAXPY: $z = a \cdot x + y$
- Matrix-vector benchmarks (BLAS level 2)
- GEMV: $z = A \cdot x$
- DNN Layer: $z = \mathrm{ReLU}(A \cdot x)$
</div>
<div>
| Level | Vector | GEMV | DNN |
|-------|--------|---------------|---------------|
| X1 | (2M) | (1024 x 4096) | (256 x 256) |
| X2 | (4M) | (2048 x 4096) | (512 x 512) |
| X3 | (8M) | (4096 x 8192) | (1024 x 1024) |
| X4 | (16M) | (4096 x 8192) | (2048 x 2048) |
</div>
</div>
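
The benchmark definitions above correspond to the following plain-Python reference functions. This is a minimal sketch for checking results, not the benchmark code itself. VMUL is assumed to be an element-wise product, and the function names are chosen here for illustration.

```python
def vadd(x, y):
    """VADD: z = x + y, element-wise."""
    return [xi + yi for xi, yi in zip(x, y)]


def vmul(x, y):
    """VMUL: z = x * y, assumed to be element-wise."""
    return [xi * yi for xi, yi in zip(x, y)]


def haxpy(a, x, y):
    """HAXPY: z = a * x + y with a scalar a."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def gemv(A, x):
    """GEMV: z = A * x, with A given as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def dnn_layer(A, x):
    """DNN layer: z = ReLU(A * x)."""
    return [max(0.0, v) for v in gemv(A, x)]
```

Such reference results could be compared against the output read back from the PIM region, with a tolerance that accounts for the reduced precision of the hardware format.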
---
layout: figure
figureUrl: /dnn.svg
figureCaption: A fully connected DNN layer
figureFootnoteNumber: 1
---