master-thesis-presentation/slides/simulations.md
2024-04-09 16:10:45 +02:00

## Simulations
### Microbenchmarks
<hr/>
<br>
<div class="grid grid-cols-2 gap-4">
<div>

- Vector benchmarks (BLAS level 1)
  - VADD: $z = x + y$
  - VMUL: $z = x \cdot y$
  - HAXPY: $z = a \cdot x + y$
- Vector-Matrix benchmarks (BLAS level 2)
  - GEMV: $z = A \cdot x$
- Simple DNN:
  - $f(x) = z = \mathrm{ReLU}(A \cdot x)$
  - $z_{n+1} = f(z_n)$
  - 5 layers in total

</div>
<div>
<br>

| Level | Vector | GEMV          | DNN           |
|-------|--------|---------------|---------------|
| X1    | (2M)   | (1024 x 4096) | (256 x 256)   |
| X2    | (4M)   | (2048 x 4096) | (512 x 512)   |
| X3    | (8M)   | (4096 x 8192) | (1024 x 1024) |
| X4    | (16M)  | (4096 x 8192) | (2048 x 2048) |

Operand Dimensions

</div>
</div>
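
The benchmark kernels listed above can be sketched in a few lines of plain Python. This is only an illustrative reference, not the code that was simulated. The function names are hypothetical, VMUL is assumed to be element-wise, the DNN is assumed to take one weight matrix per layer, and Python floats stand in for whatever precision the benchmarks use.

```python
# Illustrative reference kernels for the microbenchmarks (hypothetical names).

def vadd(x, y):
    """VADD: z = x + y (element-wise)."""
    return [xi + yi for xi, yi in zip(x, y)]

def vmul(x, y):
    """VMUL: z = x * y (assumed element-wise)."""
    return [xi * yi for xi, yi in zip(x, y)]

def haxpy(a, x, y):
    """HAXPY: z = a * x + y with scalar a."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def gemv(A, x):
    """GEMV: z = A * x, with A given as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def relu(v):
    """ReLU applied element-wise."""
    return [max(0.0, vi) for vi in v]

def dnn(layers, x):
    """Simple DNN: z_{n+1} = ReLU(A_n * z_n), one matrix per layer (assumption)."""
    z = x
    for A in layers:
        z = relu(gemv(A, z))
    return z
```

Calling `dnn` with a list of five square matrices reproduces the five-layer structure from the slide.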
<!--
- operand data significantly larger than on-chip cache
-->
---
## Simulations
### System Configuration
<hr/>
<br>
<br>
<br>
<div class="grid grid-cols-2 gap-4">
<div>
#### Two simulated systems:
<br>

- Generic ARM system
- Infinite compute system
  - unrealistic frequency of 100 GHz
  - completely memory bound
  - lower bound of possible speedup

</div>
<div>
<br>
#### Two real GPUs using HBM2:
<br>
- AMD RX Vega 56
- NVIDIA Tesla V100
</div>
</div>
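
Why a very high clock frequency makes a system "completely memory bound" can be illustrated with a simplified roofline-style estimate. This is a rough sketch and not the simulator model. The function names and the 2-byte element size are assumptions made only for illustration.

```python
# Simplified roofline-style runtime estimate (illustrative assumption,
# not the simulation model used for the results in these slides).

def estimated_runtime(flops, bytes_moved, peak_flops, mem_bandwidth):
    """Runtime is limited by the slower of compute and memory traffic."""
    compute_time = flops / peak_flops
    memory_time = bytes_moved / mem_bandwidth
    return max(compute_time, memory_time)

def is_memory_bound(flops, bytes_moved, peak_flops, mem_bandwidth):
    """True if memory traffic, not compute, determines the runtime."""
    return bytes_moved / mem_bandwidth >= flops / peak_flops

def vadd_traffic_bytes(n, bytes_per_element=2):
    """VADD reads x and y and writes z: three vectors of n elements.

    bytes_per_element=2 is an assumed element size.
    """
    return 3 * n * bytes_per_element
```

In this model, raising `peak_flops` shrinks only the compute term. Once the memory term dominates, a faster core no longer changes the estimated runtime, which is the idea behind the infinite compute system.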
---
layout: figure
figureUrl: /speedup_normal.svg
figureCaption: Speedups of PIM compared to non-PIM
---
## Simulations
### Speedups / Generic ARM System
<hr/>
---
layout: figure
figureUrl: /speedup_inf.svg
figureCaption: Speedups of PIM compared to non-PIM
---
## Simulations
### Speedups / Infinite Compute System
<hr/>
<!--
- VADD: 12.7x
- GEMV: 9.0x
-->
---
layout: figure
figureUrl: /samsung.svg
figureCaption: Speedups reported by Samsung for VADD and GEMV
---
## Simulations
### Speedups / Samsung
<hr/>
<Footnotes separator>
<Footnote>
Lee et al., “Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product”, 2021.
</Footnote>
</Footnotes>
<!--
- GEMV matches well
- VADD shows a deviation
  -> due to differences in the hardware architecture
- GPU has no speculative execution
-->
---
layout: figure
figureUrl: /runtimes_vector.svg
figureCaption: Runtimes for Vector Benchmarks
---
## Simulations
### Runtimes / Vector Benchmarks
<hr/>
<!--
- Real GPUs use multiple memory channels
- Memory barriers
- Also architectural differences
-->
---
layout: figure
figureUrl: /runtimes_matrix.svg
figureCaption: Runtimes for Matrix Benchmarks
---
## Simulations
### Runtimes / Matrix Benchmarks
<hr/>