## Simulations
### Microbenchmarks
- Vector benchmarks (BLAS level 1)
  - VADD: $z = x + y$
  - VMUL: $z = x \cdot y$
  - HAXPY: $z = a \cdot x + y$
- Vector-Matrix benchmarks (BLAS level 2)
  - GEMV: $z = A \cdot x$
  - Simple DNN:
    - $f(x) = z = \mathrm{ReLU}(A \cdot x)$
    - $z_{n+1} = f(z_n)$
    - 5 layers in total
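
The kernels listed above can be written down as a minimal pure-Python reference sketch. The function names are illustrative, VMUL is assumed to be element-wise, and reusing a single square matrix `A` for all five layers is an assumption; the actual benchmarks may use per-layer weights and half-precision arithmetic.

```python
def vadd(x, y):
    """VADD: z = x + y (element-wise)."""
    return [xi + yi for xi, yi in zip(x, y)]


def vmul(x, y):
    """VMUL: z = x * y (assumed to be element-wise)."""
    return [xi * yi for xi, yi in zip(x, y)]


def haxpy(a, x, y):
    """HAXPY: z = a * x + y with a scalar a."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def gemv(A, x):
    """GEMV: z = A * x, with A given as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def relu(z):
    """Element-wise ReLU: negative entries are clamped to zero."""
    return [max(0.0, zi) for zi in z]


def simple_dnn(A, x, layers=5):
    """Apply f(z) = ReLU(A * z) `layers` times (A must be square)."""
    z = x
    for _ in range(layers):
        z = relu(gemv(A, z))
    return z
```

For example, `simple_dnn([[1, 0], [0, -1]], [1.0, 1.0])` chains five GEMV plus ReLU steps on a two-element input.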

| Level | Vector | GEMV        | DNN         |
|-------|--------|-------------|-------------|
| X1    | 2M     | 1024 × 4096 | 256 × 256   |
| X2    | 4M     | 2048 × 4096 | 512 × 512   |
| X3    | 8M     | 4096 × 8192 | 1024 × 1024 |
| X4    | 16M    | 4096 × 8192 | 2048 × 2048 |

Operand dimensions
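
To get a feeling for the data volumes behind these dimensions, the following sketch converts them into operand sizes. Two things here are assumptions and not stated on the slide: "M" is read as $2^{20}$ elements, and each element is taken to be a 16-bit floating-point value (2 bytes).

```python
MI = 2**20              # assumption: "M" in the table means 2^20 elements
BYTES_PER_ELEMENT = 2   # assumption: 16-bit floating-point operands

# Dimensions copied from the table above.
VECTOR_ELEMENTS = {"X1": 2 * MI, "X2": 4 * MI, "X3": 8 * MI, "X4": 16 * MI}
GEMV_SHAPE = {"X1": (1024, 4096), "X2": (2048, 4096),
              "X3": (4096, 8192), "X4": (4096, 8192)}
DNN_SHAPE = {"X1": (256, 256), "X2": (512, 512),
             "X3": (1024, 1024), "X4": (2048, 2048)}


def vector_bytes(level):
    """Size of one vector operand in bytes."""
    return VECTOR_ELEMENTS[level] * BYTES_PER_ELEMENT


def matrix_bytes(shape):
    """Size of one matrix operand in bytes."""
    first, second = shape
    return first * second * BYTES_PER_ELEMENT


for level in ("X1", "X2", "X3", "X4"):
    print(level,
          vector_bytes(level) // MI, "MiB per vector,",
          matrix_bytes(GEMV_SHAPE[level]) // MI, "MiB GEMV matrix,",
          matrix_bytes(DNN_SHAPE[level]) // 1024, "KiB per DNN matrix")
```

Under these assumptions, a single X1 vector operand occupies 4 MiB and the X1 GEMV matrix 8 MiB.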
---
## Simulations
### System Configuration
#### Two simulated systems:
- Generic ARM system
- Infinite compute system
  - unrealistic frequency of 100 GHz
  - completely memory bound
  - lower bound of possible speedup
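
Why a very high clock frequency makes a system memory bound can be illustrated with a simple roofline-style estimate. This is only a sketch: the `max(compute, memory)` model, the function name `runtime_s`, and all workload and machine parameters below are hypothetical and are not taken from the simulated systems.

```python
def runtime_s(ops, bytes_moved, freq_hz, ops_per_cycle, bandwidth_bytes_per_s):
    """Runtime estimate: the slower of compute time and memory time."""
    compute_s = ops / (freq_hz * ops_per_cycle)
    memory_s = bytes_moved / bandwidth_bytes_per_s
    return max(compute_s, memory_s)


# Hypothetical workload and machine parameters, for illustration only.
OPS = 2 * 2**20          # one operation per vector element
BYTES = 3 * OPS * 2      # two inputs and one output, 2 bytes per element
BANDWIDTH = 100e9        # bytes per second
MEMORY_S = BYTES / BANDWIDTH

slow_host = runtime_s(OPS, BYTES, 1e9, 1, BANDWIDTH)    # 1 GHz host
fast_host = runtime_s(OPS, BYTES, 100e9, 1, BANDWIDTH)  # 100 GHz host

print("1 GHz host is memory bound:  ", slow_host == MEMORY_S)
print("100 GHz host is memory bound:", fast_host == MEMORY_S)
```

In this toy model the 100 GHz host finishes its computation faster than the data can be moved, so its runtime equals the memory time alone.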
#### Two real GPUs using HBM2:
- AMD RX Vega 56
- NVIDIA Tesla V100
---
layout: figure
figureUrl: /speedup_normal.svg
figureCaption: Speedups of PIM compared to non-PIM (generic ARM system)
---
## Simulations
### Speedups / Generic ARM System
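
Assuming the usual definition, the speedup plotted here is the ratio of the two runtimes for the same benchmark and operand size (the symbols $S$, $t_{\text{non-PIM}}$ and $t_{\text{PIM}}$ are notation introduced for this note):

$$
S = \frac{t_{\text{non-PIM}}}{t_{\text{PIM}}}
$$

A value of $S > 1$ means the PIM variant finishes faster than the non-PIM baseline.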
---
layout: figure
figureUrl: /speedup_inf.svg
figureCaption: Speedups of PIM compared to non-PIM (infinite compute system)
---
## Simulations
### Speedups / Infinite Compute System
---
layout: figure
figureUrl: /samsung.svg
figureCaption: Speedups reported by Samsung for VADD and GEMV
---
## Simulations
### Speedups / Samsung
Lee et al., "Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product", 2021.
---
layout: figure
figureUrl: /runtimes_vector.svg
figureCaption: Runtimes for Vector Benchmarks
---
## Simulations
### Runtimes / Vector Benchmarks
---
layout: figure
figureUrl: /runtimes_matrix.svg
figureCaption: Runtimes for Matrix Benchmarks
---
## Simulations
### Runtimes / Matrix Benchmarks