Simulations
Microbenchmarks
- Vector benchmarks (BLAS level 1)
  - VADD: $z = x + y$
  - VMUL: $z = x \cdot y$
  - HAXPY: $z = a \cdot x + y$
- Vector-Matrix benchmarks (BLAS level 2)
  - GEMV: $z = A \cdot x$
  - Simple DNN: $f(x) = z = \mathrm{ReLU}(A \cdot x)$, $z_{n+1} = f(z_n)$
    - 5 layers in total
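The benchmark definitions above can be sketched as plain reference functions. This is only an illustration of the semantics, not the benchmark code itself: the function names are made up here, VMUL is read as an element-wise product, data types (such as a possible half-precision format for HAXPY) are ignored, and the DNN reuses a single matrix `A` for every layer because that is how the recurrence is written.

```python
# Reference semantics of the microbenchmarks using plain Python lists.
# Assumptions: element-wise VMUL, no particular data type, one shared
# weight matrix A for all DNN layers (as in z_{n+1} = f(z_n)).

def vadd(x, y):
    """VADD: z = x + y (element-wise)."""
    return [xi + yi for xi, yi in zip(x, y)]

def vmul(x, y):
    """VMUL: z = x * y (element-wise)."""
    return [xi * yi for xi, yi in zip(x, y)]

def haxpy(a, x, y):
    """HAXPY: z = a * x + y with a scalar a."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def gemv(A, x):
    """GEMV: z = A * x, with A given as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def relu(z):
    """ReLU applied to every element."""
    return [max(0.0, zi) for zi in z]

def simple_dnn(A, x, layers=5):
    """Simple DNN: apply f(z) = ReLU(A * z) `layers` times."""
    z = x
    for _ in range(layers):
        z = relu(gemv(A, z))
    return z
```

For example, `simple_dnn(A, x)` with a square `A` chains five GEMV + ReLU steps, which is why the DNN operands in the table are square matrices.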
| Level | Vector | GEMV | DNN |
|---|---|---|---|
| X1 | (2M) | (1024 x 4096) | (256 x 256) |
| X2 | (4M) | (2048 x 4096) | (512 x 512) |
| X3 | (8M) | (4096 x 8192) | (1024 x 1024) |
| X4 | (16M) | (4096 x 8192) | (2048 x 2048) |
Operand Dimensions
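To get a feeling for the data volumes behind these dimensions, the operand footprint can be estimated as below. This is a rough sketch under two assumptions that the slides do not state: "M" is taken as 2^20 elements, and each element is assumed to occupy 2 bytes (16-bit operands). The helper names are hypothetical.

```python
# Rough operand footprint per benchmark level.
# Assumptions (not from the slides): "M" = 2**20 elements, 2 bytes/element.
M = 2**20
BYTES_PER_ELEMENT = 2

LEVELS = {
    "X1": {"vector": 2 * M, "gemv": (1024, 4096), "dnn": (256, 256)},
    "X2": {"vector": 4 * M, "gemv": (2048, 4096), "dnn": (512, 512)},
    "X3": {"vector": 8 * M, "gemv": (4096, 8192), "dnn": (1024, 1024)},
    "X4": {"vector": 16 * M, "gemv": (4096, 8192), "dnn": (2048, 2048)},
}

def vector_bytes(n):
    """Two input vectors plus one output vector of n elements (VADD, VMUL)."""
    return 3 * n * BYTES_PER_ELEMENT

def gemv_bytes(rows, cols):
    """Matrix plus input vector plus output vector."""
    return (rows * cols + rows + cols) * BYTES_PER_ELEMENT

for level, dims in LEVELS.items():
    vec_mib = vector_bytes(dims["vector"]) / 2**20
    gemv_mib = gemv_bytes(*dims["gemv"]) / 2**20
    print(f"{level}: vector ops {vec_mib:.1f} MiB, GEMV {gemv_mib:.1f} MiB")
```

Because every operand is touched only once per kernel, these footprints are also a lower bound on the memory traffic of each run.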
Simulations
System Configuration
Two simulated systems:
- Generic ARM system
- Infinite compute system
  - completely memory bound
Two real GPUs using HBM2:
- AMD RX Vega 56
- NVIDIA V100
Figure (/speedup_normal.svg): Speedups of PIM compared to non-PIM
Simulations
Speedups / Generic ARM System
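Speedup is read here in the usual sense, as a ratio of execution times. The symbols $t_{\text{non-PIM}}$ and $t_{\text{PIM}}$ are names assumed for this note and do not appear in the slides:

$$
S = \frac{t_{\text{non-PIM}}}{t_{\text{PIM}}}
$$

Under this reading, $S > 1$ means the PIM run finished faster than the non-PIM run of the same benchmark.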
Figure (/speedup_inf.svg): Speedups of PIM compared to non-PIM
Simulations
Speedups / Infinite Compute System
Figure (/samsung.svg): Speedups reported by Samsung for VADD and GEMV
Simulations
Speedups / Samsung
Lee et al., "Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product", 2021.