master-thesis-presentation/slides/simulations.md
2024-04-07 22:41:59 +02:00


Simulations

Microbenchmarks



  • Vector benchmarks (BLAS level 1)

    • VADD: $z = x + y$
    • VMUL: $z = x \cdot y$
    • HAXPY: $z = a \cdot x + y$
  • Vector-Matrix benchmarks (BLAS level 2)

    • GEMV: $z = A \cdot x$
    • Simple DNN:
      • $f(x) = z = \mathrm{ReLU}(A \cdot x)$
      • $z_{n+1} = f(z_n)$
      • 5 layers in total
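
As a rough illustration of what each microbenchmark computes, the following is a minimal pure-Python sketch. It is not the benchmark code used in the simulations. The function names are hypothetical, VMUL is assumed to be an element-wise product, and the DNN is assumed to take one weight matrix per layer.

```python
# Reference semantics of the microbenchmarks (illustrative sketch only).

def vadd(x, y):
    """VADD: z = x + y, element-wise."""
    return [xi + yi for xi, yi in zip(x, y)]

def vmul(x, y):
    """VMUL: z = x * y, assumed here to be the element-wise product."""
    return [xi * yi for xi, yi in zip(x, y)]

def haxpy(a, x, y):
    """HAXPY: z = a * x + y with a scalar a."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def gemv(A, x):
    """GEMV: z = A * x, with A given as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def relu(v):
    """ReLU applied element-wise."""
    return [max(0.0, vi) for vi in v]

def simple_dnn(weights, x):
    """Apply z_{n+1} = ReLU(A_n * z_n) once per matrix in `weights`.

    Assumption: one weight matrix per layer; pass five matrices
    to model the five layers mentioned on the slide.
    """
    z = x
    for A in weights:
        z = relu(gemv(A, z))
    return z
```

With square weight matrices, as in the DNN column of the operand table, the output of one layer can be fed directly into the next.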

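| Level | Vector | GEMV          | DNN           |
|-------|--------|---------------|---------------|
| X1    | (2M)   | (1024 x 4096) | (256 x 256)   |
| X2    | (4M)   | (2048 x 4096) | (512 x 512)   |
| X3    | (8M)   | (4096 x 8192) | (1024 x 1024) |
| X4    | (16M)  | (4096 x 8192) | (2048 x 2048) |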

Operand Dimensions
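
To get a feel for the operand sizes, the dimensions above can be turned into per-operand memory footprints. This is a sketch under two assumptions that the slide does not state: "M" is read as 2^20 elements, and each element is taken to be 16 bits wide (suggested by the "H" in HAXPY, but not confirmed here). The names `LEVELS` and `footprint_bytes` are hypothetical.

```python
MI = 1 << 20            # assumption: "M" means 2**20 elements
BYTES_PER_ELEMENT = 2   # assumption: 16-bit operands

# Operand dimensions copied from the table.
LEVELS = {
    "X1": {"vector": 2 * MI,  "gemv": (1024, 4096), "dnn": (256, 256)},
    "X2": {"vector": 4 * MI,  "gemv": (2048, 4096), "dnn": (512, 512)},
    "X3": {"vector": 8 * MI,  "gemv": (4096, 8192), "dnn": (1024, 1024)},
    "X4": {"vector": 16 * MI, "gemv": (4096, 8192), "dnn": (2048, 2048)},
}

def matrix_elements(shape):
    """Number of elements in a (rows, cols) matrix."""
    rows, cols = shape
    return rows * cols

def footprint_bytes(level):
    """Bytes occupied by one vector, one GEMV matrix, and one DNN weight matrix."""
    dims = LEVELS[level]
    return {
        "vector": dims["vector"] * BYTES_PER_ELEMENT,
        "gemv": matrix_elements(dims["gemv"]) * BYTES_PER_ELEMENT,
        "dnn": matrix_elements(dims["dnn"]) * BYTES_PER_ELEMENT,
    }
```

For example, `footprint_bytes("X1")` gives the size of a single operand of each kind; a benchmark such as VADD touches three such vectors.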


Simulations

System Configuration





Two simulated systems:


  • Generic ARM system
  • Infinite compute system
    • completely memory bound
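
One way to read "completely memory bound": with unlimited compute, the runtime is determined only by how many bytes must be moved and how fast the memory delivers them. The sketch below illustrates this idealized model for VADD. It is not the simulator's model; the function names, the 16-bit default element size, and any bandwidth value passed in are assumptions.

```python
def vadd_bytes(n, bytes_per_element=2):
    """Bytes moved by VADD on n elements: two input reads plus one output write."""
    return 3 * n * bytes_per_element

def memory_bound_time(bytes_moved, bandwidth_bytes_per_s):
    """Idealized runtime in seconds when only memory traffic matters."""
    return bytes_moved / bandwidth_bytes_per_s
```

Dividing `vadd_bytes(n)` by a chosen bandwidth gives a lower bound on the runtime that no amount of additional compute can improve.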

Two real GPUs using HBM2:


  • AMD RX Vega 56
  • NVIDIA V100

layout: figure
figureUrl: /speedup_normal.svg
figureCaption: Speedups of PIM compared to non-PIM

Simulations

Speedups / Generic ARM System



layout: figure
figureUrl: /speedup_inf.svg
figureCaption: Speedups of PIM compared to non-PIM

Simulations

Speedups / Infinite Compute System



layout: figure
figureUrl: /samsung.svg
figureCaption: Speedups of Samsung for VADD and GEMV

Simulations

Speedups / Samsung


Lee et al., "Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product", 2021.

layout: figure
figureUrl: /runtimes_vector.svg
figureCaption: Runtimes for Vector Benchmarks

Simulations

Runtimes / Vector Benchmarks



layout: figure
figureUrl: /runtimes_matrix.svg
figureCaption: Runtimes for Matrix Benchmarks

Simulations

Runtimes / Matrix Benchmarks