diff --git a/slides.md b/slides.md
index e0e3bc1..c502e36 100644
--- a/slides.md
+++ b/slides.md
@@ -14,6 +14,7 @@ addons:
- slidev-addon-citations
biblio:
filename: references.bib
+record: true
---
### Master Thesis
diff --git a/slides/conclusion.md b/slides/conclusion.md
index 79a5dc7..cd7513c 100644
--- a/slides/conclusion.md
+++ b/slides/conclusion.md
@@ -3,12 +3,13 @@
-A speedup of 17.6× and 9.0× for the hypothetical infinite compute system has been achieved
+- PIM can accelerate memory-bound workloads
+- Special PIM-friendly memory layouts are required
-Future work:
+#### Future work:
- Implementation of Linux driver
- - Comparison with complete neural networks
+ - Comparison with complete neural networks
- Consider replacing library approach with compiler approach
- Implement a power model to analyze the power efficiency gains
diff --git a/slides/implementation.md b/slides/implementation.md
index ded60a6..06a74ac 100644
--- a/slides/implementation.md
+++ b/slides/implementation.md
@@ -38,28 +38,12 @@ figureCaption: Data structures for instructions and register files
- Provides data structures for operand data and microkernels
- Executes programmed microkernels
-
----
-layout: figure-side
-figureUrl: /bare_metal.svg
----
-
-## Virtual Prototype
-### Platform
-
-
-
-
-
-- Bare-metal kernel executes on ARM processor model
-- Custom page table configuration
- - Non-PIM DRAM region mapped as cacheable memory
- - PIM DRAM region mapped as non-cacheable memory
+ - Generates RD and WR requests
---
## Virtual Prototype
-### Platform
+### GEMV Kernel
@@ -68,7 +52,7 @@ figureUrl: /bare_metal.svg
DRAM-side
-```asm{all|1-8|9,10|11|12|all}{lines:true,at:1}
+```asm{all|1-8|9,10|11|12}{lines:true,at:1}
MOV GRF_A #0, BANK
MOV GRF_A #1, BANK
MOV GRF_A #2, BANK
@@ -94,7 +78,7 @@ code {
Host-side
-```rust {all|7-10|12-17|19-28|30-31|all}{lines:true,maxHeight:'15em',at:1}
+```rust {all|7-10|12-17|19-28|30-31}{lines:true,maxHeight:'15em',at:1}
pub fn execute(
matrix: &Matrix,
input_vector: &Vector,
@@ -131,4 +115,24 @@ pub fn execute(
-
+---
+layout: figure-side
+figureUrl: /bare_metal.svg
+---
+
+## Virtual Prototype
+### Platform
+
+
+
+
+
+- ARM processor model
+- Bare-metal kernel
+- Custom page table configuration
+ - Non-PIM DRAM region mapped as cacheable memory
+ - PIM DRAM region mapped as non-cacheable memory
+
+
diff --git a/slides/introduction.md b/slides/introduction.md
index 07d14b2..0b4a126 100644
--- a/slides/introduction.md
+++ b/slides/introduction.md
@@ -18,6 +18,14 @@
+
+
---
## Introduction
@@ -26,7 +34,7 @@
-#### Roofline model of GPT revisions1
+- AI workloads are becoming increasingly memory-bound
@@ -39,3 +47,10 @@
Ivo Bolsens. „Scalable AI Architectures for Edge and Cloud“, 2023.
+
+
diff --git a/slides/pim.md b/slides/pim.md
index 01f44e3..68d03aa 100644
--- a/slides/pim.md
+++ b/slides/pim.md
@@ -11,15 +11,8 @@
---
@@ -52,6 +45,10 @@ clicks: 1
+
+
---
## Processing-in-Memory
@@ -67,7 +64,7 @@ clicks: 1
### Suitable candidates for PIM:
- - Multilayer perceptrons (MLPs)
+ - Fully connected layers in multilayer perceptrons (MLPs)
- Layers in recurrent neural networks (RNNs)
@@ -130,19 +127,18 @@ To summarize...
---
@@ -171,12 +167,9 @@ To summarize...
---
@@ -201,16 +194,15 @@ To summarize...
---
@@ -229,6 +221,13 @@ figureCaption: Procedure to perform a (128×8)×(128) GEMV operation
+
+
---
layout: figure
figureUrl: /layout.svg
@@ -254,7 +253,7 @@ figureCaption: Mapping of the weight matrix onto the memory banks
-- To analyze the performance gains of PIM, simulation models are needed
+- Simulations are needed to analyze the performance gains of PIM
- Research should not only focus on hardware but also explore the software side
diff --git a/slides/simulations.md b/slides/simulations.md
index dcdbb38..a6c6779 100644
--- a/slides/simulations.md
+++ b/slides/simulations.md
@@ -14,7 +14,7 @@
- Vector-Matrix benchmarks (BLAS level 2)
- GEMV: $z = A \cdot x$
- - DNN:
+ - Simple DNN:
- $f(x) = z = ReLU(A \cdot x)$
- $z_{n+1} = f(z_n)$
- 5 layers in total
@@ -36,24 +36,44 @@ Operand Dimensions
+
+
---
## Simulations
### System Configuration
+
-- Two simulated systems:
- - Generic ARM systems
- - Infinite compute ARM system
+
+
+
+#### Two simulated systems:
-- Two real GPUs using HBM2:
- - AMD RX Vega 56
- - NVIDIA V100
+- Generic ARM system
+- Infinite compute system
+ - completely memory-bound
+
+
+
+
+
+#### Two real GPUs using HBM2:
+
+
+
+- AMD RX Vega 56
+- NVIDIA V100
+
+
+
---
layout: figure
@@ -75,11 +95,15 @@ figureCaption: Speedups of PIM compared to non-PIM
### Speedups / Infinite Compute System
+
+
---
layout: figure
figureUrl: /samsung.svg
figureCaption: Speedups of Samsung for VADD and GEMV
-figureFootnoteNumber: 1
---
## Simulations
@@ -97,6 +121,7 @@ figureFootnoteNumber: 1
- ADD shows deviation
-> differences in hardware architecture
+- GPU has no speculative execution
-->
---
@@ -111,6 +136,7 @@ figureCaption: Runtimes for Vector Benchmarks