diff --git a/slides.md b/slides.md
index e0e3bc1..c502e36 100644
--- a/slides.md
+++ b/slides.md
@@ -14,6 +14,7 @@ addons:
   - slidev-addon-citations
 biblio:
   filename: references.bib
+record: true
 ---
 
 ### Master Thesis
diff --git a/slides/conclusion.md b/slides/conclusion.md
index 79a5dc7..cd7513c 100644
--- a/slides/conclusion.md
+++ b/slides/conclusion.md
@@ -3,12 +3,13 @@
 
 <br>
 
-A speedup of 17.6× and 9.0× for the hypothetical infinite compute system has been achieved
+- PIM can accelerate memory-bound workloads
+- Special PIM-friendly memory layouts are required
 
 <br>
 
-Future work:
+#### Future work:
   - Implementation of Linux driver
-  - Comparison with complete neural networks
+    - Comparison with complete neural networks
   - Consider replacing library approach with compiler approach
   - Implement a power model to analyze the power efficiency gains
diff --git a/slides/implementation.md b/slides/implementation.md
index ded60a6..06a74ac 100644
--- a/slides/implementation.md
+++ b/slides/implementation.md
@@ -38,28 +38,12 @@ figureCaption: Data structures for instructions and register files
 
 - Provides data structures for operand data and microkernels
 - Executes programmed microkernels
-
----
-layout: figure-side
-figureUrl: /bare_metal.svg
----
-
-## Virtual Prototype
-### Platform
-<hr/>
-
-<br>
-<br>
-
-- Bare-metal kernel executes on ARM processor model
-- Custom page table configuration
-  - Non-PIM DRAM region mapped as cacheable memory
-  - PIM DRAM region mapped as non-cacheable memory
+  - Generates RD and WR requests
 
 ---
 
 ## Virtual Prototype
-### Platform
+### GEMV Kernel
 <hr/>
 
 <br>
@@ -68,7 +52,7 @@ figureUrl: /bare_metal.svg
 <div>
 
 DRAM-side
-```asm{all|1-8|9,10|11|12|all}{lines:true,at:1}
+```asm{all|1-8|9,10|11|12}{lines:true,at:1}
 MOV GRF_A #0, BANK
 MOV GRF_A #1, BANK
 MOV GRF_A #2, BANK
@@ -94,7 +78,7 @@ code {
 
 Host-side
 
-```rust {all|7-10|12-17|19-28|30-31|all}{lines:true,maxHeight:'15em',at:1}
+```rust {all|7-10|12-17|19-28|30-31}{lines:true,maxHeight:'15em',at:1}
 pub fn execute<const X16R: usize, const X16C: usize, const R: usize>(
     matrix: &Matrix<X16R, X16C>,
     input_vector: &Vector<X16C>,
@@ -131,4 +115,24 @@ pub fn execute<const X16R: usize, const X16C: usize, const R: usize>(
 </div>
 </div>
 
-<!-- </Transform> -->
+---
+layout: figure-side
+figureUrl: /bare_metal.svg
+---
+
+## Virtual Prototype
+### Platform
+<hr/>
+
+<br>
+<br>
+
+- ARM processor model
+- Bare-metal kernel
+- Custom page table configuration
+  - Non-PIM DRAM region mapped as cacheable memory
+  - PIM DRAM region mapped as non-cacheable memory
+
+<!--
+- bare metal offers most control
+-->
diff --git a/slides/introduction.md b/slides/introduction.md
index 07d14b2..0b4a126 100644
--- a/slides/introduction.md
+++ b/slides/introduction.md
@@ -18,6 +18,14 @@
   </Footnote>
 </Footnotes>
 
+<!--
+- compute doubles every two years
+- energy production grows linearly at 2% per year
+
+- to meet future compute demands
+  - -> drastic improvements in energy efficiency
+-->
+
 ---
 
 ## Introduction
@@ -26,7 +34,7 @@
 
 <br>
 
-#### Roofline model of GPT revisions<sup>1</sup>
+- AI workloads are becoming increasingly memory-bound
 
 <br>
 
@@ -39,3 +47,10 @@
   Ivo Bolsens. „Scalable AI Architectures for Edge and Cloud“, 2023.
   </Footnote>
 </Footnotes>
+
+<!--
+- Emerging AI applications are becoming increasingly memory-bound
+- Roofline model
+- Not limited by compute power but by memory
+-> researchers begin to consider processing in memory to circumvent memory bottleneck
+-->
diff --git a/slides/pim.md b/slides/pim.md
index 01f44e3..68d03aa 100644
--- a/slides/pim.md
+++ b/slides/pim.md
@@ -11,15 +11,8 @@
 </div>
 
 <!--
-- Workload must be memory-bound
-
-- memory-bound:
-  - fully-connected layers
-  - layers of recurrent neural networks (RNNs)
-  
-- not memory-bound:
-  - convolutional layers
-  	- data reuse
+- fully connected layers of a neural network
+- For PIM to be effective, the workload must be memory-bound
 -->
 
 ---
@@ -52,6 +45,10 @@ clicks: 1
   </div>
 </Transform>
 
+<!--
+- filter matrix is reused
+-->
+
 ---
 
 ## Processing-in-Memory
@@ -67,7 +64,7 @@ clicks: 1
 <div>
 
 ### Suitable candidates for PIM:
- - Multilayer perceptrons (MLPs)
+ - Fully connected layers in multilayer perceptrons (MLPs)
  - Layers in recurrent neural networks (RNNs)
 
 </div>
@@ -130,19 +127,18 @@ To summarize...
 </Footnotes>
 
 <!--
+- Architecture space of PIM:
 - Inside the memory SA
-  - Ambit
-    - activate multiple rows at the same time
-    - bulk logic operations
+  - simple bulk logic
     
 - Near SA in PSA output region
-  - CMOS-based logic gates in the region
+  - logic gates in the region
   
 - Near a bank in its peripheral region
-  - computation units with control at bank output
+  - computation units with control
   
 - I/O region of memory
-  - more traditional accelerator approach
+  - limited by memory bus
 -->
 
 ---
@@ -171,12 +167,9 @@ To summarize...
 </Footnotes>
 
 <!--
-- Real-world PIM implementation based on HBM2
-- SIMD FPUs are 16-wide, i.e., there are 16 FPU units
-- Three execution modes
-    - Single-Bank (SB)
-    - All-Bank (AB)
-    - All-Bank-PIM (AB-PIM)
+- One PIM unit shared by two banks
+- SIMD FPUs are 16-wide
+- All-Bank mode: All PIM units operate in parallel
 -->
 
 ---
@@ -201,16 +194,15 @@ To summarize...
 </Footnotes>
 
 <!--
-- Control unit executes RISC instructions
 - Two SIMD FPUs
   - ADD
   - MUL
 
-- CRF: 32 32-bit entries (32 instructions)
-- GRF: 16 256-bit entries
-- SRF: 16 16-bit entries
+- CRF: 32 instructions, stores the program
+- GRF: 16 entries, one memory fetch
+- SRF: 16 entries
 
-- One instruction is executed when RD or WR command is issued
+- Control unit executes one instruction when a RD or WR command is issued
 -->
 
 ---
@@ -229,6 +221,13 @@ figureCaption: Procedure to perform a (128×8)×(128) GEMV operation
 </Footnote>
 </Footnotes>
 
+<!--
+- Procedure of GEMV operation
+- multiple cycles
+- each PIM unit operates on one matrix row
+- partial sum, reduced by host
+-->
+
 ---
 layout: figure
 figureUrl: /layout.svg
@@ -254,7 +253,7 @@ figureCaption: Mapping of the weight matrix onto the memory banks
 <br>
 <br>
 
-- To analyze the performance gains of PIM, simulation models are needed
+- Simulations are needed to analyze the performance gains of PIM
 - Research should not only focus on hardware but also explore the software side
 
 <br>
diff --git a/slides/simulations.md b/slides/simulations.md
index dcdbb38..a6c6779 100644
--- a/slides/simulations.md
+++ b/slides/simulations.md
@@ -14,7 +14,7 @@
 
 - Vector-Matrix benchmarks (BLAS level 2)
     - GEMV: $z = A \cdot x$
-    - DNN:
+    - Simple DNN:
       - $f(x) = z = ReLU(A \cdot x)$
       - $z_{n+1} = f(z_n)$
       - 5 layers in total
@@ -36,24 +36,44 @@ Operand Dimensions
 </div>
 </div>
 
+<!--
+- operand data significantly larger than on-chip cache
+-->
+
 ---
 
 ## Simulations
 ### System Configuration
 <hr/>
 
+<br>
 <br>
 <br>
 
-- Two simulated systems:
-    - Generic ARM systems
-    - Infinite compute ARM system
+<div class="grid grid-cols-2 gap-4">
+<div>
+
+#### Two simulated systems:
 
 <br>
 
-- Two real GPUs using HBM2:
-  - AMD RX Vega 56
-  - NVIDIA V100
+- Generic ARM system
+- Infinite compute system
+  - completely memory-bound
+
+</div>
+
+<div>
+
+#### Two real GPUs using HBM2:
+
+<br>
+
+- AMD RX Vega 56
+- NVIDIA V100
+
+</div>
+</div>
 
 ---
 layout: figure
@@ -75,11 +95,15 @@ figureCaption: Speedups of PIM compared to non-PIM
 ### Speedups / Infinite Compute System
 <hr/>
 
+<!--
+- VADD: 12.7x
+- GEMV: 9.0x
+-->
+
 ---
 layout: figure
 figureUrl: /samsung.svg
 figureCaption: Speedups of Samsung for VADD and GEMV
-figureFootnoteNumber: 1
 ---
 
 ## Simulations
@@ -97,6 +121,7 @@ figureFootnoteNumber: 1
 - ADD shows deviation
 
 -> differences in hardware architecture
+- GPU has no speculative execution
 -->
 
 ---
@@ -111,6 +136,7 @@ figureCaption: Runtimes for Vector Benchmarks
 
 <!--
 - Real GPUs use multiple memory channels
+- Memory barriers
 - Also architectural differences
 -->