Refactor presentation

2024-04-07 21:21:40 +02:00
parent 56226ebf41
commit 3d15758c82
9 changed files with 279 additions and 187 deletions
--- a/public/bare_metal.svg
+++ b/public/bare_metal.svg
--- a/public/dramsys.svg
+++ b/public/dramsys.svg
@@ -1791,9 +1791,9 @@
     inkscape:pageopacity="0"
     inkscape:pagecheckerboard="0"
     inkscape:deskcolor="#d1d1d1"
-     inkscape:zoom="2.4038052"
-     inkscape:cx="376.48641"
-     inkscape:cy="146.01849"
+     inkscape:zoom="0.84987348"
+     inkscape:cx="436.53557"
+     inkscape:cy="289.45485"
     inkscape:window-width="2194"
     inkscape:window-height="1158"
     inkscape:window-x="0"
--- a/public/pim_positions_3.svg
+++ b/public/pim_positions_3.svg
@@ -6407,20 +6407,20 @@
       id="g276"
       transform="matrix(1.2839206,0,0,1.2839206,-276.91223,57.78747)">
      <path
-         d="m 326.43557,68.032148 h 61.338 v 8.120261 h -61.338 z"
+         d="m 303.28562,68.032148 h 84.48795 v 8.120261 h -84.48795 z"
         style="fill:#f5f000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.04686;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
-         transform="matrix(1.143907,0,0,1.1485333,-51.191119,119.61054)"
+         transform="matrix(1.143907,0,0,1.1485333,-24.709729,119.61054)"
         clip-path="url(#clipPath117-3)"
         id="path276-9"
         sodipodi:nodetypes="ccccc" />
      <text
         xml:space="preserve"
         style="font-size:8px;fill:#000000"
-         x="357.35019"
+         x="370.59088"
         y="205.03008"
         id="text362"><tspan
           sodipodi:role="line"
-           x="357.35019"
+           x="370.59088"
           y="205.03008"
           id="tspan363"
           style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-size:8px;font-family:'Liberation Serif';-inkscape-font-specification:'Liberation Serif Bold';text-align:center;text-anchor:middle">PIM</tspan></text>
--- a/public/speedup_inf.svg
+++ b/public/speedup_inf.svg
--- a/slides/conclusion.md
+++ b/slides/conclusion.md
@@ -1,9 +1,14 @@
 ## Conclusion and Future Work
 <hr/>

- achievable speedup of 17.6 × and 9.0 × hypothetical infinite compute system
-  - lower bound
- linux driver implementation
- comparison with real neural network workloads
- consider replacing library approach with compiler approach
- power comparison, power models needed
+<br>
+
+Speedups of 17.6× and, for the hypothetical infinite compute system, 9.0× have been achieved
+
+<br>
+
+Future work:
+  - Implement a Linux driver
+  - Compare with complete neural networks
+  - Consider replacing the library approach with a compiler approach
+  - Implement a power model to analyze the power efficiency gains
--- a/slides/implementation.md
+++ b/slides/implementation.md
@@ -1,13 +1,23 @@
---
-layout: figure
-figureUrl: /dramsys.svg
-figureCaption: The PIM-HBM model integrated into DRAMSys
---
-
 ## Virtual Prototype
 ### Processing Units
 <hr/>

+<br>
+
+- Integrate DRAMSys into gem5
+- Implement PIM-HBM virtual prototype in DRAM model
+
+<br>
+
+<div class="flex justify-center items-center">
+<img src="/dramsys.svg">
+</div>
+
+<!--
+- VP interprets the programmed microkernel
+  - not yet drop-in replacement
+-->
+
 ---
 layout: figure-side
 figureUrl: /data_structures.svg
@@ -18,12 +28,15 @@ figureCaption: Data structures for instructions and register files
 ### Software Library
 <hr/>

+<br>
 <br>
 <br>

- Software support library
- Provides data structures for PIM-HBM
-  - Adhering special memory layout requirements
+#### Software support library
+
+<br>
+
+- Provides data structures for operand data and microkernels
 - Executes programmed microkernels

 ---
--- a/slides/introduction.md
+++ b/slides/introduction.md
@@ -1,33 +1,41 @@
---
-layout: figure
-figureUrl: /world_energy.svg
-figureCaption: Total energy of computing
-figureFootnoteNumber: 1
---
-
 ## Introduction
 ### Energy Demand of Applications
 <hr/>

+<br>
+
+- Total compute energy is approaching the world's energy production
+
+  --> drastic improvements in energy efficiency needed
+
+<div class="flex justify-center">
+<img src="/world_energy.svg">
+</div>
+
 <Footnotes separator>
-  <Footnote :number=1>
+  <Footnote>
  SRC. „Decadal Plan for Semiconductors“, January 2021. https://www.src.org/about/decadal-plan/.
-</Footnote>
+  </Footnote>
 </Footnotes>

---
-layout: figure
-figureUrl: /gpt.svg
-figureCaption: Roofline model of GPT revisions
-figureFootnoteNumber: 1
 ---

 ## Introduction
 ### Memory Bound Workloads
 <hr/>

+<br>
+
+#### Roofline model of GPT revisions<sup>1</sup>
+
+<br>
+
+<div class="flex justify-center">
+<img src="/gpt.svg">
+</div>
+
 <Footnotes separator>
-  <Footnote :number=1>
+  <Footnote>
  Ivo Bolsens. „Scalable AI Architectures for Edge and Cloud“, 2023.
-</Footnote>
+  </Footnote>
 </Footnotes>
--- a/slides/pim.md
+++ b/slides/pim.md
@@ -79,6 +79,10 @@ clicks: 1
 </div>
 </div>

+<!--
+To summarize...
+-->
+
 ---

 ## Processing-in-Memory
@@ -94,8 +98,8 @@ clicks: 1
 <v-clicks>

 - Inside the memory subarray
- In the PSA region near a subarray
- Outside the bank in its peripheral region
+- Near the subarray in the PSA output region
+- Near the bank in its peripheral region
 - In the I/O region of the memory

 </v-clicks>
@@ -120,7 +124,7 @@ clicks: 1
 <div v-click class="text-xl"> The nearer the computation is to the memory cells, the higher the achievable bandwidth! </div>

 <Footnotes separator>
-  <Footnote :number=1>
+  <Footnote>
  Sudarshan et al. „A Critical Assessment of DRAM-PIM Architectures - Trends, Challenges and Solutions“, 2022.
 </Footnote>
 </Footnotes>
@@ -141,19 +145,27 @@ clicks: 1
  - more traditional accelerator approach
 -->

---
-layout: figure
-figureUrl: /hbm-pim.svg
-figureCaption: Architecture of PIM-HBM
-figureFootnoteNumber: 1
 ---

 ## Processing-in-Memory
-### Samsung's HBM-PIM
+### Samsung's PIM-HBM
 <hr/>

+
+<br>
+
+- Real-world PIM implementation based on HBM2
+- PIM units embedded at the bank level
+
+<br>
+
+
+<div class="flex justify-center items-center">
+<img src="/hbm-pim.svg">
+</div>
+
 <Footnotes separator>
-  <Footnote :number=1>
+  <Footnote>
  Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology : Industrial Product“, 2021.
 </Footnote>
 </Footnotes>
@@ -167,19 +179,23 @@ figureFootnoteNumber: 1
    - All-Bank-PIM (AB-PIM)
 -->

---
-layout: figure
-figureUrl: /pu.svg
-figureCaption: Architecture of a PIM processing unit
-figureFootnoteNumber: 1
 ---

 ## Processing-in-Memory
-### Samsung's HBM-PIM
+### Samsung's PIM-HBM
 <hr/>

+<br>
+
+- Two 16-wide 16-bit FPUs
+- Register files and control unit
+
+<div class="flex justify-center items-center">
+<img src="/pu.svg">
+</div>
+
 <Footnotes separator>
-  <Footnote :number=1>
+  <Footnote>
  Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology : Industrial Product“, 2021.
 </Footnote>
 </Footnotes>
@@ -201,15 +217,14 @@ figureFootnoteNumber: 1
 layout: figure
 figureUrl: /gemv.svg
 figureCaption: Procedure to perform a (128×8)×(128) GEMV operation
-figureFootnoteNumber: 1
 ---

 ## Processing-in-Memory
-### Samsung's HBM-PIM
+### Samsung's PIM-HBM
 <hr/>

 <Footnotes separator>
-  <Footnote :number=1>
+  <Footnote>
  Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology : Industrial Product“, 2021.
 </Footnote>
 </Footnotes>
@@ -221,7 +236,7 @@ figureCaption: Mapping of the weight matrix onto the memory banks
 ---

 ## Processing-in-Memory
-### Samsung's HBM-PIM
+### Samsung's PIM-HBM
 <hr/>

 <!--
@@ -234,8 +249,14 @@ figureCaption: Mapping of the weight matrix onto the memory banks
 ### Research
 <hr/>

-simulation models needed
+<br>
+<br>
+<br>
+<br>

-research should not only focus on hardware but also explore the software side!
+- To analyze the performance gains of PIM, simulation models are needed
+- Research should not only focus on hardware but also explore the software side

-that is why I am building a virtual prototype
+<br>
+
+- In the following, a virtual prototype of PIM-HBM is implemented
--- a/slides/simulations.md
+++ b/slides/simulations.md
@@ -2,7 +2,6 @@
 ### Microbenchmarks
 <hr/>

-<br>
 <br>

 <div class="grid grid-cols-2 gap-4">
@@ -15,11 +14,16 @@

 - Vector-Matrix benchmarks (BLAS level 2)
    - GEMV: $z = A \cdot x$
-    - DNN Layer: $z = ReLU(A \cdot x)$
+    - DNN:
+      - $f(x) = z = \mathrm{ReLU}(A \cdot x)$
+      - $z_{n+1} = f(z_n)$
+      - 5 layers in total

 </div>
 <div>

+<br>
+
 | Level | Vector | GEMV          | DNN           |
 |-------|--------|---------------|---------------|
 | X1    | (2M)   | (1024 x 4096) | (256 x 256)   |
@@ -27,6 +31,8 @@
 | X3    | (8M)   | (4096 x 8192) | (1024 x 1024) |
 | X4    | (16M)  | (4096 x 8192) | (2048 x 2048) |

+Operand Dimensions
+
 </div>
 </div>

@@ -36,11 +42,18 @@
 ### System Configuration
 <hr/>

- Two system configurations:
-    - ARM 3GHz
-    - ARM Infinite
+<br>
+<br>

- TODO ... GPU and so on
+- Two simulated systems:
+    - Generic ARM system
+    - Infinite compute ARM system
+
+<br>
+
+- Two real GPUs using HBM2:
+  - AMD RX Vega 56
+  - NVIDIA V100

 ---
 layout: figure
@@ -49,7 +62,7 @@ figureCaption: Speedups of PIM compared to non-PIM
 ---

 ## Simulations
-### Speedups / ARM System
+### Speedups / Generic ARM System
 <hr/>

 ---
@@ -74,11 +87,18 @@ figureFootnoteNumber: 1
 <hr/>

 <Footnotes separator>
-  <Footnote :number=1>
+  <Footnote>
  Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology : Industrial Product“, 2021.
 </Footnote>
 </Footnotes>

+<!--
+- GEMV matches well
+- ADD shows deviation
+
+-> differences in hardware architecture
+-->
+
 ---
 layout: figure
 figureUrl: /runtimes_vector.svg
@@ -89,6 +109,11 @@ figureCaption: Runtimes for Vector Benchmarks
 ### Runtimes / Vector Benchmarks
 <hr/>

+<!--
+- Real GPUs use multiple memory channels
+- Also architectural differences
+-->
+
 ---
 layout: figure
 figureUrl: /runtimes_matrix.svg