Refactor presentation

2024-04-07 21:21:40 +02:00
parent 56226ebf41
commit 3d15758c82
9 changed files with 279 additions and 187 deletions

File diff suppressed because one or more lines are too long

View File

@@ -1791,9 +1791,9 @@
inkscape:pageopacity="0"
inkscape:pagecheckerboard="0"
inkscape:deskcolor="#d1d1d1"
inkscape:zoom="2.4038052"
inkscape:cx="376.48641"
inkscape:cy="146.01849"
inkscape:zoom="0.84987348"
inkscape:cx="436.53557"
inkscape:cy="289.45485"
inkscape:window-width="2194"
inkscape:window-height="1158"
inkscape:window-x="0"

View File

@@ -6407,20 +6407,20 @@
id="g276"
transform="matrix(1.2839206,0,0,1.2839206,-276.91223,57.78747)">
<path
d="m 326.43557,68.032148 h 61.338 v 8.120261 h -61.338 z"
d="m 303.28562,68.032148 h 84.48795 v 8.120261 h -84.48795 z"
style="fill:#f5f000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.04686;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
transform="matrix(1.143907,0,0,1.1485333,-51.191119,119.61054)"
transform="matrix(1.143907,0,0,1.1485333,-24.709729,119.61054)"
clip-path="url(#clipPath117-3)"
id="path276-9"
sodipodi:nodetypes="ccccc" />
<text
xml:space="preserve"
style="font-size:8px;fill:#000000"
x="357.35019"
x="370.59088"
y="205.03008"
id="text362"><tspan
sodipodi:role="line"
x="357.35019"
x="370.59088"
y="205.03008"
id="tspan363"
style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-size:8px;font-family:'Liberation Serif';-inkscape-font-specification:'Liberation Serif Bold';text-align:center;text-anchor:middle">PIM</tspan></text>

File diff suppressed because one or more lines are too long

View File

@@ -1,9 +1,14 @@
## Conclusion and Future Work
<hr/>
- achievable speedup of 17.6 × and 9.0 × hypothetical infinite compute system
- lower bound
- linux driver implementation
- comparison with real neural network workloads
- consider replacing library approach with compiler approach
- power comparison, power models needed
<br>
A speedup of 17.6× has been achieved, and of 9.0× for the hypothetical infinite compute system
<br>
Future work:
- Implementation of Linux driver
- Comparison with complete neural networks
- Consider replacing library approach with compiler approach
- Implement a power model to analyze the power efficiency gains

View File

@@ -1,13 +1,23 @@
---
layout: figure
figureUrl: /dramsys.svg
figureCaption: The PIM-HBM model integrated into DRAMSys
---
## Virtual Prototype
### Processing Units
<hr/>
<br>
- Integrate DRAMSys into gem5
- Implement PIM-HBM virtual prototype in DRAM model
<br>
<div class="flex justify-center items-center">
<img src="/dramsys.svg">
</div>
<!--
- VP interprets the programmed microkernel
- not yet drop-in replacement
-->
---
layout: figure-side
figureUrl: /data_structures.svg
@@ -18,12 +28,15 @@ figureCaption: Data structures for instructions and register files
### Software Library
<hr/>
<br>
<br>
<br>
- Software support library
- Provides data structures for PIM-HBM
- Adhering to special memory layout requirements
#### Software support library
<br>
- Provides data structures for operand data and microkernels
- Executes programmed microkernels
---

View File

@@ -1,33 +1,41 @@
---
layout: figure
figureUrl: /world_energy.svg
figureCaption: Total energy of computing
figureFootnoteNumber: 1
---
## Introduction
### Energy Demand of Applications
<hr/>
<br>
- Total compute energy approaches world's energy production
--> drastic improvements in energy efficiency needed
<div class="flex justify-center">
<img src="/world_energy.svg">
</div>
<Footnotes separator>
<Footnote :number=1>
<Footnote>
SRC. „Decadal Plan for Semiconductors“, January 2021. https://www.src.org/about/decadal-plan/.
</Footnote>
</Footnote>
</Footnotes>
---
layout: figure
figureUrl: /gpt.svg
figureCaption: Roofline model of GPT revisions
figureFootnoteNumber: 1
---
## Introduction
### Memory Bound Workloads
<hr/>
<br>
#### Roofline model of GPT revisions<sup>1</sup>
<br>
<div class="flex justify-center">
<img src="/gpt.svg">
</div>
<Footnotes separator>
<Footnote :number=1>
<Footnote>
Ivo Bolsens. „Scalable AI Architectures for Edge and Cloud“, 2023.
</Footnote>
</Footnote>
</Footnotes>
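
As a rough illustration of the roofline model named on this slide, the sketch below bounds attainable performance by the minimum of peak compute and memory bandwidth times operational intensity. The function names and the peak and bandwidth figures are hypothetical placeholders chosen for the example, not values from the presentation or from the cited source.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(peak compute, bandwidth * operational intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def is_memory_bound(intensity, peak_gflops, bandwidth_gbs):
    """A kernel is memory bound when its intensity lies left of the ridge point."""
    ridge_point = peak_gflops / bandwidth_gbs  # FLOP/byte where both roofs meet
    return intensity < ridge_point


# Hypothetical machine parameters, for illustration only.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 250.0

for intensity in (0.5, 4.0, 16.0):  # operational intensity in FLOP/byte
    bound = attainable_gflops(intensity, PEAK_GFLOPS, BANDWIDTH_GBS)
    memory_bound = is_memory_bound(intensity, PEAK_GFLOPS, BANDWIDTH_GBS)
    print(f"intensity={intensity}: bound={bound} GFLOP/s, memory bound={memory_bound}")
```

Workloads with low operational intensity sit on the sloped part of the roof, where only more memory bandwidth raises the bound.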

View File

@@ -79,6 +79,10 @@ clicks: 1
</div>
</div>
<!--
To summarize...
-->
---
## Processing-in-Memory
@@ -94,8 +98,8 @@ clicks: 1
<v-clicks>
- Inside the memory subarray
- In the PSA region near a subarray
- Outside the bank in its peripheral region
- Near the subarray in the PSA output region
- Near the bank in its peripheral region
- In the I/O region of the memory
</v-clicks>
@@ -120,7 +124,7 @@ clicks: 1
<div v-click class="text-xl"> The nearer the computation is to the memory cells, the higher the achievable bandwidth! </div>
<Footnotes separator>
<Footnote :number=1>
<Footnote>
Sudarshan et al. „A Critical Assessment of DRAM-PIM Architectures - Trends, Challenges and Solutions“, 2022.
</Footnote>
</Footnotes>
@@ -141,19 +145,27 @@ clicks: 1
- more traditional accelerator approach
-->
---
layout: figure
figureUrl: /hbm-pim.svg
figureCaption: Architecture of PIM-HBM
figureFootnoteNumber: 1
---
## Processing-in-Memory
### Samsung's HBM-PIM
### Samsung's PIM-HBM
<hr/>
<br>
- Real-world PIM implementation based on HBM2
- PIM units embedded at the bank level
<br>
<div class="flex justify-center items-center">
<img src="/hbm-pim.svg">
</div>
<Footnotes separator>
<Footnote :number=1>
<Footnote>
Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product“, 2021.
</Footnote>
</Footnotes>
@@ -167,19 +179,23 @@ figureFootnoteNumber: 1
- All-Bank-PIM (AB-PIM)
-->
---
layout: figure
figureUrl: /pu.svg
figureCaption: Architecture of a PIM processing unit
figureFootnoteNumber: 1
---
## Processing-in-Memory
### Samsung's HBM-PIM
### Samsung's PIM-HBM
<hr/>
<br>
- Two 16-wide 16-bit FPUs
- Register files and control unit
<div class="flex justify-center items-center">
<img src="/pu.svg">
</div>
<Footnotes separator>
<Footnote :number=1>
<Footnote>
Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product“, 2021.
</Footnote>
</Footnotes>
@@ -201,15 +217,14 @@ figureFootnoteNumber: 1
layout: figure
figureUrl: /gemv.svg
figureCaption: Procedure to perform a (128×8)×(128) GEMV operation
figureFootnoteNumber: 1
---
## Processing-in-Memory
### Samsung's HBM-PIM
### Samsung's PIM-HBM
<hr/>
<Footnotes separator>
<Footnote :number=1>
<Footnote>
Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product“, 2021.
</Footnote>
</Footnotes>
@@ -221,7 +236,7 @@ figureCaption: Mapping of the weight matrix onto the memory banks
---
## Processing-in-Memory
### Samsung's HBM-PIM
### Samsung's PIM-HBM
<hr/>
<!--
@@ -234,8 +249,14 @@ figureCaption: Mapping of the weight matrix onto the memory banks
### Research
<hr/>
simulation models needed
<br>
<br>
<br>
<br>
research should not only focus on hardware but also explore the software side!
- To analyze the performance gains of PIM, simulation models are needed
- Research should not only focus on hardware but also explore the software side
that is why I am building a virtual prototype
<br>
- In the following, a virtual prototype of PIM-HBM is implemented

View File

@@ -2,7 +2,6 @@
### Microbenchmarks
<hr/>
<br>
<br>
<div class="grid grid-cols-2 gap-4">
@@ -15,11 +14,16 @@
- Vector-Matrix benchmarks (BLAS level 2)
- GEMV: $z = A \cdot x$
- DNN Layer: $z = ReLU(A \cdot x)$
- DNN:
- $f(x) = z = ReLU(A \cdot x)$
- $z_{n+1} = f(z_n)$
- 5 layers in total
</div>
<div>
<br>
| Level | Vector | GEMV | DNN |
|-------|--------|---------------|---------------|
| X1 | (2M) | (1024 x 4096) | (256 x 256) |
@@ -27,6 +31,8 @@
| X3 | (8M) | (4096 x 8192) | (1024 x 1024) |
| X4 | (16M) | (4096 x 8192) | (2048 x 2048) |
Operand Dimensions
</div>
</div>
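
The GEMV and DNN benchmarks listed above can be sketched in plain Python as a minimal reference, not the benchmark code itself. The helper names `gemv`, `relu` and `dnn` are hypothetical, and reusing the same square matrix `A` in every layer is an assumption read off the recurrence $z_{n+1} = f(z_n)$.

```python
def gemv(A, x):
    """Matrix-vector product z = A * x for a row-major list-of-lists matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def relu(v):
    """Element-wise ReLU: negative entries are clamped to zero."""
    return [max(0.0, e) for e in v]


def dnn(A, x, layers=5):
    """Apply f(z) = ReLU(A * z) repeatedly; assumes A is square and shared by all layers."""
    z = x
    for _ in range(layers):
        z = relu(gemv(A, z))
    return z


# Tiny example operands; the benchmark dimensions in the table are far larger.
A = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, -2.0]
print(gemv(A, x))
print(dnn(A, x))
```

Each layer is one GEMV followed by an element-wise operation, which is why the DNN benchmark stresses the same memory-bound kernel as the GEMV benchmark.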
@@ -36,11 +42,18 @@
### System Configuration
<hr/>
- Two system configurations:
- ARM 3GHz
- ARM Infinite
<br>
<br>
- TODO ... GPUs and so on
- Two simulated systems:
- Generic ARM system
- Infinite compute ARM system
<br>
- Two real GPUs using HBM2:
- AMD RX Vega 56
- NVIDIA V100
---
layout: figure
@@ -49,7 +62,7 @@ figureCaption: Speedups of PIM compared to non-PIM
---
## Simulations
### Speedups / ARM System
### Speedups / Generic ARM System
<hr/>
---
@@ -74,11 +87,18 @@ figureFootnoteNumber: 1
<hr/>
<Footnotes separator>
<Footnote :number=1>
<Footnote>
Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product“, 2021.
</Footnote>
</Footnotes>
<!--
- GEMV matches well
- ADD shows deviation
-> differences in hardware architecture
-->
---
layout: figure
figureUrl: /runtimes_vector.svg
@@ -89,6 +109,11 @@ figureCaption: Runtimes for Vector Benchmarks
### Runtimes / Vector Benchmarks
<hr/>
<!--
- Real GPUs use multiple memory channels
- Also architectural differences
-->
---
layout: figure
figureUrl: /runtimes_matrix.svg