Basic skeleton is in place

2024-04-03 23:14:06 +02:00
parent a7d5b77dcd
commit b38406a36f
7 changed files with 367 additions and 128 deletions


@@ -36,9 +36,17 @@ src: ./slides/implementation.md
src: ./slides/simulations.md
---
---
src: ./slides/conclusion.md
---
---
layout: end
---
# Thank you for your attention
<hr/>
---
src: ./slides/appendix.md
---

42
slides/appendix.md Normal file

@@ -0,0 +1,42 @@
## Appendix
### GEMV Kernel
<hr/>
<Transform :scale="0.7">
```rust {all}{lines:true}
pub fn execute<const X16R: usize, const X16C: usize, const R: usize>(
    matrix: &Matrix<X16R, X16C>,
    input_vector: &Vector<X16C>,
    output_partial_sum_vector: &mut SVector<F16x16, R>,
    dummy: &impl PimOperand,
) {
    // Load input vector into GRF-A registers
    for chunk in input_vector.0.iter() {
        chunk.execute_read();
    }
    // Execute the MAC instructions without memory barriers
    for sub_matrix in matrix.0.iter() {
        for column_block in sub_matrix.fixed_rows::<1>(0).iter() {
            column_block.execute_read_async();
        }
    }
    // Verify all memory accesses have finished
    barrier::dsb(barrier::SY);
    // Copy the partial sums into the bank
    for chunk in output_partial_sum_vector
        .fixed_rows_with_step_mut::<X16R>(0, 16)
        .iter_mut()
    {
        chunk.execute_write();
    }
    // Execute the EXIT instruction
    dummy.execute_read();
}
```
</Transform>
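
For orientation, the computation that the kernel above offloads can be sketched as a plain host-side GEMV. This is not part of the commit: `gemv_reference` and `LANES` are hypothetical names, `f32` stands in for the 16-bit float type, and the lane count of 16 is only assumed from the `F16x16` type name. The sketch keeps one partial sum per lane and reduces them at the end, mirroring the "partial sums" mentioned in the kernel's comments.

```rust
/// Lanes per chunk; 16 is an assumption taken from the `F16x16` type name.
const LANES: usize = 16;

/// Hypothetical reference: computes y = A * x for a row-major matrix,
/// keeping LANES partial sums per output row and reducing them at the end.
pub fn gemv_reference(matrix: &[f32], rows: usize, cols: usize, x: &[f32]) -> Vec<f32> {
    assert_eq!(matrix.len(), rows * cols);
    assert_eq!(x.len(), cols);
    assert_eq!(cols % LANES, 0, "cols must be a multiple of LANES");

    let mut y: Vec<f32> = Vec::with_capacity(rows);
    for r in 0..rows {
        let row = &matrix[r * cols..(r + 1) * cols];
        let mut partial = [0.0f32; LANES];
        // Multiply-accumulate one chunk of LANES columns at a time.
        for (row_chunk, x_chunk) in row.chunks_exact(LANES).zip(x.chunks_exact(LANES)) {
            for lane in 0..LANES {
                partial[lane] += row_chunk[lane] * x_chunk[lane];
            }
        }
        // Reduce the per-lane partial sums into the final output element.
        y.push(partial.iter().sum::<f32>());
    }
    y
}
```

A reference like this could serve as a correctness check for simulated kernel output, within the tolerance that 16-bit floats allow.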

9
slides/conclusion.md Normal file

@@ -0,0 +1,9 @@
## Conclusion and Future Work
<hr/>
- achievable speedup of 17.6 × and 9.0 × hypothetical infinite compute system
- lower bound
- Linux driver implementation
- comparison with real neural network workloads
- consider replacing library approach with compiler approach
- power comparison, power models needed


@@ -11,7 +11,7 @@ figureCaption: The PIM-HBM model integrated into DRAMSys
---
layout: figure-side
figureUrl: /data_structures.svg
figureCaption: The PIM-HBM model integrated into DRAMSys
figureCaption: Data structures for instructions and register files
---
## Virtual Prototype


@@ -141,3 +141,5 @@ figureCaption: Mapping of the weight matrix onto the memory banks
simulation models needed
research should not only focus on hardware but also explore the software side!
that is why I am building a virtual prototype


@@ -30,9 +30,71 @@
</div>
</div>
---
## Simulations
### System Configuration
<hr/>
- Two system configurations:
  - ARM 3 GHz
  - ARM Infinite
- TODO ... GPU and so on
---
layout: figure
figureUrl: /dnn.svg
figureCaption: A fully connected DNN layer
figureUrl: /speedup_normal.svg
figureCaption: Speedups of PIM compared to non-PIM
---
## Simulations
### Speedups / ARM System
<hr/>
---
layout: figure
figureUrl: /speedup_inf.svg
figureCaption: Speedups of PIM compared to non-PIM
---
## Simulations
### Speedups / Infinite Compute System
<hr/>
---
layout: figure
figureUrl: /samsung.svg
figureCaption: Speedups of Samsung for VADD and GEMV
figureFootnoteNumber: 1
---
## Simulations
### Speedups / Samsung
<hr/>
<Footnotes separator>
<Footnote :number=1>
Lee et al. „Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product“, 2021.
</Footnote>
</Footnotes>
---
layout: figure
figureUrl: /runtimes_vector.svg
figureCaption: Runtimes for Vector Benchmarks
---
## Simulations
### Runtimes / Vector Benchmarks
<hr/>
---
layout: figure
figureUrl: /runtimes_matrix.svg
figureCaption: Runtimes for Matrix Benchmarks
---
## Simulations
### Runtimes / Matrix Benchmarks
<hr/>