Add CNN slide

2024-04-06 10:01:24 +02:00
parent b38406a36f
commit 1e5256dc99
12 changed files with 7104 additions and 8977 deletions


@@ -21,7 +21,7 @@ figureCaption: Data structures for instructions and register files
<br>
<br>
- Software support library written in Rust
- Software support library
- Provides data structures for PIM-HBM
- Adhering to special memory layout requirements
- Executes programmed microkernels
@@ -45,13 +45,17 @@ figureUrl: /bare_metal.svg
---
## Virtual Prototype
### Platform
<hr/>
<br>
<br>
GEMV Microkernel
```asm{none|1-8|9,10|11|all}{lines:true}
<div class="grid grid-cols-2 gap-4">
<div>
DRAM-side
```asm{all|1-8|9,10|11|12|all}{lines:true,at:1}
MOV GRF_A #0, BANK
MOV GRF_A #1, BANK
MOV GRF_A #2, BANK
@@ -66,3 +70,52 @@ FILL BANK, GRF_B #0
EXIT
```
</div>
<div>
<style>
code {
font-size: 8px
}
</style>
Host-side
```rust {all|7-10|12-17|19-28|30-31|all}{lines:true,maxHeight:'15em',at:1}
pub fn execute<const X16R: usize, const X16C: usize, const R: usize>(
matrix: &Matrix<X16R, X16C>,
input_vector: &Vector<X16C>,
output_partial_sum_vector: &mut SVector<F16x16, R>,
dummy: &impl PimOperand,
) {
// Load input vector into GRF-A registers
for chunk in input_vector.0.iter() {
chunk.execute_read();
}
// Execute the MAC instructions without memory barriers
for sub_matrix in matrix.0.iter() {
for column_block in sub_matrix.fixed_rows::<1>(0).iter() {
column_block.execute_read_async();
}
}
// Verify all memory accesses have finished
barrier::dsb(barrier::SY);
// Copy the partial sums into the bank
for chunk in output_partial_sum_vector
.fixed_rows_with_step_mut::<X16R>(0, 16)
.iter_mut()
{
chunk.execute_write();
}
// Execute the EXIT instruction
dummy.execute_read();
}
```
</div>
</div>
<!-- </Transform> -->
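For reference, the arithmetic that the microkernel and the host-side routine carry out together can be sketched in plain Rust. This is a minimal sketch under stated assumptions, not the library's implementation: `LANES`, `partial_sums`, `reduce` and `gemv` are hypothetical names, `f32` stands in for the 16-bit floats, and the final reduction of the 16 lane-wise partial sums per output row is assumed to happen on the host.

```rust
/// Lanes per SIMD word (assumption, taken from the `F16x16` type name).
const LANES: usize = 16;

/// Lane-wise partial sums for one matrix row: lane `l` accumulates the
/// products of all columns `c` with `c % LANES == l`.
fn partial_sums(row: &[f32], x: &[f32]) -> [f32; LANES] {
    let mut acc = [0.0f32; LANES];
    for (row_chunk, x_chunk) in row.chunks(LANES).zip(x.chunks(LANES)) {
        for (lane, (a, b)) in row_chunk.iter().zip(x_chunk).enumerate() {
            acc[lane] += a * b;
        }
    }
    acc
}

/// Reduce the lane-wise partial sums to one output element
/// (assumed to be a separate step after the partial sums are written back).
fn reduce(acc: &[f32; LANES]) -> f32 {
    acc.iter().sum()
}

/// Reference GEMV: one reduced output element per matrix row.
fn gemv(matrix: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    matrix
        .iter()
        .map(|row| reduce(&partial_sums(row, x)))
        .collect()
}
```

Calling `gemv` with a small matrix and comparing the result against the reduced output of the PIM run is one way to check the kernel, provided the tolerance accounts for 16-bit rounding.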


@@ -15,6 +15,49 @@ figureFootnoteNumber: 1
</Footnote>
</Footnotes>
<!--
- Workload must be memory-bound
- memory-bound:
- fully-connected layers
- layers of recurrent neural networks (RNNs)
- not memory-bound:
- convolutional layers
- data reuse
-->
---
preload: false
clicks: 1
---
## Processing-in-Memory
### Applicable Workloads
<hr/>
<br>
- Convolutional layers have excessive data reuse
- Small filter matrix fits into the on-chip cache
<br>
<Transform :scale="1.4">
<div class="absolute left-175px top-1px">
<img src="/cnn_input.svg">
</div>
<div v-if="$slidev.nav.clicks === 0" class="absolute left-175px">
<img src="/cnn_filter.svg">
</div>
<div v-if="$slidev.nav.clicks === 1"
v-motion
:initial="{ x: 175, y: 0}"
:enter="{ x: 335, y: 0, transition: { duration: 5000 }}">
<img src="/cnn_filter.svg">
</div>
</Transform>
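The difference in data reuse can be made concrete by counting multiply-accumulate operations per operand element fetched. The following is a rough sketch, assuming ideal caching (every operand is fetched from memory exactly once), a single-channel convolution with stride 1 and no padding; both function names are hypothetical.

```rust
/// MACs per fetched element for a matrix-vector product (fully-connected layer).
/// Every matrix element is used exactly once, so the ratio stays below 1.
fn gemv_macs_per_element(rows: usize, cols: usize) -> f64 {
    let macs = (rows * cols) as f64;
    let elements = (rows * cols + cols) as f64; // matrix + input vector
    macs / elements
}

/// MACs per fetched element for a 2D convolution of an `h` x `w` input with a
/// `k` x `k` filter (requires `k <= h` and `k <= w`).
/// Each input pixel is reused by up to `k * k` filter positions.
fn conv2d_macs_per_element(h: usize, w: usize, k: usize) -> f64 {
    let out_h = h - k + 1;
    let out_w = w - k + 1;
    let macs = (out_h * out_w * k * k) as f64;
    let elements = (h * w + k * k) as f64; // input + filter
    macs / elements
}
```

Under these assumptions the matrix-vector product performs slightly less than one MAC per fetched element regardless of size, while the convolution approaches `k * k` MACs per element for large inputs, which is why it benefits from on-chip caches rather than from processing in memory.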
---
## Processing-in-Memory
@@ -142,4 +185,4 @@ simulation models needed
research should not only focus on hardware but also explore the software side!
that is why I am building a virtual prototype
that is why I am building a virtual prototype