Add CNN slide

2024-04-06 10:01:24 +02:00
parent b38406a36f
commit 1e5256dc99
12 changed files with 7104 additions and 8977 deletions
--- a/slides/implementation.md
+++ b/slides/implementation.md
@@ -21,7 +21,7 @@ figureCaption: Data structures for instructions and register files
 <br>
 <br>

- Software support library written in Rust
+- Software support library
 - Provides data structures for PIM-HBM
  - Adhering to special memory layout requirements
 - Executes programmed microkernels
@@ -45,13 +45,17 @@ figureUrl: /bare_metal.svg

 ---

+## Virtual Prototype
+### Platform
 <hr/>

 <br>
-<br>
-GEMV Microkernel

-```asm{none|1-8|9,10|11|all}{lines:true}
+<div class="grid grid-cols-2 gap-4">
+<div>
+
+DRAM-side
+```asm{all|1-8|9,10|11|12|all}{lines:true,at:1}
 MOV GRF_A #0, BANK
 MOV GRF_A #1, BANK
 MOV GRF_A #2, BANK
@@ -66,3 +70,52 @@ FILL BANK, GRF_B #0
 EXIT
 ```

+</div>
+<div>
+
+<style>
+code {
+  font-size: 8px
+}
+</style>
+
+Host-side
+
+```rust {all|7-10|12-17|19-28|30-31|all}{lines:true,maxHeight:'15em',at:1}
+pub fn execute<const X16R: usize, const X16C: usize, const R: usize>(
+    matrix: &Matrix<X16R, X16C>,
+    input_vector: &Vector<X16C>,
+    output_partial_sum_vector: &mut SVector<F16x16, R>,
+    dummy: &impl PimOperand,
+) {
+    // Load input vector into GRF-A registers
+    for chunk in input_vector.0.iter() {
+        chunk.execute_read();
+    }
+
+    // Execute the MAC instructions without memory barriers
+    for sub_matrix in matrix.0.iter() {
+        for column_block in sub_matrix.fixed_rows::<1>(0).iter() {
+            column_block.execute_read_async();
+        }
+    }
+
+    // Wait until all outstanding memory accesses have completed
+    barrier::dsb(barrier::SY);
+
+    // Copy the partial sums into the bank
+    for chunk in output_partial_sum_vector
+        .fixed_rows_with_step_mut::<X16R>(0, 16)
+        .iter_mut()
+    {
+        chunk.execute_write();
+    }
+
+    // Execute the EXIT instruction
+    dummy.execute_read();
+}
+```
+</div>
+</div>
+
+<!-- </Transform> -->
--- a/slides/pim.md
+++ b/slides/pim.md
@@ -15,6 +15,49 @@ figureFootnoteNumber: 1
 </Footnote>
 </Footnotes>

+<!--
+- Workload must be memory-bound
+
+- memory-bound:
+  - fully-connected layers
+  - layers of recurrent neural networks (RNNs)
+  
+- not memory-bound:
+  - convolutional layers
+    - data reuse
+-->
+
+---
+preload: false
+clicks: 1
+---
+
+## Processing-in-Memory
+### Applicable Workloads
+<hr/>
+
+<br>
+
+- Convolutional layers exhibit a high degree of data reuse
+- Small filter matrix fits into the on-chip cache
+
+<br>
+
+<Transform :scale="1.4">
+  <div class="absolute left-175px top-1px">
+    <img src="/cnn_input.svg">
+  </div>
+  <div v-if="$slidev.nav.clicks === 0" class="absolute left-175px">
+    <img src="/cnn_filter.svg">
+  </div>
+  <div v-if="$slidev.nav.clicks === 1"
+  v-motion
+  :initial="{ x: 175, y: 0}"
+  :enter="{ x: 335, y: 0, transition: { duration: 5000 }}">
+    <img src="/cnn_filter.svg">
+  </div>
+</Transform>
+
 ---

 ## Processing-in-Memory
@@ -142,4 +185,4 @@ simulation models needed

 research should not only focus on hardware but also explore the software side!

-that is why I am building a virtual prototype
+that is why I am building a virtual prototype
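
Review note on slides/pim.md: the new "Applicable Workloads" slide contrasts convolutional layers (data reuse) with memory-bound fully-connected layers. One rough way to put a number on that contrast is to count multiply-accumulate operations per weight fetched from memory. The sketch below is only an illustration. The function names, the single-channel, stride-1, unpadded convolution model, and the example sizes are assumptions and are not taken from the repository.

```rust
/// MACs performed per weight element for a fully-connected layer (GEMV).
/// Every matrix element is multiplied with exactly one input element,
/// so each fetched weight is used once per input vector.
/// (Hypothetical helper, for illustration only.)
fn gemv_macs_per_weight(rows: usize, cols: usize) -> f64 {
    assert!(rows > 0 && cols > 0, "matrix must be non-empty");
    let macs = rows * cols;
    let weights = rows * cols;
    macs as f64 / weights as f64
}

/// MACs performed per filter weight for a 2D convolution.
/// Assumed model: single channel, square k x k filter, stride 1, no padding.
/// The same k*k weights are reused at every output position.
/// (Hypothetical helper, for illustration only.)
fn conv_macs_per_weight(in_h: usize, in_w: usize, k: usize) -> f64 {
    assert!(k >= 1 && k <= in_h && k <= in_w, "filter must fit the input");
    let out_h = in_h - k + 1;
    let out_w = in_w - k + 1;
    let macs = out_h * out_w * k * k;
    let weights = k * k;
    macs as f64 / weights as f64
}
```

Under these assumptions, `conv_macs_per_weight(32, 32, 3)` yields 900 uses per filter weight, while `gemv_macs_per_weight` is 1 for any non-empty matrix. That matches the slide's point: the small filter can stay in the on-chip cache, whereas the GEMV weights stream from memory once each.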