Add CNN slide
@@ -21,7 +21,7 @@ figureCaption: Data structures for instructions and register files
<br>
<br>

- Software support library written in Rust
- Software support library
- Provides data structures for PIM-HBM
- Adhering to special memory layout requirements
- Executes programmed microkernels
@@ -45,13 +45,17 @@ figureUrl: /bare_metal.svg

---

## Virtual Prototype
### Platform
<hr/>

<br>
<br>
GEMV Microkernel

```asm{none|1-8|9,10|11|all}{lines:true}
<div class="grid grid-cols-2 gap-4">
<div>

DRAM-side
```asm{all|1-8|9,10|11|12|all}{lines:true,at:1}
MOV GRF_A #0, BANK
MOV GRF_A #1, BANK
MOV GRF_A #2, BANK
@@ -66,3 +70,52 @@ FILL BANK, GRF_B #0
EXIT
```

</div>
<div>

<style>
code {
  font-size: 8px
}
</style>

Host-side

```rust {all|7-10|12-17|19-28|30-31|all}{lines:true,maxHeight:'15em',at:1}
pub fn execute<const X16R: usize, const X16C: usize, const R: usize>(
    matrix: &Matrix<X16R, X16C>,
    input_vector: &Vector<X16C>,
    output_partial_sum_vector: &mut SVector<F16x16, R>,
    dummy: &impl PimOperand,
) {
    // Load input vector into GRF-A registers
    for chunk in input_vector.0.iter() {
        chunk.execute_read();
    }

    // Execute the MAC instructions without memory barriers
    for sub_matrix in matrix.0.iter() {
        for column_block in sub_matrix.fixed_rows::<1>(0).iter() {
            column_block.execute_read_async();
        }
    }

    // Wait until all memory accesses have completed
    barrier::dsb(barrier::SY);

    // Copy the partial sums into the bank
    for chunk in output_partial_sum_vector
        .fixed_rows_with_step_mut::<X16R>(0, 16)
        .iter_mut()
    {
        chunk.execute_write();
    }

    // Execute the EXIT instruction
    dummy.execute_read();
}
```
</div>
</div>

<!-- </Transform> -->

@@ -15,6 +15,49 @@ figureFootnoteNumber: 1
</Footnote>
</Footnotes>

<!--
- Workload must be memory-bound

- memory-bound:
    - fully-connected layers
    - layers of recurrent neural networks (RNNs)

- not memory-bound:
    - convolutional layers
    - data reuse
-->

---
preload: false
clicks: 1
---

## Processing-in-Memory
### Applicable Workloads
<hr/>

<br>

- Convolutional layers have extensive data reuse
- Small filter matrix fits into the on-chip cache
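
To make the data-reuse argument concrete, the following rough sketch compares the arithmetic intensity (FLOPs per byte of memory traffic) of a GEMV with that of a single-channel 2D convolution. The function names, the 2-byte FP16 element size, and the assumption that every operand is moved between memory and processor exactly once are illustrative and not taken from the slides.

```rust
/// Bytes per element, assuming FP16 operands (an assumption of this sketch).
const BYTES_PER_ELEMENT: f64 = 2.0;

/// Arithmetic intensity (FLOPs per byte) of y = A * x for a rows x cols matrix.
/// Every matrix element is read once and used for exactly one multiply-add.
pub fn gemv_intensity(rows: usize, cols: usize) -> f64 {
    let (r, c) = (rows as f64, cols as f64);
    let flops = 2.0 * r * c; // one multiply and one add per matrix element
    let bytes = BYTES_PER_ELEMENT * (r * c + c + r); // matrix + input + output
    flops / bytes
}

/// Arithmetic intensity of a single-channel, stride-1, unpadded 2D convolution
/// of an h x w input with a k x k filter (a hypothetical simplification).
/// Requires k <= h and k <= w.
pub fn conv_intensity(h: usize, w: usize, k: usize) -> f64 {
    let outputs = ((h - k + 1) * (w - k + 1)) as f64;
    let flops = 2.0 * (k * k) as f64 * outputs; // k*k multiply-adds per output
    let bytes = BYTES_PER_ELEMENT * ((h * w) as f64 + (k * k) as f64 + outputs);
    flops / bytes
}
```

Under these assumptions, a 1024 x 1024 GEMV stays just under 1 FLOP per byte, while a 3 x 3 filter over a 224 x 224 input reaches roughly 4.5, and the gap widens with larger filters.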

<br>

<Transform :scale="1.4">
<div class="absolute left-175px top-1px">
<img src="/cnn_input.svg">
</div>
<div v-if="$slidev.nav.clicks === 0" class="absolute left-175px">
<img src="/cnn_filter.svg">
</div>
<div v-if="$slidev.nav.clicks === 1"
  v-motion
  :initial="{ x: 175, y: 0}"
  :enter="{ x: 335, y: 0, transition: { duration: 5000 }}">
<img src="/cnn_filter.svg">
</div>
</Transform>

---

## Processing-in-Memory
@@ -142,4 +185,4 @@ simulation models needed

research should not only focus on hardware but also explore the software side!

that is why I am building a virtual prototype
that is why I am building a virtual prototype