## Virtual Prototype
### Processing Units

- Integrate DRAMSys into gem5
- Implement PIM-HBM virtual prototype in DRAM model

---
layout: figure-side
figureUrl: /data_structures.svg
figureCaption: Data structures for instructions and register files
---

## Virtual Prototype
### Software Library



#### Software support library
- Provides data structures for operand data and microkernels
- Executes programmed microkernels
  - Generates RD and WR requests
  - Inserts memory barriers for synchronization

---

## Virtual Prototype
### GEMV Kernel

DRAM-side

```asm{all|1-8|9,10|11|12}{lines:true,at:1}
MOV GRF_A #0, BANK
MOV GRF_A #1, BANK
MOV GRF_A #2, BANK
MOV GRF_A #3, BANK
MOV GRF_A #4, BANK
MOV GRF_A #5, BANK
MOV GRF_A #6, BANK
MOV GRF_A #7, BANK
MAC(AAM) GRF_B, BANK, GRF_A
JUMP -1, 7
FILL BANK, GRF_B #0
EXIT
```

Host-side

```rust {all|7-10|12-17|22-28|30-31}{lines:true,maxHeight:'15em',at:1}
pub fn execute(
    matrix: &Matrix,
    input_vector: &Vector,
    output_partial_sum_vector: &mut SVector,
    dummy: &impl PimOperand,
) {
    // Load input vector into GRF-A registers
    for chunk in input_vector.0.iter() {
        chunk.execute_read();
    }

    // Execute the MAC instructions without memory barriers
    for sub_matrix in matrix.0.iter() {
        for column_block in sub_matrix.fixed_rows::<1>(0).iter() {
            column_block.execute_read_async();
        }
    }

    // Verify all memory accesses have finished
    barrier::dsb(barrier::SY);

    // Copy the partial sums into the bank
    for chunk in output_partial_sum_vector
        .fixed_rows_with_step_mut::(0, 16)
        .iter_mut()
    {
        chunk.execute_write();
    }

    // Execute the EXIT instruction
    dummy.execute_read();
}
```
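
---

## Virtual Prototype
### GEMV Kernel: Request Count Sketch

The link between the DRAM-side and host-side listings can be sketched in plain Rust. This is a minimal illustration, not the library's actual data structures: `Instruction`, `gemv_microkernel`, `expand`, and `request_counts` are hypothetical names. The sketch assumes that `JUMP -1, 7` re-executes the preceding instruction seven more times without consuming a memory request, and that every other instruction is triggered by exactly one RD or WR request.

```rust
/// Simplified view of the microkernel instructions in the DRAM-side listing.
/// Hypothetical type for illustration; not the library's actual encoding.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Instruction {
    /// MOV GRF_A #index, BANK
    MovToGrfA { index: u8 },
    /// MAC(AAM) GRF_B, BANK, GRF_A
    MacAam,
    /// JUMP -offset, count
    Jump { offset: usize, count: usize },
    /// FILL BANK, GRF_B #index
    FillFromGrfB { index: u8 },
    /// EXIT
    Exit,
}

/// Builds the GEMV microkernel exactly as written in the DRAM-side listing.
fn gemv_microkernel() -> Vec<Instruction> {
    let mut kernel: Vec<Instruction> = (0..8u8)
        .map(|index| Instruction::MovToGrfA { index })
        .collect();
    kernel.push(Instruction::MacAam);
    kernel.push(Instruction::Jump { offset: 1, count: 7 });
    kernel.push(Instruction::FillFromGrfB { index: 0 });
    kernel.push(Instruction::Exit);
    kernel
}

/// Expands the program into the instructions that are actually executed.
/// Assumption: JUMP itself is not executed as a step; it only redirects
/// control flow `count` times. Nested loops are not modelled.
fn expand(kernel: &[Instruction]) -> Vec<Instruction> {
    let mut executed = Vec::new();
    let mut pc = 0;
    let mut remaining: Option<usize> = None; // repetitions left for the active JUMP
    while pc < kernel.len() {
        match kernel[pc] {
            Instruction::Jump { offset, count } => {
                let left = remaining.unwrap_or(count);
                if left > 0 {
                    remaining = Some(left - 1);
                    pc = pc.saturating_sub(offset);
                } else {
                    remaining = None;
                    pc += 1;
                }
            }
            Instruction::Exit => {
                executed.push(Instruction::Exit);
                break;
            }
            other => {
                executed.push(other);
                pc += 1;
            }
        }
    }
    executed
}

/// Returns `(reads, writes)` needed to step through the executed instructions.
/// Assumption: only FILL is triggered by a WR request; all others by RD.
fn request_counts(executed: &[Instruction]) -> (usize, usize) {
    let writes = executed
        .iter()
        .filter(|i| matches!(i, Instruction::FillFromGrfB { .. }))
        .count();
    (executed.len() - writes, writes)
}
```

Under these assumptions, `request_counts(&expand(&gemv_microkernel()))` yields 17 reads and 1 write: 8 reads for the `MOV` steps, 8 for the `MAC` steps, 1 write for `FILL`, and the dummy read for `EXIT`. These correspond to the four phases of the host-side `execute` function.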
---
layout: figure-side
figureUrl: /bare_metal.svg
---

## Virtual Prototype
### Platform


- ARM processor model
- Bare-metal kernel
- Custom page table configuration
  - Non-PIM DRAM region mapped as cacheable memory
  - PIM DRAM region mapped as non-cacheable memory
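
---

## Virtual Prototype
### Platform: Memory Map Sketch

The split between cacheable and non-cacheable regions can be sketched as a small lookup table. This is an illustration only: the base addresses, region sizes, and the names `Cacheability`, `Region`, `MEMORY_MAP`, and `cacheability_of` are assumptions and are not taken from the actual page table code. The PIM region is presumably non-cacheable so that every RD and WR request reaches the DRAM model instead of being served from a cache.

```rust
/// Cacheability of a mapped DRAM region.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Cacheability {
    Cacheable,
    NonCacheable,
}

/// One contiguous physical region and the attribute its pages receive.
#[derive(Debug, Clone, Copy)]
struct Region {
    start: u64,
    size: u64,
    cacheability: Cacheability,
}

const GIB: u64 = 1 << 30;

// Hypothetical layout: 1 GiB of regular DRAM followed by 1 GiB of PIM DRAM.
const MEMORY_MAP: [Region; 2] = [
    Region {
        start: 0x8000_0000,
        size: GIB,
        cacheability: Cacheability::Cacheable,
    },
    Region {
        start: 0xC000_0000,
        size: GIB,
        cacheability: Cacheability::NonCacheable,
    },
];

/// Returns the attribute for `addr`, or `None` if the address is unmapped.
fn cacheability_of(addr: u64) -> Option<Cacheability> {
    MEMORY_MAP
        .iter()
        .find(|r| addr >= r.start && addr - r.start < r.size)
        .map(|r| r.cacheability)
}
```

A real page table setup would translate each region's attribute into the architecture's descriptor bits; the table above only captures which region receives which treatment.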