---
layout: figure
figureUrl: /dramsys.svg
figureCaption: The PIM-HBM model integrated into DRAMSys
---

## Virtual Prototype
### Processing Units

---
layout: figure-side
figureUrl: /data_structures.svg
figureCaption: Data structures for instructions and register files
---

## Virtual Prototype
### Software Library


- Software support library
- Provides data structures for PIM-HBM
- Adheres to the special memory layout requirements
- Executes programmed microkernels

---
layout: figure-side
figureUrl: /bare_metal.svg
---

## Virtual Prototype
### Platform


- Bare-metal kernel executes on ARM processor model
- Custom page table configuration
- Non-PIM DRAM region mapped as cacheable memory
- PIM DRAM region mapped as non-cacheable memory

---

## Virtual Prototype
### Platform

DRAM-side

```asm{all|1-8|9,10|11|12|all}{lines:true,at:1}
MOV GRF_A #0, BANK
MOV GRF_A #1, BANK
MOV GRF_A #2, BANK
MOV GRF_A #3, BANK
MOV GRF_A #4, BANK
MOV GRF_A #5, BANK
MOV GRF_A #6, BANK
MOV GRF_A #7, BANK
MAC(AAM) GRF_B, BANK, GRF_A
JUMP -1, 7
FILL BANK, GRF_B #0
EXIT
```

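
The control flow of the DRAM-side microkernel can be traced with a small interpreter. This is a minimal sketch, not the model's implementation. It assumes that `JUMP -1, 7` branches back one instruction seven times without consuming a memory access of its own, so that the `MAC` runs eight times in total. The `Instruction` enum and the `microkernel` and `trace` helpers are hypothetical names introduced for illustration only.

```rust
/// Simplified view of the microkernel instructions (operands omitted).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Instruction {
    Mov,
    Mac,
    /// Branch by `offset` instructions, at most `count` times (assumed semantics).
    Jump { offset: isize, count: u32 },
    Fill,
    Exit,
}

/// Builds the twelve-instruction program shown in the listing.
fn microkernel() -> Vec<Instruction> {
    let mut program = vec![Instruction::Mov; 8];
    program.push(Instruction::Mac);
    program.push(Instruction::Jump { offset: -1, count: 7 });
    program.push(Instruction::Fill);
    program.push(Instruction::Exit);
    program
}

/// Returns the program counters of all executed non-JUMP instructions, in order.
/// Under the stated assumption, each entry corresponds to one host memory access.
/// Jump targets are assumed to stay inside the program.
fn trace(program: &[Instruction]) -> Vec<usize> {
    // One optional loop counter per instruction slot; only JUMP slots use it.
    let mut remaining: Vec<Option<u32>> = vec![None; program.len()];
    let mut executed = Vec::new();
    let mut pc = 0usize;

    while pc < program.len() {
        match program[pc] {
            Instruction::Jump { offset, count } => {
                let left = remaining[pc].unwrap_or(count);
                if left > 0 {
                    // Take the branch and remember how many repetitions are left.
                    remaining[pc] = Some(left - 1);
                    pc = (pc as isize + offset) as usize;
                } else {
                    // Loop finished: reset the counter and fall through.
                    remaining[pc] = None;
                    pc += 1;
                }
            }
            Instruction::Exit => {
                executed.push(pc);
                break;
            }
            _ => {
                executed.push(pc);
                pc += 1;
            }
        }
    }
    executed
}
```

Calling `trace(&microkernel())` yields 18 entries under these assumptions: eight `MOV`, eight `MAC`, one `FILL`, and one `EXIT`.
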
Host-side

```rust {all|7-10|12-17|19-28|30-31|all}{lines:true,maxHeight:'15em',at:1}
pub fn execute(
    matrix: &Matrix,
    input_vector: &Vector,
    output_partial_sum_vector: &mut SVector,
    dummy: &impl PimOperand,
) {
    // Load input vector into GRF-A registers
    for chunk in input_vector.0.iter() {
        chunk.execute_read();
    }

    // Execute the MAC instructions without memory barriers
    for sub_matrix in matrix.0.iter() {
        for column_block in sub_matrix.fixed_rows::<1>(0).iter() {
            column_block.execute_read_async();
        }
    }

    // Verify all memory accesses have finished
    barrier::dsb(barrier::SY);

    // Copy the partial sums into the bank
    for chunk in output_partial_sum_vector
        .fixed_rows_with_step_mut::(0, 16)
        .iter_mut()
    {
        chunk.execute_write();
    }

    // Execute the EXIT instruction
    dummy.execute_read();
}
```
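
The term "partial sums" in the host code can be illustrated numerically. The following is a rough sketch under stated assumptions, not the library's or the model's code. It assumes 16 lanes per register, lane-wise accumulation into a single GRF-B register, and a final reduction on the host. `f32` stands in for whatever number format the hardware uses, and `LANES`, `Register`, `mac_partial_sums`, and `reduce` are hypothetical names.

```rust
const LANES: usize = 16; // assumed number of lanes per register
const GRF_A_REGS: usize = 8; // matches the eight MOV instructions

type Register = [f32; LANES];

/// Lane-wise multiply-accumulate over eight column blocks:
/// grf_b[lane] += weights[i][lane] * input[i][lane] for i in 0..8.
fn mac_partial_sums(
    weights: &[Register; GRF_A_REGS],
    input: &[Register; GRF_A_REGS],
) -> Register {
    let mut grf_b = [0.0f32; LANES];
    for (w, x) in weights.iter().zip(input.iter()) {
        for lane in 0..LANES {
            grf_b[lane] += w[lane] * x[lane];
        }
    }
    grf_b
}

/// Host-side reduction of the per-lane partial sums into one output element.
fn reduce(partial: &Register) -> f32 {
    partial.iter().sum()
}
```

With all weights set to 1.0 and all inputs set to 2.0, every lane accumulates 16.0 and the reduced result is 256.0, which equals the dot product over the 128 elements.
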