Virtual Prototype
Processing Units
- Integrate DRAMSys into gem5
- Implement PIM-HBM virtual prototype in DRAM model
layout: figure-side figureUrl: /data_structures.svg figureCaption: Data structures for instructions and register files
Virtual Prototype
Software Library
Software support library
- Provides data structures for operand data and microkernels
- Executes programmed microkernels
- Generates RD and WR requests
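
As a minimal sketch of how an operand type might turn into RD and WR requests, the block below uses volatile loads and stores so the compiler cannot elide the accesses. The names `Chunk` and `Operand` are hypothetical and are not the library's actual types; the real library presumably issues these accesses to the PIM memory region.

```rust
use core::ptr::{read_volatile, write_volatile};

/// Hypothetical stand-in for one 32-byte operand: 16 half-precision values,
/// stored here as raw u16 bit patterns.
#[repr(C, align(32))]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Chunk(pub [u16; 16]);

/// Hypothetical trait for anything that can trigger a RD or WR request.
pub trait Operand {
    fn execute_read(&self);
    fn execute_write(&mut self);
}

impl Operand for Chunk {
    fn execute_read(&self) {
        // A volatile load cannot be optimized away, so the access is issued
        // even though the loaded value is discarded.
        let _ = unsafe { read_volatile(self as *const Chunk) };
    }

    fn execute_write(&mut self) {
        // Write the current contents back with a volatile store.
        let value = *self;
        unsafe { write_volatile(self as *mut Chunk, value) };
    }
}
```

In this sketch the data value is secondary: what matters is that each call results in exactly one load or store that the compiler must keep.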
Virtual Prototype
GEMV Kernel
DRAM-side
MOV GRF_A #0, BANK
MOV GRF_A #1, BANK
MOV GRF_A #2, BANK
MOV GRF_A #3, BANK
MOV GRF_A #4, BANK
MOV GRF_A #5, BANK
MOV GRF_A #6, BANK
MOV GRF_A #7, BANK
MAC(AAM) GRF_B, BANK, GRF_A
JUMP -1, 7
FILL BANK, GRF_B #0
EXIT
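
The microkernel above can be read as the following simplified functional model. It is a sketch under stated assumptions, not the prototype's implementation: it assumes `JUMP -1, 7` repeats the preceding MAC seven more times (eight executions in total), that each MAC pairs the i-th bank operand with GRF_A register i, that a single GRF_B register accumulates, and it uses `f32` in place of half precision. `run_microkernel` and `reduce` are hypothetical names.

```rust
const LANES: usize = 16;
const GRF_A_REGS: usize = 8;

type Reg = [f32; LANES];

/// Simplified model of one processing unit running the GEMV microkernel.
pub fn run_microkernel(
    input: &[Reg; GRF_A_REGS],
    matrix_blocks: &[Reg; GRF_A_REGS],
) -> Reg {
    // MOV GRF_A #i, BANK (i = 0..7): load the input vector chunks.
    let grf_a = *input;
    let mut grf_b = [0.0f32; LANES];

    // MAC(AAM) runs once, then JUMP -1, 7 repeats it seven more times
    // (assumed semantics), giving one MAC per GRF_A register.
    for i in 0..GRF_A_REGS {
        for lane in 0..LANES {
            grf_b[lane] += matrix_blocks[i][lane] * grf_a[i][lane];
        }
    }

    // FILL BANK, GRF_B #0: the accumulator is written back to the bank.
    grf_b
}

/// Host-side reduction of the lane-wise partial sums to one output element.
pub fn reduce(partial: &Reg) -> f32 {
    partial.iter().sum()
}
```

Under these assumptions each lane of the result holds a partial sum, which is consistent with the host-side code receiving a partial-sum vector that still has to be reduced.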
<style>
code {
font-size: 8px
}
</style>
Host-side
pub fn execute<const X16R: usize, const X16C: usize, const R: usize>(
    matrix: &Matrix<X16R, X16C>,
    input_vector: &Vector<X16C>,
    output_partial_sum_vector: &mut SVector<F16x16, R>,
    dummy: &impl PimOperand,
) {
    // Load input vector into GRF-A registers
    for chunk in input_vector.0.iter() {
        chunk.execute_read();
    }

    // Execute the MAC instructions without memory barriers
    for sub_matrix in matrix.0.iter() {
        for column_block in sub_matrix.fixed_rows::<1>(0).iter() {
            column_block.execute_read_async();
        }
    }

    // Wait until all outstanding memory accesses have completed
    barrier::dsb(barrier::SY);

    // Copy the partial sums into the bank
    for chunk in output_partial_sum_vector
        .fixed_rows_with_step_mut::<X16R>(0, 16)
        .iter_mut()
    {
        chunk.execute_write();
    }

    // Execute the EXIT instruction
    dummy.execute_read();
}
layout: figure-side figureUrl: /bare_metal.svg
Virtual Prototype
Platform
- ARM processor model
- Bare-metal kernel
- Custom page table configuration
- Non-PIM DRAM region mapped as cacheable memory
- PIM DRAM region mapped as non-cacheable memory
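
The split between cacheable and non-cacheable regions could be described as in the sketch below. The PIM region plausibly needs to be non-cacheable because every access has to reach the DRAM to trigger a microkernel instruction. The base addresses and sizes are hypothetical placeholders, `MemAttr`, `Region`, and `attr_for` are illustrative names, and the `MAIR_EL1` attribute bytes assume an AArch64 target with Normal memory for both regions.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MemAttr {
    Cacheable,
    NonCacheable,
}

impl MemAttr {
    /// AArch64 MAIR_EL1 attribute byte for Normal memory of this kind.
    pub const fn mair_byte(self) -> u8 {
        match self {
            // Inner/Outer Write-Back, non-transient, read/write-allocate.
            MemAttr::Cacheable => 0xFF,
            // Inner/Outer Non-cacheable.
            MemAttr::NonCacheable => 0x44,
        }
    }
}

pub struct Region {
    pub base: u64,
    pub size: u64,
    pub attr: MemAttr,
}

// Hypothetical layout: the addresses below are placeholders, not the
// prototype's actual memory map.
pub const REGIONS: [Region; 2] = [
    // Non-PIM DRAM region
    Region { base: 0x8000_0000, size: 0x4000_0000, attr: MemAttr::Cacheable },
    // PIM DRAM region
    Region { base: 0xC000_0000, size: 0x4000_0000, attr: MemAttr::NonCacheable },
];

/// Returns the attribute of the region containing `addr`, if any.
pub fn attr_for(addr: u64) -> Option<MemAttr> {
    REGIONS
        .iter()
        .find(|r| addr >= r.base && addr - r.base < r.size)
        .map(|r| r.attr)
}
```

A real page table setup would additionally write the attribute bytes into `MAIR_EL1` and reference them by index from each block or page descriptor.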