gpu-compute: Use timing DMAs for GPUFS HSA signals

The functional HSA signal read was a hack left in the gpu-compute code.
In full system, this functional read is causing problems occasionally
with the translation not yet being in the page table. The error message
output by gem5 was a fatal message on the readBlob method in port proxy.
Changing this to a timing DMA fixes this problem.

This commit adds the various timing DMA functions to send and receive
response and clean up. A helper method "sendCompletionSignal" is added
to the GPUCommandProcessor because the indentation level was getting too
deep. This change applies only to FS mode. Code for SE mode is
equivalent to what it was before this commit.

Change-Id: I1bfcaa0a52731cdf9532a7fd0eb06ab2f0e09d48
This commit is contained in:
Matthew Poremba
2023-08-25 09:35:55 -05:00
parent 5cb604559a
commit 57b3d2897c
4 changed files with 118 additions and 21 deletions

View File

@@ -389,20 +389,16 @@ HSAPacketProcessor::processPkt(void* pkt, uint32_t rl_idx, Addr host_pkt_addr)
dep_sgnl_rd_st->resetSigVals();
// The completion signal is connected
if (bar_and_pkt->completion_signal != 0) {
// HACK: The semantics of the HSA signal is to
// decrement the current signal value
// I'm going to cheat here and read out
// the value from main memory using functional
// access, and then just DMA the decremented value.
uint64_t signal_value = gpu_device->functionalReadHsaSignal(\
bar_and_pkt->completion_signal);
// The semantics of the HSA signal is to decrement the current
// signal value by one. Do this asynchronously via DMAs and
// callbacks as we can safely continue with this function
// while waiting for the next packet from the host.
DPRINTF(HSAPacketProcessor, "Triggering barrier packet" \
" completion signal! Addr: %x\n",
bar_and_pkt->completion_signal);
gpu_device->updateHsaSignal(bar_and_pkt->completion_signal,
signal_value - 1);
gpu_device->sendCompletionSignal(
bar_and_pkt->completion_signal);
}
}
if (dep_sgnl_rd_st->pendingReads > 0) {