dev-hsa,gpu-compute: Add timestamps to AMD HSA signals

The AMD-specific HSA signal contains start/end timestamps for dispatch
packet completion signals. These are currently always zero. The ROCr
runtime uses these timestamp values for profiling. Unfortunately, the
GpuAgent::TranslateTime method in ROCr does not check for zero values
before dividing, so applications that use profiling crash with SIGFPE.
The HACC application uses profiling via hipEvents, so gem5 should
support these timestamps.
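The diff below adds a `dispatchStartTime` map (`std::unordered_map<Addr, Tick>`) keyed by signal handle. The following standalone C++ sketch shows that bookkeeping idea: record the tick when a dispatch starts, then produce a nonzero start/end pair at completion to write into the signal. The names `DispatchTimer`, `EventTimestamps`, `onDispatch` and `onComplete` are hypothetical, and `Addr`/`Tick` are plain stand-ins for the gem5 types. This is not the gem5 implementation.

```cpp
#include <cstdint>
#include <stdexcept>
#include <unordered_map>

using Addr = uint64_t;  // stand-in for gem5's Addr
using Tick = uint64_t;  // stand-in for gem5's Tick

// Hypothetical mirror of the start/end timestamp pair in the AMD signal.
struct EventTimestamps {
    uint64_t start_ts;
    uint64_t end_ts;
};

class DispatchTimer {
  public:
    // Record the start tick when a dispatch with a completion signal launches.
    void onDispatch(Addr signal_handle, Tick now) {
        startTime_[signal_handle] = now;
    }

    // At kernel completion, build the values that would be written into the
    // signal before its completion value is updated.
    EventTimestamps onComplete(Addr signal_handle, Tick now) {
        auto it = startTime_.find(signal_handle);
        if (it == startTime_.end())
            throw std::logic_error("completion without a recorded dispatch");
        EventTimestamps ts{it->second, now};
        startTime_.erase(it);  // assumption: one completion per dispatch
        return ts;
    }

  private:
    std::unordered_map<Addr, Tick> startTime_;
};
```

For example, `onDispatch(0x10, 100)` followed by `onComplete(0x10, 250)` yields `{100, 250}`. The consumer then sees real timestamps instead of zeros.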

To write the timestamp values, we need to DMA them to memory before
writing the completion signal. This changes the flow of the async
completion signal write to:
(1) read the mailbox pointer
(2) if the pointer is valid, write the mailbox data, otherwise skip to (4)
(3) write the mailbox data if the pointer is valid
(4) write the timestamp values
(5) write the completion signal
The application processes the timestamp data as soon as it receives the
completion signal, so these writes must be ordered to ensure that the
timestamp DMA has completed before the signal is written.
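A minimal C++ sketch of the ordering described above follows. It assumes a hypothetical `FakeDma` class that completes requests asynchronously and in issue order. Each step issues the next request only from its own completion callback, so the completion signal write cannot overtake the timestamp write. For brevity, the sketch models the mailbox handling as a single request. `FakeDma` and `completeSignalAsync` are illustrative names and are not gem5 APIs.

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an asynchronous DMA engine: requests complete
// later, when drain() runs, and each completion invokes its callback.
class FakeDma {
  public:
    void request(std::string what, std::function<void()> done) {
        pending_.push({std::move(what), std::move(done)});
    }

    // Complete requests one at a time in issue order. A callback may issue
    // the next request in the chain.
    void drain() {
        while (!pending_.empty()) {
            Req req = std::move(pending_.front());
            pending_.pop();
            log.push_back(req.what);
            req.done();
        }
    }

    std::vector<std::string> log;  // order in which requests completed

  private:
    struct Req {
        std::string what;
        std::function<void()> done;
    };
    std::queue<Req> pending_;
};

// Chain the writes so that each starts only after the previous one completes.
void completeSignalAsync(FakeDma &dma, uint64_t mailbox_ptr) {
    dma.request("read mailbox pointer", [&dma, mailbox_ptr] {
        auto writeTimestamps = [&dma] {
            dma.request("write timestamps", [&dma] {
                dma.request("write completion signal", [] {});
            });
        };
        if (mailbox_ptr != 0)
            dma.request("write mailbox data", writeTimestamps);
        else
            writeTimestamps();  // invalid pointer: skip straight to timestamps
    });
}
```

With a nonzero mailbox pointer, `drain()` completes the requests in this order: mailbox read, mailbox write, timestamps, completion signal. With a zero pointer, the mailbox write is skipped. The completion signal write always comes last.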

HACC now runs to completion on GPUFS and produces the same output as
hardware.

Change-Id: I09877cdff901d1402140f2c3bafea7605fa6554e
Matthew Poremba
2023-09-11 09:22:26 -05:00
parent ae104cc431
commit 6a4b2bb096
3 changed files with 74 additions and 24 deletions


@@ -117,6 +117,7 @@ class GPUCommandProcessor : public DmaVirtDevice
void updateHsaSignalDone(uint64_t *signal_value);
void updateHsaMailboxData(Addr signal_handle, uint64_t *mailbox_value);
void updateHsaEventData(Addr signal_handle, uint64_t *event_value);
void updateHsaEventTs(Addr signal_handle, amd_event_t *event_value);
uint64_t functionalReadHsaSignal(Addr signal_handle);
@@ -148,6 +149,9 @@ class GPUCommandProcessor : public DmaVirtDevice
HSAPacketProcessor *hsaPP;
TranslationGenPtr translate(Addr vaddr, Addr size) override;
// Keep track of start times for task dispatches.
std::unordered_map<Addr, Tick> dispatchStartTime;
/**
* Perform a DMA read of the read_dispatch_id_field_base_byte_offset
* field, which follows directly after the read_dispatch_id (the read