The functional HSA signal read was a hack left in the gpu-compute code.
In full system, this functional read is causing problems occasionally
with the translation not yet being in the page table. The error message
output by gem5 was a fatal message on the readBlob method in port proxy.
Changing this to a timing DMA fixes this problem.
This commit adds the various timing DMA functions to send and receive
response and clean up. A helper method "sendCompletionSignal" is added
to the GPUCommandProcessor because the indentation level was getting too
deep. This change applies only to FS mode. Code for SE mode is
equivalent to what it was before this commit.
Change-Id: I1bfcaa0a52731cdf9532a7fd0eb06ab2f0e09d48
The ROCm stack requires PCI express atomics. Currently the first PCI
CapabilityPtr does not point to anything, which signals to the OS
(Linux) that this is an early generation PCI device. As PCI express
atomics were introduced later, the CapabilityPtr needs to point to at
least a PCI express capability structure. This capability is defined as
0x10 in Linux. We additionally set the PCI atomic based bits and
implement device specific PCI configuration space reads and writes to
the amdgpu device.
With this commit, the output of simulation when loading the amdgpu
driver no longer outputs "PCIE atomics not supported". Further, an
application which uses PCIe atomics (PyTorch with a reduce_sum kernel)
now makes further progress.
Change-Id: I5e3866979659a2657f558941106ef65c2f4d9988
The capabilities for PCI express is a struct, instead of a union, like
the other capability unions. A union is used here to provide access to
the ordinal data values when reading/writing an offset while
simultaneously providing human readable field values that can be set
when writing the code.
This commit changes it to union which is likely should be. Nothing
appears to be using this union yet so it is likely an oversight.
Change-Id: I85fe7cc62914525c70fd7a5946d725ed308f8775
This SDMA packet is much more common starting around ROCm 5.4.
Previously this was mostly used to clear page tables after an
application ended and was therefore left unimplemented. It is
now used for basic operation like device memsets.
This patch implements constant fill as it is now necessary.
Change-Id: I9b2cf076ec17f5ed07c20bb820e7db0c082bbfbc
When using the new operator, delete should be called
on any allocated memory after it's use is complete.
Change-Id: Id5fcfb264b6ddc252c0a9dcafc2d3b020f7b5019
* gpu-compute: Remove use of 'std::random_shuffle'
This was deprecated in C++14 and removed in C++17. This has been
replaced with std::random. This has been implemented to ensure
reproducible results despite (pseudo)random behavior.
Change-Id: Idd52bc997547c7f8c1be88f6130adff8a37b4116
* dev-amdgpu: Add missing 'overrides'
This causes warnings/errors in some compilers.
Change-Id: I36a3548943c030d2578c2f581c8985c12eaeb0ae
* dev: Fix Linux specific includes to be portable
This allows for compilation in non-linux systems (e.g., Mac OS).
Change-Id: Ib6c9406baf42db8caaad335ebc670c1905584ea2
* tests: Add 'VEGA_X86' build target to compiler-tests.sh
Change-Id: Icbf1d60a096b1791a4718a7edf17466f854b6ae5
* tests: Add 'GCN3_X86' build target to compiler-tests.sh
Change-Id: Ie7c9c20bb090f8688e48c8619667312196a7c123
The PCI read/write functions are atomic functions in gem5, meaning they
expect a response with a latency value on the same simulation Tick. For
reads to a PCI device, the response must also include a data value read
from the device.
The AMDGPU device has a PCI BAR which mirrors the frame buffer memory.
Currently reads are done atomically, but writes are sent to a DMA device
without waiting for a write completion ACK. As a result, it is possible
that writes can be queued in the DMA device long enough that another
read for a queued address arrives. This happens very deterministically
with the AtomicSimpleCPU and causes GPUFS to break with that CPU.
This change makes writes to the frame BAR atomic the same as reads. This
avoids that problem and as a result the AtomicSimpleCPU can now load the
driver for GPUFS simulations.
Change-Id: I9a8e8b172712c78b667ebcec81a0c5d0060234db
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71898
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
According to the GIC specification (IHI0069) reserved addresses in the
GIC memory map are treated as RES0. We allow to disable this behaviour
and panic instead (reserved_res0 = False, which is what we have been
doing so far) to catch development bugs (in gem5 and in the guest SW)
Change-Id: I23f98519c2f256c092a52425735b8792bae7a2c7
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71138
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Patch https://gem5-review.googlesource.com/c/public/gem5/+/70040 added
support for a variable number of SDMA engines to support newer GPU
models. As part of this an SDMA IDs map was added to map from SDMA ID
number to the SDMA SimObject pointer. In order to get the correct
pointer in unserialize now, we need to store the ID in the checkpoint
and use that to index the new map. We can't simply assign using the loop
variable as the SDMAs might not be in order in the checkpoint and
additionally the checkpoint contains both the gfx and page offset for
the SDMA engines, so each SDMA is inserted into the SDMA offset map
(sdmaEngs) twice.
Change-Id: I08e9a8d785f467b6eebff8ab0a9336851c87258d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70878
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Currently gem5 assumes the amdgpu device to be Vega10. In order to
support more devices we need to handle situations where different
registers and addresses have the same functionality but different
offsets on different devices.
This changeset adds an NBIO class to handle device discovery and driver
initialization related tasks, pulling them out of the AMDGPUDevice
class. The offsets used for MMIOs are reworked slightly to use offsets
rather than absolute addresses. This is because we cannot determine the
absolute address in the constructor since the BAR has not been assigned
by the OS yet.
Change-Id: I14b364374e086e185978334425a4e265cf2760d0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70041
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently the amdgpu simulated device is assumed to be a Vega10. As a
result there are a few things that are hardcoded. One of those is the
number of SDMAs. In order to add a newer device, such as MI100+, we need
to enable a flexible number of SDMAs.
In order to support a variable number of SDMAs and with the MMIO offsets
of each device being potentially different, the MMIO interface for SDMAs
is changed to use an SDMA class method dispatch table with forwards a
32-bit value from the MMIO packet to the MMIO functions in SDMA of the
format `void method(uint32_t)`. Several changes are made to enable this:
- Allow the SDMA to have a variable MMIO base and size. These are
configured in python.
- An SDMA class method dispatch table which contains the MMIO offset
relative to the SDMA's MMIO base address.
- An updated writeMMIO method to iterate over the SDMA MMIO address
ranges and call the appropriate SDMA MMIO method which matches the
MMIO offset.
- Moved all SDMA related MMIO data bit twiddling, masking, etc. into
the MMIO methods themselves instead of in the writeMMIO method in
SDMAEngine.
Change-Id: Ifce626f84d52f9e27e4438ba4e685e30dbf06dbc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70040
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
If an MMIO was previously written and the driver reads it, we should
return the value that was previously read. This overwrites the MMIO
trace value which is the last resort fallback for finding an MMIO value.
This is needed to initialize newer GPU devices in gem5.
Change-Id: Ida2435290b706288e88518b5d920691cdb6dcc09
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70039
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
During GPUFS checkpoint restore, doorbells callbacks are created based
on certain MQD attributes. These callbacks are required to create new
SDMA doorbells. If these attributes are not present in the checkpoint,
the restore hangs indefinitely waiting for ioctl calls that access these
doorbells to finish execution. This commit adds the attributes required
for checkpoint restore to proceed.
Change-Id: Id3d1b7a2627d4c50133d923096495957a233f675
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70077
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
For non-KVM CPUs the VBIOS memory falls into an I/O hole and therefore
gets routed to the PIO bus in gem5. This gets routed to the GPU in the
case of a ROM write. We write to the ROM as a way to "load" the VBIOS
without creating holes in the KVM VM.
This write method allows the same scripts as KVM to be used by writing
to the ROM area and overwriting what might already be there from the
--gpu-rom option.
Change-Id: I8c2d2aa05a823569a774dfdd3bf2d2e773f38683
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70037
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Converted the generic DPRINTF messages for the GICv2 register reads
and writes (showing only the memory mapped address) to finer grained
DPRINTF messages showing the names of the mapped registers being
accessed.
This change is intended to make it easier to debug the GIC setup from
the gem5 debug trace.
Change-Id: Ic418b2ea8438fed6a5a810ebc0b686cd4c891cb0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69681
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The GPUFS checkpoint restoration mechanism expects to find a
PM4MapQueues packet in the checkpoint. Since this was not being
checkpointed, the restore phase retrieved a null packet which led to a
segmentation fault. This commit adds PM4MapQueues to the checkpoint and
restores it when deserializing the checkpoint
Change-Id: Ib74a9f36fe89d740a74f94314ada41ecc363abe9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69298
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Create a version of listen() which handles common logic internally,
including scanning for an available port number, and notifying what
port was chosen.
The port is managed internal to ListenSocket, so that the logic
interacting with it doesn't need to manually manage a port number, and
hence a port number does not need to exist for non AF_INET sockets.
Change-Id: Ie371eccc4d0da5e7b90714508e4cb72fb0091875
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69160
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
The initial value of register is set in constructor but there is no
standard way to assign the initial value and default value at the same
time out of that. So we decided to add an extra method to set the
initialValue to current register value. The usecase would be:
reg.get().field1 = val1;
reg.get().field2 = val2;
reg.resetInitialValue();
Change-Id: Ibc5454e2945cc6aff943e6599043edd8ca442f5f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67917
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
As of Linux 5.11, the MC146818 code was changed to avoid reading garbage
data that may occur if the is a read while the registers are being
updated:
github.com/torvalds/linux/commit/05a0302c35481e9b47fb90ba40922b0a4cae40d8
Previously toggling this bit was fine as Linux would check twice. It now
checks before and after reading time information, causing it to retry
infinitely until eventually Linux bootup fails due to watchdog timeout.
This changeset always sets update in progress to false. Since this is a
simulation, the updates probably will not be occurring at the same time
a read is occurring.
Change-Id: If0f440de9f9a6bc5a773fc935d1d5af5b98a9a4b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66731
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
When using the typed register template, most functionality of the class
can be controlled using callbacks. For instance, callbacks can be
installed to handle reads or writes to a register without having to
subclass the template and override those methods using inheritance.
The recently added reset() method did not follow this pattern though,
which has two problems. First, it's inconsistent with how the class is
normally used. Second, once you've defined a subclass, the reader,
writer, etc, callbacks still expect the type of the original class.
That means these have to either awkwardly use a type different from the
actual real type of the register, or use awkward, inefficient, and/or
dangerous casting to get back to the true type.
To address these problems, this change adds a resetter(...) method
which works like the reader(...) or writer(...) methods to optionally
install a callback to implement any special reset behavior.
Change-Id: Ia74b36616fd459c1dbed9304568903a76a4b55de
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67203
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
The ResetRequestPort and ResetResponsePort have a few problems:
1. A reset signal should happen during the time a reset is asserted,
or in other words the device should stay in reset and not doing
anything while reset is asserted. It should not immediately restart
execution while the reset is still held.
2. These names are misleading, since there is no response. These names
are inherited from other port types where there is an actual response.
There is a new generic SignalSourcePort and SignalSinkPort set of port
classes which are templated on the type of signal they propogate, and
which can be used in place of reset ports in c++. These ports can
still have a specialized role which will ensure that only reset ports
are connected to each other for a form of type checking, although
the underlying c++ instances are more interoperable than that.
Change-Id: Id98bef901ab61ac5b200dbbe49439bb2d2e6c57f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66675
Maintainer: Gabe Black <gabeblack@google.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
These are largely compatibility wrappers around the Signal*Port
classes. The python versions of these types enforce more specific
compatibility, but on the c++ side the Signal*Port<bool> classes can
be used directly instead.
Change-Id: I1325074d0ed1c8fc6dfece5ac1ee33872cc4f5e3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66673
Maintainer: Gabe Black <gabeblack@google.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>