Currently gem5 assumes the amdgpu device to be Vega10. In order to
support more devices we need to handle situations where different
registers and addresses have the same functionality but different
offsets on different devices.
This changeset adds an NBIO class to handle device discovery and driver
initialization related tasks, pulling them out of the AMDGPUDevice
class. The offsets used for MMIOs are reworked slightly to use offsets
rather than absolute addresses. This is because we cannot determine the
absolute address in the constructor since the BAR has not been assigned
by the OS yet.
Change-Id: I14b364374e086e185978334425a4e265cf2760d0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70041
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently the amdgpu simulated device is assumed to be a Vega10. As a
result there are a few things that are hardcoded. One of those is the
number of SDMAs. In order to add a newer device, such as MI100+, we need
to enable a flexible number of SDMAs.
In order to support a variable number of SDMAs and with the MMIO offsets
of each device being potentially different, the MMIO interface for SDMAs
is changed to use an SDMA class method dispatch table with forwards a
32-bit value from the MMIO packet to the MMIO functions in SDMA of the
format `void method(uint32_t)`. Several changes are made to enable this:
- Allow the SDMA to have a variable MMIO base and size. These are
configured in python.
- An SDMA class method dispatch table which contains the MMIO offset
relative to the SDMA's MMIO base address.
- An updated writeMMIO method to iterate over the SDMA MMIO address
ranges and call the appropriate SDMA MMIO method which matches the
MMIO offset.
- Moved all SDMA related MMIO data bit twiddling, masking, etc. into
the MMIO methods themselves instead of in the writeMMIO method in
SDMAEngine.
Change-Id: Ifce626f84d52f9e27e4438ba4e685e30dbf06dbc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70040
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
If an MMIO was previously written and the driver reads it, we should
return the value that was previously read. This overwrites the MMIO
trace value which is the last resort fallback for finding an MMIO value.
This is needed to initialize newer GPU devices in gem5.
Change-Id: Ida2435290b706288e88518b5d920691cdb6dcc09
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70039
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
During GPUFS checkpoint restore, doorbells callbacks are created based
on certain MQD attributes. These callbacks are required to create new
SDMA doorbells. If these attributes are not present in the checkpoint,
the restore hangs indefinitely waiting for ioctl calls that access these
doorbells to finish execution. This commit adds the attributes required
for checkpoint restore to proceed.
Change-Id: Id3d1b7a2627d4c50133d923096495957a233f675
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70077
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
For non-KVM CPUs the VBIOS memory falls into an I/O hole and therefore
gets routed to the PIO bus in gem5. This gets routed to the GPU in the
case of a ROM write. We write to the ROM as a way to "load" the VBIOS
without creating holes in the KVM VM.
This write method allows the same scripts as KVM to be used by writing
to the ROM area and overwriting what might already be there from the
--gpu-rom option.
Change-Id: I8c2d2aa05a823569a774dfdd3bf2d2e773f38683
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70037
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Converted the generic DPRINTF messages for the GICv2 register reads
and writes (showing only the memory mapped address) to finer grained
DPRINTF messages showing the names of the mapped registers being
accessed.
This change is intended to make it easier to debug the GIC setup from
the gem5 debug trace.
Change-Id: Ic418b2ea8438fed6a5a810ebc0b686cd4c891cb0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69681
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The GPUFS checkpoint restoration mechanism expects to find a
PM4MapQueues packet in the checkpoint. Since this was not being
checkpointed, the restore phase retrieved a null packet which led to a
segmentation fault. This commit adds PM4MapQueues to the checkpoint and
restores it when deserializing the checkpoint
Change-Id: Ib74a9f36fe89d740a74f94314ada41ecc363abe9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69298
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Create a version of listen() which handles common logic internally,
including scanning for an available port number, and notifying what
port was chosen.
The port is managed internal to ListenSocket, so that the logic
interacting with it doesn't need to manually manage a port number, and
hence a port number does not need to exist for non AF_INET sockets.
Change-Id: Ie371eccc4d0da5e7b90714508e4cb72fb0091875
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69160
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
The initial value of register is set in constructor but there is no
standard way to assign the initial value and default value at the same
time out of that. So we decided to add an extra method to set the
initialValue to current register value. The usecase would be:
reg.get().field1 = val1;
reg.get().field2 = val2;
reg.resetInitialValue();
Change-Id: Ibc5454e2945cc6aff943e6599043edd8ca442f5f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67917
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
As of Linux 5.11, the MC146818 code was changed to avoid reading garbage
data that may occur if the is a read while the registers are being
updated:
github.com/torvalds/linux/commit/05a0302c35481e9b47fb90ba40922b0a4cae40d8
Previously toggling this bit was fine as Linux would check twice. It now
checks before and after reading time information, causing it to retry
infinitely until eventually Linux bootup fails due to watchdog timeout.
This changeset always sets update in progress to false. Since this is a
simulation, the updates probably will not be occurring at the same time
a read is occurring.
Change-Id: If0f440de9f9a6bc5a773fc935d1d5af5b98a9a4b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66731
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
When using the typed register template, most functionality of the class
can be controlled using callbacks. For instance, callbacks can be
installed to handle reads or writes to a register without having to
subclass the template and override those methods using inheritance.
The recently added reset() method did not follow this pattern though,
which has two problems. First, it's inconsistent with how the class is
normally used. Second, once you've defined a subclass, the reader,
writer, etc, callbacks still expect the type of the original class.
That means these have to either awkwardly use a type different from the
actual real type of the register, or use awkward, inefficient, and/or
dangerous casting to get back to the true type.
To address these problems, this change adds a resetter(...) method
which works like the reader(...) or writer(...) methods to optionally
install a callback to implement any special reset behavior.
Change-Id: Ia74b36616fd459c1dbed9304568903a76a4b55de
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67203
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
The ResetRequestPort and ResetResponsePort have a few problems:
1. A reset signal should happen during the time a reset is asserted,
or in other words the device should stay in reset and not doing
anything while reset is asserted. It should not immediately restart
execution while the reset is still held.
2. These names are misleading, since there is no response. These names
are inherited from other port types where there is an actual response.
There is a new generic SignalSourcePort and SignalSinkPort set of port
classes which are templated on the type of signal they propogate, and
which can be used in place of reset ports in c++. These ports can
still have a specialized role which will ensure that only reset ports
are connected to each other for a form of type checking, although
the underlying c++ instances are more interoperable than that.
Change-Id: Id98bef901ab61ac5b200dbbe49439bb2d2e6c57f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66675
Maintainer: Gabe Black <gabeblack@google.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
These are largely compatibility wrappers around the Signal*Port
classes. The python versions of these types enforce more specific
compatibility, but on the c++ side the Signal*Port<bool> classes can
be used directly instead.
Change-Id: I1325074d0ed1c8fc6dfece5ac1ee33872cc4f5e3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66673
Maintainer: Gabe Black <gabeblack@google.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
When adding a long list of registers, it can be easy to miss one which
will offset all the registers after it. It can be hard to find those
sorts of problems, and tedious and error prone to fix them.
This change adds a mechanism to simply annotate what offset a register
should have. That should also make the register list more self
documenting, since you'll be able to easily see what offset a register
has from the source without having to count up everything in front of it.
Change-Id: Ia7e419ffb062a64a10106305f875cec6f9fe9a80
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66431
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This partly reverts commit ec75787aef
by fixing the original problem noted by Bobby (long regressions):
setupThreadContext has to be implemented otherswise the GICv3 cpu interface
will end up holding old references when switching TC/ISAs.
This new implementation is still setting up the cpu interface reference
in the ISA only when it is required, but it is storing the
TC/ISA reference within the interface every time the ISA::setupThreadContext
gets called.
Change-Id: I2f54f95761d63655162c253e887b872f3718c764
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65931
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Currently when RLC queues (user mode queues) are mapped, the read/write
pointers of the ring buffer are set to zero. However, these queues could
be unmapped and then remapped later. In that situation the read/write
pointers should be the previous value before unmapping occurred. Since
the read pointer gets reset to zero, the queue begins reading from the
start of the ring, which usually contains older packets. There is a 99%
chance those packets contain addresses which are no longer in the page
tables which will cause a page fault.
To fix this we update the MQD with the current read/write pointer values
and then writeback the MQD to memory when the queue is unmapped. This
requires adding a pointer to the MQD and the host address of the MQD
where it should be written back to. The interface for registering RLC
queue is also simplified. Since we need to pass the MQD anyway, we can
get values from it as well.
Fixes b+tree and streamcluster from rodinia (when using RLC queues).
Change-Id: Ie5dad4d7d90ea240c3e9f0cddf3e844a3cd34c4f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65791
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
An example case,
```python
mem_side_port = RequestPort(
"This port sends requests and " "receives responses"
)
```
This is the residue of running the python formatter.
This is done by finding all tokens matching the regex `"\s"(?![.;"])`
and manually replacing them by empty strings.
Change-Id: Icf223bbe889e5fa5749a81ef77aa6e721f38b549
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66111
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently the SDMA queue type is guessed in the trap method by looking
at which queue in the engine is processing packets. It is possible for
both queues to be processing (e.g., one queue sent a DMA and is waiting
then switch to another queue), triggering an assert.
Instead store the queue type in the queue itself and use that type in
trap to determine which ring ID to use for the interrupt packet.
Change-Id: If91c458e60a03f2013c0dc42bab0b1673e3dbd84
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65691
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The current SDMA wrap around handling only considers the ring buffer
location as seen by the GPU. Eventually when the end of the SDMA ring
buffer is reached, the driver waits until the rptr written back to the
host catches up to what the driver sees before wrapping around back to
the beginning of the buffer. This writeback currently does not happen at
all, causing hangs for applications with a lot of SDMA commands.
This changeset first fixes the sizes of the queues, especially RLC
queues, so that the wrap around occurs in the correct place. Second, we
now store the rptr writeback address and the absoluate (unwrapped) rptr
value in each SDMA queue. The absolulte rptr is what the driver sends to
the device and what it expects to be written back.
This was tested with an application which basically does a few hundred
thousand hipMemcpy() calls in a loop. It should also fix the issue with
pannotia BC in fullsystem mode.
Change-Id: I53ebdcc6b02fb4eb4da435c9a509544066a97069
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65351
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
The PM4 release_mem packet is used as a DMA fence in the driver. It
specifies which queue the interrupt came from by encoding the me, pipe,
and queue fields from the map_queue packet into the interrupt ring ID.
Currently these fields are incorrect because (1) the order in the
bitfield is backwards, (2) the queue constructor assigns a pointer to
the PM4MapQueue packet containing this data to the dmaBuffer which gets
deleted in short order, and (3) the order of the encoding of ring ID is
incorrect.
This change fixes these issues by (1) placing the struct vales in
correct order, (2) creating a const copy of the dmaBuffer on
construction, and (3) using the ring ID encoding expected by the driver:
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/roc-4.3.x/
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c#L5989
Change-Id: I72c382980e57573f8a8a6879912c4139c7e2f505
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65095
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
The PM4 NOP header is used to insert spaces in the PM4 ring and can
therefore be any size. This includes zero. A size of zero is denoted by
a value of 0x3fff in the NOP packet header. Currently we assume this
means the remainder of the PM4 queue up to the wptr is empty/NOPs. This
is not always true.
This changeset reworks the PM4 NOP packet to handle the value of 0x3fff
as a special value and advances the rptr by 0 bytes. This fixes issues
where there were additional packets in the queue which were being
skipped over by fast forwarding. Since those packets could be anything,
that leads to undefined behavior afterwards.
Change-Id: I3f5c3f4b7dd50f93ba503fea97454a9d41771e30
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65094
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>