The initial value of register is set in constructor but there is no
standard way to assign the initial value and default value at the same
time out of that. So we decided to add an extra method to set the
initialValue to current register value. The usecase would be:
reg.get().field1 = val1;
reg.get().field2 = val2;
reg.resetInitialValue();
Change-Id: Ibc5454e2945cc6aff943e6599043edd8ca442f5f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67917
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
As of Linux 5.11, the MC146818 code was changed to avoid reading garbage
data that may occur if the is a read while the registers are being
updated:
github.com/torvalds/linux/commit/05a0302c35481e9b47fb90ba40922b0a4cae40d8
Previously toggling this bit was fine as Linux would check twice. It now
checks before and after reading time information, causing it to retry
infinitely until eventually Linux bootup fails due to watchdog timeout.
This changeset always sets update in progress to false. Since this is a
simulation, the updates probably will not be occurring at the same time
a read is occurring.
Change-Id: If0f440de9f9a6bc5a773fc935d1d5af5b98a9a4b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66731
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
When using the typed register template, most functionality of the class
can be controlled using callbacks. For instance, callbacks can be
installed to handle reads or writes to a register without having to
subclass the template and override those methods using inheritance.
The recently added reset() method did not follow this pattern though,
which has two problems. First, it's inconsistent with how the class is
normally used. Second, once you've defined a subclass, the reader,
writer, etc, callbacks still expect the type of the original class.
That means these have to either awkwardly use a type different from the
actual real type of the register, or use awkward, inefficient, and/or
dangerous casting to get back to the true type.
To address these problems, this change adds a resetter(...) method
which works like the reader(...) or writer(...) methods to optionally
install a callback to implement any special reset behavior.
Change-Id: Ia74b36616fd459c1dbed9304568903a76a4b55de
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67203
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
The ResetRequestPort and ResetResponsePort have a few problems:
1. A reset signal should happen during the time a reset is asserted,
or in other words the device should stay in reset and not doing
anything while reset is asserted. It should not immediately restart
execution while the reset is still held.
2. These names are misleading, since there is no response. These names
are inherited from other port types where there is an actual response.
There is a new generic SignalSourcePort and SignalSinkPort set of port
classes which are templated on the type of signal they propogate, and
which can be used in place of reset ports in c++. These ports can
still have a specialized role which will ensure that only reset ports
are connected to each other for a form of type checking, although
the underlying c++ instances are more interoperable than that.
Change-Id: Id98bef901ab61ac5b200dbbe49439bb2d2e6c57f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66675
Maintainer: Gabe Black <gabeblack@google.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
These are largely compatibility wrappers around the Signal*Port
classes. The python versions of these types enforce more specific
compatibility, but on the c++ side the Signal*Port<bool> classes can
be used directly instead.
Change-Id: I1325074d0ed1c8fc6dfece5ac1ee33872cc4f5e3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66673
Maintainer: Gabe Black <gabeblack@google.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
When adding a long list of registers, it can be easy to miss one which
will offset all the registers after it. It can be hard to find those
sorts of problems, and tedious and error prone to fix them.
This change adds a mechanism to simply annotate what offset a register
should have. That should also make the register list more self
documenting, since you'll be able to easily see what offset a register
has from the source without having to count up everything in front of it.
Change-Id: Ia7e419ffb062a64a10106305f875cec6f9fe9a80
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66431
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This partly reverts commit ec75787aef
by fixing the original problem noted by Bobby (long regressions):
setupThreadContext has to be implemented otherswise the GICv3 cpu interface
will end up holding old references when switching TC/ISAs.
This new implementation is still setting up the cpu interface reference
in the ISA only when it is required, but it is storing the
TC/ISA reference within the interface every time the ISA::setupThreadContext
gets called.
Change-Id: I2f54f95761d63655162c253e887b872f3718c764
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65931
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Currently when RLC queues (user mode queues) are mapped, the read/write
pointers of the ring buffer are set to zero. However, these queues could
be unmapped and then remapped later. In that situation the read/write
pointers should be the previous value before unmapping occurred. Since
the read pointer gets reset to zero, the queue begins reading from the
start of the ring, which usually contains older packets. There is a 99%
chance those packets contain addresses which are no longer in the page
tables which will cause a page fault.
To fix this we update the MQD with the current read/write pointer values
and then writeback the MQD to memory when the queue is unmapped. This
requires adding a pointer to the MQD and the host address of the MQD
where it should be written back to. The interface for registering RLC
queue is also simplified. Since we need to pass the MQD anyway, we can
get values from it as well.
Fixes b+tree and streamcluster from rodinia (when using RLC queues).
Change-Id: Ie5dad4d7d90ea240c3e9f0cddf3e844a3cd34c4f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65791
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
An example case,
```python
mem_side_port = RequestPort(
"This port sends requests and " "receives responses"
)
```
This is the residue of running the python formatter.
This is done by finding all tokens matching the regex `"\s"(?![.;"])`
and manually replacing them by empty strings.
Change-Id: Icf223bbe889e5fa5749a81ef77aa6e721f38b549
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66111
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently the SDMA queue type is guessed in the trap method by looking
at which queue in the engine is processing packets. It is possible for
both queues to be processing (e.g., one queue sent a DMA and is waiting
then switch to another queue), triggering an assert.
Instead store the queue type in the queue itself and use that type in
trap to determine which ring ID to use for the interrupt packet.
Change-Id: If91c458e60a03f2013c0dc42bab0b1673e3dbd84
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65691
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The current SDMA wrap around handling only considers the ring buffer
location as seen by the GPU. Eventually when the end of the SDMA ring
buffer is reached, the driver waits until the rptr written back to the
host catches up to what the driver sees before wrapping around back to
the beginning of the buffer. This writeback currently does not happen at
all, causing hangs for applications with a lot of SDMA commands.
This changeset first fixes the sizes of the queues, especially RLC
queues, so that the wrap around occurs in the correct place. Second, we
now store the rptr writeback address and the absoluate (unwrapped) rptr
value in each SDMA queue. The absolulte rptr is what the driver sends to
the device and what it expects to be written back.
This was tested with an application which basically does a few hundred
thousand hipMemcpy() calls in a loop. It should also fix the issue with
pannotia BC in fullsystem mode.
Change-Id: I53ebdcc6b02fb4eb4da435c9a509544066a97069
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65351
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
The PM4 release_mem packet is used as a DMA fence in the driver. It
specifies which queue the interrupt came from by encoding the me, pipe,
and queue fields from the map_queue packet into the interrupt ring ID.
Currently these fields are incorrect because (1) the order in the
bitfield is backwards, (2) the queue constructor assigns a pointer to
the PM4MapQueue packet containing this data to the dmaBuffer which gets
deleted in short order, and (3) the order of the encoding of ring ID is
incorrect.
This change fixes these issues by (1) placing the struct vales in
correct order, (2) creating a const copy of the dmaBuffer on
construction, and (3) using the ring ID encoding expected by the driver:
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/roc-4.3.x/
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c#L5989
Change-Id: I72c382980e57573f8a8a6879912c4139c7e2f505
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65095
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
The PM4 NOP header is used to insert spaces in the PM4 ring and can
therefore be any size. This includes zero. A size of zero is denoted by
a value of 0x3fff in the NOP packet header. Currently we assume this
means the remainder of the PM4 queue up to the wptr is empty/NOPs. This
is not always true.
This changeset reworks the PM4 NOP packet to handle the value of 0x3fff
as a special value and advances the rptr by 0 bytes. This fixes issues
where there were additional packets in the queue which were being
skipped over by fast forwarding. Since those packets could be anything,
that leads to undefined behavior afterwards.
Change-Id: I3f5c3f4b7dd50f93ba503fea97454a9d41771e30
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65094
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
SDMA traps are used in the driver as a DMA fence. To pass a fence, the
SDMA sends the driver the interrupt context from a trap packet and the
ring ID which specifies which queue in the SDMA engine is passing a
fence. Currently the interrupt context is using the wrong value in the
packet and the ring ID is hard-coded to always be the gfx queue.
This changeset uses the correct interrupt context from the SDMA packet
and sets the ring ID to either 0 if the gfx queue is currently being
processed or 3 if the page queue is being processed.
The relevant interrupt service routine in the driver can be found at:
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/roc-4.3.x/
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c#L2129
Change-Id: Ie4a4a9d6ab1d3bf83bf76bb57a02a91100217b51
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65093
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The interrupt handler's base address is sent via MMIO and must be
shifted by 8 bits to convert to a byte address. The current code is
shifting the MMIO dword first then assigning, resulting in the top 8
bits being shifted out.
This changeset fixes the issue by assigning the dword to the 64-bit
address first then shifting after. Similarly, the upper dword is cast to
a 64-bit value first before shifting.
This fixes some "fence fallback timeout" errors in the m5term output.
These timeouts become a problem because the driver will reset after a
few hundred of them, killing any running GPU applications as part of the
process.
Change-Id: I0beec313f533765c94063bcf4de8c65aacf2986b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65092
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
The GART table is a legacy 1-level page table primarily used for
supervisor mode accesses to GPUs. The PTE size is 64-bits, not 32-bit.
This causes memory sizes >3GB (in X86) to fail loading amdgpu driver.
This changeset fixes the issue by setting the GART mappings to the
correct data type.
Change-Id: Ibfba2443675fe28316d26afa5f1a14885fdce40c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65091
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
The current implementation of SDMA copy calls the GPU memory manager's
read/write method one time passing a physical address as the
source/destination. This implicitly assumes the physical addresses are
contiguous which is generally not true for large allocations. This
results in reading from/writing to the wrong address.
This changeset fixes the problem by copying large copies in chunks of
the minimum possible page size on the GPU (4kB). Each page is translated
seperately to ensure the correct physical address. The final copy "done"
callback is only used for the last transfer. The transfers should
complete in order so the copy command will not complete until all chunks
have been copied. Tested and verified on an application with a large
allocation (~5GB).
Change-Id: I27018a963da7133f5e49dec13b0475c3637c8765
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64752
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Some CPU wrappers like the Fastmodel one do extend the
ThreadContext interface in order to retrieve system register
state... By bypassing the TC interface and by using the ISA
instead, we are basically forcing users to extend the ISA
as well to intercept these calls.
So with this patch we are making sure every system register is accessed
(like HCR_EL2 or SCR_EL3) through the thread context. This of course
does not apply to the CPU interface registers as we still use the ISA
storage for them. In the future we should probably move that storage
from the ISA class to the Gicv3CPUInterface class itself
This is also simplifying Gicv3CPUInterface::isEL3OrMon:
currEL already covers the AArch32 case so no need to
differentiate between AArch32 and AArch64
Change-Id: I446a14a6e12b77e1a62040b3422f79ae52cc9eec
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64913
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
This debug flag is used to print spammy SDMA DPRINTFs, such as an SDMA
copy printing the data of large transfers 8 bytes per line at a time. For
those prints, the SDMAEngine flag will now only print the first and last
qword of the transfer and the new SDMAData flag is needed for verbose
data printing. This makes the SDMAEngine flag still useful for verifying
copies in applications with predictable data such as square.
Additionally, the memory allocation/deallocation done solely for a print
statement is removed in favor of casting the data to the printed type.
Change-Id: I18c1918ef9085cca4570f79881ee63d510ccc32f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64452
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
This map was originally used for fast access to the GART table. It is no
longer needed as the table has been moved to the AMDGPUVM class. Along
with commit 12ec5f9172 which reads
functionally from device memory, this table is no longer needed and is
essentially a duplicate copy of device memory for anything written over
the PCI BAR.
This changeset removes the map entirely which will reduce the memory
footprint of simulations and potentially avoid stale copies of data
when reading over the PCI BAR.
Change-Id: I312ae38f869c6a65e50577b1c33dd055078aaf32
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63951
Reviewed-by: Matt Sinclair <mattdsinclair.wisc@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
SDMA atomic packets are used in conjunction with RLC queues in SDMA for
synchronization similar to how HSA signals are used with BLIT kernels
when SDMA is disabled. Implement a skeleton of the SDMA atomic packet
methods as well as the atomic add64 operation.
The atomic add operation appears to be the only operation used in ROCm,
so this implementation is fairly complete. See:
https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/
rocm-4.2.x/src/core/runtime/amd_blit_sdma.cpp#L880
Change-Id: I62cc337f2ffe590bdb947b48053760ee8b3a6f32
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63174
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
There can only ever be two RLC queues maximum. Use this information for
a simpler data structure to store doorbell information. The patch
changes the std::unordered_map previously used to std::array. This will
also be useful in avoiding erase-while-iterating issues needed to
unregister all queues at once.
Change-Id: I95600e40de51cb1a992a20bcebaf7580ea4d0be8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63172
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Previously framebuffer reads would try reading from MMIO trace, special
addresses, and then anything previously written to a special address
range. This does not handle direct large BAR reads, causing incorrect
results in some applications that were doing this.
Rework the readFramebuffer method to do the following. Remove the MMIO
trace read altogether, as there were not any framebuffer reads from the
trace to begin with. Read special addresses first to avoid overwriting
by previous writes. Next read previous writes to special ranges. The
special range is the GART table. These are required for functional
translations. Lastly read from the device memory directly. This does a
functional read required by the PCIDevice read method which is
non-timing. Reading from device memory is preferred over the map type
used for GART to avoid duplication of a potentially huge amount of data.
With this changeset all but one of the HIP samples and HIP examples
applications now run and pass verification of results.
Change-Id: Id3b788bfc5eaf17cfa1897f25d26f3725d4db321
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63171
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
The memory management hub ("mmhub") is an aperture that aliases the GPU
device memory. MMHUB addresses functionally map to the same device
address, with the exception that it is guaranteed not to overlap with
host memory. This is useful in gem5 for APIs with Addr type as it
prevents sending e.g., DMAs to the wrong place.
Change-Id: Ia296809a8dc2c5fbdeba6d70cd53215f9ab36c93
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63031
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>