There is a race condition in VIPER where an atomic issued to the same
address can occur resulting in multiple trigger messages signalling the
compleition of the atomic operation. The first message was deallocating
the TBE causing the second message to dereference a nullptr when looking
up the TBE.
A counter is added to track the number of in flight AtomicDone trigger
messages. The AtomicDone is not called until the last in flight message
arrives at the trigger queue. The remaining messages call AtomicNotDone
which simply pops the message from the queue and keeps the TBE
allocated.
Change-Id: Ie1de0436861a7c393ad6d2fb2faceb83c18d4cc3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39175
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
A prior commit (f6ec145fc0) fixed early LGKM decrementing for flat loads
and stores, but failed to address flat atomics.
Per the GCN3 ISA, LGKM count is decremented on flat atomics with return
when the data has been returned. This patch checks if the flat
instruction is an atomic with return, and decrements LGKM count if so.
Change-Id: I5c0c2c205a8b21327d4c42ba71c59842c15bd63b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39155
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently, the name of the stats group of thr Root object is
Stats, which is likely to be confused with the Stats namespace.
This commit renames the struct to RootStats. This allows the
Stats namespace to be expressed as `Stats::`, which is
consistent with how the namespace is accessed in other part of
gem5.
Change-Id: Ieb425c3df1f5c0d5f11b1a467a36b2e0e07b2771
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38915
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This prototype might convince the compiler that it should refer to
curTick indirectly through the linker, but curTick is inline (and making
it not has very high overhead), so there's a decent chance no non-inline
version will be emitted.
Change-Id: Iab5aacb145d4a974bc1bc0abdf7275c40fbb9c38
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38997
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The original implementation of curTick used a thread local variable,
_curEventQueue, and its getCurTick() method, to retrieve the curTick for
the currently active event queue. That meant that core.hh needed to
include eventq.hh so that the EventQueue type was available, which also
indirectly brought in a lot of other dependencies.
Unfortunately this couldn't easily be fixed by making curTick()
non-inline since this added a significant amount of overhead when
tested.
Instead, this change makes the code in core.hh/core.cc keep a pointer
directly to a Tick. The code which sets _curEventQueue now also sets
that pointer when _curEventQueue changes.
The way curTick() now reaches into the guts of the current EventQueue
directly is not great from a modularity perspective, but if curTick is
considered an extension of the EventQueue, then it's just odd that this
part is broken out into a different file.
Change-Id: I8341b40fe75e90672eb1d70e1a368975fcbfe926
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38996
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
SimplePoolManager doesn't allow mapping of two WGs
simultaneously on the same Compute Unit (provided
the previous WG has been mapped to all the SIMDs)
even if there is sufficient VRF and SRF space
available.
DynPoolManager takes care of that by dynamically
allocating and deallocating register file space
to wavefronts
Change-Id: I2255c68d4b421615d7b231edc05d3ebb27cbd66c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32034
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com>
Note:
Some less frequently needed CSR registers (e.g. hpm and pmp registers)
are commented out on purpose. Instructions to add them back are
described in remote_gdb.hh comments. This is to avoid spamming the
remote GDB log when using `info reg all`.
Changes:
1. Added GDB XML files to the ext/ directory (mostly from QEMU)
2. Modified RiscvGdbRegCache
- struct r: added CSR registers
- getRegs, setRegs: reading / setting CSR registers
3. Modified RemoteGDB
- availableFeatures: indicate support for XML registers
- getXferFeaturesRead: return XML blobs
Change-Id: Ica03b63edb3f0c9b6a7789228b995891dbfb26b2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38955
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
When in non-caching mode, performance metrics are not meaningful, and
we're just interested in functional level behavior. Going through the
DMA FIFO in the HDLCD controller is very inefficient, and prevents
reading a batch of pixels from memory all in one go.
Change-Id: I3fb6d4d06730b5a94b5399f01aa02186baa5c9b3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38721
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Before the commit, the bootloader had a hardcoded entry point that it
would jump to.
However, the Linux kernel arm64 v5.8 forced us to change the kernel
entry point because the required memory alignment has changed at:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?h=v5.8&id=cfa7ede20f133cc81cef01dc3a516dda3a9721ee
Therefore the only way to have a single bootloader that boots both
pre-v5.8 and post-v5.8 kernels is to pass that information from gem5
to the bootloader, which we do in this patch via registers.
This approach was already used by the 32-bit bootloader, which passed
that value via r3, and we try to use the same register x3 in 64-bit.
Since we are now passing this information, the this patch also removes
the hardcoding of DTB and cpu-release-addr, and also passes those
values via registers.
We store the cpu-release-addr in x5 as that value appears to have a
function similar to flags_addr, which is used only in 32-bit arm and
gets stored in r5.
This commit renames atags_addr to dtb_addr, since both are mutually
exclusive, and serve a similar purpose, DTB being the newer recommended
approach.
Similarly, flags_addr is renamed to cpu_release_addr, and it is moved
from ArmSystem into ArmFsWorkload, since it is not an intrinsic system
property, and should be together with dtb_addr instead.
Before this commit, flags_addr was being set from FSConfig.py and
configs/example/arm/devices.py to self.realview.realview_io.pio_addr
+ 0x30. This commit moves that logic into RealView.py instead, and
sets the flags address 8 bytes before the start of the DTB address.
JIRA: https://gem5.atlassian.net/browse/GEM5-787
Change-Id: If70bea9690be04b84e6040e256a9b03e46710e10
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35076
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
These functions return iterators which are inconsistent with the usage
model for this type. It should be accessed using the peek, push, and pop
methods and not iterators. If you need a class with iterators which is
oriented around accessing individual elements at a time, the
CircularQueue type is likely a better choice.
Change-Id: I9f37eab12e490b63d870d378a91f601dad353f25
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38998
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This is implemented similary to the NonCachingSimpleCPU, except that
both the normal atomic and noncaching atomic behaviors are implemented
by the same class. The sendDma function now dispatches to a method which
implements one or the other behavior since that function was getting too
big and complex.
Change-Id: I7972971ef41d1373424e587cf67c8444d50de748
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38719
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Instead of generating all of the DMA packets when a request is
initiated, keep track of the necessary properties and generate them as
needed.
The primary benefit of this approach is that if we don't actually need
packets, for instance if we can satisfy the request using a memory
backdoor, we can just skip them. Otherwise we'll have wasted time
creating them, and then have to clean them up.
Change-Id: I04d399fb7bce1ff9a44979c311be356baf2db555
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38717
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
CircularQueue provides iterators which make it easier for users to
interact with it and helps abstract its internal state, but at the same
time it prevents standard algorithms like std::copy from recognizing
opportunities to use bulk copies to speed up execution. It also hides
the seams when wrapping around the buffer happens which std::copy
wouldn't know how to handle.
CircleBuf seems to be intended as a simpler type which doesn't hold
complex entries like the CircularQueue does, and instead just acts as a
wrap around buffer, like the name suggests. This change reimplements it
to not use CircularQueue, and to instead use an underlying vector.
The way internal indexing is handled CircularQueue was simplified
recently, and using the same scheme here means that this code is
actually not much more verbose than it was before. It also intrinsically
handles indexing and bulk accesses, and uses std::copy_n in a way that
lets it recognize and take advantage of contiguous storage and bulk
copies.
Change-Id: I78e7cfe174c52f60c95c81e5cd3d71c6052d4d41
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38896
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabe.black@gmail.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This method lets you stretch the current chunk, if you want to skip over
some of the upcoming chunks since you've already handled their bytes.
For instance, if you were iterating over pages in a range of virtual
addresses, you might be able to handle multiple page sized chunks at
once if they were represented by a single large sized page table entry.
This mechanism would let you move past all the pages you had just
handled without having to walk through them all one by one.
Change-Id: I7d962f548947b77f0aa1b725036dbcc9e1b28659
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38718
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
FLAT instructions used to decrement lgkm count on execute, while the
GCN3 ISA specifies that lgkm count should be decremented on data being
returned or data being written.
This patch changes it so that lgkm is decremented after initiateAcc (for
stores) and after completeAcc (for loads) to better reflect the ISA
definition.
This fixes a bug where waitcnts would be satisfied even though the
memory access wasn't completed, which lead to instructions using the
wrong data.
Change-Id: I596cb031af9cda8d47a1b5e146e4a4ffd793d36c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38696
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Since the beginning fs_bigLITTLE has been pointing to a default
default_rcs = 'bootscript.rcS'
as a System.readfile parameter. That script is not present in
the gem5 repo and all the other fs scripts (starter_fs.py, fs.py
through Options.py) are using an emptry string as default
readfile param value.
We are hence aligning to the other scripts by removing this
default value
Change-Id: I20dc7714deae890d61706459c8d13bd8f5aac7a0
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38815
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This function had a comment claiming that returning an arbitrary request
from the call was necessary for page table walker statistics, but
looking at the actual code, the return type was never used. Also
returning whatever the last request happens to be seems arbitrary, and a
bad boundary for modularization. The page table walker should not depend
on the internal implementation of the DMA port.
Change-Id: I00281fbaf6aeb85b15baf54f3d4a23ca1ac72b8b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38716
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
ARM reaches in and pads out the source register index list behind the
parser's back to force dest regs to also be sources in case an
instruction fails predication and needs to forward the original register
values. It shouldn't be hacking up these values in that way, but since
it is, this will let it continue to do so while still fitting in the new
system where each instruction allocates its src/dest reg index arrays to
size.
Change-Id: Ia296be9f63123f18f6cdc0d3bb1314d33e759b3a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38380
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This class had been trying to keep all indices within the modulus of the
queue size, and to use all elements in the underlying storage by making
the empty and full conditions alias, differentiated by a bool. To keep
track of the difference between a storage location on one trip around
the queue vs other times around, ie an alias once the indices had
wrapped, it also keep track of a "round" value in both the queue itself,
and any iterators it created.
All this bookkeeping significantly complicated the data structure.
Instead, this change modifies it to keep track of a monotonically
increasing index which is wrapped at the time it's used. Only the head
index and current size need to be tracked in the queue itself, and only
a pointer to the queue and an index need to be tracked in the iterators.
Theoretically, it's possible that this value could overflow eventually
since it increases forever, unlike before where the index wrapped and
was never larger than the queue's capacity. In practice, the type of the
index was changed from a uint32_t to a size_t, probably a 64 bit value
in modern systems, which will hold much larger values. Also, the round
counter and the index values together acted like a smaller than 64 bit
value anyway, since the round counter would overflow after 2^32 times
around a less than 2^32 entry queue.
One minor interface difference is that the head() and tail() values
returned by the queue are no longer pre-wrapped to be modulo the queue's
capacity. As long as consumers don't try to be overly clever and feed in
fixed values, do their own bounds checking, etc., something that would
be cumbersome considering the wrapping nature of the structure, this
shouldn't be an issue.
Also, since external consumers no longer need to worry about wrapping,
since only one of them was used in only one place, and because they
weren't even marked as part of the interface, the modulo helper
functions have been eliminated from the queue. If other code wants to
perform modulo arithmetic for some reason (which the queue no longer
requires) they can accomplish basically the same thing in basically the
same amount of code using normal math.
Also, rather than inherit from std::vector, this change makes the vector
internal to the queue. That prevents methods of the vector that aren't
aware of the circular nature of the structure from leaking out if
they're not overridden or otherwise proactively blocked.
On top of simplifying the implementation, this also makes it perform
*slightly* better. To measure that, I ran the following command:
$ time build/ARM/base/circular_queue.test.opt --gtest_repeat=100000 > /dev/null
and found a few percent improvement in total run time. While this
difference was small and not measured against realistic usage of the
data structure, it was still measurable, and at minimum doesn't seem to
have hurt performance.
Change-Id: Ic2baa28de135be7086fa94579bbec451d69b3b15
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38478
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>