Currently, if a maintainer is removed from a change, the maintainer
will be added again. This change prevents the bot from adding the
removed maintainer again.
The bot will query all updates related to reviewer addition/removal
for each new change. If a reviewer has ever been added/removed
from a change, that reviewer won't be added to that change again.
Change-Id: Ifaab5ebd7ebf3e6453b2551d3e37c1b9e214c906
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50187
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
The LSQSenderState that was attached to Request was not useful.
All the fields were either a duplicate of information in the
LSQRequest or totally unused.
The LSQRequest class now inherits from Packet::SenderState and is
attached to the Packet that are sent to memory. We do not need
anymore the indirection Packet->SenderState->LSQRequest.
This helps making the code clearer as it was sometimes hard to
follow the difference between what the LSQRequest and
LSQSenserState was doing
(ex: number of outstanding requests in the memory).
Change-Id: I5b21e007e6d183c6aa79c27c1787ca56dcbc3fb0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50733
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently, the GPU VIPER TCC protocol handles races between atomics in
the triggerQueue_in. This in_port does not check for resource
availability, which can cause the trigger queue to execute multiple
times. Although this is the expected behavior, the code for handling
atomic races decrements the atomicDoneCnt flag in the trigger queue,
which is not safe since resource contention may cause it to execute
multiple times.
To resolve this issue, this commit moves the decrementing of this
counter to a new action that is called in an event that happens only
when the race between atomics is detected.
Change-Id: I552fd4f34fdd9ebeec99fb7aeb4eeb7b150f577f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51368
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
In the GPU VIPER TCC, programs with mixes of atomics and data
accesses to the same address, in the same kernel, can experience
deadlock when large applications (e.g., Pannotia's graph analytics
algorithms) are running on very small GPUs (e.g., the default 4 CU GPU
configuration). In this situation, deadlocks occur due to resource
stalls interacting with the behavior of the current implementation for
handling races between atomic accesses. The specific order of events
causing this deadlock are:
1. TCC is waiting on an atomic to return from directory
2. In the meantime it receives another atomic to the same address -- when
this happens, the TCC increments number of atomics to this address
(numAtomics = 2) that are pending in TBE, and does a write through of the
atomic to the directory.
3. When the first atomic returns from the Directory, it decrements the
numAtomics counter. numAtomics was at 2 though, because of step #2. So
it doesn't deallocate the TBE entry and calls Event:AtomicNotDone.
4. Another request (a LD) to the same address comes along for the same
address. The LD does z_stall since the second atomic is pending –- so the
LD retries every cycle until the deadlock counter times out (or until the
second atomic comes back).
5. The second atomic returns to the TCC. However, because there are so
many LD's pending in the cache, all doing z_stall's and retrying every cycle,
there are a lot of resource stalls. So, when the second atomic returns, it is
forced to retry its operation multiple times -- and each time it decrements
the atomicDoneCnt flag (which was added to catch a race between atomics
arriving and leaving the TCC in 7246f70bfb) repeatedly. As a result
atomicDoneCnt becomes negative.
6. Since this atomicDoneCnt flag is used to determine when Event:AtomicDone
happens, and since the resource stalls caused the atomicDoneCnt flag to become
negative, we never complete the atomic. Which means the pending LD can never
access the line, because it's stuck waiting for the atomic to complete.
7. Eventually the deadlock threshold is reached.
To fix this issue, this commit changes the VIPER TCC protocol from using
z_stall to using the stall_and_wait buffer method that the
Directory-level of the SLICC already uses. This change effectively
prevents resource stalls from dominating the TCC level, by putting
pending requests for a given address in a per-address stall buffer.
These requests are then woken up when the pending request returns.
As part of this change, this change also makes two small changes to the
Directory-level protocol (MOESI_AMD_BASE-dir):
1. Updated the names of the wakeup actions to match the TCC wakeup actions,
to avoid confusion.
2. Changed transition(B, UnblockWriteThrough, U) to check all stall buffers,
as some requests were being placed later in the stall buffer than was
being checked. This mirrors the changes in 187c44fe44 to other Directory
transitions to resolve races between GPU and DMA requests, but for
transitions prior workloads did not stress.
Change-Id: I60ac9830a87c125e9ac49515a7fc7731a65723c2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51367
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
0F 38 is the two bytes prefixes to decode a three-byte opcode.
To prevent errors, the two_bytes_opcode decoder will complain
if it tries to decode 38 as the opcode, because it is a prefix.
The decoder, will treat 38 as a prefix, preventing it to
end in the two_byte_opcode decoder.
However, using the VEX prefix is possible to reach this
forbidden state.
The set of bytes C4 01 01 38 00 will trigger the mentioned
M5InternalError.
The previous instruction is not valid, but it could be
decoded from an speculative path. In its place, a UD2
instructtion should be emitted if the VEX prefix is
present.
Change-Id: I6b7c4b3593dd8e6e8ac99aaf306b8feeb7784b56
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49990
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Move GpuTLB and TLBCoalescer to GCN3 as the TLB format is specific to
GCN3 and SE mode / APU simulation. Vega will have its own TLB,
coalescer, and walker suitable for a dGPU. This also adds a using alias
for the TLB translation state to reduce the number of references to
TheISA and X86ISA. X86 specific includes are also removed.
Change-Id: I34448bb4e5ddb9980b34a55bc717bbcea0e03db5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49847
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Global instructions are new in Vega and are essentially FLAT
instructions from GCN3 but guaranteed to go to global memory where as
flat can go to global or local memory.
This reworks the flat instruction classes so that the initiateAcc /
execute / completeAcc logic can be reused for flat, global, and later
scratch subtypes of flat instructions. The decoder creates a flat
instruction class which sets instruction flags based on the flat
instruction's SEG field. There are new initOperandInfo and
generateDissasmbly methods for flat and global. The number of operands
and operand index getters are modified to check the flags and return the
correct value for the subtype.
Change-Id: I1db4a3742aeec62424189e54c38c59d6b1a8d3c1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47106
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Kyle Roarty <kyleroarty1716@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The Executable class was used both for the generic gem5 target, and as a
base for the GTest binaries, the systemc test binaries, etc.
Unfortunately, the gem5 binary needs to include src/base/date.cc, and to
ensure that that file is up to date, it needs to depend on all the other
object files. No other binary should have that, but it was included by
inheritance.
Also, depending on the object file works well when those object files
and the date.cc object file are all part of the same binary and not
mixed and matched. That is not true for the GTest binaries for instance,
and so building a unit test would also build all the other unit test
object files because they are dependencies for date.to, date.tdo, etc.
If they already exist, then they would satisfy the dependency and not be
rebuilt.
Change-Id: Ia9cdddc5b2593678e714c08655eb440d7f5b5d1f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51088
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
This is removing the cached boolean variables from the ISA class.
The ISA is now using a release object.
It is importing it from the ArmSystem in case of a FS simulation,
and it is using its own ArmRelease object in SE mode
This allows us to add/remove SE extensions from python, rather than
hardcoding them in the ISA constructor (in case of SE)
Change-Id: I2b0b2f113e7bb9e28ac86bf2139413e2a71eeb01
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51012
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
The constructor of the LoadQueue and StoreQueue were adding
an additional entry compared to the given configuration.
The removed comment was saying that this additional entry was
used as a dummy entry.
This is not necessary anymore with the current structure.
It was even leading to incorrect behavior as a loadQueue
could have one more outstanding load than specified
by the configuration.
Valgrind does not spot any illegal access.
Change-Id: I41507d003e4d55e91215e21f57119af7b3e4d465
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50732
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
This parameter is used to figure out if two addresses are on the same or
different pages, and could be used to find what page they were on and
the page offset, although it doesn't look like the later two are
actually used.
This value could possibly come from the TLB parameter attached to the
prefetcher, but making it explicit makes these more symmetric with the
Ruby prefetcher, and reduces the complexity of the TLB implementation.
Change-Id: I6921943c49af19971b84225ecfd1127304363426
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50352
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
This made it skip over 70 pages to be "what it was before" my page table
changes. I'm not sure what changes this is referring to, and the class
which manages page tables in the guest memory uses the allocPhysPages
method to allocate its memory and would cooperate with anything else
using this mechanism without having to have special accomodation.
I removed this hack and hello world seems to work fine, but there may be
some other test case which exposes some problems.
Change-Id: I16e0d8835452df9c3e79738a1eed05b4cc9372b7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50349
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
LULESH is a popular GPU HPC application that acts as a good test
for several memory and compute patterns. Thus, including it in
the weekly regressions will help verify correctness and
functionality for code that affects the GPU. The default LULESH
input runs 10 iterations and takes 3-4 hours. Hence, it is not
appropriate for nightly regressions.
Change-Id: Ic1b73ab32fdd5cb1b973f2676b272adb91b2a98e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50952
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
HeteroSync does a good job of testing the GPU memory system and
atomics support, without requiring a long runtime. Thus, this
commit adds a mutex and barrier test from HeteroSync to the
nightly regression to ensure these components are tested.
Change-Id: I65998a0a63d41dd3ba165c3a000cee7e42e9034a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50951
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
In the MemPool object, the idea of a limit of the pool (largest page)
and the total number of pages were conflated, as was the page number of
the next "free" page and the total number of pages allocated. Both of
those would only be equivalent if the memory pool starts at address
zero, which is not generally true and could be true for at most one pool
at a time even when it is occasionally true.
Instead, this change fixes up MemPool to keep tree values, a starting
page number, the page number of the next free page, and the total number
of pages in the pool, both allocated and unallocated.
With those three values, we can accurately report the number of
allocated pages (not just the number of pages of any kind below the next
free one), the total number of free pages, and the total number of pages
in general (not the largest numbered page in the pool).
The value serialized by the System class was adjusted so that it will
stay compatible with previous checkpoints. The value unserialized by the
system class is passed to the MemPool as a limit, which has not changed
and so doesn't need to be updated. It gets translated into the total
number of pages in the MemPool constructor.
Change-Id: I8268ef410b41bf757df9ee5585ec2f6b0d8499e1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50687
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Turn the functions within it into virtual methods on the ISA classes.
Eliminate the implementation in MIPS, which was just copy pasted from
Alpha long ago. Fix some minor style issues in ARM. Remove templating.
Switch from using an "XC" type parameter to using the ThreadContext *
installed in all ISA classes.
The ARM version of these functions actually depend on the ExecContext
delaying writes to MiscRegs to work correctly. More insiduously than
that, they also depend on the conicidental ThreadContext like
availability of certain functions like contextId and getCpuPtr which
come from the class which happened to implement the type passed into XC.
To accomodate that, those functions need both a real ThreadContext, and
another object which is either an ExecContext or a ThreadContext
depending on how the method is called.
Jira Issue: https://gem5.atlassian.net/browse/GEM5-1053
Change-Id: I68f95f7283f831776ba76bc5481bfffd18211bc4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50087
Maintainer: Gabe Black <gabe.black@gmail.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>