The code which generated SimObject-related param wrappers, cxx wrappers,
enum headers, etc. was organized strangely. All the functions which
were used as SCons Actions were listed next to each other, and then all
the code which would set up each of those types of files and actually
use the Actions was next to each other.
This change rearranges that code so that the Action function is
immediately before the code which applies it. Or in other words, this
section of the SConscript is now grouped by the files being created,
rather than the type of the piece of machinery being defined to do that.
Change-Id: Ideee7bd44dac89c51840ec5970d95f6ccbbd1c8f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49402
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Square and HeteroSync's pre-built binaries were downloaded into the
tests folder in the nightly regression script, but the docker
command running them assumed we were in GEM5_ROOT. This commit
fixes this problem by specifying the benchmark root for the
applications.
Change-Id: I905c8bde7231bc708db01bff196fd85d99c7ceac
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51247
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
GFX7 (not supported in gem5) and GFX8 have a bug with how virtual
addresses are calculated for their HSA queues. The ROCr component of
ROCm solves this problem by doubling the HSA queue size that is
requested, then mapping all virtual addresses in the second half of the
queue to the same virtual addresses as the first half of the queue.
This commit fixes gem5's support to mimic this behavior.
Note that this change does not affect Vega's HSA queue support, because
according to the ROCm documentation, Vega does not have the same problem
as GCN3.
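The aliasing described above can be sketched as follows. This is only an
illustration under stated assumptions, not gem5's or ROCr's code:
DoubledQueue, its fields, and the byte-offset model are hypothetical.

```python
class DoubledQueue:
    """Hypothetical model of a queue whose second half aliases the first."""

    def __init__(self, base, requested_size):
        self.base = base
        self.requested_size = requested_size
        # Twice the requested size is mapped, as the commit describes.
        self.mapped_size = 2 * requested_size

    def resolve(self, vaddr):
        """Return the first-half address that backs vaddr."""
        offset = vaddr - self.base
        if not 0 <= offset < self.mapped_size:
            raise ValueError("address outside the queue mapping")
        # Offsets in the second half wrap back onto the first half.
        return self.base + (offset % self.requested_size)
```

With a queue at 0x1000 and a requested size of 0x100, an access to 0x1100
resolves to the same location as an access to 0x1000.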
Change-Id: I133cf1acc3a00a0baded0c4c3c2a25f39effdb51
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51371
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
All syscalls with the "at" suffix rely on a directory file descriptor
(dirfd) and a pathname, provided as arguments to the syscall.
If the pathname is relative, then it is interpreted relative to the
directory referred to by the file descriptor dirfd (rather than relative
to the current working directory of the calling process).
Prior to this patch, only the openat syscall was properly implemented.
Other syscalls were discarding the dirfd argument and producing
a warning instead.
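The path resolution rule for "at" syscalls can be sketched as below. This is
a simplified illustration, not the gem5 implementation: resolve_at_path and
the fd_to_dir table are hypothetical, and AT_FDCWD uses the Linux value of
-100, the special dirfd that selects the current working directory.

```python
import posixpath

AT_FDCWD = -100  # Linux value of the special "use the cwd" dirfd


def resolve_at_path(dirfd, pathname, fd_to_dir, cwd):
    """Return the path an *at syscall would operate on.

    fd_to_dir is a hypothetical table mapping open directory file
    descriptors to the directories they refer to.
    """
    # Absolute pathnames ignore dirfd entirely.
    if posixpath.isabs(pathname):
        return pathname
    # Relative pathnames are resolved against dirfd's directory,
    # or against the cwd when dirfd is AT_FDCWD.
    base = cwd if dirfd == AT_FDCWD else fd_to_dir[dirfd]
    return posixpath.normpath(posixpath.join(base, pathname))
```

Discarding dirfd, as the older code did, is equivalent to always taking the
cwd branch, which is only correct for AT_FDCWD or absolute pathnames.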
JIRA: https://gem5.atlassian.net/browse/GEM5-1098
Change-Id: I0cc20c6ef79fca8c8d1c2c9a52eb54ede3d51312
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51048
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
DNNMark is representative of several simple (fast) layers within ML
applications, which are heavily used in modern GPU applications. Thus,
we want to make sure support for these applications is tested. This
commit updates the weekly regression to run three variants: fwd_softmax,
bwd_bn, and fwd_pool -- ensuring we test both inference and training as
well as a variety of ML layers.
Change-Id: I38bfa9bd3a2817099ece46afc2d6132ce346e21a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51187
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Align allocation requests in Process::allocateMem to page boundaries,
rather than assume that they already are. This frees the caller from
having to know what boundary to align things to. The older version would,
in theory, make the caller more aware of the extent of the allocation,
but in reality the caller would just blindly perform the alignment that
this function now performs anyway.
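A minimal sketch of this kind of alignment is shown below. It is not the
Process::allocateMem code: align_to_pages is a hypothetical helper, the 4096
byte page size is an assumption, and rounding both the start and the end of
the range is one plausible reading of "align to page boundaries".

```python
def align_to_pages(vaddr, size, page_size=4096):
    """Expand [vaddr, vaddr + size) to whole pages.

    Returns the aligned start address and the aligned size.
    """
    # Round the start address down to a page boundary.
    start = vaddr - (vaddr % page_size)
    # Round the end address up to a page boundary.
    end = -(-(vaddr + size) // page_size) * page_size
    return start, end - start
```

A request that straddles a page boundary, such as 16 bytes at 0x1ff8, grows
to cover both pages it touches.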
Change-Id: I897714d4481d961255a9e44ae080135e507be199
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50757
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently, if a maintainer is removed from a change, the maintainer
will be added again. This change prevents the bot from adding the
removed maintainer again.
The bot will query all updates related to reviewer addition/removal
for each new change. If a reviewer has ever been added/removed
from a change, that reviewer won't be added to that change again.
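The filtering rule can be sketched as follows. This is an illustration only:
reviewers_to_add and the (account, action) history format are hypothetical
and do not reflect the bot's actual data structures or the Gerrit API.

```python
def reviewers_to_add(candidates, update_history):
    """Return the candidates the bot may still add to a change.

    update_history is a hypothetical iterable of (account, action)
    pairs, where action is "added" or "removed".
    """
    # Any account that was ever added or removed is left alone.
    touched = {account for account, _action in update_history}
    return [c for c in candidates if c not in touched]
```

A maintainer who was added and later removed appears in the history, so the
bot does not add that maintainer again.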
Change-Id: Ifaab5ebd7ebf3e6453b2551d3e37c1b9e214c906
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50187
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
The LSQSenderState that was attached to Request was not useful.
All the fields were either a duplicate of information in the
LSQRequest or totally unused.
The LSQRequest class now inherits from Packet::SenderState and is
attached to the Packets that are sent to memory. We no longer need
the indirection Packet->SenderState->LSQRequest.
This helps make the code clearer, as it was sometimes hard to
follow the difference between what the LSQRequest and
LSQSenderState were doing
(e.g., the number of outstanding requests in memory).
Change-Id: I5b21e007e6d183c6aa79c27c1787ca56dcbc3fb0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50733
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently, the GPU VIPER TCC protocol handles races between atomics in
the triggerQueue_in. This in_port does not check for resource
availability, which can cause the trigger queue to execute multiple
times. Although this is the expected behavior, the code for handling
atomic races decrements the atomicDoneCnt flag in the trigger queue,
which is not safe since resource contention may cause it to execute
multiple times.
To resolve this issue, this commit moves the decrementing of this
counter to a new action that is called in an event that happens only
when the race between atomics is detected.
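The hazard can be illustrated with a simplified model. This is not SLICC code:
TriggerQueueModel and its methods are hypothetical, and the model only
captures the idea that a side effect placed in logic that may be retried runs
once per retry, while a side effect placed in an action runs only when the
event fires.

```python
class TriggerQueueModel:
    """Hypothetical stand-in for the trigger queue handling."""

    def __init__(self, atomic_done_cnt):
        self.atomic_done_cnt = atomic_done_cnt

    def attempt_old(self, resources_available):
        # Old shape: the decrement happens on every attempt, so each
        # retry caused by a resource stall decrements again.
        self.atomic_done_cnt -= 1
        return resources_available

    def attempt_new(self, resources_available):
        # New shape: the decrement lives in the action, which only
        # runs when the event actually fires.
        if not resources_available:
            return False
        self.atomic_done_cnt -= 1
        return True
```

With two stalled attempts followed by a successful one, the old shape
decrements three times and the new shape decrements once.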
Change-Id: I552fd4f34fdd9ebeec99fb7aeb4eeb7b150f577f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51368
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
In the GPU VIPER TCC, programs with mixes of atomics and data
accesses to the same address, in the same kernel, can experience
deadlock when large applications (e.g., Pannotia's graph analytics
algorithms) are running on very small GPUs (e.g., the default 4 CU GPU
configuration). In this situation, deadlocks occur due to resource
stalls interacting with the behavior of the current implementation for
handling races between atomic accesses. The specific order of events
causing this deadlock is:
1. TCC is waiting on an atomic to return from directory
2. In the meantime it receives another atomic to the same address -- when
this happens, the TCC increments the number of atomics to this address
(numAtomics = 2) that are pending in the TBE, and does a write through of the
atomic to the directory.
3. When the first atomic returns from the Directory, it decrements the
numAtomics counter. numAtomics was at 2 though, because of step #2. So
it doesn't deallocate the TBE entry and calls Event:AtomicNotDone.
4. Another request (a LD) to the same address comes along.
The LD does z_stall since the second atomic is pending -- so the
LD retries every cycle until the deadlock counter times out (or until the
second atomic comes back).
5. The second atomic returns to the TCC. However, because there are so
many LD's pending in the cache, all doing z_stall's and retrying every cycle,
there are a lot of resource stalls. So, when the second atomic returns, it is
forced to retry its operation multiple times -- and each time it decrements
the atomicDoneCnt flag (which was added to catch a race between atomics
arriving and leaving the TCC in 7246f70bfb) repeatedly. As a result
atomicDoneCnt becomes negative.
6. Since this atomicDoneCnt flag is used to determine when Event:AtomicDone
happens, and since the resource stalls caused the atomicDoneCnt flag to become
negative, we never complete the atomic. This means the pending LD can never
access the line, because it's stuck waiting for the atomic to complete.
7. Eventually the deadlock threshold is reached.
To fix this issue, this commit changes the VIPER TCC protocol from using
z_stall to using the stall_and_wait buffer method that the
Directory-level of the SLICC already uses. This change effectively
prevents resource stalls from dominating the TCC level, by putting
pending requests for a given address in a per-address stall buffer.
These requests are then woken up when the pending request returns.
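The per-address stall buffer idea can be sketched as below. This is a
conceptual model, not the SLICC implementation: StallBuffer and its methods
are hypothetical names chosen to echo stall_and_wait and the wakeup actions.

```python
from collections import defaultdict, deque


class StallBuffer:
    """Hypothetical per-address buffer of stalled requests."""

    def __init__(self):
        self._waiting = defaultdict(deque)

    def stall_and_wait(self, addr, request):
        # Park the request instead of retrying it every cycle.
        self._waiting[addr].append(request)

    def wake_up(self, addr):
        # Release, in arrival order, everything parked on addr.
        return list(self._waiting.pop(addr, ()))
```

Parked requests consume no retry bandwidth, so a returning atomic is not
starved by loads spinning on the same address.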
This change also makes two small changes to the
Directory-level protocol (MOESI_AMD_BASE-dir):
1. Updated the names of the wakeup actions to match the TCC wakeup actions,
to avoid confusion.
2. Changed transition(B, UnblockWriteThrough, U) to check all stall buffers,
as some requests were being placed later in the stall buffer than was
being checked. This mirrors the changes in 187c44fe44 to other Directory
transitions to resolve races between GPU and DMA requests, but for
transitions prior workloads did not stress.
Change-Id: I60ac9830a87c125e9ac49515a7fc7731a65723c2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51367
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
0F 38 is the two-byte prefix used to decode a three-byte opcode.
To prevent errors, the two_bytes_opcode decoder will complain
if it tries to decode 38 as the opcode, because it is a prefix.
The decoder will treat 38 as a prefix, preventing it from
ending up in the two_byte_opcode decoder.
However, by using the VEX prefix it is possible to reach this
forbidden state.
The byte sequence C4 01 01 38 00 will trigger the mentioned
M5InternalError.
That instruction is not valid, but it could be
decoded on a speculative path. In its place, a UD2
instruction should be emitted if the VEX prefix is
present.
Change-Id: I6b7c4b3593dd8e6e8ac99aaf306b8feeb7784b56
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49990
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Move GpuTLB and TLBCoalescer to GCN3 as the TLB format is specific to
GCN3 and SE mode / APU simulation. Vega will have its own TLB,
coalescer, and walker suitable for a dGPU. This also adds a using alias
for the TLB translation state to reduce the number of references to
TheISA and X86ISA. X86 specific includes are also removed.
Change-Id: I34448bb4e5ddb9980b34a55bc717bbcea0e03db5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49847
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Global instructions are new in Vega and are essentially FLAT
instructions from GCN3 but guaranteed to go to global memory, whereas
flat can go to global or local memory.
This reworks the flat instruction classes so that the initiateAcc /
execute / completeAcc logic can be reused for flat, global, and later
scratch subtypes of flat instructions. The decoder creates a flat
instruction class which sets instruction flags based on the flat
instruction's SEG field. There are new initOperandInfo and
generateDisassembly methods for flat and global. The number of operands
and operand index getters are modified to check the flags and return the
correct value for the subtype.
Change-Id: I1db4a3742aeec62424189e54c38c59d6b1a8d3c1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47106
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Kyle Roarty <kyleroarty1716@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The Executable class was used both for the generic gem5 target, and as a
base for the GTest binaries, the systemc test binaries, etc.
Unfortunately, the gem5 binary needs to include src/base/date.cc, and to
ensure that that file is up to date, it needs to depend on all the other
object files. No other binary should have that, but it was included by
inheritance.
Also, depending on the object file works well when those object files
and the date.cc object file are all part of the same binary and not
mixed and matched. That is not true for the GTest binaries for instance,
and so building a unit test would also build all the other unit test
object files because they are dependencies for date.to, date.tdo, etc.
If they already exist, then they would satisfy the dependency and not be
rebuilt.
Change-Id: Ia9cdddc5b2593678e714c08655eb440d7f5b5d1f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51088
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
This removes the cached boolean variables from the ISA class.
The ISA now uses a release object.
It imports it from the ArmSystem in the case of an FS simulation,
and it uses its own ArmRelease object in SE mode.
This allows us to add/remove SE extensions from Python, rather than
hardcoding them in the ISA constructor (in the case of SE).
Change-Id: I2b0b2f113e7bb9e28ac86bf2139413e2a71eeb01
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51012
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
The constructors of the LoadQueue and StoreQueue were adding
an additional entry compared to the given configuration.
The removed comment was saying that this additional entry was
used as a dummy entry.
This is not necessary anymore with the current structure.
It was even leading to incorrect behavior as a loadQueue
could have one more outstanding load than specified
by the configuration.
Valgrind does not spot any illegal access.
Change-Id: I41507d003e4d55e91215e21f57119af7b3e4d465
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50732
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
This parameter is used to figure out if two addresses are on the same or
different pages, and could be used to find what page they were on and
the page offset, although it doesn't look like the latter two are
actually used.
This value could possibly come from the TLB parameter attached to the
prefetcher, but making it explicit makes these more symmetric with the
Ruby prefetcher, and reduces the complexity of the TLB implementation.
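The three uses of a page size mentioned above can be sketched as follows.
This is an illustration, not the prefetcher code: the helper names are
hypothetical, and page_bytes stands for the explicit page size parameter.

```python
def same_page(addr_a, addr_b, page_bytes):
    """True if both addresses fall on the same page."""
    return addr_a // page_bytes == addr_b // page_bytes


def page_address(addr, page_bytes):
    """Address of the first byte of the page containing addr."""
    return addr - (addr % page_bytes)


def page_offset(addr, page_bytes):
    """Offset of addr within its page."""
    return addr % page_bytes
```

None of these need a TLB lookup, only the page size, which is why an explicit
parameter is sufficient.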
Change-Id: I6921943c49af19971b84225ecfd1127304363426
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50352
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>