Previously, with HSAIL, we were guaranteed by the HSA specification
that the GPU will never issue unaligned accesses. However, now
that we are directly running GCN this is no longer true.
Accordingly, this commit adds support for unaligned accesses.
Moreover, to reduce the replication of nearly identical
code for the different request types, I also added new helper
functions that are called by all the different memory request
producing instruction types in op_encodings.hh.
Adding support for unaligned instructions requires changing
the statusBitVector used to track the status of the memory
requests for each lane from a bit per lane to an int per lane.
This is necessary because an unaligned access may span multiple
cache lines. In the worst case, each lane may span multiple
cache lines. There are corresponding changes in the files that
use the statusBitVector.
Change-Id: I319bf2f0f644083e98ca546d2bfe68cf87a5f967
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29920
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Kernel end release was turned on for VIPER protocol, which
is in fact write-through based and thus no need to have
release operation. This changeset splits the option
'impl_kern_boundary_sync' into 'impl_kern_launch_acq'
and 'impl_kern_end_rel', and turns off release on VIPER.
Change-Id: I5490019b6765a25bd801cc78fb7445b90eb02a3d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29917
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Initialize _stackSize and _stackMin to the maximum stack size values.
The are setup in each arch's Process::initState and may be uninitialized
until then. If a stack fixup occurs before these are setup, addresses
which are not in the stack might be allocated on the stack. This
prevents that until they are initialized in Process::initState. If an
access occurs before that with these initial values, the stack fixup
will simply allocate a page of memory in the stack space. However, it
will not print the typical info messages about growing the stack during
this time.
Change-Id: I9f9316734f4bf1f773fc538922e83b867731c684
JIRA: https://gem5.atlassian.net/browse/GEM5-629
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30394
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Certain micro-op sequences were only setting isLastMicroop flags,
and did not set the isFirstMicroop flag. This adds the missing
setFirstMicroop() calls. This fixes tracing issues (e.g. Tarmac)
of certain micro-opped instruction sequences such as LD1.
Change-Id: I7de3ee2759e2b4e1065a7cbac4186f11227d84be
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30034
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
These defaults are never used. There was an assert in the predictors
until recently which was asserting that one of the arguments didn't
have the default value, I think to verify that the default wasn't used
by accident(?), but it could be used purposefully. That would cause
gem5 to crash and has been removed.
Beyond that, there's no reason to have default values for those
arguments in the first place, so this change removes them. That makes
the code slightly simpler, and avoids them being used by accident.
Additionally, the defalt values of the arguments made the function
signatures inconsistent, even though they were supposed to override
each other.
JIRA: https://gem5.atlassian.net/browse/GEM5-483
Change-Id: I28f8d2048985c12ec9cac018a868a32bfa20dc6c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30375
Reviewed-by: Hsuan Hsu <hsuan.hsu@mediatek.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The sandbox module is providing a sandbox environment for
a specific TestCase via the multiprocessing package.
This isolation/complexity is not strictly needed as testlib is already
forking a new process via subprocess. As it is now, a TestRunner will
generate:
TestRunner -> multiprocessing.Process -> subprocess.Popen
(2 generated procs)
With this patch we are removing the intermediate layer
TestRunner -> subprocess.Popen
(1 generated proc)
JIRA: https://gem5.atlassian.net/projects/GEM5/issues/GEM5-533
Change-Id: Icd5cadbe316653a9269ab098ec4c07f21b864ad3
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30215
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
These are the stats in the base class, not in any derived classes. Only
Alpha has an additional stats. These were not really "kernel"
statistics, they were just applicable primarily in FS. They are
potentially applicable to any simulation, but will probably not be
incremented in SE simulations.
Also this merges these stats from being per thread to being per
workload, ie operating system instance. This is probably more relevant
since exactly what thread within a workload runs which particular
instruction is not very important/predictable, but the aggregate
behavior is. If necessary, this could be adjusted in the future to
split things back out again into stats per thread while keeping them
inside the single workload object.
Change-Id: I130e11a9022bdfcadcfb02c7995871503114cd53
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/25147
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
There was an add-hoc check added to getAddrRanges, but the other methods
would just segfault if they tried to talk to their peers. This change
wraps all the calls in try blocks and catches the exception which the
peer will throw if it's the default and the port is not actually
connected to anything.
Change-Id: Ie46be0230f33f74305c599b251ca319a65ba008d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30296
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
If a port is unbound, trying to call its peer will likely cause a
segfault. Rather than check if a port is bound every time you go to use
it, we can instead bind to a default peer which just throws an exception
back to the caller. The caller can catch the exception and report the
error.
This change adds a common new class to throw as the exception, and also
a small utility function which reports the error and dies.
Change-Id: Ia58a2030922c73e2fd7d139822bce38d9b0f2171
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30295
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The XBar uses the concept of Layers to model throughput and
instantiates one Layer per master. As it forwards a packet to and from
master, the corresponding Layer is marked as occupied for a number of
cycles. Requests/responses to/from a master are blocked while the
corresponding Layer is occupied. Previously the delay would be
calculated based on the formula 1 + size / width, which assumes that
the Layer is always occupied for 1 cycle while processing the packet
header. This changes makes the header latency a parameter which
defaults to 1.
Change-Id: I12752ab4415617a94fbd8379bcd2ae8982f91fd8
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30054
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Tested-by: kokoro <noreply+kokoro@google.com>
Setting disable_partial part way through the checks for various build
targets is incorrect and will affect targets based on the order they're
checked.
This change moves the check earlier, makes it consistent across all
builds whether fast is included or not, and stops passing it in as an
option to makeEnv since it now applies universally.
By disabling partial linking consistently, we avoid missing bugs where
only the "fast" version of gem5 doesn't build correctly because of the
multitude of g++ bugs having to do with combining LTO and partial
linking.
This also simplifies the logic in the SConscript by having fewer
independently moving parts.
Change-Id: Iff69f39868e948d3b9a5b11ea80bbfed19419b59
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29303
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
The ThreadContext can be used to access the cpu if needed, and is a
more representative interface to various pieces of state than the CPU
itself. Also convert some of the methods in Interupts to use the
locally stored ThreadContext pointer instead of taking one as an
argument. This makes calling those methods simpler and less error
prone.
Change-Id: I740bd99f92e54e052a618a4ae2927ea1c4ece193
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28988
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The System class has a few different arrays of values which each
correspond to a thread of execution based on their position. This
change collects them together into a single class to make managing them
easier and less error prone. It also collects methods for manipulating
those threads as an API for that class.
This class acts as a collection point for thread based state which the
System class can look into to get at all its state. It also acts as an
interface for interacting with threads for other classes. This forces
external consumers to use the API instead of accessing the individual
arrays which improves consistency.
Change-Id: Idc4575c5a0b56fe75f5c497809ad91c22bfe26cc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/25144
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Instead of calling into object files after the fact and asking them to
put symbols into a target symbol table, this change makes object files
fill in a symbol table themselves at construction. Then, that table can
be retrieved and used to fill in aggregate tables, masked, moved,
and/or filtered to have only one type of symbol binding.
This simplifies the symbol management API of the object file types
significantly, and makes it easier to deal with symbol tables alongside
binaries in the FS workload classes.
Change-Id: Ic9006ca432033d72589867c93d9c5f8a1d87f73c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/24787
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>