This patch fixes the MESI_Three_Level protocols so that it correctly
informers the Ruby sequencer when a line eviction occurs. Furthermore,
the patch allows the protocol to recognize the 'Store_Conditional'
RubyRequestType and shortcuts this operation if the monitored line
has been cleared from the address monitor. This prevents certain
livelock behaviour in which a line could ping-pong between competing
cores.
The patch establishes a new C/C++ preprocessor definition which allows
the Sequencer to send the 'Store_Conditional' RubyRequestType to
MESI_Three_Level instead of 'ST'. This is a temporary measure until
the other protocols explicitely recognize 'Store_Conditional'.
Change-Id: I27ae041ab0e015a4f54f20df666f9c4873c7583d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28328
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
The implementation for load-linked/store-conditional did not work
correctly for multi-core simulations. Since load-links were treated as
stores, it was not possible for a line to have multiple readers which
often resulted in livelock when using these instructions to implemented
mutexes. This improved implementation treats load-linked instructions
similarly to loads but locks the line after a copy has been fetched
locally. Writes to a monitored address ensure the 'linked' property is
blown away and any subsequent store-conditional will fail.
Change-Id: I19bd74459e26732c92c8b594901936e6439fb073
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27103
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Using mechanisms added in previous CLs, this change modifies the m5
utility so that it can use any of the back ends enabled and implemented
by each variant, defaulting to one particular implementation if not is
selected explicitly.
On x86, the default mechanism is the magic address. All other variants
default to the magic instruction since they don't have a well
established address to use or even in most cases an implementation to
use.
The ability to override the particular magic address the utility wants
to use (necessary on variants such as aarch64) will be added in a future
CL.
Change-Id: I5fc414740e30759e7dde719cddcc8d5d41f8cc74
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27242
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Pouya Fotouhi <pfotouhi@ucdavis.edu>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
If the address space is changed (by writing to SATP), it can happen that
a page table walk is in progress. Previously, this failed if the ASID
changed, because we then used the wrong ASID for the lookup.
This commit makes sure that we don't access CSRs after the beginning of
the walk by loading SATP, STATUS, and PRV at the beginning.
Change-Id: I8c184c7ae7dd44d78e881bb5ec8d430dd480849c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28447
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The MESI_Three_Level protocol includes a transition in its L1
definition to invalidate an SM state but this transition does
not notify the L0 cache. The unintended side effect of this
allows stale values to be read by the L0 cache. This can cause
incorrect behaviour when executing LL/SC based mutexes. This
patch ensures that all invalidates to SM states are exposed to
the L0 cache.
Change-Id: I7fefabdaa8027fdfa4c9c362abd7e467493196aa
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28047
Reviewed-by: John Alsop <johnathan.alsop@amd.com>
Reviewed-by: Pouya Fotouhi <pfotouhi@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch fixes the following bugs:
- Previously when deltaPointer was 0 or 1, getting the last or penultimate deltas
would be wrong for non-pow2 deltas.size(). For example, if the last added delta
was to position 0, the previous should be in position 19, if deltas.size() = 20.
However, 0-1=4294967295, and 4294967295%20=15.
- When searching for the previous late and penultimate, the oldest entry was being
skipped.
Change-Id: Id800b60b77531ac4c2920bb90c15cc8cebb137a9
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/24538
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Javier Bueno Hedo <javier.bueno@metempsy.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
When the internal utility function make_version_list sees a string, it
tries to convert it into a list using the map() function. In python 3,
that returns an iterator. The following call to zip() will consume those
iterators, and then the following calls to len() will die because they
don't work on map iterators.
This is only a problem if all the common components of the version lists
are equal, and the comparison needs to then check if one of the lists
was equal to the other but with more components. When versions are
equal, for instance when compiling with the oldest supported version of
gcc (4.8.0) this error surfaces and breaks our scons build.
A simple fix is to just wrap the call to map() with list() to convert
the iterator to a flat list, making the other logic work as before.
Change-Id: If9dc5cd7fff70c21229ac3dd9a017edeccd26148
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28309
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The priority queue comparator orders such that false gives the
entry a higher priority. Therefore, if it is desired to make
the entry with lowest decompression latency have higher priority,
the comparison must be inverted.
Can be tested with:
MultiCompressor(compressors=[
PerfectCompressor(decompression_latency=1),
PerfectCompressor(decompression_latency=2)])
Where it is expected that compressor0 (the one with decomp lat
of 1) is always chosen.
Change-Id: I44acbf5f51c6e47efdd2a16fba9596935cf2eb69
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28367
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Adds a TokenPort which uses tokens for flow control rather than the
standard retry mechanism in gem5. The port is intended to be used
for flow control where speculatively sending packets is not possible.
For example, GPU instructions require this to send memory requests
to the cache coalescer.
Change-Id: Id0d55ab65b7c773e97752b8514a780cdf7d88707
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27428
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This change adds support for HSA devices, which are
DMA devices that have an HSA packet processor (HSAPP).
An HSA packet processor model is also included. The
HSAPP is a DMA device that matains AQL packet queues
and handles extraction of AQL packets, scheduling
of AQL queues, and initiates kernel launch for HSA
devices.
Because these devices directly interact with low-level
software and aid in the implementation of the HSA ABI
we also include some headers from the ROCm runtime:
the hsa.h and kfd_ioctl.h headers. These aid with
support ROCm for the HSA devices and drivers.
Change-Id: I24305e0337edc6fa555d436697b4e607a1e097d5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28128
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Decoder: gpu_decoder.hh and decoder.cc:
The decoder is defined in these files. The decoder
is implemented as a lookup table of function pointers
where each decode function will decode to a unique
ISA instruction, or do some sub-decoding to infer
the next decode function to call.
The format for each OP encoding is defined in the
header file.
Registers:
registers.[hh|cc] define the special registers and
operand selector values, which are used to map
operands to registers/special values. many
convenience functions are also provides to determine
the source/type of an operand, for example vector
vs. scalar, register operand vs. constant, etc.
GPU ISA:
Some special GPU ISA state is maintained in gpu_isa.hh
and isa.cc. This class is used to hold some special
registers and values that can be used as operands
by ISA instructions. Eventually more ISA-specific
state should be moved here, and out of the WF class.
Vector Operands:
The operands for GCN3 instructions are defined in
operand.hh. This file defines both scalar and
vector operands wth GCN3 specific semantics. The
vector operand class is desgned around the generic
vec_reg.hh that is already present in gem5.
Instructions:
The GCN3 instructions are defined and implemented
throughout gpu_static_inst.[hh|cc], instructions.[hh|cc],
op_encodings.[hh|cc], and inst_util.hh. GCN3 instructions
all fall under one of the OP encoding types; for example
scalar memory operands are of the type SMEM, vector
ALU instructions can be VOP3, VOP2, etc. The base code
common to all instructions of a certain OP encoding type
is implemented in the OP encodings files, which includes
operand information, disassembly methods, encoding type,
etc.
Each individual ISA isntruction is implemented as
a class object in instructions.[hh|cc] and are derived
from one of the OP encoding types. The instructions.cc
file is primarily for the execute() methods of each
individual instruction, and the header file provides
the class definition and a few instruction specific
API calls.
Note that these instruction classes were auto-generated
but not using the gem5 ISA description language. A
custom ISA description was used and that cannot be released
publicly, therefore we are providing them already in C++.
Change-Id: I14d2a02d6b87109f41341c8f50a69a2cca9f3d14
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28127
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Sometimes when using the GuestABI mechanism, gem5 wants to know that a
function was called and with what arguments to do its own processing,
but doesn't want to return its own value since it will still let the
simulated system execute its own function. There are also situations
where gem5 wants to return a value, but not through the normal
mechanism. That happens when, for instance, a gem5 op is triggered by a
memory access, and that access is what should return the value, not a
particular fixed register.
This option is a template parameter rather than a function argument so
that if it's not going to be used, no "Return" type needs to be defined
since it's not present at all in the chain of functions invokeSimcall
expands to.
This will also make it easier to reuse generic ABIs in those situations
without having to make custom wrappers.
Change-Id: I969e78495c8f4e73f4de1a3dfb4d74c9b30f5af5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28288
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
PowerDomains group multiple objects together to regulate their power
state. There are 2 types of objects in a PowerDomain: leaders and
followers. The power state of a PowerDomain is the most performant
power state of any of the leaders. The power state of the followers is
determined by the power state of the PowerDomain they belong to: they
need to be in a power state which is more or equally performant to the
power state of the PowerDomain.
Leaders can be ClockedObjects or other PowerDomains. Followers can
only be ClockedObjects. PowerDomains can be be nested but a
PowerDomain can only be a leader of another PowerDomain, NOT a
follower. PowerDomains are not present in the hierarchy by default,
the user needs to create and configure them in the configuration file.
The user can add an hierachy by setting the led_by parameter. gem5
will then create leaders and followers for each domain and calculate
the allowed power states for the domain.
Objects in a PowerDomain need to have at least the ON state in the
possible_states.
An example of a powerDomain config is:
pd = PowerDomain()
cpu0 = BaseCPU()
cpu1 = BaseCPU()
shared_cache = BaseCache()
cache.power_state.led_by = pd
pd.led_by = [cpu0, cpu1]
This will create a PowerDomain, where the CPUs determine their own
power states and the shared cache (via the PowerDomain) follows those
power states (when possible).
Change-Id: I4c4cd01f06d45476c6e0fb2afeb778613733e2ff
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28051
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This commit adds the concept of possible power states to the
PowerState SimObject. This is a list of the power states a specific
object can be in. Before transitioning to a power state, a PowerState
object will first check if the requested power states is actually an
allowed state. The user can restricted the power states a
ClockedObject can go to during configuration. In addition, this change
sets the power states, a CPU can be in.
Change-Id: Ida414a87554a14f09767a272b54b5d19bfc8e911
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28050
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This commit does not make any functional changes but just rearranges
the existing code with regard to the power states. Previously, all
code regarding power states was in the ClockedObjects. However, it
seems more logical and cleaner to move this code into a separate
class, called PowerState. The PowerState is a now SimObject. Every
ClockedObject has a PowerState but this patch also allows for objects
with PowerState which are not ClockedObjects.
Change-Id: Id2db86dc14f140dc9d0912a8a7de237b9df9120d
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28049
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
fs_power.py is an example script that demonstrates how power models
can be used with gem5. Previously, the formulas used to calculate the
dynamic and static power of the cores and the L2 cache were using
stats in equations as determined by their path relative to the
SimObject where the power model is attached to or full paths. This CL
changes these formulas to refer to the stats only by their full paths.
Change-Id: I91ea16c88c6a884fce90fd4cd2dfabcba4a1326c
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27893
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
With the introduction of StatGroups the organization of stats has
changed and the power modeling framework has been broken. This CL uses
the new function Stats::resolve to retrieve pointers to the necesary
stats and use them in the power estimation formulas.
Change-Id: Iedaa97eeddf51f7a0a1f222918715da309943be3
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27892
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
This change adds a member function to the Group class that returns a
stat given its name. The function will go through all stats in the
group and its subgroups and will return the stat that matches the
name. For example, if g is the Group system.bigCluster.cpus then a
call to
p = g.resolveStat("ipc")
will return a pointer to the stat system.bigCluster.cpus.ipc.
Change-Id: I5af8401b38b41aee611728f6d1a595f99d22d9de
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27890
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>