This patch properly sets the access permissions in all controllers.
'Busy' was used for all transient states, which is incorrect in lots of
cases when we still hold a valid copy of the line and are able to handle
a functional read.
In the L2 controller these states were split to differentiate the access
permissions:
IFGXX -> IFGXX, IFGXXD
IGMO -> IGMO, IGMOU
IGMIOF -> IGMIOF, IGMIOFD
Same for the dir. controller:
IS -> IS, IS_M
MM -> MM, MM_M
The dir. controllers also has the states WBI/WBS for lines that have
been queued for a writeback. In these states we hold the data in the TBE
for replying to functional reads until the memory acks the write and we
move to I or S.
Other minor changes includes updated debug messages and asserts.
Change-Id: Ie4f6eac3b4d2641ec91ac6b168a0a017f61c0d6f
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/21927
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Pouya Fotouhi <pfotouhi@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch fixes some issues in the directory controller regarding DMA
handling:
1) Junk data messages were being sent immediately in response to DMA reads
for a line in the S state (one or more sharers, clean). Now, data is
fetched from memory directly and forwarded to the device. Some existing
transitions for handling GETS requests are reused, since it's essentially
the same behavior (except we don't update the list of sharers for DMAs)
2) DMA writes for lines in the I or S states would always overwrite the
whole line. We now check if it's only a partial line write, in which case
we fetch the line from memory, update it, and writeback.
3) Fixed incorrect DMA msg size
Some existing functions were renamed for clarity.
Change-Id: I759344ea4136cd11c3a52f9eaab2e8ce678edd04
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/21926
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Pouya Fotouhi <pfotouhi@ucdavis.edu>
Before this commit it would show only numbers:
Writing to misc reg 19 (19) : 0x74178
and now it also shows the name:
Writing MiscReg lockaddr (19 19) : 0x74178
MiscReg reads were already showing names and are unchanged, e.g.:
Reading MiscReg sctlr_el1 with clear res1 bits: 0x18100800
Change-Id: If46da88359ce4a549a6a50080a2b13077d41e373
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28467
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The temporal compactor was never initialized.
There were more possible indexes to the prec/succ vectors than
entries, so a block distance of zero would seg fault.
When checking for an address the wrong vector was being used.
From the original paper, "The prediction mechanism searches for
the PC of the accessed instruction in the index table"
Change-Id: I3c3aceac3c0adbbe8aef5c634c88cb35ba7487be
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28487
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch fixes the MESI_Three_Level protocols so that it correctly
informers the Ruby sequencer when a line eviction occurs. Furthermore,
the patch allows the protocol to recognize the 'Store_Conditional'
RubyRequestType and shortcuts this operation if the monitored line
has been cleared from the address monitor. This prevents certain
livelock behaviour in which a line could ping-pong between competing
cores.
The patch establishes a new C/C++ preprocessor definition which allows
the Sequencer to send the 'Store_Conditional' RubyRequestType to
MESI_Three_Level instead of 'ST'. This is a temporary measure until
the other protocols explicitely recognize 'Store_Conditional'.
Change-Id: I27ae041ab0e015a4f54f20df666f9c4873c7583d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28328
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
The implementation for load-linked/store-conditional did not work
correctly for multi-core simulations. Since load-links were treated as
stores, it was not possible for a line to have multiple readers which
often resulted in livelock when using these instructions to implemented
mutexes. This improved implementation treats load-linked instructions
similarly to loads but locks the line after a copy has been fetched
locally. Writes to a monitored address ensure the 'linked' property is
blown away and any subsequent store-conditional will fail.
Change-Id: I19bd74459e26732c92c8b594901936e6439fb073
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27103
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Using mechanisms added in previous CLs, this change modifies the m5
utility so that it can use any of the back ends enabled and implemented
by each variant, defaulting to one particular implementation if not is
selected explicitly.
On x86, the default mechanism is the magic address. All other variants
default to the magic instruction since they don't have a well
established address to use or even in most cases an implementation to
use.
The ability to override the particular magic address the utility wants
to use (necessary on variants such as aarch64) will be added in a future
CL.
Change-Id: I5fc414740e30759e7dde719cddcc8d5d41f8cc74
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27242
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Pouya Fotouhi <pfotouhi@ucdavis.edu>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
If the address space is changed (by writing to SATP), it can happen that
a page table walk is in progress. Previously, this failed if the ASID
changed, because we then used the wrong ASID for the lookup.
This commit makes sure that we don't access CSRs after the beginning of
the walk by loading SATP, STATUS, and PRV at the beginning.
Change-Id: I8c184c7ae7dd44d78e881bb5ec8d430dd480849c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28447
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The MESI_Three_Level protocol includes a transition in its L1
definition to invalidate an SM state but this transition does
not notify the L0 cache. The unintended side effect of this
allows stale values to be read by the L0 cache. This can cause
incorrect behaviour when executing LL/SC based mutexes. This
patch ensures that all invalidates to SM states are exposed to
the L0 cache.
Change-Id: I7fefabdaa8027fdfa4c9c362abd7e467493196aa
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28047
Reviewed-by: John Alsop <johnathan.alsop@amd.com>
Reviewed-by: Pouya Fotouhi <pfotouhi@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch fixes the following bugs:
- Previously when deltaPointer was 0 or 1, getting the last or penultimate deltas
would be wrong for non-pow2 deltas.size(). For example, if the last added delta
was to position 0, the previous should be in position 19, if deltas.size() = 20.
However, 0-1=4294967295, and 4294967295%20=15.
- When searching for the previous late and penultimate, the oldest entry was being
skipped.
Change-Id: Id800b60b77531ac4c2920bb90c15cc8cebb137a9
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/24538
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Javier Bueno Hedo <javier.bueno@metempsy.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
When the internal utility function make_version_list sees a string, it
tries to convert it into a list using the map() function. In python 3,
that returns an iterator. The following call to zip() will consume those
iterators, and then the following calls to len() will die because they
don't work on map iterators.
This is only a problem if all the common components of the version lists
are equal, and the comparison needs to then check if one of the lists
was equal to the other but with more components. When versions are
equal, for instance when compiling with the oldest supported version of
gcc (4.8.0) this error surfaces and breaks our scons build.
A simple fix is to just wrap the call to map() with list() to convert
the iterator to a flat list, making the other logic work as before.
Change-Id: If9dc5cd7fff70c21229ac3dd9a017edeccd26148
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28309
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The priority queue comparator orders such that false gives the
entry a higher priority. Therefore, if it is desired to make
the entry with lowest decompression latency have higher priority,
the comparison must be inverted.
Can be tested with:
MultiCompressor(compressors=[
PerfectCompressor(decompression_latency=1),
PerfectCompressor(decompression_latency=2)])
Where it is expected that compressor0 (the one with decomp lat
of 1) is always chosen.
Change-Id: I44acbf5f51c6e47efdd2a16fba9596935cf2eb69
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28367
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Adds a TokenPort which uses tokens for flow control rather than the
standard retry mechanism in gem5. The port is intended to be used
for flow control where speculatively sending packets is not possible.
For example, GPU instructions require this to send memory requests
to the cache coalescer.
Change-Id: Id0d55ab65b7c773e97752b8514a780cdf7d88707
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27428
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This change adds support for HSA devices, which are
DMA devices that have an HSA packet processor (HSAPP).
An HSA packet processor model is also included. The
HSAPP is a DMA device that matains AQL packet queues
and handles extraction of AQL packets, scheduling
of AQL queues, and initiates kernel launch for HSA
devices.
Because these devices directly interact with low-level
software and aid in the implementation of the HSA ABI
we also include some headers from the ROCm runtime:
the hsa.h and kfd_ioctl.h headers. These aid with
support ROCm for the HSA devices and drivers.
Change-Id: I24305e0337edc6fa555d436697b4e607a1e097d5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28128
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Decoder: gpu_decoder.hh and decoder.cc:
The decoder is defined in these files. The decoder
is implemented as a lookup table of function pointers
where each decode function will decode to a unique
ISA instruction, or do some sub-decoding to infer
the next decode function to call.
The format for each OP encoding is defined in the
header file.
Registers:
registers.[hh|cc] define the special registers and
operand selector values, which are used to map
operands to registers/special values. many
convenience functions are also provides to determine
the source/type of an operand, for example vector
vs. scalar, register operand vs. constant, etc.
GPU ISA:
Some special GPU ISA state is maintained in gpu_isa.hh
and isa.cc. This class is used to hold some special
registers and values that can be used as operands
by ISA instructions. Eventually more ISA-specific
state should be moved here, and out of the WF class.
Vector Operands:
The operands for GCN3 instructions are defined in
operand.hh. This file defines both scalar and
vector operands wth GCN3 specific semantics. The
vector operand class is desgned around the generic
vec_reg.hh that is already present in gem5.
Instructions:
The GCN3 instructions are defined and implemented
throughout gpu_static_inst.[hh|cc], instructions.[hh|cc],
op_encodings.[hh|cc], and inst_util.hh. GCN3 instructions
all fall under one of the OP encoding types; for example
scalar memory operands are of the type SMEM, vector
ALU instructions can be VOP3, VOP2, etc. The base code
common to all instructions of a certain OP encoding type
is implemented in the OP encodings files, which includes
operand information, disassembly methods, encoding type,
etc.
Each individual ISA isntruction is implemented as
a class object in instructions.[hh|cc] and are derived
from one of the OP encoding types. The instructions.cc
file is primarily for the execute() methods of each
individual instruction, and the header file provides
the class definition and a few instruction specific
API calls.
Note that these instruction classes were auto-generated
but not using the gem5 ISA description language. A
custom ISA description was used and that cannot be released
publicly, therefore we are providing them already in C++.
Change-Id: I14d2a02d6b87109f41341c8f50a69a2cca9f3d14
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28127
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>