This patch augments the MESI_Three_Level Ruby protocol with hardware
transactional memory support.
The HTM implementation relies on buffering of speculative memory updates.
The core notifies the L0 cache controller that a new transaction has
started and the controller in turn places itself in transactional state
(htmTransactionalState := true).
When operating in transactional state, the usual MESI protocol changes
slightly. Lines loaded or stored are marked as part of a transaction's
read and write set respectively. If there is an invalidation request to
a cache line in the read/write set, the transaction is marked as failed.
Similarly, if there is a read request by another core to a speculatively
written cache line, i.e. in the write set, the transaction is marked as
failed. If failed, all subsequent loads and stores from the core are
made benign, i.e. made into NOPs at the cache controller, and responses
are marked to indicate that the transactional state has failed. When the
core receives these marked responses, it generates a HtmFailureFault
with the reason for the transaction failure. Servicing this fault does
two things:
(a) Restores the architectural checkpoint
(b) Sends an HTM abort signal to the cache controller
The restoration includes all registers in the checkpoint as well as the
program counter of the instruction before the transaction started.
The abort signal is sent to the L0 cache controller and resets the
failed transactional state. It resets the transactional read and write
sets and invalidates any speculatively written cache lines. It also
exits the transactional state so that the MESI protocol operates as
usual.
Alternatively, if the instructions within a transaction complete without
triggering a HtmFailureFault, the transaction can be committed. The core
is responsible for notifying the cache controller that the transaction
is complete and the cache controller makes all speculative writes
visible to the rest of the system and exits the transactional state.
Notifying the cache controller is done through HtmCmd Requests which are
a subtype of Load Requests.
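The bookkeeping described above can be sketched as a small Python model. This is only an illustration of the read/write-set logic, not the SLICC code in the patch: the class name, method names and dictionaries are hypothetical, and the model assumes the core only commits a transaction that has not failed.

```python
class L0HtmSketch:
    """Toy model of the transactional bookkeeping in the L0 controller."""

    def __init__(self):
        self.transactional = False  # plays the role of htmTransactionalState
        self.failed = False
        self.read_set = set()
        self.write_set = set()
        self.visible = {}           # values visible to the rest of the system
        self.speculative = {}       # buffered speculative writes

    def htm_start(self):
        self.transactional = True

    def load(self, addr):
        """Return (value, failed_marker) for a load from the core."""
        if self.transactional:
            if self.failed:
                return None, True   # benign NOP, response marked as failed
            self.read_set.add(addr)
            if addr in self.speculative:
                return self.speculative[addr], False
        return self.visible.get(addr, 0), False

    def store(self, addr, value):
        """Return the failed marker for a store from the core."""
        if self.transactional:
            if self.failed:
                return True         # benign NOP, response marked as failed
            self.write_set.add(addr)
            self.speculative[addr] = value
            return False
        self.visible[addr] = value
        return False

    def remote_invalidate(self, addr):
        # An invalidation hitting the read or write set fails the transaction.
        if self.transactional and (addr in self.read_set or addr in self.write_set):
            self.failed = True

    def remote_read(self, addr):
        # A remote read of a speculatively written line fails the transaction.
        if self.transactional and addr in self.write_set:
            self.failed = True

    def htm_abort(self):
        # Speculative lines are dropped, never made visible.
        self._leave_transaction()

    def htm_commit(self):
        # Assumes the core only commits a transaction that has not failed.
        self.visible.update(self.speculative)
        self._leave_transaction()

    def _leave_transaction(self):
        self.read_set.clear()
        self.write_set.clear()
        self.speculative.clear()
        self.transactional = False
        self.failed = False


# A remote read conflicts with a speculative write, so later stores are NOPs.
ctrl = L0HtmSketch()
ctrl.htm_start()
ctrl.store(0x40, 7)
ctrl.remote_read(0x40)
marked_failed = ctrl.store(0x80, 9)
ctrl.htm_abort()
```

After the abort in this sketch, nothing written inside the transaction is visible and the controller is back in non-transactional operation.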
KUDOS:
The code is based on a previous pull request by Pradip Vallathol who
developed HTM and TSX support in Gem5 as part of his master’s thesis:
http://reviews.gem5.org/r/2308/index.html
JIRA: https://gem5.atlassian.net/browse/GEM5-587
Change-Id: Icc328df93363486e923b8bd54f4d77741d8f5650
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30319
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This starts the support of Hardware Transactional Memory on the mem side
* The following flags have been added:
HTM_START: The request starts an HTM transaction
HTM_COMMIT: The request commits an HTM transaction
HTM_CANCEL: The request cancels an HTM transaction
HTM_ABORT: The request aborts an HTM transaction
* The following fields have been added:
_instCount: The instruction count at the time this request is created
_htmAbortCause: The cause for HTM transaction abort
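As a rough illustration of how such flags and fields fit together, the sketch below models them in Python. The bit values, the RequestSketch class and the is_htm_cmd helper are assumptions made for this example; they are not the C++ definitions added by the patch.

```python
import enum
from dataclasses import dataclass
from typing import Optional


class HtmFlag(enum.Flag):
    # Bit positions are placeholders, not the values used in the patch.
    HTM_START = enum.auto()
    HTM_COMMIT = enum.auto()
    HTM_CANCEL = enum.auto()
    HTM_ABORT = enum.auto()


# Mask covering every HTM command flag.
HTM_CMD = (HtmFlag.HTM_START | HtmFlag.HTM_COMMIT |
           HtmFlag.HTM_CANCEL | HtmFlag.HTM_ABORT)


@dataclass
class RequestSketch:
    """Hypothetical stand-in for a memory request carrying HTM metadata."""
    flags: HtmFlag = HtmFlag(0)
    inst_count: int = 0                    # instruction count at creation
    htm_abort_cause: Optional[str] = None  # only meaningful for aborts

    def is_htm_cmd(self) -> bool:
        return bool(self.flags & HTM_CMD)


start_req = RequestSketch(flags=HtmFlag.HTM_START, inst_count=1234)
plain_req = RequestSketch()
```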
https://gem5.atlassian.net/browse/GEM5-587
Change-Id: Ic582a6566fdd23f30eb92723e629d0c4d4ca10e5
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30316
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The following APIs are not exported from the _m5 namespace and not
used by any of the debug glue code:
* m5.debug.findFlag
* m5.debug.setDebugFlag
* m5.debug.clearDebugFlag
* m5.debug.dumpDebugFlags
All of them have a clean Python interface where flags are exported
using the m5.debug.flags dictionary. There is also an m5.debug.help
function that lists the available debug flags.
Remove the unused APIs to avoid confusion.
Change-Id: I74738451eb5874f83b135adaccd30a0c6b478996
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34120
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
There is currently no Python API to check if a debug flag is
enabled. Add a new status property that can be read or set to control
the status of a flag. The state of a flag can also be queried by
converting it to a bool.
For example:
m5.debug.flags["XBar"].status = True
if m5.debug.flags["XBar"]:
    print("XBar debugging is on")
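The pattern behind this interface (a read/write property plus a bool conversion) can be tried outside gem5 with a stand-in class. FlagSketch below is hypothetical and only mimics the behaviour described in this message; it is not the m5.debug implementation.

```python
class FlagSketch:
    """Stand-in for a debug flag with a status property."""

    def __init__(self, name):
        self.name = name
        self._enabled = False

    @property
    def status(self):
        return self._enabled

    @status.setter
    def status(self, value):
        self._enabled = bool(value)

    def __bool__(self):
        # Converting the flag to bool reports its current status.
        return self.status


flags = {"XBar": FlagSketch("XBar")}
was_on = bool(flags["XBar"])
flags["XBar"].status = True
is_on = bool(flags["XBar"])
```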
Change-Id: I5a50c39ced182ab44e18c061c463d7d9c41ef186
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34119
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The debug flags API has a couple of quirks that should be cleaned
up. Specifically:
* Only CompoundFlag should expose a list of children.
* The global enable flag is just called "active", which isn't very
descriptive.
* Only SimpleFlag exposed a status member. This should be in the base
class to make the API symmetric.
* Flag::Sync() is an implementation detail and needs to be protected.
Change-Id: I4d7fd32c80891191aa04f0bd0c334c8cf8d372f5
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34118
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Compound flags are currently constructed using a constructor with a
finite set of arguments that default to nullptr that refer to child
flags. C++11 introduces two cleaner ways to achieve the same thing,
variadic templates and initializer_list. Use an initializer list to
pass dependent flags.
Change-Id: Iadcd04986ab20efccfae9b92b26c079b9612262e
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34115
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently there are independent round-robin arbiters at each
input port and output port. Every time a VC is selected for
output allocation, its round-robin pointer is incremented regardless
of whether it is selected by its output port or not. This leads to
unfair arbitration at the input port and is well known [1]. This
patch fixes it to increment only if the output port also
selects it.
[1] D. U. Becker and W. J. Dally, "Allocator implementations
for network-on-chip routers," Proceedings of the Conference
on High Performance Computing Networking, Storage and
Analysis, Portland, OR, 2009, pp. 1-12
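The fix can be illustrated with a minimal two-stage (input-first) allocator in Python. This is a sketch of the idea only: the class, the pointer arrays and the request format are assumptions for the example and do not mirror the Garnet SwitchAllocator code.

```python
def round_robin_pick(candidates, start, size):
    """Return the first index in candidates at or after start, wrapping."""
    for offset in range(size):
        idx = (start + offset) % size
        if idx in candidates:
            return idx
    return None


class AllocatorSketch:
    def __init__(self, num_inports, num_vcs, num_outports):
        self.num_inports = num_inports
        self.num_vcs = num_vcs
        self.num_outports = num_outports
        self.in_ptr = [0] * num_inports    # round-robin pointer per input port
        self.out_ptr = [0] * num_outports  # round-robin pointer per output port

    def allocate(self, requests):
        """requests[inport] maps vc -> outport; returns {outport: (inport, vc)}."""
        # Stage 1: each input port nominates one VC. The input pointer is
        # deliberately NOT advanced here.
        chosen = {}
        for inport in range(self.num_inports):
            vc = round_robin_pick(requests.get(inport, {}),
                                  self.in_ptr[inport], self.num_vcs)
            if vc is not None:
                chosen[inport] = vc

        # Stage 2: each output port grants one of the nominating input ports.
        grants = {}
        for outport in range(self.num_outports):
            contenders = {i for i, vc in chosen.items()
                          if requests[i][vc] == outport}
            winner = round_robin_pick(contenders, self.out_ptr[outport],
                                      self.num_inports)
            if winner is None:
                continue
            vc = chosen[winner]
            grants[outport] = (winner, vc)
            self.out_ptr[outport] = (winner + 1) % self.num_inports
            # Only a VC that actually won its output advances the input pointer.
            self.in_ptr[winner] = (vc + 1) % self.num_vcs
        return grants


# Two input ports compete for output port 0 with their VC 0.
alloc = AllocatorSketch(num_inports=2, num_vcs=2, num_outports=1)
first_grants = alloc.allocate({0: {0: 0}, 1: {0: 0}})
pointers_after_first = list(alloc.in_ptr)
```

In this sketch the losing input port keeps its pointer on the VC that was not served, so that VC is nominated first again in the next cycle.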
Change-Id: I65963fb8082c51c0e3c6e031a8b87b4f5c3626e1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32601
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Currently the Switch Allocator takes up most of the simulation
wall clock time. This function checks all VCs to see if it
should wake up next. The input units which are simulated before
the switch allocator could have scheduled it already. This patch
adds a check for it.
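A minimal sketch of the early-out, assuming a per-tick scheduling set and a one-tick wakeup interval (both hypothetical simplifications of the real event queue):

```python
class SwitchAllocatorSketch:
    def __init__(self, vc_has_flit):
        self.vc_has_flit = vc_has_flit  # one boolean per VC
        self.scheduled_ticks = set()    # ticks with a pending wakeup
        self.vc_scans = 0               # counts how many VCs were inspected

    def schedule(self, tick):
        self.scheduled_ticks.add(tick)

    def check_for_wakeup(self, now):
        next_tick = now + 1
        # Skip the scan if something (e.g. an input unit) already
        # scheduled this allocator for the next tick.
        if next_tick in self.scheduled_ticks:
            return
        for ready in self.vc_has_flit:
            self.vc_scans += 1
            if ready:
                self.schedule(next_tick)
                return


sa = SwitchAllocatorSketch([False] * 7 + [True])
sa.schedule(1)           # an input unit scheduled the allocator earlier
sa.check_for_wakeup(0)   # returns immediately, no VC is scanned
scans_when_scheduled = sa.vc_scans
```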
Change-Id: I8609d4e7f925aa5e97198f6cd07466530f6fcf4c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32600
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
This change allows configuring each router with a certain number
of VCs for each VNET. This is beneficial when dealing with
heterogeneous link widths in a system. Configuring VCs
for each router allows one to ensure equal throughput
within the network while avoiding head-of-line blocking.
The number of VCs of a router can be changed in topology files
using the vcs_per_vnet argument of the router.
Change-Id: Icf4f510248128429a1a11f19f9802ee96f340611
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32599
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This upgrades the garnet model to support HeteroGarnet
1) Static and dynamic multi-freq domains in network
2) Support for CDC
3) Separate links for each message class
4) Separate linkwidth for each message class
5) Support for SerDes
Change-Id: I6d00e3b5cb3745e849d221066cb46b2138c47871
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32597
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
During a squash of the branch predictor history, RAS recovery messes up
the stack because of the "restore" function in the RAS
(src/cpu/pred/ras.cc). The restore function does not update the
"usedEntries" variable, resulting in a restore failure.
To be specific, in order to remove a mispredicted call, it uses pop(),
which updates tos. However, in order to restore a mispredicted ret
instruction, it uses restore(), which does not update tos. This pair of
function calls messes up the RAS, resulting in many misspeculations.
The solution is to update the usedEntries variable as the “push” function
does. This is possible because restoration is done in the reverse order
of push and pop.
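The effect of the fix can be seen in a small Python model of a circular return address stack. The layout, field names and the way tos is handled are assumptions for illustration and may differ from src/cpu/pred/ras.cc; the point is only that restore counts the reinstated entry as used, just as push does.

```python
class ReturnAddrStackSketch:
    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.addr_stack = [0] * num_entries
        self.tos = 0
        self.used_entries = 0

    def push(self, return_addr):
        self.tos = (self.tos + 1) % self.num_entries
        self.addr_stack[self.tos] = return_addr
        if self.used_entries != self.num_entries:
            self.used_entries += 1

    def pop(self):
        if self.used_entries > 0:
            self.used_entries -= 1
        self.tos = (self.tos - 1) % self.num_entries

    def restore(self, top_entry_idx, restored_addr):
        # Undo a speculative pop: put the entry back ...
        self.tos = top_entry_idx
        self.addr_stack[self.tos] = restored_addr
        # ... and count it as used again, mirroring push().
        if self.used_entries != self.num_entries:
            self.used_entries += 1

    def top(self):
        return self.addr_stack[self.tos]

    def empty(self):
        return self.used_entries == 0


ras = ReturnAddrStackSketch(num_entries=4)
ras.push(0x100)                          # call
saved_idx, saved_addr = ras.tos, ras.top()
ras.pop()                                # speculative ret, later squashed
ras.restore(saved_idx, saved_addr)       # squash recovery
```

Without the increment in restore, the stack in this model would report itself as empty after the recovery even though the entry is back in place.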
Jira Issue: https://gem5.atlassian.net/browse/GEM5-732
Change-Id: Ia14e71c26d20b2795fd55a6a0dd3284c03570614
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33214
Reviewed-by: Trivikram Reddy <tvreddy@ucdavis.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Previously `tests/gem5/fs/linux/arm/run.py` contained an ugly,
hard-coded `gem5_root` variable. In this commit we pass `gem5_root`
as an argument from `tests/gem5/fs/linux/arm/test.py`, utilizing
`config.base_dir`.
Change-Id: I2b1e3369b1078cce9375fadb7c39fa4292648658
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33955
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
We were downloading resources to various different locations, for no
real reason. This standardizes the process. From this commit onwards,
all testing resources are downloaded to `tests/gem5/resources` by
default. This may be overridden via the `--bin-path` TestLib argument.
Note: In order to do this I have changed the meaning of the `bin-path`
TestLib argument slightly. Previously the `bin-path` assumed a flat
(non-existent) hierarchy: a simple directory of local resources. This
new bin-path functionality maintains logical sub-directories. This is
technically an API change and will be noted in the release notes.
Change-Id: I4df85c121fa65f787fd71f03d74361afea121380
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33145
Reviewed-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
This is part of a process of getting rid of the `tests/config`
directory, and placing these configs either where they are used,
removing them if unneeded, or moving them to `configs/example`.
These config files, in this patchset, are part of the realview tests
found in `tests/gem5/fs/linux/arm/`. They have been moved to
`tests/gem5/configs`.
Change-Id: I7706b59c58da6413f5f3dd816a1e5cd54a834a58
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33143
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
When the system call happens during the execution of the system call
instruction, it can be ambiguous what state takes precedence, the state
update from the instruction or the system call. These may be tracked
differently and found in an unpredictable order in, for example, the O3
CPU. An instruction can avoid updating any state explicitly, but
implicitly updated state (specifically the PC) will always update,
whether the instruction wants it to or not.
If the system call can be deferred by using a Fault object, then it's no
longer ambiguous. The PC update will be discarded, and the system call
can set the PC however it likes. Because there is no implicit PC update,
the PC needs to be walked forward, either to what it would have been
anyway, or to what the system call set in NPC.
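The PC handling described above can be sketched with a toy thread model. Everything here (ThreadSketch, SyscallFaultSketch, step, the fixed instruction size) is hypothetical and only illustrates the ordering: the instruction returns a fault instead of updating the PC, and the fault handler then walks the PC forward to NPC.

```python
class ThreadSketch:
    def __init__(self, pc, inst_size=4):
        self.pc = pc
        self.npc = pc + inst_size
        self.inst_size = inst_size

    def advance(self):
        # Walk the PC forward to whatever NPC currently holds.
        self.pc = self.npc
        self.npc = self.pc + self.inst_size


class SyscallFaultSketch:
    """Fault object that defers the system call until fault handling."""

    def __init__(self, handler):
        self.handler = handler

    def invoke(self, thread):
        self.handler(thread)  # the system call may overwrite thread.npc
        thread.advance()      # no implicit PC update happened, so do it here


def step(thread, fault):
    if fault is None:
        thread.advance()      # ordinary instruction: implicit PC update
    else:
        fault.invoke(thread)  # implicit update is discarded


def redirecting_syscall(thread):
    thread.npc = 0x2000       # e.g. a system call that sets a new PC


plain = ThreadSketch(0x1000)
step(plain, SyscallFaultSketch(lambda thread: None))

redirected = ThreadSketch(0x1000)
step(redirected, SyscallFaultSketch(redirecting_syscall))
```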
In addition, because of the existing semantics around handling Faults,
the instruction no longer needs to be marked as serializing,
non-speculative, etc.
The "normal", aka architectural, aka FS version of the system call
instructions don't return a Fault artificially.
Change-Id: I72011a16a89332b1dcfb01c79f2f0d75c55ab773
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33281
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>