This patch augments the MESI_Three_Level Ruby protocol with hardware
transactional memory support.
The HTM implementation relies on buffering of speculative memory updates.
The core notifies the L0 cache controller that a new transaction has
started and the controller in turn places itself in transactional state
(htmTransactionalState := true).
When operating in transactional state, the usual MESI protocol changes
slightly. Lines loaded or stored are marked as part of a transaction's
read and write set respectively. If there is an invalidation request to
cache line in the read/write set, the transaction is marked as failed.
Similarly, if there is a read request by another core to a speculatively
written cache line, i.e. in the write set, the transaction is marked as
failed. If failed, all subsequent loads and stores from the core are
made benign, i.e. made into NOPS at the cache controller, and responses
are marked to indicate that the transactional state has failed. When the
core receives these marked responses, it generates a HtmFailureFault
with the reason for the transaction failure. Servicing this fault does
two things--
(a) Restores the architectural checkpoint
(b) Sends an HTM abort signal to the cache controller
The restoration includes all registers in the checkpoint as well as the
program counter of the instruction before the transaction started.
The abort signal is sent to the L0 cache controller and resets the
failed transactional state. It resets the transactional read and write
sets and invalidates any speculatively written cache lines. It also
exits the transactional state so that the MESI protocol operates as
usual.
Alternatively, if the instructions within a transaction complete without
triggering a HtmFailureFault, the transaction can be committed. The core
is responsible for notifying the cache controller that the transaction
is complete and the cache controller makes all speculative writes
visible to the rest of the system and exits the transactional state.
Notifting the cache controller is done through HtmCmd Requests which are
a subtype of Load Requests.
KUDOS:
The code is based on a previous pull request by Pradip Vallathol who
developed HTM and TSX support in Gem5 as part of his master’s thesis:
http://reviews.gem5.org/r/2308/index.html
JIRA: https://gem5.atlassian.net/browse/GEM5-587
Change-Id: Icc328df93363486e923b8bd54f4d77741d8f5650
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30319
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This upgrades the garnet model to support HeteroGarnet
1) Static and dynamic multi-freq domains in network
2) Support for CDC
3) Separate links for each message class
4) Separate linkwidth for each message class
5) Support for SerDes
Change-Id: I6d00e3b5cb3745e849d221066cb46b2138c47871
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32597
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
There was a missing option (--buffers-size) used to set the mandatory
queue size for the scalar controllers. This patch renames the option to
be more clear, and adds it to the argument parser.
Default of 128 taken from the implementation on the GCN staging branch
Change-Id: I58b6b57be07498cdf6e39c0bb85982674ec4caa6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32676
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This change sets the properties in hsaTopology to the proper values
specified by the user through command-line arguments. This ensures
that if the properties file is read by a program, it will return
the correct values for the simulated hardware.
This change also adds in a command-line argument for the lds size, as
it was the only other property used in hsaTopology that didn't have
a command-line argument. The default value (65536) is taken from
src/gpu-compute/LdsState.py
Change-Id: I17bb812491708f4221c39b738c906f1ad944614d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31995
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This was originally from the GCN staging branch, which only had
GPU_VIPER.py, but the other GPU_VIPER configs had DirMem as well, so I
applied this change to all of them.
The patch replaces the Directory in DirCntrl from DirMem to
RubyDirectoryMemory. This fixes errors that DirMem caused relating to
setting class variables. It also generates and sets addr_ranges in
DirCntrl as RubyDirectoryMemory uses the parent object's addr_ranges
in its code
The style checker complained about a line length in GPU_VIPER_Region,
so the patch also fixes that
Change-Id: Icec96777a51d8a826b576fc752fae0f7f15427bc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32674
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This changeset adds the necessary changes for running
GCN3 ISA with VIPER in apu_se.py.
Changes to the VIPER protocol configs are made to add support
for DMA and scalar caches.
hsaTopology is added to help the pseudo FS create the files
needed by ROCm to understand the device on which the SW is
being run.
Change-Id: I0f47a6a36bb241a26972c0faafafcf332a7d7d1f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30274
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
fs.py warns when an Arm platform is being created without a DTB file,
if the platform does not support the automatic creation of a DTB.
Updated the list of supported platforms with recent additions in order
to remove incorrect and potentially confusing warnings.
Change-Id: I549124a1afbc36e313f614dccab17973582bc3f7
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30575
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit makes it possible to make invocations such as:
gem5.opt se.py --stats-root 'system.cpu[:].dtb' --stats-root 'system.membus'
When --stats-root is given, only stats that are under any of the root
SimObjects get dumped. E.g. the above invocation would dump stats such as:
system.cpu0.dtb.walker.pwrStateResidencyTicks::UNDEFINED
system.cpu1.dtb.walker.pwrStateResidencyTicks::UNDEFINED
system.membus.pwrStateResidencyTicks::UNDEFINED
system.membus.trans_dist::ReadReq
but not for example `system.clk_domain.clock`.
If the --stats-root is given, only new stats as defined at:
Idc8ff448b9f70a796427b4a5231e7371485130b4 get dumped, and old ones are
ignored. The commits following that one have done some initial conversion
work, but many stats are still in the old format.
Change-Id: Iadaef26edf9a678b39f774515600884fbaeec497
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28628
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This changeset allows setting a variable for interleaving.
That value is used together with the number of directories to
calculate numa_high_bit, which is in turn used to set up
cache, directory, and memory controller interleaving.
A similar approach is used to set xor_low_bit, and calculate
xor_high_bit for address hashing.
Change-Id: Ia342c77c59ca2e3438db218b5c399c3373618320
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28134
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Most of these "rcS" scripts are only useful for specific disk images
that have long been lost to the gem5 community. This commit deletes all
of these scripts. It keeps the generally useful hack_back_cktp script
and the bbench scripts that work with the android images that are still
available.
In the future, these remaning scripts should be moved to the gem5
resources repository.
Issue-on: https://gem5.atlassian.net/browse/GEM5-350
Change-Id: Iba99e70fde7f656e968b4ecd95663275bd38fd6e
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28507
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch properly sets the access permissions in all controllers.
'Busy' was used for all transient states, which is incorrect in lots of
cases when we still hold a valid copy of the line and are able to handle
a functional read.
In the L2 controller these states were split to differentiate the access
permissions:
IFGXX -> IFGXX, IFGXXD
IGMO -> IGMO, IGMOU
IGMIOF -> IGMIOF, IGMIOFD
Same for the dir. controller:
IS -> IS, IS_M
MM -> MM, MM_M
The dir. controllers also has the states WBI/WBS for lines that have
been queued for a writeback. In these states we hold the data in the TBE
for replying to functional reads until the memory acks the write and we
move to I or S.
Other minor changes includes updated debug messages and asserts.
Change-Id: Ie4f6eac3b4d2641ec91ac6b168a0a017f61c0d6f
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/21927
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Pouya Fotouhi <pfotouhi@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
This commit does not make any functional changes but just rearranges
the existing code with regard to the power states. Previously, all
code regarding power states was in the ClockedObjects. However, it
seems more logical and cleaner to move this code into a separate
class, called PowerState. The PowerState is a now SimObject. Every
ClockedObject has a PowerState but this patch also allows for objects
with PowerState which are not ClockedObjects.
Change-Id: Id2db86dc14f140dc9d0912a8a7de237b9df9120d
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28049
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
fs_power.py is an example script that demonstrates how power models
can be used with gem5. Previously, the formulas used to calculate the
dynamic and static power of the cores and the L2 cache were using
stats in equations as determined by their path relative to the
SimObject where the power model is attached to or full paths. This CL
changes these formulas to refer to the stats only by their full paths.
Change-Id: I91ea16c88c6a884fce90fd4cd2dfabcba4a1326c
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27893
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Before this change, running:
./build/NULL/gem5.opt configs/example/ruby_mem_test.py -m 20000000 \
--functional 10
would only print warning for memory errors such as:
warn: Read access failed at 0x107a00
and there was no way to make the simulation fail.
This commit makes those warnings into errors such as:
panic: Read access failed at 0x107a00
unless --suppress-func-errors is given.
This will be used to automate MemTest testing in later commits.
Change-Id: I1840c1ed1853f1a71ec73bd50cadaac095794f91
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/26804
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This is specialized per arch, and the Workload class is the only thing
actually using it. It doesn't make any sense to dispatch those calls
over to the System object, especially since that was, in most cases,
the only reason an ISA specific system class even still existed.
After this change, only ARM still has an architecture specific System
class.
Change-Id: I81b6c4db14b612bff8840157cfc56393370095e2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/24287
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This generalized Workload SimObject is not geared towards FS or SE
simulations, although currently it's only used in FS. This gets rid
of the ARM specific highestELIs64 property (from the workload, not the
system) and replaces it with a generic getArch.
The old globally accessible kernel symtab has been replaced with a
symtab accessor which takes a ThreadContext *. The parameter isn't used
for anything for now, but in cases where there might be multiple
symbol tables to choose from (kernel vs. current user space?) the
method will now be able to distinguish which to use. This also makes
it possible for the workload to manage its symbol table with whatever
policy makes sense for it.
That method returns a const SymbolTable * since most of the time the
symbol table doesn't need to be modified. In the one case where an
external entity needs to modify the table, two pseudo instructions,
the table to modify isn't necessarily the one that's currently active.
For instance, the pseudo instruction will likely execute in user space,
but might be intended to add a symbol to the kernel in case something
like a module was loaded.
To support that usage, the workload has a generic "insertSymbol" method
which will insert the symbol in the table that "makes sense". There is
a lot of ambiguity what that means, but it's no less ambiguous than
today where we're only saved by the fact that there is generally only
one active symbol table to worry about.
This change also introduces a KernelWorkload SimObject class which
inherits from Workload and adds in kernel related members for cases
where the kernel is specified in the config and loaded by gem5 itself.
That's the common case, but the base Workload class would be used
directly when, for instance, doing a baremetal simulation or if the
kernel is loaded by software within the simulation as is the case for
SPARC FS.
Because a given architecture specific workload class needs to inherit
from either Workload or KernelWorkload, this change removes the
ability to boot ARM without a kernel. This ability should be restored
in the future.
To make having or not having a kernel more flexible, the kernel
specific members of the KernelWorkload should be factored out into
their own object which can then be attached to a workload through a
(potentially unused) property rather than inheritance.
Change-Id: Idf72615260266d7b4478d20d4035ed5a1e7aa241
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/24283
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The main applications are to run baremetal programs and initramfs Linux
kernel.
Before this patch, disks() calls in makeArmSystem would throw:
IOError: Can't find file 'linux-aarch32-ael.img' on M5_PATH.
In order to achieve this, this commit also removes the default hardcoded
disk image basenames.
For example, before this commit, running without a --disk-image in X86
would automatically search for an image with basename x86root.img in
M5_PATH, which means we would either have to ignore any disk image error,
or else running without disk images would fail.
After this commit, you would have to pass --disk-image x86root.img to
achieve the old behaviour.
Change-Id: I0ae8c4b3b93d0074fd4fca0d5ed52181c50b6c04
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27867
Tested-by: Gem5 Cloud Project GCB service account <345032938727@cloudbuild.gserviceaccount.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
This simplifies the SPARC FS workload significantly, and removes
assumptions about what ROMs exist, where they go, etc. It removes
other components from the loop which don't have anything to contribute
as far as setting up the ROMs.
One side effect of this is that there isn't specialized support for
adding PC based events which would fire in the ROMs, but that was never
done and the files that were being used were flat binary blobs with no
symbols in the first place.
This also necessitates building a unified image which goes into the single
8MB ROM that is located at address 0xfff0000000. That is simply done
with the following commands:
dd if=/dev/zero of=t1000_rom.bin bs=1024 count=8192
dd if=reset_new.bin of=t1000_rom.bin
dd if=q_new.bin of=t1000_rom.bin bs=1024 seek=64
dd if=openboot_new.bin of=t1000_rom.bin bs=1024 seek=512
This results in an 8MB blob which can be loaded verbatim into the ROM.
Alternatively, and with some extra effort, an ELF file could be
constructed which had each of these components as segments, offset to the
right location in the ELF header. That would be slightly more work to set up,
but wouldn't waste space on regions of the image that are all zeroes.
Change-Id: Id4e08f4e047e7bd36a416c197a36be841eba4a15
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27268
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: Gem5 Cloud Project GCB service account <345032938727@cloudbuild.gserviceaccount.com>
This change includes:
1) Verify available command bandwidth
2) Add support for multi-cycle commands
3) Add new timing parameters
4) Add ability to interleave bursts
5) Add LPDDR5 configurations
The DRAM controller historically does not verify contention on the
command bus and if there is adaquate command bandwidth to issue a
new command. As memory technologies evolve, multiple cycles are becoming
a requirement for some commands. Depending on the burst length, this
can stress the command bandwidth. A check was added to verify command
issue does not exceed a maximum value within a defined window. The
default window is a burst, with the maximum value defined based on the
burst length and media clocking characteristics. When the command bandwidth
is exceeded, commands will be shifted to subsequent burst windows.
Added support for multi-cycle commands, specifically Activate, which
requires a larger address width as capacities grow. Additionally,
added support for multi-cycle Read / Write bursts for low power
DRAM cases in which additional CLK synchronization may be required
to run at higher speeds.
To support emerging memories, added the following new timing parameters.
1) tPPD -- Precharge-to-Precharge delay
2) tAAD -- Max delay between Activate-1 and Activate-2 commands
I/O data rates are continuing to increase for DRAM but the core frequency
is still fairly stagnant for many technologies. As we increase the burst
length, either the core prefetch needs to increase (for a seamless burst)
or the burst will be transferred with gaps on the data bus. To support
the latter case, added the ability to interleave 2 bursts across bank
groups.
Using the changes above, added an initial set of LPDDR5 configurations.
Change-Id: I1b14fed221350e6e403f7cbf089fe6c7f033c181
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/26236
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: Gem5 Cloud Project GCB service account <345032938727@cloudbuild.gserviceaccount.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Calls to queueMemoryRead and queueMemoryWrite do not consider the size
of the queue between ruby directories and DRAMCtrl which causes infinite
buffering in the queued port between the two. This adds a MessageBuffer
in between which uses enqueues in SLICC and is therefore size checked
before any SLICC transaction pushing to the buffer can occur, removing
the infinite buffering between the two.
Change-Id: Iedb9070844e4f6c8532a9c914d126105ec98d0bc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27427
Tested-by: Gem5 Cloud Project GCB service account <345032938727@cloudbuild.gserviceaccount.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
I switch between waiting and non-waiting scenario many times per day.
The BaseCPU.wait_for_remote_gdb attribute, introduced in c2baaab0ed,
makes it much less painful by saving many recompiles.
The present commit tries to go a bit further: the se.py script is
under version control, and changing it interferes with smooth git
workflow.
Change-Id: Ie65ffc44b11d78d5e7878f81f2fcdafa143c20a8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27287
Tested-by: Gem5 Cloud Project GCB service account <345032938727@cloudbuild.gserviceaccount.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>