Use SConsopts files local to individual domains to pull
non-foundational build code out of SConstruct. This greatly simplifies
SConstruct, and also makes it easier to find build configuration having
to do with particular pieces of gem5.
This change also converts some python level variables, all_protocols,
protocol_dirs, and slicc_includes, into the environment where the timing
of their initialization is more flexible.
Change-Id: Ie61ceb75ae9e5557cc400603c972a9582e99c1ea
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40872
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Gabe Black <gabe.black@gmail.com>
This patch add a new Ruby cache coherence protocol based on Arm' AMBA5
CHI specification. The CHI protocol defines and implements two state
machine types:
- Cache_Controller: generic cache controller that can be configured as:
- Top-level L1 I/D cache
- A intermediate level (L2, L3, ...) private or shared cache
- A CHI home node (i.e. the point of coherence of the system and
has the global directory)
- A DMA requester
- Memory_Controller: implements a CHI slave node and interfaces with
gem5 memory controller. This controller has the functionality of a
Directory_Controller on the other Ruby protocols, except it doesn't
have a directory.
The Cache_Controller has multiple cache allocation/deallocation
parameters to control the clusivity with respect to upstream caches.
Allocation can be completely disabled to use Cache_Controller as a
DMA requester or as a home node without a shared LLC.
The standard configuration file configs/ruby/CHI.py provides a
'create_system' compatible with configs/example/fs.py and
configs/example/se.py and creates a system with private L1/L2 caches
per core and a shared LLC at the home nodes. Different cache topologies
can be defined by modifying 'create_system' or by creating custom
scripts using the structures defined in configs/ruby/CHI.py.
This patch also includes the 'CustomMesh' topology script to be used
with CHI. CustomMesh generates a 2D mesh topology with the placement
of components manually defined in a separate configuration file using
the --noc-config parameter.
The example in configs/example/noc_config/2x4.yaml creates a simple 2x4
mesh. For example, to run a SE mode simulation, with 4 cores,
4 mem ctnrls, and 4 home nodes (L3 caches):
build/ARM/gem5.opt configs/example/se.py \
--cmd 'tests/test-progs/hello/bin/arm/linux/hello' \
--ruby --num-cpus=4 --num-dirs=4 --num-l3caches=4 \
--topology=CustomMesh --noc-config=configs/example/noc_config/2x4.yaml
If one doesn't care about the component placement on the interconnect,
the 'Crossbar' and 'Pt2Pt' may be used and they do not require the
--noc-config option.
Additional authors:
Joshua Randall <joshua.randall@arm.com>
Pedro Benedicte <pedro.benedicteillescas@arm.com>
Tuan Ta <tuan.ta2@arm.com>
JIRA: https://gem5.atlassian.net/browse/GEM5-908
Change-Id: I856524b0afd30842194190f5bd69e7e6ded906b0
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42563
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Splitting hw_prefetches into prefetch_hits and prefetch_misses so both
events can be tracked separately. Also added appropriate functions to
increment stats. Renamed m_prefetches for consistency.
sw_prefetches is not used and has been removed. The sequencer converts
SW prefetch requests into a RubyRequestType_LD/RubyRequestType_ST
which are handled as demand requests by the all current protocols.
Change-Id: Iafa6b31c84843ddd1fad98fa7e5afed02b8c4b4d
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/41816
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
A single functionalRead may not be able to get the whole latest
copy of the block in protocols that have features such as:
- a cache line can be partially present and dirty in a controller
- a cache line can be transferred over the network using multiple
protocol-level messages
To support these cases, this patch adds an alternative function:
bool functionalRead(PacketPtr, WriteMask&)
Protocols that implement this function can partially update
the packet and use the WriteMask to mark updated bytes.
The top-level RubySystem:functionalRead then issues functionalRead
to controllers until the whole block is read.
This patch implements functionalRead(PacketPtr, WriteMask&) for all the
common messages and SimpleNetwork. A protocol-specific implementation
will be provided in a future patch.
The new interface is compiled only if required by the protocol (see
src/mem/ruby/system/SConscript). Otherwise the original interface is
used thus maintaining compatibility with previous protocols.
Change-Id: I4600d5f1d7cc170bd7b09ccd09bfd3bb6605f86b
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31416
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
There is a race condition in VIPER where an atomic issued to the same
address can occur resulting in multiple trigger messages signalling the
compleition of the atomic operation. The first message was deallocating
the TBE causing the second message to dereference a nullptr when looking
up the TBE.
A counter is added to track the number of in flight AtomicDone trigger
messages. The AtomicDone is not called until the last in flight message
arrives at the trigger queue. The remaining messages call AtomicNotDone
which simply pops the message from the queue and keeps the TBE
allocated.
Change-Id: Ie1de0436861a7c393ad6d2fb2faceb83c18d4cc3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39175
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Returns the offset of an address with respect to a base address.
Looks unnecessary, but SLICC doesn't support casting and the '-'
operator for Addr types, so the alternative to this would be to add
more some helpers like 'addrToUint64' and 'uint64ToInt'.
Change-Id: I90480cec4c8b2e6bb9706f8b94ed33abe3c93e78
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31270
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch augments the MESI_Three_Level Ruby protocol with hardware
transactional memory support.
The HTM implementation relies on buffering of speculative memory updates.
The core notifies the L0 cache controller that a new transaction has
started and the controller in turn places itself in transactional state
(htmTransactionalState := true).
When operating in transactional state, the usual MESI protocol changes
slightly. Lines loaded or stored are marked as part of a transaction's
read and write set respectively. If there is an invalidation request to
cache line in the read/write set, the transaction is marked as failed.
Similarly, if there is a read request by another core to a speculatively
written cache line, i.e. in the write set, the transaction is marked as
failed. If failed, all subsequent loads and stores from the core are
made benign, i.e. made into NOPS at the cache controller, and responses
are marked to indicate that the transactional state has failed. When the
core receives these marked responses, it generates a HtmFailureFault
with the reason for the transaction failure. Servicing this fault does
two things--
(a) Restores the architectural checkpoint
(b) Sends an HTM abort signal to the cache controller
The restoration includes all registers in the checkpoint as well as the
program counter of the instruction before the transaction started.
The abort signal is sent to the L0 cache controller and resets the
failed transactional state. It resets the transactional read and write
sets and invalidates any speculatively written cache lines. It also
exits the transactional state so that the MESI protocol operates as
usual.
Alternatively, if the instructions within a transaction complete without
triggering a HtmFailureFault, the transaction can be committed. The core
is responsible for notifying the cache controller that the transaction
is complete and the cache controller makes all speculative writes
visible to the rest of the system and exits the transactional state.
Notifting the cache controller is done through HtmCmd Requests which are
a subtype of Load Requests.
KUDOS:
The code is based on a previous pull request by Pradip Vallathol who
developed HTM and TSX support in Gem5 as part of his master’s thesis:
http://reviews.gem5.org/r/2308/index.html
JIRA: https://gem5.atlassian.net/browse/GEM5-587
Change-Id: Icc328df93363486e923b8bd54f4d77741d8f5650
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30319
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
There are race conditions while running several benchmarks, where
the DMA engine and the CorePair simultaneously send requests for the
same block. This patch fixes two scenarios
(a) If the request from the DMA engine arrives before the one from the
CorePair, the directory controller records it as a pending request.
However, once the DMA request is serviced, the directory doesn't check
for pending requests. The CorePair, consequently, never sees a response
to its request and this results in a Deadlock.
Added call to wakeUpDependents in the transition from BDR_Pm to U
Added call to wakeUpDependents in the transition from BDW_P to U
(b) If the request from the CorePair is being serviced by the directory
and the DMA requests for the same block, this causes an invalid
transition because the current coherence doesn't take care of this
scenario.
Added transition state where the requests from DMA are added to the
stall buffer.
Updated B to U CoreUnblock transition to check all buffers, as the DMA
requests were being placed later in the stall buffer than was being checked
Change-Id: I5a76efef97723bc53cf239ea7e112f84fc874ef8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31996
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This changeset adds the necessary changes for running
GCN3 ISA with VIPER in apu_se.py.
Changes to the VIPER protocol configs are made to add support
for DMA and scalar caches.
hsaTopology is added to help the pseudo FS create the files
needed by ROCm to understand the device on which the SW is
being run.
Change-Id: I0f47a6a36bb241a26972c0faafafcf332a7d7d1f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30274
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch enables cache controllers to make response
messages in advance, store them in a per-address saved
map in an output message buffer and enqueue them altogether
in the future. This patch introduces new slicc statement
called defer_enqueueing. This patch would help simplify
the logic of state machines that deal with coalesing
multiple requests from different requestors.
Change-Id: I566d4004498b367764238bb251260483c5a1a5e5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28132
Reviewed-by: Tuan Ta <qtt2@cornell.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The recent commit dd6cd33 modified the behaviour of the the Ruby
sequencer to handle load linked requests as loads rather than
stores. This caused the regression test
realview-simple-timing-dual-ruby-ARM-x86_64-opt
to become stuck when booting Linux. This patch fixes the issue by
adding a missing forward_eviction_to_cpu action to the state
transition(OM, Fwd_GETX, IM).
Change-Id: I8f253c5709488b07ddc5143a15eda406e31f3cc6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28787
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Fixes a few resource allocation issues in the directory controller:
- Added TBE resource checks on allocation.
- Now also allocating a TBE when issuing read requests to the controller
to allow for a better response to backpressure. Without the TBE as a
limiting factor, the directory can have an unbounded amount of
outstanding memory requests.
- Also allocating a TBE for forwarded requests.
Change-Id: I17016668bd64a50a4354baad5d181e6d3802ac46
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/21928
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Pouya Fotouhi <pfotouhi@ucdavis.edu>
This patch properly sets the access permissions in all controllers.
'Busy' was used for all transient states, which is incorrect in lots of
cases when we still hold a valid copy of the line and are able to handle
a functional read.
In the L2 controller these states were split to differentiate the access
permissions:
IFGXX -> IFGXX, IFGXXD
IGMO -> IGMO, IGMOU
IGMIOF -> IGMIOF, IGMIOFD
Same for the dir. controller:
IS -> IS, IS_M
MM -> MM, MM_M
The dir. controllers also has the states WBI/WBS for lines that have
been queued for a writeback. In these states we hold the data in the TBE
for replying to functional reads until the memory acks the write and we
move to I or S.
Other minor changes includes updated debug messages and asserts.
Change-Id: Ie4f6eac3b4d2641ec91ac6b168a0a017f61c0d6f
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/21927
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Pouya Fotouhi <pfotouhi@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch fixes some issues in the directory controller regarding DMA
handling:
1) Junk data messages were being sent immediately in response to DMA reads
for a line in the S state (one or more sharers, clean). Now, data is
fetched from memory directly and forwarded to the device. Some existing
transitions for handling GETS requests are reused, since it's essentially
the same behavior (except we don't update the list of sharers for DMAs)
2) DMA writes for lines in the I or S states would always overwrite the
whole line. We now check if it's only a partial line write, in which case
we fetch the line from memory, update it, and writeback.
3) Fixed incorrect DMA msg size
Some existing functions were renamed for clarity.
Change-Id: I759344ea4136cd11c3a52f9eaab2e8ce678edd04
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/21926
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Pouya Fotouhi <pfotouhi@ucdavis.edu>