This makes what are configuration and what are internal SCons variables
explicit and separate, and makes it unnecessary to call out what
variables to export to C++.
These variables will also be plumbed into and out of kconfiglib in later
changes.
Change-Id: Iaf5e098d7404af06285c421dbdf8ef4171b3f001
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56892
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Presumably, these are fixed for whatever protocol that gets selected. We
don't need to accumulate includes, we need to set includes to something
in particular. If there is a common include which always needs to be
used, we can handle that in the SConscript separately from
SLICC_INCLUDES.
Change-Id: I996d08566944e38e388dc287f644c40366ebba0d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56754
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
This enhances MOESI_AMD_Base-dir DmaWrite to enable partial writes. This
is currently done by assuming a full cache line, invalidating caches,
and transitioning back to unblocked state. The enhanced write supports
partial writes (i.e., smaller than cache line size) by first reading
memory, merging the modified data, and then writing back to memory.
Implementation of this mirrors that of DmaRead in terms of state. This
means for each DmaRead state (BDR_PM, BDR_Pm, and BDR_M) there is a
write analogue (BDW_PM, BDW_Pm, and BDR_M) and the BDR_P state is
removed. Furthermore, this enhanced DmaWrite ... actually writes data to
memory instead of relying on DirectoryEntry / backing store for correct
data.
There are two possible state transitions for DmaWrite now. (1) Memory
data arrives before probe response and (2) probe response arrives before
memory data. In case (1), probe data overwrites memory data and merges
the partial write using the TBE write mask then updates write mask to
'filled' state. In case (2), probe data is merged with the partial data
using the TBE write mask then updates write mask to 'filled' state. The
memory data will then be clobbered by copying the TBE data over the
response since the write mask is now full.
Change-Id: I1eebb882b464c4c5ee5fd60932fd38d271ada4d7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57410
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This protocol is using an old style where read/writes to memory were
being done by writing to a DataBlock in a DirectoryMemory entry. This
results in having multiple copies of memory, leads to stale copies in at
least one memory (usually DRAM), and require --access-backing-store in
most cases to work properly. This changeset removes all references to
getDirectoryEntry(...).DataBlk and instead forwards those reads and
writes to DRAM always.
Change-Id: If2e52151789ad82c7b55c8fa2b41c1f4e5b65994
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57409
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This protocol is using an old style where read/writes to memory were
being done by writing to a DataBlock in a DirectoryMemory entry. This
results in having multiple copies of memory, leads to stale copies in at
least one memory (usually DRAM), and require --access-backing-store in
most cases to work properly. This changeset removes all references to
getDirectoryEntry(...).DataBlk and instead forwards those reads and
writes to DRAM always.
This results in new transient states BL_WM, BDW_WM, and B_WM which are
blocked states waiting on memory acks indicating a write request is
complete. The appropriate transitions are updates to move to these new
states and stall states are updated to include them. DMA write ACK is
also moved to when the request is sent to memory, rather than when the
request is received.
Change-Id: Ic5bd6a8a8881d7df782e0f7eed8be9d873610e04
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56446
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The directory has an assert that this is at least one destination for a
probe when sending an invalidation or shared probe to coherence end
points in the protocol (TCC, LLC). This is not necessarily request and
for certain configurations there will be no probes required and none
will be sent. One such configuration is the GPU protocol tester which
would not require a probe to the CPU if it does not exist.
To fix this we first collect the probe destinations. Then we check if
any destinations exist. If so, we send the probe message. Otherwise we
immediately enqueue a probe complete message to the trigger queue. This
reorganization prevents messages with no destinations from being
enqueued, meeting the criteria for the assertion.
Change-Id: If016f457cb8c9e0277a910ac2c3f315c25b50ce8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55543
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
JIRA: https://gem5.atlassian.net/browse/GEM5-1185
Fixed an issue in which a CleanUnique responder would incorrectly
deallocate the cache block when handling an stale CU when the state
is UD_RU or UC_RU (thus incorrectly transitioning to RU).
The fix is to handle stale CUs similarly to stale WBs where we
override the dataValid TBE field to prevent the wrong state
transition.
This patch moves the stale code path to a separate transition
(similarly to stale WBs/Evicts) and moves the dataValid override to
Initiate_Request_Stale so it applies to all stale request types.
Notice now the stale field is also set on stale Comp_UC responses.
Additional minor change: CheckUpgrade_FromRU is the same as
CheckUpgrade_FromStore so it was removed.
Change-Id: I0a2cedcfde1dc30d67aa2c16d71b7470369c2b6e
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56810
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Meatboy 106 <garbage2collector@gmail.com>
Remove the line "For use for simulation and test purposes only" in files
were AMD is the only copyright holder listed in the header. This happens
to be the case for all files where this line exists, removing it
completely from gem5.
Change-Id: I623f266b002f564301b28774f49081099cfc60fd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53943
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently MOESI_AMD_Base used in VIPER has a CPUonly parameter which
indicates that messages should not try to add GPU SLICC controllers as
destinations. This adds the analogue GPUonly parameter which indicates
that requests should not try to add CPU SLICC controllers.
Also adds an assert to ensure the outgoing message has at least one
destination. This assert would indicate a misconfiguration.
Change-Id: Ibb0affd4606084fca021f0e7c117d4ff8c06d429
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51928
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
The CPUonly variable in MOESI_AMD_Base's Directory indicates that probes
should not be sent to any GPU SLICC controllers as they are not part of
CPU. There is one CPUonly check missing which causes problems in
GPU-only Ruby networks as there is no route to any controllers with that
MachineType. Add a condition to check CPUonly and do nothing in that
case.
Change-Id: I41b6c04feec473e34b04402adfb5978e75b847b6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51927
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently, the GPU VIPER TCC protocol handles races between atomics in
the triggerQueue_in. This in_port does not check for resource
availability, which can cause the trigger queue to execute multiple
times. Although this is the expected behavior, the code for handling
atomic races decrements the atomicDoneCnt flag in the trigger queue,
which is not safe since resource contention may cause it to execute
multiple times.
To resolve this issue, this commit moves the decrementing of this
counter to a new action that is called in an event that happens only
when the race between atomics is detected.
Change-Id: I552fd4f34fdd9ebeec99fb7aeb4eeb7b150f577f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51368
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
In the GPU VIPER TCC, programs with mixes of atomics and data
accesses to the same address, in the same kernel, can experience
deadlock when large applications (e.g., Pannotia's graph analytics
algorithms) are running on very small GPUs (e.g., the default 4 CU GPU
configuration). In this situation, deadlocks occur due to resource
stalls interacting with the behavior of the current implementation for
handling races between atomic accesses. The specific order of events
causing this deadlock are:
1. TCC is waiting on an atomic to return from directory
2. In the meantime it receives another atomic to the same address -- when
this happens, the TCC increments number of atomics to this address
(numAtomics = 2) that are pending in TBE, and does a write through of the
atomic to the directory.
3. When the first atomic returns from the Directory, it decrements the
numAtomics counter. numAtomics was at 2 though, because of step #2. So
it doesn't deallocate the TBE entry and calls Event:AtomicNotDone.
4. Another request (a LD) to the same address comes along for the same
address. The LD does z_stall since the second atomic is pending –- so the
LD retries every cycle until the deadlock counter times out (or until the
second atomic comes back).
5. The second atomic returns to the TCC. However, because there are so
many LD's pending in the cache, all doing z_stall's and retrying every cycle,
there are a lot of resource stalls. So, when the second atomic returns, it is
forced to retry its operation multiple times -- and each time it decrements
the atomicDoneCnt flag (which was added to catch a race between atomics
arriving and leaving the TCC in 7246f70bfb) repeatedly. As a result
atomicDoneCnt becomes negative.
6. Since this atomicDoneCnt flag is used to determine when Event:AtomicDone
happens, and since the resource stalls caused the atomicDoneCnt flag to become
negative, we never complete the atomic. Which means the pending LD can never
access the line, because it's stuck waiting for the atomic to complete.
7. Eventually the deadlock threshold is reached.
To fix this issue, this commit changes the VIPER TCC protocol from using
z_stall to using the stall_and_wait buffer method that the
Directory-level of the SLICC already uses. This change effectively
prevents resource stalls from dominating the TCC level, by putting
pending requests for a given address in a per-address stall buffer.
These requests are then woken up when the pending request returns.
As part of this change, this change also makes two small changes to the
Directory-level protocol (MOESI_AMD_BASE-dir):
1. Updated the names of the wakeup actions to match the TCC wakeup actions,
to avoid confusion.
2. Changed transition(B, UnblockWriteThrough, U) to check all stall buffers,
as some requests were being placed later in the stall buffer than was
being checked. This mirrors the changes in 187c44fe44 to other Directory
transitions to resolve races between GPU and DMA requests, but for
transitions prior workloads did not stress.
Change-Id: I60ac9830a87c125e9ac49515a7fc7731a65723c2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51367
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The GPU VIPER TCC protocol accidentally used "TiggerMsg" instead
of "TriggerMsg" for the triggerQueue_in port. This was a benign
bug beacuse the msg type is not used in the in_port implementation
but still makes the SLICC harder to understand, so fixing it is
worthwhile.
Change-Id: I88cbc72bac93bcc58a66f057a32f7bddf821cac9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/44905
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Use SConsopts files local to individual domains to pull
non-foundational build code out of SConstruct. This greatly simplifies
SConstruct, and also makes it easier to find build configuration having
to do with particular pieces of gem5.
This change also converts some python level variables, all_protocols,
protocol_dirs, and slicc_includes, into the environment where the timing
of their initialization is more flexible.
Change-Id: Ie61ceb75ae9e5557cc400603c972a9582e99c1ea
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40872
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Gabe Black <gabe.black@gmail.com>
This patch add a new Ruby cache coherence protocol based on Arm' AMBA5
CHI specification. The CHI protocol defines and implements two state
machine types:
- Cache_Controller: generic cache controller that can be configured as:
- Top-level L1 I/D cache
- A intermediate level (L2, L3, ...) private or shared cache
- A CHI home node (i.e. the point of coherence of the system and
has the global directory)
- A DMA requester
- Memory_Controller: implements a CHI slave node and interfaces with
gem5 memory controller. This controller has the functionality of a
Directory_Controller on the other Ruby protocols, except it doesn't
have a directory.
The Cache_Controller has multiple cache allocation/deallocation
parameters to control the clusivity with respect to upstream caches.
Allocation can be completely disabled to use Cache_Controller as a
DMA requester or as a home node without a shared LLC.
The standard configuration file configs/ruby/CHI.py provides a
'create_system' compatible with configs/example/fs.py and
configs/example/se.py and creates a system with private L1/L2 caches
per core and a shared LLC at the home nodes. Different cache topologies
can be defined by modifying 'create_system' or by creating custom
scripts using the structures defined in configs/ruby/CHI.py.
This patch also includes the 'CustomMesh' topology script to be used
with CHI. CustomMesh generates a 2D mesh topology with the placement
of components manually defined in a separate configuration file using
the --noc-config parameter.
The example in configs/example/noc_config/2x4.yaml creates a simple 2x4
mesh. For example, to run a SE mode simulation, with 4 cores,
4 mem ctnrls, and 4 home nodes (L3 caches):
build/ARM/gem5.opt configs/example/se.py \
--cmd 'tests/test-progs/hello/bin/arm/linux/hello' \
--ruby --num-cpus=4 --num-dirs=4 --num-l3caches=4 \
--topology=CustomMesh --noc-config=configs/example/noc_config/2x4.yaml
If one doesn't care about the component placement on the interconnect,
the 'Crossbar' and 'Pt2Pt' may be used and they do not require the
--noc-config option.
Additional authors:
Joshua Randall <joshua.randall@arm.com>
Pedro Benedicte <pedro.benedicteillescas@arm.com>
Tuan Ta <tuan.ta2@arm.com>
JIRA: https://gem5.atlassian.net/browse/GEM5-908
Change-Id: I856524b0afd30842194190f5bd69e7e6ded906b0
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42563
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Splitting hw_prefetches into prefetch_hits and prefetch_misses so both
events can be tracked separately. Also added appropriate functions to
increment stats. Renamed m_prefetches for consistency.
sw_prefetches is not used and has been removed. The sequencer converts
SW prefetch requests into a RubyRequestType_LD/RubyRequestType_ST
which are handled as demand requests by the all current protocols.
Change-Id: Iafa6b31c84843ddd1fad98fa7e5afed02b8c4b4d
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/41816
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
A single functionalRead may not be able to get the whole latest
copy of the block in protocols that have features such as:
- a cache line can be partially present and dirty in a controller
- a cache line can be transferred over the network using multiple
protocol-level messages
To support these cases, this patch adds an alternative function:
bool functionalRead(PacketPtr, WriteMask&)
Protocols that implement this function can partially update
the packet and use the WriteMask to mark updated bytes.
The top-level RubySystem:functionalRead then issues functionalRead
to controllers until the whole block is read.
This patch implements functionalRead(PacketPtr, WriteMask&) for all the
common messages and SimpleNetwork. A protocol-specific implementation
will be provided in a future patch.
The new interface is compiled only if required by the protocol (see
src/mem/ruby/system/SConscript). Otherwise the original interface is
used thus maintaining compatibility with previous protocols.
Change-Id: I4600d5f1d7cc170bd7b09ccd09bfd3bb6605f86b
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31416
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
There is a race condition in VIPER where an atomic issued to the same
address can occur resulting in multiple trigger messages signalling the
compleition of the atomic operation. The first message was deallocating
the TBE causing the second message to dereference a nullptr when looking
up the TBE.
A counter is added to track the number of in flight AtomicDone trigger
messages. The AtomicDone is not called until the last in flight message
arrives at the trigger queue. The remaining messages call AtomicNotDone
which simply pops the message from the queue and keeps the TBE
allocated.
Change-Id: Ie1de0436861a7c393ad6d2fb2faceb83c18d4cc3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39175
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Returns the offset of an address with respect to a base address.
Looks unnecessary, but SLICC doesn't support casting and the '-'
operator for Addr types, so the alternative to this would be to add
more some helpers like 'addrToUint64' and 'uint64ToInt'.
Change-Id: I90480cec4c8b2e6bb9706f8b94ed33abe3c93e78
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31270
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>