Introduce far atomic operations in CHI protocol.
Three configuration parameters have been used to tune this behavior:
policy_type: sets the atomic policy to one of the described in our paper
atomic_op_latency: simulates the AMO ALU operation latency
comp_anr: configures the Atomic No return transaction to split
CompDBIDResp into two different messages DBIDResp and Comp
Change-Id: I087afad9ad9fcb9df42d72893c9e32ad5a5eb478
This will hold the CHI Data Buffer Identifier (DBID) field.
The DBID allows a Completer of a transaction to provide its own
identifier for a transaction ID.
This new ID will be used as a TxnId field by a following
WriteData/CompData/CompAck response.
For now we only set it to the original txnId (identity mapping)
Change-Id: If30c5e1cafbe5a30073c7cd01d60bf41eb586cee
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The TxnId field of a CHI request has so far been unused (other than for
DVM transactions). With this patch we always initialize the field when
we extract a ruby request from the sequencer port.
According to specs (IHI0050F):
A 12-bit field is defined for the TxnID with the number of outstanding
transactions being limited to 1024. A Requester is permitted to reuse a
TxnID value after it has received either:
* All responses associated with a previous transaction that have used
the same value.
* A RetryAck response for a previous transaction that used the same value
Change-Id: Ie48f0fee99966339799ac50932d36b2a927b1c7d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Based on the CHIRequestType, it automatically tells if the
request has been originated from the sequencer
(CPU load/fetch/store)
Change-Id: I50fd116c8b1a995b1c37e948cd96db60c027fe66
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
DCT must be disabled when handling a ReadUnique where the copy need to
be upgraded.
Previously we were just asserting as it was assumed DCT is only enabled
for HNFs (which can "auto-upgrade"). However DCT may also be enabled for
intermediated levels of distributed shared caches above the HNFs.
Currently we generate these stats for all defined Events in the
protocol, which may generate too many stats that are never used. Though
these don't appear in the stats.txt file, they unnecessarily increases
simulation startup time and memory footprint.
This patch limits those stats to events with the "in_trans" and/or
"out_trans" properties. SLICC compiler then checks which combinations of
event+state are possible when generating the stats.
Also the possible level of detail for inTransLatHist was reduced.
Only the number of transactions for each event+initial+final state
combinations is now accounted. Latency histograms are only defined per
event type (similarly to outTransLatHist). This significantly reduces
the final file size for generated stats.
Marks which events signal the beginning of incoming and outgoing
transactions for generating inTransLatHist and outTransLatHist stats.
Change-Id: I90594a27fa01ef9cfface309971354b281308d22
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Do not respond with SnpRespData_I when the line is still present
upstream.
Change-Id: I2592e5c6637cfc0e83042169a245837648276e61
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
DCT must be disabled when handling a ReadUnique where the copy
need to be upgraded.
Previously we were just asserting as it was assumed DCT is only enabled
for HNFs (which can "auto-upgrade"). However DCT may also be enabled
for intermediated levels of distributed shared caches above the HNFs.
Change-Id: I9e29142a8d2f59ea61c1d90cda6b00c19435d6b7
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
When an Evict request is received from upstream for a shared line
and the line is no longer cached locally (or on any other upstream
cache), we need to also send an Evict downstream. In this case we need
to wait until our outgoing Evict completes before completing the Evict
from upstream in order be able to resolve race conditions with incoming
snoops. E.g.: while our outgoing Evict is pending we may receive a
snoop requesting data, but we won't be able to complete this snoop if
we have already completed all upstream Evicts and we no longer have the
line.
Change-Id: I23ac4f0a9c4ddd81e2425376c8d1e1c7fb66d107
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Finish_CopyBack_Stale is scheduled only when the requestor is the last
sharer. This prevents the cacahe evicting the line which was already
evicted while the stale WriteBack transaction was stalled.
Wrong condition check in Finish_CopyBack_Stale for eviction is also
removed.
Change-Id: Ib66acc1b9e4a6f7cea373e1fb37375427897d48d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63611
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This commit adds `desc` descriptions to the new symbols introduced
with CHI DVM support. The generation of the SLICC HTML documentation
requires each symbol to have a description, so a build with
`SLICC_HTML=True` will fail without this change.
Change-Id: I06f3bdd33edd1ff6e4bec35b01a460b9359ed9f6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/60869
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
When ReadOnce request hits upstream, set dataToBeInvalid to true
for R* states so that the line from the upstream is successfully dropped
at the end by Finalize_UpdateCacheFromTBE.
For UD_RU and UC_RU state, set dataValid to true to prevent it changing
to RU state when it doesn't get the snoop data response.
Change-Id: Ie83c511e8d158e18abc5c9c16bc6040ce73587bf
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58411
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Tiago Muck <tiago.muck@arm.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Assume core C1 with private L1/L2 and a shared exclusive L3.
C1 has a line in SC state, while the state in the L3 is
RUSC (L3 has exclusive accesses and upstream requester has line in SC).
When C1 evicts the line (Evict request), the L3 has to issue a
WriteEvictFull to the home node, however the L3 doesn't have a copy
of the line.
This fix handling Evict requests when the line state is RUSC. When
the last sharer issues an Evict request, the responder may issue
SnpOnce the obtain a copy the line if needed.
JIRA: https://gem5.atlassian.net/browse/GEM5-1195
Change-Id: Ic8f4e10b38d95cd6d84f8d65b87b0c94fcf52eea
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/59991
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
When just forwarding a WU request, the controller waits until the WU is
acked from downstream before sending the ack upstream. This
prevents snoops clearing valid WU data.
JIRA: https://gem5.atlassian.net/browse/GEM5-1195
This was more likely to happen with shared exclusive caches, e.g:
assume core C1 and C2 with private L1/L2 and a shared exclusive L3.
C1 has as dirty copy of the line while C2 issues a WriteUnique request
to that line. The line state is RU in the L3, so the L3 will just
forward the request to the HNF, so:
- C2 issues WU to L3 cache
- L3 acks the WU, allowing C2 to send the data, while concurrently
forwarding the WU to the HNF.
- L3 receives data from C2
- HNF sends invalidating snoops upstream because line is RU
- The snoop hazards with the pending WU at the L3 and invalidates
the data previously received. This causes an assertion to fail when
we resume handling the WU.
Change-Id: I51e457e0bdb648c0fff3f702b7d2c95dcf431dc5
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/59990
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Normally we don't check the TBE data if there are outstanding response
messages for the transaction because that means the latest valid data is
either in another cache or within an inflight message.
However this is not the case when we have either a pending CleanUnique
or we are handling CleanUnique. So bypass the pending message check in
this case.
Change-Id: I5f31039ca2a01a6a68fee8e0f3cf02c7e437b43e
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57395
Reviewed-by: Daecheol You <daecheol.you@samsung.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Initiate_MaitainCoherence would not trigger a writeback if
tbe.dataMaybeDirtyUpstream is set due to the assumption that
the upstream cache would writeback any dirty data. However this
is not the case if we use this action finalize a CleanUnique, e.g.:
- L1-A has data in SC
- L1-B has data in SD
- L2 has data in RUSD (L2 is an exclusive cache)
- L1-A sends CleanUnique to L2
- L2 invalidates L1-B and receives dirty data.
- L2 acks the CleanUnique; L1-A is now UC
- L2 has the dirty data but drops it because dataMaybeDirtyUpstream
- L1-A doesn't modify the data and eventually evicts it with WriteEvict
- Data from WriteEvicts are dropped at the HNF and we lose the line
This patch removes the tbe.dataMaybeDirtyUpstream check.
Instead it only skips the WriteBack if an upstream cache is in
SD state, when it's guaranteed it will writeback the dirty data.
JIRA: https://gem5.atlassian.net/browse/GEM5-1195
Change-Id: I6722bc25068b0c44afcf261abc8824f1d80c09f9
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57392
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Daecheol You <daecheol.you@samsung.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
WriteCleanFull can be requested for the cache line in SD state (e.g.
Local eviction of a cache line in SD_RSC state). In this case, the
requestor is the owner of the cache line,
but it doesn't have it with exclusive right.
Thus, 'ownerIsExcl == false' should be removed from the stale condition.
Change-Id: I4d34021ac31b2e8600c24689a03a3b8fa18aa1f7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58412
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Initiate_CopyBack_Stale removes the requestor from the sharer list.
However, if CBWrData_SC is the data response of stale WriteCleanFull,
the requestor should remain in the sharer list.
Thus, whether to send a Evict or not can be decided after the data
response arrives. For this, FinishCopyBack_Stale event was added as the
last event to handle Evict.
Change-Id: Ic3e3a1e4d74b24b9aa328b2ddfa817db44f24e4e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58413
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Tiago Muck <tiago.muck@arm.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
JIRA: https://gem5.atlassian.net/browse/GEM5-1185
Fixed an issue in which a CleanUnique responder would incorrectly
deallocate the cache block when handling an stale CU when the state
is UD_RU or UC_RU (thus incorrectly transitioning to RU).
The fix is to handle stale CUs similarly to stale WBs where we
override the dataValid TBE field to prevent the wrong state
transition.
This patch moves the stale code path to a separate transition
(similarly to stale WBs/Evicts) and moves the dataValid override to
Initiate_Request_Stale so it applies to all stale request types.
Notice now the stale field is also set on stale Comp_UC responses.
Additional minor change: CheckUpgrade_FromRU is the same as
CheckUpgrade_FromStore so it was removed.
Change-Id: I0a2cedcfde1dc30d67aa2c16d71b7470369c2b6e
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56810
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Meatboy 106 <garbage2collector@gmail.com>
Use SConsopts files local to individual domains to pull
non-foundational build code out of SConstruct. This greatly simplifies
SConstruct, and also makes it easier to find build configuration having
to do with particular pieces of gem5.
This change also converts some python level variables, all_protocols,
protocol_dirs, and slicc_includes, into the environment where the timing
of their initialization is more flexible.
Change-Id: Ie61ceb75ae9e5557cc400603c972a9582e99c1ea
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40872
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Gabe Black <gabe.black@gmail.com>
This patch add a new Ruby cache coherence protocol based on Arm' AMBA5
CHI specification. The CHI protocol defines and implements two state
machine types:
- Cache_Controller: generic cache controller that can be configured as:
- Top-level L1 I/D cache
- A intermediate level (L2, L3, ...) private or shared cache
- A CHI home node (i.e. the point of coherence of the system and
has the global directory)
- A DMA requester
- Memory_Controller: implements a CHI slave node and interfaces with
gem5 memory controller. This controller has the functionality of a
Directory_Controller on the other Ruby protocols, except it doesn't
have a directory.
The Cache_Controller has multiple cache allocation/deallocation
parameters to control the clusivity with respect to upstream caches.
Allocation can be completely disabled to use Cache_Controller as a
DMA requester or as a home node without a shared LLC.
The standard configuration file configs/ruby/CHI.py provides a
'create_system' compatible with configs/example/fs.py and
configs/example/se.py and creates a system with private L1/L2 caches
per core and a shared LLC at the home nodes. Different cache topologies
can be defined by modifying 'create_system' or by creating custom
scripts using the structures defined in configs/ruby/CHI.py.
This patch also includes the 'CustomMesh' topology script to be used
with CHI. CustomMesh generates a 2D mesh topology with the placement
of components manually defined in a separate configuration file using
the --noc-config parameter.
The example in configs/example/noc_config/2x4.yaml creates a simple 2x4
mesh. For example, to run a SE mode simulation, with 4 cores,
4 mem ctnrls, and 4 home nodes (L3 caches):
build/ARM/gem5.opt configs/example/se.py \
--cmd 'tests/test-progs/hello/bin/arm/linux/hello' \
--ruby --num-cpus=4 --num-dirs=4 --num-l3caches=4 \
--topology=CustomMesh --noc-config=configs/example/noc_config/2x4.yaml
If one doesn't care about the component placement on the interconnect,
the 'Crossbar' and 'Pt2Pt' may be used and they do not require the
--noc-config option.
Additional authors:
Joshua Randall <joshua.randall@arm.com>
Pedro Benedicte <pedro.benedicteillescas@arm.com>
Tuan Ta <tuan.ta2@arm.com>
JIRA: https://gem5.atlassian.net/browse/GEM5-908
Change-Id: I856524b0afd30842194190f5bd69e7e6ded906b0
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42563
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>