Remove the prefetch_on_access and prefetch_on_pf_hit params from
BaseCache. BasePrefetcher no longer expects these params to exist in
the parent. Configurations that set these parameters using the cache
object have been fixed.
Change-Id: I9ab6a545eaf930ee41ebda74e2b6b8bad0ca35a7
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
This patch decouples the prefetchers from the cache implementation
as a first step towards allowing the classic prefetchers to be used
with Ruby caches. Prefetchers that need to do cache lookups can do so
using the accessor object provided when the probes are notified. This
may also facilitate connecting the same prefetcher to multiple caches.
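As a hedged sketch of the idea (the accessor interface and all names
below are illustrative assumptions, not the exact gem5 API), the
prefetcher queries the cache through an accessor delivered with the
probe notification instead of holding a BaseCache pointer:

    // Illustrative accessor-based lookup; names are hypothetical.
    #include <cstdint>

    using Addr = uint64_t;

    struct CacheAccessor
    {
        // Implemented by whichever cache (classic or Ruby) owns the
        // probe point.
        virtual bool inCache(Addr addr, bool is_secure) const = 0;
        virtual bool inMissQueue(Addr addr, bool is_secure) const = 0;
        virtual ~CacheAccessor() = default;
    };

    struct Prefetcher
    {
        // The accessor arrives with each notification, so the same
        // prefetcher can be notified by, and look up into, more than
        // one cache.
        void
        notify(Addr addr, bool is_secure, const CacheAccessor &cache)
        {
            if (!cache.inCache(addr, is_secure) &&
                !cache.inMissQueue(addr, is_secure)) {
                // queue a prefetch for the predicted address
            }
        }
    };

    // Stub accessor so the sketch builds and runs standalone.
    struct NullAccessor : CacheAccessor
    {
        bool inCache(Addr, bool) const override { return false; }
        bool inMissQueue(Addr, bool) const override { return false; }
    };

    int main()
    {
        Prefetcher pf;
        NullAccessor cache;
        pf.notify(0x1000, false, cache);
    }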
Related JIRA:
https://gem5.atlassian.net/browse/GEM5-457
https://gem5.atlassian.net/browse/GEM5-1112
Change-Id: I4fee1a3613ae009fabf45d7b747e4582cad315ef
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
While it is plausible to define cache_line_size as a 32-bit unsigned
int, the variable is used well beyond its original scope:
cache_line_size is used to produce an address mask that masks out the
offset bits of an address, for example in [1], [2], [3], and [4].
However, since cache_line_size is an "unsigned int", the value is not
guaranteed to be 64 bits wide. Consequently, the bit-twiddling hacks
in [1], [2], [3], and [4] produce a 32-bit mask, i.e.,
0x00000000FFFFFFC0.
This behavior caused at least one problem, in the RISC-V LLSC
implementation [5], where the load reservation (LR) relies on the mask
to produce the cache block address. Two distinct 64-bit addresses can
be mapped to the same cache block using the above mask.
This patch explicitly defines cache_line_size as a 64-bit unsigned int
so the cache block mask is produced correctly for 64-bit addresses.
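A standalone demo of the width bug (illustrative values, not gem5
code):

    #include <cstdint>
    #include <cstdio>

    int main()
    {
        unsigned int line32 = 64;  // old: 32-bit cache_line_size
        uint64_t     line64 = 64;  // new: 64-bit cache_line_size

        uint64_t addr = 0x123456789ULL;

        // With the 32-bit operand, ~ produces a 32-bit mask that is
        // zero-extended, wiping the upper half of the address.
        uint64_t bad  = addr & ~(line32 - 1);  // mask 0x00000000FFFFFFC0
        uint64_t good = addr & ~(line64 - 1);  // mask 0xFFFFFFFFFFFFFFC0

        std::printf("bad:  0x%llx\n", (unsigned long long)bad);
        // prints 0x23456780 -- the high 32 bits are lost
        std::printf("good: 0x%llx\n", (unsigned long long)good);
        // prints 0x123456780
    }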
[1] 3bdcfd6f7a/src/cpu/simple/atomic.hh (L147)
[2] 3bdcfd6f7a/src/cpu/simple/timing.hh (L224)
[3] 3bdcfd6f7a/src/cpu/o3/lsq_unit.cc (L241)
[4] 3bdcfd6f7a/src/cpu/minor/lsq.cc (L1425)
[5] 3bdcfd6f7a/src/arch/riscv/isa.cc (L787)
Added checks to ensure that atomics are not performed in the TCC when
it is configured as a write-through cache. Also added an SLC bit
overwrite to ensure the directory performs atomics when there is a
write-through TCC.
Change-Id: I4514e6c8022aeb7785f2c59871cd9acec8161ed8
A previous commit added BUILD_GPU guards to the GPU coalescer models
since a related cache recorder commit added GPU support. This is no
longer needed since the cache recorder moved to using a vector of
RubyPorts instead of Sequencer/GPUCoalescer pointers. This commit
removes the BUILD_GPU guards from the Ruby coalescer models.
Change-Id: I23a7957d82524d6cd3483d22edfb35ac51796eca
Previously, the cache recorder used a vector of Sequencer pointers to
access Ruby objects. A recent commit updated the cache recorder to also
maintain a vector of GPUCoalescer pointers in order for GPUs to support
flushing. This added redundant code to the cache recorder. This commit
replaces the Sequencer and GPUCoalescer vectors with a single vector of
RubyPort pointers so that the code does not contain redundant lines.
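The shape of the refactor, as a hedged sketch (hypothetical types and
an illustrative issueFlush interface, not the actual gem5 classes):

    // Before: two parallel vectors and duplicated loops.
    //   std::vector<Sequencer *> seqs;
    //   std::vector<GPUCoalescer *> coalescers;
    // After: one vector of the common base type.
    #include <vector>

    struct RubyPort
    {
        virtual void issueFlush() = 0;  // illustrative interface
        virtual ~RubyPort() = default;
    };

    struct Sequencer : RubyPort { void issueFlush() override {} };
    struct GPUCoalescer : RubyPort { void issueFlush() override {} };

    void
    flushAll(const std::vector<RubyPort *> &ports)
    {
        // One loop handles CPU sequencers and GPU coalescers alike.
        for (auto *p : ports)
            p->issueFlush();
    }

    int main()
    {
        Sequencer s;
        GPUCoalescer c;
        std::vector<RubyPort *> ports{&s, &c};
        flushAll(ports);
    }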
Change-Id: Id5da33fb870f17bb9daef816cc43c0bcd70a8706
Earlier, GPU checkpointing worked only if the checkpoint was created
before the first kernel execution. This pull request adds support for
checkpointing between any two kernel calls. It does so by doing the
following:
- Adds flush support in the GPU_VIPER protocol
- Adds flush support in the GPUCoalescer
- Updates the cache recorder to use the GPUCoalescer during simulation
cooldown and cache warmup.
Ruby was recently updated to support flushes and warmup for GPUs. Since
this support uses the GPUCoalescer, non-GPU builds hit a compile-time
error, because GPU code is not built for non-GPU builds. This commit
adds "#if BUILD_GPU" guards around the GPU-related code in common files
like AbstractController.hh, CacheRecorder.*, RubySystem.cc,
GPUCoalescer.hh, and VIPERCoalescer.hh. This allows GPU builds to use
flushing while non-GPU builds compile without problems.
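A hedged sketch of the guard pattern (illustrative members, not the
exact hunks; BUILD_GPU is assumed to be 1 only in GPU builds):

    // Stand-in so this sketch builds standalone; gem5 defines the
    // flag per build target.
    #ifndef BUILD_GPU
    #define BUILD_GPU 0
    #endif

    #if BUILD_GPU
    class GPUCoalescer;  // GPU-only forward declaration
    #endif

    class CacheRecorder
    {
      public:
    #if BUILD_GPU
        // GPU-only member, compiled out of non-GPU builds.
        void setCoalescer(GPUCoalescer *c) { m_coalescer = c; }

      private:
        GPUCoalescer *m_coalescer = nullptr;
    #endif
    };

    int main()
    {
        CacheRecorder recorder;
        (void)recorder;  // builds with or without BUILD_GPU
    }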
Change-Id: If8ee4ff881fe154553289e8c00881ee1b6e3f113
Introduce far atomic operations in the CHI protocol.
Three configuration parameters are used to tune this behavior:
- policy_type: sets the atomic policy to one of those described in our
paper
- atomic_op_latency: simulates the AMO ALU operation latency
- comp_anr: configures the Atomic No return transaction to split
CompDBIDResp into two separate messages, DBIDResp and Comp
Change-Id: I087afad9ad9fcb9df42d72893c9e32ad5a5eb478
Previously, the cache recorder used the Sequencer to issue flush
requests and cache warmup requests. The GPU, however, uses the
GPUCoalescer to access the cache, not the Sequencer. This commit adds a
GPUCoalescer map to the cache recorder and uses it to send flush and
cache warmup requests to any GPU caches in the system.
Change-Id: I10490cf5e561c8559a98d4eb0550c62eefe769c9
This commit adds flush support to the GPU VIPER coherence protocol. The
L1 cache will now initiate a flush request if the packet it receives is
of type RubyRequestType_FLUSH. During the flush process, the L1 cache
will send a request to the L2 if it is in either the V or I state. The
L2 will issue a flush request to the directory if its cache line is in
the valid state before invalidating its copy. The directory, on
receiving this request, writes data to memory and sends an ack back to
the L2. The L2 forwards this ack back to the L1, which then ends the
flush by calling the write callback.
Change-Id: I9dfc0c7b71a1e9f6d5e9e6ed4977c1e6a3b5ba46
The GPU Coalescer does not contain cache cooldown and warmup support.
This commit updates the coalescer to support cache cooldown during
flushes and warmup during checkpoint restore.
Change-Id: I5459471dec20ff304fd5954af1079a7486ee860a
There are two overloaded-virtual issues reported by g++-13.
1. The copy assignment and move assignment overloads are hidden in the
derived class.
[ CXX] src/mem/cache/replacement_policies/weighted_lru_rp.cc -> ALL/mem/cache/replacement_policies/weighted_lru_rp.o
In file included from src/mem/cache/base.hh:61,
from src/mem/cache/base.cc:46:
src/mem/cache/cache_blk.hh:172:5: error: ‘virtual gem5::CacheBlk& gem5::CacheBlk::operator=(gem5::CacheBlk&&)’ was hidden [-Werror=overloaded-virtual=]
172 | operator=(CacheBlk&& other)
| ^~~~~~~~
src/mem/cache/cache_blk.hh:518:19: note: by ‘gem5::TempCacheBlk& gem5::TempCacheBlk::operator=(const gem5::TempCacheBlk&)’
518 | TempCacheBlk& operator=(const TempCacheBlk&) = delete;
| ^~~~~~~~
In this case, we can explicitly pull in the parent's operator= with a
using-declaration to keep the function overloads visible.
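A minimal reproduction of the hidden overload and the fix
(illustrative classes, not the actual CacheBlk/TempCacheBlk):

    struct Blk
    {
        virtual Blk &operator=(Blk &&other) { return *this; }
        virtual ~Blk() = default;
    };

    struct TempBlk : Blk
    {
        // Without this using-declaration, the deleted copy assignment
        // below hides Blk::operator=(Blk&&) and g++-13 emits
        // -Woverloaded-virtual.
        using Blk::operator=;
        TempBlk &operator=(const TempBlk &) = delete;
    };

    int main()
    {
        TempBlk a, b;
        a = static_cast<Blk &&>(b);  // reachable thanks to `using`
    }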
2. An intended overload hidden in SystemC is reported as an error.
In file included from src/systemc/ext/tlm_utils/simple_initiator_socket.h:24,
from src/systemc/tlm_bridge/gem5_to_tlm.hh:72,
from build/ALL/python/_m5/param_Gem5ToTlmBridge256.cc:17:
src/systemc/ext/tlm_utils/../tlm_core/2/sockets/initiator_socket.hh: In instantiation of ‘class tlm::tlm_base_initiator_socket<256, tlm::tlm_fw_transport_if<>, tlm::tlm_bw_transport_if<>, 1, sc_core::SC_ONE_OR_MORE_BOUND>’:
src/systemc/ext/tlm_utils/../tlm_core/2/sockets/initiator_socket.hh:185:7: required from ‘class tlm::tlm_initiator_socket<256, tlm::tlm_base_protocol_types, 1, sc_core::SC_ONE_OR_MORE_BOUND>’
src/systemc/ext/tlm_utils/simple_initiator_socket.h:37:7: required from ‘class tlm_utils::simple_initiator_socket_b<sc_gem5::Gem5ToTlmBridge<256>, 256, tlm::tlm_base_protocol_types, sc_core::SC_ONE_OR_MORE_BOUND>’
src/systemc/ext/tlm_utils/simple_initiator_socket.h:156:7: required from ‘class tlm_utils::simple_initiator_socket<sc_gem5::Gem5ToTlmBridge<256>, 256, tlm::tlm_base_protocol_types>’
src/systemc/tlm_bridge/gem5_to_tlm.hh:147:46: required from ‘class sc_gem5::Gem5ToTlmBridge<256>’
/usr/include/c++/13/type_traits:1411:38: required from ‘struct std::is_base_of<sc_gem5::Gem5ToTlmBridgeBase, sc_gem5::Gem5ToTlmBridge<256> >’
ext/pybind11/include/pybind11/detail/../detail/common.h:880:59: required from ‘struct pybind11::class_<sc_gem5::Gem5ToTlmBridge<256>, sc_gem5::Gem5ToTlmBridgeBase, std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>, pybind11::nodelete> >::is_valid_class_option<sc_gem5::Gem5ToTlmBridgeBase>’
ext/pybind11/include/pybind11/detail/../detail/common.h:719:35: required by substitution of ‘template<class ... Ts> using pybind11::detail::all_of = pybind11::detail::bool_constant<(Ts::value && ...)> [with Ts = {pybind11::class_<sc_gem5::Gem5ToTlmBridge<256>, sc_gem5::Gem5ToTlmBridgeBase, std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>, pybind11::nodelete> >::is_valid_class_option<sc_gem5::Gem5ToTlmBridgeBase>, pybind11::class_<sc_gem5::Gem5ToTlmBridge<256>, sc_gem5::Gem5ToTlmBridgeBase, std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>, pybind11::nodelete> >::is_valid_class_option<std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>, pybind11::nodelete> >}]’
ext/pybind11/include/pybind11/pybind11.h:1506:70: required from ‘class pybind11::class_<sc_gem5::Gem5ToTlmBridge<256>, sc_gem5::Gem5ToTlmBridgeBase, std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>, pybind11::nodelete> >’
build/ALL/python/_m5/param_Gem5ToTlmBridge256.cc:34:179: required from here
src/systemc/ext/tlm_utils/../core/sc_port.hh:125:18: error: ‘void sc_core::sc_port_b<IF>::bind(sc_core::sc_port_b<IF>&) [with IF = tlm::tlm_fw_transport_if<>]’ was hidden [-Werror=overloaded-virtual=]
125 | virtual void bind(sc_port_b<IF> &p) { sc_port_base::bind(p); }
| ^~~~
In file included from src/systemc/ext/tlm_utils/simple_initiator_socket.h:27:
src/systemc/ext/tlm_utils/../tlm_core/2/sockets/initiator_socket.hh:133:18: note: by ‘tlm::tlm_base_initiator_socket<256, tlm::tlm_fw_transport_if<>, tlm::tlm_bw_transport_if<>, 1, sc_core::SC_ONE_OR_MORE_BOUND>::bind’
133 | virtual void bind(bw_interface_type &ifs) { (get_base_export())(ifs); }
| ^~~~
src/systemc/ext/tlm_utils/../core/sc_port.hh:124:18: error: ‘void sc_core::sc_port_b<IF>::bind(IF&) [with IF = tlm::tlm_fw_transport_if<>]’ was hidden [-Werror=overloaded-virtual=]
124 | virtual void bind(IF &i) { sc_port_base::bind(i); }
| ^~~~
src/systemc/ext/tlm_utils/../tlm_core/2/sockets/initiator_socket.hh:133:18: note: by ‘tlm::tlm_base_initiator_socket<256, tlm::tlm_fw_transport_if<>, tlm::tlm_bw_transport_if<>, 1, sc_core::SC_ONE_OR_MORE_BOUND>::bind’
133 | virtual void bind(bw_interface_type &ifs) { (get_base_export())(ifs); }
| ^~~~
Per the code comment, this hiding is intentional in the SystemC
headers:
// The overloaded virtual is intended in SystemC, so we'll disable the warning.
// Please check section 9.3 of SystemC 2.3.1 release note for more details.
The issue is that the warning suppression should be moved to the base
class.
Change-Id: I6683919e594ffe1fb3b87ccca1602bffdb788e7d
With this PR our CHI implementation starts making use of the txnId and
DBID identifiers.
Note: we were already making use of the txnId for DVM messages to convey
the DVM address. This is still the case.
In the future we should realign the DVM logic so that the txnId is
solely used as a transaction identifier.
When there is a race between a FwdGetX and a PUTX on the owner, the
owner hands off ownership to the GetX requestor and the PUTX still
goes through. But since the owner has changed, the state should go
back to M and the PUTX is essentially trashed.
An Unblock to the Directory in this case will hit an undefined
transition. I have added transitions which indicate that when an
Unblock is served to the Directory, some kind of ownership transfer
has happened while a PUTX/PUTO was in progress.
Change-Id: I37439b5a363417096030a0875a51c605bd34c127
After calling m5_dump_reset_stats(0,0) in a test program,
some statistics like
l1_controllers.L1Dcache.m_demand_hits,
l1_controllers.L1Dcache.m_demand_misses,
l1_controllers.L1Dcache.m_demand_accesses
were not getting reset in the newer stat dumps.
This one-line patch fixes that. The change was tested by calling two
m5_dump_reset_stats(0,0) in a row for a system with 1 core, on both SE
and FS.
Credits to Gabriel Busnot for finding the fix.
Change-Id: I19d75996fa53d31ef20f7b206024fd38dbeac643
This change, https://github.com/gem5/gem5/pull/205, mistakenly
allocates a write buffer entry for the clflush instruction when there
is a cache miss. However, clflush in gem5 is not a write instruction.
Thus, the cache should allocate a miss buffer entry in this case.
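A hedged sketch of the allocation decision (stand-in types, not the
actual BaseCache logic):

    #include <cassert>

    enum class MissQueueKind { MissBuffer, WriteBuffer };

    struct Pkt
    {
        bool isWrite;
        bool isCleanInvalidate;  // e.g. produced by x86 clflush
    };

    MissQueueKind
    chooseQueueOnMiss(const Pkt &pkt)
    {
        // clflush is a clean-invalidate, not a write, so on a miss it
        // must allocate in the miss buffer, not the write buffer.
        if (pkt.isWrite && !pkt.isCleanInvalidate)
            return MissQueueKind::WriteBuffer;
        return MissQueueKind::MissBuffer;
    }

    int main()
    {
        Pkt clflush{/*isWrite=*/false, /*isCleanInvalidate=*/true};
        assert(chooseQueueOnMiss(clflush) == MissQueueKind::MissBuffer);
    }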
Change-Id: I9c1c9b841159c4420567e9c929e71e4aa27d5c28
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
This will hold the CHI Data Buffer Identifier (DBID) field.
The DBID allows the Completer of a transaction to provide its own
identifier for the transaction.
This new ID will be used as the TxnId field by a following
WriteData/CompData/CompAck response.
For now we only set it to the original txnId (identity mapping).
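A sketch of the identity mapping in plain C++ (hypothetical helper and
message type, not the actual SLICC code):

    #include <cstdint>

    struct DBIDRespMsg
    {
        uint16_t txnId;  // architecturally a 12-bit field
        uint16_t dbid;   // Data Buffer ID chosen by the Completer
    };

    DBIDRespMsg
    makeDBIDResp(uint16_t req_txn_id)
    {
        DBIDRespMsg resp;
        resp.txnId = req_txn_id;
        // For now the Completer just echoes the requester's TxnID.
        resp.dbid = req_txn_id;
        return resp;
    }

    int main()
    {
        DBIDRespMsg m = makeDBIDResp(0x2a);
        return m.dbid == m.txnId ? 0 : 1;
    }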
Change-Id: If30c5e1cafbe5a30073c7cd01d60bf41eb586cee
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The TxnId field of a CHI request has so far been unused (other than for
DVM transactions). With this patch we always initialize the field when
we extract a ruby request from the sequencer port.
According to the spec (IHI0050F):
A 12-bit field is defined for the TxnID with the number of outstanding
transactions being limited to 1024. A Requester is permitted to reuse a
TxnID value after it has received either:
* All responses associated with a previous transaction that have used
the same value.
* A RetryAck response for a previous transaction that used the same value
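As a sketch of that reuse rule (an assumption-laden illustration, not
gem5 code), a requester-side TxnID pool could look like:

    #include <cassert>
    #include <cstdint>
    #include <queue>

    class TxnIdPool
    {
      public:
        explicit TxnIdPool(uint16_t num_ids = 1024)
        {
            for (uint16_t i = 0; i < num_ids; ++i)
                free_.push(i);
        }

        uint16_t
        alloc()
        {
            // Outstanding transactions are limited by the pool size.
            assert(!free_.empty());
            uint16_t id = free_.front();
            free_.pop();
            return id;
        }

        // Call once all responses (or a RetryAck) for `id` were seen.
        void release(uint16_t id) { free_.push(id); }

      private:
        std::queue<uint16_t> free_;
    };

    int main()
    {
        TxnIdPool pool;
        uint16_t id = pool.alloc();
        pool.release(id);  // id may now be reused
    }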
Change-Id: Ie48f0fee99966339799ac50932d36b2a927b1c7d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Based on the CHIRequestType, this automatically tells whether the
request originated from the sequencer (CPU load/fetch/store).
Change-Id: I50fd116c8b1a995b1c37e948cd96db60c027fe66
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
At the moment an address value can only be used in SLICC code to do
TBE lookups; there is no way to add/subtract/divide/multiply two
addresses, nor an address and an integer value.
This hinders the development of protocol-specific code and forces
developers to place such code in shared C++ structures.
Change-Id: Ia184e793b6cd38f951f475a7cdf284f529972ccb
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
At the moment it is possible to static_cast by pointer/reference only:
static_cast(type, "pointer", val) -> static_cast<type*>(val);
static_cast(type, "reference", val) -> static_cast<type&>(val);
With this patch it will also be possible to do something like
static_cast(type, "value", val) -> static_cast<type>(val);
which is important when wishing to convert integer types into custom
ones and vice versa.
This patch also defers the static_cast type check to C++.
At the moment it is difficult to use the static_cast utility in SLICC
as it tries to handle type checking in the language itself. This would
force us to explicitly define compatible types (like an Addr and an
int, for example). Rather than pushing the burden onto ourselves, we
should always allow a developer to use a static_cast in SLICC and let
the C++ compiler complain if the generated code is not compatible.
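What the new "value" form boils down to in the generated C++
(hypothetical types, for illustration only):

    #include <cstdint>

    // A custom integer-like type, as SLICC protocols often define.
    enum class MachineID_t : uint32_t { L1Cache, L2Cache, Directory };

    int main()
    {
        uint32_t raw = 2;
        // "value" cast: integer -> custom type...
        MachineID_t m = static_cast<MachineID_t>(raw);
        // ...and vice versa. Neither direction works as a
        // pointer/reference cast.
        uint32_t back = static_cast<uint32_t>(m);
        return back == raw ? 0 : 1;
    }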
Change-Id: I0586b9224b1e41751a07d15e2d48a435061c2582
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Currently the MOESI_AMD_Base-directory transition for system-level
atomics sends the response message before the atomic is performed. This
was likely done because atomics are supposed to return the value of the
data *before* the atomic is performed, and simply ordering the actions
this way took care of that.
With the new atomic log feature, the atomic values are pulled from the
log by the coalescer on the return path. Therefore, these actions can
be reordered. However, it is now necessary that the atomics be
performed before sending the response so that the log is populated and
copied by the response action. This should fix #253.
Change-Id: Ie7e178f93990975367de2cc3e89e5ef9c9069241
Marks which events signal the beginning of incoming and outgoing
transactions for generating inTransLatHist and outTransLatHist stats.
Change-Id: I90594a27fa01ef9cfface309971354b281308d22
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Generating these stats for all defined Events may generate too many
stats that are never used. Though these do not appear in the stats.txt
file, they unnecessarily increase simulation startup time and memory
consumption.
This patch limits those stats to events with the "in_trans" and/or
"out_trans" properties. SLICC compiler then checks which combinations
of event+state are possible when generating the stats.
The possible level of detail for inTransLatHist was also reduced.
Only the number of transactions for each event+initial+final state
combination is now accounted for. Latency histograms are only defined
per event type (similarly to outTransLatHist). This significantly
reduces the final file size for generated stats.
Change-Id: I29aaeb771436cc3f0ce7547a223d58e71d9cedcc
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Do not respond with SnpRespData_I when the line is still present
upstream.
Change-Id: I2592e5c6637cfc0e83042169a245837648276e61
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
DCT must be disabled when handling a ReadUnique where the copy needs
to be upgraded.
Previously we were just asserting, as it was assumed DCT is only
enabled for HNFs (which can "auto-upgrade"). However, DCT may also be
enabled for intermediate levels of distributed shared caches above the
HNFs.
Change-Id: I9e29142a8d2f59ea61c1d90cda6b00c19435d6b7
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
When an Evict request is received from upstream for a shared line and
the line is no longer cached locally (or in any other upstream cache),
we need to also send an Evict downstream. In this case we need to wait
until our outgoing Evict completes before completing the Evict from
upstream, in order to be able to resolve race conditions with incoming
snoops. E.g.: while our outgoing Evict is pending we may receive a
snoop requesting data, but we won't be able to complete this snoop if
we have already completed all upstream Evicts and no longer have the
line.
Change-Id: I23ac4f0a9c4ddd81e2425376c8d1e1c7fb66d107
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Augmenting the DataBlock class with a change log structure to
record the effects of atomic operations on a data block and
service these changes if the atomic operations require return
values.
Although the operations are atomic, the coalescer need not
send unique memory requests for each operation. Atomic
operations within a wavefront to the same address are now
coalesced into a single memory request. The response of this
request carries all the necessary information to provide the
requesting lanes unique values as a result of their individual
atomic operations. This helps reduce contention for request
and response queues in simulation.
Previously, only the final value of the datablock after all atomic
ops to the same address was visible to the requesting waves. This
change corrects that behavior by allowing each wave to see the effect
of its individual atomic op if a return value is necessary.
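An illustrative sketch of the change-log idea (hypothetical class, not
gem5's DataBlock):

    #include <cstdint>
    #include <functional>
    #include <vector>

    class LoggedBlock
    {
      public:
        using AtomicOp = std::function<uint32_t(uint32_t)>;

        explicit LoggedBlock(uint32_t v) : value_(v) {}

        // Apply one lane's atomic op, logging the pre-op value that
        // the lane should receive as its return value.
        void
        applyAtomic(const AtomicOp &op)
        {
            log_.push_back(value_);
            value_ = op(value_);
        }

        // Return value owed to the i-th coalesced lane.
        uint32_t returnValueFor(size_t lane) const { return log_[lane]; }

      private:
        uint32_t value_;
        std::vector<uint32_t> log_;
    };

    int main()
    {
        LoggedBlock blk(10);
        // Three lanes of one wavefront each fetch-add 1 to the same
        // address, coalesced into a single request.
        for (int i = 0; i < 3; ++i)
            blk.applyAtomic([](uint32_t v) { return v + 1; });
        // The lanes see 10, 11, 12 instead of all seeing the final 13.
        return blk.returnValueFor(2) == 12 ? 0 : 1;
    }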
Change-Id: I639bea943afd317e45f8fa3bff7689f6b8df9395
When the Linux kernel changes a page property, it flushes the related
cache lines. The kernel might change the page property before flushing
the cache lines, so the clflush might occur in an uncacheable region.
Currently, an uncacheable request must be a read or a write; however,
a clflush request is neither.
This change aims to allow clflush requests to work on uncacheable
regions. Since there is no straightforward way to check whether a
packet came from a clflush instruction, this change permits all Clean
Invalidate requests, which is the type of request produced by clflush,
to work on uncacheable regions.
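A hedged sketch of the relaxed check (stand-in types, not the actual
gem5 code):

    struct Pkt
    {
        bool isRead;
        bool isWrite;
        bool isCleanInvalidate;  // request type produced by clflush
    };

    bool
    allowedUncacheable(const Pkt &pkt)
    {
        // Previously only reads and writes were accepted; clean
        // invalidates are now permitted as well.
        return pkt.isRead || pkt.isWrite || pkt.isCleanInvalidate;
    }

    int main()
    {
        Pkt clflush{false, false, true};
        return allowedUncacheable(clflush) ? 0 : 1;
    }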
Change-Id: Ib3ec01d9281d3dfe565a0ced773ed912edb32b8f
Signed-off-by: Hoa Nguyen <hn@hnpl.org>