This change, https://github.com/gem5/gem5/pull/205, mistakenly allocates
write buffer for clflush instruction when there's a cache miss. However,
clflush in gem5 is not a write instruction. Thus, the cache should
allocate miss buffer in this case.
When there is race between FwdGetX
and PUTX on owner. Owner in this case hands off
ownership to GetX requestor and PUTX still goes
through. But since owner has changed, state should
go back to M and PUTX is essentially trashed.
An Unblock to the Directory in this case will give an undefined
transition. I have added transitions which indicate that when
an Unblock is served to the Directory, it means that some kind
of ownership transfer has happened while a PUTX/PUTO was in
progress.
Change-Id: I37439b5a363417096030a0875a51c605bd34c127
After calling m5_dump_reset_stats(0,0) in a test program,
some statistics like
l1_controllers.L1Dcache.m_demand_hits,
l1_controllers.L1Dcache.m_demand_misses,
l1_controllers.L1Dcache.m_demand_accesses
were not getting reset in the newer stat dumps.
This one line patch fixes that. Changes were tested with
calling two m5_dump_reset_stats(0,0) in a row for a system
with 1 core, tested on both SE and FS.
Credits to Gabriel Busnot for finding the fix.
Change-Id: I19d75996fa53d31ef20f7b206024fd38dbeac643
This change, https://github.com/gem5/gem5/pull/205, mistakenly
allocates write buffer for clflush instruction when there's a
cache miss. However, clflush in gem5 is not a write instruction.
Thus, the cache should allocate miss buffer in this case.
Change-Id: I9c1c9b841159c4420567e9c929e71e4aa27d5c28
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
This will hold the CHI Data Buffer Identifier (DBID) field.
The DBID allows a Completer of a transaction to provide its own
identifier for a transaction ID.
This new ID will be used as a TxnId field by a following
WriteData/CompData/CompAck response.
For now we only set it to the original txnId (identity mapping)
Change-Id: If30c5e1cafbe5a30073c7cd01d60bf41eb586cee
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The TxnId field of a CHI request has so far been unused (other than for
DVM transactions). With this patch we always initialize the field when
we extract a ruby request from the sequencer port.
According to specs (IHI0050F):
A 12-bit field is defined for the TxnID with the number of outstanding
transactions being limited to 1024. A Requester is permitted to reuse a
TxnID value after it has received either:
* All responses associated with a previous transaction that have used
the same value.
* A RetryAck response for a previous transaction that used the same value
Change-Id: Ie48f0fee99966339799ac50932d36b2a927b1c7d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Based on the CHIRequestType, it automatically tells if the
request has been originated from the sequencer
(CPU load/fetch/store)
Change-Id: I50fd116c8b1a995b1c37e948cd96db60c027fe66
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
At the moment an address value can only be used in the slicc code
to do TBE lookups but there is no way to add/subtract/divide/multiply
two addresses nor an address and an integer value.
This hinders the development of protocol specific code and
forces developers to place such code in shared
C++ structures
Change-Id: Ia184e793b6cd38f951f475a7cdf284f529972ccb
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
At the moment it is possible to static_cast by pointer/reference only:
static_cast(type, "pointer", val) -> static_cast<type*>(val);
static_cast(type, "reference", val) -> static_cast<type&>(val);
With this patch it will also be possible to do something like
static_cast(type, "value", val) -> static_cast<type>(val);
Which is important when wishing to convert integer types into
custom onces and viceversa.
This patch is also deferring static_cast type check to C++
At the moment it is difficult to use the static_cast utility in slicc as
it tries to handle type checking in the language itself. This would
force us to explicitly define compatible types (like an Addr and an int
as an example). Rather than pushing the burden on us, we should always
allow a developer to use a static_cast in slicc and let the C++ compiler
complain if the generated code is not compatible
Change-Id: I0586b9224b1e41751a07d15e2d48a435061c2582
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Currently the MOESI_AMD_Base-directory transition for system level
atomics sends the response message before the atomic is performed. This
was likely done because atomics are supposed to return the value of the
data *before* the atomic is performed and by simply ordering the actions
this way that was taken care of.
With the new atomic log feature, the atomic values are pulled from the
log by the coalescer on the return path. Therefore, these actions can be
reordered. However, it is now necessary that the atomics be performed
before sending the response so that the log is populated and copied by
the response action. This should fix#253 .
Change-Id: Ie7e178f93990975367de2cc3e89e5ef9c9069241
Augmenting the DataBlock class with a change log structure to record the
effects of atomic operations on a data block and service these changes
if the atomic operations require return values.
Although the operations are atomic, the coalescer need not send unique
memory requests for each operation. Atomic operations within a wavefront
to the same address are now coalesced into a single memory request. The
response of this request carries all the necessary information to
provide the requesting lanes unique values as a result of their
individual atomic operations. This helps reduce contention for request
and response queues in simulation.
Previously, only the final value of the datablock after all atomic ops
to the same address was visible to the requesting waves. This change
corrects this behavior by allowing each wave to see the effect of this
individual atomic op is a return value is necessary.
DCT must be disabled when handling a ReadUnique where the copy need to
be upgraded.
Previously we were just asserting as it was assumed DCT is only enabled
for HNFs (which can "auto-upgrade"). However DCT may also be enabled for
intermediated levels of distributed shared caches above the HNFs.
Currently we generate these stats for all defined Events in the
protocol, which may generate too many stats that are never used. Though
these don't appear in the stats.txt file, they unnecessarily increases
simulation startup time and memory footprint.
This patch limits those stats to events with the "in_trans" and/or
"out_trans" properties. SLICC compiler then checks which combinations of
event+state are possible when generating the stats.
Also the possible level of detail for inTransLatHist was reduced.
Only the number of transactions for each event+initial+final state
combinations is now accounted. Latency histograms are only defined per
event type (similarly to outTransLatHist). This significantly reduces
the final file size for generated stats.
Marks which events signal the beginning of incoming and outgoing
transactions for generating inTransLatHist and outTransLatHist stats.
Change-Id: I90594a27fa01ef9cfface309971354b281308d22
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Generating these stats for all defined Events may generate too many
stats that are never used, which unnecessarily increases simulation
startup time and memory consumption.
This patch limits those stats to events with the "in_trans" and/or
"out_trans" properties. SLICC compiler then checks which combinations
of event+state are possible when generating the stats.
Also the possible level of detail for inTransLatHist was reduced.
Only the number of transactions for each event+initial+final state
combinations is now accounted. Latency histograms are only defined
per event type (similarly to outTransLatHist). This significantly
reduces the final file size for generated stats.
Change-Id: I29aaeb771436cc3f0ce7547a223d58e71d9cedcc
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Do not respond with SnpRespData_I when the line is still present
upstream.
Change-Id: I2592e5c6637cfc0e83042169a245837648276e61
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
DCT must be disabled when handling a ReadUnique where the copy
need to be upgraded.
Previously we were just asserting as it was assumed DCT is only enabled
for HNFs (which can "auto-upgrade"). However DCT may also be enabled
for intermediated levels of distributed shared caches above the HNFs.
Change-Id: I9e29142a8d2f59ea61c1d90cda6b00c19435d6b7
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
When an Evict request is received from upstream for a shared line
and the line is no longer cached locally (or on any other upstream
cache), we need to also send an Evict downstream. In this case we need
to wait until our outgoing Evict completes before completing the Evict
from upstream in order be able to resolve race conditions with incoming
snoops. E.g.: while our outgoing Evict is pending we may receive a
snoop requesting data, but we won't be able to complete this snoop if
we have already completed all upstream Evicts and we no longer have the
line.
Change-Id: I23ac4f0a9c4ddd81e2425376c8d1e1c7fb66d107
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Augmenting the DataBlock class with a change log structure to
record the effects of atomic operations on a data block and
service these changes if the atomic operations require return
values.
Although the operations are atomic, the coalescer need not
send unique memory requests for each operation. Atomic
operations within a wavefront to the same address are now
coalesced into a single memory request. The response of this
request carries all the necessary information to provide the
requesting lanes unique values as a result of their individual
atomic operations. This helps reduce contention for request
and response queues in simulation.
Previously, only the final value of the datablock after all
atomic ops to the same address was visible to the requesting
waves. This change corrects this behavior by allowing each wave
to see the effect of this individual atomic op is a return value
is necessary.
Change-Id: I639bea943afd317e45f8fa3bff7689f6b8df9395
When a linux kernel changes a page property, it flushes the related cache
lines. The kernel might change the page property before flushing the
cache lines. This results in the clflush might occur in an uncacheable region.
Currently, an uncacheable request must be a read or a write. However,
clflush request is neither of them.
This change aims to allow clflush requests to work on uncacheable regions.
Since there is no straightforward way to check if a packet is from a clflush
instruction, this change permits all Clean Invalidate Requests, which is
the type of request produced by clflush, to work on uncacheable regions.
Change-Id: Ib3ec01d9281d3dfe565a0ced773ed912edb32b8f
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
When xbar encounters the address error, print out the port trace in the
packet for user to debug if the port trace is enabled.
To gain the packet of the access, the parameter of findPort() function
is changed from AddrRange to PacketPtr.
When running gem5 with "--debug-flags=PortTrace", we can see the full
path of the unexpected access when xbar cannot find the destination of
the address.
This patch changes the data type used for image size from int
to uint64_t. Current version allows initializing AbstractMemory
types with a maximum binary size of 2GiB. This will be limiting
in many studies.
Change-Id: Iea3bbd525d4a67aa7cf606f6311aef66c9b4a52c
1. Add findPort(PacketPtr pkt) for getting the port trace from the Packet.
Keep the findPort(AddrRange addr_range) for recvMemBackdoorReq(...)
2. With the debug flag `PortTrace` enabled, user can see the full path of
the packet with the corresponding address when address error in xbar.
Change-Id: Iaf43ee2d7f8c46b9b84b2bc421a6bc3b02e01b3e
In the memory controller, MemCtrl::MemoryPort::recvFunctional,
when the functional request is satisfied by the ctrl-response queue,
correctly make the packet a response.
This change mirrors AbstractMemory::functionalAccess, which uses
Packet::makeResponse() after satisfying the request.
Change-Id: I47917062d3270915a97eed2c9fade66ba17019eb
Added a GLC atomic latency parameter (glc-atomic-latency) used when
enqueueing response messages regarding atomics directly performed in
the TCC. This latency is added in addition to the L2 response latency
(TCC_latency). This represents the latency of performing an atomic
within the L2.
With this change, the TCC response queue will receive enqueues with
varying latencies as GLC atomic responses will have this added GLC
atomic latency while data responses will not. To accommodate this in
light of the queue having strict FIFO ordering (which would be violated
here), this change also adds an optional parameter bypassStrictFIFO to
the SLICC enqueue function which allows overriding strict FIFO
requirements for individual messages on a case-by-case basis. This
parameter is only being used in the TCC's atomic response enqueue call.
Change-Id: Iabd52cbd2c0cc385c1fb3fe7bcd0cc64bdb40aac
Added support for performing non-SLC-set atomics in the TCC.
Previously, all atomics were being passed on by the TCC to the
directory. With this change, atomics will only be passed on if the SLC
bit is set or if the line isn't present or available in the TCC.
If a non-SLC atomic is passed on to the directory because it is not
present in the TCC, the atomic will be performed on the return path on
the Data event. To accommodate the directory not performing the atomic
in this case, this change also passes the SLC bit on to the directory.
The previously-named "Atomic" action has been renamed to
"AtomicPassOn", with the new "Atomic" corresponding to an atomic
performed directly in the TCC.
Change-Id: Ibf92f71ddceb38bd1b0da70b0a786cc4c3cf2669
This change adds a new file to m5out which is citations.bib.
This file will contain the citations to the papers which describe the
aspects of the gem5 simulator that the simulation uses. In other words,
each simulation configuration could generate a different bib file
referencing different works.
Each SimObject can now have a set of citations associated with it. After
the system is built (in `instantiate`), the citations.bib file is
created by parsing all SimObjects that have been instantiated and taking
the union of their associated citations.
This commit is not meant to add all citations, but to act as an example
for others to add more citations to gem5.
Change-Id: Icd5c46fd9ee44adbeec1fea162657f5716f7e5ef
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Added WIB (Waiting on Writethrough Ack; Will be Bypassed) state which
is transitioned to when a dirty line in the TCC is evicted in a
bypassed read. Previously, we were transitioning to invalid.
While a WI (Waiting on Writethrough Ack) state exists, transitions from
it on WBAck deallocates the TBE, which contains SLC bit information
needed to trigger the Bypass event when the read response from the
directory comes in.
Without this change, WB acknowledgements from the directory in read
bypass evicts (with the SLC bit set) were being treated as if they were
read responses, leading to an invalid transition panic.
Change-Id: I703c3fe8af0366856552bb677810cb1a8f2896de
Prior to this patch, when a memory controller was failing at sending a
response to AbstractController, it would not wakeup until the next
request. This patch gives the opportunity to Ruby models to notify
memory response buffer dequeue so that AbstractController can send a
retry request if necessary.
A dequeueMemRspQueue function has been added AbstractController to
automate the dequeue+notify operation.
Note that models that don't notify AbstractController will continue
working as before.
Change-Id: I261bb4593c126208c98825e54f538638d818d16b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67658
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
TracingExtension contains a stack recording the port names
passed through of the Packet. The target receiving the Packet
can dump out the whole path of this Packet for the debug purpose.
This mechanism can be enabled with the debug flag PortTrace.
Change-Id: Ic11e708b35fdddc4f4b786d91b35fd4def08948c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71538
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
This change updates the HBMCtrl such that both pseudo channels
can be in separate states (read or write) at the same time. In
addition, the controller queues are now always split in two
halves for both pseudo channels.
Change-Id: Ifb599e611ad99f6c511baaf245bad2b5c9210a86
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65491
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This change adds a single DDR5 memory inteface.
A DDR5 DIMM contains two physical channels. Therefore,
two instances of this interface should be used to model
a DDR5 DIMM. The configuration includes 3 different speed
bins models. The configuration is tested with different
types of memory traffic using the traffic generator and shows
performance similar to what is observed in existing
literature [1]. One of the key features of DDR5
"same bank refresh" is yet not supported in gem5, but is
expected to improve the performance of the DDR5 model.
[1] Exploration of DDR5 with the Open-Source Simulator DRAMSys.
Change-Id: I5856a10c8dcd92dbecc7fd4dcea0f674b2412dd7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/68257
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Write queue drain logic seems off currently. An event is scheduled if
the write queue is empty instead of non-empty. There is no check to see
if draining is complete when bus is in write mode. Finally the power
down check on drain always fails if DRAM powerdown is disabled.
This changeset reverses the drain conditional for the write queue to
schedule an event if the write queue is *not* empty and checks in the
event processing method that the queues are all empty so that
signalDrainDone can be called. Lastly the powerdown state is ignored if
DRAM powerdown is disabled. Powerdown is disabled in the GPU_VIPER
protocol by default. This changeset successfully drains and checkpoints
a GPUFS simulation using GPU_VIPER protocol.
Change-Id: I5459856a694c9054b28677049a06b99b9ad91bbb
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69917
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
This change addresses an error in the compiler tests:
https://jenkins.gem5.org/job/compiler-checks/573/
For clang versions 6 through 10, as well as GCC 7,
in order to use the "filesystem" module, you must
include the experimental namespace. In all newer
versions, you can use the "filesystem" module as is.
Because of this, include guards to handle this. They include
"<experimental/filesystem>" for the older clang versions and
the "<filesystem>" for all other versions.
As opposed to checking by version, we now check if the
filesystem library has been defined before using it.
Change-Id: I8fb8d4eaa33f3edc29b7626f44b82ee66ffe72be
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69778
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>