This patch augments the MESI_Three_Level Ruby protocol with hardware
transactional memory support.
The HTM implementation relies on buffering of speculative memory updates.
The core notifies the L0 cache controller that a new transaction has
started and the controller in turn places itself in transactional state
(htmTransactionalState := true).
When operating in transactional state, the usual MESI protocol changes
slightly. Lines loaded or stored are marked as part of a transaction's
read and write set respectively. If there is an invalidation request to
cache line in the read/write set, the transaction is marked as failed.
Similarly, if there is a read request by another core to a speculatively
written cache line, i.e. in the write set, the transaction is marked as
failed. If failed, all subsequent loads and stores from the core are
made benign, i.e. made into NOPS at the cache controller, and responses
are marked to indicate that the transactional state has failed. When the
core receives these marked responses, it generates a HtmFailureFault
with the reason for the transaction failure. Servicing this fault does
two things--
(a) Restores the architectural checkpoint
(b) Sends an HTM abort signal to the cache controller
The restoration includes all registers in the checkpoint as well as the
program counter of the instruction before the transaction started.
The abort signal is sent to the L0 cache controller and resets the
failed transactional state. It resets the transactional read and write
sets and invalidates any speculatively written cache lines. It also
exits the transactional state so that the MESI protocol operates as
usual.
Alternatively, if the instructions within a transaction complete without
triggering a HtmFailureFault, the transaction can be committed. The core
is responsible for notifying the cache controller that the transaction
is complete and the cache controller makes all speculative writes
visible to the rest of the system and exits the transactional state.
Notifting the cache controller is done through HtmCmd Requests which are
a subtype of Load Requests.
KUDOS:
The code is based on a previous pull request by Pradip Vallathol who
developed HTM and TSX support in Gem5 as part of his master’s thesis:
http://reviews.gem5.org/r/2308/index.html
JIRA: https://gem5.atlassian.net/browse/GEM5-587
Change-Id: Icc328df93363486e923b8bd54f4d77741d8f5650
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30319
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
There are race conditions while running several benchmarks, where
the DMA engine and the CorePair simultaneously send requests for the
same block. This patch fixes two scenarios
(a) If the request from the DMA engine arrives before the one from the
CorePair, the directory controller records it as a pending request.
However, once the DMA request is serviced, the directory doesn't check
for pending requests. The CorePair, consequently, never sees a response
to its request and this results in a Deadlock.
Added call to wakeUpDependents in the transition from BDR_Pm to U
Added call to wakeUpDependents in the transition from BDW_P to U
(b) If the request from the CorePair is being serviced by the directory
and the DMA requests for the same block, this causes an invalid
transition because the current coherence doesn't take care of this
scenario.
Added transition state where the requests from DMA are added to the
stall buffer.
Updated B to U CoreUnblock transition to check all buffers, as the DMA
requests were being placed later in the stall buffer than was being checked
Change-Id: I5a76efef97723bc53cf239ea7e112f84fc874ef8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31996
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Adds support for device memories in the system and RubySystem classes.
Devices may register memory ranges with the system class and packets
which originate from the device MasterID will update the device memory
in Ruby. In RubySystem functional access is updated to keep the packets
within the Ruby network they originated from.
Change-Id: I47850df1dc1994485d471ccd9da89e8d88eb0d20
JIRA: https://gem5.atlassian.net/browse/GEM5-470
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29653
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
AbstractController sends requests using a QueuedMasterPort which has an
implicit buffer which is unbounded. Remove this by changing the port to
a MasterPort and implement a retry mechanism for AbstractController.
Although the request remains in the MessageBuffer if a retry is needed,
the additional retry logic optimizes serviceMemoryQueue slightly and
prevents the DRAMCtrl retry stats from being incorrect due to multiple
calls to sendTimingReq.
Change-Id: I8c592af92a1a499a418f34cfee16dd69d84803ad
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28387
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch addresses multiple cases:
- When a controller has read/write permissions while others have read
only permissions, the one with r/w permissions performs the read as
the others may have stale data
- When controllers only have lines with stale or busy access permissions,
a valid copy of the line may be in a message in transit in the network
or in a message buffer (not seen by the controller yet). In this case,
we forward the functional request accordingly.
- Sequencer messages should not accept functional reads
- Functional writes also update the packet data on the sequencer
outstanding request lists and the cpu-side response queue.
Change-Id: I6b0656f1a2b81d41bdcf6c783dfa522a77393981
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/22022
Tested-by: Gem5 Cloud Project GCB service account <345032938727@cloudbuild.gserviceaccount.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: John Alsop <johnathan.alsop@amd.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Calls to queueMemoryRead and queueMemoryWrite do not consider the size
of the queue between ruby directories and DRAMCtrl which causes infinite
buffering in the queued port between the two. This adds a MessageBuffer
in between which uses enqueues in SLICC and is therefore size checked
before any SLICC transaction pushing to the buffer can occur, removing
the infinite buffering between the two.
Change-Id: Iedb9070844e4f6c8532a9c914d126105ec98d0bc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27427
Tested-by: Gem5 Cloud Project GCB service account <345032938727@cloudbuild.gserviceaccount.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Add support in Ruby to use all replacement policies in Classic.
Furthermore, if new replacement policies are added to the
Classic system, the Ruby system will recognize new policies
without any other changes in Ruby system. The following list
all the major changes:
* Make Ruby cache entries (AbstractCacheEntry) inherit from
Classic cache entries (ReplaceableEntry). By doing this,
replacement policies can use cache entries from Ruby caches.
AccessPermission and print function are moved from
AbstractEntry to AbstractCacheEntry, so AbstractEntry is no
longer needed.
* DirectoryMemory and all SLICC files are changed to use
AbstractCacheEntry as their cache entry interface. So do the
python files in mem/slicc/ast which check the entry
interface.
* "main='false'" argument is added to the protocol files where
the DirectoryEntry is defined. This change helps
differentiate DirectoryEntry from CacheEntry because they are
both the instances of AbstractCacheEntry now.
* Use BaseReplacementPolicy in Ruby caches instead of
AbstractReplacementPolicy so that Ruby caches will recognize
the replacement policies from Classic.
* Add getLastAccess() and useOccupancy() function to Classic
system so that Ruby caches can use them. Move lastTouchTick
to ReplacementData struct because it's needed by
getLastAccess() to return the correct value.
* Add a 2-dimensional array of ReplacementData in Ruby caches
to store information for different replacement policies. Note
that, unlike Classic caches, where policy information is
stored in cache entries, the policy information needs to be
stored in a new 2-dimensional array. This is due to Ruby
caches deleting the cache entry every time the corresponding
cache line get evicted.
Change-Id: Idff6fdd2102a552c103e9d5f31f779aae052943f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20879
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Removed the icache/dcache hit latency parameters from the Sequencer.
They were replaced by the mandatory queue enqueue latency that is now
defined by the top-level cache controller. By default, the latency is
defined by the mandatory_queue_latency parameter. When the latency
depends on specific protocol states or on the request type, the protocol
may override the mandatoryQueueLatency function.
Change-Id: I72e57a7ea49501ef81dc7f591bef14134274647c
Signed-off-by: Tiago Muck <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18413
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
MemObject doesn't provide anything beyond its base ClockedObject any
more, so this change removes it from most inheritance hierarchies.
Occasionally MemObject is replaced with SimObject when I was fairly
confident that the extra functionality of ClockedObject wasn't needed.
Change-Id: Ic014ab61e56402e62548e8c831eb16e26523fdce
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18289
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Gabe Black <gabeblack@google.com>
The importer in Python 3 doesn't like the way we import SimObjects
from the global namespace. Convert the existing SimObject declarations
to import from m5.objects. As a side-effect, this makes these files
consistent with configuration files.
Change-Id: I11153502b430822130722839e1fa767b82a027aa
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/15981
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
A packet queue is typically used to hold on to packets that are
schedules to be sent in the future or when they need to queue behind
younger packets that have been sent out yet. Due to memory order
requirements, some MemObjects need to maintain the order for packet
(mostly responses) that reference the same cache block.
Prior to this patch the ordering requirements where determined when
the packet was scheduled to be sent. This patch moves the parameter to
the constructor.
Change-Id: Ieb4d94e86bc7514f5036b313ec23ea47dd653164
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/15555
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Packet::checkFunctional also wrote data to/from the packet depending
on if it was read/write, respectively, which the 'check' in the name
would suggest otherwise. This renames it to doFunctional, which is
more suggestive. It also renames any function called checkFunctional
which calls Packet::checkFunctional. These are
- Bridge::BridgeMasterPort::checkFunctional
- calls Packet::checkFunctional
- MSHR::checkFunctional
- calls Packet::checkFunctional
- MSHR::TargetList::checkFunctional
- calls Packet::checkFunctional
- Queue<>::checkFunctional
(of src/mem/cache/queue.hh, not src/cpu/minor/buffers.h)
- Instantiated with Queue<WriteQueueEntry> and Queue<MSHR>
- WriteQueueEntry
- calls Packet::checkFunctional
- WriteQueueEntry::TargetList
- calls Packet::checkFunctional
- MemDelay::checkFunctional
- calls QueuedSlavePort/QueuedMasterPort::checkFunctional
- Packet::checkFunctional
- PacketQueue::checkFunctional
- calls Packet::checkFunctional
- QueuedSlavePort::checkFunctional
- calls PacketQueue::doFunctional
- QueuedMasterPort::checkFunctional
- calls PacketQueue::doFunctional
- SerialLink::SerialLinkMasterPort::checkFunctional
- calls Packet::doFunctional
Change-Id: Ieca2579c020c329040da053ba8e25820801b62c5
Reviewed-on: https://gem5-review.googlesource.com/11810
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
This patch is changing the underlying type for RequestPtr from Request*
to shared_ptr<Request>. Having memory requests being managed by smart
pointers will simplify the code; it will also prevent memory leakage and
dangling pointers.
Change-Id: I7749af38a11ac8eb4d53d8df1252951e0890fde3
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/10996
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
With this patch a gem5 System will store more info about its Masters.
While it was previously keeping track of the Master name and Master ID
only, it is now adding a per-Master pointer to the SimObject related to
the Master.
This will make it possible for a client to query a System for a Master
using either the master's name or the master's pointer.
Change-Id: I8b97d328a65cd06f329e2cdd3679451c17d2b8f6
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/9781
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Standardize all header guards in the mem directory according to the most
frequent patterns. In general they have the form:
mem: __FOLDER_TREE_FILE_NAME_HH__
ruby: __FOLDER_TREE_FILENAME_HH__
Change-Id: I983853e292deb302becf151bf0e970057dc24774
Reviewed-on: https://gem5-review.googlesource.com/7881
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Ruby has no support for atomic_noncaching accesses, which prevents using
it with kvm-cpu. This patch fixes this by directly forwarding atomic
requests from the ruby port/sequencer to the corresponding directory
based on the destination address of the packet.
Change-Id: I0b4928bfda44fd9e5e48583c51d1ea422800da2d
Reviewed-on: https://gem5-review.googlesource.com/5601
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
These files aren't a collection of miscellaneous stuff, they're the
definition of the Logger interface, and a few utility macros for
calling into that interface (panic, warn, etc.).
Change-Id: I84267ac3f45896a83c0ef027f8f19c5e9a5667d1
Reviewed-on: https://gem5-review.googlesource.com/6226
Reviewed-by: Brandon Potter <Brandon.Potter@amd.com>
Maintainer: Gabe Black <gabeblack@google.com>
Previously the directory covered a flat address range that always
started from address 0. This change adds a vector of address ranges
with interleaving and hashing that each directory keeps track of and
the necessary flexibility to support systems with non continuous
memory ranges.
Change-Id: I6ea1c629bdf4c5137b7d9c89dbaf6c826adfd977
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/2903
Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Fixing an issue with regStats not calling the parent class method
for most SimObjects in Gem5. This causes issues if one adds new
stats in the base class (since they are never initialized properly!).
Change-Id: Iebc5aa66f58816ef4295dc8e48a357558d76a77c
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Allow usage of packet class in ruby for convenience purposes. This may be
used to access members of the packet/request class (e.g., via helper
functions) and/or push protocol specific information to the packets
SenderState without needing to modify SLICC types and protocols in multiple
locations.
Ruby's controller block_on behavior aimed to block MessageBuffer requests into
SLICC controllers when a Locked_RMW was in flight. Unfortunately, this
functionality only partially works: When non-Locked_RMW memory accesses are
issued to the sequencer to an address with an in-flight Locked_RMW, the
sequencer may pass those accesses through to the controller. At the controller,
a number of incorrect activities can occur depending on the protocol. In
MOESI_hammer, for example, an intermediate IFETCH will cause an L1D to L2
transfer, which cannot be serviced, because the block_on functionality blocks
the trigger queue, resulting in a deadlock. Further, if an intermediate store
arrives (e.g. from a separate SMT thread), the sequencer allows the request
through to the controller, and the atomicity of the Locked_RMW may be broken.
To avoid these problems, disallow the Sequencer from passing any memory
accesses to the controller besides Locked_RMW_Write when a Locked_RMW is in-
flight.
Add 4 power states to the ClockedObject, provides necessary access functions
to check and update the power state. Default power state is UNDEFINED, it is
responsibility of the respective simulation model to provide the startup state
and any other logic for state change.
Add number of transition stat.
Add distribution of time spent in clock gated state.
Add power state residency stat.
Add dump call back function to allow stats update of distribution and residency
stats.
Recent changes to memory access queuing allocate requests for packets sent to
memory controllers, but did not free the requests. Delete them to avoid leaks.
Changeset 4872dbdea907 replaced Address by Addr, but did not make changes to
print statements. So the addresses which were being printed in hex earlier
along with their line address, were now being printed in decimals. This patch
adds a function printAddress(Addr) that can be used to print the address in hex
along with the lines address. This function has been put to use in some of the
places. At other places, change has been made to print just the address in
hex.
This patch changes MessageBuffer and TimerTable, two structures used for
buffering messages by components in ruby. These structures would no longer
maintain pointers to clock objects. Functions in these structures have been
changed to take as input current time in Tick. Similarly, these structures
will not operate on Cycle valued latencies for different operations. The
corresponding functions would need to be provided with these latencies by
components invoking the relevant functions. These latencies should also be
in Ticks.
I felt the need for these changes while trying to speed up ruby. The ultimate
aim is to eliminate Consumer class and replace it with an EventManager object in
the MessageBuffer and TimerTable classes. This object would be used for
scheduling events. The event itself would contain information on the object and
function to be invoked.
In hindsight, it seems I should have done this while I was moving away from use
of a single global clock in the memory system. That change led to introduction
of clock objects that replaced the global clock object. It never crossed my
mind that having clock object pointers is not a good design. And now I really
don't like the fact that we have separate consumer, receiver and sender
pointers in message buffers.