There is a flow of packets as so:
WriteResp -> WriteReq -> WriteCompleteResp
These packets share some variables, in particular senderState and a
status vector.
One issue was the WriteResp packet decremented the status vector, which
was used by the WriteCompleteResp packets to determine when to handle
the global memory response. This could lead to multiple
WriteCompleteResp packets attempting to handle the global memory
response.
Because of that, the WriteCompleteResp packets needed to handle the
status vector. this patch moves WriteCompleteResp packet handling back
into ComputeUnit::DataPort::processMemRespEvent from
ComputeUnit::DataPort::recvTimingResp. This helps remove some redundant
code.
This patch has the WriteResp packet return without doing any status
vector handling, and without deleting the senderState, which had
previously caused a segfault.
Another issue was WriteCompleteResp packets weren't being issued for
each active lane, as the coalesced request was being issued too early.
In order to fix that, we have to ensure every active lane puts their
request into their applicable coalesced request before issuing the
coalesced request. Because of that change, we change the issuing of
CoalescedRequests from GPUCoalescer::coalescePacket to
GPUCoalescer::completeIssue.
That change involves adding a new variable to store the
CoalescedRequests that are created in the calls to coalescePacket. This
variable is a map from instruction sequence number to coalesced
requests.
Additionally, the WriteCompleteResp packet was attempting to access
physical memory in hitCallback while not having any data, which
caused a crash. This can be resolved either by not allowing
WriteCompleteResp packets to access memory, or by copying the data
from the WriteReq packet. This patch denies WriteCompleteResp packets
memory access in hitCallback.
Finally, in VIPERCoalescer::writeCompleteCallback there was a map
that held the WriteComplete packets, but no packets were ever being
removed. This patch removes packets that match the address that was
passed in to the function.
Change-Id: I9a064a0def2bf6c513f5295596c56b1b652b0ca4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33656
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The create() method on Params structs usually instantiate SimObjects
using a constructor which takes the Params struct as a parameter
somehow. There has been a lot of needless variation in how that was
done, making it annoying to pass Params down to base classes. Some of
the different forms were:
const Params &
Params &
Params *
const Params *
Params const*
This change goes through and fixes up every constructor and every
create() method to use the const Params & form. We use a reference
because the Params struct should never be null. We use const because
neither the create method nor the consuming object should modify the
record of the parameters as they came in from the config. That would
make consuming them not idempotent, and make it impossible to tell what
the actual simulation configuration was since it would change from any
user visible form (config script, config.ini, dot pdf output).
Change-Id: I77453cba52fdcfd5f4eec92dfb0bddb5a9945f31
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35938
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This change replaces the __attribute__ syntax with the now standard [[]]
syntax. It also reorganizes compiler.hh so that all special macros have
some explanatory text saying what they do, and each attribute which has a
standard version can use that if available and what version of c++ it's
standard in is put in a comment.
Also, the requirements as far as where you put [[]] style attributes are
a little more strict than the old school __attribute__ style. The use of
the attribute macros was updated to fit these new, more strict
requirements.
Change-Id: Iace44306a534111f1c38b9856dc9e88cd9b49d2a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35219
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch augments the MESI_Three_Level Ruby protocol with hardware
transactional memory support.
The HTM implementation relies on buffering of speculative memory updates.
The core notifies the L0 cache controller that a new transaction has
started and the controller in turn places itself in transactional state
(htmTransactionalState := true).
When operating in transactional state, the usual MESI protocol changes
slightly. Lines loaded or stored are marked as part of a transaction's
read and write set respectively. If there is an invalidation request to
cache line in the read/write set, the transaction is marked as failed.
Similarly, if there is a read request by another core to a speculatively
written cache line, i.e. in the write set, the transaction is marked as
failed. If failed, all subsequent loads and stores from the core are
made benign, i.e. made into NOPS at the cache controller, and responses
are marked to indicate that the transactional state has failed. When the
core receives these marked responses, it generates a HtmFailureFault
with the reason for the transaction failure. Servicing this fault does
two things--
(a) Restores the architectural checkpoint
(b) Sends an HTM abort signal to the cache controller
The restoration includes all registers in the checkpoint as well as the
program counter of the instruction before the transaction started.
The abort signal is sent to the L0 cache controller and resets the
failed transactional state. It resets the transactional read and write
sets and invalidates any speculatively written cache lines. It also
exits the transactional state so that the MESI protocol operates as
usual.
Alternatively, if the instructions within a transaction complete without
triggering a HtmFailureFault, the transaction can be committed. The core
is responsible for notifying the cache controller that the transaction
is complete and the cache controller makes all speculative writes
visible to the rest of the system and exits the transactional state.
Notifting the cache controller is done through HtmCmd Requests which are
a subtype of Load Requests.
KUDOS:
The code is based on a previous pull request by Pradip Vallathol who
developed HTM and TSX support in Gem5 as part of his master’s thesis:
http://reviews.gem5.org/r/2308/index.html
JIRA: https://gem5.atlassian.net/browse/GEM5-587
Change-Id: Icc328df93363486e923b8bd54f4d77741d8f5650
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30319
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Adds support for device memories in the system and RubySystem classes.
Devices may register memory ranges with the system class and packets
which originate from the device MasterID will update the device memory
in Ruby. In RubySystem functional access is updated to keep the packets
within the Ruby network they originated from.
Change-Id: I47850df1dc1994485d471ccd9da89e8d88eb0d20
JIRA: https://gem5.atlassian.net/browse/GEM5-470
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29653
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Previously, with HSAIL, we were guaranteed by the HSA specification
that the GPU will never issue unaligned accesses. However, now
that we are directly running GCN this is no longer true.
Accordingly, this commit adds support for unaligned accesses.
Moreover, to reduce the replication of nearly identical
code for the different request types, I also added new helper
functions that are called by all the different memory request
producing instruction types in op_encodings.hh.
Adding support for unaligned instructions requires changing
the statusBitVector used to track the status of the memory
requests for each lane from a bit per lane to an int per lane.
This is necessary because an unaligned access may span multiple
cache lines. In the worst case, each lane may span multiple
cache lines. There are corresponding changes in the files that
use the statusBitVector.
Change-Id: I319bf2f0f644083e98ca546d2bfe68cf87a5f967
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29920
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch addresses multiple cases:
- When a controller has read/write permissions while others have read
only permissions, the one with r/w permissions performs the read as
the others may have stale data
- When controllers only have lines with stale or busy access permissions,
a valid copy of the line may be in a message in transit in the network
or in a message buffer (not seen by the controller yet). In this case,
we forward the functional request accordingly.
- Sequencer messages should not accept functional reads
- Functional writes also update the packet data on the sequencer
outstanding request lists and the cpu-side response queue.
Change-Id: I6b0656f1a2b81d41bdcf6c783dfa522a77393981
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/22022
Tested-by: Gem5 Cloud Project GCB service account <345032938727@cloudbuild.gserviceaccount.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: John Alsop <johnathan.alsop@amd.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
While the up-to-date data may reside in any agent of Ruby's memory
hierarchy, there's an optional backing store in Ruby that provides
a 'correct' view of the physical memory. When it is enabled by the
user, every Ruby memory access will update this global memory view
as well upon finishing.
The issue is that Ruby's atomic access, used in fast-forward, does
not currently access the backing store, leading to data
incorrectness. More specifically, at the very beginning stage of the
simulation, a loader loads the program into the backing store using
functional accesses. Then the program starts execution with
fast-forward enabled, using atomic accesses for faster simulation. But
because atomic access only accesses the real memory hierarchy, the CPU
fetches incorrect instructions.
The fix is simple. Just make Ruby's atomic access update the backing
store as well as the real physical memory.
Change-Id: I2541d923e18ea488d383097ca7abd4124e47e59b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/26343
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Onur Kayıran <onur.kayiran@amd.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Tested-by: kokoro <noreply+kokoro@google.com>
MemObject doesn't provide anything beyond its base ClockedObject any
more, so this change removes it from most inheritance hierarchies.
Occasionally MemObject is replaced with SimObject when I was fairly
confident that the extra functionality of ClockedObject wasn't needed.
Change-Id: Ic014ab61e56402e62548e8c831eb16e26523fdce
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18289
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Gabe Black <gabeblack@google.com>
This patch is changing the underlying type for RequestPtr from Request*
to shared_ptr<Request>. Having memory requests being managed by smart
pointers will simplify the code; it will also prevent memory leakage and
dangling pointers.
Change-Id: I7749af38a11ac8eb4d53d8df1252951e0890fde3
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/10996
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Ruby has no support for atomic_noncaching accesses, which prevents using
it with kvm-cpu. This patch fixes this by directly forwarding atomic
requests from the ruby port/sequencer to the corresponding directory
based on the destination address of the packet.
Change-Id: I0b4928bfda44fd9e5e48583c51d1ea422800da2d
Reviewed-on: https://gem5-review.googlesource.com/5601
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Ruby has no support for cache maintenace operations. As a workaround,
after printing a warning, we treat them as no-ops in the memory system
and respond immediately without handling them. There should be
workarounds in the memory system already that allow execution to
proceed without the requirement for cache maintenance operations.
Change-Id: I125ee4fa37b674c636d87f2d9205bbc1a74da101
Reviewed-on: https://gem5-review.googlesource.com/5057
Reviewed-by: Jieming Yin <bjm419@gmail.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
There are cases where we want to put boot ROMs on the PIO bus. Ruby
currently doesn't support functional accesses to such memories since
functional accesses are always assumed to go to physical memory. Add
the required support for routing functional accesses to the PIO bus.
Change-Id: Ia5b0fcbe87b9642bfd6ff98a55f71909d1a804e3
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Brad Beckmann <brad.beckmann@amd.com>
Reviewed-by: Michael LeBeane <michael.lebeane@amd.com>
This patch changes the name of a bunch of packet flags and MSHR member
functions and variables to make the coherency protocol easier to
understand. In addition the patch adds and updates lots of
descriptions, explicitly spelling out assumptions.
The following name changes are made:
* the packet memInhibit flag is renamed to cacheResponding
* the packet sharedAsserted flag is renamed to hasSharers
* the packet NeedsExclusive attribute is renamed to NeedsWritable
* the packet isSupplyExclusive is renamed responderHadWritable
* the MSHR pendingDirty is renamed to pendingModified
The cache states, Modified, Owned, Exclusive, Shared are also called
out in the cache and MSHR code to make it easier to understand.
This patch allows the ruby random tester to use ruby ports that may only
support instr or data requests. This patch is similar to a previous changeset
(8932:1b2c17565ac8) that was unfortunately broken by subsequent changesets.
This current patch implements the support in a more straight-forward way.
Since retries are now tested when running the ruby random tester, this patch
splits up the retry and drain check behavior so that RubyPort children, such
as the GPUCoalescer, can perform those operations correctly without having to
duplicate code. Finally, the patch also includes better DPRINTFs for
debugging the tester.
In RubyPort::ruby_eviction_callback, prior changes fixed a memory leak caused
by instantiating separate packets for each port that the eviction was forwarded
to. That change, however, left the instantiated request to also leak. Allocate
it on the stack to avoid the leak.
This patch changes MessageBuffer and TimerTable, two structures used for
buffering messages by components in ruby. These structures would no longer
maintain pointers to clock objects. Functions in these structures have been
changed to take as input current time in Tick. Similarly, these structures
will not operate on Cycle valued latencies for different operations. The
corresponding functions would need to be provided with these latencies by
components invoking the relevant functions. These latencies should also be
in Ticks.
I felt the need for these changes while trying to speed up ruby. The ultimate
aim is to eliminate Consumer class and replace it with an EventManager object in
the MessageBuffer and TimerTable classes. This object would be used for
scheduling events. The event itself would contain information on the object and
function to be invoked.
In hindsight, it seems I should have done this while I was moving away from use
of a single global clock in the memory system. That change led to introduction
of clock objects that replaced the global clock object. It never crossed my
mind that having clock object pointers is not a good design. And now I really
don't like the fact that we have separate consumer, receiver and sender
pointers in message buffers.
This patch eliminates the type Address defined by the ruby memory system.
This memory system would now use the type Addr that is in use by the
rest of the system.
This patch was created by Bihn Pham during his internship at AMD.
There is no need to delay hit callback response messages by a cycle because
the response latency is already incurred in the Ruby protocol. This ensures
correct timing of memory instructions.
This is another step in the process of removing global variables
from Ruby to enable multiple RubySystem instances in a single simulation.
With possibly multiple RubySystem objects, we can no longer use a global
variable to find "the" RubySystem object. Instead, each Ruby component
has to carry a pointer to the RubySystem object to which it belongs.
The drain() call currently passes around a DrainManager pointer, which
is now completely pointless since there is only ever one global
DrainManager in the system. It also contains vestiges from the time
when SimObjects had to keep track of their child objects that needed
draining.
This changeset moves all of the DrainState handling to the Drainable
base class and changes the drain() and drainResume() calls to reflect
this. Particularly, the drain() call has been updated to take no
parameters (the DrainManager argument isn't needed) and return a
DrainState instead of an unsigned integer (there is no point returning
anything other than 0 or 1 any more). Drainable objects should return
either DrainState::Draining (equivalent to returning 1 in the old
system) if they need more time to drain or DrainState::Drained
(equivalent to returning 0 in the old system) if they are already in a
consistent state. Returning DrainState::Running is considered an
error.
Drain done signalling is now done through the signalDrainDone() method
in the Drainable class instead of using the DrainManager directly. The
new call checks if the state of the object is DrainState::Draining
before notifying the drain manager. This means that it is safe to call
signalDrainDone() without first checking if the simulator has
requested draining. The intention here is to reduce the code needed to
implement draining in simple objects.
Draining is currently done by traversing the SimObject graph and
calling drain()/drainResume() on the SimObjects. This is not ideal
when non-SimObjects (e.g., ports) need draining since this means that
SimObjects owning those objects need to be aware of this.
This changeset moves the responsibility for finding objects that need
draining from SimObjects and the Python-side of the simulator to the
DrainManager. The DrainManager now maintains a set of all objects that
need draining. To reduce the overhead in classes owning non-SimObjects
that need draining, objects inheriting from Drainable now
automatically register with the DrainManager. If such an object is
destroyed, it is automatically unregistered. This means that drain()
and drainResume() should never be called directly on a Drainable
object.
While implementing the new functionality, the DrainManager has now
been made thread safe. In practice, this means that it takes a lock
whenever it manipulates the set of Drainable objects since SimObjects
in different threads may create Drainable objects
dynamically. Similarly, the drain counter is now an atomic_uint, which
ensures that it is manipulated correctly when objects signal that they
are done draining.
A nice side effect of these changes is that it makes the drain state
changes stricter, which the simulation scripts can exploit to avoid
redundant drains.
The drain state enum is currently a part of the Drainable
interface. The same state machine will be used by the DrainManager to
identify the global state of the simulator. Make the drain state a
global typed enum to better cater for this usage scenario.
WriteInvalidateReq ensures that a whole-line write does not incur the
cost of first doing a read exclusive, only to later overwrite the
data. This patch splits the existing WriteInvalidateReq into a
WriteLineReq, which is done locally, and an InvalidateReq that is sent
out throughout the memory system. The WriteLineReq re-uses the normal
WriteResp.
The change allows us to better express the difference between the
cache that is performing the write, and the ones that are merely
invalidating. As a consequence, we no longer have to rely on the
isTopLevel flag. Moreover, the actual memory in the system does not
see the intitial write, only the writeback. We were marking the
written line as dirty already, so there is really no need to also push
the write all the way to the memory.
The overall flow of the write-invalidate operation remains the same,
i.e. the operation is only carried out once the response for the
invalidate comes back. This patch adds the InvalidateResp for this
very reason.
This patch fixes a long-standing isue with the port flow
control. Before this patch the retry mechanism was shared between all
different packet classes. As a result, a snoop response could get
stuck behind a request waiting for a retry, even if the send/recv
functions were split. This caused message-dependent deadlocks in
stress-test scenarios.
The patch splits the retry into one per packet (message) class. Thus,
sendTimingReq has a corresponding recvReqRetry, sendTimingResp has
recvRespRetry etc. Most of the changes to the code involve simply
clarifying what type of request a specific object was accepting.
The biggest change in functionality is in the cache downstream packet
queue, facing the memory. This queue was shared by requests and snoop
responses, and it is now split into two queues, each with their own
flow control, but the same physical MasterPort. These changes fixes
the previously seen deadlocks.
Previously, the user would have to manually set access_backing_store=True
on all RubyPorts (Sequencers) in the config files.
Now, instead there is one global option that each RubyPort checks on
initialization.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
This patch aligns how the response routing is done in the RubyPort,
using the SenderState for both memory and I/O accesses. Before this
patch, only the I/O used the SenderState, whereas the memory accesses
relied on the src field in the packet. With this patch we shift to
using SenderState in both cases, thus not relying on the src field any
longer.
Ruby's functional accesses are not guaranteed to succeed as of now. While
this is not a problem for the protocols that are currently in the mainline
repo, it seems that coherence protocols for gpus rely on a backing store to
supply the correct data. The aim of this patch is to make this backing store
configurable i.e. it comes into play only when a particular option:
--access-backing-store is invoked.
The backing store has been there since M5 and GEMS were integrated. The only
difference is that earlier the system used to maintain the backing store and
ruby's copy was write-only. Sometime last year, we moved to data being
supplied supplied by ruby in SE mode simulations. And now we have patches on
the reviewboard, which remove ruby's copy of memory altogether and rely
completely on the system's memory to supply data. This patch adds back a
SimpleMemory member to RubySystem. This member is used only if the option:
access-backing-store is set to true. By default, the memory would not be
accessed.
This patch makes the memory system ISA-agnostic by enabling the Ruby
Sequencer to dynamically determine if it has to do a store check. To
enable this check, the ISA is encoded as an enum, and the system
is able to provide the ISA to the Sequencer at run time.
--HG--
rename : src/arch/x86/insts/microldstop.hh => src/arch/x86/ldstflags.hh
Piobus was recently added to se scripts for ruby so that the interrupt
controller can be connected to something (required since the interrupt
controller sends address range messages). This patch removes the piobus
and instead, the pio port of ruby port will now ignore the range change
messages in se mode.