This change replaces the __attribute__ syntax with the now standard [[]]
syntax. It also reorganizes compiler.hh so that all special macros have
some explanatory text saying what they do, and each attribute which has a
standard version can use that if available and what version of c++ it's
standard in is put in a comment.
Also, the requirements as far as where you put [[]] style attributes are
a little more strict than the old school __attribute__ style. The use of
the attribute macros was updated to fit these new, more strict
requirements.
Change-Id: Iace44306a534111f1c38b9856dc9e88cd9b49d2a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35219
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
These were including instruction class definitions from x86 for some
reason. There was no code in those .cc files which actually used
anything from them, as evidenced by the fact that the GCN3_X86 build
still works. No other code in the file was conditionally compiled as of
today.
Change-Id: I3cef8348fb601dd7af67665cf64bbf514c91c3db
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34577
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
This version of garnet includes HeteroGarnet which
supports heterogenous interconnect systems, flexible
router and link configurations, and better debugging
resources.
This patch changes the garnet directory structure
to not include the version number. The user will be
informed about the garnet version being used.
Change-Id: Id4763421528305193ae0cd10c159b385a9513553
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34259
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Initializing the network bridge with NULL causes it to have
an class error when instatiating a link. The bridge is only
needed whne either a CDC or SerDes is enabled. This is handled
later during construction of the GarnetLink.
Change-Id: If19a21a6d9bf49449b9c390467d08d3422ae991a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34257
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch augments the MESI_Three_Level Ruby protocol with hardware
transactional memory support.
The HTM implementation relies on buffering of speculative memory updates.
The core notifies the L0 cache controller that a new transaction has
started and the controller in turn places itself in transactional state
(htmTransactionalState := true).
When operating in transactional state, the usual MESI protocol changes
slightly. Lines loaded or stored are marked as part of a transaction's
read and write set respectively. If there is an invalidation request to
cache line in the read/write set, the transaction is marked as failed.
Similarly, if there is a read request by another core to a speculatively
written cache line, i.e. in the write set, the transaction is marked as
failed. If failed, all subsequent loads and stores from the core are
made benign, i.e. made into NOPS at the cache controller, and responses
are marked to indicate that the transactional state has failed. When the
core receives these marked responses, it generates a HtmFailureFault
with the reason for the transaction failure. Servicing this fault does
two things--
(a) Restores the architectural checkpoint
(b) Sends an HTM abort signal to the cache controller
The restoration includes all registers in the checkpoint as well as the
program counter of the instruction before the transaction started.
The abort signal is sent to the L0 cache controller and resets the
failed transactional state. It resets the transactional read and write
sets and invalidates any speculatively written cache lines. It also
exits the transactional state so that the MESI protocol operates as
usual.
Alternatively, if the instructions within a transaction complete without
triggering a HtmFailureFault, the transaction can be committed. The core
is responsible for notifying the cache controller that the transaction
is complete and the cache controller makes all speculative writes
visible to the rest of the system and exits the transactional state.
Notifting the cache controller is done through HtmCmd Requests which are
a subtype of Load Requests.
KUDOS:
The code is based on a previous pull request by Pradip Vallathol who
developed HTM and TSX support in Gem5 as part of his master’s thesis:
http://reviews.gem5.org/r/2308/index.html
JIRA: https://gem5.atlassian.net/browse/GEM5-587
Change-Id: Icc328df93363486e923b8bd54f4d77741d8f5650
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30319
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently there are independent round robin arbiter at each
input port and output port. Every time a VC is selected for
output allocation round robin is incremented irrespective of
if it is selected by its output port or not. This leads to
unfair arbitration at input port and is well known[1]. This
patch fixes it to increment only if the output port also
selects it.
[1] D. U. Becker and W. J. Dally, "Allocator implementations
for network-on-chip routers," Proceedings of the Conference
on High Performance Computing Networking, Storage and
Analysis, Portland, OR, 2009, pp. 1-12
Change-Id: I65963fb8082c51c0e3c6e031a8b87b4f5c3626e1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32601
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Currently the Switch Allocator takes up most of the simulation
wall clock time. This function checks for all VCs to see if it
should wakeup next. The input units which are simulated before
the switch allocator could have scheduled it already. This patch
adds a check for it.
Change-Id: I8609d4e7f925aa5e97198f6cd07466530f6fcf4c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32600
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
This change allows configuring each router with a certain number
of VCs for each VNET. This is beneficial when dealing with
heterogenous link widths in a system. Configuring VCs
for each router allows one to ensure equal throughput
within the network while avoiding head-of-line blocking.
Changing a router's VCs number can be done in topology files
using the vcs_per_vnet value argument of router.
Change-Id: Icf4f510248128429a1a11f19f9802ee96f340611
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32599
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This upgrades the garnet model to support HeteroGarnet
1) Static and dynamic multi-freq domains in network
2) Support for CDC
3) Separate links for each message class
4) Separate linkwidth for each message class
5) Support for SerDes
Change-Id: I6d00e3b5cb3745e849d221066cb46b2138c47871
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32597
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
There are race conditions while running several benchmarks, where
the DMA engine and the CorePair simultaneously send requests for the
same block. This patch fixes two scenarios
(a) If the request from the DMA engine arrives before the one from the
CorePair, the directory controller records it as a pending request.
However, once the DMA request is serviced, the directory doesn't check
for pending requests. The CorePair, consequently, never sees a response
to its request and this results in a Deadlock.
Added call to wakeUpDependents in the transition from BDR_Pm to U
Added call to wakeUpDependents in the transition from BDW_P to U
(b) If the request from the CorePair is being serviced by the directory
and the DMA requests for the same block, this causes an invalid
transition because the current coherence doesn't take care of this
scenario.
Added transition state where the requests from DMA are added to the
stall buffer.
Updated B to U CoreUnblock transition to check all buffers, as the DMA
requests were being placed later in the stall buffer than was being checked
Change-Id: I5a76efef97723bc53cf239ea7e112f84fc874ef8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31996
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This changeset adds the necessary changes for running
GCN3 ISA with VIPER in apu_se.py.
Changes to the VIPER protocol configs are made to add support
for DMA and scalar caches.
hsaTopology is added to help the pseudo FS create the files
needed by ROCm to understand the device on which the SW is
being run.
Change-Id: I0f47a6a36bb241a26972c0faafafcf332a7d7d1f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30274
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Adding getter and setter methods for getting and setting the atomic ops
in the WriteMask class. This allows for message types with WriteMasks to
get or set the atomic ops without explicitly modifying the constructor
for the message type. This will beused by the DMASequencer which uses the
SequencerMsg type where the constructor is auto generated via SLICC.
Change-Id: I71787d294c1b89547618e9a13e386b65bb3e1021
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31474
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The RubyPrefetcher uses makeNextStrideAddress() with
a negative stride to find prefetched address.
The type of this expression is:
uint64_t + uint32_t * int;
This gives wrong result due to implicit conversion.
Fix this with static cast and it works correctly:
uint64_t + int * int;
Change-Id: I36e17e00d5c66c3699fe1d5b287971225a162d04
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31314
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Adds support for device memories in the system and RubySystem classes.
Devices may register memory ranges with the system class and packets
which originate from the device MasterID will update the device memory
in Ruby. In RubySystem functional access is updated to keep the packets
within the Ruby network they originated from.
Change-Id: I47850df1dc1994485d471ccd9da89e8d88eb0d20
JIRA: https://gem5.atlassian.net/browse/GEM5-470
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29653
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Previously, with HSAIL, we were guaranteed by the HSA specification
that the GPU will never issue unaligned accesses. However, now
that we are directly running GCN this is no longer true.
Accordingly, this commit adds support for unaligned accesses.
Moreover, to reduce the replication of nearly identical
code for the different request types, I also added new helper
functions that are called by all the different memory request
producing instruction types in op_encodings.hh.
Adding support for unaligned instructions requires changing
the statusBitVector used to track the status of the memory
requests for each lane from a bit per lane to an int per lane.
This is necessary because an unaligned access may span multiple
cache lines. In the worst case, each lane may span multiple
cache lines. There are corresponding changes in the files that
use the statusBitVector.
Change-Id: I319bf2f0f644083e98ca546d2bfe68cf87a5f967
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29920
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
makeLineAddress function uses m_block_size_bits to create
masked addresses. m_block_size_bits is used to specify
cache, directory, and memory controller interleaving,
and it can be larger than the cache line size.
To generate addresses that can align with the cache line
rather than the interleaving granularity, a version of
makeLineAddress is created to specify bits that need to
be masked.
Change-Id: I06deec4949da7fa46f1d6f7575334f18ee61c786
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28135
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Onur Kayıran <onur.kayiran@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
This patch enables cache controllers to make response
messages in advance, store them in a per-address saved
map in an output message buffer and enqueue them altogether
in the future. This patch introduces new slicc statement
called defer_enqueueing. This patch would help simplify
the logic of state machines that deal with coalesing
multiple requests from different requestors.
Change-Id: I566d4004498b367764238bb251260483c5a1a5e5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28132
Reviewed-by: Tuan Ta <qtt2@cornell.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Add support for multiple networks per RubySystem. This is done by
introducing local IDs to each network and translating from a global ID
passed around through Ruby and SLICC code. The local IDs represents the
NodeID of a MachineType in the network and are ordered the same way
that NodeIDs are ordered using MachineType_base_number. If there are
not multiple networks in a RubySystem the local and global IDs are the
same value.
This is useful in cases where multiple isolated networks are needed to
support devices with Ruby caches which do not interact with other
networks. For example, a dGPU device will have a cache hierarchy that
will not interact with the CPU cache hierachy.
Change-Id: I33a917b3a394eec84b16fbf001c3c2c44c047f66
JIRA: https://gem5.atlassian.net/browse/GEM5-445
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27927
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
AbstractController sends requests using a QueuedMasterPort which has an
implicit buffer which is unbounded. Remove this by changing the port to
a MasterPort and implement a retry mechanism for AbstractController.
Although the request remains in the MessageBuffer if a retry is needed,
the additional retry logic optimizes serviceMemoryQueue slightly and
prevents the DRAMCtrl retry stats from being incorrect due to multiple
calls to sendTimingReq.
Change-Id: I8c592af92a1a499a418f34cfee16dd69d84803ad
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28387
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Remove the read/write tables and coalescing table and introduce a two
levels of tables for uncoalesced and coalesced packets. Tokens are
granted to GPU instructions to place in uncoalesced table. If tokens
are available, the operation always succeeds such that the 'Aliased'
status is never returned. Coalesced accesses are placed in the
coalesced table while requests are outstanding. Requests to the same
address are added as targets to the table similar to how MSHRs
operate.
Change-Id: I44983610307b638a97472db3576d0a30df2de600
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27429
Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The recent commit dd6cd33 modified the behaviour of the the Ruby
sequencer to handle load linked requests as loads rather than
stores. This caused the regression test
realview-simple-timing-dual-ruby-ARM-x86_64-opt
to become stuck when booting Linux. This patch fixes the issue by
adding a missing forward_eviction_to_cpu action to the state
transition(OM, Fwd_GETX, IM).
Change-Id: I8f253c5709488b07ddc5143a15eda406e31f3cc6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28787
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>