This changeset adds the necessary changes for running
GCN3 ISA with VIPER in apu_se.py.
Changes to the VIPER protocol configs are made to add support
for DMA and scalar caches.
hsaTopology is added to help the pseudo FS create the files
needed by ROCm to understand the device on which the SW is
being run.
Change-Id: I0f47a6a36bb241a26972c0faafafcf332a7d7d1f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30274
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Adding getter and setter methods for getting and setting the atomic ops
in the WriteMask class. This allows for message types with WriteMasks to
get or set the atomic ops without explicitly modifying the constructor
for the message type. This will beused by the DMASequencer which uses the
SequencerMsg type where the constructor is auto generated via SLICC.
Change-Id: I71787d294c1b89547618e9a13e386b65bb3e1021
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31474
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The data rate is used by the drampower lib to estimate the power
consumption of the DRAM Core. Previously, we used the formula:
burst_cycles = divCeil(p->tBURST_MAX, p->tCK);
data_rate = p->burst_length / burst_cycles;
to derive the data_rate. However, under certain configurations this
formula computes the wrong result due to rounding errors. This patch
simplifies the way we derive the data_rate by passing the value of the
DRAM parameter beats_per_clock.
Change-Id: Ic8cd35bb4641d9c0a704675d2672a6fe4f4ec13e
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Wendy Elsasser <wendy.elsasser@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30056
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Compiling gem5 with dramsim2 included fails due to some inconsistencies in
including SimObjects. In this patch this issue is fixed along with
temporarily disabling -Werror=nonnull-compare in CCFLAGS. Also, the remote
for cloning dramsim2 has been changed.
Change-Id: Ia24095150d026d736352aaf0d735b7554ede10bb
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31434
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The RubyPrefetcher uses makeNextStrideAddress() with
a negative stride to find prefetched address.
The type of this expression is:
uint64_t + uint32_t * int;
This gives wrong result due to implicit conversion.
Fix this with static cast and it works correctly:
uint64_t + int * int;
Change-Id: I36e17e00d5c66c3699fe1d5b287971225a162d04
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31314
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch adds the ability for a host-OS process external to gem5
to access the backing store via POSIX shared memory.
The new param shared_backstore of the System object is the filename
of the shared memory (i.e., the first argument to shm_open()).
Change-Id: I98c948a32a15049a4515e6c02a14595fb5fe379f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30994
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Adds support for device memories in the system and RubySystem classes.
Devices may register memory ranges with the system class and packets
which originate from the device MasterID will update the device memory
in Ruby. In RubySystem functional access is updated to keep the packets
within the Ruby network they originated from.
Change-Id: I47850df1dc1994485d471ccd9da89e8d88eb0d20
JIRA: https://gem5.atlassian.net/browse/GEM5-470
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29653
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Previously, with HSAIL, we were guaranteed by the HSA specification
that the GPU will never issue unaligned accesses. However, now
that we are directly running GCN this is no longer true.
Accordingly, this commit adds support for unaligned accesses.
Moreover, to reduce the replication of nearly identical
code for the different request types, I also added new helper
functions that are called by all the different memory request
producing instruction types in op_encodings.hh.
Adding support for unaligned instructions requires changing
the statusBitVector used to track the status of the memory
requests for each lane from a bit per lane to an int per lane.
This is necessary because an unaligned access may span multiple
cache lines. In the worst case, each lane may span multiple
cache lines. There are corresponding changes in the files that
use the statusBitVector.
Change-Id: I319bf2f0f644083e98ca546d2bfe68cf87a5f967
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29920
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
There was an add-hoc check added to getAddrRanges, but the other methods
would just segfault if they tried to talk to their peers. This change
wraps all the calls in try blocks and catches the exception which the
peer will throw if it's the default and the port is not actually
connected to anything.
Change-Id: Ie46be0230f33f74305c599b251ca319a65ba008d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30296
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The XBar uses the concept of Layers to model throughput and
instantiates one Layer per master. As it forwards a packet to and from
master, the corresponding Layer is marked as occupied for a number of
cycles. Requests/responses to/from a master are blocked while the
corresponding Layer is occupied. Previously the delay would be
calculated based on the formula 1 + size / width, which assumes that
the Layer is always occupied for 1 cycle while processing the packet
header. This changes makes the header latency a parameter which
defaults to 1.
Change-Id: I12752ab4415617a94fbd8379bcd2ae8982f91fd8
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30054
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Tested-by: kokoro <noreply+kokoro@google.com>
The System class has a few different arrays of values which each
correspond to a thread of execution based on their position. This
change collects them together into a single class to make managing them
easier and less error prone. It also collects methods for manipulating
those threads as an API for that class.
This class acts as a collection point for thread based state which the
System class can look into to get at all its state. It also acts as an
interface for interacting with threads for other classes. This forces
external consumers to use the API instead of accessing the individual
arrays which improves consistency.
Change-Id: Idc4575c5a0b56fe75f5c497809ad91c22bfe26cc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/25144
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Instead of calling into object files after the fact and asking them to
put symbols into a target symbol table, this change makes object files
fill in a symbol table themselves at construction. Then, that table can
be retrieved and used to fill in aggregate tables, masked, moved,
and/or filtered to have only one type of symbol binding.
This simplifies the symbol management API of the object file types
significantly, and makes it easier to deal with symbol tables alongside
binaries in the FS workload classes.
Change-Id: Ic9006ca432033d72589867c93d9c5f8a1d87f73c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/24787
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
When the write buffer is full, it still has space to store an additional
number of entries (reserve) equal to the number of MSHRs so that if any
of them requires a writeback this can be handled. Even if the slave port
is blocked, a prefetcher can generate new MSHR entries that may lead to
additional writebacks and eventually saturate the reserve space. This is
solved by checking if the cache is blocked for accesses before
prefetching data.
Change-Id: Iaad04dd6786a09eab7afae4a53d1b1299c341f33
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29615
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
makeLineAddress function uses m_block_size_bits to create
masked addresses. m_block_size_bits is used to specify
cache, directory, and memory controller interleaving,
and it can be larger than the cache line size.
To generate addresses that can align with the cache line
rather than the interleaving granularity, a version of
makeLineAddress is created to specify bits that need to
be masked.
Change-Id: I06deec4949da7fa46f1d6f7575334f18ee61c786
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28135
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Onur Kayıran <onur.kayiran@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
This patch enables cache controllers to make response
messages in advance, store them in a per-address saved
map in an output message buffer and enqueue them altogether
in the future. This patch introduces new slicc statement
called defer_enqueueing. This patch would help simplify
the logic of state machines that deal with coalesing
multiple requests from different requestors.
Change-Id: I566d4004498b367764238bb251260483c5a1a5e5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28132
Reviewed-by: Tuan Ta <qtt2@cornell.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Without this fix `error: call to function 'operator<<' that is neither
visible in the template definition nor found by argument-dependent
loopup` is thrown when compiling HSAIL_X86 using a clang compiler (at
`base/cprintf_formats.hhi:139`).
This error is due to a "<<" operator in a template declared prior to its
definition in the code. The operator is used in
`base/cprintf_formats.hh`, included in `base/cprintf.hh`, and defined in
`mem/ruby/common/BoolVec.hh`. Therefore, for clang to compile without
error, `mem/ruby/common/BoolVec.hh` must be included before
`base/cprintf.hh` when generating the
`mem/ruby/protocol/RegionBuffer_Controller.cc` in
`mem/slicc/symbols/StateMachine.py`.
Due to the gem5 style-checker, an overly-verbose solution was required
to permit this patch to be committed to the codebase.
Change-Id: Ie0ae4053e4adc8c4e918e4a714035637925ca104
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29532
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Add support for multiple networks per RubySystem. This is done by
introducing local IDs to each network and translating from a global ID
passed around through Ruby and SLICC code. The local IDs represents the
NodeID of a MachineType in the network and are ordered the same way
that NodeIDs are ordered using MachineType_base_number. If there are
not multiple networks in a RubySystem the local and global IDs are the
same value.
This is useful in cases where multiple isolated networks are needed to
support devices with Ruby caches which do not interact with other
networks. For example, a dGPU device will have a cache hierarchy that
will not interact with the CPU cache hierachy.
Change-Id: I33a917b3a394eec84b16fbf001c3c2c44c047f66
JIRA: https://gem5.atlassian.net/browse/GEM5-445
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27927
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This singleton object is used thruoughout the simulator. There is
really no reason not to have it statically allocated, except that
whether it was allocated seems to sometimes be used as a signal that
something already put symbols in it, specifically in SE mode.
To keep that functionality for the moment, this change adds an "empty"
method to the SymbolTable class to make it easy to check if the symbol
table is empty, or if someone already populated it.
Change-Id: Ia93510082d3f9809fc504bc5803254d8c308d572
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/24785
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
AbstractController sends requests using a QueuedMasterPort which has an
implicit buffer which is unbounded. Remove this by changing the port to
a MasterPort and implement a retry mechanism for AbstractController.
Although the request remains in the MessageBuffer if a retry is needed,
the additional retry logic optimizes serviceMemoryQueue slightly and
prevents the DRAMCtrl retry stats from being incorrect due to multiple
calls to sendTimingReq.
Change-Id: I8c592af92a1a499a418f34cfee16dd69d84803ad
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28387
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Remove the read/write tables and coalescing table and introduce a two
levels of tables for uncoalesced and coalesced packets. Tokens are
granted to GPU instructions to place in uncoalesced table. If tokens
are available, the operation always succeeds such that the 'Aliased'
status is never returned. Coalesced accesses are placed in the
coalesced table while requests are outstanding. Requests to the same
address are added as targets to the table similar to how MSHRs
operate.
Change-Id: I44983610307b638a97472db3576d0a30df2de600
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27429
Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>