Commit Graph

142 Commits

Author SHA1 Message Date
Giacomo Travaglini
41928dac80 misc: Remove unused params() definitions
Lots of times the params() helper has been defined but not used

Change-Id: Id71829aca71341d46964d8f071099342b946b62f
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/41613
Tested-by: kokoro <noreply+kokoro@google.com>
2021-02-19 23:27:34 +00:00
Alexander Klimov
92ba3ba843 misc: Use PARAMS
The patch is using the newly defined PARAMS macro to replace
custom params() getters in derived class.

The patch is also removing redundant _params:
Instead of creating yet another _params field, SimObject descendants
should use params() to expose the real type of SimObject::_params they
already have.

Change-Id: I43394cebb9661fe747bdbb332236f0f0181b3dba
Signed-off-by: Alexander Klimov <Alexander.Klimov@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39900
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-02-19 23:27:34 +00:00
Gabe Black
9a0b79459d misc: Fix mismatched struct/class "tags" and reenable that warning.
The mismatches were from places where Params structs had been declared
as classes instead of structs, and ruby's MachineID struct.

A comment describing why the warning had been disabled said that it was
because of libstdc++ version 4.8. As far as I can tell, that version is
old enough to be outside the window we support, and so that should no
longer be a problem. It looks like the oldest version of gcc we
support, 5.0, corresponds with approximately libstdc++ version 6.0.21.

https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html#abi.versioning

Change-Id: I75ad92f3723a1883bd47e3919c5572a353344047
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40953
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-02-19 08:29:00 +00:00
Alexandru Dutu
14d6e8fac4 arch-gcn3: Implementation of s_sleep
This changeset implements the s_sleep instruction in a similar
way to s_waitcnt.

Change-Id: I4811c318ac2c76c485e2bfd9d93baa1205ecf183
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39115
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-02-04 00:07:10 +00:00
Bobby R. Bruce
a4345ff324 gpu-compute,misc: Remove unused private variable
Clang 9 fails to compile GCN3 due to the unused private variable,
`_nxtFreeIdx`, in `src/gpu-compute/dyn_pool_manager.hh`. This variable
has therefore been removed.

Change-Id: I33f2e9634bbf8d5cea7a42ae2ac9f3ea8298d406
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40397
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-02-03 19:08:02 +00:00
Bobby R. Bruce
ae33daa8d7 gpu-compute,misc: Fix Clang missing override errors
Clang fails to compile GCN3 due to missing overrides in
`src/gpu-compute/gpu_command_processor.hh`. This commit fixes this
errror.

Change-Id: I6da9fce7c3eb86a5418a931ee4f225cceda488a5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40396
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-02-03 19:08:02 +00:00
Gabe Black
fc4caa6ad0 misc: Re-remove Authors lines from source files.
These were universally removed a while ago, but a bunch have crept back
in. Remove them.

Change-Id: I3cb5b9f40c9c19aafb5e39a51d1baeae60a591c0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40335
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
2021-02-03 12:55:17 +00:00
Sooraj Puthoor
965ad12b9a dev-hsa: enable interruptible hsa signal support
Event creation and management support from emulated drivers is required
to support interruptible signals in HSA and this support was not
available. This changeset adds the event creation and management support
in the emulated driver.  With this patch, each interruptible signal
created by the HSA runtime is associated with a signal event. The HSA
runtime can then put a thread waiting on a signal condition to sleep
asking the driver to monitor the event associated with that signal. If
the signal is modified by the GPU, the dispatcher notifies the driver
about signal value change.  If the modifier is a CPU thread, the thread
will have to make HSA API calls to modify the signal and these API calls
will notify the driver about signal value change. Once the driver is
notified about a change in the signal value, the driver checks to see if
any thread is sleeping on that signal and wake up the sleeping thread
associated with that event. The driver has also implemented the time_out
wakeup that can wake up the thread after a certain time period has
expired. This is also true for barrier packets.

Each signal has an event address in a kernel managed and allocated
event page that can be used as a mailbox pointer to notify an event.
However, this feature used by non-CPU agents to communicate with the
driver is not implemented by this changeset because the non-CPU HSA
agents in our model can directly communicate with driver in our
implementation. Having said that, adding that feature should be trivial
because the event address and event pages are correctly setup by this
changeset and just adding the event page's virtual address to our PIO
doorbell interface in the page tables and registering that pio address
to the driver should be sufficient. Managing mailbox pointer for an
event is based on event ID and using this event ID as an index into
event page, this changeset already provides a unique mailbox pointer for
each event.

Change-Id: Ic62794076ddd47526b1f952fdb4c1bad632bdd2e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38335
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-01-31 03:25:05 +00:00
Kyle Roarty
68fe6fa6cb gpu-compute: Simplify LGKM decrementing for Flat instructions
This commit makes it so LGKM count is decremented in a single place
(after completeAcc), which fixes a couple of potential bugs

1. Data is only written by completeAcc, not after initiateAcc. LGKM
count is supposed to be decremented after data is written.
2. LGKM count is now properly decremented for atomics without return

Change-Id: Ic791af3b42e04f7baaa0ce50cb2a2c6286c54f5a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39396
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-01-21 21:41:05 +00:00
Matthew Poremba
5323cccfdd arch-gcn3,gpu-compute: Update stats style for GPU
Convert all gpu-compute stats to Stats::Group style.

Change-Id: I29116f1de53ae379210c6cfb5bed3fc74f50cca5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39135
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-01-18 17:58:05 +00:00
Kyle Roarty
a4657f1b84 gpu-compute: Fix LGKM decrementing for flat atomic insts
A prior commit (f6ec145fc0) fixed early LGKM decrementing for flat loads
and stores, but failed to address flat atomics.

Per the GCN3 ISA, LGKM count is decremented on flat atomics with return
when the data has been returned. This patch checks if the flat
instruction is an atomic with return, and decrements LGKM count if so.

Change-Id: I5c0c2c205a8b21327d4c42ba71c59842c15bd63b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39155
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-01-15 06:00:06 +00:00
gauravjain14
c29523665e gpu-compute: Support for dynamic register alloc
SimplePoolManager doesn't allow mapping of two WGs
simultaneously on the same Compute Unit (provided
the previous WG has been mapped to all the SIMDs)
even if there is sufficient VRF and SRF space
available.

DynPoolManager takes care of that by dynamically
allocating and deallocating register file space
to wavefronts

Change-Id: I2255c68d4b421615d7b231edc05d3ebb27cbd66c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32034
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com>
2021-01-14 17:04:27 +00:00
Gabe Black
c6933a27da misc: Fix missing includes.
Change-Id: I545ff03041e8fe66dc489c6aa95c009e54df0970
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38995
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-01-13 08:55:59 +00:00
Kyle Roarty
f6ec145fc0 gpu-compute: Fix FLAT insts decrementing lgkm count early
FLAT instructions used to decrement lgkm count on execute, while the
GCN3 ISA specifies that lgkm count should be decremented on data being
returned or data being written.

This patch changes it so that lgkm is decremented after initiateAcc (for
stores) and after completeAcc (for loads) to better reflect the ISA
definition.

This fixes a bug where waitcnts would be satisfied even though the
memory access wasn't completed, which lead to instructions using the
wrong data.

Change-Id: I596cb031af9cda8d47a1b5e146e4a4ffd793d36c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38696
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-01-07 17:12:31 +00:00
Kyle Roarty
e49a072c7d gpu-compute: Use dict.get syntax for accessing buildEnv keys
37775 removed SmartDict, which is the type buildEnv used to be.
Because of that change, doing buildEnv[key] with a key not in the dict
returns KeyError instead of False. By using buildEnv(key, False), we are
able to return False when the key isn't in the dict.

Change-Id: I4aae29b95b082efb2b021f21d608f9cd1c196379
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38135
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-12-01 19:19:52 +00:00
Kyle Roarty
0bb385941b gpu-compute: Add exp_cnt tracking for buffer store instructions
exp_cnt (expInstsIssued in the code) is used in the waitcnt instruction
to track that data has been read out of VGPRs in previous global
memory instructions, making it safe to overwrite the VGPRs used in said
global memory instructions.

Previously, exp_cnt wasn't being tracked at all, which lead to the
waitcnt finishing immediately, leading to the memory instruction's VPGRs
getting overwritten by subsequent instructions, causing errors.

This patch makes it so waitcnts waiting on exp_cnt will wait for MUBUF
buffer store instructions to read their VGPRs before completing

Change-Id: Idd2b59511bc086cf316217da27b7a228272b0b0f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37555
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-11-30 20:59:31 +00:00
Daniel Gerzhoy
9a01d3e927 dev-hsa,gpu-compute: Agent Packet handler implemented.
HSA packet processor will now accept and process agent packets.

Type field in packet is command type.
For now:
        AgentCmd::Nop = 0
        AgentCmd::Steal = 1

Steal command steals the completion signal for a running kernel.
This enables a benchmark to use hsa primitives to send an agent
packet to steal the signal, then wait on that signal.

Minimal working example to be added in gem5-resources.

Change-Id: I37f8a4b7ea1780b471559aecbf4af1050353b0b1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37015
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-11-16 16:12:48 +00:00
Tuan Ta
173c1c6eb0 gpu-compute,mem-ruby: Replace ACQUIRE and RELEASE request flags
This patch replaces ACQUIRE and RELEASE flags which are HSA-specific.
ACQUIRE flag becomes INV_L1 in VIPER protocol. RELEASE flag is removed.
Future protocols may support extra cache coherence flags like INV_L2 and
WB_L2.

Change-Id: I3d60c9d3625c898f4110a12d81742b6822728533
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32859
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-11-04 21:09:26 +00:00
Gabe Black
d05a0a4ea1 misc: Delete the now unnecessary create methods.
Most create() methods are no longer necessary. This change deletes them,
and occasionally moves some code from them into the constructors they
call.

Change-Id: Icbab29ba280144b892f9b12fac9e29a0839477e5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36536
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-10-30 04:00:20 +00:00
Gabe Black
3a49ed0156 gpu: Use X86ISA instead of TheISA in src/gpu-compute.
These files are nominally not tied to the X86ISA, but in reality they
are because they reach into the GPU TLB, which is defined unchangeably in
the X86ISA namespaces, and uses data structures within it. Rather than try
to pretend that these structures are generic, we'll instead just use X86ISA
instead of TheISA. If this really does become generic in the future, a
base class with the ISA agnostic essentials defined in it can be used
instead, and the ISA specific TLBs can defined their own derived class
which has whatever else they need. Really the compute unit shouldn't be
communicating with the TLB using sender state since those are supposed
to be little notes for the sender to keep with a transaction, not for
communicating between entities across a port.

Change-Id: Ie6573396f6c77a9a02194f5f4595eefa45d6d66b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34174
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-10-26 20:32:43 +00:00
Gabe Black
463cb28ca5 misc: Use compiler.hh macros when available.
Some places were hand coding __attribute__s when macros in compiler.hh
were available to do that job. Using the macros helps abstract away
compiler specific details and should be used when possible.

Change-Id: I94befebcfde2d673e874e9959588f69781bd9021
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35975
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-10-19 05:52:40 +00:00
Kyle Roarty
b20cc7e6d8 gpu-compute,mem-ruby: Properly create/handle WriteCompletePkts
There is a flow of packets as so:
WriteResp -> WriteReq -> WriteCompleteResp

These packets share some variables, in particular senderState and a
status vector.

One issue was the WriteResp packet decremented the status vector, which
was used by the WriteCompleteResp packets to determine when to handle
the global memory response. This could lead to multiple
WriteCompleteResp packets attempting to handle the global memory
response.

Because of that, the WriteCompleteResp packets needed to handle the
status vector. this patch moves WriteCompleteResp packet handling back
into ComputeUnit::DataPort::processMemRespEvent from
ComputeUnit::DataPort::recvTimingResp. This helps remove some redundant
code.

This patch has the WriteResp packet return without doing any status
vector handling, and without deleting the senderState, which had
previously caused a segfault.

Another issue was WriteCompleteResp packets weren't being issued for
each active lane, as the coalesced request was being issued too early.
In order to fix that, we have to ensure every active lane puts their
request into their applicable coalesced request before issuing the
coalesced request. Because of that change, we change the issuing of
CoalescedRequests from GPUCoalescer::coalescePacket to
GPUCoalescer::completeIssue.

That change involves adding a new variable to store the
CoalescedRequests that are created in the calls to coalescePacket. This
variable is a map from instruction sequence number to coalesced
requests.

Additionally, the WriteCompleteResp packet was attempting to access
physical memory in hitCallback while not having any data, which
caused a crash. This can be resolved either by not allowing
WriteCompleteResp packets to access memory, or by copying the data
from the WriteReq packet. This patch denies WriteCompleteResp packets
memory access in hitCallback.

Finally, in VIPERCoalescer::writeCompleteCallback there was a map
that held the WriteComplete packets, but no packets were ever being
removed. This patch removes packets that match the address that was
passed in to the function.

Change-Id: I9a064a0def2bf6c513f5295596c56b1b652b0ca4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33656
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-10-15 17:52:51 +00:00
Gabe Black
91d83cc8a1 misc: Standardize the way create() constructs SimObjects.
The create() method on Params structs usually instantiate SimObjects
using a constructor which takes the Params struct as a parameter
somehow. There has been a lot of needless variation in how that was
done, making it annoying to pass Params down to base classes. Some of
the different forms were:

const Params &
Params &
Params *
const Params *
Params const*

This change goes through and fixes up every constructor and every
create() method to use the const Params & form. We use a reference
because the Params struct should never be null. We use const because
neither the create method nor the consuming object should modify the
record of the parameters as they came in from the config. That would
make consuming them not idempotent, and make it impossible to tell what
the actual simulation configuration was since it would change from any
user visible form (config script, config.ini, dot pdf output).

Change-Id: I77453cba52fdcfd5f4eec92dfb0bddb5a9945f31
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35938
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-10-14 12:06:44 +00:00
Matthew Poremba
53807c8276 configs,gpu-compute: Fixes to connect gmTokenPort
When the TokenPort was moved from the GCN3 staging branch to develop the
TokenPort was changed from being the port connecting the ComputeUnit to
Ruby's vector memory port to a sideband port which inhibits requests to
Ruby's vector memory port. As such, it needs to be explicitly connected
as a new port. This changes the getPort method in ComputeUnit to be
aware of the port as well as modifying the example config to connect to
TCPs.

The iteration to connect in the config file was modified since it was
not properly connecting to TCPs each time and Ruby.py does not
explicitly return a list of each MachineType.

Change-Id: Ia70a6756b2af54d95e94d19bec5d8aadd3c2d5c0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35096
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-09-30 20:19:21 +00:00
Gabe Black
b877efa6d4 misc: Update attribute syntax, and reorganize compiler.hh.
This change replaces the __attribute__ syntax with the now standard [[]]
syntax. It also reorganizes compiler.hh so that all special macros have
some explanatory text saying what they do, and each attribute which has a
standard version can use that if available and what version of c++ it's
standard in is put in a comment.

Also, the requirements as far as where you put [[]] style attributes are
a little more strict than the old school __attribute__ style. The use of
the attribute macros was updated to fit these new, more strict
requirements.

Change-Id: Iace44306a534111f1c38b9856dc9e88cd9b49d2a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35219
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-09-28 21:52:59 +00:00
Gabe Black
50a0b85367 arm,base,gpu: Use std::make_unique instead of m5::make_unique.
Now that we're using c++14, we can just assume that std::make_unique
exists. We no longer have to conditionally inject our own version.

Change-Id: I5d851afb02dd05c7af93864ffec3b3184f3d4ec8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35215
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-09-28 05:41:08 +00:00
Kyle Roarty
347d7644eb gpu-compute: replace uint32_t* casts with bits API calls
The uint32_t* casting was challenging to fully understand what was
being done at a glance. Replaced with calls to various bits functions
as it's functionally equivalent and much more clear.

This also fixes a segfault in GPUInitAbi DPRINTFs from a mis-typed
uint32_t* cast.

Change-Id: Id5d1863942848dd7a9e5e17e8180c33adbc72f15
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34677
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-09-24 14:53:16 +00:00
Gabe Black
24e87cb1c5 gpu: Stop using TheISA in the GPU TLB.
This class is defined inside the X86ISA namespace, so there's no point
in pretending it's generic. Remove TheISA and let the code access what
it needs from X86ISA naturally since it's there already.

Change-Id: I21b5d2d2b9af6aa0c10ddbb5b3ddca1692188dcc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34173
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2020-09-18 13:48:45 +00:00
Kyle Roarty
be3bcd1629 gpu-compute: Fix deadlock in fetch_unit after branch instruction
The following deadlock was occuring in fetch_unit w/timingSim:
1. exec() is called, a wave is ready to fetch, so it sets pendingFetch
2. A packet is sent to ITLB to fetch for that wave
3. The wave executes a branch, causing the fetch buffer to be cleared
4. The packet is handled, and fetch() is called. However, because the
fetch buffer was cleared, it returns doing nothing.
5. exec() gets called again, but the wave will never be scheduled to
fetch, as pendingFetch is still set to true.

This patch clears pendingFetch (and dropFetch) before returning in fetch()
when the fetch buffer has been cleared.

dropFetch needed to be cleared otherwise gem5 would crash.

Change-Id: Iccbac7defc4849c19e8b17aa2492da641defb772
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34555
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-09-17 21:24:19 +00:00
Gabe Black
49a41da964 gpu: Fix a syntax error in X86GPUTLB.py.
The recent changes which removed master/slave terminology also
accidentally deleted an "=", making the syntax in that file illegal.

Change-Id: I50aa945f0f66765db36775380b98a88caff23c13
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34576
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-09-16 06:08:14 +00:00
Shivani Parekh
392c1ced53 misc: Replaced master/slave terminology
Change-Id: I4df2557c71e38cc4e3a485b0e590e85eb45de8b6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33553
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-09-10 23:02:28 +00:00
Kyle Roarty
b00b986353 misc: Use VPtr in hsa_driver.cc
This change updates HSADriver::allocateQueue to take in a ThreadContext
pointer as opposed to a PortProxy ref. This allows the TypedBufferArg
to be replaced with VPtr.

This also fixes building GCN3_X86

Change-Id: I1fea26b10c7344daf54a0cb05337e961f834a5fd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33655
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-08-31 17:44:11 +00:00
Gabe Black
1d755b4ba1 misc: Clean up usage of arch/isa_traits.hh.
isa_traits.hh used to have much more in it, but now it only has
PageShift, PageBytes, and (for now) the guest endianness. These values
should only be retrieved from the System class generally speaking, so
only the system class should include arch/isa_traits.hh.

Some gpu compute related files need PageBytes or PageShift. Even though
those files don't advertise their ISA dependence, they are tied to x86.
In those files, they can include arch/x86/isa_traits.hh.

The only other file which legitimately needs arch/isa_traits.hh is the
decoder cache since it uses PageBytes to size an array.

Change-Id: I12686368715623e3140a68a7027c136bd52567b1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33203
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-08-28 07:20:58 +00:00
Tony Gutierrez
94000aefe6 gpu-compute: Create CU's ports in the standard way
The CU would initialize its ports in getMasterPort(), which
is not desirable as getMasterPort() may be called several
times for the same port. This can lead to a fatal if the CU
expects to only create a single port of a given type, and may
lead to other issues where stat names are duplicated.

This change instantiates and initializes the CU's ports in the
CU constructor using the CU params.

The index field is also removed from the CU's ports because the
base class already has an ID field, which will be set to the
default value in the base class's constructor for scalar ports.

It doesn't make sense for scalar port's to take an index because
they are scalar, so we let the base class initialize the ID to
the invalid port ID.

Change-Id: Id18386f5f53800a6447d968380676d8fd9bac9df
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32836
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-08-27 16:31:46 +00:00
Emily Brickey
6333e914d3 gpu-compute: update port terminology
Change-Id: I3121c4afb1e137aebe09c1d694e9484844d02b9b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32313
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Poremba <chesp3@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-08-26 16:48:13 +00:00
Kyle Roarty
b872f02ab1 configs,gpu-compute,mem-ruby: connect gmTokenPorts in apu_se
This patch adds gmTokenPorts to the ComputeUnit and RubyGPUCoalescer
python classes so the gmTokenPorts can be connected in apu_se.

Change-Id: Icf3cb05c757754d6935b46f14e4b1b1d5072c4ca
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32677
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-08-18 23:47:16 +00:00
Gabe Black
40e8cac306 misc: Make registerExitCallback use CallbackQueue2.
Issue-on: https://gem5.atlassian.net/browse/GEM5-698
Change-Id: I526d4a19ca4e54a6469a4ee26693c1c0400fcc70
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32644
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-08-18 11:49:06 +00:00
Matthew Poremba
9b95f32b12 arch-gcn3,gpu-compute: Fix GCN3 related compiler errors
Fix all errors that were revealed using the util/compiler-test.sh
script.

Change-Id: Ie0d35568624e5e1405143593f0677bbd0b066b61
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31154
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-20 14:53:13 +00:00
Tony Gutierrez
4d737462c2 gpu-compute, arch-gcn3: Change how waitcnts are implemented
Use single counters per memory operation type and increment
them upon issue, not execute.

Change-Id: I6afc0b66b21882538ef90a14a57a3ab3cc7bd6f3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29973
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-17 16:36:23 +00:00
Tony Gutierrez
63c76448eb gpu-compute: Add pipeline stage interface classes
This change separates the pipeline stage interfaces
for the GPU's compute unit into their own classes
with a well-defined interface. This helps to create
a cleaner interface for users to extend the CU
pipeline's capabilities and also helps consolidate
all the pipeline communication code in one place
in the source.

Change-Id: I569d52bce84dc1b9fbf8f0f96d53a81a2b6773c6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29972
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-17 16:36:09 +00:00
Alexandru Dutu
7d50d5d972 gpu-compute: No RF scheduling in case of SKIP or EMPTY
In case of flat memory instructions the status for the
LM pipe execution unit is set to SKIP or EMPTY, as the bus
between the VRF and the GM and LM pipe is shared. The
destination operands should not be scheduled for the LM pipe,
event if the wave is in the dispatch list. This can lead
to deadlock in the destination cache as DCEs are reused
and the slotsAvailableForBank count gets artificially
incremented.

Change-Id: I2230c53e3bc1032d2cccbe00fab62c99ab8de6cd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29970
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:34:59 +00:00
Tony Gutierrez
5f0378b8d0 gpu-compute: Use refs to CU in pipe stages/mem pipes
The pipe stages and memory pipes are changed to store
a reference to their parent CU as opposed to a pointer.
These objects will never change which CU they belong to,
and they are constructed by their parent CU.

Change-Id: Ie5476e1e2e124a024c2efebceb28cb3a9baa78c1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29969
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-17 16:34:36 +00:00
Michael LeBeane
83fe4754e7 gpu-compute: Fix Y-dimension ABI decode
We currently have a bug in decoding workitem ID from the kernel
descriptor with multiple dimensions.  The enable_vgpr_workitem_id bits
are currently seperated into x and y components, when they should be
treated as a single 2 bit value, where y is enabled when it is > 0,
and z is enabled when it is > 1.  The current setup allows a kernel
launch with vgprs reserved for the z dimension and not the y dimension,
which is incorrect.

Change-Id: Iee64b207feb95bcf064898d5db33b8f201e25323
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29965
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:32:56 +00:00
Tony Gutierrez
f64ff89212 gpu-compute: Don't track vector store insts in CU's headTailMap
This change fixes a memory leak due to live GPUDynInstPtr references
to vector store insts being stored in the CU's headTailMap and never
released.

This happened because store insts are not supposed to have their
head-tail latencies tracked by the headTailMap; instead they use
timing information from the GPUCoalescer. When updating the
headTailLatency stat via the headTailMap, only loads were considered
and removed from the headTailMap, however when inserting into the
headTailMap loads and stores were considered, thus leading to the
memory leak.

This change fixes the issue by only adding loads to the headTailMap.

Change-Id: I8a8f5b79f55e00481ae5e82519a9ed627a7ecbd1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29963
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:32:06 +00:00
Michael LeBeane
1d816250f8 gpu_compute: Support loading BLIT kernels
The BLIT kernels used to implement DMA through the shaders don't fill
out all of the standard fields in an amd_kernel_code_t object.  This
patch modifies the code object parsing logic to support these new
kernels.

BLIT kernels are used in APUs when using ROCm memcopies for certain size
buffers, and are used for dGPUs when the SDMA engines are disabled.

Change-Id: Id4e667474d05e311097dbec443def07dfad14a79
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29959
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:13:59 +00:00
Matt Sinclair
13079629a1 arch-gcn3: convert vALU instruction counters from 32 to 64-bit
The vALU instruction counters were previously 32 bits, but for some
workloads this value wraps around and triggers an assert failure
because the max vALU operations are reached.  To resolve this, this
commit increases the counter size to 64 bits.

Change-Id: I90ed4514669485cfea7ccc37ba9d69665277bccb
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29950
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Tony Gutierrez
0c5d671ea1 gpu-compute: Init CU object for pipe stages in their ctors
This change updates the constructors of the CU's pipe
stages/memory pipelines to accept a pointer to their
parent CU. Because the CU creates these objects, and
can pass a pointer to itself to these object via their
constructors, this is the safer way to initalize these
classes.

Change-Id: I0b3732ce7c03781ee15332dac7a21c097ad387a4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29945
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-16 20:37:22 +00:00
Tony Gutierrez
af621cd6e6 gpu-compute, arch-gcn3: refactor barriers
Barriers were not modeled properly. Firstly, barriers were
allocated to each WG that was launched, which is not
correct, and the CU would provide an infinite number
of barrier slots. There are a limited number of barrier slots
per CU in reality. In addition, the CU will not allocate
barrier slots to WGs with a single WF (nothing to sync if
only one WF).

Beyond modeling problems, there also the issue of deadlock.
The barrier could deadlock because not all WFs are freed
from the barrier once it has been satisfied. Instead, we
relied on the scoreboard stage to release them lazily,
one-by-one.

Under this implementation the scoreboard may not fully release
all WFs participating in a barrier; this happens because the
first WF to be freed from the barrier could reach an s_barrier
instruction again, forever causing the barrier counts across
WFs to be out-of-sync.

This change refactors the barrier logic to:

1) Create a proper barrier slot implementation

2) Enforce (via a parameter) the number of barrier
   slots on the CU.

3) Simplify the logic and cleanup the code (i.e., we
   no longer iterate through the entire WF list each
   time we check if a barrier is satisfied).

4) Fix deadlock issues.

Change-Id: If53955b54931886baaae322640a7b9da7a1595e0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29943
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-16 20:37:22 +00:00
Tony Gutierrez
701f026ba5 gpu-compute: Fix LDS out-of-bounds behavior
The LDS is capable of handling out-of-bounds accesses,
that is, accesses that are outside the bounds of the
chunk allocated to a WG. Currently, the simulator asserts
on these accesses. This patch changes the behavior of the
LDS to return 0 for reads and dropping writes that are
out-of-bounds.

Change-Id: I5f467d0f52113e8565e1a3029e82fb89cc6f07ea
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29940
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Xianwei Zhang
024f978cff gpu-compute: enable kernel-end WB functionality
Change-Id: Ib17e1d700586d1aa04d408e7b924270f0de82efe
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29938
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com>
2020-07-13 23:32:37 +00:00