Commit Graph

151 Commits

Author SHA1 Message Date
Kyle Roarty
b40b361bee arch-vega, gpu-compute: Add vectors to hold op info
This removes the need for redundant functions like
isScalarRegister/isVectorRegister, as well as
isSrcOperand/isDstOperand. Also, the op info is only
generated once this way instead of every time it's needed.

Change-Id: I8af5080502ed08ed9107a441e2728828f86496f4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42211
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2021-04-01 02:58:31 +00:00
Tony Gutierrez
0e2564a629 arch-gcn3, gpu-compute: Update getRegisterIndex() API
This change removes the GPUDynInstPtr argument from
getRegisterIndex(). The dynamic inst was only needed
to get access to its parent WF's state so it could
determine the number of scalar registers the wave was
allocated. However, we can simply pass the number of
scalar registers directly. This cuts down on shared
pointer usage.

Change-Id: I29ab8d9a3de1f8b82b820ef421fc653284567c65
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42210
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2021-04-01 02:58:31 +00:00
Tony Gutierrez
236b4a502f gpu-compute: Add operand info class to GPUDynInst
This change adds a class that stores operand register info
for the GPUDynInst. The operand info is calculated when the
instruction object is created and stored for easy access
by the RF, etc.

Change-Id: I3cf267942e54fe60fcb4224d3b88da08a1a0226e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42209
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2021-04-01 02:58:31 +00:00
Gabe Black
3f67faec83 arch,dev,gpu-compute,sim: Rename isa_traits.hh page_size.hh.
The only thing left in isa_traits.hh are two constants, one for the
number of bytes in a page, and one for how far to shift an address to
get the page number. To make it clear that this is the only thing
isa_traits.hh should be used for from this point forward (until it is
entirely eliminated), this change renames it to the much less generic
page_size.hh.

Also, because isa_traits.hh used to have *much* more stuff in it, it was
included in a lot of places it didn't need to be. This change also
clears out all these legacy includes while updating the actually needed
ones to the new name.

Change-Id: I939b01b117c53d620b6b0a98982f6f21dc2ada72
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40179
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-03-30 10:17:48 +00:00
Kyle Roarty
c9415dc389 gpu-compute: Remove unused functions
These functions were probably used for some stat collection,
but they're no longer used, so they're being removed

Change-Id: Ic99f22391c0d5ffb0e9963670efb35e503f9957d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42202
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2021-03-25 17:21:16 +00:00
Michael LeBeane
25e8a14a6b gpu-compute: Support dynamic scratch allocations
dGPUs in all versions of ROCm and APUs starting with ROCM 2.2 can
under-allocate scratch resources.  This patch adds support for
the CP to trigger a recoverable error so that the host can attempt to
re-allocate scratch to satisfy the currently stalled kernel.

Note that this patch does not include a mechanism to handle dynamic
scratch allocation for queues with in-flight kernels, as these queues
would first need to be drained and descheduled, which would require some
additional effort in the hsaPP and HW queue scheduler.  If the CP
encounters this scenerio it will assert.  I suspect this is not a
particularly common occurence in most of our applications so it is left
as a TODO.

This patch also fixes a few memory leaks and updates the old DMA callback
object interface to use a much cleaner c++11 lambda interface.

Change-Id: Ica8a5fc88888283415507544d6cc49fa748fe84d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42201
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2021-03-25 17:21:08 +00:00
Daniel R. Carvalho
7f1de4e686 misc: Fix coding style for enum's opening braces
The systemc dir was not included in this fix.

First it was identified that there were only occurrences
at 0, 1, and 2 levels of indentation (and 2 of 2 spaces,
1 of 3 spaces and 2 of 12 spaces), using:

    grep -nrE --exclude-dir=systemc \
        "^ *enum [A-Za-z].* {$" src/

Then the following commands were run to replace:

    <indent level>enum X ... {

by:

    <indent level>enum X ...
    <indent level>{

Level 0:
    grep -nrl --exclude-dir=systemc \
        "^enum [A-Za-z].* {$" src/ | \
        xargs sed -Ei \
        's/^enum ([A-Za-z].*) \{$/enum \1\n\{/g'

Level 1:
    grep -nrl --exclude-dir=systemc \
        "^    enum [A-Za-z].* {$" src/ | \
        xargs sed -Ei \
        's/^    enum ([A-Za-z].*) \{$/    enum \1\n    \{/g'

and so on.

Change-Id: Ib186cf379049098ceaec20dfe4d1edcedd5f940d
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/43326
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-03-23 16:26:04 +00:00
Kyle Roarty
f5383a5733 gpu-compute: Fix accidental execution when stopped at barrier
Due the compute unit pipeline being executed in reverse order, there
exists a scenario where a compute unit will execute an extra
instruction when it's supposed to be stopped at a barrier. It occurs
as follows:

* The ScheduleStage sets a barrier instruction ready to execute.

* The ScoreboardCheckStage adds another instruction to the readyList.
This is where the barrier is checked, but because the barrier isn't
executing yet, the instruction can be passed along to ScheduleStage

* The barrier executes, and stalls

* The ScheduleStage sees that there's a new instruction and schedules
it to be executed.

* Only now will the ScoreboardCheckStage realize a barrier is active
and stall accordingly

* The subsequent instruction executes

This patch sets the wavefront status to be S_BARRIER in ScheduleStage
instead of in the barrier instruction execution in order to have
ScoreboardCheckStage realize that we're going to execute a barrier,
preventing it from marking another instruciton as ready.

Change-Id: Ib683e2c68f361d7ee60a3beaf53b4b6c888c9f8d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/41573
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-03-04 17:37:19 +00:00
Kyle Roarty
a9e0a1ccf1 gpu-compute: Explicitly set driver to nullptr in constructor
We have a fail_if in attachDriver to prevent driver from being
overwritten. However, the fail_if only checks for if the driver
is not nullptr.

Previously, in some cases driver was set to garbage, which made
the fail_if trip the first time we were assigning the driver.

This patch explicitly sets driver to nullptr in the constructor, thus
ensuring that it will be nullptr the first time we call attachDriver

Change-Id: I325f6033e785025a912e3af3888c66cee0332f40
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/41973
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-03-01 18:10:11 +00:00
Giacomo Travaglini
41928dac80 misc: Remove unused params() definitions
Lots of times the params() helper has been defined but not used

Change-Id: Id71829aca71341d46964d8f071099342b946b62f
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/41613
Tested-by: kokoro <noreply+kokoro@google.com>
2021-02-19 23:27:34 +00:00
Alexander Klimov
92ba3ba843 misc: Use PARAMS
The patch is using the newly defined PARAMS macro to replace
custom params() getters in derived class.

The patch is also removing redundant _params:
Instead of creating yet another _params field, SimObject descendants
should use params() to expose the real type of SimObject::_params they
already have.

Change-Id: I43394cebb9661fe747bdbb332236f0f0181b3dba
Signed-off-by: Alexander Klimov <Alexander.Klimov@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39900
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-02-19 23:27:34 +00:00
Gabe Black
9a0b79459d misc: Fix mismatched struct/class "tags" and reenable that warning.
The mismatches were from places where Params structs had been declared
as classes instead of structs, and ruby's MachineID struct.

A comment describing why the warning had been disabled said that it was
because of libstdc++ version 4.8. As far as I can tell, that version is
old enough to be outside the window we support, and so that should no
longer be a problem. It looks like the oldest version of gcc we
support, 5.0, corresponds with approximately libstdc++ version 6.0.21.

https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html#abi.versioning

Change-Id: I75ad92f3723a1883bd47e3919c5572a353344047
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40953
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-02-19 08:29:00 +00:00
Alexandru Dutu
14d6e8fac4 arch-gcn3: Implementation of s_sleep
This changeset implements the s_sleep instruction in a similar
way to s_waitcnt.

Change-Id: I4811c318ac2c76c485e2bfd9d93baa1205ecf183
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39115
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-02-04 00:07:10 +00:00
Bobby R. Bruce
a4345ff324 gpu-compute,misc: Remove unused private variable
Clang 9 fails to compile GCN3 due to the unused private variable,
`_nxtFreeIdx`, in `src/gpu-compute/dyn_pool_manager.hh`. This variable
has therefore been removed.

Change-Id: I33f2e9634bbf8d5cea7a42ae2ac9f3ea8298d406
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40397
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-02-03 19:08:02 +00:00
Bobby R. Bruce
ae33daa8d7 gpu-compute,misc: Fix Clang missing override errors
Clang fails to compile GCN3 due to missing overrides in
`src/gpu-compute/gpu_command_processor.hh`. This commit fixes this
errror.

Change-Id: I6da9fce7c3eb86a5418a931ee4f225cceda488a5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40396
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-02-03 19:08:02 +00:00
Gabe Black
fc4caa6ad0 misc: Re-remove Authors lines from source files.
These were universally removed a while ago, but a bunch have crept back
in. Remove them.

Change-Id: I3cb5b9f40c9c19aafb5e39a51d1baeae60a591c0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40335
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
2021-02-03 12:55:17 +00:00
Sooraj Puthoor
965ad12b9a dev-hsa: enable interruptible hsa signal support
Event creation and management support from emulated drivers is required
to support interruptible signals in HSA and this support was not
available. This changeset adds the event creation and management support
in the emulated driver.  With this patch, each interruptible signal
created by the HSA runtime is associated with a signal event. The HSA
runtime can then put a thread waiting on a signal condition to sleep
asking the driver to monitor the event associated with that signal. If
the signal is modified by the GPU, the dispatcher notifies the driver
about signal value change.  If the modifier is a CPU thread, the thread
will have to make HSA API calls to modify the signal and these API calls
will notify the driver about signal value change. Once the driver is
notified about a change in the signal value, the driver checks to see if
any thread is sleeping on that signal and wake up the sleeping thread
associated with that event. The driver has also implemented the time_out
wakeup that can wake up the thread after a certain time period has
expired. This is also true for barrier packets.

Each signal has an event address in a kernel managed and allocated
event page that can be used as a mailbox pointer to notify an event.
However, this feature used by non-CPU agents to communicate with the
driver is not implemented by this changeset because the non-CPU HSA
agents in our model can directly communicate with driver in our
implementation. Having said that, adding that feature should be trivial
because the event address and event pages are correctly setup by this
changeset and just adding the event page's virtual address to our PIO
doorbell interface in the page tables and registering that pio address
to the driver should be sufficient. Managing mailbox pointer for an
event is based on event ID and using this event ID as an index into
event page, this changeset already provides a unique mailbox pointer for
each event.

Change-Id: Ic62794076ddd47526b1f952fdb4c1bad632bdd2e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38335
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-01-31 03:25:05 +00:00
Kyle Roarty
68fe6fa6cb gpu-compute: Simplify LGKM decrementing for Flat instructions
This commit makes it so LGKM count is decremented in a single place
(after completeAcc), which fixes a couple of potential bugs

1. Data is only written by completeAcc, not after initiateAcc. LGKM
count is supposed to be decremented after data is written.
2. LGKM count is now properly decremented for atomics without return

Change-Id: Ic791af3b42e04f7baaa0ce50cb2a2c6286c54f5a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39396
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-01-21 21:41:05 +00:00
Matthew Poremba
5323cccfdd arch-gcn3,gpu-compute: Update stats style for GPU
Convert all gpu-compute stats to Stats::Group style.

Change-Id: I29116f1de53ae379210c6cfb5bed3fc74f50cca5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39135
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-01-18 17:58:05 +00:00
Kyle Roarty
a4657f1b84 gpu-compute: Fix LGKM decrementing for flat atomic insts
A prior commit (f6ec145fc0) fixed early LGKM decrementing for flat loads
and stores, but failed to address flat atomics.

Per the GCN3 ISA, LGKM count is decremented on flat atomics with return
when the data has been returned. This patch checks if the flat
instruction is an atomic with return, and decrements LGKM count if so.

Change-Id: I5c0c2c205a8b21327d4c42ba71c59842c15bd63b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39155
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-01-15 06:00:06 +00:00
gauravjain14
c29523665e gpu-compute: Support for dynamic register alloc
SimplePoolManager doesn't allow mapping of two WGs
simultaneously on the same Compute Unit (provided
the previous WG has been mapped to all the SIMDs)
even if there is sufficient VRF and SRF space
available.

DynPoolManager takes care of that by dynamically
allocating and deallocating register file space
to wavefronts

Change-Id: I2255c68d4b421615d7b231edc05d3ebb27cbd66c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32034
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com>
2021-01-14 17:04:27 +00:00
Gabe Black
c6933a27da misc: Fix missing includes.
Change-Id: I545ff03041e8fe66dc489c6aa95c009e54df0970
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38995
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-01-13 08:55:59 +00:00
Kyle Roarty
f6ec145fc0 gpu-compute: Fix FLAT insts decrementing lgkm count early
FLAT instructions used to decrement lgkm count on execute, while the
GCN3 ISA specifies that lgkm count should be decremented on data being
returned or data being written.

This patch changes it so that lgkm is decremented after initiateAcc (for
stores) and after completeAcc (for loads) to better reflect the ISA
definition.

This fixes a bug where waitcnts would be satisfied even though the
memory access wasn't completed, which lead to instructions using the
wrong data.

Change-Id: I596cb031af9cda8d47a1b5e146e4a4ffd793d36c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38696
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-01-07 17:12:31 +00:00
Kyle Roarty
e49a072c7d gpu-compute: Use dict.get syntax for accessing buildEnv keys
37775 removed SmartDict, which is the type buildEnv used to be.
Because of that change, doing buildEnv[key] with a key not in the dict
returns KeyError instead of False. By using buildEnv(key, False), we are
able to return False when the key isn't in the dict.

Change-Id: I4aae29b95b082efb2b021f21d608f9cd1c196379
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38135
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-12-01 19:19:52 +00:00
Kyle Roarty
0bb385941b gpu-compute: Add exp_cnt tracking for buffer store instructions
exp_cnt (expInstsIssued in the code) is used in the waitcnt instruction
to track that data has been read out of VGPRs in previous global
memory instructions, making it safe to overwrite the VGPRs used in said
global memory instructions.

Previously, exp_cnt wasn't being tracked at all, which lead to the
waitcnt finishing immediately, leading to the memory instruction's VPGRs
getting overwritten by subsequent instructions, causing errors.

This patch makes it so waitcnts waiting on exp_cnt will wait for MUBUF
buffer store instructions to read their VGPRs before completing

Change-Id: Idd2b59511bc086cf316217da27b7a228272b0b0f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37555
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-11-30 20:59:31 +00:00
Daniel Gerzhoy
9a01d3e927 dev-hsa,gpu-compute: Agent Packet handler implemented.
HSA packet processor will now accept and process agent packets.

Type field in packet is command type.
For now:
        AgentCmd::Nop = 0
        AgentCmd::Steal = 1

Steal command steals the completion signal for a running kernel.
This enables a benchmark to use hsa primitives to send an agent
packet to steal the signal, then wait on that signal.

Minimal working example to be added in gem5-resources.

Change-Id: I37f8a4b7ea1780b471559aecbf4af1050353b0b1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37015
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-11-16 16:12:48 +00:00
Tuan Ta
173c1c6eb0 gpu-compute,mem-ruby: Replace ACQUIRE and RELEASE request flags
This patch replaces ACQUIRE and RELEASE flags which are HSA-specific.
ACQUIRE flag becomes INV_L1 in VIPER protocol. RELEASE flag is removed.
Future protocols may support extra cache coherence flags like INV_L2 and
WB_L2.

Change-Id: I3d60c9d3625c898f4110a12d81742b6822728533
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32859
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-11-04 21:09:26 +00:00
Gabe Black
d05a0a4ea1 misc: Delete the now unnecessary create methods.
Most create() methods are no longer necessary. This change deletes them,
and occasionally moves some code from them into the constructors they
call.

Change-Id: Icbab29ba280144b892f9b12fac9e29a0839477e5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36536
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-10-30 04:00:20 +00:00
Gabe Black
3a49ed0156 gpu: Use X86ISA instead of TheISA in src/gpu-compute.
These files are nominally not tied to the X86ISA, but in reality they
are because they reach into the GPU TLB, which is defined unchangeably in
the X86ISA namespaces, and uses data structures within it. Rather than try
to pretend that these structures are generic, we'll instead just use X86ISA
instead of TheISA. If this really does become generic in the future, a
base class with the ISA agnostic essentials defined in it can be used
instead, and the ISA specific TLBs can defined their own derived class
which has whatever else they need. Really the compute unit shouldn't be
communicating with the TLB using sender state since those are supposed
to be little notes for the sender to keep with a transaction, not for
communicating between entities across a port.

Change-Id: Ie6573396f6c77a9a02194f5f4595eefa45d6d66b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34174
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-10-26 20:32:43 +00:00
Gabe Black
463cb28ca5 misc: Use compiler.hh macros when available.
Some places were hand coding __attribute__s when macros in compiler.hh
were available to do that job. Using the macros helps abstract away
compiler specific details and should be used when possible.

Change-Id: I94befebcfde2d673e874e9959588f69781bd9021
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35975
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-10-19 05:52:40 +00:00
Kyle Roarty
b20cc7e6d8 gpu-compute,mem-ruby: Properly create/handle WriteCompletePkts
There is a flow of packets as so:
WriteResp -> WriteReq -> WriteCompleteResp

These packets share some variables, in particular senderState and a
status vector.

One issue was the WriteResp packet decremented the status vector, which
was used by the WriteCompleteResp packets to determine when to handle
the global memory response. This could lead to multiple
WriteCompleteResp packets attempting to handle the global memory
response.

Because of that, the WriteCompleteResp packets needed to handle the
status vector. this patch moves WriteCompleteResp packet handling back
into ComputeUnit::DataPort::processMemRespEvent from
ComputeUnit::DataPort::recvTimingResp. This helps remove some redundant
code.

This patch has the WriteResp packet return without doing any status
vector handling, and without deleting the senderState, which had
previously caused a segfault.

Another issue was WriteCompleteResp packets weren't being issued for
each active lane, as the coalesced request was being issued too early.
In order to fix that, we have to ensure every active lane puts their
request into their applicable coalesced request before issuing the
coalesced request. Because of that change, we change the issuing of
CoalescedRequests from GPUCoalescer::coalescePacket to
GPUCoalescer::completeIssue.

That change involves adding a new variable to store the
CoalescedRequests that are created in the calls to coalescePacket. This
variable is a map from instruction sequence number to coalesced
requests.

Additionally, the WriteCompleteResp packet was attempting to access
physical memory in hitCallback while not having any data, which
caused a crash. This can be resolved either by not allowing
WriteCompleteResp packets to access memory, or by copying the data
from the WriteReq packet. This patch denies WriteCompleteResp packets
memory access in hitCallback.

Finally, in VIPERCoalescer::writeCompleteCallback there was a map
that held the WriteComplete packets, but no packets were ever being
removed. This patch removes packets that match the address that was
passed in to the function.

Change-Id: I9a064a0def2bf6c513f5295596c56b1b652b0ca4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33656
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-10-15 17:52:51 +00:00
Gabe Black
91d83cc8a1 misc: Standardize the way create() constructs SimObjects.
The create() method on Params structs usually instantiate SimObjects
using a constructor which takes the Params struct as a parameter
somehow. There has been a lot of needless variation in how that was
done, making it annoying to pass Params down to base classes. Some of
the different forms were:

const Params &
Params &
Params *
const Params *
Params const*

This change goes through and fixes up every constructor and every
create() method to use the const Params & form. We use a reference
because the Params struct should never be null. We use const because
neither the create method nor the consuming object should modify the
record of the parameters as they came in from the config. That would
make consuming them not idempotent, and make it impossible to tell what
the actual simulation configuration was since it would change from any
user visible form (config script, config.ini, dot pdf output).

Change-Id: I77453cba52fdcfd5f4eec92dfb0bddb5a9945f31
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35938
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-10-14 12:06:44 +00:00
Matthew Poremba
53807c8276 configs,gpu-compute: Fixes to connect gmTokenPort
When the TokenPort was moved from the GCN3 staging branch to develop the
TokenPort was changed from being the port connecting the ComputeUnit to
Ruby's vector memory port to a sideband port which inhibits requests to
Ruby's vector memory port. As such, it needs to be explicitly connected
as a new port. This changes the getPort method in ComputeUnit to be
aware of the port as well as modifying the example config to connect to
TCPs.

The iteration to connect in the config file was modified since it was
not properly connecting to TCPs each time and Ruby.py does not
explicitly return a list of each MachineType.

Change-Id: Ia70a6756b2af54d95e94d19bec5d8aadd3c2d5c0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35096
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-09-30 20:19:21 +00:00
Gabe Black
b877efa6d4 misc: Update attribute syntax, and reorganize compiler.hh.
This change replaces the __attribute__ syntax with the now standard [[]]
syntax. It also reorganizes compiler.hh so that all special macros have
some explanatory text saying what they do, and each attribute which has a
standard version can use that if available and what version of c++ it's
standard in is put in a comment.

Also, the requirements as far as where you put [[]] style attributes are
a little more strict than the old school __attribute__ style. The use of
the attribute macros was updated to fit these new, more strict
requirements.

Change-Id: Iace44306a534111f1c38b9856dc9e88cd9b49d2a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35219
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-09-28 21:52:59 +00:00
Gabe Black
50a0b85367 arm,base,gpu: Use std::make_unique instead of m5::make_unique.
Now that we're using c++14, we can just assume that std::make_unique
exists. We no longer have to conditionally inject our own version.

Change-Id: I5d851afb02dd05c7af93864ffec3b3184f3d4ec8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35215
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-09-28 05:41:08 +00:00
Kyle Roarty
347d7644eb gpu-compute: replace uint32_t* casts with bits API calls
The uint32_t* casting was challenging to fully understand what was
being done at a glance. Replaced with calls to various bits functions
as it's functionally equivalent and much more clear.

This also fixes a segfault in GPUInitAbi DPRINTFs from a mis-typed
uint32_t* cast.

Change-Id: Id5d1863942848dd7a9e5e17e8180c33adbc72f15
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34677
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-09-24 14:53:16 +00:00
Gabe Black
24e87cb1c5 gpu: Stop using TheISA in the GPU TLB.
This class is defined inside the X86ISA namespace, so there's no point
in pretending it's generic. Remove TheISA and let the code access what
it needs from X86ISA naturally since it's there already.

Change-Id: I21b5d2d2b9af6aa0c10ddbb5b3ddca1692188dcc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34173
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2020-09-18 13:48:45 +00:00
Kyle Roarty
be3bcd1629 gpu-compute: Fix deadlock in fetch_unit after branch instruction
The following deadlock was occuring in fetch_unit w/timingSim:
1. exec() is called, a wave is ready to fetch, so it sets pendingFetch
2. A packet is sent to ITLB to fetch for that wave
3. The wave executes a branch, causing the fetch buffer to be cleared
4. The packet is handled, and fetch() is called. However, because the
fetch buffer was cleared, it returns doing nothing.
5. exec() gets called again, but the wave will never be scheduled to
fetch, as pendingFetch is still set to true.

This patch clears pendingFetch (and dropFetch) before returning in fetch()
when the fetch buffer has been cleared.

dropFetch needed to be cleared otherwise gem5 would crash.

Change-Id: Iccbac7defc4849c19e8b17aa2492da641defb772
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34555
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-09-17 21:24:19 +00:00
Gabe Black
49a41da964 gpu: Fix a syntax error in X86GPUTLB.py.
The recent changes which removed master/slave terminology also
accidentally deleted an "=", making the syntax in that file illegal.

Change-Id: I50aa945f0f66765db36775380b98a88caff23c13
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34576
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-09-16 06:08:14 +00:00
Shivani Parekh
392c1ced53 misc: Replaced master/slave terminology
Change-Id: I4df2557c71e38cc4e3a485b0e590e85eb45de8b6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33553
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-09-10 23:02:28 +00:00
Kyle Roarty
b00b986353 misc: Use VPtr in hsa_driver.cc
This change updates HSADriver::allocateQueue to take in a ThreadContext
pointer as opposed to a PortProxy ref. This allows the TypedBufferArg
to be replaced with VPtr.

This also fixes building GCN3_X86

Change-Id: I1fea26b10c7344daf54a0cb05337e961f834a5fd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33655
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-08-31 17:44:11 +00:00
Gabe Black
1d755b4ba1 misc: Clean up usage of arch/isa_traits.hh.
isa_traits.hh used to have much more in it, but now it only has
PageShift, PageBytes, and (for now) the guest endianness. These values
should only be retrieved from the System class generally speaking, so
only the system class should include arch/isa_traits.hh.

Some gpu compute related files need PageBytes or PageShift. Even though
those files don't advertise their ISA dependence, they are tied to x86.
In those files, they can include arch/x86/isa_traits.hh.

The only other file which legitimately needs arch/isa_traits.hh is the
decoder cache since it uses PageBytes to size an array.

Change-Id: I12686368715623e3140a68a7027c136bd52567b1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33203
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-08-28 07:20:58 +00:00
Tony Gutierrez
94000aefe6 gpu-compute: Create CU's ports in the standard way
The CU would initialize its ports in getMasterPort(), which
is not desirable as getMasterPort() may be called several
times for the same port. This can lead to a fatal if the CU
expects to only create a single port of a given type, and may
lead to other issues where stat names are duplicated.

This change instantiates and initializes the CU's ports in the
CU constructor using the CU params.

The index field is also removed from the CU's ports because the
base class already has an ID field, which will be set to the
default value in the base class's constructor for scalar ports.

It doesn't make sense for scalar port's to take an index because
they are scalar, so we let the base class initialize the ID to
the invalid port ID.

Change-Id: Id18386f5f53800a6447d968380676d8fd9bac9df
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32836
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-08-27 16:31:46 +00:00
Emily Brickey
6333e914d3 gpu-compute: update port terminology
Change-Id: I3121c4afb1e137aebe09c1d694e9484844d02b9b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32313
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Poremba <chesp3@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-08-26 16:48:13 +00:00
Kyle Roarty
b872f02ab1 configs,gpu-compute,mem-ruby: connect gmTokenPorts in apu_se
This patch adds gmTokenPorts to the ComputeUnit and RubyGPUCoalescer
python classes so the gmTokenPorts can be connected in apu_se.

Change-Id: Icf3cb05c757754d6935b46f14e4b1b1d5072c4ca
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32677
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-08-18 23:47:16 +00:00
Gabe Black
40e8cac306 misc: Make registerExitCallback use CallbackQueue2.
Issue-on: https://gem5.atlassian.net/browse/GEM5-698
Change-Id: I526d4a19ca4e54a6469a4ee26693c1c0400fcc70
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32644
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-08-18 11:49:06 +00:00
Matthew Poremba
9b95f32b12 arch-gcn3,gpu-compute: Fix GCN3 related compiler errors
Fix all errors that were revealed using the util/compiler-test.sh
script.

Change-Id: Ie0d35568624e5e1405143593f0677bbd0b066b61
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31154
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-20 14:53:13 +00:00
Tony Gutierrez
4d737462c2 gpu-compute, arch-gcn3: Change how waitcnts are implemented
Use single counters per memory operation type and increment
them upon issue, not execute.

Change-Id: I6afc0b66b21882538ef90a14a57a3ab3cc7bd6f3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29973
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-17 16:36:23 +00:00
Tony Gutierrez
63c76448eb gpu-compute: Add pipeline stage interface classes
This change separates the pipeline stage interfaces
for the GPU's compute unit into their own classes
with a well-defined interface. This helps to create
a cleaner interface for users to extend the CU
pipeline's capabilities and also helps consolidate
all the pipeline communication code in one place
in the source.

Change-Id: I569d52bce84dc1b9fbf8f0f96d53a81a2b6773c6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29972
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-17 16:36:09 +00:00
Alexandru Dutu
7d50d5d972 gpu-compute: No RF scheduling in case of SKIP or EMPTY
In case of flat memory instructions the status for the
LM pipe execution unit is set to SKIP or EMPTY, as the bus
between the VRF and the GM and LM pipe is shared. The
destination operands should not be scheduled for the LM pipe,
event if the wave is in the dispatch list. This can lead
to deadlock in the destination cache as DCEs are reused
and the slotsAvailableForBank count gets artificially
incremented.

Change-Id: I2230c53e3bc1032d2cccbe00fab62c99ab8de6cd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29970
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:34:59 +00:00