Commit Graph

65 Commits

Author SHA1 Message Date
Daniel R. Carvalho
974a47dfb9 misc: Adopt the gem5 namespace
Apply the gem5 namespace to the codebase.

Some anonymous namespaces could theoretically be removed,
but since this change's main goal was to keep conflicts
to a minimum, it was decided not to change the general
shape of the files much.

A few missing comments of the form "// namespace X" that
occurred before the newly added "} // namespace gem5"
have been added for consistency.

Code that belongs in namespace std should not be placed
in the gem5 namespace, so it was left outside of it.
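For illustration, here is a minimal sketch of the layout described above, assuming a single translation unit. RegId, defaultLatency and latencyFor are hypothetical names, not gem5 code. It shows the "// namespace X" closing comments, an anonymous namespace nested inside gem5, and why a std::hash specialization has to stay outside gem5:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

namespace gem5
{

namespace
{

// Internal-linkage helper, kept in an anonymous namespace.
constexpr int defaultLatency = 2;

} // anonymous namespace

// Hypothetical type, used only for this illustration.
struct RegId
{
    int index;

    bool
    operator==(const RegId &other) const
    {
        return index == other.index;
    }
};

int
latencyFor(const RegId &)
{
    return defaultLatency;
}

} // namespace gem5

// An explicit specialization of a std template cannot be declared inside
// gem5 (gem5 does not enclose std), so namespace std is reopened instead.
namespace std
{

template <>
struct hash<gem5::RegId>
{
    size_t
    operator()(const gem5::RegId &id) const noexcept
    {
        return hash<int>{}(id.index);
    }
};

} // namespace std
```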

ProtoMessage has not been included in the gem5 namespace,
since I'm not familiar with how proto works.

Regarding the SystemC files, although they belong to gem5,
they actually perform integration between gem5 and SystemC;
therefore, they were given their own separate namespace.

Files that are automatically generated have been included
in the gem5 namespace.

The .isa files currently are limited to a single namespace.
This limitation should be later removed to make it easier
to accommodate a better API.

Regarding the files in util, gem5:: was prepended where
suitable. This part of the patch was tested as much as
possible, given that most of these files were already not
compiling before it.

Change-Id: Ia53d404ec79c46edaa98f654e23bc3b0e179fe2d
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/46323
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-07-01 19:08:24 +00:00
Gabe Black
0dade68dae arch,cpu,gpu-compute: Further simplify VecRegContainer.
Get rid of VecRegT, and a few redundant or unused methods.

Change-Id: I6c88c40653e1939fe74b8ffb847ef50ab8064670
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/41995
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-04-10 07:31:23 +00:00
Tony Gutierrez
236b4a502f gpu-compute: Add operand info class to GPUDynInst
This change adds a class that stores operand register info
for the GPUDynInst. The operand info is calculated when the
instruction object is created and stored for easy access
by the RF, etc.

Change-Id: I3cf267942e54fe60fcb4224d3b88da08a1a0226e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42209
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2021-04-01 02:58:31 +00:00
Kyle Roarty
de134bae21 arch-gcn3: Modify directory structure as prep for adding vega isa
Change-Id: I7c5f4a3a9d82ca4550e833dec2cd576dbe333627
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42203
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2021-03-29 21:21:35 +00:00
Daniel R. Carvalho
b2c0b191e1 misc: Fix coding style for union's opening braces
The systemc dir was not included in this fix.

First it was identified that there were only occurrences
at 0, 1, 2 and 3 levels of indentation, using:

    grep -nrE --exclude-dir=systemc \
        "^ *union [A-Za-z].* {$" src/

Then the following commands were run to replace:

    <indent level>union X ... {

by:

    <indent level>union X ...
    <indent level>{

Level 0:
    grep -nrl --exclude-dir=systemc \
        "^union [A-Za-z].* {$" src/ | \
        xargs sed -Ei \
        's/^union ([A-Za-z].*) \{$/union \1\n\{/g'

Level 1:
    grep -nrl --exclude-dir=systemc \
        "^    union [A-Za-z].* {$" src/ | \
        xargs sed -Ei \
        's/^    union ([A-Za-z].*) \{$/    union \1\n    \{/g'

and so on.

Change-Id: I066854eb27a8acd2cc2dfa41596bb1b1f66c71b1
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/43328
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Gabe Black <gabe.black@gmail.com>
2021-03-23 16:26:04 +00:00
Daniel R. Carvalho
2922f763e1 misc: Fix coding style for struct's opening braces
The systemc dir was not included in this fix.

First it was identified that there were only occurrences
at 0, 1, 2 and 3 levels of indentation (and a single
occurrence of 2 and 3 spaces), using:

    grep -nrE --exclude-dir=systemc \
        "^ *struct [A-Za-z].* {$" src/

Then the following commands were run to replace:

    <indent level>struct X ... {

by:

    <indent level>struct X ...
    <indent level>{

Level 0:
    grep -nrl --exclude-dir=systemc \
        "^struct [A-Za-z].* {$" src/ | \
        xargs sed -Ei \
        's/^struct ([A-Za-z].*) \{$/struct \1\n\{/g'

Level 1:
    grep -nrl --exclude-dir=systemc \
        "^    struct [A-Za-z].* {$" src/ | \
        xargs sed -Ei \
        's/^    struct ([A-Za-z].*) \{$/    struct \1\n    \{/g'

and so on.

Change-Id: I362ef58c86912dabdd272c7debb8d25d587cd455
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39017
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-03-19 20:57:24 +00:00
Kyle Roarty
f5383a5733 gpu-compute: Fix accidental execution when stopped at barrier
Because the compute unit pipeline is executed in reverse order, there
exists a scenario where a compute unit will execute an extra
instruction when it's supposed to be stopped at a barrier. It occurs
as follows:

* The ScheduleStage sets a barrier instruction ready to execute.

* The ScoreboardCheckStage adds another instruction to the readyList.
This is where the barrier is checked, but because the barrier isn't
executing yet, the instruction can be passed along to ScheduleStage.

* The barrier executes and stalls.

* The ScheduleStage sees that there's a new instruction and schedules
it to be executed.

* Only now does the ScoreboardCheckStage realize a barrier is active
and stall accordingly.

* The subsequent instruction executes.

This patch sets the wavefront status to be S_BARRIER in ScheduleStage
instead of in the barrier instruction execution in order to have
ScoreboardCheckStage realize that we're going to execute a barrier,
preventing it from marking another instruction as ready.

Change-Id: Ib683e2c68f361d7ee60a3beaf53b4b6c888c9f8d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/41573
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-03-04 17:37:19 +00:00
Kyle Roarty
dd270656f0 arch-gcn3: Fix sign extension for branches with multiplied offset
Certain branch instructions specify that the result of (simm16 * 4)
gets sign-extended before being added to the PC.

Previously, that result was being sign extended as if it was still a
16-bit number. This patch fixes that by having the result be sign
extended as an 18-bit number.
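A standalone sketch of the difference, assuming a local sext<N> helper written to mimic gem5's sext<N>() (the real helper may differ in signature and return type). Because simm16 is scaled by 4, the product needs 18 bits, so sign-extending from bit 15 misreads large offsets. branchOffsetOld and branchOffset are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low N bits of val (illustrative stand-in for sext<N>).
template <int N>
constexpr int64_t sext(uint64_t val)
{
    static_assert(N > 0 && N < 64, "N must be in [1, 63]");
    const uint64_t sign = uint64_t(1) << (N - 1);
    val &= (uint64_t(1) << N) - 1;   // keep only the low N bits
    // Flipping then subtracting the sign bit maps [2^(N-1), 2^N) to negatives.
    return static_cast<int64_t>((val ^ sign) - sign);
}

// Before the fix: treats the 18-bit product as a 16-bit number,
// which drops its two top bits.
constexpr int64_t branchOffsetOld(uint16_t simm16)
{
    return sext<16>(uint64_t(simm16) * 4);
}

// After the fix: sign-extend the product from bit 17.
constexpr int64_t branchOffset(uint16_t simm16)
{
    return sext<18>(uint64_t(simm16) * 4);
}
```

For example, simm16 = 0x2000 yields +32768 bytes with sext<18>, but -32768 with sext<16>.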

Change-Id: Id4d430f8daa71ca7910b570e7e39790626f1decf
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/41053
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-02-11 20:08:25 +00:00
Alexandru Dutu
14d6e8fac4 arch-gcn3: Implementation of s_sleep
This changeset implements the s_sleep instruction in a similar
way to s_waitcnt.

Change-Id: I4811c318ac2c76c485e2bfd9d93baa1205ecf183
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39115
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-02-04 00:07:10 +00:00
Gabe Black
fc4caa6ad0 misc: Re-remove Authors lines from source files.
These were universally removed a while ago, but a bunch have crept back
in. Remove them.

Change-Id: I3cb5b9f40c9c19aafb5e39a51d1baeae60a591c0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40335
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
2021-02-03 12:55:17 +00:00
Matthew Poremba
5323cccfdd arch-gcn3,gpu-compute: Update stats style for GPU
Convert all gpu-compute stats to Stats::Group style.

Change-Id: I29116f1de53ae379210c6cfb5bed3fc74f50cca5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39135
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-01-18 17:58:05 +00:00
Kyle Roarty
9e1f543407 arch-gcn3: Explicitly sign-extend simm16
In some instructions, simm16 needs to be sign-extended. Previous code
simply cast simm16 to a 32-bit or 64-bit datatype; however, this
did not actually sign-extend the value.

This patch explicitly calls sext<16> on simm16 whenever it's supposed
to be sign-extended.
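The pitfall can be sketched as follows. sext16 and widenByCast are hypothetical helpers; sext16 uses a narrowing cast to int16_t rather than gem5's sext<16>, assuming two's-complement conversion (guaranteed from C++20 and universal in practice):

```cpp
#include <cassert>
#include <cstdint>

// Widening an unsigned 16-bit field by a plain cast zero-extends it.
constexpr int32_t widenByCast(uint16_t simm16)
{
    return static_cast<int32_t>(simm16);
}

// Reinterpreting the low 16 bits as int16_t sign-extends them.
constexpr int32_t sext16(uint32_t val)
{
    return static_cast<int16_t>(static_cast<uint16_t>(val));
}
```

With simm16 = 0xFFFF, the cast yields 65535 while the sign-extended value is -1, so adding it to a 32-bit register decrements instead of adding 65535.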

Change-Id: I32f02e51fbab220d1a73dc7e68c7410937db21c7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37495
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-11-17 17:08:24 +00:00
Kyle Roarty
5f6ebe752e arch-gcn3: Implement flat_load_sbyte instruction
Change-Id: I3aa7547a393b9ecb4b3d4d107394c54d690a0ac2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37476
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-11-17 17:08:24 +00:00
Kyle Roarty
c1ddd01b66 arch-gcn3: Implement s_setreg_imm32_b32 instruction
Change-Id: I5383243403156dc17d4997106085a62fb0483fec
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37475
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-11-17 17:08:24 +00:00
Kyle Roarty
a82ea84244 arch-gcn3: Fix operand size reporting for Flat insts
Some Flat instructions were reporting their operand sizes in bits
instead of bytes. This led to panics occurring in
StaticRegisterManagerPolicy::mapVgpr.

This patch updates those insts to report their operand sizes in bytes.

Change-Id: I48f485e638864a1f2a1a3be66ed20893e73e9705
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36275
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-11-06 00:31:52 +00:00
Gabe Black
74005aa8d6 misc: Replace enable_if<>::type with enable_if_t<>.
This abbreviated form was added in C++14. Now that we're using that
version of the standard, we can move over to it.
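A short illustration of the two spellings; twice and twiceOld are made-up functions. Both only participate in overload resolution for integral T, so a call such as twice(1.5) does not compile:

```cpp
#include <cassert>
#include <type_traits>

// Pre-C++14 spelling: the nested ::type must be named explicitly.
template <typename T>
typename std::enable_if<std::is_integral<T>::value, T>::type
twiceOld(T x)
{
    return x * 2;
}

// C++14 alias template: std::enable_if_t<C, T> means enable_if<C, T>::type.
template <typename T>
std::enable_if_t<std::is_integral<T>::value, T>
twice(T x)
{
    return x * 2;
}

static_assert(std::is_same<std::enable_if_t<true, int>,
                           std::enable_if<true, int>::type>::value,
              "enable_if_t is an alias for enable_if<>::type");
```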

Change-Id: Ia291d2b1e73e503c37593b1e1c4c1b3011abc63b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36477
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-10-23 12:59:59 +00:00
Kyle Roarty
45f57ff2c2 gpu-compute: set exec_mask for permute,bpermute instructions
This change sets gpuDynInst->exec_mask for permute and bpermute
instructions, fixing a bug where they would never write their data.

permute and bpermute instructions are load instructions that write
to a VGPR. Because of that, they use gpuDynInst->exec_mask when
checking what lanes should write to the VGPR.

gpuDynInst->exec_mask gets set to wf->execMask() as that is what other
load instructions that write to VGPRs do.
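A simplified model of why an unset exec_mask suppresses the write, assuming a 64-lane wavefront and a std::bitset mask. writeVgpr, VecReg and WavefrontSize are hypothetical names, not the GPUDynInst API:

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t WavefrontSize = 64;
using VecReg = std::array<uint32_t, WavefrontSize>;

// Only lanes whose bit is set in execMask receive data; inactive lanes
// keep their old value. An all-zero mask therefore writes nothing.
void writeVgpr(VecReg &dst, const VecReg &data,
               const std::bitset<WavefrontSize> &execMask)
{
    for (std::size_t lane = 0; lane < WavefrontSize; ++lane) {
        if (execMask[lane])
            dst[lane] = data[lane];
    }
}
```

A default-constructed mask behaves like the bug described above: the load completes but no lane is written.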

Change-Id: Ie443283488cbd2ab9c17fc255e7cc44418353419
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35036
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-09-24 17:56:04 +00:00
Shivani Parekh
392c1ced53 misc: Replaced master/slave terminology
Change-Id: I4df2557c71e38cc4e3a485b0e590e85eb45de8b6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33553
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-09-10 23:02:28 +00:00
Kyle Roarty
0983929a24 arch-gcn3: Update LmReqsInPipe in atomic flats when execMask=0
In flat instructions, wrLmReqsInPipe/rdLmReqsInPipe are decremented
in the calcAddr() function. However, the calcAddr() function is only
called when execMask != 0.

This patch adds in statements to decrement wrLmReqsInPipe and
rdLmReqsInPipe in all implemented atomic flats when execMask is 0.

This fixes a scenario where vector local memory and flat instructions
are unable to execute due to LocalMemPipeline::isLMReqFIFOWrRdy
always returning false in ScheduleStage::dispatchReady after too many
atomic flats execute with execMask = 0.
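The counter leak can be modeled roughly as below. LmPipeCounters, executeAtomicFlat and the capacity of 4 are hypothetical, and only the read counter is shown; per the description above, wrLmReqsInPipe is handled the same way:

```cpp
#include <cassert>

// Requests are counted at dispatch and released in calcAddr().
struct LmPipeCounters
{
    int rdLmReqsInPipe = 0;
    int maxInPipe = 4;   // assumed capacity, for illustration only

    bool ready() const { return rdLmReqsInPipe < maxInPipe; }
    void dispatch() { ++rdLmReqsInPipe; }
    void calcAddr() { --rdLmReqsInPipe; }
};

// fixed == false models the behavior before this change.
void executeAtomicFlat(LmPipeCounters &c, unsigned long long execMask,
                       bool fixed)
{
    if (execMask != 0) {
        c.calcAddr();           // normal path releases the slot
    } else if (fixed) {
        --c.rdLmReqsInPipe;     // release the slot even with no active lanes
    }
}
```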

Change-Id: I081cfd3faf74bbfcf0728445e7160fa2a76a6a7e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32614
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-08-27 20:46:28 +00:00
Kyle Roarty
62ec973244 arch-gcn3: Free registers when execMask = 0
Flat instructions free some of their registers through their memory
requests, in particular a call to scheduleWriteOperandsFromLoad(),
which gets called from GlobalMemPipeline::exec.

When execMask is 0, the instruction doesn't issue a memory request.

This patch adds in a call to scheduleWriteOperandsFromLoad() when
execMask is 0 for Flat Load and AtomicReturn instructions, as those
are the instructions that call scheduleWriteOperandsFromLoad()
in the memory pipeline.

This patch also adds in a missing return statement when execMask is 0
in one of the Flat instructions.

Change-Id: I09296adb7401e7515d3cedceb780a5df4598b109
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32234
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-08-13 19:39:34 +00:00
Kyle Roarty
d542dc838e arch-gcn3: make read2st64_b32 write proper registers
Per the GCN3 ISA, read2st64_b32 writes to consecutive registers.

Change-Id: Ibc1672584a72cf7de12e06068a03fe304b34dce2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32236
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com>
Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-08-13 19:04:36 +00:00
Matt Sinclair
4d84590dee arch-gcn3: add support for flat atomic adds, subs, incs, decs
Add support for all missing flat atomic adds, subtracts, increments,
and decrements, including their x2 variants.

Change-Id: I37a67fcacca91a09a82be6597facaa366105d2dc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31974
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-30 23:57:02 +00:00
Matthew Poremba
9b95f32b12 arch-gcn3,gpu-compute: Fix GCN3 related compiler errors
Fix all errors that were revealed using the util/compiler-test.sh
script.

Change-Id: Ie0d35568624e5e1405143593f0677bbd0b066b61
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31154
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-20 14:53:13 +00:00
Tony Gutierrez
4d737462c2 gpu-compute, arch-gcn3: Change how waitcnts are implemented
Use single counters per memory operation type and increment
them upon issue, not execute.

Change-Id: I6afc0b66b21882538ef90a14a57a3ab3cc7bd6f3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29973
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-17 16:36:23 +00:00
Chow, Marcus
6655161037 arch-gcn3: Add case to op selector when operand is vcc_hi
Change-Id: Ib8846656e18aad04ccb8c9112bc629c69078fe36
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29971
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:35:44 +00:00
Michael LeBeane
f509fa735c arch-gcn3: Fix stride bug in buffer OOB detection logic
The out-of-range logic for buffer accesses is missing the top 4 bits of
const_stride when dealing with scratch buffers.  This can cause
perfectly valid scratch accesses to be suppressed when const_stride is
large.

Change-Id: I8f94d44c242fda26cf6dfb75db04fa3aca934b3e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29968
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-17 16:34:07 +00:00
Travis Boraten
4c1dc827bc arch-gcn3: Replace some instances of std::isnormal with std::fpclassify
Affected instructions: V_DIV_SCALE_F64, V_CMP_CLASS_F64,
V_CMPX_CLASS_F64 and their VOPC, VOP3, F32 variants.

These instances of std::isnormal were being used to check for
subnormal (denormal) values, but std::isnormal is not specific enough:
it returns true for normal values and false for NaN, Inf, 0.0, and
subnormals alike. std::fpclassify instead returns a distinct value
(FP_NORMAL, FP_SUBNORMAL, FP_ZERO, FP_INFINITE or FP_NAN) for each
category of floating point number, so now only subnormals are caught.
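The difference between the two checks shows up with a few representative values. isSubnormalOld and isSubnormal are illustrative names, and the results assume the code is not built with flags such as -ffast-math that relax IEEE semantics:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Old check: true for subnormals, but also for 0.0, Inf and NaN.
bool isSubnormalOld(double x) { return !std::isnormal(x); }

// New check: true only for subnormal (denormal) values.
bool isSubnormal(double x) { return std::fpclassify(x) == FP_SUBNORMAL; }
```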

Change-Id: I8d8f4452ff58de71e7c8e0b2b5e73467b532e196
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29967
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:33:26 +00:00
Travis Boraten
e1d10c3894 arch-gcn3: Fix VOP3 V_LDEXP_F64
Replaced !std::isnormal with std::fpclassify because std::isnormal
is not specific enough. !std::isnormal was incorrectly catching
NaN, Inf, 0.0, and subnormals (aka denormals), whereas it was only
supposed to catch subnormals.

The return value and error handling spec of std::ldexp listed on
cppreference.com appears to match up in nearly all cases after
making these changes. If std::ldexp handled subnormals as described
in the GCN3 2016 guide, we could have used vdst[lane] = std::ldexp
directly and would not need to check for any corner cases.

Change-Id: I4c77af77c3b7798f86d40442610cef1296a28441
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29966
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:32:56 +00:00
Travis Boraten
e4f7982e90 arch-gcn3: Fix roundNearestEven for V_RNDNE_F64 and V_RNDNE_F32
roundNearestEven is an inst_util function that RNDNE_F64 and F32
call, including both VOP1 and VOP3 formats. IEEE 754 spec says this
function should round inputs to the nearest integer but round ties
to the nearest even integer. Prior to this patch, it rounded all
inputs to the nearest even integer, not just the ties. It was probably
implemented this way originally because the wording in the ISA manual
is ambiguous, although the manual does provide the correct logic.

Fixed roundNearestEven to use the semantics originally described in
the GCN3 ISA manual.
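A sketch of round-half-to-even for float or double; this is an illustrative helper, not necessarily gem5's inst_util code. Non-ties round to the nearest integer, and only exact .5 cases go to the even neighbor. For finite inputs, std::nearbyint under the default FE_TONEAREST rounding mode follows the same rule:

```cpp
#include <cassert>
#include <cmath>

template <typename T>
T roundNearestEven(T x)
{
    const T fl = std::floor(x);
    const T frac = x - fl;          // fractional part, >= 0
    if (frac < T(0.5))
        return fl;
    if (frac > T(0.5))
        return fl + T(1);
    // Exact tie: choose whichever of fl and fl + 1 is even.
    return std::fmod(fl, T(2)) == T(0) ? fl : fl + T(1);
}
```

Note that 2.7 rounds to 3 here; the pre-fix behavior described above would have produced 2.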

Change-Id: I83ecb1d516fcf5bdf17e54ddf409b447a129a9a7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29964
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:32:56 +00:00
Matt Sinclair
a23ef78c91 arch-gcn3: add all s_buffer_load_dword instructions
Adds the other s_buffer_load_dword* instruction implementations to
f134a84.

Change-Id: I8d97527278900dc68c32463ea1824409ccd04e1d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29962
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:31:39 +00:00
Matthew Poremba
39f305b329 arch-gcn3: Add memcpy condition when writing EXEC_LO
Some compilers emit an error on the operand template class when writing
the exec mask. Add a condition to explicitly set the memcpy size argument
to 32b or 64b based on the number of dwords.

Change-Id: I49b0e4a1680283e772d0a5a8efd687b31d4f1624
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29961
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-17 16:31:10 +00:00
Tony Gutierrez
550f0203aa arch-gcn3: Remove invalid assert when reading EXEC_LO
This assert assumed all reads to EXEC_LO would be
64b, that is, we would always read the entire EXEC
mask. This is invalid as some kernels read only
the low 32b of EXEC.

The write to EXEC_LO is also updated to handle 32b
writes.
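Treating EXEC as a single 64-bit value, 32-bit accesses to EXEC_LO can be sketched as below. readExecLo and writeExecLo are hypothetical helpers, not gem5's operand class:

```cpp
#include <cassert>
#include <cstdint>

// A 32-bit read of EXEC_LO only needs the low half of the mask.
constexpr uint32_t readExecLo(uint64_t exec)
{
    return static_cast<uint32_t>(exec);
}

// A 32-bit write to EXEC_LO must leave the upper half (EXEC_HI) untouched.
constexpr uint64_t writeExecLo(uint64_t exec, uint32_t value)
{
    return (exec & 0xFFFFFFFF00000000ULL) | value;
}
```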

Change-Id: Ifeb167578515bf112b1eab70bbf2201a5e936358
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29960
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:30:41 +00:00
Tony Gutierrez
72e9324ef0 arch-gcn3: Implement ds_swizzle
Change-Id: I7d188388afa16932217ae207368666a724207c52
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29958
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:13:43 +00:00
Tony Gutierrez
513e75d99a arch-gcn3: Implement s_buffer_load_dwordx16
Change-Id: I25382dcae9bb55eaf035385fa925157f25d39c20
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29957
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:13:17 +00:00
Tony Gutierrez
0c3b84fd33 arch-gcn3: Fixup DIV instructions
Adds support to handle the special cases
for GCN3 DIV instructions.

Change-Id: I18f91870e802407c93831f313ce76be053bc4230
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29956
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-17 16:12:58 +00:00
Chow, Marcus
b267350ee5 arch-gcn3: fixed scale,fixup,fmas f64 ops
Change-Id: Ie13794554db8a958fda1f7103ec18058fda2e66d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29955
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Tony Gutierrez
5dc5d23b79 arch-gcn3: Fix s_getpc operand information
s_getpc was reporting only a single operand, and was only
considering the SSRC operand. However, this instruction's source
is implicitly the PC. Because its destination register was never
tracked for dependence-checking purposes, dependence violations
were possible.

Change-Id: Ia80b8b3e24d5885f646a9ee41212a2cb35b9ffe6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29954
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Chow, Marcus
a0cfd8da6b arch-gcn3: Add handling for Inf/overflow in CVT insts
Change-Id: I0fddffdeaebd9f45fe89f44d536f80a43de63ff5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29953
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Tony Gutierrez
5c3b02de09 arch-gcn3: Add ds_bpermute and ds_permute insts
The implementation of these insts provided by this
change is based on the description provided here:

https://gpuopen.com/amd-gcn-assembly-cross-lane-operations/
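Based on that description, the two operations can be sketched for a 64-lane wavefront with all lanes active: ds_bpermute pulls (each lane reads from another lane) and ds_permute pushes (each lane writes to another lane). The helpers are hypothetical, and two details are assumptions: lane indices come from (addr / 4) mod 64, and when several lanes push to the same destination, the highest-numbered lane wins:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t NumLanes = 64;
using LaneData = std::array<uint32_t, NumLanes>;

// Byte address -> lane index (assumed dword granular, wrapping).
constexpr std::size_t laneOf(uint32_t addr) { return (addr >> 2) % NumLanes; }

// ds_bpermute: lane i reads src from the lane named by addr[i].
LaneData bpermute(const LaneData &addr, const LaneData &src)
{
    LaneData dst{};
    for (std::size_t i = 0; i < NumLanes; ++i)
        dst[i] = src[laneOf(addr[i])];
    return dst;
}

// ds_permute: lane i writes src[i] to the lane named by addr[i].
LaneData permute(const LaneData &addr, const LaneData &src)
{
    LaneData dst{};
    for (std::size_t i = 0; i < NumLanes; ++i)
        dst[laneOf(addr[i])] = src[i];
    return dst;
}
```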

Change-Id: Id63b6c34c9fdc6e0dbd445d859e7b209023f2874
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29952
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Alexandru Dutu
3aa633cc3f arch-gcn3: ds_read_u8 and ds_read_u16 fix
This changeset zero extends the destination register
for ds_read_u8 and ds_read_u16 instructions.

Change-Id: I193adadd68adf2572b59743b1504f18ad225f506
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29951
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Xianwei Zhang
fff185993a arch-gcn3: implement instruction s_setreg_b32
Instruction s_setreg_b32 was unimplemented, but is used by hipified
rodinia 'srad'. The instruction sets values of hardware internal
registers. If the instruction is writing into MODE to control
single-precision FP round and denorm modes, a simple warn will be
printed; for all other cases (non-MODE hw register or other
precisions), panic will happen.

Change-Id: Idb1cd5f60548a146bc980f1a27faff30259e74ce
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29949
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com>
2020-07-16 20:37:22 +00:00
Matt Sinclair
1836d58b36 arch-gcn3: add support for v_mbcnt_hi and v_mbcnt_lo
Change-Id: I1c70fe693c904f1abd7d5a2b99220c74a075eae5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29948
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Matt Sinclair
c7b6e7c613 arch-gcn3: fix bug with DPP support
Instructions that use the DPP field need to use the extra SRC0
register associated with the DPP instruction instead of the
"default" SRC0 register, since the default SRC0 register contains
the DPP information when DPP is being used.  This commit fixes
2735c3bb88 to take this into account.  Additionally, this commit
removes the write of the src register from the DPP helper functions,
to avoid overwriting any changes made to the destination register.
Finally, this change modifies the instructions that use DPP to
simplify the flow through the execute() functions.

Change-Id: I80fd0af1f131f287f18ff73b3c1c9122d8c60823
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29947
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Matt Sinclair
ed3135ea6a arch-gcn3: implement multi-dword buffer loads and stores
Add support for all multi-dword buffer loads and stores:
buffer_load_dword x2, x3, and x4 and buffer_store_dword x2, x3, and x4

Change-Id: I4017b6b4f625fc92002ce8ade695ae29700fa55e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29946
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Tony Gutierrez
ea52df816d arch-gcn3: Add support for rd/wr EXEC_HI to operand class
Change-Id: Ib22dd604f88ea56801964235082835002deffca1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29944
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Tony Gutierrez
af621cd6e6 gpu-compute, arch-gcn3: refactor barriers
Barriers were not modeled properly. Firstly, barriers were
allocated to each WG that was launched, which is not
correct, and the CU would provide an infinite number
of barrier slots. There are a limited number of barrier slots
per CU in reality. In addition, the CU will not allocate
barrier slots to WGs with a single WF (nothing to sync if
only one WF).

Beyond modeling problems, there is also the issue of deadlock.
The barrier could deadlock because not all WFs are freed
from the barrier once it has been satisfied. Instead, we
relied on the scoreboard stage to release them lazily,
one-by-one.

Under this implementation the scoreboard may not fully release
all WFs participating in a barrier; this happens because the
first WF to be freed from the barrier could reach an s_barrier
instruction again, forever causing the barrier counts across
WFs to be out-of-sync.

This change refactors the barrier logic to:

1) Create a proper barrier slot implementation

2) Enforce (via a parameter) the number of barrier
   slots on the CU.

3) Simplify the logic and cleanup the code (i.e., we
   no longer iterate through the entire WF list each
   time we check if a barrier is satisfied).

4) Fix deadlock issues.
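A minimal sketch of a barrier slot with all-at-once release. BarrierSlot and needsBarrierSlot are hypothetical names, and the per-CU limit on the number of slots (point 2) is left out:

```cpp
#include <cassert>

// Work-groups with a single WF have nothing to synchronize.
constexpr bool needsBarrierSlot(int numWfsInWg) { return numWfsInWg > 1; }

class BarrierSlot
{
  public:
    explicit BarrierSlot(int n) : numWfs(n) {}

    // Called when a WF reaches s_barrier. Returns true when this arrival
    // completes the barrier; the caller then releases every waiting WF at
    // once, and the count is reset so the slot can be reused immediately.
    bool arrive()
    {
        if (++numAtBarrier < numWfs)
            return false;
        numAtBarrier = 0;
        return true;
    }

    int waiting() const { return numAtBarrier; }

  private:
    int numWfs;
    int numAtBarrier = 0;
};
```

Resetting the count on completion means a WF that quickly reaches the next s_barrier starts a fresh round instead of desynchronizing the counts.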

Change-Id: If53955b54931886baaae322640a7b9da7a1595e0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29943
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-16 20:37:22 +00:00
Xianwei Zhang
c2641eec89 arch-gcn3: add support of 64-bit SOPK instruction
s_setreg_imm32_b32 is a 64-bit instruction, using a 32-bit literal
constant. Related functions are added to support decoding the second
dword.

Change-Id: I290f8578f726885c137dbfac3773035f814e0a3a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29942
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com>
2020-07-16 20:37:22 +00:00
Matt Sinclair
3e84a8d710 arch-gcn3: ensure that atomics follow HSA conventions
Add asserts to make sure atomics are following the HSA conventions
that atomics should be word aligned (i.e., can't be byte aligned)
and should not be misaligned such that a given lane's access
spans multiple cache lines.
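The two conditions can be expressed as a single predicate. atomicAccessOk is a hypothetical name, and the 64-byte default line size is only an example value:

```cpp
#include <cassert>
#include <cstdint>

// True if a per-lane atomic of accessSize bytes at addr is word (4-byte)
// aligned and its first and last bytes fall in the same cache line.
constexpr bool atomicAccessOk(uint64_t addr, unsigned accessSize,
                              unsigned lineSize = 64)
{
    return addr % 4 == 0 &&
           addr / lineSize == (addr + accessSize - 1) / lineSize;
}
```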

Change-Id: Ia48758b9ed96764864234dc607f337e30e287d1c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29941
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Alexandru Dutu
07fcbf16fc arch-gcn3: Implementation of flat atomic swap instruction
Change-Id: I9b9042899e65e8c9848b31c509eb2e3b13293e52
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29937
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-13 23:32:27 +00:00
Michael LeBeane
6747b127af arch-gcn3: Fix VOP2 disassembly prints
VOP2 prints VSRC1 register index as hex instead of decimal if the
instruction contains a literal operand.  This patch resets the
format specifiers in the stream to print the register correctly.
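The underlying pitfall is that std::hex stays in effect on a stream until another base is set. A hypothetical disassemble() helper, with an assumed output format, shows the fix:

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <sstream>
#include <string>

// The literal is printed in hex, the register index in decimal.
std::string disassemble(uint32_t literal, unsigned vsrc1)
{
    std::ostringstream ss;
    ss << "0x" << std::hex << literal;
    ss << ", v" << std::dec << vsrc1;   // without std::dec this prints "v1a"
    return ss.str();
}
```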

Change-Id: Icc7e6588b3c5af545be6590ce412460e72df253f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29936
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
2020-07-13 19:48:12 +00:00