Apply the gem5 namespace to the codebase.
Some anonymous namespaces could theoretically be removed,
but since this change's main goal was to keep conflicts
at a minimum, it was decided not to modify much the
general shape of the files.
A few missing comments of the form "// namespace X" that
occurred before the newly added "} // namespace gem5"
have been added for consistency.
std out should not be included in the gem5 namespace, so
they weren't.
ProtoMessage has not been included in the gem5 namespace,
since I'm not familiar with how proto works.
Regarding the SystemC files, although they belong to gem5,
they actually perform integration between gem5 and SystemC;
therefore, it deserved its own separate namespace.
Files that are automatically generated have been included
in the gem5 namespace.
The .isa files currently are limited to a single namespace.
This limitation should be later removed to make it easier
to accomodate a better API.
Regarding the files in util, gem5:: was prepended where
suitable. Notice that this patch was tested as much as
possible given that most of these were already not
previously compiling.
Change-Id: Ia53d404ec79c46edaa98f654e23bc3b0e179fe2d
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/46323
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The systemc dir was not included in this fix.
First it was identified that there were only occurrences
at 0, 1, 2 and 3 levels of indentation, using:
grep -nrE --exclude-dir=systemc \
"^ *union [A-Za-z].* {$" src/
Then the following commands were run to replace:
<indent level>union X ... {
by:
<indent level>union X ...
<indent level>{
Level 0:
grep -nrl --exclude-dir=systemc \
"^union [A-Za-z].* {$" src/ | \
xargs sed -Ei \
's/^union ([A-Za-z].*) \{$/union \1\n\{/g'
Level 1:
grep -nrl --exclude-dir=systemc \
"^ union [A-Za-z].* {$" src/ | \
xargs sed -Ei \
's/^ union ([A-Za-z].*) \{$/ union \1\n \{/g'
and so on.
Change-Id: I066854eb27a8acd2cc2dfa41596bb1b1f66c71b1
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/43328
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Gabe Black <gabe.black@gmail.com>
The systemc dir was not included in this fix.
First it was identified that there were only occurrences
at 0, 1, 2 and 3 levels of indentation (and a single
occurrence of 2 and 3 spaces), using:
grep -nrE --exclude-dir=systemc \
"^ *struct [A-Za-z].* {$" src/
Then the following commands were run to replace:
<indent level>struct X ... {
by:
<indent level>struct X ...
<indent level>{
Level 0:
grep -nrl --exclude-dir=systemc
"^struct [A-Za-z].* {$" src/ | \
xargs sed -Ei \
's/^struct ([A-Za-z].*) \{$/struct \1\n\{/g'
Level 1:
grep -nrl --exclude-dir=systemc \
"^ struct [A-Za-z].* {$" src/ | \
xargs sed -Ei \
's/^ struct ([A-Za-z].*) \{$/ struct \1\n \{/g'
and so on.
Change-Id: I362ef58c86912dabdd272c7debb8d25d587cd455
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39017
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Due the compute unit pipeline being executed in reverse order, there
exists a scenario where a compute unit will execute an extra
instruction when it's supposed to be stopped at a barrier. It occurs
as follows:
* The ScheduleStage sets a barrier instruction ready to execute.
* The ScoreboardCheckStage adds another instruction to the readyList.
This is where the barrier is checked, but because the barrier isn't
executing yet, the instruction can be passed along to ScheduleStage
* The barrier executes, and stalls
* The ScheduleStage sees that there's a new instruction and schedules
it to be executed.
* Only now will the ScoreboardCheckStage realize a barrier is active
and stall accordingly
* The subsequent instruction executes
This patch sets the wavefront status to be S_BARRIER in ScheduleStage
instead of in the barrier instruction execution in order to have
ScoreboardCheckStage realize that we're going to execute a barrier,
preventing it from marking another instruciton as ready.
Change-Id: Ib683e2c68f361d7ee60a3beaf53b4b6c888c9f8d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/41573
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Certain branch instructions specify that the result of (simm16 * 4)
gets sign-extended before being added to the PC.
Previously, that result was being sign extended as if it was still a
16-bit number. This patch fixes that by having the result be sign
extended as an 18-bit number.
Change-Id: Id4d430f8daa71ca7910b570e7e39790626f1decf
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/41053
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This change sets gpuDynInst->exec_mask for permute and bpermute
instructions, fixing a bug where they would never write their data.
permute and bpermute instructions are load instructions that write
to a VGPR. Because of that, they use gpuDynInst->exec_mask when
checking what lanes should write to the VGPR.
gpuDynInst->exec_mask gets set to wf->execMask() as that is what other
load instructions that write to VGPRs do.
Change-Id: Ie443283488cbd2ab9c17fc255e7cc44418353419
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35036
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
In flat instructions, wrLmReqsInPipe/rdLmReqsInPipe are decremented
in the calcAddr() function. However, the calcAddr() function is only
called when execMask != 0.
This patch adds in statements to decrement wrLmReqsInPipe and
rdLmReqsInPipe in all implemented atomic flats when execMask is 0.
This fixes a scenario where vector local memory and flat instructions
are unable to execute due to LocalMemPipeline::isLMReqFIFOWrRdy
always returning false in ScheduleStage::dispatchReady after too many
atomic flats execute with execMask = 0
Change-Id: I081cfd3faf74bbfcf0728445e7160fa2a76a6a7e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32614
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Flat instructions free some of their registers through their memory
requests, in particuar a call to scheduleWriteOperandsFromLoad(),
which gets called from GlobalMemPipeline::exec.
When execMask is 0, the instruction doesn't issue a memory request.
This patch adds in a call to scheduleWriteOperandsFromLoad() when
execMask is 0 for Flat Load and AtomicReturn instructions, as those
are the instructions that call scheduleWriteOperandsFromLoad()
in the memory pipeline.
This patch also adds in a missing return statement when execMask is 0
in one of the Flat instructions.
Change-Id: I09296adb7401e7515d3cedceb780a5df4598b109
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32234
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Affected instructions: V_DIV_SCALE_F64, V_CMP_CLASS_F64,
V_CMPX_CLASS_F64 and their VOPC, VOP3, F32 variants.
These instances of std::isnormal were being used to check for
subnormal (denorms) values. std::isnormal is not specific enough.
It returns true for normal values but false for NaN, Inf, 0.0, and
subnormals. std::fpclassify returns macros for each category of
floating point numbers. Now we only catch subnormals.
Change-Id: I8d8f4452ff58de71e7c8e0b2b5e73467b532e196
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29967
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Replaced !std::isnormal with std::fpclassify because std::isnormal
is not specific enough. !std::isnormal was incorrectly catching
NaN, Inf, 0.0, and subnormals (aka denormals), where as it was only
suppose to catch subnormals.
The return value and error handling spec of std::ldexp listed on
cppreference.com appears to match up in nearly all cases after
making these changes. If std::ldexp handled subnormals as described
in the GCN3 2016 guide, we could have used vdst[lane] = std::ldexp
and not need to check for any corner cases.
Change-Id: I4c77af77c3b7798f86d40442610cef1296a28441
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29966
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
roundNearestEven is an inst_util function that RNDNE_F64 and F32
call, including both VOP1 and VOP3 formats. IEEE 754 spec says this
function should round inputs to the nearest integer but round ties
to the nearest even integer. Prior to this patch it was rounding all
inputs to nearest even, not just the ties. It was probably implemented
this way originally because the language in the ISA manual is ambiguous
although it provided the correct logic.
Fixed roundNearestEven to use the semantics originally described in
the GCN3 ISA manual.
Change-Id: I83ecb1d516fcf5bdf17e54ddf409b447a129a9a7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29964
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
s_getpc was currently reporting only a single operand,
and was only considering the SSRC operand. However,
this instruction' source is implicitly the PC.
Because its destination register was never tracked for
dependence checking purposes, dependence violations
are possible.
Change-Id: Ia80b8b3e24d5885f646a9ee41212a2cb35b9ffe6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29954
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Instruction s_setreg_b32 was unimplemented, but is used by hipified
rodinia 'srad'. The instruction sets values of hardware internal
registers. If the instruction is writing into MODE to control
single-precision FP round and denorm modes, a simple warn will be
printed; for all other cases (non-MODE hw register or other
precisions), panic will happen.
Change-Id: Idb1cd5f60548a146bc980f1a27faff30259e74ce
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29949
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com>
Instructions that use the DPP field need to use the extra SRC0
register associated with the DPP instruction instead of the
"default" SRC0 register, since the default SRC0 register contains
the DPP information when DPP is being used. This commit fixes
2735c3bb88 to take this into account. Additionally, this commit
removes write of the src register from the DPP helper functions,
to avoid overwriting any changes made to the destination register.
Finally, this change modifies the instructions that use DPP to
simplify the flow through the execute() functions.
Change-Id: I80fd0af1f131f287f18ff73b3c1c9122d8c60823
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29947
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Barriers were not modeled properly. Firstly, barriers were
allocated to each WG that was launched, which is not
correct, and the CU would provide an infinite number
of barrier slots. There are a limited number of barrier slots
per CU in reality. In addition, the CU will not allocate
barrier slots to WGs with a single WF (nothing to sync if
only one WF).
Beyond modeling problems, there also the issue of deadlock.
The barrier could deadlock because not all WFs are freed
from the barrier once it has been satisfied. Instead, we
relied on the scoreboard stage to release them lazily,
one-by-one.
Under this implementation the scoreboard may not fully release
all WFs participating in a barrier; this happens because the
first WF to be freed from the barrier could reach an s_barrier
instruction again, forever causing the barrier counts across
WFs to be out-of-sync.
This change refactors the barrier logic to:
1) Create a proper barrier slot implementation
2) Enforce (via a parameter) the number of barrier
slots on the CU.
3) Simplify the logic and cleanup the code (i.e., we
no longer iterate through the entire WF list each
time we check if a barrier is satisfied).
4) Fix deadlock issues.
Change-Id: If53955b54931886baaae322640a7b9da7a1595e0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29943
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>