We currently have a bug in decoding workitem ID from the kernel
descriptor with multiple dimensions. The enable_vgpr_workitem_id bits
are currently seperated into x and y components, when they should be
treated as a single 2 bit value, where y is enabled when it is > 0,
and z is enabled when it is > 1. The current setup allows a kernel
launch with vgprs reserved for the z dimension and not the y dimension,
which is incorrect.
Change-Id: Iee64b207feb95bcf064898d5db33b8f201e25323
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29965
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
roundNearestEven is an inst_util function that RNDNE_F64 and F32
call, including both VOP1 and VOP3 formats. IEEE 754 spec says this
function should round inputs to the nearest integer but round ties
to the nearest even integer. Prior to this patch it was rounding all
inputs to nearest even, not just the ties. It was probably implemented
this way originally because the language in the ISA manual is ambiguous
although it provided the correct logic.
Fixed roundNearestEven to use the semantics originally described in
the GCN3 ISA manual.
Change-Id: I83ecb1d516fcf5bdf17e54ddf409b447a129a9a7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29964
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
This change fixes a memory leak due to live GPUDynInstPtr references
to vector store insts being stored in the CU's headTailMap and never
released.
This happened because store insts are not supposed to have their
head-tail latencies tracked by the headTailMap; instead they use
timing information from the GPUCoalescer. When updating the
headTailLatency stat via the headTailMap, only loads were considered
and removed from the headTailMap, however when inserting into the
headTailMap loads and stores were considered, thus leading to the
memory leak.
This change fixes the issue by only adding loads to the headTailMap.
Change-Id: I8a8f5b79f55e00481ae5e82519a9ed627a7ecbd1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29963
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
The BLIT kernels used to implement DMA through the shaders don't fill
out all of the standard fields in an amd_kernel_code_t object. This
patch modifies the code object parsing logic to support these new
kernels.
BLIT kernels are used in APUs when using ROCm memcopies for certain size
buffers, and are used for dGPUs when the SDMA engines are disabled.
Change-Id: Id4e667474d05e311097dbec443def07dfad14a79
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29959
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Based on the 64-bit ELF ABI for Power systems (ppc64 and
ppc64le), the data types int64_t and uint64_t are typedefs
of long and unsigned long respectively. If the SystemC
data types int64 and uint64 point to these, several errors
are observed while building the simulator on Power systems
due to ambiguity between the types when overloading some
operators and functions.
E.g.
...
build/POWER/systemc/ext/channel/../dt/bit/sc_bit.hh:114:17: error: 'static bool sc_dt::sc_bit::to_value(sc_dt::int64)' cannot be overloaded with 'static bool sc_dt::sc_bit::to_value(long int)'
114 | static bool to_value(tp i) { return to_value((int)i); }
| ^~~~~~~~
...
build/POWER/systemc/ext/channel/../dt/bit/sc_bit.hh:114:17: note: previous declaration 'static bool sc_dt::sc_bit::to_value(long int)'
114 | static bool to_value(tp i) { return to_value((int)i); }
| ^~~~~~~~
...
This adds a minor change to a SystemC datatype header to
ensure that the simulator can be built on Power systems.
Change-Id: Icd8bb38134bf98768cc38f9856d7d11a01ebaf21
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31414
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
s_getpc was currently reporting only a single operand,
and was only considering the SSRC operand. However,
this instruction' source is implicitly the PC.
Because its destination register was never tracked for
dependence checking purposes, dependence violations
are possible.
Change-Id: Ia80b8b3e24d5885f646a9ee41212a2cb35b9ffe6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29954
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
The vALU instruction counters were previously 32 bits, but for some
workloads this value wraps around and triggers an assert failure
because the max vALU operations are reached. To resolve this, this
commit increases the counter size to 64 bits.
Change-Id: I90ed4514669485cfea7ccc37ba9d69665277bccb
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29950
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Instruction s_setreg_b32 was unimplemented, but is used by hipified
rodinia 'srad'. The instruction sets values of hardware internal
registers. If the instruction is writing into MODE to control
single-precision FP round and denorm modes, a simple warn will be
printed; for all other cases (non-MODE hw register or other
precisions), panic will happen.
Change-Id: Idb1cd5f60548a146bc980f1a27faff30259e74ce
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29949
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com>
Instructions that use the DPP field need to use the extra SRC0
register associated with the DPP instruction instead of the
"default" SRC0 register, since the default SRC0 register contains
the DPP information when DPP is being used. This commit fixes
2735c3bb88 to take this into account. Additionally, this commit
removes write of the src register from the DPP helper functions,
to avoid overwriting any changes made to the destination register.
Finally, this change modifies the instructions that use DPP to
simplify the flow through the execute() functions.
Change-Id: I80fd0af1f131f287f18ff73b3c1c9122d8c60823
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29947
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
This change updates the constructors of the CU's pipe
stages/memory pipelines to accept a pointer to their
parent CU. Because the CU creates these objects, and
can pass a pointer to itself to these object via their
constructors, this is the safer way to initalize these
classes.
Change-Id: I0b3732ce7c03781ee15332dac7a21c097ad387a4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29945
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Barriers were not modeled properly. Firstly, barriers were
allocated to each WG that was launched, which is not
correct, and the CU would provide an infinite number
of barrier slots. There are a limited number of barrier slots
per CU in reality. In addition, the CU will not allocate
barrier slots to WGs with a single WF (nothing to sync if
only one WF).
Beyond modeling problems, there also the issue of deadlock.
The barrier could deadlock because not all WFs are freed
from the barrier once it has been satisfied. Instead, we
relied on the scoreboard stage to release them lazily,
one-by-one.
Under this implementation the scoreboard may not fully release
all WFs participating in a barrier; this happens because the
first WF to be freed from the barrier could reach an s_barrier
instruction again, forever causing the barrier counts across
WFs to be out-of-sync.
This change refactors the barrier logic to:
1) Create a proper barrier slot implementation
2) Enforce (via a parameter) the number of barrier
slots on the CU.
3) Simplify the logic and cleanup the code (i.e., we
no longer iterate through the entire WF list each
time we check if a barrier is satisfied).
4) Fix deadlock issues.
Change-Id: If53955b54931886baaae322640a7b9da7a1595e0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29943
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The LDS is capable of handling out-of-bounds accesses,
that is, accesses that are outside the bounds of the
chunk allocated to a WG. Currently, the simulator asserts
on these accesses. This patch changes the behavior of the
LDS to return 0 for reads and dropping writes that are
out-of-bounds.
Change-Id: I5f467d0f52113e8565e1a3029e82fb89cc6f07ea
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29940
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Ubuntu 20.04 contains GCC verisons 9 and 10, which are not easily
obtainable (at least through APT) on Ubuntu 18.04. Therefore, a
Dockerfile for obtaining GCC versions in Ubuntu 20.04 has been added.
The orignal GCC version Dockerfile (Ubuntu 18.04) has been kept as
GCC versions 4.8, 5, and 6 are not obtainable, via APT, on Ubuntu
20.04. A complete migration to the 20.04 Dockerfile is not possible
until these earlier GCC versions are dropped.
The Docker images for GCC Versions 9 and 10 can be found here:
https://gcr.io/gem5-test/gcc-version-10https://gcr.io/gem5-test/gcc-version-9
The other Dockerfile directories have been renamed for consistency.
Change-Id: I569249331095ee62d1be5be479c7ba7da0077422
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30514
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
This script runs a series of compilations on gem5. The following
compilers are tested:
clang-9
clang-8
clang-7
clang-6
clang-5
clang-4
clang-3.9
gcc-9
gcc-8
gcc-7
gcc-6
gcc-5
gcc-4.8 (to be dropped soon:
https://gem5.atlassian.net/browse/GEM5-218)
They are tested by building the following build targets:
ARM
ARM_MESI_Three_Level
Garnet_standalone
GCN3_X86
MIPS
NULL_MESI_Two_Level
NULL_MOESI_CMP_directory
NULL_MOESI_CMP_token
NULL_MOESI_hammer
POWER
RISCV
SPARC
X86
X86_MOESI_AMD_BASE
For each, ".opt" and ".fast" compiler build settings are tested.
clang-9 and gcc-9 are tested against all targets with each build
setting. For the remaining compilers, a random build target is
chosen. After the script has run, the output of the tests can be
found in "compile-test-out".
Docker is required to run this script.
Change-Id: Id3bf4c89b9d424c87e9409930ee2aceaef72cb29
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30395
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The RubyPrefetcher uses makeNextStrideAddress() with
a negative stride to find prefetched address.
The type of this expression is:
uint64_t + uint32_t * int;
This gives wrong result due to implicit conversion.
Fix this with static cast and it works correctly:
uint64_t + int * int;
Change-Id: I36e17e00d5c66c3699fe1d5b287971225a162d04
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31314
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Motivation:
An AddressSizeFault on AArch32 can only happen during a table walk
since the register used as a base by LD/ST is always 32 bit wide.
On AArch64 on the other hand, addresses can be 64bit wide;
when MMU is off (no virtual memory) an invalid physical address
can be specified
Change-Id: Id3ef170e99202c6b0b511fa7205c754956861720
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31274
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This will still be technically possible with the right converters, but
this removes the tags, ignore file, and style checking hooks related to
mercurial. We no longer maintain a mercurial mirror of the main git
repository, and this support adds clutter and could diverge from the git
style hooks, etc, over time.
Change-Id: Icf4833c4f0fda51ea98989d1d741432ae3ddc6dd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31174
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This changeset drops fetches if there is no entry reserved in the
fetch buffer for that instruction. This can happen due to a fetch
attempted to be issued in the same cycle where a branch instruction
flushed the fetch buffer, while an ITLB or I-cache request is still
pending.
Change-Id: I3b80dbd71af27ccf790b543bd5c034bb9b02624a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29932
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Onur Kayıran <onur.kayiran@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>