Commit Graph

15608 Commits

Author SHA1 Message Date
Tony Gutierrez
4d737462c2 gpu-compute, arch-gcn3: Change how waitcnts are implemented
Use single counters per memory operation type and increment
them upon issue, not execute.

Change-Id: I6afc0b66b21882538ef90a14a57a3ab3cc7bd6f3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29973
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-17 16:36:23 +00:00
Tony Gutierrez
63c76448eb gpu-compute: Add pipeline stage interface classes
This change separates the pipeline stage interfaces
for the GPU's compute unit into their own classes
with a well-defined interface. This helps to create
a cleaner interface for users to extend the CU
pipeline's capabilities and also helps consolidate
all the pipeline communication code in one place
in the source.

Change-Id: I569d52bce84dc1b9fbf8f0f96d53a81a2b6773c6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29972
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-17 16:36:09 +00:00
Chow, Marcus
6655161037 arch-gcn3: Add case to op selector when operand is vcc_hi
Change-Id: Ib8846656e18aad04ccb8c9112bc629c69078fe36
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29971
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:35:44 +00:00
Alexandru Dutu
7d50d5d972 gpu-compute: No RF scheduling in case of SKIP or EMPTY
In case of flat memory instructions the status for the
LM pipe execution unit is set to SKIP or EMPTY, as the bus
between the VRF and the GM and LM pipe is shared. The
destination operands should not be scheduled for the LM pipe,
event if the wave is in the dispatch list. This can lead
to deadlock in the destination cache as DCEs are reused
and the slotsAvailableForBank count gets artificially
incremented.

Change-Id: I2230c53e3bc1032d2cccbe00fab62c99ab8de6cd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29970
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:34:59 +00:00
Tony Gutierrez
5f0378b8d0 gpu-compute: Use refs to CU in pipe stages/mem pipes
The pipe stages and memory pipes are changed to store
a reference to their parent CU as opposed to a pointer.
These objects will never change which CU they belong to,
and they are constructed by their parent CU.

Change-Id: Ie5476e1e2e124a024c2efebceb28cb3a9baa78c1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29969
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-17 16:34:36 +00:00
Michael LeBeane
f509fa735c arch-gcn3: Fix stride bug in buffer OOB detection logic
The out-of-range logic for buffer accesses is missing the top 4 bits of
const_stride when dealing with scratch buffers.  This can cause
perfectly valid scratch acceses to be suppressed when const_stride is
large.

Change-Id: I8f94d44c242fda26cf6dfb75db04fa3aca934b3e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29968
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-17 16:34:07 +00:00
Travis Boraten
4c1dc827bc arch-gcn3: Replace some instances of std::isnormal with std::fpclassify
Affected instructions: V_DIV_SCALE_F64, V_CMP_CLASS_F64,
V_CMPX_CLASS_F64 and their VOPC, VOP3, F32 variants.

These instances of std::isnormal were being used to check for
subnormal (denorms) values. std::isnormal is not specific enough.
It returns true for normal values but false for NaN, Inf, 0.0, and
subnormals. std::fpclassify returns macros for each category of
floating point numbers. Now we only catch subnormals.

Change-Id: I8d8f4452ff58de71e7c8e0b2b5e73467b532e196
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29967
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:33:26 +00:00
Travis Boraten
e1d10c3894 arch-gcn3: Fix VOP3 V_LDEXP_F64
Replaced !std::isnormal with std::fpclassify because std::isnormal
is not specific enough. !std::isnormal was incorrectly catching
NaN, Inf, 0.0, and subnormals (aka denormals), where as it was only
suppose to catch subnormals.

The return value and error handling spec of std::ldexp listed on
cppreference.com appears to match up in nearly all cases after
making these changes. If std::ldexp handled subnormals as described
in the GCN3 2016 guide, we could have used vdst[lane] = std::ldexp
and not need to check for any corner cases.

Change-Id: I4c77af77c3b7798f86d40442610cef1296a28441
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29966
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:32:56 +00:00
Michael LeBeane
83fe4754e7 gpu-compute: Fix Y-dimension ABI decode
We currently have a bug in decoding workitem ID from the kernel
descriptor with multiple dimensions.  The enable_vgpr_workitem_id bits
are currently seperated into x and y components, when they should be
treated as a single 2 bit value, where y is enabled when it is > 0,
and z is enabled when it is > 1.  The current setup allows a kernel
launch with vgprs reserved for the z dimension and not the y dimension,
which is incorrect.

Change-Id: Iee64b207feb95bcf064898d5db33b8f201e25323
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29965
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:32:56 +00:00
Travis Boraten
e4f7982e90 arch-gcn3: Fix roundNearestEven for V_RNDNE_F64 and V_RNDNE_F32
roundNearestEven is an inst_util function that RNDNE_F64 and F32
call, including both VOP1 and VOP3 formats. IEEE 754 spec says this
function should round inputs to the nearest integer but round ties
to the nearest even integer. Prior to this patch it was rounding all
inputs to nearest even, not just the ties. It was probably implemented
this way originally because the language in the ISA manual is ambiguous
although it provided the correct logic.

Fixed roundNearestEven to use the semantics originally described in
the GCN3 ISA manual.

Change-Id: I83ecb1d516fcf5bdf17e54ddf409b447a129a9a7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29964
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:32:56 +00:00
Tony Gutierrez
f64ff89212 gpu-compute: Don't track vector store insts in CU's headTailMap
This change fixes a memory leak due to live GPUDynInstPtr references
to vector store insts being stored in the CU's headTailMap and never
released.

This happened because store insts are not supposed to have their
head-tail latencies tracked by the headTailMap; instead they use
timing information from the GPUCoalescer. When updating the
headTailLatency stat via the headTailMap, only loads were considered
and removed from the headTailMap, however when inserting into the
headTailMap loads and stores were considered, thus leading to the
memory leak.

This change fixes the issue by only adding loads to the headTailMap.

Change-Id: I8a8f5b79f55e00481ae5e82519a9ed627a7ecbd1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29963
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:32:06 +00:00
Matt Sinclair
a23ef78c91 arch-gcn3: add all s_buffer_load_dword instructions
Adds the other s_buffer_load_dword* instruction implementations to
f134a84.

Change-Id: I8d97527278900dc68c32463ea1824409ccd04e1d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29962
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:31:39 +00:00
Matthew Poremba
39f305b329 arch-gcn3: Add memcpy condition when writing EXEC_LO
Some compilers emit an error on the operand template class when writing
exec mask. Add a condition to explicitly set memcpy size argument to
32b or 64b based on the number of dwords.

Change-Id: I49b0e4a1680283e772d0a5a8efd687b31d4f1624
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29961
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-17 16:31:10 +00:00
Tony Gutierrez
550f0203aa arch-gcn3: Remove invalid assert when reading EXEC_LO
This assert assumed all reads to EXEC_LO would be
64b, that is, we would always read the entire EXEC
mask. This is invalid as some kernels read only
the low 32b of EXEC.

The write to EXEC_LO is also updated to handle 32b
writes.

Change-Id: Ifeb167578515bf112b1eab70bbf2201a5e936358
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29960
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:30:41 +00:00
Michael LeBeane
1d816250f8 gpu_compute: Support loading BLIT kernels
The BLIT kernels used to implement DMA through the shaders don't fill
out all of the standard fields in an amd_kernel_code_t object.  This
patch modifies the code object parsing logic to support these new
kernels.

BLIT kernels are used in APUs when using ROCm memcopies for certain size
buffers, and are used for dGPUs when the SDMA engines are disabled.

Change-Id: Id4e667474d05e311097dbec443def07dfad14a79
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29959
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:13:59 +00:00
Tony Gutierrez
72e9324ef0 arch-gcn3: Implement ds_swizzle
Change-Id: I7d188388afa16932217ae207368666a724207c52
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29958
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:13:43 +00:00
Tony Gutierrez
513e75d99a arch-gcn3: Implement s_buffer_load_dwordx16
Change-Id: I25382dcae9bb55eaf035385fa925157f25d39c20
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29957
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-17 16:13:17 +00:00
Tony Gutierrez
0c3b84fd33 arch-gcn3: Fixup DIV instructions
Adds support to handle the special cases
for GCN3 DIV instructions.

Change-Id: I18f91870e802407c93831f313ce76be053bc4230
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29956
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-17 16:12:58 +00:00
Sandipan Das
a937089d6f systemc: Fix builds on Power systems
Based on the 64-bit ELF ABI for Power systems (ppc64 and
ppc64le), the data types int64_t and uint64_t are typedefs
of long and unsigned long respectively. If the SystemC
data types int64 and uint64 point to these, several errors
are observed while building the simulator on Power systems
due to ambiguity between the types when overloading some
operators and functions.

E.g.
  ...
  build/POWER/systemc/ext/channel/../dt/bit/sc_bit.hh:114:17: error: 'static bool sc_dt::sc_bit::to_value(sc_dt::int64)' cannot be overloaded with 'static bool sc_dt::sc_bit::to_value(long int)'
    114 |     static bool to_value(tp i) { return to_value((int)i); }
        |                 ^~~~~~~~
  ...
  build/POWER/systemc/ext/channel/../dt/bit/sc_bit.hh:114:17: note: previous declaration 'static bool sc_dt::sc_bit::to_value(long int)'
    114 |     static bool to_value(tp i) { return to_value((int)i); }
        |                 ^~~~~~~~
  ...

This adds a minor change to a SystemC datatype header to
ensure that the simulator can be built on Power systems.

Change-Id: Icd8bb38134bf98768cc38f9856d7d11a01ebaf21
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31414
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-17 02:38:51 +00:00
Gabe Black
536ea331b0 sim: Add a ProxyPtr test.
Change-Id: If71cc374030a5ef0dab62d351bc83960ff509af7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29401
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-17 01:01:13 +00:00
Chow, Marcus
b267350ee5 arch-gcn3: fixed scale,fixup,fmas f64 ops
Change-Id: Ie13794554db8a958fda1f7103ec18058fda2e66d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29955
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Tony Gutierrez
5dc5d23b79 arch-gcn3: Fix s_getpc operand information
s_getpc was currently reporting only a single operand,
and was only considering the SSRC operand. However,
this instruction' source is implicitly the PC.
Because its destination register was never tracked for
dependence checking purposes, dependence violations
are possible.

Change-Id: Ia80b8b3e24d5885f646a9ee41212a2cb35b9ffe6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29954
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Chow, Marcus
a0cfd8da6b arch-gcn3: Add handling for Inf/overflow in CVT insts
Change-Id: I0fddffdeaebd9f45fe89f44d536f80a43de63ff5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29953
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Tony Gutierrez
5c3b02de09 arch-gcn3: Add ds_bpermute and ds_permute insts
The implementation of these insts provided by this
change is based on the description provided here:

https://gpuopen.com/amd-gcn-assembly-cross-lane-operations/

Change-Id: Id63b6c34c9fdc6e0dbd445d859e7b209023f2874
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29952
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Alexandru Dutu
3aa633cc3f arch-gcn3: ds_read_u8 and ds_read_u16 fix
This changeset zero extends the destination register
for ds_read_u8 and ds_read_u16 instructions.

Change-Id: I193adadd68adf2572b59743b1504f18ad225f506
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29951
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Matt Sinclair
13079629a1 arch-gcn3: convert vALU instruction counters from 32 to 64-bit
The vALU instruction counters were previously 32 bits, but for some
workloads this value wraps around and triggers an assert failure
because the max vALU operations are reached.  To resolve this, this
commit increases the counter size to 64 bits.

Change-Id: I90ed4514669485cfea7ccc37ba9d69665277bccb
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29950
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Xianwei Zhang
fff185993a arch-gcn3: implement instruction s_setreg_b32
Instruction s_setreg_b32 was unimplemented, but is used by hipified
rodinia 'srad'. The instruction sets values of hardware internal
registers. If the instruction is writing into MODE to control
single-precision FP round and denorm modes, a simple warn will be
printed; for all other cases (non-MODE hw register or other
precisions), panic will happen.

Change-Id: Idb1cd5f60548a146bc980f1a27faff30259e74ce
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29949
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com>
2020-07-16 20:37:22 +00:00
Matt Sinclair
1836d58b36 arch-gcn3: add support for v_mbcnt_hi and v_mbcnt_lo
Change-Id: I1c70fe693c904f1abd7d5a2b99220c74a075eae5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29948
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Matt Sinclair
c7b6e7c613 arch-gcn3: fix bug with DPP support
Instructions that use the DPP field need to use the extra SRC0
register associated with the DPP instruction instead of the
"default" SRC0 register, since the default SRC0 register contains
the DPP information when DPP is being used.  This commit fixes
2735c3bb88 to take this into account.  Additionally, this commit
removes write of the src register from the DPP helper functions,
to avoid overwriting any changes made to the destination register.
Finally, this change modifies the instructions that use DPP to
simplify the flow through the execute() functions.

Change-Id: I80fd0af1f131f287f18ff73b3c1c9122d8c60823
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29947
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Matt Sinclair
ed3135ea6a arch-gcn3: implement multi-dword buffer loads and stores
Add support for all multi-dword buffer loads and stores:
buffer_load_dword x2, x3, and x4 and buffer_store_dword x2, x3, and x4

Change-Id: I4017b6b4f625fc92002ce8ade695ae29700fa55e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29946
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Tony Gutierrez
0c5d671ea1 gpu-compute: Init CU object for pipe stages in their ctors
This change updates the constructors of the CU's pipe
stages/memory pipelines to accept a pointer to their
parent CU. Because the CU creates these objects, and
can pass a pointer to itself to these object via their
constructors, this is the safer way to initalize these
classes.

Change-Id: I0b3732ce7c03781ee15332dac7a21c097ad387a4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29945
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-16 20:37:22 +00:00
Tony Gutierrez
ea52df816d arch-gcn3: Add support for rd/wr EXEC_HI to operand class
Change-Id: Ib22dd604f88ea56801964235082835002deffca1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29944
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Tony Gutierrez
af621cd6e6 gpu-compute, arch-gcn3: refactor barriers
Barriers were not modeled properly. Firstly, barriers were
allocated to each WG that was launched, which is not
correct, and the CU would provide an infinite number
of barrier slots. There are a limited number of barrier slots
per CU in reality. In addition, the CU will not allocate
barrier slots to WGs with a single WF (nothing to sync if
only one WF).

Beyond modeling problems, there also the issue of deadlock.
The barrier could deadlock because not all WFs are freed
from the barrier once it has been satisfied. Instead, we
relied on the scoreboard stage to release them lazily,
one-by-one.

Under this implementation the scoreboard may not fully release
all WFs participating in a barrier; this happens because the
first WF to be freed from the barrier could reach an s_barrier
instruction again, forever causing the barrier counts across
WFs to be out-of-sync.

This change refactors the barrier logic to:

1) Create a proper barrier slot implementation

2) Enforce (via a parameter) the number of barrier
   slots on the CU.

3) Simplify the logic and cleanup the code (i.e., we
   no longer iterate through the entire WF list each
   time we check if a barrier is satisfied).

4) Fix deadlock issues.

Change-Id: If53955b54931886baaae322640a7b9da7a1595e0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29943
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-16 20:37:22 +00:00
Xianwei Zhang
c2641eec89 arch-gcn3: add support of 64-bit SOPK instruction
s_setreg_imm32_b32 is a 64-bit instruction, using a 32-bit literal
constant. Related functions are added to support decoding the second
dword.

Change-Id: I290f8578f726885c137dbfac3773035f814e0a3a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29942
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com>
2020-07-16 20:37:22 +00:00
Matt Sinclair
3e84a8d710 arch-gcn3: ensure that atomics follow HSA conventions
Add asserts to make sure atomics are following the HSA conventions
that atomics should be word aligned (i.e., can't be byte aligned)
and should not be misaligned such that a given lane's access
spans multiple cache lines.

Change-Id: Ia48758b9ed96764864234dc607f337e30e287d1c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29941
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Tony Gutierrez
701f026ba5 gpu-compute: Fix LDS out-of-bounds behavior
The LDS is capable of handling out-of-bounds accesses,
that is, accesses that are outside the bounds of the
chunk allocated to a WG. Currently, the simulator asserts
on these accesses. This patch changes the behavior of the
LDS to return 0 for reads and dropping writes that are
out-of-bounds.

Change-Id: I5f467d0f52113e8565e1a3029e82fb89cc6f07ea
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29940
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-16 20:37:22 +00:00
Bobby R. Bruce
cc791d51a2 base,scons: Fixed stats/hdf5.cc CXXFlags to -Wno-deprecated
`Wno-deprecated-copy` was added to disable a warning in hdf5.cc:
https://gem5-review.googlesource.com/c/public/gem5/+/26325.

This works with GCC but does not work with clang. Clang returns
`error: unknown warning option '-Wno-deprecated-copy'; did you mean
'-Wno-deprecated'? [-Werror,-Wunknown-warning-option]` when this flag
is enabled. This flag has therefore been changed to `Wno-deprecated`.
This works in both GCC and Clang.

Change-Id: I38dd58f3007975ccb60b2eec936c3b200b3df3ca
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31216
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-16 02:44:26 +00:00
Bobby R. Bruce
7deef45046 arch-arm: Added missing break to miscregs.cc
Later GCC compilers >=9 fail with a `this statement may fall through`
error due to this missing break.

Change-Id: I44b3386930a0b71b842a3a9b4837e4d6ad588f9d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31215
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-16 02:44:26 +00:00
Bobby R. Bruce
191212a6dc sim: Added M5_VAR_USED to unused cpu var
The `BaseCPU cpu` variable unused when compiling gem5.fast. This
causes the compilation to fail. Adding the M5_VAR_USED resolves this
issue.

Change-Id: I62588563e9cde384e30755742d6bc754e819d7f4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31214
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-16 02:44:26 +00:00
Bobby R. Bruce
5052aae04f util: Added cloudbuild_create_images.yaml
This cloudbuild file is used to build and upload our Docker images
to our Google Cloud infrastructure. To run:

```
gcloud builds submit --config \
util/cloudbuild/cloudbuild_create_images.yaml
```

Change-Id: I812584284a3cf79aac244a3b04a0f316f4281c49
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30516
Reviewed-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-16 02:44:26 +00:00
Bobby R. Bruce
e416e6b1f1 util,tests: Added gcc version 10 to compiler tests
In addition, gcc version 9 has been added, which was previously
served by "ubuntu-20.04_all-dependencies".

Change-Id: I57aeac2aa75b7751f0d4010efee7780e23d447d4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30515
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-16 02:44:26 +00:00
Bobby R. Bruce
f7a63fbd09 util: Added Dockerfile for GCC versions on ubuntu 20.04
Ubuntu 20.04 contains GCC verisons 9 and 10, which are not easily
obtainable (at least through APT) on Ubuntu 18.04. Therefore, a
Dockerfile for obtaining GCC versions in Ubuntu 20.04 has been added.
The orignal GCC version Dockerfile (Ubuntu 18.04) has been kept as
GCC versions 4.8, 5, and 6 are not obtainable, via APT, on Ubuntu
20.04. A complete migration to the 20.04 Dockerfile is not possible
until these earlier GCC versions are dropped.

The Docker images for GCC Versions 9 and 10 can be found here:
https://gcr.io/gem5-test/gcc-version-10
https://gcr.io/gem5-test/gcc-version-9

The other Dockerfile directories have been renamed for consistency.

Change-Id: I569249331095ee62d1be5be479c7ba7da0077422
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30514
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-16 02:44:26 +00:00
Bobby R. Bruce
91350dc4d5 util,tests: Added compiler test script
This script runs a series of compilations on gem5. The following
compilers are tested:

clang-9
clang-8
clang-7
clang-6
clang-5
clang-4
clang-3.9
gcc-9
gcc-8
gcc-7
gcc-6
gcc-5
gcc-4.8 (to be dropped soon:
         https://gem5.atlassian.net/browse/GEM5-218)

They are tested by building the following build targets:

ARM
ARM_MESI_Three_Level
Garnet_standalone
GCN3_X86
MIPS
NULL_MESI_Two_Level
NULL_MOESI_CMP_directory
NULL_MOESI_CMP_token
NULL_MOESI_hammer
POWER
RISCV
SPARC
X86
X86_MOESI_AMD_BASE

For each, ".opt" and ".fast" compiler build settings are tested.

clang-9 and gcc-9 are tested against all targets with each build
setting. For the remaining compilers, a random build target is
chosen. After the script has run, the output of the tests can be
found in "compile-test-out".

Docker is required to run this script.

Change-Id: Id3bf4c89b9d424c87e9409930ee2aceaef72cb29
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30395
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-15 20:49:57 +00:00
Tony Gutierrez
a408b1ada7 mem-ruby: Add support for MemSync reqs in VIPER
Change-Id: Ib129e82be5348c641a8ae18093324bcedfb38abe
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29939
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-15 18:14:41 +00:00
seanzw
75257c7a42 mem-ruby: Fix type casting in makeNextStrideAddress
The RubyPrefetcher uses makeNextStrideAddress() with
a negative stride to find prefetched address.
The type of this expression is:
uint64_t + uint32_t * int;
This gives wrong result due to implicit conversion.
Fix this with static cast and it works correctly:
uint64_t + int * int;

Change-Id: I36e17e00d5c66c3699fe1d5b287971225a162d04
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31314
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-15 17:38:12 +00:00
Giacomo Travaglini
3ce7333a36 arch-arm: AddressSize check on translateMmuOff for AArch64 only
Motivation:
An AddressSizeFault on AArch32 can only happen during a table walk
since the register used as a base by LD/ST is always 32 bit wide.
On AArch64 on the other hand, addresses can be 64bit wide;
when MMU is off (no virtual memory) an invalid physical address
can be specified

Change-Id: Id3ef170e99202c6b0b511fa7205c754956861720
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31274
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-15 13:14:56 +00:00
Jason Lowe-Power
981bdda174 misc: Add a code of conduct
This file codifies our current unofficial policies. This doesn't change
the community in any way except for clarifying our policies.

This was modeled off of https://www.contributor-covenant.org/.

Change-Id: Ib976636b490bbe4d46bf79260e6a345b46c02e2c
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30954
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-14 18:41:37 +00:00
Xianwei Zhang
024f978cff gpu-compute: enable kernel-end WB functionality
Change-Id: Ib17e1d700586d1aa04d408e7b924270f0de82efe
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29938
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com>
2020-07-13 23:32:37 +00:00
Alexandru Dutu
07fcbf16fc arch-gcn3: Implementation of flat atomic swap instruction
Change-Id: I9b9042899e65e8c9848b31c509eb2e3b13293e52
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29937
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2020-07-13 23:32:27 +00:00
Gabe Black
1427fdb455 misc: Remove support for checking out as a mercurial repo.
This will still be technically possible with the right converters, but
this removes the tags, ignore file, and style checking hooks related to
mercurial. We no longer maintain a mercurial mirror of the main git
repository, and this support adds clutter and could diverge from the git
style hooks, etc, over time.

Change-Id: Icf4833c4f0fda51ea98989d1d741432ae3ddc6dd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31174
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-07-13 22:25:41 +00:00