derek/gem5

Author	SHA1	Message	Date
Xianwei Zhang	c2641eec89	arch-gcn3: add support of 64-bit SOPK instruction s_setreg_imm32_b32 is a 64-bit instruction, using a 32-bit literal constant. Related functions are added to support decoding the second dword. Change-Id: I290f8578f726885c137dbfac3773035f814e0a3a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29942 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com>	2020-07-16 20:37:22 +00:00
Matt Sinclair	3e84a8d710	arch-gcn3: ensure that atomics follow HSA conventions Add asserts to make sure atomics are following the HSA conventions that atomics should be word aligned (i.e., can't be byte aligned) and should not be misaligned such that a given lane's access spans multiple cache lines. Change-Id: Ia48758b9ed96764864234dc607f337e30e287d1c Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29941 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2020-07-16 20:37:22 +00:00
Alexandru Dutu	07fcbf16fc	arch-gcn3: Implementation of flat atomic swap instruction Change-Id: I9b9042899e65e8c9848b31c509eb2e3b13293e52 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29937 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2020-07-13 23:32:27 +00:00
Michael LeBeane	6747b127af	arch-gcn3: Fix VOP2 disassembly prints VOP2 prints VSRC1 register index as hex instead of decimal if the instruction contains a literal operand. This patch resets the format specifiers in the stream to print the register correctly. Change-Id: Icc7e6588b3c5af545be6590ce412460e72df253f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29936 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>	2020-07-13 19:48:12 +00:00
Michael LeBeane	ed7daa10aa	arch-gcn3, gpu-compute: Implement out-of-range accesses Certain buffer out-of-range memory accesses should be special cased and not generate memory accesses. This patch implements those special cases and suppresses lanes from accessing memory when the calculated address falls in an ISA-specified out-of-range condition. Change-Id: I8298f861c6b59587789853a01e503ba7d98cb13d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29935 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>	2020-07-13 19:48:00 +00:00
Michael LeBeane	f8e295922b	arch-gcn3: Fix writelane src0,src1 usage Src1 should only be used for lane select. The data should come from src0. Change-Id: Ibe960df2e56d351a3819b40194104d2972a5cd4c Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29933 Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2020-07-13 19:47:47 +00:00
Matt Sinclair	3846e90737	arch-gcn3: fix bits that SDWA selects This commit fixes a bug in 200f2408 where the SDWA support was selecting bits backwards. As part of this commit, to help resolve this problem in the future, I have added asserts in the helper functions in bitfield.hh to ensure that the number of bits aren't negative. Change-Id: I4b0ecb0e7c110600c0b5063101b75f9adcc512ac Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29931 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>	2020-07-13 16:19:47 +00:00
Michael LeBeane	22190c0165	arch-gcn3: Fix V_MAD_I32_I24 sign extension We are not properly sign extending the bits we hack off for V_MAD_I32_I24. This fixes rnn_fwdBwd 64 1 1 lstm pte assertion failure. Change-Id: I2516e5715227cbd822e6a62630674f64f7a109e0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29928 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-22 16:14:35 +00:00
Tony Gutierrez	ccee639904	arch-gcn3, gpu-compute: Fix issue when reading const operands Currently, when an instruction has an operand that reads a const value, it goes thru the same readMiscReg() api call as other misc registers (real HW registers, not constant values). There is an issue, however, when casting from the const values (which are 32b) to higher precision values, like 64b. This change creates a separate, templated function call to the GPU's ISA state that will return the correct type. Change-Id: I41965ebeeed20bb70e919fce5ad94d957b3af802 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29927 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-22 16:14:35 +00:00
Alexandru Dutu	8c3e9a19d5	arch-gcn3: Updating implementation of atomics This changeset is moving the access of the data operand from initiateAcc to the execute method of atomic instructions. Change-Id: I1debae302f0b13f79ed2b7a9ed2f6b07fcec5128 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29926 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-22 16:14:35 +00:00
Xianwei Zhang	d80f4a4004	arch-gcn3: Implement instruction v_div_fixup_f32 Instruction v_div_fixup_f32 was unimplemented. The implementation was added by mimicking v_div_fixup_f64. Change-Id: I9306b198f327e9fde3414aa1bb2bec20503b1efd Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29924 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-19 20:42:32 +00:00
Xianwei Zhang	fb7796933e	arch-gcn3: Implement instruction v_div_fmas_f32 Instruction v_div_fmas_f32 was unimplemented. The implementation was added by mimicking v_div_fmas_f64. Change-Id: I262820a7a66877d140eb99b538715c3cae4d1860 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29923 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-19 20:42:18 +00:00
Matt Sinclair	c1ea14de44	arch-gcn3: fix bug with SDWA support Instructions that use the SDWA field need to use the extra SRC0 register associated with the SDWA instruction instead of the "default" SRC0 register, since the default SRC0 register contains the SDWA information when SDWA is being used. This commit fixes 15de044c to take this into account. Additionally, this commit removes reads of the registers from the SDWA helper functions, since they overwrite any changes made to the destination register. Finally, this change modifies the instructions that use SDWA to simplify the flow through the execute() functions. Change-Id: I3bad83133808dfffc6a4c40bbd49c3d76599e669 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29922 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-19 20:41:59 +00:00
Matt Sinclair	8177fc4392	arch-gcn3: add support for unaligned accesses Previously, with HSAIL, we were guaranteed by the HSA specification that the GPU will never issue unaligned accesses. However, now that we are directly running GCN this is no longer true. Accordingly, this commit adds support for unaligned accesses. Moreover, to reduce the replication of nearly identical code for the different request types, I also added new helper functions that are called by all the different memory request producing instruction types in op_encodings.hh. Adding support for unaligned instructions requires changing the statusBitVector used to track the status of the memory requests for each lane from a bit per lane to an int per lane. This is necessary because an unaligned access may span multiple cache lines. In the worst case, each lane may span multiple cache lines. There are corresponding changes in the files that use the statusBitVector. Change-Id: I319bf2f0f644083e98ca546d2bfe68cf87a5f967 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29920 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-19 20:41:18 +00:00
Xianwei Zhang	fbcdf880ee	arch-gcn3: Implement instruction v_div_scale_f32 Instruction v_div_scale_f32 was unimplemented, the implementation was added by mimicking v_div_scale_f64. Change-Id: I89cdfd02ab01b5936de0e9f6c41e7f3fc4f10ae1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29919 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-19 20:40:58 +00:00
Xianwei Zhang	2c1e9c4e81	gpu-compute: enable flexible control of kernel boundary syncs Kernel end release was turned on for VIPER protocol, which is in fact write-through based and thus no need to have release operation. This changeset splits the option 'impl_kern_boundary_sync' into 'impl_kern_launch_acq' and 'impl_kern_end_rel', and turns off release on VIPER. Change-Id: I5490019b6765a25bd801cc78fb7445b90eb02a3d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29917 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-19 20:40:05 +00:00
Tony Gutierrez	9d51dec937	arch, gpu-compute: Remove HSAIL related files Change-Id: Iefba0a38d62da7598bbfe3fe6ff46454d35144b1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28410 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-17 02:53:47 +00:00
Tony Gutierrez	b8da9abba7	gpu-compute, mem-ruby, configs: Add GCN3 ISA support to GPU model Change-Id: Ibe46970f3ba25d62ca2ade5cbc2054ad746b2254 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29912 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-15 22:45:17 +00:00
Tony Gutierrez	94f15bd3f7	arch-gcn3: Add files for arch gcn3 (GPU machine ISA) Decoder: gpu_decoder.hh and decoder.cc: The decoder is defined in these files. The decoder is implemented as a lookup table of function pointers where each decode function will decode to a unique ISA instruction, or do some sub-decoding to infer the next decode function to call. The format for each OP encoding is defined in the header file. Registers: registers.[hh|cc] define the special registers and operand selector values, which are used to map operands to registers/special values. Many convenience functions are also provided to determine the source/type of an operand, for example vector vs. scalar, register operand vs. constant, etc. GPU ISA: Some special GPU ISA state is maintained in gpu_isa.hh and isa.cc. This class is used to hold some special registers and values that can be used as operands by ISA instructions. Eventually more ISA-specific state should be moved here, and out of the WF class. Vector Operands: The operands for GCN3 instructions are defined in operand.hh. This file defines both scalar and vector operands with GCN3 specific semantics. The vector operand class is designed around the generic vec_reg.hh that is already present in gem5. Instructions: The GCN3 instructions are defined and implemented throughout gpu_static_inst.[hh|cc], instructions.[hh|cc], op_encodings.[hh|cc], and inst_util.hh. GCN3 instructions all fall under one of the OP encoding types; for example scalar memory operands are of the type SMEM, vector ALU instructions can be VOP3, VOP2, etc. The base code common to all instructions of a certain OP encoding type is implemented in the OP encodings files, which includes operand information, disassembly methods, encoding type, etc. Each individual ISA instruction is implemented as a class object in instructions.[hh|cc] and is derived from one of the OP encoding types. The instructions.cc file is primarily for the execute() methods of each individual instruction, and the header file provides the class definition and a few instruction specific API calls. Note that these instruction classes were auto-generated but not using the gem5 ISA description language. A custom ISA description was used and that cannot be released publicly, therefore we are providing them already in C++. Change-Id: I14d2a02d6b87109f41341c8f50a69a2cca9f3d14 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28127 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-04-30 15:54:38 +00:00

19 Commits
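
Commit 22190c0165 in the log above describes a sign-extension bug in V_MAD_I32_I24: the upper 8 bits of each 24-bit source are dropped, and the remaining value must then be sign extended before the multiply. The following is a minimal sketch of that idea, not the code from the commit. The names signExtend24, madI32I24, and checkExamples are hypothetical, and the semantics D = S0[23:0] * S1[23:0] + S2 (low 32 bits kept) are an assumption about the instruction rather than something stated in the log.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not gem5's API: interpret the low 24 bits of a
// 32-bit register value as a signed integer.
constexpr int32_t signExtend24(uint32_t raw)
{
    // Drop bits 31..24; the result always fits in a non-negative int32_t.
    const int32_t low = static_cast<int32_t>(raw & 0x00FFFFFFu);
    // Bit 23 is the sign bit of a 24-bit value; when set, subtract 2^24.
    return (low & 0x00800000) ? low - 0x01000000 : low;
}

// Assumed semantics: D = S0[23:0] * S1[23:0] + S2, keeping the low 32 bits.
constexpr int32_t madI32I24(uint32_t s0, uint32_t s1, int32_t s2)
{
    // Widen to 64 bits so the 24x24-bit product cannot overflow.
    const int64_t wide =
        static_cast<int64_t>(signExtend24(s0)) * signExtend24(s1) + s2;
    // Truncate to 32 bits (two's-complement wraparound assumed).
    return static_cast<int32_t>(static_cast<uint32_t>(wide));
}

// Small usage check: 0x00FFFFFF is -1 as a 24-bit signed value.
inline void checkExamples()
{
    assert(signExtend24(0x00FFFFFFu) == -1);
    assert(madI32I24(0x00FFFFFFu, 2u, 5) == 3);  // (-1 * 2) + 5
}
```

Without the sign-extension step, 0x00FFFFFF would be treated as 16777215 instead of -1, which is the kind of error the commit message describes.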