derek/gem5

Author	SHA1	Message	Date
Matthew Poremba	03083ba5e3	arch-vega: Implement ds_write2st64_b64 Write two qwords at offsets multiplied by 8 * 64 bytes. Change-Id: I0d0e05f3e848c2fd02d32095e32b7f023bd8803b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67078 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2023-01-05 23:12:10 +00:00
Matthew Poremba	450bc254bd	arch-vega: Read one dword for SGPR base global insts Global instructions in Vega can either use a VGPR base address plus instruction offset or SGPR base address plus VGPR offset plus instruction offset. Currently the VGPR address/offset is always read as two dwords. This causes problems if the VGPR number is the last VGPR allocated to a wavefront since the second dword would be beyond the allocation and trip an assert. This changeset sets the operand size of the VGPR operand to one dword when SGPR base is used and two dwords otherwise so initDynOperandInfo does not assert. It also moves the read of the VGPR into the calcAddr method so that the correct ConstVecOperandU## is used to prevent another assertion failure when reading from the register file. These two changes are made to all flat instructions, as global instructions are a subsegment of flat instructions. Change-Id: I79030771aa6deec05ffa5853ca2d8b68943ee0a0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67077 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-01-05 23:12:10 +00:00
Matthew Poremba	3bfa220e4e	arch-vega: Implement ds_read_i8 Read one byte with sign extended from LDS. Change-Id: I9cb9b4033c6f834241cba944bc7e6a7ebc5401be Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67076 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-01-05 23:12:10 +00:00
Matthew Poremba	b83457df0b	arch-vega: Implement ds_add_u64 This instruction does an atomic add of an unsigned 64-bit data with a VGPR and value in LDS atomically without return. Change-Id: I6a7d6713b256607c4e69ddbdef5c83172493c077 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67075 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-01-05 23:12:10 +00:00
Matthew Poremba	892e3057f7	arch-vega: Implement ds_add_f32 atomic This instruction does an atomic add of a 32-bit float with a VGPR and value in LDS atomically without return. Change-Id: Id4f23a1ab587a23edfd1d88ede1cbcc5bdedc0cb Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67074 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-01-05 23:12:10 +00:00
Matthew Poremba	022a48f9f6	arch-vega: Implement ds_add_u32 atomic This instruction does an atomic add of unsigned 32-bit data with a VGPR and value in LDS atomically, without return. Change-Id: I87579a94f6200a9a066f8f7390e57fb5fb6eff8e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67072 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-01-03 22:53:30 +00:00
Matthew Poremba	e392603d81	arch-vega: Add missing operand size for ds_write2st64_b64 This instruction takes three operands (address, and two datas) but there were only operand sizes for two operands tripping assert in default case. Change-Id: I3f505b6432aee5f3f265acac46b83c0c7daff3e7 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67071 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-01-03 22:53:30 +00:00
Matthew Poremba	6449633827	arch-vega: Add DPP support for V_AND_B32 A DPP variant of V_AND_B32 was found in rocPRIM. With this changeset the unit tests for rocPRIM scan_inclusive are passing. Change-Id: I5a65f2cf6b56ac13609b191e3b3dfeb55e630942 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66753 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2023-01-03 22:53:30 +00:00
Matthew Poremba	bbdebc25da	arch-vega: Fix several issues with DPP DPP processing has several issues which are fixed in this changeset: 1) Incorrect comment is updated 2) newLane calculation for shift/rotate instructions is corrected 3) A copy of original data is made so that a copy of a copy is not made 4) Reset all booleans (OOB, zeroSrc, laneDisabled) after each lane iteration The shift, rotate, and broadcast variants were tested by implementing them in assembly and running on silicon. Change-Id: If86fbb26c87eaca4ef0587fd846978115858b168 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66752 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-01-03 22:53:30 +00:00
Matthew Poremba	f99a3c1f96	arch-vega: Fix signed BFE instructions The bitfield extract instructions come in unsigned and signed variants. The documentation on this is not correct, however the GCN3 documentation gives some clues. The instruction should extract an N-bit integer where N is defined in a source operand starting at some bit also defined by a source operand. For signed variants of this instruction, the N-bit integer should be sign extended but is currently not. This changeset does sign extension using the runtime value of N by ORing the upper bits with ones if the most significant bit is one. This was verified by writing these instructions in assembly and running on a real GPU. Changes are made to v_bfe_i32, s_bfe_i32, and s_bfe_i64. Change-Id: Ia192f5940200c6de48867b02f709a7f1b2daa974 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66751 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-01-03 22:53:30 +00:00
Matthew Poremba	8693d725e2	arch-vega: Fix SOPK instruction sign extends See: https://gem5-review.googlesource.com/c/public/gem5/+/37495 Same patch but for vega. This fixes issues with lulesh and probably rodinia - heartwall as well in fullsystem. Change-Id: I3af36bb9b60d32dc96cc3b439bb1167be1b0945d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65432 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-11-09 15:47:50 +00:00
Matthew Poremba	6a4a12ebbd	arch-vega: Improve non-native page size support Vega allows for any integer multiple of 4kB pages. However, the current implementation is designed primarily for 4kB pages. In order to support variable page sizes, the physical address calculation needs to be updated to add the virtual page offset to the base physical address rather than bitwise-OR. Bitwise-OR assumes physical pages are aligned to the page size which is generally not the case for very large pages (1GB+). This changeset changes all of the physical address computations to add the virtual offset to the physical page address. This fixes many GPUFS applications which use larger pages. The support was tested by hipMalloc'ing ~5GB to induce a large page being created. The test application now passes verification with this change. Change-Id: Ic8d1475e001def443f3e4ab609449bca0c40b638 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64751 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-10-31 14:30:13 +00:00
Matthew Poremba	e48285c244	arch-vega: Implement PDE2 and PDE1 as PTE Page directory entries (PDEs) can be interpreted as leaf node page table entries (PTEs) if the "p" bit is set. This is used for flexible page sizes in Vega. Currently there is only support for PDE level 0 entries which can be interpreted as 2MB pages. This changeset adds support for PDE1 and PDE2 which can be used to represent 1GB and 512GB pages. PDE1-as-PTE entries can be tested and were verified on applications by allocating >2GB of data. PDE2 is untested due to being too large for simulation, but the implementation is similar to PDE0 and PDE1. Change-Id: I801cbb5ec79110d57d2db760cc689c2e5778f9bb Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64451 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2022-10-13 20:17:00 +00:00
Matthew Poremba	925b1b5c8e	arch-vega: Implement V_XAD_U32 instruction Used in rodinia:heartwall. This instruction is new in vega and does not exist in GCN3. Change-Id: I6127290d1c85688a7f82e149e97762ca55e05fc6 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63972 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-10-06 16:42:40 +00:00
Matthew Poremba	b919d9c5c9	arch-vega: Improve disasm for GLOBAL insts with scalar offset The previous print statement was not clear that a scalar offset was being used when printing disassembly, which made it slightly more difficult to track down bugs related to this (relatively) rare usage of global load/store instructions. This change improves the disassembly to closer match the output of hipcc's assembly code output. Change-Id: I8514aedacb5b1db93d0586c408c4cf1ce77a7db3 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63175 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2022-09-09 04:13:49 +00:00
Matthew Poremba	f91abb9770	arch-vega: Allow unaligned large host pages The virtual and physical address for device memory are typically aligned to the page size. On the host (x86), however, the physical address may not be aligned to page size for large page sizes when mixed with 4kB pages. As a result, the physical address calculation must add, rather than bitwise-OR, the virtual page offset to the physical page number. The virtual page offset on the GPU continues to use the variable page bytes for masking and shifting. Change-Id: I6563a1eb43d9b59577d32268b8645a7436304bcb Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63034 Maintainer: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2022-09-09 04:13:49 +00:00
Alexandru Dutu	241023329d	arch-vega: DS_OR_B32 does not return data The DS_OR_B32 instruction should not return data. This changeset updates its implementation. Change-Id: Ib5fccdbb69cd1ad2639bbf54824dc363a3fbf599 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/62656 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2022-09-01 15:16:37 +00:00
Gabe Black	f4209bbdee	misc: Remove lingering uses of TheISA::. Change-Id: Ie55e0d79867fbc8f75a993fb456a58c84de5def4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/62196 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>	2022-08-20 07:30:16 +00:00
Gabe Black	2da4a2cdca	cpu: Stop including arch/isa.hh. This header file is no longer needed. Also fix some places where the isa header file was being transitively included. Change-Id: Ib9a9d7db0c9808b29d7614bbd68e2052ea345e9f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51239 Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-08-20 01:17:06 +00:00
Alexandru Dutu	c6b38909e1	gpu-compute: Adding support for LDS atomics This changeset is adding support for LDS atomics and implementing DS_OR_B32 instruction. Change-Id: I84c5cf6ce0e9494726dc7299f360551cd2a485f5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61791 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-08-19 16:44:31 +00:00
Bobby R. Bruce	787204c92d	python: Apply Black formatter to Python files The command executed was `black src configs tests util`. Change-Id: I8dfaa6ab04658fea37618127d6ac19270028d771 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47024 Maintainer: Bobby Bruce <bbruce@ucdavis.edu> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-08-03 09:10:41 +00:00
Matthew Poremba	f65f5a8981	gpu-compute,arch-vega: Overhaul HWRegs, setreg, getreg These instructions are supposed to be read/writing special shader hardware registers. Currently they are getting/setting to an SGPR. This results in getting incorrect registers at best and clobbering an SGPR being used by an application at worst. Furthermore, some registers need to be set in the shader and the application will never (can never) set them. This patch overhauls the getreg/setreg instructions to use different storage in the shader. The values will be updated either via setreg from an application (e.g., mode register) or set by a PM4 MAP_PROCESS. Change-Id: Ie5e5d552bd04dc47f5b35b5ee40a569ae345abac Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61655 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2022-07-28 14:10:33 +00:00
Matthew Poremba	5c7514c81c	arch-vega: Fix S_GETREG_B32 masking/shifting Here the mask should not be inverted. We also need to shift by the offset to remove the padding as the consumer of the value expects the offset to be removed. This can be easily tested by running a GPU kernel with __shared__ variables. This will generate the following assembly: s_getreg_b32 s6, hwreg(HW_REG_SH_MEM_BASES, 16, 16) The current implementation returns the lower 16 bits (private memory aperture) while the correct behavior is the upper 16 bits (shared/LDS memory aperture). Change-Id: Iea8f0adceeadb24cdcf46ef4183fcaa8262ab9e7 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61654 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2022-07-28 14:10:33 +00:00
Matthew Poremba	618d16d6fc	arch-vega: Fix V_READFIRSTLANE_B32 This instruction appears to be the only VOP1 instruction that has a scalar destination using VDST as the destination register number. However, since VDST is only 8 bits it cannot encode all possible registers. Therefore, use the opcode to determine if the destination is a scalar or vector destination. This issue manifests as a VGPR dest being out of range for a kernel where the number of SGPRs is more than the number of VGPRs and the intended SGPR dest is larger than the count of VGPRs Change-Id: I95a7de1ddb97f7171f48331fed36aef776fa0cb4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61649 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-07-28 14:10:33 +00:00
Matthew Poremba	af9ecf7920	arch-vega,arch-gcn3: Add support for VCC_HI as scalar source Currently there is only support for VCC_LO as a scalar source. Add support for VCC_HI as well. The op selector symbol is also changed to be vcc_hi/vcc_lo as it is in disassembly from LLVM. Change-Id: I19ea8e23873049c33ffe2eb4ec8504a18f371c0e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61329 Reviewed-by: Matt Sinclair <mattdsinclair.wisc@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-07-18 17:27:50 +00:00
Matthew Poremba	3a73aa3ac1	arch-vega: Implement new VOP2 using VOP3 insts Vega adds three new VOP2 instructions that may use VOP3 encoding that are not part of the GCN3 ISA: v_add_u32, v_sub_u32, v_subrev_u32. This changeset implements those three new instructions to fix errors related to "invalid encoding" when those instructions are seen. Tested using srad from Rodinia 3.0 HIP port which compiles a v_add_u32 instruction with VOP3 encoding. Change-Id: I409a9f72f5c37895c3a0ab7ceb14a4dd121874a4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61330 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2022-07-15 14:55:33 +00:00
Matthew Poremba	40077055cf	arch-vega: Fix disassembly for two dword VOPC Calling opSelectorToRegSym in the disassembly for VOPC when there is a second dword (SDWA, DPP, or Literal) causes a panic as those registers do not have a string symbol. This is fixed by checking for a second dword before printing similar to how VOP1, VOP2, SOP1, etc. function. Change-Id: I97b33e1e45abcf3ff1d0bc5754773b4eee961a98 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61269 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-07-12 19:05:32 +00:00
Matt Sinclair	590719a383	arch-vega: explain when op encoder ignores src reg Previously `b40b361bee` added support for the Vega operand encoder. As part of this, it made sure to check for the S_GETPC_B64 instruction, which appears to be the only instruction in the Vega ISA that does not use the source register. However, at the time the commit used magic numbers without comment, which can be difficult for users to interpret. To resolve this, this commit adds a comment to explain where the magic numbers come from (Table 58 in the Vega ISA manual). Change-Id: Ic5007b510e0175558d21ede8eb6db273113187b2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/60650 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matthew Poremba <matthew.poremba@amd.com>	2022-06-22 17:14:35 +00:00
Matt Sinclair	00008b725c	arch-vega: some Vega instructions don't use dest reg Some of the Vega scalar instructions (S_SETPC_B64, S_RFE_B64, S_CBRANCH_JOIN, and S_SET_GPR_IDX_IDX) do not use the SDST scalar destination register. However, Vega's operand encoding function for the SOP1 instruction type's class assumed all instructions used the destination register, which results in an assert failure for these instructions. To resolve this, this commit updates the Vega SOP1 operand encoder to ignore the destination register for these specific instructions. Change-Id: I2f0d830f6264fc7f47c0694a2fd5da5d33d2ea0b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/60649 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matthew Poremba <matthew.poremba@amd.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>	2022-06-22 17:14:35 +00:00
Michael Boyer	81058189af	arch-vega,arch-gcn3: Implement S_MEMTIME instruction Change-Id: I3e286eb6ff8af4097ad03d4066be79f73d938cea Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53603 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-06-21 20:19:46 +00:00
Charles Jamieson	7170c365be	arch-vega: implement S_GETREG_B32 instruction This commit adds support for the Vega GPU ISA's S_GETREG_B32 instruction. This work was done by Charles Jamieson but I am committing. Change-Id: Ic2e24f667ed1aec7b8b1404a06e17e7ffb192fba Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/60589 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-06-17 02:24:05 +00:00
Gabe Black	84ae0afa59	arch-x86: Put misc reg indexes into a name space. Also make them match the style guide. Change-Id: I845f141f85d4499a5acf56c2161240764906a232 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49758 Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Gabe Black <gabe.black@gmail.com>	2022-06-08 07:09:05 +00:00
Gabe Black	88143c940b	arch-x86: Convert segment indices to fit the style guide. Capitalize only their first letter, and use a namespace to namespace them instead of a SEGMENT_REG_ prefix. Change-Id: I69778c8d052ad6cc0ffd9e74dd1c643e9d28048d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49756 Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Gabe Black <gabe.black@gmail.com> Reviewed-by: Gabe Black <gabe.black@gmail.com>	2022-06-08 07:08:37 +00:00
Matthew Poremba	b64467025d	arch-vega: Implement SOP2 S_MUL_HI instructions Two new 32-bit signed and unsigned variants of S_MUL were added in gfx900 which operate similarly to S_MUL except they shift the product by 32 bits after multiplication. Tested with Histogram HIP-Sample and b+tree in rodinia 3.0 HIP port. Change-Id: I1bed32b17ccda7aa47f3b59528eb3304912d3610 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58473 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-11 17:06:43 +00:00
Matthew Poremba	e3f65393fd	dev-amdgpu,arch-vega: Implement TLB invalidation logic Add logic to collect pointers to all GPU TLBs in full system. Implement the invalidate TLBs PM4 packet. The invalidate is done functionally since there is really no benefit to simulate it with timing and there is no support in the TLB to do so. This allows applications with much larger data sets which may reuse device memory pages to work in gem5 without possibly crashing due to a stale translation being leftover in the TLB. Change-Id: Ia30cce02154d482d8f75b2280409abb8f8375c24 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58470 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-08 17:12:32 +00:00
Gabe Black	e6c0ba97db	scons: Put all config variables in an env['CONF'] sub-dict. This makes what are configuration and what are internal SCons variables explicit and separate, and makes it unnecessary to call out what variables to export to C++. These variables will also be plumbed into and out of kconfiglib in later changes. Change-Id: Iaf5e098d7404af06285c421dbdf8ef4171b3f001 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56892 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-28 20:31:21 +00:00
Matthew Poremba	dd90417211	arch-vega: Bypass Ruby for functional page walks Currently if a Ruby functional access fails to find an address in the caches, it gives up. For functional page table walks we need to be able to go all the way to memory. This adds a pointer to the system object which allows the walker to get a pointer to device memory which can be used to do a functional access directly to memory bypassing Ruby. Change-Id: I0ead6e5e130a0d53021c44ae9221b167c6316ab2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57529 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-25 19:51:29 +00:00
Matthew Poremba	9cbdf75295	dev-amdgpu: Add VM class for apertures, TranslationGens Create a VM class to reduce clutter in the amdgpu_device.* files. This new file is in charge of reading/writing MMIOs related to VM contexts and apertures. It also provides range checks for various apertures and breaks out the MMIO interface so that there are not overloaded macro definitions in the device MMIO methods. The new translation generator classes for the various apertures are also added to this class. Change-Id: Ic224c1aa485685685b1136a46eed50bcf99d2350 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53066 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-24 14:59:57 +00:00
Matthew Poremba	f64f05eff6	arch-vega: Mark global instructions executed as global The executed_as field is currently not set for global memory instructions. This results in the default of SC_NONE, causing the status vector to be all zeros. The GM pipe sees this and completes the instruction immediately rather than issuing memory requests. This is fixed by marking the instruction as executed as SC_GLOBAL always. Flat instructions use resolvedFlatSegment for this, however since global instructions are known to be global we can set this field directly. This results in the expected issuing of memory requests to GPU memory. Change-Id: Ic23102853ccd49a41e2f083b7bb24f033dfed18a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57829 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-18 22:44:28 +00:00
Gabe Black	72d67e6426	arch-vega: Replace deprecated Stats namespace recently reintroduced. The deprecated "Stats" namespace was recently reintroduced to the vega TLB code. Replace it with the new statistics namespace. Change-Id: Ie5daf288176ce7e8aadd27b84a70baf4cbc72dff Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57949 Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-18 20:19:37 +00:00
Matthew Poremba	539a2e2bcd	arch-vega: Add VEGA page tables and TLB Add the page table walker, page table format, TLB, TLB coalescer, and associated support in the AMDGPUDevice. This page table format used the hardware format for dGPU and is very different from APU/GCN3 which use the X86 page table format. In order to support either format for the GPU model, a common TranslationState called GpuTranslation state is created which holds the combined fields of both the APU and Vega translation state. Similarly the TlbEntry is cast at runtime by the corresponding arch files as they are the only files which touch the internals of the TlbEntry. The GPU model only checks if a TlbEntry is non-null and thus does not need to cast to peek inside the data structure. Change-Id: I4484c66239b48df5224d61caa6e968e56eea38a5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51848 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-17 00:11:14 +00:00
Kyle Roarty	5e721db9a2	arch-vega: Handle signed offsets in Global/Scratch instructions The offset field in Flat-style instructions is treated differently based on if the instruction is Flat or Global/Scratch. In Flat insts, the offset is treated as a 12-bit unsigned number. In Global/Scratch insts, the offset is treated as a 13-bit signed number. This patch updates the calcAddr function for Flat-style instructions to properly sign-extend the offset on Global/Scratch instructions Change-Id: I57f10258c23d900da9bf6ded6717c6e8abd177b7 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57209 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>	2022-03-01 21:14:38 +00:00
Gabe Black	5df52e0dca	arch-x86: Overhaul how address size is handled, particularly for stack. The stack size is something that applies to addresses when performing accesses as part of some instructions. This was handled inconsistently or incompletely or simply incorrectly in a few ways. First, when pushing or popping from the stack, the address size should be set to the stack size. The data size is generally the operand size. When the stack pointer is incremented/decremented, it should be changed by the data size. When a stack pointer is manipulated, the data size for those calculations should be the stack size. Importantly that does not change the value of the increment/decrement, which is the operand size still. This usage has been fixed throughout. The TLB generally needs to know what the address size was in order to figure out what segment offset was used so that it can do limit checks. There is some inherent inaccuracy in doing things in reverse like this, but that's how it works currently. To find that size, the TLB tried to start from first principles to figure out what the default address size was, and then whether there was an override was passed in through the request flags. This is very inaccurate for a few reasons. First, the override doesn't always apply. Second, the address size used by a particular instruction doesn't have to be based on any particular size, whether that is the default or alternate address size, the stack size, etc. Instead, the instructions now pass the actual size being used in as a 2 bit value (0 -> 1 byte, 1 -> 2 bytes, 2 -> 4 bytes, 3 -> 8 bytes), avoiding most of the inaccuracy and approximation. Because the CPU won't embed any size information into fetches, we'll just assume those have no wrap around within the address size. Finally, there were microops that had been added which overrode the address size to be the stack size internally, and try to help the TLB figure out what to do to figure out the address size. Because both of those things are now handled in a different way, those microops are no longer needed or used and have been deleted. Change-Id: I2b1bdf1acf1540bf643fac6d49fe1a5a576ba5c1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55443 Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Gabe Black <gabe.black@gmail.com> Reviewed-by: Gabe Black <gabe.black@gmail.com>	2022-02-26 01:58:23 +00:00
Matthew Poremba	faf3730559	arch-vega: Fix global 64-bit calcAddr with SGPR base Global instruction address calculation when using an SGPR or SGPR pair as a base address was being calculated incorrectly when 64-bit addresses were to be generated. From the ISA documentation, the SGPR should be read as 32-bit or 64-bit depending on "ADDRESS_MODE." The VGPR-offset (computed from the lower 32-bits of vaddr) should always be 32-bits and the offset is 12 bits from the instruction. This means the 32-bit mask should only be applied to vaddr to get the VGPR-offset rather than the final sum. The SGPR base format is being seen in more recent clang/ROCm versions to avoid unnecessary copies of SGPRs into VGPRs to use VGPRs as the base address. Change-Id: I48910611fcfac5b62bc63496bbaabd6f6e53fe0d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55643 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-01-20 16:03:23 +00:00
Matthew Poremba	3ecd28a222	arch-vega: Update FLAT memory access helpers to support LDS This patch ports the changes from a similar patch for arch-gcn3: https://gem5-review.googlesource.com/c/public/gem5/+/48343. Vega already has a helper function to send to the correct pipe depending on the scope, however the initMem helpers currently always assume global scope. In addition, the MUBUF WBINVL1 instructions are updated similarly to the GCN3 patch. Change-Id: I612b9198cb56e226721a90e72bba64395c84ebcd Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55465 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-01-18 15:20:10 +00:00
Matthew Poremba	ff17ecc177	arch-vega: Fix MUBUF out-of-bounds case 1 Ported from https://gem5-review.googlesource.com/c/public/gem5/+/51127: This patch updates the out-of-bounds check to properly check against the correct buffer_offset, which is different depending on if the const_swizzle_enable is true or false. Change-Id: I9757226e62c587b679cab2a42f3616a5dca97e60 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55464 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-01-18 15:20:10 +00:00
Matthew Poremba	0cb64ce9f0	arch-vega: Free dest registers in non-memory Load DS insts Ported from https://gem5-review.googlesource.com/c/public/gem5/+/48019: Certain DS insts are classified as Loads, but don't actually go through the memory pipeline. However, any instruction classified as a load marks its destination registers as free in the memory pipeline. Because these instructions didn't use the memory pipeline, they never freed their destination registers, which led to a deadlock. This patch explicitly calls the function used to free the destination registers in the execute() method of those Load instructions that don't use the memory pipeline. Change-Id: I8231217a79661ca6acc837b2ab4931b946049a1a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55463 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-01-17 23:55:51 +00:00
Matthew Poremba	d6bd91a9fd	arch-vega: Implement large ds_read/write instructions Port large DS read/write instructions from https://gem5-review.googlesource.com/c/public/gem5/+/48342. This implements the 96 and 128b ds_read/write instructions in a similar fashion to the 3 and 4 dword flat_load/store instructions. These instructions are treated as reads/writes of 3 or 4 dwords, instead of as a single 96b/128b memory transaction, due to the limitations of the VecOperand class used in the amdgpu code. In order to handle treating the memory transaction as multiple dwords, the patch also adds in new initMemRead/initMemWrite functions for ds instructions. These are similar to the functions used in flat instructions for the same purpose. Change-Id: Iee2de14eb7f32b6654799d53dc97d806288af98f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55344 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-01-11 16:58:09 +00:00
Matthew Poremba	5a94e73d00	arch-vega: Validate if scalar sources are scalar gprs Port the fixes for scalar source checks from arch-gcn3 at https://gem5-review.googlesource.com/c/public/gem5/+/48344. Scalar sources can either be a general-purpose register or a constant register that holds a single value. If we don't check whether the register is a general-purpose register, it's possible that we get a constant register, which then causes all of the register mapping code to break, as the constant registers aren't supposed to be mapped like the general-purpose registers are. This fix adds an isScalarReg check to the instruction encodings that were missing it. Change-Id: I30dd2d082a5a1dcc3075843bcefd325113ed1df6 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55343 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-01-11 16:58:09 +00:00
Kyle Roarty	f9deeea427	arch-gcn3,arch-vega: Select proper data on misaligned access req1->getSize() returns the size in bytes, but because we're using it in an array index, we need to scale it by the size of the data type. This ensures we give the second request the proper data. Change-Id: I578665406762d5d0c95f2ea8297c362e1cc0620b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/54503 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>	2021-12-20 18:28:08 +00:00
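
The index scaling described in f9deeea427 can be sketched as follows. This is an illustration only, not the code from the commit: `secondRequestData`, `lane_data`, and `first_req_bytes` are hypothetical names, and the sketch assumes the split between the two requests lands on an element boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a gem5 API): given the start of a typed data
// buffer and the byte size of the first half of a split (misaligned)
// request, return where the data for the second half begins.
template <typename T>
T *secondRequestData(T *lane_data, std::size_t first_req_bytes)
{
    // Assumption of this sketch: the split falls on an element boundary.
    assert(first_req_bytes % sizeof(T) == 0);
    // Indexing a T* advances sizeof(T) bytes per step, so the byte count
    // must be converted to an element count before it is used as an index.
    return &lane_data[first_req_bytes / sizeof(T)];
}
```

With 4-byte elements and an 8-byte first request, the second request's data starts at element 2; using the unscaled byte count would have pointed at element 8.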


108 Commits