derek/gem5

Author	SHA1	Message	Date
Matthew Poremba	4d336c0636	arch-vega: Implement buffer_atomic_cmpswap (#439 ) This is a standard compare and swap but implemented on vector memory buffer instructions (i.e., it is the same as FLAT_ATOMIC_CMPSWAP with MUBUF's special address calculation). This was tested using a Tensile kernel, a backend for rocBLAS, which is used by PyTorch and Tensorflow. Prior to this patch both ML frameworks crashed. With this patch they both make forward progress. Change-Id: Ie76447a72d210f81624e01e1fa374e41c2c21e06	2023-10-12 07:33:40 -07:00
Matthew Poremba	4b7f25fcb6	arch-vega: Ignore s_setprio instruction instead of panic This instruction is used by ML frameworks to prioritize certain wavefronts. Since gem5 does not have any support for wavefront scheduling based on priority (besides wavefront age), we ignore this instruction and warn_once rather than calling panic. Since hardware can override this priority anyways, we can be sure that ignoring the value will not inhibit forward progress resulting in application hangs. Change-Id: Ic5eef14f9685dd2b316c5cf76078bb78d5bfe3cc	2023-10-11 15:55:16 -05:00
Matthew Poremba	4b85a1710e	arch-vega: Implement buffer_atomic_cmpswap This is a standard compare and swap but implemented on vector memory buffer instructions (i.e., it is the same as FLAT_ATOMIC_CMPSWAP with MUBUF's special address calculation). This was tested using a Tensile kernel, a backend for rocBLAS, which is used by PyTorch and Tensorflow. Prior to this patch both ML frameworks crashed. With this patch they both make forward progress. Change-Id: Ie76447a72d210f81624e01e1fa374e41c2c21e06	2023-10-11 15:42:50 -05:00
Matthew Poremba	9f4d334644	gpu-compute: Update tokens for flat global/scratch Memory instructions acquire coalescer tokens in the schedule stage. Currently this is only done for buffer and flat instructions, but not flat global or flat scratch. This change now acquires tokens for flat global and flat scratch instructions. This provides back-pressure to the CUs and helps to avoid deadlocks in Ruby. The change also handles returning tokens for buffer, flat global, and flat scratch instructions. This was previously only being done for normal flat instructions leading to deadlocks in some applications when the tokens were exhausted. To simplify the logic, added a needsToken() method to GPUDynInst which returns whether the instruction is buffer or any flat segment. The waitcnts were also incorrect for flat global and flat scratch. We should always decrement vmem and exp count for stores and only normal flat instructions should decrement lgkm. Currently vmem/exp are not decremented for flat global and flat scratch which can lead to deadlock. This change set fixes this by always decrementing vmem/exp and lgkm only for normal flat instructions. Change-Id: I673f4ac6121e4b5a5e8491bc9130c6d825d95fc5	2023-10-10 09:48:16 -05:00
Matthew Poremba	60f071d09a	gpu-compute,arch-vega: Implement flat scratch insts Flat scratch instructions (aka private) are the 3rd and final segment of flat instructions in gfx9 (Vega) and beyond. These are used for things like spills/fills and thread local storage. This commit enables two forms of flat scratch instructions: (1) flat_load/flat_store instructions where the memory address resolves to private memory and (2) the new scratch_load/scratch_store instructions in Vega. The first are similar to older generation ISAs where the aperture is unknown until address translation. The second are instructions guaranteed to go to private memory. Since these are very similar to flat global instructions there are minimal changes needed: - Ensure a flat instruction is either regular flat, global, XOR scratch - Rename the global op_encoding methods to GlobalScratch to indicate they are for both and are intentionally used. - Flat instructions in segment 1 output scratch_ in the disassembly - Flat instruction executed as private use similar mem helpers as global - Flat scratch cannot be an atomic This was tested using a modified version of the 'square' application: template <typename T> __global__ void scratch_square(T C_d, T A_d, size_t N) { size_t offset = (blockIdx.x * blockDim.x + threadIdx.x); size_t stride = blockDim.x * gridDim.x ; volatile int foo; // Volatile ensures scratch / unoptimized code for (size_t i=offset; i<N; i+=stride) { foo = A_d[i]; C_d[i] = foo * foo; } } Change-Id: Icc91a7f67836fa3e759fefe7c1c3f6851528ae7d	2023-08-26 13:40:12 -05:00
Matthew Poremba	90a518e885	gpu-compute,arch-vega: Fix ALU-only LDS counters There are a few LDS instructions that perform local ALU operations and writeback which are marked as loads. These are marked as loads because they fit in the pipeline logic better, according to a several year old comment. In the VEGA ISA these instructions (swizzle, permute, bpermute) are not decrementing the LDS load counter. As a result, the counter will gradually increase over time. Since wavefront slots are persistent, this can cause applications with a few thousand kernels to eventually hang thinking there are not enough resources. This changeset fixes this by decrementing the LDS load counter for these instructions. This fix was already integrated in the GCN3 ISA in the exact same way. This changeset moves it near a similar comment about scheduling register file writes. Change-Id: Ife5237a2cae7213948c32ef266f4f8f22917351c	2023-08-23 19:30:24 -05:00
Matthew Poremba	3589a4c11f	arch-vega: Implement translate further Starting with ROCm 5.4+, MI100 and MI200 make use of the translate further bit in the page table. This bit enables mixing 4kiB and 2MiB pages and is functionally equivalent to mixing page sizes using the PDE.P bit for which gem5 currently has support. With PDE.P bit set, we stop walking and the page size is equal to the level in the page table we stopped at. For example, stopping at level 2 would be a 1GiB page, stopping at level 3 would be a 2MiB page. This assumes most pages are 4kiB. When the F bit is used, it is assumed most pages are 2MiB and we will stop walking at the 3rd level of the page table unless the F bit is set. When the F bit is set, the 2nd level PDE contains a block fragment size representing the page size of the next PDE in the form of 2^(12+size). If the next page has the F bit set we continue walking to the 4th level. The block fragment size is hardcoded to 9 in the driver therefore we assert that the block fragment size must be 0 or 9. This enables MI200 with ROCm 5.4+ in gem5. This functionality was determined by examining the driver source code in Linux and there is no public documentation about this feature or why the change is made in or around ROCm 5.4. Change-Id: I603c0208cd9e821f7ad6eeb1d94ae15eaa146fb9	2023-07-30 13:17:05 -05:00
Matthew Poremba	618b2a60de	arch-vega, dev-amdgpu: Fix for memory leaks (#129 ) When using the new operator, delete should be called on any allocated memory after its use is complete. Change-Id: Id5fcfb264b6ddc252c0a9dcafc2d3b020f7b5019	2023-07-30 10:48:17 -07:00
Ranganath (Bujji) Selagamsetty	ede4d89a83	arch-vega, dev-amdgpu: Fix for memory leaks When using the new operator, delete should be called on any allocated memory after its use is complete. Change-Id: Id5fcfb264b6ddc252c0a9dcafc2d3b020f7b5019	2023-07-28 19:14:46 -05:00
Matthew Poremba	c722b0c73d	arch-vega: Fix vop2Helper scalar support A previous change added a vop2Helper to remove 100s of lines of common code from VOP2 instructions related to processing SDWA and DPP support. That change inadvertently changed the type of operand source 0 from const to non-const. The vector container operator[] does not allow reading a scalar value such as a constant, a dword literal, etc. The error shows up in the form of: assert(!scalar) in operand.hh. Since the SDWA and DPP cases need to modify the source vector and non-SDWA/DPP cases might require const, we make a non-const copy of the const source 0 vector and place it in a temporary non-const vector. This non-const vector is passed to the lambda function implementation of the instruction. This prevents needing a const and non-const version of the lambda and avoids needing to propagate the template parameters through the various SDWA/DPP helper methods which seems like it will not work anyways as they need to modify the vector. As a result of this, as more VOP2 instructions are implemented using this helper, they will need to specify the const and non-const template parameters of the vector container needed for the instruction. Change-Id: Ia0b3c550d7de32b830040007a110f4821e3385aa	2023-07-28 13:47:55 -05:00
Bobby R. Bruce	753933d471	gpu-compute, tests: Fix GPU_X86 compilation, add compiler tests (#64 ) * gpu-compute: Remove use of 'std::random_shuffle' This was deprecated in C++14 and removed in C++17. This has been replaced with std::random. This has been implemented to ensure reproducible results despite (pseudo)random behavior. Change-Id: Idd52bc997547c7f8c1be88f6130adff8a37b4116 * dev-amdgpu: Add missing 'overrides' This causes warnings/errors in some compilers. Change-Id: I36a3548943c030d2578c2f581c8985c12eaeb0ae * dev: Fix Linux specific includes to be portable This allows for compilation in non-linux systems (e.g., Mac OS). Change-Id: Ib6c9406baf42db8caaad335ebc670c1905584ea2 * tests: Add 'VEGA_X86' build target to compiler-tests.sh Change-Id: Icbf1d60a096b1791a4718a7edf17466f854b6ae5 * tests: Add 'GCN3_X86' build target to compiler-tests.sh Change-Id: Ie7c9c20bb090f8688e48c8619667312196a7c123	2023-07-11 14:35:03 -07:00
Matthew Poremba	841e6fe978	arch-vega: Add Vega D16 decodings and fix V_SWAP_B32 Vega adds multiple new D16 instructions which load a byte or short into the lower or upper 16 bits of a register for packed math. The decoder table has subDecode tables for FLAT instructions which represents 32 opcodes in each subDecode table. The subDecode table for opcodes 32-63 is missing so it is added here. The opcode for V_SWAP_B32 is also off by one. In the ISA manual this instruction is opcode 81, the instruction before is 79, and there is no opcode 80, so the decoder entry is swapped with the invalid decoding below it. Change-Id: I278fea574ea684ccc6302d5b4d0f5dd8813a88ad Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71899 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-06-29 19:56:56 +00:00
Matthew Poremba	db903f4fd4	arch-vega: Helper methods for SDWA/DPP for VOP2 Many of the outstanding issues with the GPU model are related to instructions not having SDWA/DPP implementations and executing by ignoring the special registers leading to incorrect execution. Adding SDWA/DPP is currently very cumbersome as there is a lot of boilerplate code. This changeset adds helper methods for VOP2 with one instruction changed as an example. This review is intended to get feedback before applying this change to all VOP2 instructions that support SDWA/DPP. Change-Id: I1edbc3f3bb166d34f151545aa9f47a94150e1406 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70738 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-06-15 23:02:39 +00:00
Matthew Poremba	ae7476bcdc	arch-gcn3,arch-vega: Fix ds_read2st64_b32 This instruction has two issues. The first is that it should write two consecutive registers, starting with vdst because it is writing two dwords. The second is that the data assignment to the lanes from the dynamic instruction should cast to a U32 type otherwise the array index goes out of bounds and returns the wrong data. The first issue was fixed in GCN3 a few years ago in this review: https://gem5-review.googlesource.com/c/public/gem5/+/32236. This changeset makes the same change for Vega and applies the U32 cast in both ISAs. Tested with rocPRIM unit test. The test was failing before this changeset and now passes. Change-Id: Ifb110fc9a36ad198da7eaf86b1e3e37eccd3bb10 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70577 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-05-13 20:09:37 +00:00
Matthew Poremba	cd76f92c94	arch-vega: Add decodings for new MI100 VOP2 insts VOP2 with opcodes 55-61 were added in MI100 and are not in Vega10. This changeset adds the decodings for these instructions. The changeset does not implement the instructions, however the fatal message is much more helpful for debugging compared to a generic decode_invalid handler. Change-Id: Ibde0880c35ff915bf8e50772df9ce263e55ca893 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70042 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-04-28 00:48:35 +00:00
Matthew Poremba	f028bd55e0	arch-vega: Update API for some flat atomics Some recently submitted atomic instructions were using two older APIs. Update these to use the newer APIs to support all apertures and avoid compilation issue. Change-Id: Ibd6bc00177d33236946f54ef8e5c7544af322852 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67977 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-02-16 03:28:25 +00:00
Matthew Poremba	ea9239ae09	dev-amdgpu: Update deprecated ports Change-Id: Icbc5636c33b437c7396ee27363eed1cf006f8882 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67837 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-02-14 18:57:33 +00:00
Alexandru Dutu	bb8f370e4d	arch-vega: Implementing global_atomic_smax Change-Id: Id4053424c98eec1e98eb555bb35b48f0b5d2407b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64513 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-02-14 18:57:14 +00:00
Alexandru Dutu	8375058e73	arch-vega: Implementing global_atomic_smin Change-Id: Iffb366190f9e3f7ffbacde5dbb3abc97226926d4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64512 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2023-02-14 18:57:14 +00:00
Alexandru Dutu	d7516a26dc	arch-vega: Implementing global_atomic_or Change-Id: I13065186313ca784054956e1165b1b2fd8ce4a19 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64511 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-02-14 18:57:14 +00:00
Matthew Poremba	bc9e90d65e	arch-vega: Make VGPR-offset for global SGPR-base signed The VGPR-offset used when SGPR-base addressing is used can be signed in Vega. These are global instructions of the format: `global_load_dword v0, v1, s[0:1]`. This is not explicitly stated in the ISA manual however based on compiler output the offset can be negative. This changeset assigns the offset to a signed 32-bit integer and the compiler takes care of the signedness in the expression which calculates the final address. This fixes a bad address calculation in a rocPRIM unit test. Change-Id: I271edfbb4c6344cb1a6a69a0fd3df58a6198d599 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67412 Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu> Maintainer: Bobby Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com>	2023-02-09 17:08:34 +00:00
Matthew Poremba	905b8ebd22	arch-vega: Implement ds_write_b8_d16_hi Writes a byte to the upper 16-bit input word to an address. Change-Id: I0bfd573526b9c46585d0008cde07c769b1d29ebd Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67411 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-02-09 17:08:34 +00:00
Gabriel Busnot	7f4c92c910	mem,arch-arm,mem-ruby,cpu: Remove use of deprecated base port owner Change-Id: I29214278c3dd4829c89a6f7c93214b8123912e74 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67452 Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Bobby Bruce <bbruce@ucdavis.edu> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>	2023-02-03 06:11:45 +00:00
Matthew Poremba	03083ba5e3	arch-vega: Implement ds_write2st64_b64 Write two qwords at offsets multiplied by 8 * 64 bytes. Change-Id: I0d0e05f3e848c2fd02d32095e32b7f023bd8803b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67078 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2023-01-05 23:12:10 +00:00
Matthew Poremba	450bc254bd	arch-vega: Read one dword for SGPR base global insts Global instructions in Vega can either use a VGPR base address plus instruction offset or SGPR base address plus VGPR offset plus instruction offset. Currently the VGPR address/offset is always read as two dwords. This causes problems if the VGPR number is the last VGPR allocated to a wavefront since the second dword would be beyond the allocation and trip an assert. This changeset sets the operand size of the VGPR operand to one dword when SGPR base is used and two dwords otherwise so initDynOperandInfo does not assert. It also moves the read of the VGPR into the calcAddr method so that the correct ConstVecOperandU## is used to prevent another assertion failure when reading from the register file. These two changes are made to all flat instructions, as global instructions are a subsegment of flat instructions. Change-Id: I79030771aa6deec05ffa5853ca2d8b68943ee0a0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67077 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-01-05 23:12:10 +00:00
Matthew Poremba	3bfa220e4e	arch-vega: Implement ds_read_i8 Read one byte with sign extended from LDS. Change-Id: I9cb9b4033c6f834241cba944bc7e6a7ebc5401be Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67076 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-01-05 23:12:10 +00:00
Matthew Poremba	b83457df0b	arch-vega: Implement ds_add_u64 This instruction does an atomic add of an unsigned 64-bit data with a VGPR and value in LDS atomically without return. Change-Id: I6a7d6713b256607c4e69ddbdef5c83172493c077 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67075 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-01-05 23:12:10 +00:00
Matthew Poremba	892e3057f7	arch-vega: Implement ds_add_f32 atomic This instruction does an atomic add of a 32-bit float with a VGPR and value in LDS atomically without return. Change-Id: Id4f23a1ab587a23edfd1d88ede1cbcc5bdedc0cb Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67074 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-01-05 23:12:10 +00:00
Matthew Poremba	022a48f9f6	arch-vega: Implement ds_add_u32 atomic This instruction does an atomic add of unsigned 32-bit data with a VGPR and value in LDS atomically, without return. Change-Id: I87579a94f6200a9a066f8f7390e57fb5fb6eff8e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67072 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-01-03 22:53:30 +00:00
Matthew Poremba	e392603d81	arch-vega: Add missing operand size for ds_write2st64_b64 This instruction takes three operands (address, and two datas) but there were only operand sizes for two operands tripping assert in default case. Change-Id: I3f505b6432aee5f3f265acac46b83c0c7daff3e7 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67071 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-01-03 22:53:30 +00:00
Matthew Poremba	6449633827	arch-vega: Add DPP support for V_AND_B32 A DPP variant of V_AND_B32 was found in rocPRIM. With this changeset the unit tests for rocPRIM scan_inclusive are passing. Change-Id: I5a65f2cf6b56ac13609b191e3b3dfeb55e630942 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66753 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2023-01-03 22:53:30 +00:00
Matthew Poremba	bbdebc25da	arch-vega: Fix several issues with DPP DPP processing has several issues which are fixed in this changeset: 1) Incorrect comment is updated 2) newLane calculation for shift/rotate instructions is corrected 3) A copy of original data is made so that a copy of a copy is not made 4) Reset all booleans (OOB, zeroSrc, laneDisabled) after each lane iteration The shift, rotate, and broadcast variants were tested by implementing them in assembly and running on silicon. Change-Id: If86fbb26c87eaca4ef0587fd846978115858b168 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66752 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-01-03 22:53:30 +00:00
Matthew Poremba	f99a3c1f96	arch-vega: Fix signed BFE instructions The bitfield extract instructions come in unsigned and signed variants. The documentation on this is not correct, however the GCN3 documentation gives some clues. The instruction should extract an N-bit integer where N is defined in a source operand starting at some bit also defined by a source operand. For signed variants of this instruction, the N-bit integer should be sign extended but is currently not. This changeset does sign extension using the runtime value of N by ORing the upper bits with ones if the most significant bit is one. This was verified by writing these instructions in assembly and running on a real GPU. Changes are made to v_bfe_i32, s_bfe_i32, and s_bfe_i64. Change-Id: Ia192f5940200c6de48867b02f709a7f1b2daa974 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66751 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-01-03 22:53:30 +00:00
Matthew Poremba	8693d725e2	arch-vega: Fix SOPK instruction sign extends See: https://gem5-review.googlesource.com/c/public/gem5/+/37495 Same patch but for vega. This fixes issues with lulesh and probably rodinia - heartwall as well in fullsystem. Change-Id: I3af36bb9b60d32dc96cc3b439bb1167be1b0945d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65432 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-11-09 15:47:50 +00:00
Matthew Poremba	6a4a12ebbd	arch-vega: Improve non-native page size support Vega allows for any integer multiple of 4kB pages. However, the current implementation is designed for 4kB pages primarily. In order to support variable page sizes, the physical address calculation needs to be updated to add the virtual page offset to the base physical address rather than bitwise-OR. Bitwise-OR assumes physical pages are aligned to the page size which is generally not the case for very large pages (1GB+). This changeset changes all of the physical address computations to add the virtual offset to the physical page address. This fixes many GPUFS applications which use larger pages. The support was tested by hipMalloc'ing ~5GB to induce a large page being created. The test application now passes verification with this change. Change-Id: Ic8d1475e001def443f3e4ab609449bca0c40b638 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64751 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-10-31 14:30:13 +00:00
Matthew Poremba	e48285c244	arch-vega: Implement PDE2 and PDE1 as PTE Page directory entries (PDEs) can be interpreted as leaf node page table entries (PTEs) if the "p" bit is set. This is used for flexible page sizes in Vega. Currently there is only support for PDE level 0 entries which can be interpreted as 2MB pages. This changeset adds support for PDE1 and PDE2 which can be used to represent 1GB and 512GB pages. PDE1-as-PTE entries can be tested and were verified on applications by allocating >2GB of data. PDE2 is untested due to being too large for simulation, but the implementation is similar to PDE0 and PDE1. Change-Id: I801cbb5ec79110d57d2db760cc689c2e5778f9bb Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64451 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2022-10-13 20:17:00 +00:00
Matthew Poremba	925b1b5c8e	arch-vega: Implement V_XAD_U32 instruction Used in rodinia:heartwall. This instruction is new in vega and does not exist in GCN3. Change-Id: I6127290d1c85688a7f82e149e97762ca55e05fc6 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63972 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-10-06 16:42:40 +00:00
Matthew Poremba	b919d9c5c9	arch-vega: Improve disasm for GLOBAL insts with scalar offset The previous print statement was not clear that a scalar offset was being used when printing disassembly, which made it slightly more difficult to track down bugs related to this (relatively) rare usage of global load/store instructions. This change improves the disassembly to closer match the output of hipcc's assembly code output. Change-Id: I8514aedacb5b1db93d0586c408c4cf1ce77a7db3 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63175 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2022-09-09 04:13:49 +00:00
Matthew Poremba	f91abb9770	arch-vega: Allow unaligned large host pages The virtual and physical address for device memory are typically aligned to the page size. On the host (x86), however, the physical address may not be aligned to page size for large page sizes when mixed with 4kB pages. As a result, the physical address calculation must add, rather than bitwise-OR, the virtual page offset to the physical page number. The virtual page offset on the GPU continues to use the variable page bytes for masking and shifting. Change-Id: I6563a1eb43d9b59577d32268b8645a7436304bcb Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63034 Maintainer: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2022-09-09 04:13:49 +00:00
Alexandru Dutu	241023329d	arch-vega: DS_OR_B32 does not return data The DS_OR_B32 instruction should not return data. This changeset updates its implementation. Change-Id: Ib5fccdbb69cd1ad2639bbf54824dc363a3fbf599 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/62656 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2022-09-01 15:16:37 +00:00
Gabe Black	f4209bbdee	misc: Remove lingering uses of TheISA::. Change-Id: Ie55e0d79867fbc8f75a993fb456a58c84de5def4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/62196 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>	2022-08-20 07:30:16 +00:00
Gabe Black	2da4a2cdca	cpu: Stop including arch/isa.hh. This header file is no longer needed. Also fix some places where the isa header file was being transitively included. Change-Id: Ib9a9d7db0c9808b29d7614bbd68e2052ea345e9f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51239 Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-08-20 01:17:06 +00:00
Alexandru Dutu	c6b38909e1	gpu-compute: Adding support for LDS atomics This changeset is adding support for LDS atomics and implementing DS_OR_B32 instruction. Change-Id: I84c5cf6ce0e9494726dc7299f360551cd2a485f5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61791 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-08-19 16:44:31 +00:00
Bobby R. Bruce	787204c92d	python: Apply Black formatter to Python files The command executed was `black src configs tests util`. Change-Id: I8dfaa6ab04658fea37618127d6ac19270028d771 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47024 Maintainer: Bobby Bruce <bbruce@ucdavis.edu> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-08-03 09:10:41 +00:00
Matthew Poremba	f65f5a8981	gpu-compute,arch-vega: Overhaul HWRegs, setreg, getreg These instructions are supposed to be read/writing special shader hardware registers. Currently they are getting/setting to an SGPR. This results in getting incorrect registers at best and clobbering an SGPR being used by an application at worst. Furthermore, some registers need to be set in the shader and the application will never (can never) set them. This patch overhauls the getreg/setreg instructions to use different storage in the shader. The values will be updated either via setreg from an application (e.g., mode register) or set by a PM4 MAP_PROCESS. Change-Id: Ie5e5d552bd04dc47f5b35b5ee40a569ae345abac Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61655 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2022-07-28 14:10:33 +00:00
Matthew Poremba	5c7514c81c	arch-vega: Fix S_GETREG_B32 masking/shifting Here the mask should not be inverted. We also need to shift by the offset to remove the padding as the consumer of the value expects the offset to be removed. This can be easily tested by running a GPU kernel with __shared__ variables. This will generate the following assembly: s_getreg_b32 s6, hwreg(HW_REG_SH_MEM_BASES, 16, 16) The current implementation returns the lower 16 bits (private memory aperture) while the correct behavior is the upper 16 bits (shared/LDS memory aperture). Change-Id: Iea8f0adceeadb24cdcf46ef4183fcaa8262ab9e7 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61654 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2022-07-28 14:10:33 +00:00
Matthew Poremba	618d16d6fc	arch-vega: Fix V_READFIRSTLANE_B32 This instruction appears to be the only VOP1 instruction that has a scalar destination using VDST as the destination register number. However, since VDST is only 8 bits it cannot encode all possible registers. Therefore, use the opcode to determine if the destination is a scalar or vector destination. This issue manifests as a VGPR dest being out of range for a kernel where the number of SGPRs is more than the number of VGPRs and the intended SGPR dest is larger than the count of VGPRs Change-Id: I95a7de1ddb97f7171f48331fed36aef776fa0cb4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61649 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-07-28 14:10:33 +00:00
Matthew Poremba	af9ecf7920	arch-vega,arch-gcn3: Add support for VCC_HI as scalar source Currently there is only support for VCC_LO as a scalar source. Add support for VCC_HI as well. The op selector symbol is also changed to be vcc_hi/vcc_lo as it is in disassembly from LLVM. Change-Id: I19ea8e23873049c33ffe2eb4ec8504a18f371c0e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61329 Reviewed-by: Matt Sinclair <mattdsinclair.wisc@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-07-18 17:27:50 +00:00
Matthew Poremba	3a73aa3ac1	arch-vega: Implement new VOP2 using VOP3 insts Vega adds three new VOP2 instructions that may use VOP3 encoding that are not part of the GCN3 ISA: v_add_u32, v_sub_u32, v_subrev_u32. This changeset implements those three new instructions to fix errors related to "invalid encoding" when those instructions are seen. Tested using srad from Rodinia 3.0 HIP port which compiles a v_add_u32 instruction with VOP3 encoding. Change-Id: I409a9f72f5c37895c3a0ab7ceb14a4dd121874a4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61330 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2022-07-15 14:55:33 +00:00
Matthew Poremba	40077055cf	arch-vega: Fix disassembly for two dword VOPC Calling opSelectorToRegSym in the disassembly for VOPC when there is a second dword (SDWA, DPP, or Literal) causes a panic as those registers do not have a string symbol. This is fixed by checking for a second dword before printing similar to how VOP1, VOP2, SOP1, etc. function. Change-Id: I97b33e1e45abcf3ff1d0bc5754773b4eee961a98 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61269 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-07-12 19:05:32 +00:00
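
Commit f99a3c1f96 describes sign-extending an N-bit extracted field by ORing ones into the upper bits when the field's most significant bit is set. The following is a minimal sketch of that idea, not the code from the commit: the function name `bfeSigned32` and its parameters are hypothetical, and it assumes the field fits inside the 32-bit source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not gem5 code): extract `width` bits of `src`
// starting at bit `offset`, then sign-extend the field to 32 bits.
inline int32_t
bfeSigned32(uint32_t src, unsigned offset, unsigned width)
{
    assert(width >= 1 && width <= 32 && offset < 32);

    // Mask with the low `width` bits set. A shift by 32 would be
    // undefined behaviour, so the full-width case is handled separately.
    const uint32_t mask =
        (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);

    uint32_t field = (src >> offset) & mask;

    // If the field's most significant bit is one, OR ones into every
    // bit above the field (the runtime sign extension described above).
    if (width < 32 && ((field >> (width - 1)) & 1u)) {
        field |= ~mask;
    }

    // Reinterpret the bit pattern as a signed two's-complement value.
    return static_cast<int32_t>(field);
}
```

For example, `bfeSigned32(0xF0, 4, 4)` extracts the field `0b1111` and yields -1, while `bfeSigned32(0x70, 4, 4)` extracts `0b0111` and yields 7.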

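
Commits 6a4a12ebbd and f91abb9770 both replace a bitwise-OR with an addition when combining the physical page base with the virtual page offset. The sketch below illustrates why the two differ when the physical base is not aligned to the page size. The `Addr` alias and all function names are assumptions made for this illustration and are not taken from the gem5 source.

```cpp
#include <cassert>
#include <cstdint>

using Addr = uint64_t;  // illustrative alias, defined locally

// Offset of `vaddr` within a page of `pageBytes` (must be a power of two).
inline Addr
pageOffset(Addr vaddr, Addr pageBytes)
{
    assert(pageBytes != 0 && (pageBytes & (pageBytes - 1)) == 0);
    return vaddr & (pageBytes - 1);
}

// Old approach: only correct when `physBase` is aligned to `pageBytes`,
// because any base bit overlapping the offset bits is silently merged.
inline Addr
physAddrOr(Addr physBase, Addr vaddr, Addr pageBytes)
{
    return physBase | pageOffset(vaddr, pageBytes);
}

// Approach described in the commits: add the offset to the base, which
// is also correct when the base is not aligned to the page size.
inline Addr
physAddrAdd(Addr physBase, Addr vaddr, Addr pageBytes)
{
    return physBase + pageOffset(vaddr, pageBytes);
}
```

With a 2 MiB page whose physical base is `0x40201000` (4 KiB aligned only) and a virtual offset of `0x1000`, the OR form returns `0x40201000` because that bit is already set in the base, whereas the add form returns `0x40202000`. For an aligned base the two forms agree.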
131 Commits
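
Commit 3589a4c11f states that the block fragment size encodes a page size of 2^(12+size) and that only the values 0 and 9 are accepted. A small sketch of that arithmetic follows; the function name `fragmentPageBytes` is hypothetical, and the assertion simply mirrors the restriction stated in the commit message rather than reproducing the actual gem5 check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not gem5 code): page size in bytes implied by a
// block fragment size, using the 2^(12 + size) form from the commit text.
inline uint64_t
fragmentPageBytes(unsigned blockFragmentSize)
{
    // The commit message says the driver hardcodes 9, so only 0 and 9
    // are expected here.
    assert(blockFragmentSize == 0 || blockFragmentSize == 9);
    return uint64_t{1} << (12 + blockFragmentSize);
}
```

A fragment size of 0 gives 4096 bytes (4 KiB) and a fragment size of 9 gives 2097152 bytes (2 MiB), matching the two page sizes the commit says can be mixed.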