derek/gem5 - gem5 - Gitea: Git with a cup of tea

derek/gem5

Author	SHA1	Message	Date
Matthew Poremba	8249d6d1cd	arch-vega: Remove FP asserts in VOP3 lane manip insts The VOP3 instruction encoding generally states that ABS/NEG modifiers in the instruction encoding are only valid on floating point data types. This is currently coded in gem5 to mean floating point instructions. For untyped instructions like V_CNDMASK_B32, we don't actually know what the data type is. We must trust that the compiler did not attempt to apply these bits to non-FP data types. This commit simply removes the asserts. The ABS/NEG modifiers are therefore ignored which is consistent with the ISA documentation. This is done on the lane manipulation instructions V_CNDMASK_B32, V_READLINE_B32, and V_WRITELANE_B32 which are typically used to mask off or move data between registers. Other bitwise instructions (e.g., V_OR_B32) keep the asserts as bitwise operations on FP types are genernally illegal in languages like C++. Change-Id: I478c5272ba96383a063b2828de21d60948b25c8f	2024-05-03 14:31:17 -07:00
Matthew Poremba	0faa9510f9	arch-vega,gpu-compute: Fix misc ubsan runtime errors Three main fixes: - Remove the initDynOperandInfo. UBSAN errors and exits due to things not being captured properly. After a few failed attempts playing with the capture list, just move the lambda to a new method. - Invalid data type size for some thread mask instructions. This might actually have caused silent bugs when the thread id was > 31. - Alignment issues with the operands. Change-Id: I0297e10df0f0ab9730b6f1bd132602cd36b5e7ac	2024-05-03 14:26:46 -07:00
Matthew Poremba	3db6e86fea	arch-vega: Fix string check warnings on fast build gem5.fast does not currently build if the GPU model is built. This fixes the array-bounds warnings allowing gem5.fast to build again. Change-Id: I463c2847c3ecfd2257a70418fa247090b0493f9b	2024-04-14 12:22:57 -07:00
Matthew Poremba	78cf39bf63	arch-vega: Operand selectors for accumulation registers (#955 ) AMD's MI100 introduced a new register file called accumulation registers for the matrix cores. In MI200 these were recombined into the same register file according to the documentation. The accumulation register file is the same size as the architectural register file, hence the size is doubled. The ISA spec does not explicitly state the register selector values, however it does say that the accumulation offset from the kernel dispatch packet should be added to the architecture register file selector number when an instruction sets the ACC bit. Therefore we can infer that the value must simply be an extension beyond the architectural VGPRs. This fixes errors of the form "invalid register selector: 512" (or higher value). This was tested with the Learn the Basics tutorial example on pytorch.org Change-Id: I48ced1532fc166d2f5032fe21fbeba70ac77f258	2024-04-01 08:45:37 -07:00
Matthew Poremba	7d62da6d10	dev-amdgpu: Support for ROCm 6.0 (#926 ) Implement several features new in ROCm 6.0 and features required for future devices. Includes the following: - Support for multiple command processors - Improve handling of unknown register addresses - Use AddrRange for MMIO address regions - Handle GART writes through SDMA copy - Implement PCIe indirect reads and writes - Improve PM4 write to check dword count - Implement common MI300X instruction	2024-03-21 21:12:09 -07:00
Matthew Poremba	dca040983b	arch-vega: Various vega fixes to enable nanogpt (#950 ) This PR fixes some issues observed that were needed to get nanogpt working.	2024-03-21 21:11:44 -07:00
Michael Boyer	803dbbfdac	arch-vega: Implement flat_load_sbyte instruction (#953 ) Change-Id: I642a71c504e2d3afecd5d2dfd9db016945aed21b	2024-03-21 21:11:10 -07:00
Matthew Poremba	9ab004cccc	arch-vega: Implement V_LSHL_ADD_U64 This is a new instruction in MI300 and operates similar to V_LSHL_ADD_U32 but on 64-bit values. Change-Id: Ia4ac65160bdad748fccdcb28286ba03157cc4046	2024-03-21 10:10:01 -05:00
Matthew Poremba	f36be791aa	arch-vega: Expand FLAT subDecode range in main decoder The main decoder for GPU instructions looks at the first 9 bits of a dword to determine either the instruction or a subDecode table with more information for specific instructions types. For flat instructions the first 9 bits currently consist of 6 fixed encoding bits, a reserved bit, and the first two bits of the opcode. Hence to support all opcodes there are four indirections to the flat subDecode table. In MI300 the reserved bit is part of a field to determine memory scope and therefore may be non-zero. This commit adds four addition calls to the subDecode table for the cases where the scope bit is 1. See page 468 (PDF page 478) below: https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/ instruction-set-architectures/ amd-instinct-mi300-cdna3-instruction-set-architecture.pdf Change-Id: Ic3c786f0ca00a758cbe87f42c5e3470576f73a32	2024-03-21 10:10:01 -05:00
Matthew Poremba	e02f329d5d	arch-vega: Fix VOP3 decode table off-by-one There is no VOP3 opcode 667. Mark that invalid and move the opcodes after down by one. Change-Id: Ia4ccda91f6f501c1ce7c5898d7d0e924604a459a	2024-03-20 16:41:31 -05:00
Matthew Poremba	457d97ea52	arch-vega: Implement V_XNOR_B32 Change-Id: Id23a8d984f227ca23a92adb6c7fde3b4627af054	2024-03-20 16:37:37 -05:00
Matthew Poremba	1b15b2cc4b	arch-vega: Support negative modifiers for packed F32 math MI200 adds support for four FP32 packed math instructions. These are VOP3P instructions which have a negative input modifier field. The description made it unclear if these were used for F32 packed math however the assembly of some Tensile kernels are using these modifiers therefore adding support for them. Tested with PyTorch nn.Dropout kernel which is using negative modifiers. Change-Id: I568a18c084f93dd2a88439d8f451cf28a51dfe79	2024-03-20 16:37:23 -05:00
Matthew Poremba	3f8d0e1ef8	arch-vega: Fix V_FMAC_F32 data type The datatype is U32 but should be F32. This is causing an implicit cast leading to incorrect results. This fixes nn.Dropout in PyTorch. Change-Id: I546aa917fde1fd6bc832d9d0fa9ffe66505e87dd	2024-03-20 16:37:23 -05:00
Matthew Poremba	2ca7f48828	arch-vega: Accumulation offset for existing MFMA insts This commit update the two exiting MFMA instructions to support the accumulation offset for A, B, and C/D matrix. Additionally uses array indexed C/D matrix registers to reduce duplicate code. Future MFMA instructions have up to 16 registers for C/D and this reduces the amount of code being written. Change-Id: Ibdc3b6255234a3bab99f115c79e8a0248c800400	2024-02-26 14:30:50 -06:00
Matthew Poremba	e0e65221b4	arch-vega: Use accum offset for v_accvgpr_read/write The accum offset is used as an index into the unified VGPR register file in MI200 and is not the same as a move if accum_offset in the dispatch packet is non-zero. Change these instructions to use the stored accum_offset value. Change-Id: Ib661804f8f5b8392e4c586082c423645f539e641	2024-02-26 12:57:09 -06:00
Vishnu Ramadas	8054459df6	arch-vega: Add support for S_ICACHE_INV instruction Previously, the S_ICACHE_INV instruction was unimplemented and simulation panicked if it was encountered. This commit adds support for executing the instruction by injecting a memory barrier in the scalar pipeline and invalidating the ICACHE (or SQC) Change-Id: I0fbd4e53f630a267971a23cea6f17d4fef403d15	2024-02-09 12:19:08 -06:00
Ivana Mitrovic	24e0d71034	arch-gcn3: Remove gcn3 (#781 ) Related to issue #703 , this PR removes GCN3 related files and updates source code, documentation, and tests to switch over to Vega is that was not done already. Highlights are: - Remove all src/arch/amdgpu/gcn3 files and update Kconfigs. - Remove references to GCN3 and replace with Vega where applicable. - Update the build targets in the gcn-gpu Docker. This will need to be rebuilt but not urgently. - Remove the GCN3 tag in testlib. Most tests seem to be using Vega already, so that commit is small.	2024-01-25 10:14:46 -08:00
Matthew Poremba	44c78d843c	arch-vega: Implement memory aperture operands (#803 ) Vega (gfx900) introduced new memory aperture registers to get the base address and limit for LDS and private (scratch) memory. These have not commonly been used by the compiler until ROCm 6. Now that the compiler is generating reads from these special registers, implement the support for them. Tested with LULESH which is using the SHARED_BASE register (LDS) with ROCm 6.0. This assembly seems to replace S_GETREG_B32 emitted by the ROCm 5 compiler. Change-Id: Id2bd26ce8ef687c84a647fa2ac2da54d657913e5	2024-01-24 11:19:43 -08:00
Matthew Poremba	dfafc5792a	arch-vega: Remove deleted instruction.cc from build (#801 ) Change-Id: I03073d35a0d36788dfe8309e6ed466d0a496e31e	2024-01-23 18:47:01 -08:00
Matthew Poremba	a5757e7e01	arch-vega: Rename mismatched source/header files The files registers.cc, isa.cc, and decoder.cc do not match the header name. This is a minor cleanup to make development more straightforward. Change-Id: Ibab18dfe315b0ce84359939b490f8227ea43cac0	2024-01-19 13:32:24 -06:00
Matthew Poremba	cd91c6321f	arch-vega: Reorganize instructions to multiple files The Vega instructions.cc file is 47k lines long which results in both large compilation times whenever it is modified and long style check times. This makes iterating over more complex instruction implementations very time consuming. This commit moves the instruction definitions to multiple files based on the instruction encoding (SOP2, VOP2, FLAT, DS, etc.). The resulting files are much smaller (max is 8k lines) and compilation and style check times are much more reasonable. Other than moving code around, there are no functional changes in this commit. Change-Id: Id4ac8e98ef11a58de5fd328f8a0cd7ce60a11819	2024-01-19 13:32:24 -06:00
Matthew Poremba	57fb083f43	arch-gcn3: Remove all GCN3 files Change-Id: Ib7d9e8676a31e51a330e68d81099580e2509a90a	2024-01-17 10:44:44 -06:00
Matthew Poremba	70376d43a3	arch-vega: Fix upsize cast error in newer compilers (#774 ) Newer compilers error on -Warray-length in the recent MI200 patches due to casting from a 32-bit data type to a 64-bit type. Change it to cast the 32-bit integer first then 64-bit integer latter to remove the warning. Rerun of validation tests on the three instructions passed. Change-Id: I0309e5f7b5b8cc8ce1651660ddddb120fa6e7666	2024-01-16 09:41:23 -08:00
Matthew Poremba	31e63b01ad	arch-vega: Add vop3p DOT instructions Implemented according to the ISA spec. Validated with silion. In particular the sign extend is important for the signed variants and the unsigned variants seem to overflow lanes (hence why there is no mask() in the unsigned varints. FP16 -> FP32 continues using ARM's fplib. Tested vs. an MI210. Clamp has not been verified. Change-Id: Ifc09aecbc1ef2c92a5524a43ca529983018a6d59	2024-01-03 15:41:06 -06:00
Matthew Poremba	420cda1bef	arch-vega: Implement FP32 packed math Starting with MI200, packed math can operate on double dword inputs. In this case, 64-bits of inputs (two VGPRs per lane) contain two FP32 values. Add instructions to perform add, multiply, and FMA on packed FP32 types. Change-Id: Ib838bff91a10e02e013cc7c33ec3d91ff08647b0	2024-01-03 15:41:06 -06:00
Matthew Poremba	7b0c47d52f	arch-vega: Implement all global atomics up to gfx90a This change adds all of the missing flat/global atomics up to including the new atomics in gfx90a (MI200). Adds all decodings and instruction implementations with the exception of __half2 which does not have a corresponding data type in gem5. This refactors the execute() and completeAcc() methods by creating helper functions similar to what initiateAcc() uses. This reduces redundant code for global atomic instruction implementations. Validated all except PK_ADD_F16, ADD_F32, and ADD_F64 which will be done shortly. Verified the source/dest register sizes in the header are correct and the template parameters for the new execute()/completeAcc() methods are correct. Change-Id: I4b3351229af401a1a4cbfb97166801aac67b74e4	2024-01-03 15:41:06 -06:00
Matthew Poremba	472c697d88	arch-vega: Implement v_mfma_i32_16x16x16i8 Tested using AMD labs notes examples located on github: https://github.com/amd/amd-lab-notes/blob/release/matrix-cores/ src/mfma_i32_16x16x16i8.cpp Change-Id: Ib0e50162288528012b6d3395e1f629ebf12e8e54	2024-01-03 15:41:06 -06:00
Matthew Poremba	7e1b27969f	arch-vega: Improve FLAT disassembly Use the opSelectorToRegSym which will print the full range of VGPRs (e.g., will now print v[2:3] instead of v2 when the source / dest is 64-bits). Fixes atomic disassembly prints. Now shows "glc" if GLC bit is enabled. Fixes some VGPR fields being printed as an SGPR in places where the 9-bit register index bit is implied (e.g., VDST). This makes it easier to use a GPUExec trace to match with LLVM disassembly when debugging. Change-Id: Ia163774850f0054243907aca8fc8d0361e37fdd5	2024-01-03 10:40:34 -06:00
Matthew Poremba	bc69ab0a1f	arch-vega: Add VOP3P encodings and packed 16b insts This adds the VOP3P and VOP3P_MAI encodings from the MI200 spec. These instructions are used for packed math and miSIMD instructions. The first 19 VOP3P opcodes are implemented and validated against hardware. This includes all instructions which operate on one dword containing two packed 16-bit values of fp16, int16_t, or uint16_t. Implement one MFMA instruction for now which was also validated against hardware.	2024-01-03 10:40:34 -06:00
Bobby R. Bruce	d11c40dcac	misc: Run `pre-commit run --all-files` This ensures `isort` is applied to all files in the repo. Change-Id: Ib7ced1c924ef1639542bf0d1a01c5737f6ba43e9	2023-11-29 22:06:41 -08:00
Bobby R. Bruce	d4b7c8a26d	Merge branch 'develop' into develop-kconfig	2023-11-27 09:39:08 -08:00
Matthew Poremba	cc9f81b08a	arch-vega,arch-gcn3: Bugfix V_PERM_B32 and V_OR3_B32 (#599 ) The V_PERM_B32 instruction is selecting the correct byte, but is shifting into place moving by bits instead of bytes. The V_OR3_B32 instruction is calling the wrong instruction implementation in the decoder. This patch fixes both issues plus a bonus fix for GCN3's V_PERM_B32. (GCN3 does not have V_OR3_B32). Change-Id: Ied66c43981bc4236f680db42a9868f760becc284	2023-11-26 23:22:01 -08:00
Gabe Black	db3a6e8e84	scons: Use Kconfig to configure gem5. These are not yet consumed by anything, but convert all the settings from SCons variables to Kconfig variables. If you have existing SConsopts files which need to be converted, you should take a look at KCONFIG.md to learn about how kconfig is used in gem5. You should decide if any variables need to be available to C++ or kconfig itself, and whether those are options which should be detected automatically, or should be up to the user. Options which should be measured automatically should still be in SConsopts files, while user facing options should be added to new or existing Kconfig files. Generally, make sure you're storing c++/kconfig visible options in env['CONF'][...]. Also remove references to sticky_vars since persistent options should now be handled with kconfig, and export_vars since everything in env['CONF'] is now exported automatically. Switch SCons/gem5 to use Kconfig for configuration, except EXTRAS which is still a sticky SCons variable. This is necessary because EXTRAS also controls what config options exist. If it came from Kconfig itself, then there would be a circular dependency. This dependency could theoretically be handled by reparsing the Kconfig when EXTRAS directories were added or removed, but that would be complicated, and isn't supported by kconfiglib. It wouldn't be worth the significant effort it would take to add it, just to use Kconfig more purely. Change-Id: I29ab1940b2d7b0e6635a490452d05befe5b4a2c9	2023-11-23 08:26:10 +08:00
Matt Sinclair	75ca2c4282	gpu-compute: Fix typo with GPUTLB print (#529 ) gpu-compute: Fix typo with GPUTLB print Print was not properly ending in a newline, which caused confusion when looking a trace with GPUTLB enabled. This fixes that.	2023-11-13 14:40:27 -08:00
Matthew Poremba	4d336c0636	arch-vega: Implement buffer_atomic_cmpswap (#439 ) This is a standard compare and swap but implemented on vector memory buffer instructions (i.e., it is the same as FLAT_ATOMIC_CMPSWAP with MUBUF's special address calculation). This was tested using a Tensile kernel, a backend for rocBLAS, which is used by PyTorch and Tensorflow. Prior to this patch both ML frameworks crashed. With this patch they both make forward progress. Change-Id: Ie76447a72d210f81624e01e1fa374e41c2c21e06	2023-10-12 07:33:40 -07:00
Matthew Poremba	4b7f25fcb6	arch-vega: Ignore s_setprio instruction instead of panic This instruction is used by ML frameworks to prioritize certain wavefronts. Since gem5 does not have any support for wavefront scheduling based on priority (besides wavefront age), we ignore this instruction and warn_once rather than calling panic. Since hardware can override this priority anyways, we can be sure that ignoring the value will not inhibit forward progress resulting in application hangs. Change-Id: Ic5eef14f9685dd2b316c5cf76078bb78d5bfe3cc	2023-10-11 15:55:16 -05:00
Matthew Poremba	4b85a1710e	arch-vega: Implement buffer_atomic_cmpswap This is a standard compare and swap but implemented on vector memory buffer instructions (i.e., it is the same as FLAT_ATOMIC_CMPSWAP with MUBUF's special address calculation). This was tested using a Tensile kernel, a backend for rocBLAS, which is used by PyTorch and Tensorflow. Prior to this patch both ML frameworks crashed. With this patch they both make forward progress. Change-Id: Ie76447a72d210f81624e01e1fa374e41c2c21e06	2023-10-11 15:42:50 -05:00
Matthew Poremba	9f4d334644	gpu-compute: Update tokens for flat global/scratch Memory instructions acquire coalescer tokens in the schedule stage. Currently this is only done for buffer and flat instructions, but not flat global or flat scratch. This change now acquires tokens for flat global and flat scratch instructions. This provides back-pressure to the CUs and helps to avoid deadlocks in Ruby. The change also handles returning tokens for buffer, flat global, and flat scratch instructions. This was previously only being done for normal flat instructions leading to deadlocks in some applications when the tokens were exhausted. To simplify the logic, added a needsToken() method to GPUDynInst which return if the instruction is buffer or any flat segment. The waitcnts were also incorrect for flat global and flat scratch. We should always decrement vmem and exp count for stores and only normal flat instructions should decrement lgkm. Currently vmem/exp are not decremented for flat global and flat scratch which can lead to deadlock. This change set fixes this by always decrementing vmem/exp and lgkm only for normal flat instructions. Change-Id: I673f4ac6121e4b5a5e8491bc9130c6d825d95fc5	2023-10-10 09:48:16 -05:00
Matthew Poremba	60f071d09a	gpu-compute,arch-vega: Implement flat scratch insts Flat scratch instructions (aka private) are the 3rd and final segment of flat instructions in gfx9 (Vega) and beyond. These are used for things like spills/fills and thread local storage. This commit enables two forms of flat scratch instructions: (1) flat_load/flat_store instructions where the memory address resolves to private memory and (2) the new scratch_load/scratch_store instructions in Vega. The first are similar to older generation ISAs where the aperture is unknown until address translation. The second are instructions guaranteed to go to private memory. Since these are very similar to flat global instructions there are minimal changes needed: - Ensure a flat instruction is either regular flat, global, XOR scratch - Rename the global op_encoding methods to GlobalScratch to indicate they are for both and are intentionally used. - Flat instructions in segment 1 output scratch_ in the disassembly - Flat instruction executed as private use similar mem helpers as global - Flat scratch cannot be an atomic This was tested using a modified version of the 'square' application: template <typename T> __global__ void scratch_square(T C_d, T A_d, size_t N) { size_t offset = (blockIdx.x * blockDim.x + threadIdx.x); size_t stride = blockDim.x * gridDim.x ; volatile int foo; // Volatile ensures scratch / unoptimized code for (size_t i=offset; i<N; i+=stride) { foo = A_d[i]; C_d[i] = foo * foo; } } Change-Id: Icc91a7f67836fa3e759fefe7c1c3f6851528ae7d	2023-08-26 13:40:12 -05:00
Matthew Poremba	90a518e885	gpu-compute,arch-vega: Fix ALU-only LDS counters There are a few LDS instructions that perform local ALU operations and writeback which are marked as loads. These are marked as loads because they fit in the pipeline logic better, according to a several year old comment. In the VEGA ISA these instructions (swizzle, permute, bpermute) are not decrementing the LDS load counter. As a result, the counter will gradually increase over time. Since wavefront slots are persistent, this can cause applications with a few thousand kernels to eventually hang thinking there are not enough resources. This changeset fixes this by decrementing the LDS load counter for these instructions. This fix was already integrated in the GCN3 ISA in the exact same way. This changeset moves it near a similar comment about scheduling register file writes. Change-Id: Ife5237a2cae7213948c32ef266f4f8f22917351c	2023-08-23 19:30:24 -05:00
Matthew Poremba	3589a4c11f	arch-vega: Implement translate further Starting with ROCm 5.4+, MI100 and MI200 make use of the translate further bit in the page table. This bit enables mixing 4kiB and 2MiB pages and is functionally equivalent to mixing page sizes using the PDE.P bit for which gem5 currently has support. With PDE.P bit set, we stop walking and the page size is equal to the level in the page table we stopped at. For example, stopping at level 2 would be a 1GiB page, stopping at level 3 would be a 2MiB page. This assumes most pages are 4kiB. When the F bit is used, it is assumed most pages are 2MiB and we will stop walking at the 3rd level of the page table unless the F bit is set. When the F bit is set, the 2nd level PDE contains a block fragment size representing the page size of the next PDE in the form of 2^(12+size). If the next page has the F bit set we continue walking to the 4th level. The block fragment size is hardcoded to 9 in the driver therefore we assert that the block fragment size must be 0 or 9. This enables MI200 with ROCm 5.4+ in gem5. This functionality was determine by examining the driver source code in Linux and there is no public documentation about this feature or why the change is made in or around ROCm 5.4. Change-Id: I603c0208cd9e821f7ad6eeb1d94ae15eaa146fb9	2023-07-30 13:17:05 -05:00
Matthew Poremba	618b2a60de	arch-vega, dev-amdgpu: Fix for memory leaks (#129 ) When using the new operator, delete should be called on any allocated memory after it's use is complete. Change-Id: Id5fcfb264b6ddc252c0a9dcafc2d3b020f7b5019	2023-07-30 10:48:17 -07:00
Ranganath (Bujji) Selagamsetty	ede4d89a83	arch-vega, dev-amdgpu: Fix for memory leaks When using the new operator, delete should be called on any allocated memory after it's use is complete. Change-Id: Id5fcfb264b6ddc252c0a9dcafc2d3b020f7b5019	2023-07-28 19:14:46 -05:00
Matthew Poremba	c722b0c73d	arch-vega: Fix vop2Helper scalar support A previous change added a vop2Helper to remove 100s of lines of common code from VOP2 instructions related to processing SDWA and DPP support. That change inadvertently changed the type of operand source 0 from const to non-const. The vector container operator[] does not allow reading a scalar value such as a constant, a dword literal, etc. The error shows up in the form of: assert(!scalar) in operand.hh. Since the SDWA and DPP cases need to modify the source vector and non-SDWA/DPP cases might require const, we make a non-const copy of the const source 0 vector and place it in a tempoary non-const vector. This non-const vector is passed to the lambda function implementation of the instruction. This prevents needing a const and non-const version of the lambda and avoids needing to propagate the template parameters through the various SDWA/DPP helper methods which seems like it will not work anyways as they need to modify the vector. As a result of this, as more VOP2 instructions are implemented using this helper,they will need to specify the const and non-const template parameters of the vector container needed for the instruction. Change-Id: Ia0b3c550d7de32b830040007a110f4821e3385aa	2023-07-28 13:47:55 -05:00
Bobby R. Bruce	753933d471	gpu-compute, tests: Fix GPU_X86 compilation, add compiler tests (#64 ) * gpu-compute: Remove use of 'std::random_shuffle' This was deprecated in C++14 and removed in C++17. This has been replaced with std::random. This has been implemented to ensure reproducible results despite (pseudo)random behavior. Change-Id: Idd52bc997547c7f8c1be88f6130adff8a37b4116 * dev-amdgpu: Add missing 'overrides' This causes warnings/errors in some compilers. Change-Id: I36a3548943c030d2578c2f581c8985c12eaeb0ae * dev: Fix Linux specific includes to be portable This allows for compilation in non-linux systems (e.g., Mac OS). Change-Id: Ib6c9406baf42db8caaad335ebc670c1905584ea2 * tests: Add 'VEGA_X86' build target to compiler-tests.sh Change-Id: Icbf1d60a096b1791a4718a7edf17466f854b6ae5 * tests: Add 'GCN3_X86' build target to compiler-tests.sh Change-Id: Ie7c9c20bb090f8688e48c8619667312196a7c123	2023-07-11 14:35:03 -07:00
Matthew Poremba	841e6fe978	arch-vega: Add Vega D16 decodings and fix V_SWAP_B32 Vega adds multiple new D16 instructions which load a byte or short into the lower or upper 16 bits of a register for packed math. The decoder table has subDecode tables for FLAT instructions which represents 32 opcodes in each subDecode table. The subDecode table for opcodes 32-63 is missing so it is added here. The opcode for V_SWAP_B32 is also off by one- In the ISA manual this instruction is opcode 81, the instruction before is 79, and there is no opcode 80, so the decoder entry is swapped with the invalid decoding below it. Change-Id: I278fea574ea684ccc6302d5b4d0f5dd8813a88ad Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71899 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-06-29 19:56:56 +00:00
Matthew Poremba	db903f4fd4	arch-vega: Helper methods for SDWA/DPP for VOP2 Many of the outstanding issues with the GPU model are related to instructions not having SDWA/DPP implementations and executing by ignoring the special registers leading to incorrect executiong. Adding SDWA/DPP is current very cumbersome as there is a lot of boilerplate code. This changeset adds helper methods for VOP2 with one instruction changed as an example. This review is intended to get feedback before applying this change to all VOP2 instructions that support SDWA/DPP. Change-Id: I1edbc3f3bb166d34f151545aa9f47a94150e1406 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70738 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-06-15 23:02:39 +00:00
Matthew Poremba	ae7476bcdc	arch-gcn3,arch-vega: Fix ds_read2st64_b32 This instruction has two issues. The first is that it should write two consecutive registers, starting with vdst because it is writing two dwords. The second is that the data assignment to the lanes from the dynamic instruction should cast to a U32 type otherwise the array index goes out of bounds and returns the wrong data. The first issue was fixed in GCN3 a few years ago in this review: https://gem5-review.googlesource.com/c/public/gem5/+/32236. This changeset makes the same change for Vega and applies the U32 cast in both ISAs. Tested with rocPRIM unit test. The test was failing before this changeset and now passes. Change-Id: Ifb110fc9a36ad198da7eaf86b1e3e37eccd3bb10 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70577 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-05-13 20:09:37 +00:00
Matthew Poremba	cd76f92c94	arch-vega: Add decodings for new MI100 VOP2 insts VOP2 with opcodes 55-61 were added in MI100 and are not in Vega10. This changeset adds the decodings for these instructions. The changeset does not implement the instructions, however the fatal message is much more helpful for debugging compared so a generic decode_invalid handler. Change-Id: Ibde0880c35ff915bf8e50772df9ce263e55ca893 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70042 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-04-28 00:48:35 +00:00
Matthew Poremba	f028bd55e0	arch-vega: Update API for some flat atomics Some recently submitted atomic instructions were using two older APIs. Update these to use the newer APIs to support all apertures and avoid compilation issue. Change-Id: Ibd6bc00177d33236946f54ef8e5c7544af322852 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67977 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-02-16 03:28:25 +00:00

1 2 3 4

165 Commits