derek/gem5

Author	SHA1	Message	Date
Matthew Poremba	7d46c50663	arch-vega: Swizzle multi-dword scratch requests (#1445) Scratch memory requests that are larger than one dword are using a different memory layout than global instructions. Rather than being placed contiguously, each dword is interleaved 64 lanes * 4 bytes away as described in Section 9.1.5.2. "Swizzled Buffer Addressing" in the MI300 specification. This was verified by comparing MI300 output (which uses scratch_ instructions) with MI200 (which uses buffer instructions). MI300 FashionMNIST bs=1 now matches CPU reference. This requires several changes to the instruction implementations: - For stores, data in the GPUDynInst can be swizzled before the data is written to memory. This is easy to do using a helper method. This is done in the template<int N> variant of initMemWrite. To use this, x2 stores are changed to use template<int N> rather than loading a U64. The swizzle function is renamed to swizzleAddr to avoid confusion with swizzleData. - For loads, data is unswizzled in completeAcc when writing register values. This is not as easy to implement as a helper and is thus implemented for the three load instructions that load more than one dword. - Accessing swizzled data requires at least one packet per dword. A new GPU memory helper is added to create these packets for scratch requests specifically. This is called in the template<int N> variant of initMemRead / initMemWrite. Loads and stores of x2 are changed to use this variant instead of accessing a U64. The GPUDynInst status vector restrictions are increased to allow for swizzled x4 accesses. For simplicity this does not currently support misaligned swizzled accesses and will panic upon seeing such a case. Change-Id: Ic686c51e28e0af029a043d5a5b3d4069f2cb94f9	2024-08-12 06:58:48 -07:00
Matthew Poremba	f91d14fe46	gpu-compute: Add MFMA stats (#1248) Add dynamic instruction counts for MFMAs. Change-Id: I976b01344577cf011aeb3dd648a8c0017281c4e3	2024-06-15 13:04:00 -07:00
Matthew Poremba	9f4d334644	gpu-compute: Update tokens for flat global/scratch Memory instructions acquire coalescer tokens in the schedule stage. Currently this is only done for buffer and flat instructions, but not flat global or flat scratch. This change now acquires tokens for flat global and flat scratch instructions. This provides back-pressure to the CUs and helps to avoid deadlocks in Ruby. The change also handles returning tokens for buffer, flat global, and flat scratch instructions. This was previously only being done for normal flat instructions, leading to deadlocks in some applications when the tokens were exhausted. To simplify the logic, added a needsToken() method to GPUDynInst which returns whether the instruction is a buffer instruction or any flat segment instruction. The waitcnts were also incorrect for flat global and flat scratch. We should always decrement vmem and exp count for stores and only normal flat instructions should decrement lgkm. Currently vmem/exp are not decremented for flat global and flat scratch, which can lead to deadlock. This change set fixes this by always decrementing vmem/exp and lgkm only for normal flat instructions. Change-Id: I673f4ac6121e4b5a5e8491bc9130c6d825d95fc5	2023-10-10 09:48:16 -05:00
Matthew Poremba	60f071d09a	gpu-compute,arch-vega: Implement flat scratch insts Flat scratch instructions (aka private) are the 3rd and final segment of flat instructions in gfx9 (Vega) and beyond. These are used for things like spills/fills and thread local storage. This commit enables two forms of flat scratch instructions: (1) flat_load/flat_store instructions where the memory address resolves to private memory and (2) the new scratch_load/scratch_store instructions in Vega. The first are similar to older generation ISAs where the aperture is unknown until address translation. The second are instructions guaranteed to go to private memory. Since these are very similar to flat global instructions there are minimal changes needed: - Ensure a flat instruction is either regular flat, global, XOR scratch - Rename the global op_encoding methods to GlobalScratch to indicate they are for both and are intentionally used. - Flat instructions in segment 1 output scratch_ in the disassembly - Flat instruction executed as private use similar mem helpers as global - Flat scratch cannot be an atomic This was tested using a modified version of the 'square' application: template <typename T> __global__ void scratch_square(T C_d, T A_d, size_t N) { size_t offset = (blockIdx.x * blockDim.x + threadIdx.x); size_t stride = blockDim.x * gridDim.x ; volatile int foo; // Volatile ensures scratch / unoptimized code for (size_t i=offset; i<N; i+=stride) { foo = A_d[i]; C_d[i] = foo * foo; } } Change-Id: Icc91a7f67836fa3e759fefe7c1c3f6851528ae7d	2023-08-26 13:40:12 -05:00
Matthew Poremba	f375e79bcf	gpu-compute: Support Scalar and Vector access to system pages The amdgpu driver supports reading and writing scalar and vector memory addresses that reside in system memory. This is commonly used for things like blit kernels that perform host-to-device or device-to-host copies using GPU load/store instructions. This is done by utilizing the system hub device added in a prior changeset. Memory packets translated by the Scalar or VMEM TLBs will have the corresponding system request field set from the PTE in the TLB, which can be used in the compute unit to determine if a request is for system memory or not. Another important change is to return global memory tokens for system requests. Since these do not flow through the GPU coalescer where the token is returned, the token can be returned once the request is known to be a system request. Change-Id: I35030e0b3698f10c63a397f96b81267271e3130e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57711 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-07 20:11:01 +00:00
Matthew Poremba	9313294efe	misc: Remove AMD license addition Remove the line "For use for simulation and test purposes only" in files where AMD is the only copyright holder listed in the header. This happens to be the case for all files where this line exists, removing it completely from gem5. Change-Id: I623f266b002f564301b28774f49081099cfc60fd Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53943 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-12-11 04:00:56 +00:00
Matthew Poremba	c15e472199	arch-vega: Rework flat instructions to support global Global instructions are new in Vega and are essentially FLAT instructions from GCN3 but guaranteed to go to global memory where as flat can go to global or local memory. This reworks the flat instruction classes so that the initiateAcc / execute / completeAcc logic can be reused for flat, global, and later scratch subtypes of flat instructions. The decoder creates a flat instruction class which sets instruction flags based on the flat instruction's SEG field. There are new initOperandInfo and generateDissasmbly methods for flat and global. The number of operands and operand index getters are modified to check the flags and return the correct value for the subtype. Change-Id: I1db4a3742aeec62424189e54c38c59d6b1a8d3c1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47106 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Kyle Roarty <kyleroarty1716@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-10-04 22:51:37 +00:00
Matthew Poremba	16de253c15	arch-vega: Add missing functions referenced by insts Some instructions were referencing pc() and isExecMaskRegister() which were not defined. Change-Id: Ic5b3fa9057950ff85603fcb87447a81b6c7f274b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47103 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2021-09-27 22:30:30 +00:00
Daniel R. Carvalho	974a47dfb9	misc: Adopt the gem5 namespace Apply the gem5 namespace to the codebase. Some anonymous namespaces could theoretically be removed, but since this change's main goal was to keep conflicts at a minimum, it was decided not to modify the general shape of the files much. A few missing comments of the form "// namespace X" that occurred before the newly added "} // namespace gem5" have been added for consistency. std out should not be included in the gem5 namespace, so they weren't. ProtoMessage has not been included in the gem5 namespace, since I'm not familiar with how proto works. Regarding the SystemC files, although they belong to gem5, they actually perform integration between gem5 and SystemC; therefore, it deserved its own separate namespace. Files that are automatically generated have been included in the gem5 namespace. The .isa files currently are limited to a single namespace. This limitation should be later removed to make it easier to accommodate a better API. Regarding the files in util, gem5:: was prepended where suitable. Notice that this patch was tested as much as possible given that most of these were already not previously compiling. Change-Id: Ia53d404ec79c46edaa98f654e23bc3b0e179fe2d Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/46323 Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu> Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-01 19:08:24 +00:00
Daniel R. Carvalho	4dd099ba3d	misc: Rename Enums namespace as enums As part of recent decisions regarding namespace naming conventions, all namespaces will be changed to snake case. ::Enums became ::enums. Change-Id: I39b5fb48817ad16abbac92f6254284b37fc90c40 Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45420 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-05-29 11:13:49 +00:00
Kyle Roarty	2bb8d6bc0c	gpu-compute: remove index-based operand access This commit removes functions that indexed into the vectors that held the operands. Instead, for-each loops are used, iterating through one of 6 vectors (src, dst, srcScalar, srcVec, dstScalar, dstVec) that all hold various (potentially overlapping) combinations of the operands. Change-Id: Ia3a857c8f6675be86c51ba2f77e3d85bfea9ffdb Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42212 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2021-04-01 02:58:31 +00:00
Tony Gutierrez	0e2564a629	arch-gcn3, gpu-compute: Update getRegisterIndex() API This change removes the GPUDynInstPtr argument from getRegisterIndex(). The dynamic inst was only needed to get access to its parent WF's state so it could determine the number of scalar registers the wave was allocated. However, we can simply pass the number of scalar registers directly. This cuts down on shared pointer usage. Change-Id: I29ab8d9a3de1f8b82b820ef421fc653284567c65 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42210 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2021-04-01 02:58:31 +00:00
Tony Gutierrez	236b4a502f	gpu-compute: Add operand info class to GPUDynInst This change adds a class that stores operand register info for the GPUDynInst. The operand info is calculated when the instruction object is created and stored for easy access by the RF, etc. Change-Id: I3cf267942e54fe60fcb4224d3b88da08a1a0226e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42209 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2021-04-01 02:58:31 +00:00
Kyle Roarty	c9415dc389	gpu-compute: Remove unused functions These functions were probably used for some stat collection, but they're no longer used, so they're being removed Change-Id: Ic99f22391c0d5ffb0e9963670efb35e503f9957d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42202 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2021-03-25 17:21:16 +00:00
Alexandru Dutu	14d6e8fac4	arch-gcn3: Implementation of s_sleep This changeset implements the s_sleep instruction in a similar way to s_waitcnt. Change-Id: I4811c318ac2c76c485e2bfd9d93baa1205ecf183 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39115 Maintainer: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-02-04 00:07:10 +00:00
Matthew Poremba	5323cccfdd	arch-gcn3,gpu-compute: Update stats style for GPU Convert all gpu-compute stats to Stats::Group style. Change-Id: I29116f1de53ae379210c6cfb5bed3fc74f50cca5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39135 Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-01-18 17:58:05 +00:00
Tuan Ta	173c1c6eb0	gpu-compute,mem-ruby: Replace ACQUIRE and RELEASE request flags This patch replaces ACQUIRE and RELEASE flags which are HSA-specific. ACQUIRE flag becomes INV_L1 in VIPER protocol. RELEASE flag is removed. Future protocols may support extra cache coherence flags like INV_L2 and WB_L2. Change-Id: I3d60c9d3625c898f4110a12d81742b6822728533 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32859 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-11-04 21:09:26 +00:00
Gabe Black	50a0b85367	arm,base,gpu: Use std::make_unique instead of m5::make_unique. Now that we're using c++14, we can just assume that std::make_unique exists. We no longer have to conditionally inject our own version. Change-Id: I5d851afb02dd05c7af93864ffec3b3184f3d4ec8 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35215 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-09-28 05:41:08 +00:00
Matt Sinclair	8177fc4392	arch-gcn3: add support for unaligned accesses Previously, with HSAIL, we were guaranteed by the HSA specification that the GPU will never issue unaligned accesses. However, now that we are directly running GCN this is no longer true. Accordingly, this commit adds support for unaligned accesses. Moreover, to reduce the replication of nearly identical code for the different request types, I also added new helper functions that are called by all the different memory request producing instruction types in op_encodings.hh. Adding support for unaligned instructions requires changing the statusBitVector used to track the status of the memory requests for each lane from a bit per lane to an int per lane. This is necessary because an unaligned access may span multiple cache lines. In the worst case, each lane may span multiple cache lines. There are corresponding changes in the files that use the statusBitVector. Change-Id: I319bf2f0f644083e98ca546d2bfe68cf87a5f967 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29920 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-19 20:41:18 +00:00
Tony Gutierrez	b8da9abba7	gpu-compute, mem-ruby, configs: Add GCN3 ISA support to GPU model Change-Id: Ibe46970f3ba25d62ca2ade5cbc2054ad746b2254 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29912 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-15 22:45:17 +00:00
Matthew Poremba	64134b6e66	base,arch-hsail: Fix GPU build The GPU build is currently broken due to recent changes. This fixes the build after changes to local access, removal of getSyscallArg, and creating of AMO header in base. Change-Id: I43506f6fb0a92a61a50ecb9efa7ee279ecb21d98 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27136 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: Gem5 Cloud Project GCB service account <345032938727@cloudbuild.gserviceaccount.com>	2020-04-03 21:51:57 +00:00
Gabe Black	71a868224c	gpu-compute: Delete authors lists from gpu-compute files. Change-Id: I72318eb885f9517de325ea9a9af263f36613bf6e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/25414 Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>	2020-02-17 10:05:52 +00:00
Giacomo Travaglini	2d2d579c4a	base, gpu-compute: Move gpu AMOs into the generic header Change-Id: I10d8aeaae83c232141ddd2fd21ee43bed8712539 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/23565 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Jason Lowe-Power <jason@lowepower.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-01-09 09:50:02 +00:00
Gabe Black	12311c5540	arch, base, cpu, gpu, mem: Replace assert(0 or false with panic. Neither assert(0) nor assert(false) give any hint as to why control getting to them is bad, and their more descriptive versions, assert(0 && "description") and assert(false && "description"), jury rig assert to add an error message when the utility function panic() already does that directly with better formatting options. This change replaces that flavor of call to assert with panic, except in the actual code which processes the formatting that panic uses (to avoid infinitely recurring error handling), and in some *.sm files since I don't know what rules those have to follow and don't want to accidentally break them. Change-Id: I8addfbfaf77eaed94ec8191f2ae4efb477cefdd0 Reviewed-on: https://gem5-review.googlesource.com/c/14636 Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>	2018-11-27 21:58:24 +00:00
Brandon Potter	28d65f8075	hsail-x86: fix gpu dynamic instruction error The gpu_dyn_inst.hh file was missing a clone method from inherited classes. (The clone method is the way to implement the prototype design pattern.) Because the inherited clone method was declared as pure virtual, the method needed to be implemented. Otherwise, the compiler complains that the class is abstract. Change-Id: I38782d5f7379f32be886401f7c127fe60d2f8811 Reviewed-on: https://gem5-review.googlesource.com/12108 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>	2018-08-17 16:58:05 +00:00
Giacomo Travaglini	2113b21996	misc: Substitute pointer to Request with aliased RequestPtr Every usage of Request* in the code has been replaced with the RequestPtr alias. This is a preparatory patch for when RequestPtr will be typedef'd to a smart pointer to Request rather than a raw pointer to Request. Change-Id: I73cbaf2d96ea9313a590cdc731a25662950cd51a Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10995 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>	2018-06-11 16:55:30 +00:00
Tony Gutierrez	abb21ba99f	style: fix amd license and style issues Change-Id: I26136fb49f743c4a597f8021cfd27f78897267b5 Reviewed-on: https://gem5-review.googlesource.com/10463 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>	2018-05-16 15:32:01 +00:00
Tony Gutierrez	b63eb1302b	gpu-compute, hsail: pass GPUDynInstPtr to getRegisterIndex() for HSAIL an operand's indices into the register files may be calculated trivially, because the operands are always read from a register file, or are an immediate. for machine ISA, however, an op selector may specify special registers, or may specify special SGPRs with an alias op selector value. the location of some of the special registers values are dependent on the size of the RF in some cases. here we add a way for the underlying getRegisterIndex() method to know about the size of the RFs, so that it may find the relative positions of the special register values.	2016-10-26 22:47:49 -04:00
Tony Gutierrez	00a6346c91	hsail, gpu-compute: remove doGm/SmReturn add completeAcc we are removing doGmReturn from the GM pipe, and adding completeAcc() implementations for the HSAIL mem ops. the behavior in doGmReturn is dependent on HSAIL and HSAIL mem ops, however the completion phase of memory ops in machine ISA can be very different, even amongst individual machine ISA mem ops. so we remove this functionality from the pipeline and allow it to be implemented by the individual instructions.	2016-10-26 22:47:19 -04:00
Tony Gutierrez	7ac38849ab	gpu-compute: remove inst enums and use bit flag for attributes this patch removes the GPUStaticInst enums that were defined in GPU.py. instead, a simple set of attribute flags that can be set in the base instruction class are used. this will help unify the attributes of HSAIL and machine ISA instructions within the model itself. because the static instruction now carries the attributes, a GPUDynInst must carry a pointer to a valid GPUStaticInst so a new static kernel launch instruction is added, which carries the attributes needed to perform the kernel launch.	2016-10-26 22:47:11 -04:00
jkalamat	3724fb15fa	gpu-compute: parametrize Wavefront size Eliminate the VSZ constant that defined the Wavefront size (in numbers of work items); replaced it with a parameter in the GPU.py configuration script. Changed all data structures dependent on the Wavefront size to be dynamically sized. Legal values of Wavefront size are 16, 32, 64 for now and checked at initialization time.	2016-06-09 11:24:55 -04:00
Tony Gutierrez	1a7d3f9fcb	gpu-compute: AMD's baseline GPU model	2016-01-19 14:28:22 -05:00

32 Commits
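
The swizzled scratch layout described in commit 7d46c50663 can be illustrated with a minimal sketch. This is not the gem5 implementation: the function and constant names below are hypothetical, and the only facts taken from the commit message are that consecutive dwords of a multi-dword scratch request sit 64 lanes * 4 bytes apart instead of contiguously, and that accesses of up to four dwords (x4) are supported.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants for illustration; these are not gem5 identifiers.
constexpr std::uint64_t kLanes = 64;      // lanes per wavefront, per the commit message
constexpr std::uint64_t kDwordBytes = 4;  // bytes per dword
constexpr std::uint64_t kMaxDwords = 4;   // assumption: x4 is the largest access

// Contiguous layout (as used by global instructions): dword d of a lane's
// request immediately follows dword d-1.
inline std::uint64_t
contiguousDwordAddr(std::uint64_t laneBase, std::uint64_t dword)
{
    assert(dword < kMaxDwords);
    return laneBase + dword * kDwordBytes;
}

// Swizzled layout (multi-dword scratch requests): each successive dword of a
// lane's request is placed 64 lanes * 4 bytes after the previous one.
inline std::uint64_t
swizzledDwordAddr(std::uint64_t laneBase, std::uint64_t dword)
{
    assert(dword < kMaxDwords);
    return laneBase + dword * kLanes * kDwordBytes;
}
```

Here `laneBase` stands for the address of a lane's first dword. Because the dwords of one lane are no longer adjacent in memory, each dword needs its own access, which is consistent with the commit's note that swizzled data requires at least one packet per dword.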