derek/gem5 - gem5 - Gitea: Git with a cup of tea

derek/gem5

Author	SHA1	Message	Date
Matthew Poremba	8fe975e57e	gpu-compute: Fatal on dynamic scratch allocation in GPUFS This is known not working in GPUFS. As a result, the simulation will never end. Rather than simulate forever, add a fatal for now to exit simulation until support for this functionality is added. Change-Id: I8e45996a7eb781575e8643baea05daf87bc5f1c3 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58472 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-08 17:12:32 +00:00
Matthew Poremba	e36a8dbd8a	gpu-compute: Handle GPUFS system store responses Requests in GPUFS which go to system memory will not generate the WriteCompleteResp packets that the VIPER protocol would normally created for device requests which go through the caches. Therefore, we need to callback the GM pipe handleResponse to complete the access and make forward progress. Change-Id: Ic00c430ce420a591fe5743f758b780d93afd2a38 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57989 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-07 20:11:01 +00:00
Matthew Poremba	6feaa88e27	gpu-compute: Command processor read path from device In full system mode, the AMDKernelCode object can reside in either the system memory or in the dGPU device memory. Currently only reading from the host/system memory is supported. This adds the necessary code to read from the dGPU device memory. Change-Id: I887fc706b3f9834db14e40f36fd29dd3d4602925 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57710 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-07 20:11:01 +00:00
Matthew Poremba	fcbc9afcd6	gpu-compute: Don't use emulated driver in full system The emulated driver is currently called in a few locations unconditionally. This changeset adds checks that we are not in full system before calling any emulated driver function. In full system the amdgpu driver running on the disk image handles these functions. Change-Id: Iea3546b574e29c649351c0fce9154530be89e9b1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57712 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-07 20:11:01 +00:00
Matthew Poremba	f375e79bcf	gpu-compute: Support Scalar and Vector access to system pages The amdgpu driver supports reading and writing scalar and vector memory addresses that reside in system memory. This is commonly used for things like blit kernels that perform host-to-device or device-to-host copies using GPU load/store instructions. This is done by utilizing the system hub device added in a prior changeset. Memory packets translated by the Scalar or VMEM TLBs will have the correspoding system request field set from the PTE in the TLB which can be used in the compute unit to determine if a request is for system memory or not. Another important change is to return global memory tokens for system requests. Since these do not flow through the GPU coalescer where the token is returned, the token can be returned once the request is known to be a system request. Change-Id: I35030e0b3698f10c63a397f96b81267271e3130e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57711 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-07 20:11:01 +00:00
Matthew Poremba	347364ab0f	gpu-compute: Handle mailbox/wakeup signals for GPUFS The current mailbox/wakeup signal uses the SE mode proxy port to write the event value. This is not available in full system mode so instead we need to issue a DMA write to the address. The value of event_val clears the event. Change-Id: I424469076e87e690ab0bb722bac4c3e7414fb150 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57709 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-07 20:11:01 +00:00
Matthew Poremba	91e8bbe299	configs,gpu-compute: Support fetch from system pages The amdgpu driver supports fetching instructions from pages which reside in system memory rather than device memory. This changeset adds support to do this by adding the system hub object added in a prior changeset to the fetch unit and issues requests to the system hub if the system bit in the memory page's PTE is set. Otherwise, the requestor ID is set to be device memory and the request is routed through the Ruby network / GPU caches to fetch the instructions. Change-Id: Ib2fb47c589fdd5e544ab6493d7dbd8f2d9d7b0e8 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57652 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-28 23:24:53 +00:00
Gabe Black	e6c0ba97db	scons: Put all config variables in an env['CONF'] sub-dict. This makes what are configuration and what are internal SCons variables explicit and separate, and makes it unnecessary to call out what variables to export to C++. These variables will also be plumbed into and out of kconfiglib in later changes. Change-Id: Iaf5e098d7404af06285c421dbdf8ef4171b3f001 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56892 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-28 20:31:21 +00:00
Matthew Poremba	51648570ea	gpu-compute: Add methods to read GPU memory requestor ID These methods are called from various places to override the requestor ID of a request in order to determine which Ruby network a request should be routed on. Change-Id: Ic0270ddd7123f0457a13144e69ef9132204d4334 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57651 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-25 19:51:29 +00:00
Matthew Poremba	581e451723	gpu-compute,dev-hsa: Update CP and HSAPP for full-system Make the necessary changes to connect Vega pagetable walkers for full-system mode. Previously the CP and HSA packet processor could only read AQL packets from system/host memory using proxy port. This allows for AQL to be read from device memory which is used for non-blit kernels. Change-Id: If28eb8be68173da03e15084765e77e92eda178e9 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53077 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-25 19:51:29 +00:00
Matthew Poremba	539a2e2bcd	arch-vega: Add VEGA page tables and TLB Add the page table walker, page table format, TLB, TLB coalescer, and associated support in the AMDGPUDevice. This page table format used the hardware format for dGPU and is very different from APU/GCN3 which use the X86 page table format. In order to support either format for the GPU model, a common TranslationState called GpuTranslation state is created which holds the combined fields of both the APU and Vega translation state. Similarly the TlbEntry is cast at runtime by the corresponding arch files as they are the only files which touch the internals of the TlbEntry. The GPU model only checks if a TlbEntry is non-null and thus does not need to cast to peek inside the data structure. Change-Id: I4484c66239b48df5224d61caa6e968e56eea38a5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51848 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-17 00:11:14 +00:00
Gabe Black	06117275fa	scons: Make all sticky variables automatically exported. All sticky vars are exported, but not all exported vars are sticky. The vars which are exported but not sticky are (at least in general) found with Configure() style measurement. Change-Id: Idebf17e44c2eeca745cdfdd9f42eddcfdb0cf9ed Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56891 Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>	2022-03-15 00:45:30 +00:00
Matthew Poremba	45ad755511	gpu-compute: Fix default MTYPE initialization The default MTYPE initialization in the emulated GPU driver is currently doing a bitwise AND on an input integer param with other integers instead of using a bitmask. Change this to use bitset and test the bit positions corresponding to the values in the MTYPE enum that were previously being used as an operand for bitwise AND. This was causing invalid slicc transitions in some benchmarks for combinations of request type and mtype that are undefined. Change-Id: I93fee0eae1fff7141cd14c239c16d1d69925d08d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56367 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-13 21:50:21 +00:00
Kyle Roarty	78451f6685	gpu-compute: Fix register checking and allocation in dyn manager This patch updates the canAllocate function to account both for the number of regions of registers that need to be allocated, and for the fact that the registers aren't one continuous chunk. The patch also consolidates the registers as much as possible when a register chunk is freed. This prevents fragmentation from making it impossible to allocate enough registers Change-Id: Ic95cfe614d247add475f7139d3703991042f8149 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56909 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>	2022-02-18 18:46:33 +00:00
Kyle Roarty	4d2cbefd1e	gpu-compute: Set scratch_base, lds_base for gfx902 When updating how scratch_base and lds_base were set, gfx902 was left out. This adds in gfx902 to the case statement, allowing the apertures to be set and for simulations using gfx902 to not error out Change-Id: I0e1adbdf63f7c129186fb835e30adac9cd4b72d0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/54663 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-02-17 21:20:16 +00:00
Matthew Poremba	9313294efe	misc: Remove AMD license addition Remove the line "For use for simulation and test purposes only" in files were AMD is the only copyright holder listed in the header. This happens to be the case for all files where this line exists, removing it completely from gem5. Change-Id: I623f266b002f564301b28774f49081099cfc60fd Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53943 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-12-11 04:00:56 +00:00
Matthew Poremba	c028af111a	arch-gcn3,gpu-compute: Move TLB to common folder in amdgpu This TLB is more of an "APU" TLB than anything GCN3 specific. It can be used with either GCN3 or Vega. With this change, VEGA_X86 builds and one can run binaries with Vega ISA code using the same steps as GCN3 but building the Vega ISA instead. Change-Id: I0c92bcd0379a18628dc05cb5af070bdc7e692c7c Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53803 Maintainer: Bobby Bruce <bbruce@ucdavis.edu> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-12-09 17:26:15 +00:00
Gabe Black	1c233ee9d2	scons: Add sim_object and enums arguments to SimObject(). This will explicitly declare what SimObject and Enum types need to be set up in C++, which will make importing all the SimObject modules during the setup phase of SCons uneccessary. Change-Id: Id2d7603daf33b236ceaa0789e2f089f589d34e62 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49406 Reviewed-by: Gabe Black <gabe.black@gmail.com> Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-12-08 08:01:23 +00:00
Matthew Poremba	9cabfd7a9b	dev-hsa,gpu-compute: Properly assign DmaVirtDevices in py These SimObjects are DmaVirtDevices in C++ but DmaDevices in the sim object's python file. Make the sim object python files consistent. Change-Id: I728ae737c5901e448628fc5ac877f261ca4c4393 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53704 Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu> Maintainer: Bobby Bruce <bbruce@ucdavis.edu> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-12-07 20:26:17 +00:00
Gabe Black	6107dd11c6	misc: Remove include of arch/page_size.hh, and fix up includes. Remove the only remaining use of arch/page_size.hh, and fix up a couple files which were using one of the constants defined in a specific arch version of it without including the file they needed directly. Change-Id: I6da5638ca10c788bd42197f4f5180e6b66f7b87f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50765 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>	2021-10-22 21:43:02 +00:00
Gabe Black	07c613ff5e	dev,gpu-compute: Use a TranslationGen in DmaVirtDevice. Use a TranslationGen to iterate over the translations for a region, rather than using a ChunkGenerator with a fixed page size the device needs to know. Change-Id: I5da565232bd5282074ef279ca74e556daeffef70 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50763 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Matthew Poremba <matthew.poremba@amd.com>	2021-10-22 21:43:02 +00:00
Matt Sinclair	96a86780ee	dev-hsa,gpu-compute: fix bug with gfx8 VAs for HSA Queues GFX7 (not supported in gem5) and GFX8 have a bug with how virtual addresses are calculated for their HSA queues. The ROCr component of ROCm solves this problem by doubling the HSA queue size that is requested, then mapping all virtual addresses in the second half of the queue to the same virtual addresses as the first half of the queue. This commit fixes gem5's support to mimic this behavior. Note that this change does not affect Vega's HSA queue support, because according to the ROCm documentation, Vega does not have the same problem as GCN3. Change-Id: I133cf1acc3a00a0baded0c4c3c2a25f39effdb51 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51371 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>	2021-10-12 17:01:52 +00:00
Matthew Poremba	3112a7f0d0	arch-gcn3,gpu-compute: Move GCN3 specific TLB to arch Move GpuTLB and TLBCoalescer to GCN3 as the TLB format is specific to GCN3 and SE mode / APU simulation. Vega will have its own TLB, coalescer, and walker suitable for a dGPU. This also adds a using alias for the TLB translation state to reduce the number of references to TheISA and X86ISA. X86 specific includes are also removed. Change-Id: I34448bb4e5ddb9980b34a55bc717bbcea0e03db5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49847 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-10-04 23:47:03 +00:00
Matthew Poremba	c15e472199	arch-vega: Rework flat instructions to support global Global instructions are new in Vega and are essentially FLAT instructions from GCN3 but guaranteed to go to global memory where as flat can go to global or local memory. This reworks the flat instruction classes so that the initiateAcc / execute / completeAcc logic can be reused for flat, global, and later scratch subtypes of flat instructions. The decoder creates a flat instruction class which sets instruction flags based on the flat instruction's SEG field. There are new initOperandInfo and generateDissasmbly methods for flat and global. The number of operands and operand index getters are modified to check the flags and return the correct value for the subtype. Change-Id: I1db4a3742aeec62424189e54c38c59d6b1a8d3c1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47106 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Kyle Roarty <kyleroarty1716@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-10-04 22:51:37 +00:00
Matthew Poremba	16de253c15	arch-vega: Add missing functions referenced by insts Some instructions were referencing pc() and isExecMaskRegister() which were not defined. Change-Id: Ic5b3fa9057950ff85603fcb87447a81b6c7f274b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47103 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2021-09-27 22:30:30 +00:00
Gabe Black	bec16fbc31	misc: Move MemPool based calls to the SEWorkload. These currently proxy to the System object, but this is one step towards moving the MemPool-s out of the System and into the SEWorkload where they really should have been from the start. Change-Id: Id27e7b874c283abf07bd892c8467a9cc52e2fdff Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50342 Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-09-21 02:05:32 +00:00
Gabe Black	00187b7bc3	x86,mem: Replace the x86 StoreCheck flag with READ_MODIFY_WRITE. X86 had a private/arch specific request flag called StoreCheck which it used to signal to the TLB that it should fault on a load if it would have faulted had it been a store. That way, you can detect whether a read-modify-write type of operation is going to fail due to a translation problem during the read, and don't have to worry about not doing anything architecturally visible until the store had succeeded, while also making sure not to do the store part if the modify part could fail. It seems that Ruby had hijacked that flag and had an architecture specific check which was looking for a load which was going to be followed by a store. The x86 flag was never intended to communicate that beyond the TLB, and this nominally architecture agnostic component shouldn't be reaching into the ISA specific flags to try to get that information. Instead, this change introduces a new Request flag called READ_MODIFY_WRITE which is used for the same purpose in x86, but in general means that a load will be followed by a write in the near future. With this new globally applicable flag, the ruby Sequencer class no longer needs to check what the arch is, nor does it need to access ISA private data in the request flags. Always doing this check should be no less efficient than before, because checking the arch involved calling into the system object, while checking the flag only requires masking a bit on the flags which the compiler probably already has floating around for other logic in this function. Change-Id: Ied5b744d31e7aa8bf25e399b6b321f9d2020a92f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48710 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Maintainer: Gabe Black <gabe.black@gmail.com>	2021-09-05 05:29:27 +00:00
Gabe Black	92288331c2	gpu-compute: Delete code related to X86PagetableWalker in X86GPUTLB.py. This code will never be executed since FULL_SYSTEM is not part of the build environment (and hasn't been for many years), and on top of that, this declaration redundantly (and incompletely) tries to set up the X86PagetableWalker that the ISA already sets up. Change-Id: I40cffbd7f60c1f741b1a14d9009f80185c9ce28c Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49405 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com>	2021-08-20 08:19:57 +00:00
Jason Lowe-Power	eb24bca44e	Merge "misc: Merge branch 'release-staging-v21-1' into develop" into develop	2021-07-30 04:44:09 +00:00
Matt Sinclair	97760cb5a3	gpu-compute: fix typo in compute driver comments Change-Id: I550c6c81ffb2ee9143a2676f93385a8b90c4ddd5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48023 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-29 20:00:20 +00:00
Gabe Black	00876fff20	misc: Replace the GEM5_VAR_USED macro with [[maybe_unused]]. The [[maybe_unused]] attribute is now standard, so we can use that directly without hiding it behind a macro. Change-Id: If24ffd7e50bdb503cb3e6ea61f226ea794e84b8f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48511 Reviewed-by: Gabe Black <gabe.black@gmail.com> Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-29 10:17:51 +00:00
Bobby R. Bruce	76ceda55f7	misc: Merge branch 'release-staging-v21-1' into develop Change-Id: I0f69d3d0863f77c02ac8089fb4dccee3aa70a4ea	2021-07-28 17:37:04 -07:00
Kyle Roarty	9a7fc4ff69	arch-gcn3: Implement LDS accesses in Flat instructions Add support for LDS accesses by allowing Flat instructions to dispatch into the local memory pipeline if the requested address is in the group aperture. This requires implementing LDS accesses in the Flat initMemRead/Write functions, in a similar fashion to the DS functions of the same name. Because we now can potentially dispatch to the local memory pipeline, this change also adds a check to regain any tokens we requested as a flat instruction. Change-Id: Id26191f7ee43291a5e5ca5f39af06af981ec23ab Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48343 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-26 18:36:16 +00:00
Bobby R. Bruce	c0a3c70304	misc: Merge branch 'release-staging-v21-1' into develop Change-Id: I6ba57d7f70be70ae43fab396780d18623679a59a	2021-07-26 09:48:25 -07:00
Gabe Black	59496b6136	mem,gpu-compute: Stop using the GEM5_NO_DISCARD macro. The [[nodiscard]] attribute is now standard, so we can use that directly. Change-Id: I57f59935858facb2a15bf4712be4bfd584bf0c7e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48509 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-24 21:57:04 +00:00
Kyle Roarty	f8578e4b05	gpu-compute: Fix TLB coalescer starvation Currently, we are storing coalesced accesses in an std::unordered_map indexed by a tick index, i.e. issue tick / coalescing window. If there are multiple coalesced requests, at different tick indexes, to the same virtual address, then the TLB coalescer will issue just the first one. However, std::unordered_map is not a sorted container and we issue coalesced requests by iterating through such container. This means that the coalesced request sent in TLBCoalescer::processProbeTLBEvent is not necessarly the oldest one. Because of this, in cases of high contention the oldest coalesced request will have a huge TLB access latency. To fix this issue, we will use an std::map which is a sorted container and therefore guarantees the oldest coalesced request will be sent first. Change-Id: I9c7ab32c038d5e60f6b55236266a27b0cae8bfb0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48340 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-24 17:27:02 +00:00
Gabe Black	83b14e569b	misc: Stop using getVirtProxy. The proxies are not used on the critical path, and it's usually implicit whether they should be the FS or SE version. Ideally in the future we won't need to worry about which version we need to use, but the differences haven't quite been abstracted away, and occasionally we need to decide between the two. Change-Id: Idb363d6ddc681f7c1ad5e7aba69865f40aa30dc8 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45907 Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2021-07-23 03:42:17 +00:00
Kyle Roarty	fb4a7e1e24	gpu-compute: Fix off-by-one when creating an AddrRange The end value of an AddrRange is already not included in the range, so subtracting one from the end creates an off-by-one error. This patch removes the extra -1 that was used when determining the end of an AddrRange in allocateGpuVma Change-Id: I75659e9a7fabd991bb37be9aa40f8e409eb21154 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48020 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-14 20:47:27 +00:00
Bobby R. Bruce	b2677990f6	gpu-compute: Add missing overrides These missing overrides were causing compilations errors with the Clang 11 compiler: https://www.mail-archive.com/gem5-dev@gem5.org/msg39683.html Change-Id: Ib5e7096ab9a7a8505bcc848ff3f08674f7f289ce Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47899 Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-13 00:16:51 +00:00
Kyle Roarty	e2e18d41e1	configs,gpu-compute: Add support for gfx902/Raven This patch adds support for a gfx902 Vega APU, ripping the appropriate values for device_id from the ROCm Thunk (src/topology.c). Note: gfx902 isn't officially supported by ROCm. This means that it may not work for all programs. In particular, rocBLAS is incompatible with gfx902, so anything that uses rocBLAS won't be able to run with gfx902. Change-Id: I48893e7cc9c7e52275fdfd22314f371a9db8e90a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47530 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-10 03:42:03 +00:00
Matthew Poremba	897c0c11ed	dev,dev-hsa,gpu-compute: Refactor dmaVirt calls Remove the duplicate dmaVirt calls from HSA packet processor and GPU command processor and move them into their own class. This removes some duplicate code and allows a DmaVirtDevice to be created which will be useful for upcoming full system GPU commits. The DmaVirtDevice is an abstraction of the base DmaDevice but iterates using ChunkGenerator over virtual addresses. Classes which inherit from DmaVirtDevice must provide a translation function to translate from virtual address to physical address. Once translated, the physical address is passed to DmaDevice to do the work. Change-Id: Idd59ccb4d9ba21c0b1150ee328ededf5a88d824e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47179 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-09 22:40:18 +00:00
Kyle Roarty	1812041dc0	gpu-compute: Update GET_PROCESS_APERTURES IOCTLs The apertures for non-gfx801 GPUs are set differently. If the apertures aren't set properly, ROCm will error out. This change sets the apertures appropriately based on the gfx version of the simulated GPU. It also adds in new functions to set the scratch and lds apertures in GFX9 to mimic the linux kernel. Change-Id: I1fa6f60bc20c7b6eb3896057841d96846460a9f8 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47529 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-09 16:22:07 +00:00
Kyle Roarty	ab9e28ddb8	configs,gpu-compute: Set proper dGPUPoolID defaults In GPU.py, dGPUPoolID is defined as an int, but was defaulted to False. Explicitly set it to 0, instead. In apu_se.py, dGPUPoolID was being set to 1, but that was resulting in crashes. Setting it to 0 avoids those crashes. Change-Id: I0f1161588279a335bbd0d8ae7acda97fc23201b5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47527 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-09 16:11:20 +00:00
Kyle Roarty	76888a9cca	gpu-compute: Add mmap functionality to GPURenderDriver dGPUs mmap the GPURenderDriver, however it doesn't appear that they do anything with it. This patch implements the mmap function by just returning the address provided, while not doing anything else Change-Id: Ia010a2aebcf7e2c75e22d93dfb440937d1bef3b1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47523 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-09 16:11:20 +00:00
Kyle Roarty	ebb6c4b99b	gpu-compute: Check for WAX dependences This adds checking if the destination registers are free or busy in the operandsReady() function for both scalar and vector registers. This allows us to catch WAX dependences between instructions. Change-Id: I0fb0b29e9608fca0d90c059422d4d9500d5b2a7d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47539 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-08 01:18:01 +00:00
Kyle Roarty	02dd6b77ff	arch-gcn3,arch-vega,gpu-compute: Move request counters When the Vega ISA got committed, it lacked the request counter tracking for memory requests that existed in the GCN3 code. Instead of copying over the same lines from the GCN3 code to the Vega code, this commit makes the various memory pipelines handle updating the request counter information instead, as every memory instruction calls a memory pipeline. This commit also adds an issueRequest in scalar_memory_pipeline, as previously, the gpuDynInsts were explicitly placed in the queue of issuedRequests. Change-Id: I5140d3b2f12be582f2ae9ff7c433167aeec5b68e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45347 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-08 01:18:01 +00:00
Kyle Roarty	3f9b03522c	arch-gcn3,gpu-compute: Set gpuDynInst exec_mask before use vector_register_file uses the exec_mask of a memory instruction in order to determine if it should mark a register as in-use or not. Previously, the exec_mask of memory instructions was only set on execution of that instruction, which occurs after the code in vector_register_file. This led to the code reading potentially garbage data, leading to a scenario where a register would be marked used when it shouldn't be. This fix sets the exec_mask of memory instructions in schedule_stage, which works because the only time the wavefront execMask() is updated is on a instruction executing, and we know the previous instruction will have executed by the time schedule_stage executes, due to the order the pipeline is executed in. This also undoes part of a patch from last year (`62ec973`) which treated the symptom of accidental register allocation, without preventing the registers from being allocated in the first place. This patch also removes now redundant code that sets the exec_mask in instructions.cc for memory instructions Change-Id: Idabd35020000764fb06133ac2458606c1aaf6f04 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45346 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-08 01:18:01 +00:00
Daniel R. Carvalho	5ff1fac819	misc: Rename Debug namespace as debug As part of recent decisions regarding namespace naming conventions, all namespaces will be changed to snake case. gem5::Debug became gem5::debug. Change-Id: Ic04606baab3317d2e58ab3ca9b37fc201c406ee8 Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47305 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-07 23:18:59 +00:00
Daniel R. Carvalho	60e4ad955d	mem-ruby: Add a ruby namespace Encapsulate all ruby-related files in a ruby namespace. Change-Id: If642c9751ecefc35b45c5dd69d85e67813cc5224 Issued-on: https://gem5.atlassian.net/browse/GEM5-984 Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47307 Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-07 23:18:59 +00:00
Giacomo Travaglini	d1cdcb311b	misc: Move Mode and Translation from BaseTLB to BaseMMU This is a step towards moving most of the TLB logic to the MMU class. Change-Id: Id6b1fb30aa89960705f165f9738f5b50aa1e6bdb Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/46779 Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-07 08:44:13 +00:00

1 2 3 4 5

230 Commits