derek/gem5

Author	SHA1	Message	Date
Matthew Poremba	9f5c0f2822	gpu-compute: dprint instruction requesting translation When debugging strange addresses, it is extremely useful to know what instruction calculated that address. This makes it much easier to follow assembly code backwards to find the source of an incorrect address. This change adds a DPRINTF for GPUTLB that by default prints the disassembly when a virtual address translation is sent to the TLB. Change-Id: I5066c064a48c5c48696863eeccd8d011245ef7b2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63176 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-09-09 04:13:49 +00:00
Matthew Poremba	e36a8dbd8a	gpu-compute: Handle GPUFS system store responses Requests in GPUFS which go to system memory will not generate the WriteCompleteResp packets that the VIPER protocol would normally create for device requests which go through the caches. Therefore, we need to call back the GM pipe handleResponse to complete the access and make forward progress. Change-Id: Ic00c430ce420a591fe5743f758b780d93afd2a38 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57989 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-07 20:11:01 +00:00
Matthew Poremba	fcbc9afcd6	gpu-compute: Don't use emulated driver in full system The emulated driver is currently called in a few locations unconditionally. This changeset adds checks that we are not in full system before calling any emulated driver function. In full system the amdgpu driver running on the disk image handles these functions. Change-Id: Iea3546b574e29c649351c0fce9154530be89e9b1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57712 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-07 20:11:01 +00:00
Matthew Poremba	f375e79bcf	gpu-compute: Support Scalar and Vector access to system pages The amdgpu driver supports reading and writing scalar and vector memory addresses that reside in system memory. This is commonly used for things like blit kernels that perform host-to-device or device-to-host copies using GPU load/store instructions. This is done by utilizing the system hub device added in a prior changeset. Memory packets translated by the Scalar or VMEM TLBs will have the corresponding system request field set from the PTE in the TLB which can be used in the compute unit to determine if a request is for system memory or not. Another important change is to return global memory tokens for system requests. Since these do not flow through the GPU coalescer where the token is returned, the token can be returned once the request is known to be a system request. Change-Id: I35030e0b3698f10c63a397f96b81267271e3130e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57711 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-07 20:11:01 +00:00
Matthew Poremba	91e8bbe299	configs,gpu-compute: Support fetch from system pages The amdgpu driver supports fetching instructions from pages which reside in system memory rather than device memory. This changeset adds support to do this by adding the system hub object added in a prior changeset to the fetch unit and issues requests to the system hub if the system bit in the memory page's PTE is set. Otherwise, the requestor ID is set to be device memory and the request is routed through the Ruby network / GPU caches to fetch the instructions. Change-Id: Ib2fb47c589fdd5e544ab6493d7dbd8f2d9d7b0e8 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57652 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-28 23:24:53 +00:00
Matthew Poremba	51648570ea	gpu-compute: Add methods to read GPU memory requestor ID These methods are called from various places to override the requestor ID of a request in order to determine which Ruby network a request should be routed on. Change-Id: Ic0270ddd7123f0457a13144e69ef9132204d4334 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57651 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-25 19:51:29 +00:00
Matthew Poremba	539a2e2bcd	arch-vega: Add VEGA page tables and TLB Add the page table walker, page table format, TLB, TLB coalescer, and associated support in the AMDGPUDevice. This page table format used the hardware format for dGPU and is very different from APU/GCN3 which use the X86 page table format. In order to support either format for the GPU model, a common TranslationState called GpuTranslation state is created which holds the combined fields of both the APU and Vega translation state. Similarly the TlbEntry is cast at runtime by the corresponding arch files as they are the only files which touch the internals of the TlbEntry. The GPU model only checks if a TlbEntry is non-null and thus does not need to cast to peek inside the data structure. Change-Id: I4484c66239b48df5224d61caa6e968e56eea38a5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51848 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-17 00:11:14 +00:00
Matthew Poremba	9313294efe	misc: Remove AMD license addition Remove the line "For use for simulation and test purposes only" in files where AMD is the only copyright holder listed in the header. This happens to be the case for all files where this line exists, removing it completely from gem5. Change-Id: I623f266b002f564301b28774f49081099cfc60fd Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53943 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-12-11 04:00:56 +00:00
Matthew Poremba	c028af111a	arch-gcn3,gpu-compute: Move TLB to common folder in amdgpu This TLB is more of an "APU" TLB than anything GCN3 specific. It can be used with either GCN3 or Vega. With this change, VEGA_X86 builds and one can run binaries with Vega ISA code using the same steps as GCN3 but building the Vega ISA instead. Change-Id: I0c92bcd0379a18628dc05cb5af070bdc7e692c7c Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53803 Maintainer: Bobby Bruce <bbruce@ucdavis.edu> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-12-09 17:26:15 +00:00
Matthew Poremba	3112a7f0d0	arch-gcn3,gpu-compute: Move GCN3 specific TLB to arch Move GpuTLB and TLBCoalescer to GCN3 as the TLB format is specific to GCN3 and SE mode / APU simulation. Vega will have its own TLB, coalescer, and walker suitable for a dGPU. This also adds a using alias for the TLB translation state to reduce the number of references to TheISA and X86ISA. X86 specific includes are also removed. Change-Id: I34448bb4e5ddb9980b34a55bc717bbcea0e03db5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49847 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-10-04 23:47:03 +00:00
Matthew Poremba	c15e472199	arch-vega: Rework flat instructions to support global Global instructions are new in Vega and are essentially FLAT instructions from GCN3 but guaranteed to go to global memory, whereas flat can go to global or local memory. This reworks the flat instruction classes so that the initiateAcc / execute / completeAcc logic can be reused for flat, global, and later scratch subtypes of flat instructions. The decoder creates a flat instruction class which sets instruction flags based on the flat instruction's SEG field. There are new initOperandInfo and generateDissasmbly methods for flat and global. The number of operands and operand index getters are modified to check the flags and return the correct value for the subtype. Change-Id: I1db4a3742aeec62424189e54c38c59d6b1a8d3c1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47106 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Kyle Roarty <kyleroarty1716@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-10-04 22:51:37 +00:00
Gabe Black	00876fff20	misc: Replace the GEM5_VAR_USED macro with [[maybe_unused]]. The [[maybe_unused]] attribute is now standard, so we can use that directly without hiding it behind a macro. Change-Id: If24ffd7e50bdb503cb3e6ea61f226ea794e84b8f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48511 Reviewed-by: Gabe Black <gabe.black@gmail.com> Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-29 10:17:51 +00:00
Giacomo Travaglini	d1cdcb311b	misc: Move Mode and Translation from BaseTLB to BaseMMU This is a step towards moving most of the TLB logic to the MMU class. Change-Id: Id6b1fb30aa89960705f165f9738f5b50aa1e6bdb Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/46779 Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-07 08:44:13 +00:00
Daniel R. Carvalho	974a47dfb9	misc: Adopt the gem5 namespace Apply the gem5 namespace to the codebase. Some anonymous namespaces could theoretically be removed, but since this change's main goal was to keep conflicts at a minimum, it was decided not to modify the general shape of the files much. A few missing comments of the form "// namespace X" that occurred before the newly added "} // namespace gem5" have been added for consistency. std out should not be included in the gem5 namespace, so they weren't. ProtoMessage has not been included in the gem5 namespace, since I'm not familiar with how proto works. Regarding the SystemC files, although they belong to gem5, they actually perform integration between gem5 and SystemC; therefore, it deserved its own separate namespace. Files that are automatically generated have been included in the gem5 namespace. The .isa files currently are limited to a single namespace. This limitation should be later removed to make it easier to accommodate a better API. Regarding the files in util, gem5:: was prepended where suitable. Notice that this patch was tested as much as possible given that most of these were already not previously compiling. Change-Id: Ia53d404ec79c46edaa98f654e23bc3b0e179fe2d Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/46323 Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu> Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-01 19:08:24 +00:00
Daniel R. Carvalho	98ac080ec4	base-stats,misc: Rename Stats namespace as statistics As part of recent decisions regarding namespace naming conventions, all namespaces will be changed to snake case. ::Stats became ::statistics. "statistics" was chosen over "stats" to avoid generating conflicts with the already existing variables (there are way too many "stats" in the codebase), which would make this patch even more disturbing for the users. Change-Id: If877b12d7dac356f86e3b3d941bf7558a4fd8719 Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45421 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-05-29 11:13:49 +00:00
Daniel R. Carvalho	4dd099ba3d	misc: Rename Enums namespace as enums As part of recent decisions regarding namespace naming conventions, all namespaces will be changed to snake case. ::Enums became ::enums. Change-Id: I39b5fb48817ad16abbac92f6254284b37fc90c40 Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45420 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-05-29 11:13:49 +00:00
Gabe Black	fb3befcc6d	misc: Replace M5_VAR_USED with GEM5_VAR_USED. Change-Id: I64a874ccd1a9ac0541dfa01971d7d620a98c9d32 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45231 Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Gabe Black <gabe.black@gmail.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>	2021-05-11 20:16:31 +00:00
Michael LeBeane	ad43083bb3	gpu-compute: Implement per-request MTYPEs GPU MTYPE is currently set using a global config passed to the PACoalescer. This patch enables MTYPE to be set by the shader on a per-request basis. In real hardware, the MTYPE is extracted from a GPUVM PTE during address translation. However, our current simulator only models x86 page tables which do not have the appropriate bits for GPU MTYPES. Rather than hacking non-x86 bits into our x86 page table models, this patch instead keeps an interval tree of all pages that request custom MTYPES in the driver itself. This is currently only used to map host pages to the GPU as uncacheable, but is easily extensible to other MTYPES. Change-Id: I7daab0ffae42084b9131a67c85cd0aa4bbbfc8d6 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42216 Maintainer: Matthew Poremba <matthew.poremba@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-04-24 15:54:15 +00:00
Gabe Black	3f67faec83	arch,dev,gpu-compute,sim: Rename isa_traits.hh page_size.hh. The only thing left in isa_traits.hh are two constants, one for the number of bytes in a page, and one for how far to shift an address to get the page number. To make it clear that this is the only thing isa_traits.hh should be used for from this point forward (until it is entirely eliminated), this change renames it to the much less generic page_size.hh. Also, because isa_traits.hh used to have much more stuff in it, it was included in a lot of places it didn't need to be. This change also clears out all these legacy includes while updating the actually needed ones to the new name. Change-Id: I939b01b117c53d620b6b0a98982f6f21dc2ada72 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40179 Reviewed-by: Gabe Black <gabe.black@gmail.com> Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-03-30 10:17:48 +00:00
Matthew Poremba	5323cccfdd	arch-gcn3,gpu-compute: Update stats style for GPU Convert all gpu-compute stats to Stats::Group style. Change-Id: I29116f1de53ae379210c6cfb5bed3fc74f50cca5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39135 Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-01-18 17:58:05 +00:00
gauravjain14	c29523665e	gpu-compute: Support for dynamic register alloc SimplePoolManager doesn't allow mapping of two WGs simultaneously on the same Compute Unit (provided the previous WG has been mapped to all the SIMDs) even if there is sufficient VRF and SRF space available. DynPoolManager takes care of that by dynamically allocating and deallocating register file space to wavefronts Change-Id: I2255c68d4b421615d7b231edc05d3ebb27cbd66c Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32034 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com>	2021-01-14 17:04:27 +00:00
Tuan Ta	173c1c6eb0	gpu-compute,mem-ruby: Replace ACQUIRE and RELEASE request flags This patch replaces ACQUIRE and RELEASE flags which are HSA-specific. ACQUIRE flag becomes INV_L1 in VIPER protocol. RELEASE flag is removed. Future protocols may support extra cache coherence flags like INV_L2 and WB_L2. Change-Id: I3d60c9d3625c898f4110a12d81742b6822728533 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32859 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-11-04 21:09:26 +00:00
Gabe Black	d05a0a4ea1	misc: Delete the now unnecessary create methods. Most create() methods are no longer necessary. This change deletes them, and occasionally moves some code from them into the constructors they call. Change-Id: Icbab29ba280144b892f9b12fac9e29a0839477e5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36536 Reviewed-by: Gabe Black <gabe.black@gmail.com> Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-10-30 04:00:20 +00:00
Gabe Black	3a49ed0156	gpu: Use X86ISA instead of TheISA in src/gpu-compute. These files are nominally not tied to the X86ISA, but in reality they are because they reach into the GPU TLB, which is defined unchangeably in the X86ISA namespaces, and uses data structures within it. Rather than try to pretend that these structures are generic, we'll instead just use X86ISA instead of TheISA. If this really does become generic in the future, a base class with the ISA agnostic essentials defined in it can be used instead, and the ISA specific TLBs can define their own derived class which has whatever else they need. Really the compute unit shouldn't be communicating with the TLB using sender state since those are supposed to be little notes for the sender to keep with a transaction, not for communicating between entities across a port. Change-Id: Ie6573396f6c77a9a02194f5f4595eefa45d6d66b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34174 Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com>	2020-10-26 20:32:43 +00:00
Kyle Roarty	b20cc7e6d8	gpu-compute,mem-ruby: Properly create/handle WriteCompletePkts There is a flow of packets as so: WriteResp -> WriteReq -> WriteCompleteResp These packets share some variables, in particular senderState and a status vector. One issue was the WriteResp packet decremented the status vector, which was used by the WriteCompleteResp packets to determine when to handle the global memory response. This could lead to multiple WriteCompleteResp packets attempting to handle the global memory response. Because of that, the WriteCompleteResp packets needed to handle the status vector. This patch moves WriteCompleteResp packet handling back into ComputeUnit::DataPort::processMemRespEvent from ComputeUnit::DataPort::recvTimingResp. This helps remove some redundant code. This patch has the WriteResp packet return without doing any status vector handling, and without deleting the senderState, which had previously caused a segfault. Another issue was WriteCompleteResp packets weren't being issued for each active lane, as the coalesced request was being issued too early. In order to fix that, we have to ensure every active lane puts their request into their applicable coalesced request before issuing the coalesced request. Because of that change, we change the issuing of CoalescedRequests from GPUCoalescer::coalescePacket to GPUCoalescer::completeIssue. That change involves adding a new variable to store the CoalescedRequests that are created in the calls to coalescePacket. This variable is a map from instruction sequence number to coalesced requests. Additionally, the WriteCompleteResp packet was attempting to access physical memory in hitCallback while not having any data, which caused a crash. This can be resolved either by not allowing WriteCompleteResp packets to access memory, or by copying the data from the WriteReq packet. This patch denies WriteCompleteResp packets memory access in hitCallback. Finally, in VIPERCoalescer::writeCompleteCallback there was a map that held the WriteComplete packets, but no packets were ever being removed. This patch removes packets that match the address that was passed in to the function. Change-Id: I9a064a0def2bf6c513f5295596c56b1b652b0ca4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33656 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-10-15 17:52:51 +00:00
Gabe Black	91d83cc8a1	misc: Standardize the way create() constructs SimObjects. The create() method on Params structs usually instantiates SimObjects using a constructor which takes the Params struct as a parameter somehow. There has been a lot of needless variation in how that was done, making it annoying to pass Params down to base classes. Some of the different forms were: const Params &, Params &, Params *, const Params *, Params const*. This change goes through and fixes up every constructor and every create() method to use the const Params & form. We use a reference because the Params struct should never be null. We use const because neither the create method nor the consuming object should modify the record of the parameters as they came in from the config. That would make consuming them not idempotent, and make it impossible to tell what the actual simulation configuration was since it would change from any user visible form (config script, config.ini, dot pdf output). Change-Id: I77453cba52fdcfd5f4eec92dfb0bddb5a9945f31 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35938 Reviewed-by: Gabe Black <gabeblack@google.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-10-14 12:06:44 +00:00
Gabe Black	b877efa6d4	misc: Update attribute syntax, and reorganize compiler.hh. This change replaces the __attribute__ syntax with the now standard [[]] syntax. It also reorganizes compiler.hh so that all special macros have some explanatory text saying what they do, and each attribute which has a standard version can use that if available and what version of c++ it's standard in is put in a comment. Also, the requirements as far as where you put [[]] style attributes are a little more strict than the old school __attribute__ style. The use of the attribute macros was updated to fit these new, more strict requirements. Change-Id: Iace44306a534111f1c38b9856dc9e88cd9b49d2a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35219 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-09-28 21:52:59 +00:00
Shivani Parekh	392c1ced53	misc: Replaced master/slave terminology Change-Id: I4df2557c71e38cc4e3a485b0e590e85eb45de8b6 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33553 Maintainer: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com>	2020-09-10 23:02:28 +00:00
Gabe Black	1d755b4ba1	misc: Clean up usage of arch/isa_traits.hh. isa_traits.hh used to have much more in it, but now it only has PageShift, PageBytes, and (for now) the guest endianness. These values should only be retrieved from the System class generally speaking, so only the system class should include arch/isa_traits.hh. Some gpu compute related files need PageBytes or PageShift. Even though those files don't advertise their ISA dependence, they are tied to x86. In those files, they can include arch/x86/isa_traits.hh. The only other file which legitimately needs arch/isa_traits.hh is the decoder cache since it uses PageBytes to size an array. Change-Id: I12686368715623e3140a68a7027c136bd52567b1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33203 Reviewed-by: Gabe Black <gabeblack@google.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-08-28 07:20:58 +00:00
Tony Gutierrez	94000aefe6	gpu-compute: Create CU's ports in the standard way The CU would initialize its ports in getMasterPort(), which is not desirable as getMasterPort() may be called several times for the same port. This can lead to a fatal if the CU expects to only create a single port of a given type, and may lead to other issues where stat names are duplicated. This change instantiates and initializes the CU's ports in the CU constructor using the CU params. The index field is also removed from the CU's ports because the base class already has an ID field, which will be set to the default value in the base class's constructor for scalar ports. It doesn't make sense for scalar ports to take an index because they are scalar, so we let the base class initialize the ID to the invalid port ID. Change-Id: Id18386f5f53800a6447d968380676d8fd9bac9df Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32836 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-08-27 16:31:46 +00:00
Emily Brickey	6333e914d3	gpu-compute: update port terminology Change-Id: I3121c4afb1e137aebe09c1d694e9484844d02b9b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32313 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Poremba <chesp3@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-08-26 16:48:13 +00:00
Gabe Black	40e8cac306	misc: Make registerExitCallback use CallbackQueue2. Issue-on: https://gem5.atlassian.net/browse/GEM5-698 Change-Id: I526d4a19ca4e54a6469a4ee26693c1c0400fcc70 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32644 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-08-18 11:49:06 +00:00
Tony Gutierrez	63c76448eb	gpu-compute: Add pipeline stage interface classes This change separates the pipeline stage interfaces for the GPU's compute unit into their own classes with a well-defined interface. This helps to create a cleaner interface for users to extend the CU pipeline's capabilities and also helps consolidate all the pipeline communication code in one place in the source. Change-Id: I569d52bce84dc1b9fbf8f0f96d53a81a2b6773c6 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29972 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-07-17 16:36:09 +00:00
Tony Gutierrez	5f0378b8d0	gpu-compute: Use refs to CU in pipe stages/mem pipes The pipe stages and memory pipes are changed to store a reference to their parent CU as opposed to a pointer. These objects will never change which CU they belong to, and they are constructed by their parent CU. Change-Id: Ie5476e1e2e124a024c2efebceb28cb3a9baa78c1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29969 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-07-17 16:34:36 +00:00
Tony Gutierrez	f64ff89212	gpu-compute: Don't track vector store insts in CU's headTailMap This change fixes a memory leak due to live GPUDynInstPtr references to vector store insts being stored in the CU's headTailMap and never released. This happened because store insts are not supposed to have their head-tail latencies tracked by the headTailMap; instead they use timing information from the GPUCoalescer. When updating the headTailLatency stat via the headTailMap, only loads were considered and removed from the headTailMap, however when inserting into the headTailMap loads and stores were considered, thus leading to the memory leak. This change fixes the issue by only adding loads to the headTailMap. Change-Id: I8a8f5b79f55e00481ae5e82519a9ed627a7ecbd1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29963 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2020-07-17 16:32:06 +00:00
Tony Gutierrez	0c5d671ea1	gpu-compute: Init CU object for pipe stages in their ctors This change updates the constructors of the CU's pipe stages/memory pipelines to accept a pointer to their parent CU. Because the CU creates these objects, and can pass a pointer to itself to these objects via their constructors, this is the safer way to initialize these classes. Change-Id: I0b3732ce7c03781ee15332dac7a21c097ad387a4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29945 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-07-16 20:37:22 +00:00
Tony Gutierrez	af621cd6e6	gpu-compute, arch-gcn3: refactor barriers Barriers were not modeled properly. Firstly, barriers were allocated to each WG that was launched, which is not correct, and the CU would provide an infinite number of barrier slots. There are a limited number of barrier slots per CU in reality. In addition, the CU will not allocate barrier slots to WGs with a single WF (nothing to sync if only one WF). Beyond modeling problems, there is also the issue of deadlock. The barrier could deadlock because not all WFs are freed from the barrier once it has been satisfied. Instead, we relied on the scoreboard stage to release them lazily, one-by-one. Under this implementation the scoreboard may not fully release all WFs participating in a barrier; this happens because the first WF to be freed from the barrier could reach an s_barrier instruction again, forever causing the barrier counts across WFs to be out-of-sync. This change refactors the barrier logic to: 1) Create a proper barrier slot implementation 2) Enforce (via a parameter) the number of barrier slots on the CU. 3) Simplify the logic and cleanup the code (i.e., we no longer iterate through the entire WF list each time we check if a barrier is satisfied). 4) Fix deadlock issues. Change-Id: If53955b54931886baaae322640a7b9da7a1595e0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29943 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-07-16 20:37:22 +00:00
Xianwei Zhang	024f978cff	gpu-compute: enable kernel-end WB functionality Change-Id: Ib17e1d700586d1aa04d408e7b924270f0de82efe Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29938 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com>	2020-07-13 23:32:37 +00:00
Matt Sinclair	8177fc4392	arch-gcn3: add support for unaligned accesses Previously, with HSAIL, we were guaranteed by the HSA specification that the GPU will never issue unaligned accesses. However, now that we are directly running GCN this is no longer true. Accordingly, this commit adds support for unaligned accesses. Moreover, to reduce the replication of nearly identical code for the different request types, I also added new helper functions that are called by all the different memory request producing instruction types in op_encodings.hh. Adding support for unaligned instructions requires changing the statusBitVector used to track the status of the memory requests for each lane from a bit per lane to an int per lane. This is necessary because an unaligned access may span multiple cache lines. In the worst case, each lane may span multiple cache lines. There are corresponding changes in the files that use the statusBitVector. Change-Id: I319bf2f0f644083e98ca546d2bfe68cf87a5f967 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29920 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-19 20:41:18 +00:00
Tony Gutierrez	b8da9abba7	gpu-compute, mem-ruby, configs: Add GCN3 ISA support to GPU model Change-Id: Ibe46970f3ba25d62ca2ade5cbc2054ad746b2254 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29912 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-15 22:45:17 +00:00
Matthew Poremba	3d57eaf9f5	gpu-compute,mem-ruby: Refactor GPU coalescer Remove the read/write tables and coalescing table and introduce two levels of tables for uncoalesced and coalesced packets. Tokens are granted to GPU instructions to place in the uncoalesced table. If tokens are available, the operation always succeeds such that the 'Aliased' status is never returned. Coalesced accesses are placed in the coalesced table while requests are outstanding. Requests to the same address are added as targets to the table similar to how MSHRs operate. Change-Id: I44983610307b638a97472db3576d0a30df2de600 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27429 Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Bradford Beckmann <brad.beckmann@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-05-11 21:25:19 +00:00
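The MSHR-like behavior of the coalesced table mentioned above can be sketched as a map from line address to a list of waiting targets. This is only a simplified model under assumed names (`CoalescedTable`, `add`, `complete`); token handling and the uncoalesced table are left out, and nothing here is taken from the Ruby coalescer itself.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical sketch of an MSHR-style table keyed by cache line address.
class CoalescedTable {
  public:
    // Adds a requester as a target for the line. Returns true only when
    // this is the first target, i.e. a new memory request must be sent.
    bool add(std::uint64_t lineAddr, int requesterId) {
        auto [it, inserted] = table.try_emplace(lineAddr);
        it->second.push_back(requesterId);
        return inserted;
    }

    // Called when the response for a line returns. Removes the entry and
    // hands back every target that was waiting on it.
    std::vector<int> complete(std::uint64_t lineAddr) {
        auto it = table.find(lineAddr);
        if (it == table.end()) {
            return {};
        }
        std::vector<int> targets = std::move(it->second);
        table.erase(it);
        return targets;
    }

  private:
    std::map<std::uint64_t, std::vector<int>> table;
};
```

A second `add` to a line that is already outstanding returns false, so only one request per line is in flight while later requesters simply queue as targets.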
Matthew Poremba	5c2fb0c652	sim-se: Switch to new MemState API Switch over to the new MemState API by specifying memory regions for stack in each ISA, changing brkFunc to use MemState for heap memory, and calling the MemState fixup in fixupStackFault (renamed to just fixupFault). Change-Id: Ie3559a68ce476daedf1a3f28b168a8fbc7face5e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/25366 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-03-25 19:18:15 +00:00
Gabe Black	4dd00b0153	arch,cpu,gpu-compute,mem: Remove asid from Request objects. This is passed around a lot and set all over the place (usually to 0), but it's never actually used for anything. Change-Id: I38ca08387beabeaf9e339b4915ec7eba9e19eecb Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/26232 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Gabe Black <gabeblack@google.com>	2020-03-07 00:40:41 +00:00
Gabe Black	71a868224c	gpu-compute: Delete authors lists from gpu-compute files. Change-Id: I72318eb885f9517de325ea9a9af263f36613bf6e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/25414 Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>	2020-02-17 10:05:52 +00:00
Gabe Black	cdcc55a6a8	mem: Minimize the use of MemObject. MemObject doesn't provide anything beyond its base ClockedObject any more, so this change removes it from most inheritance hierarchies. Occasionally MemObject is replaced with SimObject when I was fairly confident that the extra functionality of ClockedObject wasn't needed. Change-Id: Ic014ab61e56402e62548e8c831eb16e26523fdce Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18289 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Gabe Black <gabeblack@google.com>	2019-04-28 01:19:40 +00:00
Alexandru Dutu	90d448080e	gpu-compute: Remove unneeded Request::setVirt call This sets the members of a Request object to the values they already hold, except the atomicOpFunctor which is set to nullptr. This call introduces a bug for atomics and is not useful for non-atomic requests. This changeset is also adding the wave PC and instruction sequence number to the Request object. Change-Id: I62f7b4a597483b0aa848a0cfbc72181e1063f56a Reviewed-on: https://gem5-review.googlesource.com/11549 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>	2018-06-26 16:46:40 +00:00
Giacomo Travaglini	f54020eb81	misc: Using smart pointers for memory Requests This patch is changing the underlying type for RequestPtr from Request* to shared_ptr<Request>. Having memory requests being managed by smart pointers will simplify the code; it will also prevent memory leakage and dangling pointers. Change-Id: I7749af38a11ac8eb4d53d8df1252951e0890fde3 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10996 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>	2018-06-11 16:55:30 +00:00
Giacomo Travaglini	2113b21996	misc: Substitute pointer to Request with aliased RequestPtr Every usage of Request* in the code has been replaced with the RequestPtr alias. This is a preparatory patch for when RequestPtr will be typedefed to a smart pointer to Request rather than a raw pointer to Request. Change-Id: I73cbaf2d96ea9313a590cdc731a25662950cd51a Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-on: https://gem5-review.googlesource.com/10995 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>	2018-06-11 16:55:30 +00:00
Brandon Potter	b9f8a548a1	gpu-compute: use X86ISA::TlbEntry over GpuTlbEntry GpuTlbEntry was derived from a vanilla X86ISA::TlbEntry definition. It wrapped the class and included an extra member "valid". This member was intended to report on the validity of the entry, however it introduced bugs when folks forgot to set the field properly in the code. So, instead of keeping the extra field which we might forget to set, we track validity by using nullptr for invalid tlb entries (as the tlb entries are dynamically allocated). This saves on the extra class definition and prevents bugs creeping into the code since the checks are intrinsically tied into accessing any of the X86ISA::TlbEntry members. This changeset fixes the issues introduced by a8d030522, a4e722725, and 2a15bfd79. Change-Id: I30ebe3ec223fb833f3795bf0403d0016ac9a8bc2 Reviewed-on: https://gem5-review.googlesource.com/10481 Reviewed-by: Brandon Potter <Brandon.Potter@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>	2018-05-30 19:49:05 +00:00
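The design choice in the commit above, signalling an invalid entry with nullptr instead of a separate "valid" flag, can be illustrated with a small lookup table. The `TlbEntry` struct and `TinyTlb` class below are hypothetical stand-ins, not the X86ISA::TlbEntry or GPU TLB classes from gem5.

```cpp
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for a TLB entry; note there is no "valid" member.
struct TlbEntry {
    std::uint64_t vpn;  // virtual page number
    std::uint64_t ppn;  // physical page number
};

class TinyTlb {
  public:
    void insert(std::uint64_t vpn, std::uint64_t ppn) {
        entries[vpn] = std::make_unique<TlbEntry>(TlbEntry{vpn, ppn});
    }

    // Returns nullptr on a miss. Callers must check the pointer before
    // touching any member, so validity cannot be forgotten or left stale.
    const TlbEntry *lookup(std::uint64_t vpn) const {
        auto it = entries.find(vpn);
        return it == entries.end() ? nullptr : it->second.get();
    }

  private:
    std::unordered_map<std::uint64_t, std::unique_ptr<TlbEntry>> entries;
};
```

Because a miss yields no object at all, there is no field a caller could forget to set, which is the class of bug the commit message describes.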
Tony Gutierrez	abb21ba99f	style: fix amd license and style issues Change-Id: I26136fb49f743c4a597f8021cfd27f78897267b5 Reviewed-on: https://gem5-review.googlesource.com/10463 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>	2018-05-16 15:32:01 +00:00

66 Commits