derek/gem5 - gem5 - Gitea: Git with a cup of tea

derek/gem5

Author	SHA1	Message	Date
Giacomo Travaglini	41928dac80	misc: Remove unused params() definitions Lots of times the params() helper has been defined but not used Change-Id: Id71829aca71341d46964d8f071099342b946b62f Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/41613 Tested-by: kokoro <noreply+kokoro@google.com>	2021-02-19 23:27:34 +00:00
Alexander Klimov	92ba3ba843	misc: Use PARAMS The patch is using the newly defined PARAMS macro to replace custom params() getters in derived class. The patch is also removing redundant _params: Instead of creating yet another _params field, SimObject descendants should use params() to expose the real type of SimObject::_params they already have. Change-Id: I43394cebb9661fe747bdbb332236f0f0181b3dba Signed-off-by: Alexander Klimov <Alexander.Klimov@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39900 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-02-19 23:27:34 +00:00
Gabe Black	9a0b79459d	misc: Fix mismatched struct/class "tags" and reenable that warning. The mismatches were from places where Params structs had been declared as classes instead of structs, and ruby's MachineID struct. A comment describing why the warning had been disabled said that it was because of libstdc++ version 4.8. As far as I can tell, that version is old enough to be outside the window we support, and so that should no longer be a problem. It looks like the oldest version of gcc we support, 5.0, corresponds with approximately libstdc++ version 6.0.21. https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html#abi.versioning Change-Id: I75ad92f3723a1883bd47e3919c5572a353344047 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40953 Reviewed-by: Gabe Black <gabe.black@gmail.com> Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-02-19 08:29:00 +00:00
Alexandru Dutu	14d6e8fac4	arch-gcn3: Implementation of s_sleep This changeset implements the s_sleep instruction in a similar way to s_waitcnt. Change-Id: I4811c318ac2c76c485e2bfd9d93baa1205ecf183 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39115 Maintainer: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-02-04 00:07:10 +00:00
Bobby R. Bruce	a4345ff324	gpu-compute,misc: Remove unused private variable Clang 9 fails to compile GCN3 due to the unused private variable, `_nxtFreeIdx`, in `src/gpu-compute/dyn_pool_manager.hh`. This variable has therefore been removed. Change-Id: I33f2e9634bbf8d5cea7a42ae2ac9f3ea8298d406 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40397 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-02-03 19:08:02 +00:00
Bobby R. Bruce	ae33daa8d7	gpu-compute,misc: Fix Clang missing override errors Clang fails to compile GCN3 due to missing overrides in `src/gpu-compute/gpu_command_processor.hh`. This commit fixes this errror. Change-Id: I6da9fce7c3eb86a5418a931ee4f225cceda488a5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40396 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-02-03 19:08:02 +00:00
Gabe Black	fc4caa6ad0	misc: Re-remove Authors lines from source files. These were universally removed a while ago, but a bunch have crept back in. Remove them. Change-Id: I3cb5b9f40c9c19aafb5e39a51d1baeae60a591c0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40335 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Gabe Black <gabe.black@gmail.com>	2021-02-03 12:55:17 +00:00
Sooraj Puthoor	965ad12b9a	dev-hsa: enable interruptible hsa signal support Event creation and management support from emulated drivers is required to support interruptible signals in HSA and this support was not available. This changeset adds the event creation and management support in the emulated driver. With this patch, each interruptible signal created by the HSA runtime is associated with a signal event. The HSA runtime can then put a thread waiting on a signal condition to sleep asking the driver to monitor the event associated with that signal. If the signal is modified by the GPU, the dispatcher notifies the driver about signal value change. If the modifier is a CPU thread, the thread will have to make HSA API calls to modify the signal and these API calls will notify the driver about signal value change. Once the driver is notified about a change in the signal value, the driver checks to see if any thread is sleeping on that signal and wake up the sleeping thread associated with that event. The driver has also implemented the time_out wakeup that can wake up the thread after a certain time period has expired. This is also true for barrier packets. Each signal has an event address in a kernel managed and allocated event page that can be used as a mailbox pointer to notify an event. However, this feature used by non-CPU agents to communicate with the driver is not implemented by this changeset because the non-CPU HSA agents in our model can directly communicate with driver in our implementation. Having said that, adding that feature should be trivial because the event address and event pages are correctly setup by this changeset and just adding the event page's virtual address to our PIO doorbell interface in the page tables and registering that pio address to the driver should be sufficient. Managing mailbox pointer for an event is based on event ID and using this event ID as an index into event page, this changeset already provides a unique mailbox pointer for each event. Change-Id: Ic62794076ddd47526b1f952fdb4c1bad632bdd2e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38335 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-01-31 03:25:05 +00:00
Kyle Roarty	68fe6fa6cb	gpu-compute: Simplify LGKM decrementing for Flat instructions This commit makes it so LGKM count is decremented in a single place (after completeAcc), which fixes a couple of potential bugs 1. Data is only written by completeAcc, not after initiateAcc. LGKM count is supposed to be decremented after data is written. 2. LGKM count is now properly decremented for atomics without return Change-Id: Ic791af3b42e04f7baaa0ce50cb2a2c6286c54f5a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39396 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-01-21 21:41:05 +00:00
Matthew Poremba	5323cccfdd	arch-gcn3,gpu-compute: Update stats style for GPU Convert all gpu-compute stats to Stats::Group style. Change-Id: I29116f1de53ae379210c6cfb5bed3fc74f50cca5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39135 Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-01-18 17:58:05 +00:00
Kyle Roarty	a4657f1b84	gpu-compute: Fix LGKM decrementing for flat atomic insts A prior commit (`f6ec145fc0`) fixed early LGKM decrementing for flat loads and stores, but failed to address flat atomics. Per the GCN3 ISA, LGKM count is decremented on flat atomics with return when the data has been returned. This patch checks if the flat instruction is an atomic with return, and decrements LGKM count if so. Change-Id: I5c0c2c205a8b21327d4c42ba71c59842c15bd63b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39155 Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-01-15 06:00:06 +00:00
gauravjain14	c29523665e	gpu-compute: Support for dynamic register alloc SimplePoolManager doesn't allow mapping of two WGs simultaneously on the same Compute Unit (provided the previous WG has been mapped to all the SIMDs) even if there is sufficient VRF and SRF space available. DynPoolManager takes care of that by dynamically allocating and deallocating register file space to wavefronts Change-Id: I2255c68d4b421615d7b231edc05d3ebb27cbd66c Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32034 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com>	2021-01-14 17:04:27 +00:00
Gabe Black	c6933a27da	misc: Fix missing includes. Change-Id: I545ff03041e8fe66dc489c6aa95c009e54df0970 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38995 Reviewed-by: Gabe Black <gabe.black@gmail.com> Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-01-13 08:55:59 +00:00
Kyle Roarty	f6ec145fc0	gpu-compute: Fix FLAT insts decrementing lgkm count early FLAT instructions used to decrement lgkm count on execute, while the GCN3 ISA specifies that lgkm count should be decremented on data being returned or data being written. This patch changes it so that lgkm is decremented after initiateAcc (for stores) and after completeAcc (for loads) to better reflect the ISA definition. This fixes a bug where waitcnts would be satisfied even though the memory access wasn't completed, which lead to instructions using the wrong data. Change-Id: I596cb031af9cda8d47a1b5e146e4a4ffd793d36c Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38696 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-01-07 17:12:31 +00:00
Kyle Roarty	e49a072c7d	gpu-compute: Use dict.get syntax for accessing buildEnv keys 37775 removed SmartDict, which is the type buildEnv used to be. Because of that change, doing buildEnv[key] with a key not in the dict returns KeyError instead of False. By using buildEnv(key, False), we are able to return False when the key isn't in the dict. Change-Id: I4aae29b95b082efb2b021f21d608f9cd1c196379 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38135 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-12-01 19:19:52 +00:00
Kyle Roarty	0bb385941b	gpu-compute: Add exp_cnt tracking for buffer store instructions exp_cnt (expInstsIssued in the code) is used in the waitcnt instruction to track that data has been read out of VGPRs in previous global memory instructions, making it safe to overwrite the VGPRs used in said global memory instructions. Previously, exp_cnt wasn't being tracked at all, which lead to the waitcnt finishing immediately, leading to the memory instruction's VPGRs getting overwritten by subsequent instructions, causing errors. This patch makes it so waitcnts waiting on exp_cnt will wait for MUBUF buffer store instructions to read their VGPRs before completing Change-Id: Idd2b59511bc086cf316217da27b7a228272b0b0f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37555 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-11-30 20:59:31 +00:00
Daniel Gerzhoy	9a01d3e927	dev-hsa,gpu-compute: Agent Packet handler implemented. HSA packet processor will now accept and process agent packets. Type field in packet is command type. For now: AgentCmd::Nop = 0 AgentCmd::Steal = 1 Steal command steals the completion signal for a running kernel. This enables a benchmark to use hsa primitives to send an agent packet to steal the signal, then wait on that signal. Minimal working example to be added in gem5-resources. Change-Id: I37f8a4b7ea1780b471559aecbf4af1050353b0b1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37015 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-11-16 16:12:48 +00:00
Tuan Ta	173c1c6eb0	gpu-compute,mem-ruby: Replace ACQUIRE and RELEASE request flags This patch replaces ACQUIRE and RELEASE flags which are HSA-specific. ACQUIRE flag becomes INV_L1 in VIPER protocol. RELEASE flag is removed. Future protocols may support extra cache coherence flags like INV_L2 and WB_L2. Change-Id: I3d60c9d3625c898f4110a12d81742b6822728533 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32859 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-11-04 21:09:26 +00:00
Gabe Black	d05a0a4ea1	misc: Delete the now unnecessary create methods. Most create() methods are no longer necessary. This change deletes them, and occasionally moves some code from them into the constructors they call. Change-Id: Icbab29ba280144b892f9b12fac9e29a0839477e5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36536 Reviewed-by: Gabe Black <gabe.black@gmail.com> Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-10-30 04:00:20 +00:00
Gabe Black	3a49ed0156	gpu: Use X86ISA instead of TheISA in src/gpu-compute. These files are nominally not tied to the X86ISA, but in reality they are because they reach into the GPU TLB, which is defined unchangeably in the X86ISA namespaces, and uses data structures within it. Rather than try to pretend that these structures are generic, we'll instead just use X86ISA instead of TheISA. If this really does become generic in the future, a base class with the ISA agnostic essentials defined in it can be used instead, and the ISA specific TLBs can defined their own derived class which has whatever else they need. Really the compute unit shouldn't be communicating with the TLB using sender state since those are supposed to be little notes for the sender to keep with a transaction, not for communicating between entities across a port. Change-Id: Ie6573396f6c77a9a02194f5f4595eefa45d6d66b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34174 Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com>	2020-10-26 20:32:43 +00:00
Gabe Black	463cb28ca5	misc: Use compiler.hh macros when available. Some places were hand coding __attribute__s when macros in compiler.hh were available to do that job. Using the macros helps abstract away compiler specific details and should be used when possible. Change-Id: I94befebcfde2d673e874e9959588f69781bd9021 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35975 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-10-19 05:52:40 +00:00
Kyle Roarty	b20cc7e6d8	gpu-compute,mem-ruby: Properly create/handle WriteCompletePkts There is a flow of packets as so: WriteResp -> WriteReq -> WriteCompleteResp These packets share some variables, in particular senderState and a status vector. One issue was the WriteResp packet decremented the status vector, which was used by the WriteCompleteResp packets to determine when to handle the global memory response. This could lead to multiple WriteCompleteResp packets attempting to handle the global memory response. Because of that, the WriteCompleteResp packets needed to handle the status vector. this patch moves WriteCompleteResp packet handling back into ComputeUnit::DataPort::processMemRespEvent from ComputeUnit::DataPort::recvTimingResp. This helps remove some redundant code. This patch has the WriteResp packet return without doing any status vector handling, and without deleting the senderState, which had previously caused a segfault. Another issue was WriteCompleteResp packets weren't being issued for each active lane, as the coalesced request was being issued too early. In order to fix that, we have to ensure every active lane puts their request into their applicable coalesced request before issuing the coalesced request. Because of that change, we change the issuing of CoalescedRequests from GPUCoalescer::coalescePacket to GPUCoalescer::completeIssue. That change involves adding a new variable to store the CoalescedRequests that are created in the calls to coalescePacket. This variable is a map from instruction sequence number to coalesced requests. Additionally, the WriteCompleteResp packet was attempting to access physical memory in hitCallback while not having any data, which caused a crash. This can be resolved either by not allowing WriteCompleteResp packets to access memory, or by copying the data from the WriteReq packet. This patch denies WriteCompleteResp packets memory access in hitCallback. Finally, in VIPERCoalescer::writeCompleteCallback there was a map that held the WriteComplete packets, but no packets were ever being removed. This patch removes packets that match the address that was passed in to the function. Change-Id: I9a064a0def2bf6c513f5295596c56b1b652b0ca4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33656 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-10-15 17:52:51 +00:00
Gabe Black	91d83cc8a1	misc: Standardize the way create() constructs SimObjects. The create() method on Params structs usually instantiate SimObjects using a constructor which takes the Params struct as a parameter somehow. There has been a lot of needless variation in how that was done, making it annoying to pass Params down to base classes. Some of the different forms were: const Params & Params & Params * const Params * Params const* This change goes through and fixes up every constructor and every create() method to use the const Params & form. We use a reference because the Params struct should never be null. We use const because neither the create method nor the consuming object should modify the record of the parameters as they came in from the config. That would make consuming them not idempotent, and make it impossible to tell what the actual simulation configuration was since it would change from any user visible form (config script, config.ini, dot pdf output). Change-Id: I77453cba52fdcfd5f4eec92dfb0bddb5a9945f31 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35938 Reviewed-by: Gabe Black <gabeblack@google.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-10-14 12:06:44 +00:00
Matthew Poremba	53807c8276	configs,gpu-compute: Fixes to connect gmTokenPort When the TokenPort was moved from the GCN3 staging branch to develop the TokenPort was changed from being the port connecting the ComputeUnit to Ruby's vector memory port to a sideband port which inhibits requests to Ruby's vector memory port. As such, it needs to be explicitly connected as a new port. This changes the getPort method in ComputeUnit to be aware of the port as well as modifying the example config to connect to TCPs. The iteration to connect in the config file was modified since it was not properly connecting to TCPs each time and Ruby.py does not explicitly return a list of each MachineType. Change-Id: Ia70a6756b2af54d95e94d19bec5d8aadd3c2d5c0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35096 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-09-30 20:19:21 +00:00
Gabe Black	b877efa6d4	misc: Update attribute syntax, and reorganize compiler.hh. This change replaces the __attribute__ syntax with the now standard [[]] syntax. It also reorganizes compiler.hh so that all special macros have some explanatory text saying what they do, and each attribute which has a standard version can use that if available and what version of c++ it's standard in is put in a comment. Also, the requirements as far as where you put [[]] style attributes are a little more strict than the old school __attribute__ style. The use of the attribute macros was updated to fit these new, more strict requirements. Change-Id: Iace44306a534111f1c38b9856dc9e88cd9b49d2a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35219 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-09-28 21:52:59 +00:00
Gabe Black	50a0b85367	arm,base,gpu: Use std::make_unique instead of m5::make_unique. Now that we're using c++14, we can just assume that std::make_unique exists. We no longer have to conditionally inject our own version. Change-Id: I5d851afb02dd05c7af93864ffec3b3184f3d4ec8 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35215 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-09-28 05:41:08 +00:00
Kyle Roarty	347d7644eb	gpu-compute: replace uint32_t* casts with bits API calls The uint32_t* casting was challenging to fully understand what was being done at a glance. Replaced with calls to various bits functions as it's functionally equivalent and much more clear. This also fixes a segfault in GPUInitAbi DPRINTFs from a mis-typed uint32_t* cast. Change-Id: Id5d1863942848dd7a9e5e17e8180c33adbc72f15 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34677 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-09-24 14:53:16 +00:00
Gabe Black	24e87cb1c5	gpu: Stop using TheISA in the GPU TLB. This class is defined inside the X86ISA namespace, so there's no point in pretending it's generic. Remove TheISA and let the code access what it needs from X86ISA naturally since it's there already. Change-Id: I21b5d2d2b9af6aa0c10ddbb5b3ddca1692188dcc Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34173 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2020-09-18 13:48:45 +00:00
Kyle Roarty	be3bcd1629	gpu-compute: Fix deadlock in fetch_unit after branch instruction The following deadlock was occuring in fetch_unit w/timingSim: 1. exec() is called, a wave is ready to fetch, so it sets pendingFetch 2. A packet is sent to ITLB to fetch for that wave 3. The wave executes a branch, causing the fetch buffer to be cleared 4. The packet is handled, and fetch() is called. However, because the fetch buffer was cleared, it returns doing nothing. 5. exec() gets called again, but the wave will never be scheduled to fetch, as pendingFetch is still set to true. This patch clears pendingFetch (and dropFetch) before returning in fetch() when the fetch buffer has been cleared. dropFetch needed to be cleared otherwise gem5 would crash. Change-Id: Iccbac7defc4849c19e8b17aa2492da641defb772 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34555 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-09-17 21:24:19 +00:00
Gabe Black	49a41da964	gpu: Fix a syntax error in X86GPUTLB.py. The recent changes which removed master/slave terminology also accidentally deleted an "=", making the syntax in that file illegal. Change-Id: I50aa945f0f66765db36775380b98a88caff23c13 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34576 Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com>	2020-09-16 06:08:14 +00:00
Shivani Parekh	392c1ced53	misc: Replaced master/slave terminology Change-Id: I4df2557c71e38cc4e3a485b0e590e85eb45de8b6 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33553 Maintainer: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com>	2020-09-10 23:02:28 +00:00
Kyle Roarty	b00b986353	misc: Use VPtr in hsa_driver.cc This change updates HSADriver::allocateQueue to take in a ThreadContext pointer as opposed to a PortProxy ref. This allows the TypedBufferArg to be replaced with VPtr. This also fixes building GCN3_X86 Change-Id: I1fea26b10c7344daf54a0cb05337e961f834a5fd Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33655 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Gabe Black <gabeblack@google.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-08-31 17:44:11 +00:00
Gabe Black	1d755b4ba1	misc: Clean up usage of arch/isa_traits.hh. isa_traits.hh used to have much more in it, but now it only has PageShift, PageBytes, and (for now) the guest endianness. These values should only be retrieved from the System class generally speaking, so only the system class should include arch/isa_traits.hh. Some gpu compute related files need PageBytes or PageShift. Even though those files don't advertise their ISA dependence, they are tied to x86. In those files, they can include arch/x86/isa_traits.hh. The only other file which legitimately needs arch/isa_traits.hh is the decoder cache since it uses PageBytes to size an array. Change-Id: I12686368715623e3140a68a7027c136bd52567b1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33203 Reviewed-by: Gabe Black <gabeblack@google.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-08-28 07:20:58 +00:00
Tony Gutierrez	94000aefe6	gpu-compute: Create CU's ports in the standard way The CU would initialize its ports in getMasterPort(), which is not desirable as getMasterPort() may be called several times for the same port. This can lead to a fatal if the CU expects to only create a single port of a given type, and may lead to other issues where stat names are duplicated. This change instantiates and initializes the CU's ports in the CU constructor using the CU params. The index field is also removed from the CU's ports because the base class already has an ID field, which will be set to the default value in the base class's constructor for scalar ports. It doesn't make sense for scalar port's to take an index because they are scalar, so we let the base class initialize the ID to the invalid port ID. Change-Id: Id18386f5f53800a6447d968380676d8fd9bac9df Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32836 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-08-27 16:31:46 +00:00
Emily Brickey	6333e914d3	gpu-compute: update port terminology Change-Id: I3121c4afb1e137aebe09c1d694e9484844d02b9b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32313 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Poremba <chesp3@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-08-26 16:48:13 +00:00
Kyle Roarty	b872f02ab1	configs,gpu-compute,mem-ruby: connect gmTokenPorts in apu_se This patch adds gmTokenPorts to the ComputeUnit and RubyGPUCoalescer python classes so the gmTokenPorts can be connected in apu_se. Change-Id: Icf3cb05c757754d6935b46f14e4b1b1d5072c4ca Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32677 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-08-18 23:47:16 +00:00
Gabe Black	40e8cac306	misc: Make registerExitCallback use CallbackQueue2. Issue-on: https://gem5.atlassian.net/browse/GEM5-698 Change-Id: I526d4a19ca4e54a6469a4ee26693c1c0400fcc70 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32644 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-08-18 11:49:06 +00:00
Matthew Poremba	9b95f32b12	arch-gcn3,gpu-compute: Fix GCN3 related compiler errors Fix all errors that were revealed using the util/compiler-test.sh script. Change-Id: Ie0d35568624e5e1405143593f0677bbd0b066b61 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31154 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-07-20 14:53:13 +00:00
Tony Gutierrez	4d737462c2	gpu-compute, arch-gcn3: Change how waitcnts are implemented Use single counters per memory operation type and increment them upon issue, not execute. Change-Id: I6afc0b66b21882538ef90a14a57a3ab3cc7bd6f3 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29973 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-07-17 16:36:23 +00:00
Tony Gutierrez	63c76448eb	gpu-compute: Add pipeline stage interface classes This change separates the pipeline stage interfaces for the GPU's compute unit into their own classes with a well-defined interface. This helps to create a cleaner interface for users to extend the CU pipeline's capabilities and also helps consolidate all the pipeline communication code in one place in the source. Change-Id: I569d52bce84dc1b9fbf8f0f96d53a81a2b6773c6 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29972 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-07-17 16:36:09 +00:00
Alexandru Dutu	7d50d5d972	gpu-compute: No RF scheduling in case of SKIP or EMPTY In case of flat memory instructions the status for the LM pipe execution unit is set to SKIP or EMPTY, as the bus between the VRF and the GM and LM pipe is shared. The destination operands should not be scheduled for the LM pipe, event if the wave is in the dispatch list. This can lead to deadlock in the destination cache as DCEs are reused and the slotsAvailableForBank count gets artificially incremented. Change-Id: I2230c53e3bc1032d2cccbe00fab62c99ab8de6cd Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29970 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2020-07-17 16:34:59 +00:00
Tony Gutierrez	5f0378b8d0	gpu-compute: Use refs to CU in pipe stages/mem pipes The pipe stages and memory pipes are changed to store a reference to their parent CU as opposed to a pointer. These objects will never change which CU they belong to, and they are constructed by their parent CU. Change-Id: Ie5476e1e2e124a024c2efebceb28cb3a9baa78c1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29969 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-07-17 16:34:36 +00:00
Michael LeBeane	83fe4754e7	gpu-compute: Fix Y-dimension ABI decode We currently have a bug in decoding workitem ID from the kernel descriptor with multiple dimensions. The enable_vgpr_workitem_id bits are currently seperated into x and y components, when they should be treated as a single 2 bit value, where y is enabled when it is > 0, and z is enabled when it is > 1. The current setup allows a kernel launch with vgprs reserved for the z dimension and not the y dimension, which is incorrect. Change-Id: Iee64b207feb95bcf064898d5db33b8f201e25323 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29965 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2020-07-17 16:32:56 +00:00
Tony Gutierrez	f64ff89212	gpu-compute: Don't track vector store insts in CU's headTailMap This change fixes a memory leak due to live GPUDynInstPtr references to vector store insts being stored in the CU's headTailMap and never released. This happened because store insts are not supposed to have their head-tail latencies tracked by the headTailMap; instead they use timing information from the GPUCoalescer. When updating the headTailLatency stat via the headTailMap, only loads were considered and removed from the headTailMap, however when inserting into the headTailMap loads and stores were considered, thus leading to the memory leak. This change fixes the issue by only adding loads to the headTailMap. Change-Id: I8a8f5b79f55e00481ae5e82519a9ed627a7ecbd1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29963 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2020-07-17 16:32:06 +00:00
Michael LeBeane	1d816250f8	gpu_compute: Support loading BLIT kernels The BLIT kernels used to implement DMA through the shaders don't fill out all of the standard fields in an amd_kernel_code_t object. This patch modifies the code object parsing logic to support these new kernels. BLIT kernels are used in APUs when using ROCm memcopies for certain size buffers, and are used for dGPUs when the SDMA engines are disabled. Change-Id: Id4e667474d05e311097dbec443def07dfad14a79 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29959 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2020-07-17 16:13:59 +00:00
Matt Sinclair	13079629a1	arch-gcn3: convert vALU instruction counters from 32 to 64-bit The vALU instruction counters were previously 32 bits, but for some workloads this value wraps around and triggers an assert failure because the max vALU operations are reached. To resolve this, this commit increases the counter size to 64 bits. Change-Id: I90ed4514669485cfea7ccc37ba9d69665277bccb Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29950 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2020-07-16 20:37:22 +00:00
Tony Gutierrez	0c5d671ea1	gpu-compute: Init CU object for pipe stages in their ctors This change updates the constructors of the CU's pipe stages/memory pipelines to accept a pointer to their parent CU. Because the CU creates these objects, and can pass a pointer to itself to these object via their constructors, this is the safer way to initalize these classes. Change-Id: I0b3732ce7c03781ee15332dac7a21c097ad387a4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29945 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-07-16 20:37:22 +00:00
Tony Gutierrez	af621cd6e6	gpu-compute, arch-gcn3: refactor barriers Barriers were not modeled properly. Firstly, barriers were allocated to each WG that was launched, which is not correct, and the CU would provide an infinite number of barrier slots. There are a limited number of barrier slots per CU in reality. In addition, the CU will not allocate barrier slots to WGs with a single WF (nothing to sync if only one WF). Beyond modeling problems, there also the issue of deadlock. The barrier could deadlock because not all WFs are freed from the barrier once it has been satisfied. Instead, we relied on the scoreboard stage to release them lazily, one-by-one. Under this implementation the scoreboard may not fully release all WFs participating in a barrier; this happens because the first WF to be freed from the barrier could reach an s_barrier instruction again, forever causing the barrier counts across WFs to be out-of-sync. This change refactors the barrier logic to: 1) Create a proper barrier slot implementation 2) Enforce (via a parameter) the number of barrier slots on the CU. 3) Simplify the logic and cleanup the code (i.e., we no longer iterate through the entire WF list each time we check if a barrier is satisfied). 4) Fix deadlock issues. Change-Id: If53955b54931886baaae322640a7b9da7a1595e0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29943 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-07-16 20:37:22 +00:00
Tony Gutierrez	701f026ba5	gpu-compute: Fix LDS out-of-bounds behavior The LDS is capable of handling out-of-bounds accesses, that is, accesses that are outside the bounds of the chunk allocated to a WG. Currently, the simulator asserts on these accesses. This patch changes the behavior of the LDS to return 0 for reads and dropping writes that are out-of-bounds. Change-Id: I5f467d0f52113e8565e1a3029e82fb89cc6f07ea Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29940 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2020-07-16 20:37:22 +00:00
Xianwei Zhang	024f978cff	gpu-compute: enable kernel-end WB functionality Change-Id: Ib17e1d700586d1aa04d408e7b924270f0de82efe Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29938 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com>	2020-07-13 23:32:37 +00:00

1 2 3

142 Commits