
derek/gem5

Author	SHA1	Message	Date
Matt Sinclair	a030ff2745	mem-ruby: fix atomic deadlock with WB GPU L2 caches By default the GPU VIPER coherence protocol uses a WT L2 cache. However, it has support for using WB caches (although this is not tested currently). When using a WB L2 cache for the GPU, this results in deadlocks with atomics. Specifically, when an atomic reaches the L2 and the line is currently in M or W, the line must be written back before the atomic can be performed. However, the current support has two issues: a) it never performs the atomic operation -- while VIPER currently assumes all atomics are system scope atomics and thus cannot be performed at the L2 and this transition requires the dirty line be written back before performing the atomic, the transition never performs the atomic nor does the response path handle it. b) putting the atomic action right after the write back is not safe because we need to ensure the requests are ordered when they reach memory -- thus we have to wait until the write back is acknowledged before it's safe to send/perform the atomic. To fix this, this change modifies the transition in question to put the atomic on the stalled requests buffer, which the WBAck will check when it returns to the L2 (and thus perform the atomic, which will result in the atomic being sent on to the directory). This fix has been tested and verified with both the per-checkin and nightly GPU Ruby Random tester tests (with a WB L2 cache). Change-Id: I9a43fd985dc71297521f4b05c47288d92c314ac7 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/68978 Maintainer: Bobby Bruce <bbruce@ucdavis.edu> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-03-22 04:00:38 +00:00
Matt Sinclair	92d920f994	mem-ruby: fix load deadlock with WB GPU L2 caches By default the GPU VIPER coherence protocol uses a WT L2 cache. However, it has support for using WB caches (although this is not tested currently). When using a WB L2 cache for the GPU, this results in deadlocks with loads. Specifically, when a load reaches the L2 and the line is currently in the W state, that line must be written back before the load can be performed. However, the current transition for this in the L2 did not attempt to retry the load when the WB completes, resulting in a deadlock. This deadlock can be replicated by running the GPU Ruby random tester as is with a WB L2 cache instead of a WT L2 cache. To fix this, this change modifies the transition in question to put the load on the stalled requests buffer, which the WBAck will check when it returns to the L2 (and thus perform the load). This fix has been tested and verified with both the per-checkin and nightly GPU Ruby Random tester tests (with a WB L2 cache). Change-Id: Ieec4f61a3070cf9976b8c3ef0cdbd0cc5a1443c6 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/68977 Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Bobby Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com>	2023-03-22 04:00:38 +00:00
Vishnu Ramadas	c23d7bb3ee	gpu-compute, mem-ruby: Add p_popRequestQueue to some transitions Two W->WI transitions in the GPU L2 cache coherence protocol, on events RdBlk and Atomic, do not clear the request from the request queue upon completing the transition. This action is not performed in the response path either. This update adds the p_popRequestQueue action to each of these transitions to remove the stale request from the queue. Change-Id: Ia2679fe3dd702f4df2bc114f4607ba40c18d6ff1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67192 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-01-05 23:41:00 +00:00
Vishnu Ramadas	ddf43726ef	gpu-compute, mem-ruby: Update GPU cache bypassing to use TBE An earlier commit added support for GLC and SLC AMDGPU instruction modifiers. These modifiers enable cache bypassing when set. The GLC/SLC flag information was being threaded through all the way to memory and back so that appropriate actions could be taken upon receiving a request and corresponding response. This commit removes the threading and adds the bypass flag information to TBE. Requests populate this entry and responses access it to determine the correct set of actions to execute. Change-Id: I20ffa6682d109270adb921de078cfd47fb4e137c Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67191 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>	2023-01-05 23:38:32 +00:00
Vishnu Ramadas	66d4a15820	gpu-compute,mem-ruby: Add support for GPU cache bypassing The GPU cache models do not support cache bypassing when the GLC or SLC AMDGPU instruction modifiers are used in a load or store. This commit adds cache bypass support by introducing new transitions in the coherence protocol used by the GPU memory system. Now, instructions with the GLC bit set will not cache in the L1, and instructions with the SLC bit set will not cache in the L1 or L2. Change-Id: Id29a47b0fa7e16a21a7718949db802f85e9897c3 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66991 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-01-03 21:19:24 +00:00
Kyle Roarty	f876e60bc2	mem-ruby: Fix deadlock in GPU VIPER TCC A deadlock occurred where we got a RdBlk while in W, which put us in WI while we waited for a writeback to complete. This caused the request to be stalled while the writeback was occurring, but when the writeback completed (WBAck), we never woke up the requests and thus never completed the RdBlk. This commit adds a wakeup when we receive a WBAck while in WI. Change-Id: I01edf1d7a47757b4f680baf9f33a1a6aa37e7e25 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/59352 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-06-06 18:28:52 +00:00
Matthew Poremba	9313294efe	misc: Remove AMD license addition Remove the line "For use for simulation and test purposes only" in files where AMD is the only copyright holder listed in the header. This happens to be the case for all files where this line exists, removing it completely from gem5. Change-Id: I623f266b002f564301b28774f49081099cfc60fd Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53943 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-12-11 04:00:56 +00:00
Matt Sinclair	118677218d	mem-ruby: fix typo in GPU VIPER TCC comment `72ee6d1a` fixed a deadlock in the GPU VIPER TCC. However, it inadvertently added a typo to the comments explaining the change. This commit fixes that. Change-Id: Ibba835aa907be33fc3dd8e576ad2901d5f8f509c Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51687 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-10-17 04:07:49 +00:00
Matt Sinclair	1120931105	mem-ruby: Move VIPER TCC decrements to action from in_port Currently, the GPU VIPER TCC protocol handles races between atomics in the triggerQueue_in. This in_port does not check for resource availability, which can cause the trigger queue to execute multiple times. Although this is the expected behavior, the code for handling atomic races decrements the atomicDoneCnt flag in the trigger queue, which is not safe since resource contention may cause it to execute multiple times. To resolve this issue, this commit moves the decrementing of this counter to a new action that is called in an event that happens only when the race between atomics is detected. Change-Id: I552fd4f34fdd9ebeec99fb7aeb4eeb7b150f577f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51368 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-10-08 22:03:13 +00:00
Matt Sinclair	72ee6d1aad	mem-ruby: Update GPU VIPER TCC protocol to resolve deadlock In the GPU VIPER TCC, programs with mixes of atomics and data accesses to the same address, in the same kernel, can experience deadlock when large applications (e.g., Pannotia's graph analytics algorithms) are running on very small GPUs (e.g., the default 4 CU GPU configuration). In this situation, deadlocks occur due to resource stalls interacting with the behavior of the current implementation for handling races between atomic accesses. The specific order of events causing this deadlock is: 1. The TCC is waiting on an atomic to return from the directory. 2. In the meantime it receives another atomic to the same address -- when this happens, the TCC increments the number of atomics to this address (numAtomics = 2) that are pending in the TBE, and does a write through of the atomic to the directory. 3. When the first atomic returns from the Directory, it decrements the numAtomics counter. numAtomics was at 2 though, because of step #2. So it doesn't deallocate the TBE entry and calls Event:AtomicNotDone. 4. Another request (a LD) to the same address arrives. The LD does z_stall since the second atomic is pending -- so the LD retries every cycle until the deadlock counter times out (or until the second atomic comes back). 5. The second atomic returns to the TCC. However, because there are so many LDs pending in the cache, all doing z_stall and retrying every cycle, there are a lot of resource stalls. So, when the second atomic returns, it is forced to retry its operation multiple times -- and each time it decrements the atomicDoneCnt flag (which was added to catch a race between atomics arriving and leaving the TCC in `7246f70bfb`) repeatedly. As a result, atomicDoneCnt becomes negative. 6. Since this atomicDoneCnt flag is used to determine when Event:AtomicDone happens, and since the resource stalls caused the atomicDoneCnt flag to become negative, we never complete the atomic, which means the pending LD can never access the line, because it is stuck waiting for the atomic to complete. 7. Eventually the deadlock threshold is reached. To fix this issue, this commit changes the VIPER TCC protocol from using z_stall to using the stall_and_wait buffer method that the Directory-level of the SLICC already uses. This change effectively prevents resource stalls from dominating the TCC level, by putting pending requests for a given address in a per-address stall buffer. These requests are then woken up when the pending request returns. As part of this fix, this change also makes two small changes to the Directory-level protocol (MOESI_AMD_BASE-dir): 1. Updated the names of the wakeup actions to match the TCC wakeup actions, to avoid confusion. 2. Changed transition(B, UnblockWriteThrough, U) to check all stall buffers, as some requests were being placed later in the stall buffer than was being checked. This mirrors the changes in `187c44fe44` to other Directory transitions to resolve races between GPU and DMA requests, but for transitions prior workloads did not stress. Change-Id: I60ac9830a87c125e9ac49515a7fc7731a65723c2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51367 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-10-08 22:03:13 +00:00
Matt Sinclair	0eef1069cb	ruby: fix typo in VIPER TCC triggerQueue The GPU VIPER TCC protocol accidentally used "TiggerMsg" instead of "TriggerMsg" for the triggerQueue_in port. This was a benign bug because the msg type is not used in the in_port implementation, but it still makes the SLICC harder to understand, so fixing it is worthwhile. Change-Id: I88cbc72bac93bcc58a66f057a32f7bddf821cac9 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/44905 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-04-28 01:25:45 +00:00
Matthew Poremba	7246f70bfb	mem-ruby: Fix race related to atomics in VIPER There is a race condition in VIPER where multiple atomics issued to the same address can result in multiple trigger messages signalling the completion of the atomic operation. The first message was deallocating the TBE, causing the second message to dereference a nullptr when looking up the TBE. A counter is added to track the number of in-flight AtomicDone trigger messages. AtomicDone is not called until the last in-flight message arrives at the trigger queue. The remaining messages call AtomicNotDone, which simply pops the message from the queue and keeps the TBE allocated. Change-Id: Ie1de0436861a7c393ad6d2fb2faceb83c18d4cc3 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39175 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-01-15 17:46:38 +00:00
Hoa Nguyen	4c42811ff3	mem-ruby: Move CacheMemory stats used in SLICC to a Stats group This change moves some stats that are used in SLICC to a separate Stats::Group. In order to use stats in SLICC, new functions are added in CacheMemory: - profileDemandHit() - profileDemandMiss() The functions increase the corresponding stat by 1. Change-Id: I52b6fefdf6579a49f626f2fca400641f90800017 Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37815 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Tiago Mück <tiago.muck@arm.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com>	2020-12-22 09:52:36 +00:00
Kyle Roarty	1339a1b080	mem-ruby: add cache hit/miss statistics for TCP and TCC Change-Id: Ifa6fdbb9dd062a3684b9620eac6683c57e651a72 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30174 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com> Maintainer: Bradford Beckmann <brad.beckmann@amd.com>	2020-06-20 04:20:45 +00:00
Tuan Ta	18ebe62598	mem-ruby: GCN3 and VIPER integration This patch modifies the Coalescer and VIPER protocol to support memory synchronization requests and write-completion responses that are required by the upcoming GCN3 implementation. The VIPER protocol is simplified to be a solely write-through protocol. Change-Id: Iccfa3d749a0301172a1cc567c59609bb548dace6 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29913 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Bradford Beckmann <brad.beckmann@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-19 20:32:54 +00:00
Gabe Black	c08351f4d3	mem: Move ruby protocols into a directory called ruby_protocol. Now that the gem5 protocols are split out, it would be nice to put them in their own protocol directory. It's also confusing to have files called *_protocol which are not in the protocol directory. Change-Id: I7475ee111630050a2421816dfd290921baab9f71 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20230 Reviewed-by: Gabe Black <gabeblack@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2019-08-23 21:13:07 +00:00

16 Commits
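Several commits above (a030ff2745, 92d920f994, f876e60bc2, 72ee6d1aad) share one pattern: rather than retrying a blocked request every cycle (z_stall), the TCC parks it in a per-address stall buffer and wakes it up when the blocking event, such as the WBAck, returns. The Python sketch below is a toy model of that pattern, not gem5 or SLICC code. The class `ToyL2`, its message tuples, and the choice that a line goes back to `I` after the WBAck are hypothetical simplifications. It also shows the ordering point from a030ff2745: the atomic is forwarded toward the directory only after the write back is acknowledged.

```python
from collections import defaultdict, deque


class ToyL2:
    """Hypothetical toy model of stall-and-wait with wakeup on WBAck (not gem5 code)."""

    def __init__(self):
        self.state = {}                     # addr -> "I" | "W" | "WI" (assumed subset of states)
        self.stalled = defaultdict(deque)   # per-address stall buffer
        self.to_memory = []                 # messages sent toward the directory, in order
        self.completed = []                 # requests that have been performed

    def request(self, kind, addr):
        st = self.state.get(addr, "I")
        if st == "WI":
            # Write back in flight: park the request instead of retrying every cycle.
            self.stalled[addr].append((kind, addr))
        elif st == "W" and kind in ("Load", "Atomic"):
            # Dirty line: start the write back, then park the request until the WBAck.
            self.to_memory.append(("WriteBack", addr))
            self.state[addr] = "WI"
            self.stalled[addr].append((kind, addr))
        else:
            self.perform(kind, addr)

    def perform(self, kind, addr):
        if kind == "Atomic":
            # System-scope atomic: not performed at the L2, forwarded to the directory.
            self.to_memory.append(("Atomic", addr))
        self.completed.append((kind, addr))

    def wb_ack(self, addr):
        assert self.state.get(addr) == "WI", "WBAck only expected while in WI"
        self.state[addr] = "I"  # assumption: line is invalid once written back
        # Wake up every request parked on this address, in arrival order.
        for kind, a in self.stalled.pop(addr, deque()):
            self.request(kind, a)


l2 = ToyL2()
l2.state[0x40] = "W"
l2.request("Atomic", 0x40)
l2.request("Load", 0x40)
l2.wb_ack(0x40)
```

Because the stalled requests are replayed only from `wb_ack`, nothing spins while the write back is outstanding, and the atomic cannot overtake the write back on its way to memory.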
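The atomicDoneCnt history (7246f70bfb, 72ee6d1aad, 1120931105) turns on where the counter is decremented. The following minimal Python sketch assumes a simplified in_port that runs again every cycle while its chosen transition is blocked by a resource stall. The names `TBE` and `run_trigger` are hypothetical. Decrementing inside the in_port body lets repeated stalls drive the counter negative, so AtomicDone never fires. Decrementing in an action that runs once, when the AtomicNotDone transition actually fires, avoids this.

```python
class TBE:
    """Hypothetical TBE reduced to the one field relevant here."""

    def __init__(self, atomic_done_cnt):
        self.atomic_done_cnt = atomic_done_cnt  # AtomicDone trigger messages still in flight


def run_trigger(tbe, stall_pattern, decrement_in_port):
    """Process one trigger message.

    stall_pattern[i] is True if cycle i suffers a resource stall, so the chosen
    transition does not fire and the in_port runs again next cycle.
    Returns the event whose transition finally fired, or None if it never fired.
    """
    for stalled in stall_pattern:
        if decrement_in_port:
            # Unsafe: the in_port mutates state, and this re-runs on every retry.
            tbe.atomic_done_cnt -= 1
            event = "AtomicDone" if tbe.atomic_done_cnt == 0 else "AtomicNotDone"
        else:
            # Safe: the in_port only reads state to choose an event.
            event = "AtomicDone" if tbe.atomic_done_cnt == 1 else "AtomicNotDone"
        if stalled:
            continue  # transition blocked; try again next cycle
        if not decrement_in_port and event == "AtomicNotDone":
            tbe.atomic_done_cnt -= 1  # action: executes once, only when the transition fires
        return event
    return None


# Two AtomicDone triggers in flight; the first one hits two resource stalls.
unsafe = TBE(2)
unsafe_events = [run_trigger(unsafe, [True, True, False], True),
                 run_trigger(unsafe, [False], True)]

safe = TBE(2)
safe_events = [run_trigger(safe, [True, True, False], False),
               run_trigger(safe, [False], False)]
```

With the decrement in the in_port, both triggers end as AtomicNotDone and the counter reaches -2, mirroring steps 5 and 6 of 72ee6d1aad. With the decrement in the action, the second trigger correctly fires AtomicDone.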
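Commits 66d4a15820 and ddf43726ef describe the bypass rule (GLC: do not cache in the L1; SLC: do not cache in the L1 or L2) and the move from threading the flags through every message to storing them in the TBE. The Python sketch below illustrates both under stated assumptions. `cache_levels_to_fill`, `BypassTBE` and `ToyTCC` are hypothetical names, and the model only tracks whether the L2 (the TCC in VIPER) keeps the line.

```python
from dataclasses import dataclass


def cache_levels_to_fill(glc: bool, slc: bool) -> set:
    """Levels allowed to keep the line, per the rule stated in 66d4a15820."""
    if slc:
        return set()        # SLC: cache in neither the L1 nor the L2
    if glc:
        return {"L2"}       # GLC: skip the L1 only
    return {"L1", "L2"}     # no modifier: normal caching


@dataclass
class BypassTBE:
    """Hypothetical TBE holding the bypass flags recorded at request time."""
    glc: bool
    slc: bool


class ToyTCC:
    """Hypothetical L2 model: responses carry no flags and read them from the TBE."""

    def __init__(self):
        self.tbes = {}      # addr -> BypassTBE for outstanding requests
        self.lines = set()  # addresses currently held in this L2

    def on_request(self, addr, glc, slc):
        self.tbes[addr] = BypassTBE(glc, slc)  # remember the flags once

    def on_response(self, addr):
        tbe = self.tbes.pop(addr)  # look the flags up instead of reading the message
        if "L2" in cache_levels_to_fill(tbe.glc, tbe.slc):
            self.lines.add(addr)


tcc = ToyTCC()
tcc.on_request(0x80, glc=True, slc=False)
tcc.on_response(0x80)   # GLC: the L2 may still keep the line
tcc.on_request(0xC0, glc=False, slc=True)
tcc.on_response(0xC0)   # SLC: the L2 must not keep the line
```

Keeping the flags in the TBE means only the request path has to know about GLC/SLC, which is the simplification ddf43726ef describes.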