mem-ruby: fix load deadlock with WB GPU L2 caches

By default, the GPU VIPER coherence protocol uses a write-through (WT) L2 cache. It also supports write-back (WB) caches, although this configuration is not currently tested. Using a WB L2 cache for the GPU results in deadlocks on loads. Specifically, when a load reaches the L2 and the line is in the W state, that line must be written back before the load can be performed. However, the existing L2 transition for this case does not retry the load when the writeback completes, resulting in a deadlock.

This deadlock can be reproduced by running the GPU Ruby random tester as is with a WB L2 cache instead of a WT L2 cache.

To fix this, this change modifies the transition in question to put the load on the stalled requests buffer, which the WBAck checks when it returns to the L2 (and thus performs the load).

This fix has been tested and verified with both the per-checkin and nightly GPU Ruby random tester tests (with a WB L2 cache).

Change-Id: Ieec4f61a3070cf9976b8c3ef0cdbd0cc5a1443c6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/68977
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-03-15 17:06:42 -05:00
parent fb4eb86711
commit 92d920f994
1 changed file with 5 additions and 2 deletions
--- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
+++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
@@ -718,10 +718,13 @@ machine(MachineType:TCC, "TCC Cache")
    p_popRequestQueue;
  }
  transition(W, RdBlk, WI) {TagArrayRead, DataArrayRead} {
-    p_profileHit;
    t_allocateTBE;
    wb_writeBack;
-    p_popRequestQueue;
+    // need to try this request again after writing back the current entry -- to
+    // do so, put it with other stalled requests in a buffer to reduce resource
+    // contention since they won't try again every cycle and will instead only
+    // try again once woken up
+    st_stallAndWaitRequest;
  }

  transition(I, RdBlk, IV) {TagArrayRead} {
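
The difference between popping the load and stalling it in the `W`/`RdBlk` transition can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not gem5 or SLICC code: the class `ToyL2Line`, the helper `run`, and the flag `stall_on_writeback` are hypothetical names, the state names only loosely mirror `W`, `WI`, `I` and `V`, and a fill from memory is assumed to complete immediately.

```python
from collections import deque


class ToyL2Line:
    """Toy model of a single L2 line (hypothetical, not gem5 code)."""

    def __init__(self, stall_on_writeback):
        self.state = "W"                  # line starts out holding dirty data
        self.stall_on_writeback = stall_on_writeback
        self.request_queue = deque()      # incoming load requests
        self.stalled = deque()            # requests parked until woken up
        self.completed = []               # loads that received data

    def step(self):
        """Process the request at the head of the queue, if any."""
        if not self.request_queue:
            return
        req = self.request_queue[0]
        if self.state == "W":
            # Dirty line: the writeback must start before the load is served.
            self.state = "WI"
            self.request_queue.popleft()
            if self.stall_on_writeback:
                # New behavior: park the load so it can be retried later.
                self.stalled.append(req)
            # Old behavior: the load was popped and nothing retries it.
        elif self.state == "I":
            # Clean miss: assume the fill from memory completes at once.
            self.state = "V"
            self.request_queue.popleft()
            self.completed.append(req)

    def wb_ack(self):
        """Writeback acknowledged: invalidate the line and wake stalled loads."""
        assert self.state == "WI"
        self.state = "I"
        while self.stalled:
            self.request_queue.append(self.stalled.popleft())


def run(stall_on_writeback):
    """Issue one load to a dirty line and report which loads completed."""
    line = ToyL2Line(stall_on_writeback)
    line.request_queue.append("load A")
    line.step()      # load reaches the line in W; writeback starts
    line.wb_ack()    # writeback completes
    line.step()      # a woken-up load, if any, is retried here
    return line.completed


if __name__ == "__main__":
    print("pop request (old):", run(stall_on_writeback=False))
    print("stall request (new):", run(stall_on_writeback=True))
```

In this model the load never completes when the request is simply popped, while stalling it lets the writeback acknowledgement wake it so that it is retried against the now-invalid line.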