From 92d920f99419dc6b3f452175c53738132443c065 Mon Sep 17 00:00:00 2001
From: Matt Sinclair
Date: Wed, 15 Mar 2023 17:06:42 -0500
Subject: [PATCH] mem-ruby: fix load deadlock with WB GPU L2 caches

By default the GPU VIPER coherence protocol uses a WT L2 cache.
However it has support for using WB caches (although this is not
tested currently). When using a WB L2 cache for the GPU, this results
in deadlocks with loads. Specifically, when a load reaches the L2 and
the line is currently in the W state, that line must be written back
before the load can be performed. However, the current transition for
this in the L2 did not attempt to retry the load when the WB
completes, resulting in a deadlock.

This deadlock can be replicated by running the GPU Ruby random tester
as is with a WB L2 cache instead of a WT L2 cache.

To fix this, this change modifies the transition in question to put
the load on the stalled requests buffer, which the WBAck will check
when it returns to the L2 (and thus perform the load).

This fix has been tested and verified with both the per-checkin and
nightly GPU Ruby Random tester tests (with a WB L2 cache).

Change-Id: Ieec4f61a3070cf9976b8c3ef0cdbd0cc5a1443c6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/68977
Reviewed-by: Matthew Poremba
Maintainer: Bobby Bruce
Tested-by: kokoro
---
 src/mem/ruby/protocol/GPU_VIPER-TCC.sm | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
index 0f93339827..0b7f5ed9ad 100644
--- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
+++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
@@ -718,10 +718,13 @@ machine(MachineType:TCC, "TCC Cache")
     p_popRequestQueue;
   }
   transition(W, RdBlk, WI) {TagArrayRead, DataArrayRead} {
-    p_profileHit;
     t_allocateTBE;
     wb_writeBack;
-    p_popRequestQueue;
+    // need to try this request again after writing back the current entry -- to
+    // do so, put it with other stalled requests in a buffer to reduce resource
+    // contention since they won't try again every cycle and will instead only
+    // try again once woken up
+    st_stallAndWaitRequest;
   }
 
   transition(I, RdBlk, IV) {TagArrayRead} {
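
The deadlock and the fix described in the commit message can be illustrated with a small toy model. This is only a sketch in Python, not SLICC and not gem5 code: the class `ToyL2`, the `retry_after_writeback` flag, and the `run` helper are hypothetical names made up for illustration, and the model assumes a single cache line, a single load, and a writeback ack that always arrives. With the flag off, the load is popped and forgotten when the writeback starts (the old behaviour); with it on, the load is parked in a stalled buffer and re-queued when the ack arrives (the new behaviour).

```python
from collections import deque


class ToyL2:
    """Hypothetical, heavily simplified stand-in for one line in the TCC."""

    def __init__(self, retry_after_writeback):
        self.state = "W"  # line starts out dirty (written) in the L2
        self.retry_after_writeback = retry_after_writeback
        self.requests = deque()   # incoming core requests
        self.stalled = deque()    # requests parked until woken up
        self.completed = []       # loads that actually got serviced

    def handle_request(self):
        """Process the request at the head of the request queue."""
        req = self.requests.popleft()
        if self.state == "W":
            # Dirty line: start the writeback and move to WI.
            self.state = "WI"
            if self.retry_after_writeback:
                # New behaviour: park the load so it can be retried later.
                self.stalled.append(req)
            # Old behaviour: the load was popped and nothing retries it.
        else:
            # Assumption: in any other state the load can be serviced.
            self.state = "V"
            self.completed.append(req)

    def handle_wb_ack(self):
        """Writeback finished: invalidate the line and wake stalled loads."""
        assert self.state == "WI"
        self.state = "I"
        while self.stalled:
            # Re-queue at the front, preserving the original order.
            self.requests.appendleft(self.stalled.pop())


def run(retry_after_writeback):
    """Issue one load to a line in W and report which loads completed."""
    l2 = ToyL2(retry_after_writeback)
    l2.requests.append("load A")
    l2.handle_request()   # W + RdBlk -> WI
    l2.handle_wb_ack()    # WI + WBAck -> I
    while l2.requests:
        l2.handle_request()
    return l2.completed


if __name__ == "__main__":
    print("without retry:", run(False))  # load never completes
    print("with retry:   ", run(True))   # load completes after the ack
```

In the model, `run(False)` returns an empty list because the only load was dropped when the writeback began, which corresponds to the requester waiting forever. `run(True)` returns the load, since the ack handler moves it back onto the request queue.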