Commit Graph

39 Commits

Author SHA1 Message Date
Jarvis Jia
e957a882ed gpu-compute,mem-ruby: Add RubyHitMiss flag for TCP and TCC cache
Add hit and miss prints for the TCP and TCC caches with the RubyHitMiss debug flag
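
A minimal standalone C++ sketch of the kind of per-access hit/miss print this flag enables; the CacheKind enum and ruby_hit_miss_print helper are illustrative names, not the actual gem5/SLICC code.

    #include <cstdint>
    #include <cstdio>

    // Illustrative stand-ins for the two GPU caches; not gem5 types.
    enum class CacheKind { TCP, TCC };

    // Hypothetical helper: print one line per demand access when the
    // RubyHitMiss-style debug flag is enabled.
    void ruby_hit_miss_print(bool flag_enabled, CacheKind cache,
                             uint64_t line_addr, bool hit)
    {
        if (!flag_enabled)
            return;
        std::printf("%s: addr %#llx %s\n",
                    cache == CacheKind::TCP ? "TCP" : "TCC",
                    static_cast<unsigned long long>(line_addr),
                    hit ? "hit" : "miss");
    }

    int main()
    {
        ruby_hit_miss_print(true, CacheKind::TCP, 0x1000, true);   // TCP hit
        ruby_hit_miss_print(true, CacheKind::TCC, 0x2040, false);  // TCC miss
    }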

Change-Id: I40ae3449020b917f39ac91d29fa4e1dd7c791e7b
2024-06-24 11:19:51 -07:00
Bobby R. Bruce
3138c8a8b1 gpu-compute,mem-ruby: Revert "Add RubyHitMiss flag for TCP and TCC cache" (#1254)
Reverts gem5/gem5#1226
2024-06-18 07:58:54 -07:00
Jarvis Jia
3a2bf47d57 Add default value and change Ruby address format specifier
Change-Id: I8fbaf34745e90589e610d3b9bd423937e7ebdc3d
2024-06-17 03:27:25 -05:00
Jarvis Jia
87c0d7732c Merge branch 'develop' into rubyhitmiss 2024-06-12 17:30:35 -04:00
Jarvis Jia
edfc139c40 Change black format
Change-Id: I3733b31baf187e0d3d38d971d9423a1b1afe2296

gpu-compute: add GPU RubyHitMiss for TCP and TCC

Change-Id: I4430532b901811e03d9b077b61e2eca4557b34e1

gpu-compute: Add RubyHitMiss flag for TCP and TCC cache

Change-Id: I4e5d1127c84b9eb1060ec9ba0b6638267449eda5

Remove space

Change-Id: I401f528c6f128ba0956bdbc232e8f2ae37bf648c
2024-06-12 16:04:36 -05:00
Vishnu Ramadas
42b9a9666e mem-ruby: Add instSeqNum to atomic responses from GPU L2 caches
This commit adds instSeqNum to the atomic responses in
GPU_VIPER-TCC.sm. This will be useful when debugging issues related to
GPU atomic transactions.

Change-Id: Ic05c8e1a1cb230abfca2759b51e5603304aadaa3
2024-06-11 20:35:43 -05:00
Vishnu Ramadas
943d1f1453 mem-ruby: Fix deadlock in GPU_VIPER when issuing atomic requests
When a compute unit issues several requests to the same line,
the requests wait in the L2 if it is a writeback cache. If the line is
invalid initially and the first request is atomic in nature, the L2
cache issues a request to main memory. On data return, the cache line
transitions to M but doesn't wake up the other requests, resulting in
a deadlock. This commit adds a wakeup call on data return for atomics
and fixes potential deadlocks.
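
A standalone C++ sketch of the fix described above, assuming requests to a busy line are parked per address and the data return wakes them; the StallBuffer names are illustrative, not the actual SLICC actions.

    #include <cstdint>
    #include <deque>
    #include <iostream>
    #include <unordered_map>

    // Requests to a line with an outstanding fill are parked per address;
    // the data return completes the first (atomic) request and wakes the
    // rest.
    struct Request { uint64_t line; bool atomic; };

    struct StallBuffer {
        std::unordered_map<uint64_t, std::deque<Request>> stalled;

        void stall(const Request &r) { stalled[r.line].push_back(r); }

        // Called on data return; previously the line moved to M without
        // this wakeup, so the parked requests deadlocked.
        void wakeup(uint64_t line)
        {
            auto it = stalled.find(line);
            if (it == stalled.end())
                return;
            for (const Request &r : it->second)
                std::cout << "retry " << (r.atomic ? "atomic" : "load")
                          << " to line 0x" << std::hex << r.line << "\n";
            stalled.erase(it);
        }
    };

    int main()
    {
        StallBuffer sb;
        sb.stall({0x40, false});   // later request waits behind the atomic miss
        sb.wakeup(0x40);           // data return now wakes it up
    }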

Change-Id: I8200ce6e77da7c8b4db285c0cc8b8ca0dfa7d720
2024-06-11 20:33:46 -05:00
Matthew Poremba
1d64669473 mem,gpu-compute: Implement GPU TCC directed invalidate
The GPU device currently supports large BAR which means that the driver
can write directly to GPU memory over the PCI bus without using SDMA or
PM4 packets. The gem5 PCI interface only provides an atomic interface
for BAR reads/writes, which means the values cannot go through timing
mode Ruby caches. This causes bugs as the TCC cache is allowed to keep
clean data between kernels for performance reasons. If there is a BAR
write directly to memory bypassing the cache, the value in the cache is
stale and must be invalidated.

In this commit a TCC invalidate is generated for all writes over PCI
that go directly to GPU memory. This will also invalidate TCP along the
way if necessary. This currently relies on the driver synchronization
which only allows BAR writes in between kernels. Therefore, the cache
should only be in I or V state.

To handle a race condition between invalidates and launching the next
kernel, the invalidates return a response and the GPU command processor
will wait for all TCC invalidates to be complete before launching the
next kernel.

This fixes issues with stale data in nanoGPT and possibly PENNANT.
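
A standalone C++ sketch of the ordering described above, assuming a simple outstanding-invalidate counter; the class and method names are hypothetical, not the gem5 command processor API.

    #include <cassert>
    #include <cstdint>
    #include <iostream>

    // Every BAR write that bypasses the TCC issues an invalidate; the next
    // kernel may not launch until all invalidate responses have returned.
    class KernelLaunchGate {
        int outstandingInvalidates = 0;
      public:
        void barWrite(uint64_t addr)
        {
            ++outstandingInvalidates;   // directed TCC invalidate issued
            std::cout << "invalidate TCC line for 0x" << std::hex << addr << "\n";
        }
        void invalidateAck()
        {
            assert(outstandingInvalidates > 0);
            --outstandingInvalidates;
        }
        bool canLaunchKernel() const { return outstandingInvalidates == 0; }
    };

    int main()
    {
        KernelLaunchGate gate;
        gate.barWrite(0x1000);
        std::cout << "launch ok? " << gate.canLaunchKernel() << "\n";  // 0: wait
        gate.invalidateAck();
        std::cout << "launch ok? " << gate.canLaunchKernel() << "\n";  // 1: safe
    }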

Change-Id: I8e1290f842122682c271e5508a48037055bfbcdf
2024-04-10 11:35:25 -07:00
Matt Sinclair
777ac91bb0 mem-ruby: Add categorization of bypassed atomics in TCC (#899)
Adds categorization of bypassed atomics in TCC to the TBE as either
return or no-return, which gets consumed in pa_performAtomic to
determine if atomic logs should be stored.

Reestablishes TCC bypassed atomics after #546.
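
A small C++ illustration of the return vs. no-return split recorded in the TBE; the AtomicKind enum, TBE struct, and performAtomicAdd helper are hypothetical names, not the SLICC actions.

    #include <cstdint>
    #include <iostream>

    // Only atomics that must return the old value need a log entry written
    // when the atomic is performed.
    enum class AtomicKind { Return, NoReturn };

    struct TBE { AtomicKind kind = AtomicKind::NoReturn; };

    uint32_t performAtomicAdd(uint32_t &mem, uint32_t operand, const TBE &tbe,
                              bool &logWritten)
    {
        uint32_t old = mem;
        mem += operand;
        logWritten = (tbe.kind == AtomicKind::Return);  // no-return: skip the log
        return old;
    }

    int main()
    {
        uint32_t line = 7;
        bool logged = false;
        TBE tbe{AtomicKind::Return};
        uint32_t old = performAtomicAdd(line, 5, tbe, logged);
        std::cout << "old=" << old << " new=" << line
                  << " logged=" << logged << "\n";      // old=7 new=12 logged=1
    }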

Change-Id: Ibc1fa2b795ef1c47c3893a0b1911fa7993522d38
2024-02-28 14:26:09 -06:00
Daniel Kouchekinia
de615836f0 mem-ruby: Add categorization of bypassed atomics in TCC
Adds categorization of bypassed atomics in TCC to the TBE as either return
or no-return, which gets consumed in pa_performAtomic to determine if
atomic logs should be stored.

Reestablishes TCC bypassed atomics after #546.

Change-Id: Ibc1fa2b795ef1c47c3893a0b1911fa7993522d38
2024-02-27 23:12:45 -06:00
Daniel Kouchekinia
6374697a20 mem-ruby: Add missing transition for SLC writes to VIPER TCC
Bypassed write-through requests on invalid lines in the TCC should be
written through to the directory. This transition was previously
missing.

Change-Id: I16b117c4e085ce6be0ed5297aa0129d52cd35a51
2024-02-26 13:13:06 -06:00
Matthew Poremba
6e433ed885 mem-ruby: Fixes for new AtomicWait event in VIPER TCC (#585)
The AtomicWait event was not being woken up properly due to the
numPending count in the TBE not being decremented. This patch decrements
the count when Data is returned. Since that moves to a base state, the
TBE should no longer be needed.

Additionally added a transition which stalls and waits when an AtomicWait
occurs while in the WI state so that it retries.

Change-Id: Ic8bfc700f9df3f95bea0799121898926a23d8163
2023-11-22 14:05:43 -08:00
Matt Sinclair
c3326c78e6 mem-ruby, gpu-compute: fix SQC/TCP requests to same line
Currently, the GPU SQC (L1I$) and TCP (L1D$) have a performance bug
where they do not behave correctly when multiple requests to the same
cache line overlap one another.  The intended behavior is that if the
first request that arrives at the Ruby code for the SQC/TCP misses, it
should send a request to the GPU TCC (L2$).  If any requests to the
same cache line occur while this first request is pending, they should
wait locally at the L1 in the MSHRs (TBEs) until the first request has
returned.  At that point they can be serviced, and assuming the line
has not been evicted, they should hit.

For example, in the following test (on 1 GPU thread, in 1 WG):

load Arr[0]
load Arr[1]
load Arr[2]

The expected behavior (confirmed via profiling on real GPUs) is that
we should get 1 miss (Arr[0]) and 2 hits (Arr[1], Arr[2]) for such a
program.

However, the current support in the VIPER SQC/TCP code does not model
this correctly.  Instead it lets all 3 concurrent requests go straight
through to the TCC instead of stopping the Arr[1] and Arr[2] requests
locally while Arr[0] is serviced.  This causes all 3 requests to be
classified as misses.

To resolve this, this patch adds support into the SQC/TCP code to
prevent subsequent, concurrent requests to a pending cache line from being
sent in parallel with the original one.  To do this, we add an
additional transient state (IV) to indicate that a load is pending to
this cache line.  If a subsequent request of any kind to the same cache
line occurs while this load is pending, the requests are put on the
local wait buffer and woken up when the first request returns to the
SQC/TCP.  Likewise, when the first load is returned to the SQC/TCP, it
transitions from IV --> V.
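
A toy C++ model of the IV mechanism described above, assuming a per-line wait queue; the states and names are illustrative, not the actual TCP/SQC SLICC.

    #include <cstdint>
    #include <deque>
    #include <iostream>
    #include <string>
    #include <unordered_map>

    // The first load miss moves a line from I to IV; later requests to the
    // same line are parked; the fill moves IV to V and wakes the parked
    // requests so they now hit.
    enum class State { I, IV, V };

    struct Line {
        State state = State::I;
        std::deque<std::string> waiters;
    };

    struct L1 {
        std::unordered_map<uint64_t, Line> lines;

        void load(uint64_t addr, const std::string &who)
        {
            Line &l = lines[addr];
            switch (l.state) {
              case State::I:
                l.state = State::IV;              // miss: request sent to TCC
                std::cout << who << ": miss, request sent\n";
                break;
              case State::IV:
                l.waiters.push_back(who);         // wait locally in the TBE
                std::cout << who << ": pending, parked\n";
                break;
              case State::V:
                std::cout << who << ": hit\n";
                break;
            }
        }

        void fill(uint64_t addr)                  // data returns from the TCC
        {
            Line &l = lines[addr];
            l.state = State::V;
            for (auto &w : l.waiters)
                std::cout << w << ": woken, now hits\n";
            l.waiters.clear();
        }
    };

    int main()
    {
        L1 tcp;
        tcp.load(0x40, "Arr[0]");
        tcp.load(0x40, "Arr[1]");
        tcp.load(0x40, "Arr[2]");
        tcp.fill(0x40);        // expect 1 miss, 2 hits, matching real GPUs
    }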

As part of this support, additional transitions were also added to
account for corner cases such as what happens when the line is evicted
by another request that maps to the same set index while the first load
is pending (the line is immediately given to the new request, and when
the load returns it completes, wakes up any pending requests to the same
line, but does not attempt to change the state of the line) and how GPU
bypassing loads and stores should interact with the pending requests
(they are forced to wait if they reach the L1 after the pending,
non-bypassing load; but if they reach the L1 before the non-bypassing
load then they make sure not to change the state of the line from IV if
they return before the non-bypassing load).

As part of this change, we also move the MSHR behavior from internally
in the GPUCoalescer for loads to the Ruby code (like all other
requests).  This is important to get correct hits and misses in stats
and other prints, since the GPUCoalescer MSHR behavior assumes all
requests serviced out of its MSHR also miss if the original request to
that line missed.

Although the SQC does not support stores, the TCP does.  Thus,
we could have applied a similar change to the GPU stores at the TCP.
However, since the TCP support assumes write-through caches and does not
attempt to allocate space in the TCP, we elected not to add this support
since it seems to run contrary to the intended behavior (i.e., the
intended behavior seems to be that writes just bypass the TCP and thus
should not need to wait for another write to the same cache line to
complete).

Additionally, making these changes introduced issues with deadlocks at
the TCC.  Specifically, some Pannotia applications have accesses to the
same cache line where some of the accesses are GLC (i.e., they bypass
the GPU L1 cache) and others are non-GLC (i.e., they want to be cached
in the GPU L1 cache). We already have per-CU support for this in the code above.
However, the problem here is that these requests are coming from
different CUs and happening concurrently (seemingly because different
WGs are at different points in the kernel around the same time).
This causes a problem because our support at the TCC for the TBEs
overwrites the information about the GPU bypassing bits (SLC, GLC) every
time. The problem is when the second (non-GLC) load reaches the TCC, it
overwrites the SLC/GLC information for the first (GLC) load. Thus, when
the first load returns from the directory/memory, it no longer has
the GLC bit set, which causes an assert failure at the TCP.

After talking with other developers, it was decided the best way to handle
this and attempt to model real hardware more closely was to move the
point at which requests are put to sleep on the wakeup buffer from the
TCC to the directory. Accordingly, this patch includes support for that
-- now when multiple loads (bypassing or non-bypassing) from different
CUs reach the directory, all but the first one will be forced to wait
there until the first one completes, then will be woken up and
performed.  This required updating the WTRequestor information at the
TCC to pass the information about what CU performed the original request
for loads as well (otherwise since the TBE can be updated by multiple
pending loads, we can't tell where to send the final result to).  Thus,
I changed the field to be named CURequestor instead of WTRequestor since
it is now used for more than stores.  Moreover, I also updated the
directory to take this new field and the GLC information from incoming
TCC requests and then pass that information back to the TCC on the
response -- without doing this, because the TBE can be updated by
multiple pending, concurrent requests we cannot determine if this memory
request was a bypassing or non-bypassing request.  Finally, these
changes introduced a lot of additional contention and protocol stalls at
the directory, so this patch converted all directory uses of z_stall to
instead put requests on the wakeup buffer (and wake them up when the
current request completes) instead. Without this, protocol stalls cause
many applications to deadlock at the directory.

However, this exposed another issue at the TCC: other applications
(e.g., HACC) have a mix of atomics and non-atomics to the same cache
line in the same kernel.  Since the TCC transitions to the A state when
an atomic arrives, an atomic can move the line to A while earlier loads
to the same line are still pending.  For example, the first pending load
returns to the TCC from the directory and moves the TCC state to V, but
other loads to the line are still pending at the TCC; if an atomic then
moves the line to A, invalid transition errors occur when those pending
loads return, because the A state thinks they are atomics and decrements
the pending atomic count (plus the loads are never sent to the TCP as
returning loads).  This patch fixes this by changing the TCC TBEs to
track the number of pending requests, and by not allowing atomics to be
issued from the TCC until all prior, pending non-atomic requests have
returned.
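
A minimal C++ sketch of this final fix, assuming the TCC TBE tracks a pending non-atomic count that gates atomics; the names are illustrative, not the SLICC code.

    #include <iostream>

    // The TBE counts how many non-atomic requests to the line are still
    // outstanding; an atomic is held back until the count drains to zero.
    struct TccTBE {
        int pendingLoads = 0;
        bool atomicQueued = false;
    };

    bool tryIssueAtomic(TccTBE &tbe)
    {
        if (tbe.pendingLoads > 0) {
            tbe.atomicQueued = true;   // stall: earlier loads must return first
            return false;
        }
        return true;                   // safe to move the line to A now
    }

    int main()
    {
        TccTBE tbe;
        tbe.pendingLoads = 2;
        std::cout << "atomic issued? " << tryIssueAtomic(tbe) << "\n"; // 0
        tbe.pendingLoads = 0;
        std::cout << "atomic issued? " << tryIssueAtomic(tbe) << "\n"; // 1
    }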

Change-Id: I37f8bda9f8277f2355bca5ef3610f6b63ce93563
2023-11-15 19:23:51 -06:00
BujSet
65b44e6516 mem-ruby: Fix for not creating log entries on atomic no return requests (#546)
Augmenting Datablock and WriteMask to support optional arg to
distinguish between return and no return. In the case of atomic no
return requests, log entries should not be created when performing the
atomic.

Change-Id: Ic3112834742f4058a7aa155d25ccc4c014b60199a
2023-11-14 07:54:42 -08:00
Daniel Kouchekinia
be5c03ea9f mem-ruby,configs: Add GPU GLC Atomic Resource Constraints (#120)
Added a resource constraint, AtomicALUOperation, to GLC atomics
performed in the TCC.

The resource constraint uses a new class, an ALUFreeList array (see the
sketch after the parameter list below). The class assumes the following:
- There is a fixed number of atomic ALU pipelines
- While a new cache line can be processed in each pipeline each cycle,
a cache line that is currently going through a pipeline can't be
processed again until it's finished

Two configuration parameters have been used to tune this behavior:
- tcc-num-atomic-alus corresponds to the number of atomic ALU pipelines
- atomic-alu-latency corresponds to the latency of atomic ALU pipelines
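
A standalone C++ sketch of the assumed ALU free-list behavior, parameterized by the number of ALU pipelines and the per-line latency; the class name and fields are hypothetical, not the gem5 code.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // A fixed number of atomic ALU pipelines; each can accept one new cache
    // line per cycle, but a line already in a pipeline cannot re-enter until
    // its previous pass has drained.
    class AtomicAluFreeList {
        struct Slot { bool busy = false; uint64_t line = 0; uint64_t freeAt = 0; };
        std::vector<Slot> alus;
        uint64_t latency;
      public:
        AtomicAluFreeList(int numAlus, uint64_t aluLatency)
            : alus(numAlus), latency(aluLatency) {}

        // Try to start an atomic for `line` at cycle `now`; fail if the same
        // line is still in flight or no pipeline is available.
        bool tryIssue(uint64_t line, uint64_t now)
        {
            for (auto &s : alus)
                if (s.busy && s.line == line && now < s.freeAt)
                    return false;                     // line already in a pipeline
            for (auto &s : alus) {
                if (!s.busy || now >= s.freeAt) {
                    s = {true, line, now + latency};  // occupy until latency elapses
                    return true;
                }
            }
            return false;                             // every ALU pipeline is busy
        }
    };

    int main()
    {
        AtomicAluFreeList alus(/*numAlus=*/2, /*latency=*/4);
        std::cout << alus.tryIssue(0x40, 0) << "\n";  // 1: first atomic starts
        std::cout << alus.tryIssue(0x40, 1) << "\n";  // 0: same line still busy
        std::cout << alus.tryIssue(0x80, 1) << "\n";  // 1: second ALU is free
    }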

Change-Id: I25bdde7dafc3877590bb6536efdf57b8c540a939
2023-11-14 07:48:48 -08:00
Matt Sinclair
48fde5a9c6 mem-ruby, gpu-compute: fix formatting of TCC (#536)
mem-ruby, gpu-compute: fix formatting of TCC

Fix several improperly indented prints and remove extraneous blank lines
in the SLICC code for the GPU TCC (L2 cache).
2023-11-13 15:01:30 -08:00
Daniel Kouchekinia
1204267fd8 mem-ruby: SLICC Fixes to GLC Atomics in WB L2 (#397)
Made the following changes to fix the behavior of GLC atomics in a WB
L2:
- Stored the atomic write mask in the TBE for GLC atomics on an invalid
line that bypass to the directory but have their atomics performed on
the return path.
- Replaced the !presentOrAvail() check for bypassing atomics to the
directory (which will then be performed on the return path) with a
check for an invalid line state.
- Replaced the wdb_writeDirtyBytes action used when performing atomics
with the owm_orWriteMask action, which doesn't write from the invalid
atomic request data block.
- Fixed atomic return path actions.

Change-Id: I6a406c313d2f9c88cd75bfe39187ef94ce84098f
2023-11-09 13:15:10 -08:00
Daniel Kouchekinia
4931fb0010 mem-ruby: Always pass on GPU atomics to dir in write-through TCC (#367)
Added checks to ensure that atomics are not performed in the TCC when it
is configured as a write-through cache. Also added an SLC bit overwrite to
ensure the directory performs atomics when there is a write-through TCC.

Change-Id: I4514e6c8022aeb7785f2c59871cd9acec8161ed8
2023-10-14 06:39:50 -07:00
Vishnu Ramadas
085789d00c mem-ruby: Add flush support to GPU_VIPER protocol
This commit adds flush support to the GPU VIPER coherence protocol. The
L1 cache will now initiate a flush request if the packet it receives
is of type RubyRequestType_FLUSH. During the flush process, the L1 cache
will send a request to the L2 whether it is in the V or I state. The L2
will issue a flush request to the directory if its cache line is in the
valid state, before invalidating its copy. The directory, on receiving
this request, writes the data to memory and sends an ack back to the L2.
The L2 forwards this ack back to the L1, which then ends the flush by
calling the write callback.
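
A small C++ walk-through of the flush handshake as read from this description; the FlushStep enum and transition order are an interpretation, not the actual SLICC states.

    #include <iostream>

    // One hop per protocol message in the flush described above.
    enum class FlushStep { L1Flush, L2Flush, DirWriteback, AckToL2, AckToL1, Done };

    FlushStep next(FlushStep s)
    {
        switch (s) {
          case FlushStep::L1Flush:      return FlushStep::L2Flush;      // L1 forwards flush to L2
          case FlushStep::L2Flush:      return FlushStep::DirWriteback; // valid L2 line flushed to dir
          case FlushStep::DirWriteback: return FlushStep::AckToL2;      // dir writes memory, acks
          case FlushStep::AckToL2:      return FlushStep::AckToL1;      // L2 forwards the ack
          case FlushStep::AckToL1:      return FlushStep::Done;         // L1 calls the write callback
          default:                      return FlushStep::Done;
        }
    }

    int main()
    {
        int hops = 0;
        for (FlushStep s = FlushStep::L1Flush; s != FlushStep::Done; s = next(s))
            ++hops;
        std::cout << "flush completes after " << hops << " protocol steps\n";
    }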

Change-Id: I9dfc0c7b71a1e9f6d5e9e6ed4977c1e6a3b5ba46
2023-10-02 19:05:10 -05:00
Ranganath (Bujji) Selagamsetty
f6a453362f mem: Atomic ops to same address
Augmenting the DataBlock class with a change log structure to
record the effects of atomic operations on a data block and
service these changes if the atomic operations require return
values.

Although the operations are atomic, the coalescer need not
send unique memory requests for each operation. Atomic
operations within a wavefront to the same address are now
coalesced into a single memory request. The response of this
request carries all the necessary information to provide the
requesting lanes unique values as a result of their individual
atomic operations. This helps reduce contention for request
and response queues in simulation.

Previously, only the final value of the data block after all
atomic ops to the same address was visible to the requesting
waves. This change corrects this behavior by allowing each wave
to see the effect of its individual atomic op if a return value
is necessary.
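
A standalone C++ sketch of the change-log idea, assuming atomic adds applied in lane order with each pre-op value recorded; AtomicChangeLog is a hypothetical name, not the DataBlock API.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Coalesced atomic adds to the same word are applied in order, and the
    // value seen *before* each op is recorded so every requesting lane can
    // get its own return value rather than only the final word.
    struct AtomicChangeLog {
        std::vector<uint32_t> preOpValues;

        uint32_t applyAdd(uint32_t &word, uint32_t operand)
        {
            preOpValues.push_back(word);   // what this lane would have read
            word += operand;
            return preOpValues.back();
        }
    };

    int main()
    {
        uint32_t word = 10;
        AtomicChangeLog log;
        // Three lanes of one wavefront, coalesced into a single request.
        for (uint32_t lane = 0; lane < 3; ++lane)
            std::cout << "lane " << lane << " returns "
                      << log.applyAdd(word, 1) << "\n";   // 10, 11, 12
        std::cout << "final value " << word << "\n";       // 13
    }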

Change-Id: I639bea943afd317e45f8fa3bff7689f6b8df9395
2023-08-23 14:45:25 -05:00
Daniel Kouchekinia
984499329d mem-ruby,configs: Add GLC Atomic Latency VIPER Parameter (#110)
Added a GLC atomic latency parameter (glc-atomic-latency) used when
enqueueing response messages regarding atomics directly performed in
the TCC. This latency is added in addition to the L2 response latency
(TCC_latency). This represents the latency of performing an atomic
within the L2.

With this change, the TCC response queue will receive enqueues with
varying latencies as GLC atomic responses will have this added GLC
atomic latency while data responses will not. To accommodate this in
light of the queue having strict FIFO ordering (which would be violated
here), this change also adds an optional parameter bypassStrictFIFO to
the SLICC enqueue function which allows overriding strict FIFO
requirements for individual messages on a case-by-case basis. This
parameter is only being used in the TCC's atomic response enqueue call.
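
A minimal C++ illustration of why strict FIFO ordering is violated: a later data enqueue can become ready before an earlier GLC atomic enqueue. Here messages are simply ordered by ready time; the real change instead flags individual SLICC enqueues, and all names below are illustrative.

    #include <cstdint>
    #include <iostream>
    #include <queue>
    #include <vector>

    // Responses carry different latencies: data pays TCC_latency alone,
    // GLC atomics pay TCC_latency plus the GLC atomic latency.
    struct Msg { uint64_t readyAt; const char *what; };
    struct Later {
        bool operator()(const Msg &a, const Msg &b) const { return a.readyAt > b.readyAt; }
    };

    int main()
    {
        const uint64_t now = 100, tccLatency = 4, glcAtomicLatency = 8;
        std::priority_queue<Msg, std::vector<Msg>, Later> respQueue;

        respQueue.push({now + tccLatency + glcAtomicLatency, "GLC atomic response"});
        respQueue.push({now + tccLatency, "data response"});  // enqueued later, ready sooner

        while (!respQueue.empty()) {
            std::cout << respQueue.top().readyAt << ": " << respQueue.top().what << "\n";
            respQueue.pop();
        }
    }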

Change-Id: Iabd52cbd2c0cc385c1fb3fe7bcd0cc64bdb40aac
2023-07-23 15:57:06 -05:00
Daniel Kouchekinia
1705853b12 mem-ruby: Added support for non-system-scope atomics in VIPER (#101)
Added support for performing non-SLC-set atomics in the TCC.
Previously, all atomics were being passed on by the TCC to the
directory. With this change, atomics will only be passed on if the SLC
bit is set or if the line isn't present or available in the TCC.

If a non-SLC atomic is passed on to the directory because it is not
present in the TCC, the atomic will be performed on the return path on
the Data event. To accommodate the directory not performing the atomic
in this case, this change also passes the SLC bit on to the directory.

The previously-named "Atomic" action has been renamed to
"AtomicPassOn", with the new "Atomic" corresponding to an atomic
performed directly in the TCC.
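
A small C++ decision sketch of the routing policy described above; the routeAtomic helper and Action enum are hypothetical names, not the SLICC trigger logic.

    #include <iostream>

    // A non-SLC atomic is performed in the TCC when the line is present or
    // available; otherwise it is passed on to the directory and, for the
    // not-present case, performed on the return path.
    enum class Action { AtomicInTCC, AtomicPassOn };

    Action routeAtomic(bool slcSet, bool presentOrAvail)
    {
        if (slcSet || !presentOrAvail)
            return Action::AtomicPassOn;   // directory (or return path) handles it
        return Action::AtomicInTCC;        // performed directly in the TCC
    }

    int main()
    {
        std::cout << (routeAtomic(false, true)  == Action::AtomicInTCC)  << "\n"; // 1
        std::cout << (routeAtomic(false, false) == Action::AtomicPassOn) << "\n"; // 1
        std::cout << (routeAtomic(true,  true)  == Action::AtomicPassOn) << "\n"; // 1
    }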

Change-Id: Ibf92f71ddceb38bd1b0da70b0a786cc4c3cf2669
2023-07-20 11:48:08 -05:00
Daniel Kouchekinia
f8f5dd98bf mem-ruby: Added WIB State to VIPER TCC Cache (#67)
Added WIB (Waiting on Writethrough Ack; Will be Bypassed) state which
is transitioned to when a dirty line in the TCC is evicted in a
bypassed read. Previously, we were transitioning to invalid.

While a WI (Waiting on Writethrough Ack) state exists, transitions from
it on WBAck deallocates the TBE, which contains SLC bit information
needed to trigger the Bypass event when the read response from the
directory comes in.

Without this change, WB acknowledgements from the directory in read
bypass evicts (with the SLC bit set) were being treated as if they were
read responses, leading to an invalid transition panic.

Change-Id: I703c3fe8af0366856552bb677810cb1a8f2896de
2023-07-17 10:17:47 -07:00
Matt Sinclair
a030ff2745 mem-ruby: fix atomic deadlock with WB GPU L2 caches
By default the GPU VIPER coherence protocol uses a WT L2 cache.
However it has support for using WB caches (although this is not
tested currently).  When using a WB L2 cache for the GPU, this
results in deadlocks with atomics.

Specifically, when an atomic reaches the L2 and the line is
currently in M or W, the line must be written back before the atomic
can be performed.  However, the current support has two issues:

a) it never performs the atomic operation -- while VIPER currently
assumes all atomics are system scope atomics and thus cannot be
performed at the L2 and this transition requires the dirty line be
written back before performing the atomic, the transition never
performs the atomic nor does the response path handle it.
b) putting the atomic action right after the write back is not
safe because we need to ensure the requests are ordered when they
reach memory -- thus we have to wait until the write back is
acknowledged before it's safe to send/perform the atomic.

To fix this, this change modifies the transition in question to
put the atomic on the stalled requests buffer, which the WBAck will
check when it returns to the L2 (and thus perform the atomic, which
will result in the atomic being sent on to the directory).

This fix has been tested and verified with both the per-checkin and
nightly GPU Ruby Random tester tests (with a WB L2 cache).

Change-Id: I9a43fd985dc71297521f4b05c47288d92c314ac7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/68978
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-03-22 04:00:38 +00:00
Matt Sinclair
92d920f994 mem-ruby: fix load deadlock with WB GPU L2 caches
By default the GPU VIPER coherence protocol uses a WT L2 cache.
However it has support for using WB caches (although this is not
tested currently).  When using a WB L2 cache for the GPU, this
results in deadlocks with loads.

Specifically, when a load reaches the L2 and the line is currently
in the W state, that line must be written back before the load can
be performed.  However, the current transition for this in the L2
did not attempt to retry the load when the WB completes, resulting
in a deadlock.  This deadlock can be replicated by running the GPU
Ruby random tester as is with a WB L2 cache instead of a WT L2
cache.

To fix this, this change modifies the transition in question to
put the load on the stalled requests buffer, which the WBAck will
check when it returns to the L2 (and thus perform the load).

This fix has been tested and verified with both the per-checkin and
nightly GPU Ruby Random tester tests (with a WB L2 cache).

Change-Id: Ieec4f61a3070cf9976b8c3ef0cdbd0cc5a1443c6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/68977
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-03-22 04:00:38 +00:00
Vishnu Ramadas
c23d7bb3ee gpu-compute, mem-ruby: Add p_popRequestQueue to some transitions
Two W->WI transitions, on the RdBlk and Atomic events in the GPU L2 cache
coherence protocol, do not clear the request from the request queue upon
completing the transition. This action is not performed in the response
path. This update adds the p_popRequestQueue action to each of these
transitions to remove the stale request from the queue.

Change-Id: Ia2679fe3dd702f4df2bc114f4607ba40c18d6ff1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67192
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-01-05 23:41:00 +00:00
Vishnu Ramadas
ddf43726ef gpu-compute, mem-ruby: Update GPU cache bypassing to use TBE
An earlier commit added support for GLC and SLC AMDGPU instruction
modifiers. These modifiers enable cache bypassing when set. The GLC/SLC
flag information was being threaded through all the way to memory and
back so that appropriate actions could be taken upon receiving a request
and corresponding response. This commit removes the threading and adds
the bypass flag information to the TBE. Requests populate this
entry and responses access it to determine the correct set of actions to
execute.

Change-Id: I20ffa6682d109270adb921de078cfd47fb4e137c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67191
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
2023-01-05 23:38:32 +00:00
Vishnu Ramadas
66d4a15820 gpu-compute,mem-ruby: Add support for GPU cache bypassing
The GPU cache models do not support cache bypassing when the GLC or SLC
AMDGPU instruction modifiers are used in a load or store. This commit
adds cache bypass support by introducing new transitions in the
coherence protocol used by the GPU memory system. Now, instructions with
the GLC bit set will not cache in the L1, and instructions with the SLC
bit set will not cache in the L1 or L2.
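
A tiny C++ truth-table sketch of this bypass behavior; the bypassPolicy helper is a hypothetical name, not the protocol code.

    #include <iostream>

    // GLC skips the L1 (TCP); SLC skips both the L1 and the L2 (TCC).
    struct CachePolicy { bool cacheInL1; bool cacheInL2; };

    CachePolicy bypassPolicy(bool glc, bool slc)
    {
        return { !(glc || slc),   // GLC or SLC: do not allocate in the L1
                 !slc };          // SLC only:  also skip the L2
    }

    int main()
    {
        CachePolicy p = bypassPolicy(/*glc=*/true, /*slc=*/false);
        std::cout << "GLC load -> L1? " << p.cacheInL1
                  << " L2? " << p.cacheInL2 << "\n";   // 0 1
        p = bypassPolicy(false, true);
        std::cout << "SLC load -> L1? " << p.cacheInL1
                  << " L2? " << p.cacheInL2 << "\n";   // 0 0
    }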

Change-Id: Id29a47b0fa7e16a21a7718949db802f85e9897c3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66991
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2023-01-03 21:19:24 +00:00
Kyle Roarty
f876e60bc2 mem-ruby: Fix deadlock in GPU VIPER TCC
A deadlock occurred where we got a RdBlk while in W,
which put us in WI while waiting for a writeback to complete.

This would cause the request to be stalled while the writeback
was occurring, but when the writeback completed (WBAck), we never
woke up the requests and thus never completed the RdBlk.

This commit adds a wakeup when we receive a WBAck while in WI.

Change-Id: I01edf1d7a47757b4f680baf9f33a1a6aa37e7e25
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/59352
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-06-06 18:28:52 +00:00
Matthew Poremba
9313294efe misc: Remove AMD license addition
Remove the line "For use for simulation and test purposes only" in files
where AMD is the only copyright holder listed in the header. This happens
to be the case for all files where this line exists, removing it
completely from gem5.

Change-Id: I623f266b002f564301b28774f49081099cfc60fd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53943
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-12-11 04:00:56 +00:00
Matt Sinclair
118677218d mem-ruby: fix typo in GPU VIPER TCC comment
72ee6d1a fixed a deadlock in the GPU VIPER TCC.  However, it
inadvertently added a typo to the comments explaining the change.  This
commit fixes that.

Change-Id: Ibba835aa907be33fc3dd8e576ad2901d5f8f509c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51687
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-10-17 04:07:49 +00:00
Matt Sinclair
1120931105 mem-ruby: Move VIPER TCC decrements to action from in_port
Currently, the GPU VIPER TCC protocol handles races between atomics in
the triggerQueue_in.  This in_port does not check for resource
availability, which can cause the trigger queue to execute multiple
times.  Although this is the expected behavior, the code for handling
atomic races decrements the atomicDoneCnt flag in the trigger queue,
which is not safe since resource contention may cause it to execute
multiple times.

To resolve this issue, this commit moves the decrementing of this
counter to a new action that is called in an event that happens only
when the race between atomics is detected.

Change-Id: I552fd4f34fdd9ebeec99fb7aeb4eeb7b150f577f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51368
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-10-08 22:03:13 +00:00
Matt Sinclair
72ee6d1aad mem-ruby: Update GPU VIPER TCC protocol to resolve deadlock
In the GPU VIPER TCC, programs with mixes of atomics and data
accesses to the same address, in the same kernel, can experience
deadlock when large applications (e.g., Pannotia's graph analytics
algorithms) are running on very small GPUs (e.g., the default 4 CU GPU
configuration).  In this situation, deadlocks occur due to resource
stalls interacting with the behavior of the current implementation for
handling races between atomic accesses.  The specific order of events
causing this deadlock are:

1. TCC is waiting on an atomic to return from directory

2. In the meantime it receives another atomic to the same address -- when
this happens, the TCC increments the number of atomics to this address
that are pending in the TBE (numAtomics = 2), and does a write through of
the atomic to the directory.

3. When the first atomic returns from the Directory, it decrements the
numAtomics counter.  numAtomics was at 2 though, because of step #2.  So
it doesn't deallocate the TBE entry and calls Event:AtomicNotDone.

4. Another request (an LD) to the same address comes along.  The LD does
z_stall since the second atomic is pending -- so the LD retries every
cycle until the deadlock counter times out (or until the second atomic
comes back).

5.  The second atomic returns to the TCC.  However, because there are so
many LD's pending in the cache, all doing z_stall's and retrying every cycle,
there are a lot of resource stalls.  So, when the second atomic returns, it is
forced to retry its operation multiple times -- and each time it decrements
the atomicDoneCnt flag (which was added in 7246f70bfb to catch a race between
atomics arriving and leaving the TCC).  As a result, atomicDoneCnt becomes
negative.

6.  Since this atomicDoneCnt flag is used to determine when Event:AtomicDone
happens, and since the resource stalls caused the atomicDoneCnt flag to become
negative, we never complete the atomic.  Which means the pending LD can never
access the line, because it's stuck waiting for the atomic to complete.

7.  Eventually the deadlock threshold is reached.

To fix this issue, this commit changes the VIPER TCC protocol from using
z_stall to using the stall_and_wait buffer method that the
Directory-level of the SLICC already uses.  This change effectively
prevents resource stalls from dominating the TCC level, by putting
pending requests for a given address in a per-address stall buffer.
These requests are then woken up when the pending request returns.

As part of this change, this change also makes two small changes to the
Directory-level protocol (MOESI_AMD_BASE-dir):

1.  Updated the names of the wakeup actions to match the TCC wakeup actions,
to avoid confusion.

2.  Changed transition(B, UnblockWriteThrough, U) to check all stall buffers,
as some requests were being placed later in the stall buffer than was
being checked.  This mirrors the changes in 187c44fe44 to other Directory
transitions to resolve races between GPU and DMA requests, but for
transitions prior workloads did not stress.

Change-Id: I60ac9830a87c125e9ac49515a7fc7731a65723c2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51367
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-10-08 22:03:13 +00:00
Matt Sinclair
0eef1069cb ruby: fix typo in VIPER TCC triggerQueue
The GPU VIPER TCC protocol accidentally used "TiggerMsg" instead
of "TriggerMsg" for the triggerQueue_in port.  This was a benign
bug because the msg type is not used in the in_port implementation
but still makes the SLICC harder to understand, so fixing it is
worthwhile.

Change-Id: I88cbc72bac93bcc58a66f057a32f7bddf821cac9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/44905
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-04-28 01:25:45 +00:00
Matthew Poremba
7246f70bfb mem-ruby: Fix race related to atomics in VIPER
There is a race condition in VIPER where atomics issued to the same
address can result in multiple trigger messages signalling the
completion of the atomic operation. The first message was deallocating
the TBE, causing the second message to dereference a nullptr when
looking up the TBE.

A counter is added to track the number of in flight AtomicDone trigger
messages. The AtomicDone is not called until the last in flight message
arrives at the trigger queue. The remaining messages call AtomicNotDone
which simply pops the message from the queue and keeps the TBE
allocated.
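
A standalone C++ sketch of this counting scheme, where only the last in-flight trigger deallocates the TBE; all names are illustrative, not the SLICC code.

    #include <cassert>
    #include <iostream>

    // Each racing atomic completion enqueues a trigger message; only the
    // final in-flight message is allowed to deallocate the TBE.
    struct AtomicTBE {
        int inFlightDone = 0;
        bool allocated = true;
    };

    void atomicIssued(AtomicTBE &tbe) { ++tbe.inFlightDone; }

    // Returns true when this trigger is the last one (AtomicDone); earlier
    // triggers behave like AtomicNotDone and just pop the message.
    bool triggerArrived(AtomicTBE &tbe)
    {
        assert(tbe.inFlightDone > 0);
        if (--tbe.inFlightDone == 0) {
            tbe.allocated = false;     // safe to deallocate the TBE now
            return true;
        }
        return false;
    }

    int main()
    {
        AtomicTBE tbe;
        atomicIssued(tbe);             // first atomic to the line
        atomicIssued(tbe);             // racing atomic to the same line
        std::cout << triggerArrived(tbe) << " " << tbe.allocated << "\n"; // 0 1
        std::cout << triggerArrived(tbe) << " " << tbe.allocated << "\n"; // 1 0
    }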

Change-Id: Ie1de0436861a7c393ad6d2fb2faceb83c18d4cc3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39175
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-01-15 17:46:38 +00:00
Hoa Nguyen
4c42811ff3 mem-ruby: Move CacheMemory stats used in SLICC to a Stats group
This change moves some stats that are used in SLICC to a
separate Stats::Group.

In order to use stats in SLICC, new functions are added in
CacheMemory:
        - profileDemandHit()
        - profileDemandMiss()

The functions increase the corresponding stat by 1.
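
A standalone C++ mimic of that interface: the profileDemandHit()/profileDemandMiss() names come from this commit, while the surrounding class is only an illustration, not the gem5 Stats::Group code.

    #include <cstdint>
    #include <iostream>

    // A stats group owning the demand hit/miss counters; each profile
    // function bumps its counter by 1, as SLICC-generated actions would.
    class CacheMemoryStats {
        uint64_t demandHits = 0;
        uint64_t demandMisses = 0;
      public:
        void profileDemandHit()  { ++demandHits; }
        void profileDemandMiss() { ++demandMisses; }
        void dump() const
        {
            std::cout << "demand_hits     " << demandHits   << "\n"
                      << "demand_misses   " << demandMisses << "\n"
                      << "demand_accesses " << demandHits + demandMisses << "\n";
        }
    };

    int main()
    {
        CacheMemoryStats stats;
        stats.profileDemandHit();
        stats.profileDemandHit();
        stats.profileDemandMiss();
        stats.dump();
    }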

Change-Id: I52b6fefdf6579a49f626f2fca400641f90800017
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37815
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Tiago Mück <tiago.muck@arm.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
2020-12-22 09:52:36 +00:00
Kyle Roarty
1339a1b080 mem-ruby: add cache hit/miss statistics for TCP and TCC
Change-Id: Ifa6fdbb9dd062a3684b9620eac6683c57e651a72
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30174
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
2020-06-20 04:20:45 +00:00
Tuan Ta
18ebe62598 mem-ruby: GCN3 and VIPER integration
This patch modifies the Coalescer and VIPER protocol to support memory
synchronization requests and write-completion responses that are
required by the upcoming GCN3 implementation.

VIPER protocol is simplified to be a solely write-through protocol.

Change-Id: Iccfa3d749a0301172a1cc567c59609bb548dace6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29913
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Bradford Beckmann <brad.beckmann@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-06-19 20:32:54 +00:00
Gabe Black
c08351f4d3 mem: Move ruby protocols into a directory called ruby_protocol.
Now that the gem5 protocols are split out, it would be nice to put them
in their own protocol directory. It's also confusing to have files
called *_protocol which are not in the protocol directory.

Change-Id: I7475ee111630050a2421816dfd290921baab9f71
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20230
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2019-08-23 21:13:07 +00:00