derek/gem5 - gem5 - Gitea

Author	SHA1	Message	Date
Bobby R. Bruce	d4b7c8a26d	Merge branch 'develop' into develop-kconfig	2023-11-27 09:39:08 -08:00
Bobby R. Bruce	0b2c56ef66	mem-cache: Revert "Prefetchers Improvements" (#581) Reverts gem5/gem5#564 to fix #580. Discussion in #581 showed there may be a fix for this, but the changes are reverted for now until a better solution is found.	2023-11-26 18:43:21 -08:00
Gabe Black	db3a6e8e84	scons: Use Kconfig to configure gem5. These are not yet consumed by anything, but convert all the settings from SCons variables to Kconfig variables. If you have existing SConsopts files which need to be converted, you should take a look at KCONFIG.md to learn how Kconfig is used in gem5. You should decide whether any variables need to be available to C++ or Kconfig itself, and whether those are options which should be detected automatically or left up to the user. Options which should be measured automatically should still be in SConsopts files, while user-facing options should be added to new or existing Kconfig files. Generally, make sure you're storing C++/Kconfig-visible options in env['CONF'][...]. Also remove references to sticky_vars, since persistent options should now be handled with Kconfig, and to export_vars, since everything in env['CONF'] is now exported automatically. Switch SCons/gem5 to use Kconfig for configuration, except EXTRAS, which is still a sticky SCons variable. This is necessary because EXTRAS also controls what config options exist. If it came from Kconfig itself, there would be a circular dependency. This dependency could theoretically be handled by reparsing the Kconfig when EXTRAS directories were added or removed, but that would be complicated and isn't supported by kconfiglib. It wouldn't be worth the significant effort it would take to add it just to use Kconfig more purely. Change-Id: I29ab1940b2d7b0e6635a490452d05befe5b4a2c9	2023-11-23 08:26:10 +08:00
Matthew Poremba	6e433ed885	mem-ruby: Fixes for new AtomicWait event in VIPER TCC (#585) The AtomicWait event was not being woken up properly because the numPending count in the TBE was not being decremented. This patch decrements the count when Data is returned. Since that moves to a base state, the TBE should no longer be needed. Additionally, a transition was added which stalls and waits when an AtomicWait occurs while in the WI state, so that it retries. Change-Id: Ic8bfc700f9df3f95bea0799121898926a23d8163	2023-11-22 14:05:43 -08:00
Hoa Nguyen	3009e0fb57	mem-ruby: Fix typo in CHI's Send_CompI (#579) The destination for the response is set twice.	2023-11-20 21:38:13 -08:00
Bobby R. Bruce	f26867a075	mem-cache: Revert "Prefetchers Improvements" Reverts PR https://github.com/gem5/gem5/pull/564 Reverts commits: * `047a494c2b` * `2abd65c270` * `38045d7a25` * `6416304e07` * `8598764a03` Change-Id: Id523acc1778c3f827637302a6465f5a9e539d6b5	2023-11-20 19:49:04 -08:00
Jason Lowe-Power	db6a869786	mem-cache: Prefetchers Improvements (#564) This pull request contains a set of small patches which fix some bugs in the gem5 prefetchers and align out-of-the-box prefetcher performance more closely with what a typical user would expect. The performance patches have been tested with an out-of-the-box (untuned) Stride prefetcher configuration against a set of SPEC 2017 SimPoints, and show a modest IPC uplift across the board, with no IPC degradation. The new defaults were identified as part of work on gem5 prefetchers undertaken by Nikolaos Kyparissas while on internship at Arm.	2023-11-16 15:22:26 -08:00
Giacomo Travaglini	4ca2efac16	mem-ruby: AtomicNoReturn should check comp_anr instead of comp_wu (#545) The comp_anr parameter is currently unused. Both parameters (comp_wu and comp_anr) are set to false by default. Change-Id: If09567504540dbee082191d46fcd53f1363d819f Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-11-16 15:20:51 -08:00
Giacomo Travaglini	047a494c2b	mem-cache: Optimize strided prefetcher address generation This commit optimizes the address generation logic in the strided prefetcher by introducing the following changes (d is the degree of the prefetcher): * Evaluate the fixed prefetch_stride only once (and not d times) * Replace 2d multiplications (d * prefetch_stride and distance * prefetch_stride) with additions by updating the new base prefetch address while looping Change-Id: I49c52333fc4c7071ac3d73443f2ae07bfcd5b8e4 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Richard Cooper <richard.cooper@arm.com> Reviewed-by: Tiberiu Bucur <tiberiu.bucur@arm.com>	2023-11-16 09:48:15 +00:00
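The address-generation change above can be illustrated with a small Python sketch. It assumes plain integer addresses and treats `addr`, `stride`, `distance` and `degree` as the trigger address, the detected stride, the skip distance and the prefetch degree; both function names are hypothetical and do not correspond to gem5 functions. The two versions generate identical prefetch addresses, but the incremental one performs a single multiplication up front instead of two per prefetch.

```python
def prefetch_addrs_mul(addr, stride, distance, degree):
    """Naive form: two multiplications for every generated prefetch."""
    return [addr + distance * stride + d * stride for d in range(1, degree + 1)]


def prefetch_addrs_add(addr, stride, distance, degree):
    """Incremental form: compute the base once, then step by `stride`."""
    pf_addr = addr + distance * stride  # evaluated once, outside the loop
    addrs = []
    for _ in range(degree):
        pf_addr += stride  # an addition replaces d * stride
        addrs.append(pf_addr)
    return addrs


if __name__ == "__main__":
    print([hex(a) for a in prefetch_addrs_add(0x1000, 64, 0, 4)])
    # ['0x1040', '0x1080', '0x10c0', '0x1100']
```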
Nikolaos Kyparissas	2abd65c270	mem: added distance parameter to stride prefetcher The Stride Prefetcher will skip this number of strides ahead of the first identified prefetch, then generate `degree` prefetches at `stride` intervals. A value of zero indicates no skip (i.e. start prefetching from the next identified prefetch address). This parameter can be used to increase the timeliness of prefetches by starting to prefetch far enough ahead of the demand stream to cover the memory system latency. [Richard Cooper <richard.cooper@arm.com>: - Added detail to commit comment and `distance` Param documentation. - Changed `distance` Param from `Param.Int` to `Param.Unsigned`. ] Change-Id: I6c4e744079b53a7b804d8eab93b0f07b566f0c08 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Signed-off-by: Richard Cooper <richard.cooper@arm.com>	2023-11-16 09:48:09 +00:00
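One way to reason about choosing `distance` is a back-of-the-envelope timeliness estimate. The sketch below is a simplified model, not part of gem5: it assumes the demand stream advances one stride every `cycles_per_stride` cycles and that a prefetch takes `mem_latency` cycles to return, and it ignores queueing and bandwidth. With `distance = 0` the first prefetch targets the next stride, so it is one stride ahead of the demand stream; each unit of `distance` adds one more stride of lead time.

```python
import math


def min_distance(mem_latency, cycles_per_stride):
    """Smallest `distance` whose first prefetch returns before its demand access.

    Simplified model (an assumption, not gem5 code): the first prefetch is
    (distance + 1) strides ahead, and the demand stream covers one stride every
    `cycles_per_stride` cycles, so we need
    (distance + 1) * cycles_per_stride >= mem_latency.
    """
    strides_needed = math.ceil(mem_latency / cycles_per_stride)
    return max(strides_needed - 1, 0)


if __name__ == "__main__":
    # 200-cycle memory latency, one new stride every 50 cycles -> skip 3 strides
    print(min_distance(200, 50))  # 3
```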
Matt Sinclair	c3326c78e6	mem-ruby, gpu-compute: fix SQC/TCP requests to same line Currently, the GPU SQC (L1I$) and TCP (L1D$) have a performance bug where they do not behave correctly when multiple requests to the same cache line overlap one another. The intended behavior is that if the first request that arrives at the Ruby code for the SQC/TCP misses, it should send a request to the GPU TCC (L2$). If any requests to the same cache line occur while this first request is pending, they should wait locally at the L1 in the MSHRs (TBEs) until the first request has returned. At that point they can be serviced, and, assuming the line has not been evicted, they should hit. For example, in the following test (on 1 GPU thread, in 1 WG): load Arr[0]; load Arr[1]; load Arr[2]; the expected behavior (confirmed via profiling on real GPUs) is 1 miss (Arr[0]) and 2 hits (Arr[1], Arr[2]). However, the current support in the VIPER SQC/TCP code does not model this correctly. Instead, it lets all 3 concurrent requests go straight through to the TCC rather than stopping the Arr[1] and Arr[2] requests locally while Arr[0] is serviced. This causes all 3 requests to be classified as misses. To resolve this, this patch adds support to the SQC/TCP code to prevent subsequent, concurrent requests to a pending cache line from being sent in parallel with the original one. To do this, we add an additional transient state (IV) to indicate that a load is pending to this cache line. If a subsequent request of any kind to the same cache line occurs while this load is pending, the requests are put on the local wait buffer and woken up when the first request returns to the SQC/TCP. Likewise, when the first load is returned to the SQC/TCP, the line transitions from IV --> V.
As part of this support, additional transitions were also added to account for corner cases, such as what happens when the line is evicted by another request that maps to the same set index while the first load is pending (the line is immediately given to the new request, and when the load returns it completes, wakes up any pending requests to the same line, but does not attempt to change the state of the line), and how GPU bypassing loads and stores should interact with the pending requests (they are forced to wait if they reach the L1 after the pending, non-bypassing load; but if they reach the L1 before the non-bypassing load, they make sure not to change the state of the line from IV if they return before the non-bypassing load). As part of this change, we also move the MSHR behavior for loads from the GPUCoalescer into the Ruby code (like all other requests). This is important to get correct hits and misses in stats and other prints, since the GPUCoalescer MSHR behavior assumes all requests serviced out of its MSHR also miss if the original request to that line missed. Although the SQC does not support stores, the TCP does. Thus, we could have applied a similar change to the GPU stores at the TCP. However, since the TCP support assumes write-through caches and does not attempt to allocate space in the TCP, we elected not to add this support, since it seems to run contrary to the intended behavior (i.e., the intended behavior seems to be that writes just bypass the TCP and thus should not need to wait for another write to the same cache line to complete). Additionally, making these changes introduced deadlock issues at the TCC. Specifically, some Pannotia applications have accesses to the same cache line where some of the accesses are GLC (i.e., they bypass the GPU L1 cache) and others are non-GLC (i.e., they want to be cached in the GPU L1 cache). The code above already handles this within a single CU.
However, the problem here is that these requests are coming from different CUs and happening concurrently (seemingly because different WGs are at different points in the kernel around the same time). This causes a problem because our support at the TCC for the TBEs overwrites the information about the GPU bypassing bits (SLC, GLC) every time. The problem is that when the second (non-GLC) load reaches the TCC, it overwrites the SLC/GLC information for the first (GLC) load. Thus, when the first load returns from the directory/memory, it no longer has the GLC bit set, which causes an assert failure at the TCP. After talking with other developers, it was decided that the best way to handle this, and to model real hardware more closely, was to move the point at which requests are put to sleep on the wakeup buffer from the TCC to the directory. Accordingly, this patch includes support for that: now, when multiple loads (bypassing or non-bypassing) from different CUs reach the directory, all but the first one will be forced to wait there until the first one completes, and will then be woken up and performed. This required updating the WTRequestor information at the TCC to also pass along which CU performed the original request for loads (otherwise, since the TBE can be updated by multiple pending loads, we can't tell where to send the final result). Thus, I renamed the field from WTRequestor to CURequestor, since it is now used for more than stores. Moreover, I also updated the directory to take this new field and the GLC information from incoming TCC requests and then pass that information back to the TCC on the response; without this, because the TBE can be updated by multiple pending, concurrent requests, we cannot determine whether this memory request was a bypassing or non-bypassing request.
Finally, these changes introduced a lot of additional contention and protocol stalls at the directory, so this patch converts all directory uses of z_stall to put requests on the wakeup buffer instead (and wakes them up when the current request completes). Without this, protocol stalls cause many applications to deadlock at the directory. However, this exposed another issue at the TCC: other applications (e.g., HACC) have a mix of atomics and non-atomics to the same cache line in the same kernel. The TCC transitions to the A state when an atomic arrives, for example when an atomic arrives after the first pending load has returned to the TCC from the directory (making the TCC state V) while other loads are still pending at the TCC. This causes invalid transition errors at the TCC when those pending loads return, because the A state treats them as atomics and decrements the pending atomic count (plus the loads are never sent to the TCP as returning loads). This patch fixes this by changing the TCC TBEs to model the number of pending requests, and not allowing atomics to be issued from the TCC until all prior, pending non-atomic requests have returned. Change-Id: I37f8bda9f8277f2355bca5ef3610f6b63ce93563	2023-11-15 19:23:51 -06:00
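The intended L1 behavior described above (one miss, then local waiting, then hits) can be sketched as a toy Python model. Everything here is hypothetical and heavily simplified: `ToyL1` and the 64-byte line size are assumptions, and the string states `"I"`, `"IV"` and `"V"` only mirror the state names used in the commit message; there is no eviction, no bypassing and no timing.

```python
from collections import defaultdict

LINE_SIZE = 64  # assumed cache-line size in bytes


class ToyL1:
    """Toy L1: the first miss to a line goes to L2 and marks the line pending
    (IV); later requests to that line wait locally and are replayed on fill."""

    def __init__(self):
        self.state = defaultdict(lambda: "I")  # line -> "I", "IV" or "V"
        self.waiting = defaultdict(list)       # line -> parked request addresses
        self.hits = 0
        self.misses = 0
        self.sent_to_l2 = []                   # lines requested from L2

    def load(self, addr):
        line = addr // LINE_SIZE
        if self.state[line] == "V":
            self.hits += 1
        elif self.state[line] == "IV":
            self.waiting[line].append(addr)    # stall locally; do not go to L2
        else:
            self.misses += 1
            self.state[line] = "IV"
            self.sent_to_l2.append(line)

    def fill(self, line):
        """Data returns from L2: IV -> V, then wake up the parked requests."""
        self.state[line] = "V"
        for addr in self.waiting.pop(line, []):
            self.load(addr)                    # replayed requests now hit


if __name__ == "__main__":
    l1 = ToyL1()
    for i in range(3):          # load Arr[0], Arr[1], Arr[2] (4-byte elements)
        l1.load(4 * i)
    l1.fill(0)                  # the single L2 response for line 0 arrives
    print(l1.misses, l1.hits, l1.sent_to_l2)  # 1 2 [0]
```

Without the IV check, each of the three loads would take the miss path and be sent to L2, which is the behavior the commit describes as the bug.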
Matt Sinclair	065ddf759f	mem-ruby, gpu-compute: fix bug with GPU bypassing loads The current GPU TCP (L1D$) Ruby SLICC code had a bug where a GPU load that wants to bypass the L1D$ (e.g., the GLC or SLC bit was set) resulted in a non-bypassing load being sent to the GPU TCC (L2$) instead of a bypassing load if the line was in Invalid when the request arrived. This issue was not caught by the current nightly or weekly tests, because the tests do not check correctness in terms of hits and misses in the caches. However, tests for these corner cases expose this issue. To fix this, this patch removes the check that the entry is valid when deciding what to do with a bypassing GPU load: since the TCP Ruby code has transitions for bypassing loads in both I and V, we can simply call the LoadBypassEvict event in both cases, and the appropriate transition will handle the bypassing load given the cache line's current state in the TCP. Change-Id: Ia224cefdf56b4318b2bcbd0bed995fc8d3b62a14	2023-11-15 19:23:51 -06:00
BujSet	4a5ec70e08	gpu-compute: Minor edits for atomic no returns and stores (#565) Since returned data is not needed for AtomicNoReturn and Store memory requests, the coalescer need not spend time writing in dummy data for packets of these types. Change-Id: Ie669e8c2a3bf44b5b0c290f62c49c5d4876a9a6a	2023-11-15 07:20:07 -08:00
Derek Christ	e95cab429f	configs,ext,stdlib: Update DRAMSys integration (#525) Recent breaking changes in the DRAMSys API require user code to be updated. These updates have been applied to the gem5 integration. Furthermore, as DRAMSys started to use CMake dependency management, it is no longer sensible to maintain two separate build systems for DRAMSys. The use of the DRAMSys integration in gem5 will therefore now require that CMake is installed on the target machine. Additionally, support for snapshots has been implemented in DRAMSys and coupled with gem5's checkpointing API.	2023-11-14 08:05:11 -08:00
BujSet	65b44e6516	mem-ruby: Fix for not creating log entries on atomic no return requests (#546) Augments DataBlock and WriteMask to support an optional argument to distinguish between return and no return. In the case of atomic no return requests, log entries should not be created when performing the atomic. Change-Id: Ic3112834742f4058a7aa155d25ccc4c014b60199a	2023-11-14 07:54:42 -08:00
Daniel Kouchekinia	be5c03ea9f	mem-ruby,configs: Add GPU GLC Atomic Resource Constraints (#120) Added a resource constraint, AtomicALUOperation, to GLC atomics performed in the TCC. The resource constraint uses a new class, ALUFreeList array. The class assumes the following: - There is a fixed number of atomic ALU pipelines. - While a new cache line can be processed in each pipeline each cycle, if a cache line is currently going through a pipeline, it can't be processed again until it's finished. Two configuration parameters are used to tune this behavior: - tcc-num-atomic-alus corresponds to the number of atomic ALU pipelines. - atomic-alu-latency corresponds to the latency of atomic ALU pipelines. Change-Id: I25bdde7dafc3877590bb6536efdf57b8c540a939	2023-11-14 07:48:48 -08:00
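The two assumptions listed above can be captured in a small Python model. This is a hypothetical sketch (`ToyAtomicALUs` is not the gem5 class): `num_alus` plays the role of `tcc-num-atomic-alus` and `latency` the role of `atomic-alu-latency`. Each cycle, up to `num_alus` different lines may enter the pipelines, and a line that is still in flight is rejected until its operation completes.

```python
from collections import Counter


class ToyAtomicALUs:
    """Hypothetical model of the TCC atomic ALU resource constraint."""

    def __init__(self, num_alus, latency):
        self.num_alus = num_alus      # number of pipelined atomic ALUs
        self.latency = latency        # cycles for one atomic to finish
        self.busy_until = {}          # line address -> cycle its atomic completes
        self.started = Counter()      # cycle -> lines started in that cycle

    def try_issue(self, line, cycle):
        """Return True if an atomic to `line` may start in `cycle`."""
        if self.busy_until.get(line, cycle) > cycle:
            return False              # same line is still in a pipeline
        if self.started[cycle] >= self.num_alus:
            return False              # every pipeline already took a line this cycle
        self.started[cycle] += 1
        self.busy_until[line] = cycle + self.latency
        return True
```

For example, with `ToyAtomicALUs(num_alus=2, latency=4)`, two different lines can start in cycle 0, a third must wait until cycle 1, and a second atomic to a line started in cycle 0 is refused until cycle 4.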
Nikolaos Kyparissas	38045d7a25	mem-cache: Added clean eviction check for prefetchers. pkt->req->isCacheMaintenance() would not include a check for clean eviction before notifying the prefetcher, causing gem5 to crash. Change-Id: I1d082a87a3908b1ed46c5d632d45d8b09950b382 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-11-14 15:20:52 +00:00
Richard Cooper	6416304e07	mem-cache: Update default prefetch options. Update the default prefetch options to achieve out-of-the-box prefetcher performance closer to what a typical user would expect. Configurations that set these parameters explicitly will be unaffected. The new defaults were identified as part of work on gem5 prefetchers undertaken by Nikolaos Kyparissas while on internship at Arm. Change-Id: Id63868c7c8f00ee15a0b09a6550780a45ae67e55 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-11-14 15:20:52 +00:00
Richard Cooper	8598764a03	mem-cache: Squash prefetch queue entries by block address. Prefetch queue entries were being squashed by comparing the address of each queued prefetch against the block address of the demand access. Only prefetches that happen to fall on a cache-line block boundary would be squashed. This patch converts the prefetch addresses to block addresses before comparison. Change-Id: I55ecb4919e94ad314b91c7795bba257c550b1528 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-11-14 15:20:52 +00:00
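The bug is easy to see with concrete numbers. In this Python sketch (the helper names are hypothetical, and a 64-byte block size is assumed), a queued prefetch to `0x1008` lies in the same block as a demand access to `0x1010`, but comparing its raw address against the demand block address `0x1000` fails to squash it; converting both sides to block addresses does.

```python
BLK_SIZE = 64  # assumed cache block size; must be a power of two


def block_addr(addr, blk_size=BLK_SIZE):
    """Clear the block-offset bits of an address."""
    return addr & ~(blk_size - 1)


def squash_raw(queue, demand_addr):
    """Old behaviour: raw prefetch address vs. demand block address."""
    blk = block_addr(demand_addr)
    return [pf for pf in queue if pf != blk]


def squash_by_block(queue, demand_addr):
    """Fixed behaviour: block address vs. block address."""
    blk = block_addr(demand_addr)
    return [pf for pf in queue if block_addr(pf) != blk]


if __name__ == "__main__":
    queue = [0x1008, 0x1040, 0x2000]
    print([hex(a) for a in squash_raw(queue, 0x1010)])       # nothing squashed
    print([hex(a) for a in squash_by_block(queue, 0x1010)])  # 0x1008 squashed
```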
Matt Sinclair	48fde5a9c6	mem-ruby, gpu-compute: fix formatting of TCC (#536) Fix several improperly indented prints and extraneous lines in the SLICC code for the GPU TCC (L2 cache).	2023-11-13 15:01:30 -08:00
Matt Sinclair	7d0a1fb284	mem-ruby, gpu-compute: fix typo in GPU coalescer deadlock print (#535) The GPU Coalescer's deadlock print did not previously print a newline at the end of each deadlock, which caused confusion when there were multiple deadlocks, as each deadlock print would appear to go with the address after it. This patch fixes this issue.	2023-11-13 15:01:01 -08:00
Matt Sinclair	f312804364	mem-ruby: fix hex print in CacheMemory (#561) Update print in CacheMemory about clearing the lock to properly print in hex.	2023-11-13 14:34:33 -08:00
Matt Sinclair	3642bc4892	mem-ruby, gpu-compute: fix GPU SQC/TCP Ruby formatting (#538) Fix several improperly indented prints and extraneous lines in the SLICC code for the GPU SQC (L1I$) and TCP (L1D$).	2023-11-13 14:20:54 -08:00
Matt Sinclair	f61d709321	mem-ruby: update RubyRequest print to include GPU fields (#537) The print function used for RubyRequests did not include the GPU specific fields (for the GLC and SLC bits, which are cache modifiers that specify what level of the memory hierarchy a request should be performed at). This causes confusion when the GPU Ruby SLICC code prints out RubyRequest messages, since important fields are missing. Thus this commit adds that support. Since these fields are already part of the RubyRequest class, and are always 0 for non-GPU requests, it should not affect other components beyond slightly longer prints. Change-Id: I31c9122b82dfa2c6415ce25d225ea82cb35c7333	2023-11-13 01:12:25 -06:00
Daniel Kouchekinia	1204267fd8	mem-ruby: SLICC Fixes to GLC Atomics in WB L2 (#397) Made the following changes to fix the behavior of GLC atomics in a WB L2: - Stored the atomic write mask in the TBE for GLC atomics on an invalid line that bypass to the directory but have their atomics performed on the return path. - Replaced the !presentOrAvail() check for bypassing atomics to the directory (which will then be performed on the return path) with a check for an invalid line state. - Replaced the wdb_writeDirtyBytes action used when performing atomics with the owm_orWriteMask action, which doesn't write from the invalid atomic request data block. - Fixed atomic return path actions. Change-Id: I6a406c313d2f9c88cd75bfe39187ef94ce84098f	2023-11-09 13:15:10 -08:00
Matt Sinclair	86131d4323	mem-ruby, gpu-compute: update GPU L1I$ MRU info (#530) Previously, the GPU L1 I$ (SQC) was not updating the MRU information on hits in the SQC. This commit resolves that by adding support to the appropriate Ruby transition.	2023-11-08 10:13:15 -08:00
Daniel Carvalho	10374f2f05	Fix calculation of compressed size in bytes (#534) compression::Base::getSize() performed an integer division, which rounded the result down instead of up. Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>	2023-11-07 08:58:32 -08:00
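Assuming the compressed size is tracked in bits and converted to bytes by dividing by 8, the difference between truncating and rounding up looks like this (a Python sketch; the function names are hypothetical):

```python
def size_bytes_truncated(size_bits):
    """Buggy conversion: integer division drops any partial byte."""
    return size_bits // 8


def size_bytes_rounded_up(size_bits):
    """Fixed conversion: a partially used byte still occupies a whole byte."""
    return (size_bits + 7) // 8


if __name__ == "__main__":
    for bits in (12, 16, 17):
        # prints: 12 1 2, then 16 2 2, then 17 2 3
        print(bits, size_bytes_truncated(bits), size_bytes_rounded_up(bits))
```

Rounding up matters because a 12-bit compressed block still needs 2 bytes of storage; truncating would under-report it as 1 byte.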
Matt Sinclair	76279fef59	mem-ruby: update RubyRequest print to include GPU fields The print function used for RubyRequests did not include the GPU specific fields (for the GLC and SLC bits, which are cache modifiers that specify what level of the memory hierarchy a request should be performed at). This causes confusion when the GPU Ruby SLICC code prints out RubyRequest messages, since important fields are missing. Thus this commit adds that support. Since these fields are already part of the RubyRequest class, and are always 0 for non-GPU requests, it should not affect other components beyond slightly longer prints. Change-Id: I31c9122b82dfa2c6415ce25d225ea82cb35c7333	2023-11-07 00:52:37 -06:00
Giacomo Travaglini	1b05c0050b	mem-ruby: Clear the atomic log from the DataBlock in CHI The new far atomics implementation [1] did not take into account that it was supposed to clear the atomic log manually. This caused a memory leak where the log queue kept growing, as no cleaning was happening. [1]: https://github.com/gem5/gem5/pull/177 Change-Id: I4a74fbf15d21e35caec69c29117e2d98cc86d5ff Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-29 09:26:09 +00:00
Bobby R. Bruce	d42eeb6b68	cpu: Explicitly define cache_line_size -> 64-bit unsigned int (#329) While it's plausible to define cache_line_size as a 32-bit unsigned int, its use has grown well beyond its original scope. cache_line_size has been used to produce an address mask, which masks out the offset bits from an address; see, for example, [1], [2], [3], and [4]. However, since cache_line_size is an "unsigned int", the type of the value is not guaranteed to be 64 bits long. Consequently, the bit-twiddling hacks in [1], [2], [3], and [4] produce a 32-bit mask, i.e., 0x00000000FFFFFFC0. This behavior caused at least one problem, in LLSC on RISC-V [5], where the load reservation (LR) relies on the mask to produce the cache block address: two distinct 64-bit addresses can be mapped to the same cache block using the above mask. This patch explicitly defines cache_line_size as a 64-bit unsigned int so the cache block mask can be produced correctly for 64-bit addresses. [1] `3bdcfd6f7a/src/cpu/simple/atomic.hh (L147)` [2] `3bdcfd6f7a/src/cpu/simple/timing.hh (L224)` [3] `3bdcfd6f7a/src/cpu/o3/lsq_unit.cc (L241)` [4] `3bdcfd6f7a/src/cpu/minor/lsq.cc (L1425)` [5] `3bdcfd6f7a/src/arch/riscv/isa.cc (L787)`	2023-10-16 07:50:35 -07:00
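The mask problem can be reproduced in Python by emulating fixed-width unsigned arithmetic. Python integers are unbounded, so the widths are applied explicitly with masks, and the helper name is hypothetical. Computing `~(line_size - 1)` in a 32-bit unsigned type and then zero-extending it, which is what happens when it is ANDed with a 64-bit address, clears the upper 32 address bits, so two addresses that differ only above bit 31 collapse onto the same "block".

```python
U32 = (1 << 32) - 1
U64 = (1 << 64) - 1


def block_mask(line_size, width_mask):
    """~(line_size - 1) evaluated in an unsigned type of the given width."""
    return ~(line_size - 1) & width_mask


if __name__ == "__main__":
    mask32 = block_mask(64, U32)   # 0x00000000ffffffc0 once widened to 64 bits
    mask64 = block_mask(64, U64)   # 0xffffffffffffffc0
    a, b = 0x1_0000_0040, 0x2_0000_0040
    print(hex(a & mask32), hex(b & mask32))  # 0x40 0x40 -> same "block"
    print(hex(a & mask64), hex(b & mask64))  # 0x100000040 0x200000040
```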
Daniel Kouchekinia	4931fb0010	mem-ruby: Always pass on GPU atomics to dir in write-through TCC (#367) Added checks to ensure that atomics are not performed in the TCC when it is configured as a write-through cache. Also added an SLC bit overwrite to ensure the directory performs atomics when there is a write-through TCC. Change-Id: I4514e6c8022aeb7785f2c59871cd9acec8161ed8	2023-10-14 06:39:50 -07:00
Vishnu Ramadas	8d54a5cbab	mem-ruby: Remove BUILD_GPU guards from ruby coalescer models A previous commit added BUILD_GPU guards to the GPU coalescer models because a related cache recorder commit added GPU support. This is no longer needed, since the cache recorder moved to using a vector of RubyPorts instead of Sequencer/GPUCoalescer pointers. This commit removes BUILD_GPU guards from the Ruby coalescer models. Change-Id: I23a7957d82524d6cd3483d22edfb35ac51796eca	2023-10-12 14:53:29 -05:00
Vishnu Ramadas	08c1af1b16	mem-ruby: Use RubyPort vector to access Ruby in cache recorder Previously, the cache recorder used a vector of sequencer pointers to access Ruby objects. A recent commit updated the cache recorder to also maintain a vector of GPUCoalescer pointers in order for GPUs to support flushing. This added redundant code to the cache recorder. This commit replaces the sequencer and GPUCoalescer vectors with a vector of RubyPort pointers so that the code does not contain redundant lines. Change-Id: Id5da33fb870f17bb9daef816cc43c0bcd70a8706	2023-10-12 14:49:06 -05:00
Bobby R. Bruce	298119e402	misc,python: Run `pre-commit run --all-files` Applies the `pyupgrade` hook to all files in the repo. Change-Id: I9879c634a65c5fcaa9567c63bc5977ff97d5d3bf	2023-10-10 21:47:07 -07:00
Bobby R. Bruce	ddf6cb88e4	misc: Run `pre-commit run --all-files` This is to reflect the updates made to black when running `pre-commit autoupdate`. Change-Id: Ifb7fea117f354c7f02f26926a5afdf7d67bc5919	2023-10-10 14:01:58 -07:00
Matt Sinclair	ec633b3d68	dev-amdgpu,mem-ruby: Add support to checkpoint and restore between kernels in GPUFS (#377) Previously, GPU checkpointing worked only if a checkpoint was created before the first kernel execution. This pull request adds support for checkpointing between any two kernel calls. It does so as follows: - Adds flush support in the GPU_VIPER protocol - Adds flush support in the GPUCoalescer - Updates the cache recorder to use the GPUCoalescer during simulation cooldown and cache warmup times.	2023-10-10 09:41:21 -05:00
Giacomo Travaglini	00748c7901	mem-ruby: Fix CHI fromSequencer helper function This was broken by #177. Change-Id: I52feff4b5ab2faf0aa91edd6572e3e767c88e257 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-06 14:51:11 +01:00
Vishnu Ramadas	a19667427a	mem-ruby: Add BUILD_GPU guard to ruby cooldown and warmup phases Ruby was recently updated to support flushes and warmup for GPUs. Since this support uses the GPUCoalescer, non-GPU builds face a compile-time issue, because GPU code is not built for non-GPU builds. This commit adds "#if BUILD_GPU" guards around the GPU-related code in common files like AbstractController.hh, CacheRecorder.*, RubySystem.cc, GPUCoalescer.hh, and VIPERCoalescer.hh. This allows GPU builds to use flushing while non-GPU builds compile without problems. Change-Id: If8ee4ff881fe154553289e8c00881ee1b6e3f113	2023-10-05 18:59:54 -05:00
Víctor Soria	6411b2255c	mem-ruby,configs: Add CHI far atomics support Introduce far atomic operations in the CHI protocol. Three configuration parameters are used to tune this behavior: policy_type: sets the atomic policy to one of those described in our paper; atomic_op_latency: simulates the AMO ALU operation latency; comp_anr: configures the Atomic No Return transaction to split CompDBIDResp into two different messages, DBIDResp and Comp. Change-Id: I087afad9ad9fcb9df42d72893c9e32ad5a5eb478	2023-10-04 19:19:08 +02:00
Víctor Soria	4fd9d66c53	tests,mem-ruby: Enhance ruby false sharing test with Atomics The new Ruby mem test includes a percentage of AMOs that will be executed randomly in the Ruby mem test. Change-Id: Ie95ed78e59ea773ce6b59060eaece3701fe4478c	2023-10-04 19:11:01 +02:00
Vishnu Ramadas	ae5a51994c	mem-ruby: Update cache recorder to use GPUCoalescer port for GPUs Previously, the cache recorder used the Sequencer to issue flush requests and cache warmup requests. The GPU however uses GPUCoalescer to access the cache, and not the Sequencer. This commit adds a GPUCoalescer map to the cache recorder and uses it to send flushes and cache warmup requests to any GPU caches in the system Change-Id: I10490cf5e561c8559a98d4eb0550c62eefe769c9	2023-10-02 19:05:10 -05:00
Vishnu Ramadas	085789d00c	mem-ruby: Add flush support to GPU_VIPER protocol This commit adds flush support to the GPU VIPER coherence protocol. The L1 cache will now initiate a flush request if the packet it receives is of type RubyRequestType_FLUSH. During the flush process, the L1 cache will send a request to L2 if it is in either the V or I state. L2 will issue a flush request to the directory if its cache line is in the valid state before invalidating its copy. The directory, on receiving this request, writes data to memory and sends an ack back to the L2. L2 forwards this ack back to the L1, which then ends the flush by calling the write callback. Change-Id: I9dfc0c7b71a1e9f6d5e9e6ed4977c1e6a3b5ba46	2023-10-02 19:05:10 -05:00
Vishnu Ramadas	61e39d5b26	mem-ruby: Add cache cooldown and warmup support to GPUCoalescer The GPU Coalescer does not contain cache cooldown and warmup support. This commit updates the coalescer to support cache cooldown during flush and warmup during checkpoint restore. Change-Id: I5459471dec20ff304fd5954af1079a7486ee860a	2023-10-02 19:05:04 -05:00
Vishnu Ramadas	a50ead5907	mem-ruby: Add Flush as a supported memory type in VIPERCoalescer This commit adds flush as a recognized memory type in VIPERCoalescer. Change-Id: I0f1b6f4518548e8e893ef681955b12a49293d8b4	2023-10-02 19:02:55 -05:00
Yu-hsin Wang	9ca2672cab	misc: fix g++13 overloaded-virtual warning There are two overloaded-virtual issues reported by g++13. 1. Copy assignment and move assignment overloads are hidden in the derived class: [ CXX] src/mem/cache/replacement_policies/weighted_lru_rp.cc -> ALL/mem/cache/replacement_policies/weighted_lru_rp.o In file included from src/mem/cache/base.hh:61, from src/mem/cache/base.cc:46: src/mem/cache/cache_blk.hh:172:5: error: ‘virtual gem5::CacheBlk& gem5::CacheBlk::operator=(gem5::CacheBlk&&)’ was hidden [-Werror=overloaded-virtual=] 172 | operator=(CacheBlk&& other) | ^~~~~~~~ src/mem/cache/cache_blk.hh:518:19: note: by ‘gem5::TempCacheBlk& gem5::TempCacheBlk::operator=(const gem5::TempCacheBlk&)’ 518 | TempCacheBlk& operator=(const TempCacheBlk&) = delete; | ^~~~~~~~ In this case, we can explicitly add a using-declaration for the parent operator= to keep the function overload. 2. An intended overload hiding in SystemC is reported as an error: In file included from src/systemc/ext/tlm_utils/simple_initiator_socket.h:24, from src/systemc/tlm_bridge/gem5_to_tlm.hh:72, from build/ALL/python/_m5/param_Gem5ToTlmBridge256.cc:17: src/systemc/ext/tlm_utils/../tlm_core/2/sockets/initiator_socket.hh: In instantiation of ‘class tlm::tlm_base_initiator_socket<256, tlm::tlm_fw_transport_if<>, tlm::tlm_bw_transport_if<>, 1, sc_core::SC_ONE_OR_MORE_BOUND>’: src/systemc/ext/tlm_utils/../tlm_core/2/sockets/initiator_socket.hh:185:7: required from ‘class tlm::tlm_initiator_socket<256, tlm::tlm_base_protocol_types, 1, sc_core::SC_ONE_OR_MORE_BOUND>’ src/systemc/ext/tlm_utils/simple_initiator_socket.h:37:7: required from ‘class tlm_utils::simple_initiator_socket_b<sc_gem5::Gem5ToTlmBridge<256>, 256, tlm::tlm_base_protocol_types, sc_core::SC_ONE_OR_MORE_BOUND>’ src/systemc/ext/tlm_utils/simple_initiator_socket.h:156:7: required from ‘class tlm_utils::simple_initiator_socket<sc_gem5::Gem5ToTlmBridge<256>, 256, tlm::tlm_base_protocol_types>’ src/systemc/tlm_bridge/gem5_to_tlm.hh:147:46: required from ‘class sc_gem5::Gem5ToTlmBridge<256>’
/usr/include/c++/13/type_traits:1411:38: required from ‘struct std::is_base_of<sc_gem5::Gem5ToTlmBridgeBase, sc_gem5::Gem5ToTlmBridge<256> >’ ext/pybind11/include/pybind11/detail/../detail/common.h:880:59: required from ‘struct pybind11::class_<sc_gem5::Gem5ToTlmBridge<256>, sc_gem5::Gem5ToTlmBridgeBase, std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>, pybind11::nodelete> >::is_valid_class_option<sc_gem5::Gem5ToTlmBridgeBase>’ ext/pybind11/include/pybind11/detail/../detail/common.h:719:35: required by substitution of ‘template<class ... Ts> using pybind11::detail::all_of = pybind11::detail::bool_constant<(Ts::value && ...)> [with Ts = {pybind11::class_<sc_gem5::Gem5ToTlmBridge<256>, sc_gem5::Gem5ToTlmBridgeBase, std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>, pybind11::nodelete> >::is_valid_class_option<sc_gem5::Gem5ToTlmBridgeBase>, pybind11::class_<sc_gem5::Gem5ToTlmBridge<256>, sc_gem5::Gem5ToTlmBridgeBase, std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>, pybind11::nodelete> >::is_valid_class_option<std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>, pybind11::nodelete> >}]’ ext/pybind11/include/pybind11/pybind11.h:1506:70: required from ‘class pybind11::class_<sc_gem5::Gem5ToTlmBridge<256>, sc_gem5::Gem5ToTlmBridgeBase, std::unique_ptr<sc_gem5::Gem5ToTlmBridge<256>, pybind11::nodelete> >’ build/ALL/python/_m5/param_Gem5ToTlmBridge256.cc:34:179: required from here src/systemc/ext/tlm_utils/../core/sc_port.hh:125:18: error: ‘void sc_core::sc_port_b<IF>::bind(sc_core::sc_port_b<IF>&) [with IF = tlm::tlm_fw_transport_if<>]’ was hidden [-Werror=overloaded-virtual=] 125 | virtual void bind(sc_port_b<IF> &p) { sc_port_base::bind(p); } | ^~~~ In file included from src/systemc/ext/tlm_utils/simple_initiator_socket.h:27: src/systemc/ext/tlm_utils/../tlm_core/2/sockets/initiator_socket.hh:133:18: note: by ‘tlm::tlm_base_initiator_socket<256, tlm::tlm_fw_transport_if<>, tlm::tlm_bw_transport_if<>, 1, sc_core::SC_ONE_OR_MORE_BOUND>::bind’
133 | virtual void bind(bw_interface_type &ifs) { (get_base_export())(ifs); } | ^~~~ src/systemc/ext/tlm_utils/../core/sc_port.hh:124:18: error: ‘void sc_core::sc_port_b<IF>::bind(IF&) [with IF = tlm::tlm_fw_transport_if<>]’ was hidden [-Werror=overloaded-virtual=] 124 | virtual void bind(IF &i) { sc_port_base::bind(i); } | ^~~~ src/systemc/ext/tlm_utils/../tlm_core/2/sockets/initiator_socket.hh:133:18: note: by ‘tlm::tlm_base_initiator_socket<256, tlm::tlm_fw_transport_if<>, tlm::tlm_bw_transport_if<>, 1, sc_core::SC_ONE_OR_MORE_BOUND>::bind’ 133 | virtual void bind(bw_interface_type &ifs) { (get_base_export())(ifs); } | ^~~~ According to the code comment, this overload hiding is intended in the SystemC header: // The overloaded virtual is intended in SystemC, so we'll disable the warning. // Please check section 9.3 of SystemC 2.3.1 release note for more details. The fix is to move the warning suppression to the base class. Change-Id: I6683919e594ffe1fb3b87ccca1602bffdb788e7d	2023-09-27 13:43:28 +08:00
Giacomo Travaglini	f5968da41c	mem-ruby: start using txnid and DBID identifiers in CHI transactions (#288) With this PR, our CHI implementation starts making use of the txnid and DBID identifiers. Note: we were already making use of the txnId for DVM messages to convey the DVM address. This is still the case. In the future, we should realign the DVM logic so that the txnId is solely used as a transaction identifier.	2023-09-26 09:51:47 +01:00
Hoa Nguyen	1fc89bc8ae	cpu,mem,dev: Use Addr for cacheLineSize Change-Id: I2f056571dbf35081d58afda09726c600141d5a05 Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2023-09-20 14:16:46 -07:00
Hoa Nguyen	ac5280fedc	mem,sim: Change the type of cache_line_size to Addr Change-Id: Id39e8249fef89c0d59bb39f8104650257ff00245 Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2023-09-20 14:00:45 -07:00
Giacomo Travaglini	aec1d081c8	mem-ruby: Populate missing txnId field to CompDBID_Stale response Change-Id: I6861d27063b13cd710e09c153d15062640c887fe Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-09-18 15:23:21 +01:00
Bobby R. Bruce	3bdcfd6f7a	mem-ruby: patch fixes a protocol error in MOESI_CMP_Directory (#316) When there is a race between FwdGetX and PUTX at the owner, the owner hands off ownership to the GetX requestor and the PUTX still goes through. But since the owner has changed, the state should go back to M and the PUTX is essentially discarded. An Unblock to the Directory in this case results in an undefined transition. I have added transitions which indicate that when an Unblock is served to the Directory, some kind of ownership transfer has happened while a PUTX/PUTO was in progress.	2023-09-15 13:25:51 -07:00

3310 Commits