This change does many things, but they must all be done atomically.
**USER FACING CHANGE**: The Ruby protocols in Kconfig have changed names
(they are now the same case as the SLICC file names). So, after this
commit, your build configurations need to be updated. You can do so by
running `scons menuconfig <build dir>` and selecting the right Ruby
options. Alternatively, if you're using a `build_opts` file, you can run
`scons defconfig build/<ISA> build_opts/<ISA>` which should update your
config correctly.
Detailed changes are described below.
Kconfig changes:
- Kconfig files in Ruby must now all be declared in the ruby/Kconfig
file
- All of the protocol names have been changed to match their SLICC file
names, including the case
- A new option called "Use multiple protocols" is available; it should
be enabled when multiple protocols are selected. Its only effect is to
set the PROTOCOL variable to "MULTIPLE" when in multiple-protocol mode.
- The PROTOCOL variable can now be "MULTIPLE" which means it will be
ignored. If it's not "MULTIPLE" then it holds the "main" protocol,
which is necessary for backwards compatibility with the Ruby.py files.
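A schematic Python sketch of how the build side can interpret the
PROTOCOL variable described above (`env` and `selected_protocols` are
hypothetical stand-ins, not the real scons variables):

```python
# Schematic sketch only; not the actual SConscript logic.
def protocols_to_build(env, selected_protocols):
    protocol = env["PROTOCOL"]
    if protocol == "MULTIPLE":
        # Multiple-protocol mode: PROTOCOL names no single "main"
        # protocol, so build every protocol selected in Kconfig.
        return list(selected_protocols)
    # Single-protocol mode: PROTOCOL holds the one "main" protocol,
    # keeping backwards compatibility with the Ruby.py files.
    return [protocol]
```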
Ruby config changes:
To make this change backwards compatible with Ruby.py, this change adds
a new "protocol" config called MULTIPLE.py which is used to allow the
user to set a "--protocol" option on the command line. This is only
needed if you are using a gem5 binary with multiple protocols but need
to use Ruby.py.
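A minimal sketch of the idea behind MULTIPLE.py (illustrative only; the
`--protocol` option name comes from the text above, everything else is
assumed):

```python
# Illustrative sketch, not the actual gem5 MULTIPLE.py.
import argparse

def define_options(parser: argparse.ArgumentParser) -> None:
    # Only needed for binaries built with multiple protocols: the user
    # must say which protocol Ruby.py should configure at run time.
    parser.add_argument(
        "--protocol",
        required=True,
        help="Ruby protocol to configure (build has PROTOCOL=MULTIPLE)",
    )
```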
stdlib changes:
- Make the coherence protocol file behave like the ISA file
- Add a function to get the coherence protocol from the `CacheHierarchy`
like we do with the ISA in the `Processor` (see the sketch after this
list).
- Use this function where `get_runtime_coherence_protocol` was used
- Update the requires code to work with the new CoherenceProtocol
- Fix a typo in the AMD Hammer name and also add the missing MSI
protocol
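The sketch referenced above: a hedged Python illustration of the stdlib
shape, mirroring how the `Processor` exposes its ISA (enum values and
class layout are illustrative, not the real stdlib code):

```python
# Illustrative sketch; the real stdlib classes differ in detail.
from enum import Enum

class CoherenceProtocol(Enum):
    MSI = "MSI"
    MESI_TWO_LEVEL = "MESI_Two_Level"

class CacheHierarchy:
    def __init__(self, protocol: CoherenceProtocol) -> None:
        self._protocol = protocol

    def get_coherence_protocol(self) -> CoherenceProtocol:
        # Analogous to Processor.get_isa(); replaces uses of
        # get_runtime_coherence_protocol().
        return self._protocol
```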
Scons changes:
- In Ruby we now gather up all of the protocols and build them all if
there are multiple protocols
- There's some bending over backwards to tell the user if they are
using an out-of-date gem5.build/config file and how to update it
- Note that building multiple Ruby protocols adds a significant amount
of time to the build, since we have to run SLICC twice for each file.
build_opts:
- Update all files with new names
- Add a new NULL_All_Ruby that will be used for testing
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This removes two #defines: PARTIAL_FUNC_READS and PROTOCOL_<protocol>.
Instead, update the code to use the runtime information about which
protocol we are using.
Change-Id: Icb6f10fc2d3fd59128c62f9f6e37b52ef2581b61
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Add a ProtocolInfo class that is specialized (through inheritance) for
each protocol. This class currently has the protocol's name and any
protocol-specific options (partial_func_reads is the only one so far).
Note that the SLICC language has been updated so that you can specify
the options in the `protocol` statement in the `.slicc` file.
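A conceptual Python sketch of the ProtocolInfo idea (the class itself
is not Python, and the protocol name and option value below are
illustrative):

```python
# Conceptual sketch only; gem5's ProtocolInfo specializations are
# produced by the build, not hand-written like this.
class ProtocolInfo:
    def __init__(self, name: str, partial_func_reads: bool = False) -> None:
        self.name = name
        # Protocol-specific option; replaces the PARTIAL_FUNC_READS
        # compile-time #define with runtime information.
        self.partial_func_reads = partial_func_reads

class ExampleProtocolInfo(ProtocolInfo):
    def __init__(self) -> None:
        # Hypothetical protocol; the option value is illustrative.
        super().__init__("Example_Protocol", partial_func_reads=True)
```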
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This change is a step toward building multiple protocols at the same
time in scons. Add functions and use lists instead of a single
protocol.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Move all generated protocol-specific files to a subdirectory with the
protocol's name.
This change also updates SLICC to have separate variables for the
filename, C identifier, and Python identifier instead of just using
variations of the C identifier.
Change-Id: I62f69a4606b030ee23cb2d96493f3257a6923748
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Wrap all protocol-specific types in `namespace <protocol>`. This will
facilitate compiling multiple protocols into one binary.
There is a one-time hack to the generated `MachineType.cc` file to use
the namespace for the protocol until we generalize the machine types.
Change-Id: I5947e8ac69afe6f7ed257d7c5980ad65e9338acf
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This change extends SLICC to understand two different kinds of SLICC
files: files that are protocol-specific and files that are shared or
included between different protocols.
Each declaration in SLICC can now be marked shared or not, and the code
generation can act differently depending on this (e.g., wrapping
protocol-specific declarations in a namespace).
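An illustrative sketch of that code-generation decision (not SLICC's
actual emitter):

```python
# Illustrative sketch only.
def emit_decl(generated_code: str, protocol: str, shared: bool) -> str:
    if shared:
        # Shared between protocols: emitted once, with no namespace.
        return generated_code
    # Protocol-specific: wrap in the protocol's namespace so multiple
    # protocols can coexist in one binary without symbol clashes.
    return (
        f"namespace {protocol}\n{{\n"
        f"{generated_code}\n"
        f"}} // namespace {protocol}\n"
    )
```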
*Developer facing change*
Removes the RubySlicc_interfaces.slicc file from the SLICC includes of
every protocol.
Changes required: If you have a custom protocol, you will need to
remove the line `include "RubySlicc_interfaces.slicc"` from your
`.slicc` file.
Change-Id: Ia6c2dafe2b8fe86749a13d17daa885bddd166855
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Components that require randomness should not share their randomness
source with other components, to avoid simulation noise. For instance,
the branch predictor of one core should not impact the random cache
replacement policy of the cache of another core. This currently happens
because all components share a single random number generator.
This PR gives relevant components their own generators, although a
couple of components still use rand().
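A conceptual illustration of the coupling using Python's random module
(gem5's generators are not Python; this only shows the principle):

```python
import random

# One shared stream couples unrelated components: every draw the branch
# predictor makes shifts which values the replacement policy sees next.
shared_rng = random.Random(1234)

# The fix in principle: per-component generators, so one component's
# draws cannot perturb another's sequence.
branch_predictor_rng = random.Random(1234)
cache_replacement_rng = random.Random(5678)
```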
Change-Id: I3fb7226111c9194ee457af0f0f2b83f8c7b69d1e
Co-authored-by: Arthur Perais <arthur.perais@univ-grenoble-alpes.fr>
This commit fixes three issues in the MESI_Three_Level and
MESI_Two_Level implementations (MESI_Three_Level_HTM might still have
issues).
1) Define functional read priorities for the cache controllers which
have states with Maybe_Stale access permission (L1 > L2 > Directory).
2) Fix incorrect access permissions in MESI_Three_Level-L1cache:
* S_IL0 is Read_Only, it is waiting for L0 to acknowledge the
invalidation request before moving to SS, also a Read_Only state.
* E_IL0 is Maybe_Stale, its contents might be valid, since there is a
transition (E_IL0, L0_Ack, EE) with no writeback data.
* M_IL0 is Maybe_Stale, its contents might be valid, since there is a
transition (M_IL0, L0_Ack, MM) with no writeback data.
3) Add missing message types carrying valid data in functional reads:
* INV_DATA is a writeback from L0 to L1.
* DATA is a response to GET_S, but there are scenarios where it might
be the only place with valid data (e.g. during L2 replacement).
Change-Id: Ie44fa317027f9ede272967e7461d337e14355eec
Functional reads can be satisfied by one of the following, in order:
1. Main memory (when the data is not present in the cache hierarchy);
2. Valid data block in cache;
3. Valid data block in coherence message;
4. Valid data block marked as Maybe_Stale.
Number 4 is not handled by the current implementation. A Maybe_Stale
block can be either truly stale or actually valid. When it is stale,
the memory read will be satisfied by either number 2 or number 3. When
it is valid, there will be no coherence message with valid data inside,
and the Maybe_Stale block will transition to a valid state after
receiving some kind of acknowledgement.
The main challenge in handling number 4 is knowing which Maybe_Stale
block the data should be read from. For instance, in a two-level cache
hierarchy, we might have a block marked as Maybe_Stale in
both L1 and L2. In this case, we should prioritize the cache controller
that is closest to the CPU. To define this priority, a new virtual
function 'functionalReadPriority' was added to the AbstractController
class.
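An illustrative sketch of how such a priority resolves a functional
read (Python for illustration; the real interface is a virtual method
on AbstractController):

```python
# Illustrative sketch only; priorities follow the L1 > L2 > Directory
# ordering described above (lower value = closer to the CPU).
PRIORITY = {"L1Cache": 0, "L2Cache": 1, "Directory": 2}

def pick_maybe_stale_source(controllers_with_maybe_stale: list[str]) -> str:
    # Only reached when no valid copy exists in a cache (case 2) or in
    # an in-flight coherence message (case 3).
    return min(controllers_with_maybe_stale, key=PRIORITY.__getitem__)

assert pick_maybe_stale_source(["L2Cache", "L1Cache"]) == "L1Cache"
```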
Change-Id: I4774cd01aab7bb9ca53694cd9dc4f9416a8e4025
Previously, GPU L2 caches could be configured in either writeback or
writethrough mode when used in an APU. However, in a CPU+dGPU system,
only writethrough worked. This is mainly because, in a CPU+dGPU system,
the CPU sends either PCI or SDMA requests to transfer data from GPU
memory to the CPU. When the L2 cache is configured as writeback, the
dirty data resides in L2 while the CPU transfers data from GPU memory,
leading to the wrong version being transferred. A similar issue also
crops up when the GPU command processor reads kernel information before
kernel dispatch, only to read incorrect data. This PR contains a set of
commits that fix both of these issues.
Two locally declared NetDest variables use the default constructor
instead of the constructor taking a RubySystem pointer. This causes
asserts when (1) garnet is used or (2) a protocol that uses
`broadcast()` is built.
Fix these two by passing the appropriate RubySystem pointers.
PR gem5#1453 left some unused variables in the Ruby code that triggered
"unused variable" warnings when compiling ALL/gem5.opt with the CHI
protocol. These have been removed.
Fixes #1384.
MESI_Two_Level and MESI_Three_Level protocols are susceptible to LL/SC
livelocks when simulating boards with high core count.
This fix is based on MOESI_CMP_directory's implementation of locked
states, but tailors the solution to only apply it when a Load-Linked is
initiated.
There are two new states to act as locked states and stall any messages
leading to eviction:
* LLSC_E: equivalent to E state, go to E after timeout.
* LLSC_M: equivalent to M state, go to M after timeout.
The main new event is Load_Linked, which is very similar (in behavior)
to a Store, reusing several transient states. When a controller receives
the exclusive data, it differentiates a Load_Linked from a Store by
checking a new field added to the TBE: 'isLoadLinked'. It triggers a
different event when it is a Load_Linked, which in turn causes the
transition to one of the locked states.
The entire mechanism can be turned off by setting 'use_llsc_lock' to
false, and the amount of time to keep locked is defined by
'llsc_lock_timeout_latency'.
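A schematic sketch of the mechanism (the real logic lives in the SLICC
state machines; names mirror the description above):

```python
# Schematic sketch only.
def on_exclusive_data(tbe_is_load_linked: bool, use_llsc_lock: bool) -> str:
    # The controller tells a Load-Linked apart from a Store via the new
    # TBE field 'isLoadLinked' and locks the line only for the former.
    if use_llsc_lock and tbe_is_load_linked:
        return "LLSC_E"  # reverts to E after llsc_lock_timeout_latency
    return "E"

def should_stall(state: str, event: str) -> bool:
    # While locked, stall any message that would lead to eviction.
    return state in ("LLSC_E", "LLSC_M") and event == "Replacement"
```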
Change-Id: I13f415b6b7890d51d01f23001047d2363467a814
Starting with https://github.com/gem5/gem5/pull/1453, some Ruby
structures require a block size to be set and others require a pointer
to the Ruby system. This fixes some cases which were not covered by the
per-checkin tests but seen in daily+ tests. In particular:
- WriteMasks and PerfectCacheMemory must explicitly set a block size.
- NetDest and RubyProxyPort require a RubySystem pointer.
- Classes inheriting Message now have a setRubySystem method collecting
all objects that need a RubySystem pointer; this should be called in
the constructor of the Message.
This commit makes sure all of these happen. This should fix daily
arm_boot_tests and daily learning_gem5 tests.
There are several parts to this PR to work towards #1349.
(1) Make RubySystem::getBlockSizeBytes non-static by providing ways to
access the block size or passing the block size explicitly to classes.
The main changes are:
- DataBlocks must be explicitly allocated. A default ctor still exists
to avoid needing to heavily modify SLICC. The size can be set using a
realloc function, operator=, or copy ctor. This is handled completely
transparently, meaning no protocol or config changes are required (see
the sketch after this list).
- WriteMask now requires block size to be set. This is also handled
transparently by modifying the SLICC parser to identify WriteMask
types and call setBlockSize().
- AbstractCacheEntry and TBE classes now require block size to be set.
This is handled transparently by modifying the SLICC parser to
identify these classes and call initBlockSize() which calls
setBlockSize() for any DataBlock or WriteMask.
- All AbstractControllers now have a pointer to RubySystem. This is
assigned in SLICC generated code and requires no changes to protocol
or configs.
- The Ruby Message class now requires block size in all constructors.
This is added to the argument list automatically by the SLICC parser.
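The sketch referenced above: a conceptual Python version of the
DataBlock sizing scheme from the first bullet (the real class is not
Python):

```python
# Conceptual sketch only.
from typing import Optional

class DataBlock:
    def __init__(self) -> None:
        # Default ctor leaves the block unallocated so SLICC-generated
        # code needs no changes.
        self.data: Optional[bytearray] = None

    def realloc(self, block_size_bytes: int) -> None:
        self.data = bytearray(block_size_bytes)

    def assign(self, other: "DataBlock") -> None:
        # operator= / copy-ctor equivalent: adopt the source's size.
        self.data = None if other.data is None else bytearray(other.data)
```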
(2) Relax dependence on common functions in
src/mem/ruby/common/Address.hh
so that RubySystem::getBlockSizeBits is no longer static. Many classes
already have a way to get the block size from the previous commit, so
they simply multiply by 8 to get the number of bits. To handle SLICC
and reduce the number of changes, makeCacheLine, getOffset, etc. are
defined in RubyPort and AbstractController. The only protocol changes
required are to replace any "RubySystem::foo()" calls with
"m_ruby_system->foo()".
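For reference, the helpers in question are simple bit manipulations on
the (now per-system) block size; a Python sketch:

```python
from math import log2

def make_line_address(addr: int, block_size_bytes: int) -> int:
    # Clear the offset bits to get the start of the cache line.
    offset_bits = int(log2(block_size_bytes))
    return addr & ~((1 << offset_bits) - 1)

def get_offset(addr: int, block_size_bytes: int) -> int:
    # Byte offset within the cache line.
    offset_bits = int(log2(block_size_bytes))
    return addr & ((1 << offset_bits) - 1)

# 64-byte cache lines.
assert make_line_address(0x1234, 64) == 0x1200
assert get_offset(0x1234, 64) == 0x34
```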
For classes which do not have a way to get access to block size but
still used makeLineAddress, getOffset, etc., the block size must be
passed to that class. This requires some changes to the SimObject
interface for two commonly used classes: DirectoryMemory and
RubyPrefetcher, resulting in user-facing API changes.
User-facing API changes:
- DirectoryMemory and RubyPrefetcher now require the cache line size as
a non-optional argument.
- RubySequencer SimObjects now require RubySystem as a non-optional
argument.
- TesterThread in the GPU ruby tester now requires the cache line size
as a non-optional argument.
(3) Removes static member variables in RubySystem which control
randomization, cooldown, and warmup. These are mostly used by the Ruby
Network. The network classes are modified to take these former static
variables as parameters which are passed to the corresponding method
(e.g., enqueue, delayHead, etc.) rather than needing a RubySystem object
at all.
Change-Id: Ia63c2ad5cf0bf9d1cbdffba5d3a679bb4d3b1220
(4) There are two kinds of major SLICC-generated static methods:
getNumControllers() on each cache controller, which returns the number
of controllers created by the configs at run time, and the functions
which access this method, MachineType_base_count and
MachineType_base_number. These need to be removed to allow creating
multiple RubySystem objects; otherwise NetDest, version values, and
other objects are incorrect.
To remove the static requirement, MachineType_base_count and
MachineType_base_number are moved to RubySystem. Any class which needs
to call these methods must now have a pointer to a RubySystem. To enable
that, several changes are made:
- RubyRequest and Message now require a RubySystem pointer in the
constructor. The pointer is passed to fields in the Message class
which require a RubySystem pointer (e.g., NetDest). SLICC is modified
to do this automatically.
- SLICC structures may now optionally take an "implicit constructor"
which can be used to call a non-default constructor for locally
defined variables (e.g., temporary variables within SLICC actions). A
statement such as "NetDest bcast_dest;" in SLICC will implicitly
append a call to the NetDest constructor taking RubySystem, for
example (see the sketch after this list).
- RubySystem gets passed to Ruby network objects (Network, Topology).
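The sketch referenced above: an illustrative Python version of the
implicit-constructor transformation (not SLICC's real parser; the
mapping below is hypothetical):

```python
# Illustrative sketch only.
IMPLICIT_CTOR_ARGS = {"NetDest": "m_ruby_system"}

def emit_local_decl(type_name: str, var_name: str) -> str:
    args = IMPLICIT_CTOR_ARGS.get(type_name)
    if args is not None:
        # e.g. "NetDest bcast_dest;" becomes
        #      "NetDest bcast_dest(m_ruby_system);"
        return f"{type_name} {var_name}({args});"
    return f"{type_name} {var_name};"
```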
This commit changes metric units (e.g. kB, MB, and GB) to binary units
(KiB, MiB, GiB) in various files. This PR covers files that were missed
by a previous PR that also made these changes.
The current GPU_VIPER protocol's TCC cache updates the MRU information
twice, by calling both a_allocateBlock and ut_updateTag, which affects
the LIP and RRIP replacement policies. Removing ut_updateTag fixes the
LIP and RRIP replacement policies.
Change-Id: I79ad9392593e00425a7fe8828048465b2c2c2e1f
The Vega ISA's s_memtime instruction is used to obtain a cycle value
from the GPU. Previously, this was implemented to obtain the cycle count
when the memtime instruction reached the execute stage of the GPU
pipeline. However, from microbenchmarking we have found that this
underreports the latency of memtime instructions relative to real
hardware. Thus, we changed its behavior to go through the scalar memory
pipeline and obtain a latency value from the SQC (L1 I$). This mirrors
the suggestion of the AMD Vega ISA manual that s_memtime should be
treated like an s_load_dwordx2.
The default latency was set based on microbenchmarking.
Change-Id: I5e251dde28c06fe1c492aea4abf9f34f05784420
Functional writes atomically update all copies of a data block, so they
should invalidate any pending LL/SC locks, just like a conventional
write would.
Change-Id: Ic79d2d8d24901f1b6a2ce81dc0e2decc84c0ebbc
Change-Id: I3733b31baf187e0d3d38d971d9423a1b1afe2296
gpu-compute: add GPU RubyHitMiss for TCP and TCC
Change-Id: I4430532b901811e03d9b077b61e2eca4557b34e1
gpu-compute: Add RubyHitMiss flag for TCP and TCC cache
Change-Id: I4e5d1127c84b9eb1060ec9ba0b6638267449eda5
Remove space
Change-Id: I401f528c6f128ba0956bdbc232e8f2ae37bf648c
This commit adds instSeqNum to the atomic responses in
GPU_VIPER-TCC.sm. This will be useful when debugging issues related to
GPU atomic transactions.
Change-Id: Ic05c8e1a1cb230abfca2759b51e5603304aadaa3
When a compute unit issues several requests to the same line,
the requests wait in the L2 if it is a writeback cache. If the line is
invalid initially and the first request is atomic in nature, the L2
cache issues a request to main memory. On data return, the cache line
transitions to M but doesn't wake up the other requests, resulting in
a deadlock. This commit adds a wakeup call on data return for atomics
and fixes potential deadlocks.
Change-Id: I8200ce6e77da7c8b4db285c0cc8b8ca0dfa7d720
Currently, when data is downgraded by MOESI_AMD_Base-CorePair (e.g. due
to a replacement), this requires a 4-way handshake between the CorePair
and the dir. Specifically, the CorePair sends a message telling the dir
it'd like to downgrade; the dir sends an ACK back; the CorePair writes
the data back; and finally the dir ACKs the writeback.
This is very inefficient and not representative of how modern protocols
downgrade a request. Accordingly, this commit updates the downgrade
support such that the CorePair writes back the data immediately and the
dir ACKs it.
Thus, this approach requires only a 2-way handshake.
Change-Id: I7ebc85bb03e8ce46a8847e3240fc170120e9fcd6
Co-authored-by: Neeraj Surawar <neerajs@hyrule.cs.wisc.edu>
A StoreThrough in VIPER when the TCP is disabled, the GLC bit is set,
or the SLC bit is set will bypass the TCP, but will temporarily
allocate a cache entry, seemingly to handle write coalescing with valid
blocks. It does not attempt to evict a block if the set is full and the
address is invalid. This causes a panic when the set is full, as there
is no spare cache entry to use temporarily for DataBlk manipulation.
However, a cache block is not required for this.
This commit removes using a cache block for StoreThrough with invalid
blocks, as there is no existing data to coalesce with. It creates
no-allocate variants of the actions needed in StoreThrough and pulls
the DataBlk information from the in_msg instead. Non-invalid blocks do
not hit this panic as they already have a cache entry.
Fixes issues with StoreThroughs on more aggressive architectures like
MI300.
Change-Id: Id8687eccb991e967bb5292068cbe7686e0930d7d
This PR includes a check for `m_pkt` being null and appropriately
handles that case. This issue was causing the Daily tests to fail.
Change-Id: I87142ca14ca4ab3d8306153a1cf34c2629a119ba
This patch is adding the NS bit to CHI requests to make sure they are
properly tagged according to their security state.
Change-Id: I33d3610edefbb5a05a6090e9125c35d4fb8bca58
Reviewed-by: Tiago Muck <tiago.muck@arm.com>
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Fixes #1044. This patch adds checks for message types (PUTX_COPY, DATA,
DATA_EXCLUSIVE) that contain data blocks but were missing from the
original `functionalRead` method in MESI Three-Level messages.
Change-Id: I0cedc314166c9cc037bf20f5b7fef5552dd1253c
This PR introduces a missing piece of the far atomic implementation.
This pull request incorporates several changes:
- Enable 2-level and 4-level (and N-level) cache hierarchies, removing
Atomic_NoWait transactions
- Fix the Unique Near policy implementation that raised an abort
- Add support for alloc_on_atomic == False. This enables Far Atomics on
systems where the HNF does not allocate evicted lines at LLC (like in
WriteUpdate).
The GPU device currently supports large BAR which means that the driver
can write directly to GPU memory over the PCI bus without using SDMA or
PM4 packets. The gem5 PCI interface only provides an atomic interface
for BAR reads/writes, which means the values cannot go through timing
mode Ruby caches. This causes bugs as the TCC cache is allowed to keep
clean data between kernels for performance reasons. If there is a BAR
write directly to memory bypassing the cache, the value in the cache is
stale and must be invalidated.
In this commit a TCC invalidate is generated for all writes over PCI
that go directly to GPU memory. This will also invalidate TCP along the
way if necessary. This currently relies on the driver synchronization
which only allows BAR writes in between kernels. Therefore, the cache
should only be in I or V state.
To handle a race condition between invalidates and launching the next
kernel, the invalidates return a response and the GPU command processor
will wait for all TCC invalidates to be complete before launching the
next kernel.
This fixes issues with stale data in nanoGPT and possibly PENNANT.
Change-Id: I8e1290f842122682c271e5508a48037055bfbcdf
The GPUDynInst for sending memory requests through the CU's data port
is required but only used for DPRINTFs.
the methods can be reused for requests such as probes generated by the
GPU device.
Change-Id: I16094e400968225596370b684d6471580888d98a
Add an alternative implementation of far atomics for when the flag
alloc_on_commit is false. The implementation fetches the data, performs
the atomic, and writes the cache line back to main memory.
Co-authored-by: Fabian Schätzle <f.schaetzle@fz-juelich.de>
Change-Id: I8797fbc68448e1866a292f4afeedd3613113dddd
To make Atomic transactions recursive and enable 2-level configs,
remove AtomicReturn_NoWait and other level-dependent code.
GitHub Issue: https://github.com/gem5/gem5/issues/882
Change-Id: Iac468cdb8a3b5914c8f05c5cedde866ce85f359a
This is a follow-up fix to #791, "mem-ruby: Fix possible dirty line
loss in CHI when ReadShared hit on UD line."
A UD_RU line may have stale data since the upstream could have updated
the line, so its local cache line data is treated as invalid
(dataValid=false). But when the line is evicted, it must be written
back downstream because the upstream may have the line in a clean state
(UC). This change fixes it by copying back the UD_RU line while keeping
its dataValid false.
Example error case:
- L3 was in UD_RSC and being evicted without back-invalidation. LLC (HN)
was in RU state.
- Because there's still upstream sharer, L3 sends WriteClean.
- Because the data state was unique and dirty, L3 sends CBWrData_UD_PD.
- LLC becomes UD_RU.
- When the line is evicted from LLC (LocalHN_Eviction), the line is
just dropped, causing the loss of the dirty copy.
Co-authored-by: Minje Jun <minje.jun@samsung.com>
Adds categorization of bypassed atomics in TCC to the TBE as either
return or no-return, which gets consumed in pa_performAtomic to
determine if atomic logs should be stored.
Reestablishes TCC bypassed atomics after #546.
Change-Id: Ibc1fa2b795ef1c47c3893a0b1911fa7993522d38
Bypassed write-through requests on invalid lines in the TCC should be
written through to the directory. This transition was previously
missing.
Change-Id: I16b117c4e085ce6be0ed5297aa0129d52cd35a51
In case a ReadShared hits on a UD line and there are no sharers, this
change makes the downstream pass Dirty to the requestor whenever
possible, even though it doesn't deallocate the line. This moves the
requestor to SD and the downstream to UD_RSD.
In the previous implementation, a loosely exclusive intermediate cache
could cause loss of dirty data. An example error condition is below.
Configurations
L2 cache: Roughly inclusive to L1 without back-invalidation
- dealloc_on_* = false
- dealloc_backinv_* = false
L3 cache: Roughly exclusive to L2 without back-invalidation
- alloc_on_readshared = true
- alloc_on_readunique = false
- dealloc_on_shared = false
- dealloc_on_unique = true
- dealloc_backinv_* = false
- is_HN = false
LLC: Same clusivity as L3 except is_HN = true
For all caches, allow_SD = true and fwd_unique_on_readshared = false
Example problem sequence:
1. L1 sends ReadUnique then becomes UD. L2 is UC_RU. L3 and LLC are RU.
2. L1 evicts the line to L2 by WriteBackFull (UD_PD). L2 becomes UD.
3. L2 evicts the line to L3 using WriteBackFull (UD_PD). L3 becomes UD.
4. L1 reads the line with ReadShared which misses on L2.
5. L2 reads the line with ReadShared which hits on L3. L3 becomes UD_RSC
because it doesn't deallocate the line (dataToBeInvalid=false).
6. L3 evicts the line to LLC by WriteCleanFull (UD_PD) because L3
doesn't back-invalidate and still has a sharer. The local cache line is
invalidated by Deallocate_CacheBlock. L3 becomes RUSC and LLC becomes
UD_RU.
7. When UD_RU is evicted at LLC, the UD_RU line is dropped expecting
the upstream to write back, causing loss of the dirty data.