Commit Graph

1212 Commits

Author SHA1 Message Date
Jason Lowe-Power
97542c1a4c mem-ruby,scons: Add scons option for multiple protocols
This change does many things, but they must all be done atomically.

**USER FACING CHANGE**: The Ruby protocols in Kconfig have changed names
(they are now the same case as the SLICC file names). So, after this
commit, your build configurations need to be updated. You can do so by
running `scons menuconfig <build dir>` and selecting the right ruby
options. Alternatively, if you're using a `build_opts` file, you can run
`scons defconfig build/<ISA> build_opts/<ISA>` which should update your
config correctly.

Detailed changes are described below.

Kconfig changes:

- Kconfig files in ruby now must all be declared in the ruby/Kconfig
  file
- All of the protocol names are changed to match their slicc file names
  including the case
- A new option called "Use multiple protocols" is available, which should
  be enabled when more than one protocol is selected. It is only used to
  set the PROTOCOL variable to "MULTIPLE" when in multiple mode.
- The PROTOCOL variable can now be "MULTIPLE" which means it will be
  ignored. If it's not "MULTIPLE" then it holds the "main" protocol,
  which is necessary for backwards compatibility with the Ruby.py files.
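Here is a minimal Python sketch of the PROTOCOL rule described above. It uses a hypothetical `protocol_variable` helper and models the Kconfig behavior in plain code. It is not gem5's actual Kconfig or SCons implementation. Rejecting a multi-protocol selection when the option is not enabled is an assumption made for the sketch.

```python
def protocol_variable(selected_protocols, use_multiple_protocols):
    """Return the value PROTOCOL would take (hypothetical model)."""
    if use_multiple_protocols:
        # Multiple mode: PROTOCOL is "MULTIPLE" and is otherwise ignored.
        return "MULTIPLE"
    if len(selected_protocols) != 1:
        # Assumption for this sketch: exactly one protocol is required here.
        raise ValueError("select one protocol or enable 'Use multiple protocols'")
    # Single mode: PROTOCOL holds the "main" protocol, as Ruby.py expects.
    return selected_protocols[0]


print(protocol_variable(["MESI_Two_Level"], False))  # MESI_Two_Level
print(protocol_variable(["MESI_Two_Level", "CHI"], True))  # MULTIPLE
```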

Ruby config changes:

To make this change backwards compatible with Ruby.py, this change adds
a new "protocol" config called MULTIPLE.py which is used to allow the
user to set a "--protocol" option on the command line. This is only
needed if you are using a gem5 binary with multiple protocols but need
to use Ruby.py.

stdlib changes:

- Make the coherence protocol file behave like the ISA file
- Add a function to get the coherence protocol from the `CacheHierarchy`
  like we do with the ISA in the `Processor`.
  - Use this function where `get_runtime_coherence_protocol` was used
- Update the requires code to work with the new CoherenceProtocol
- Fix a typo in the AMD Hammer name and also add the missing MSI
  protocol

Scons changes:

- In Ruby we now gather up all of the protocols and build them all if
  there are multiple protocols
- There's some bending over backwards to tell the user if they are using
  an out of date gem5.build/config file and how to update it
- Note that building multiple Ruby protocols adds a significant amount of
  time to the build since we have to run SLICC twice for each file.

build_opts:

- Update all files with new names
- Add a new NULL_All_Ruby that will be used for testing

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2024-11-19 11:00:34 -08:00
Jason Lowe-Power
9a904478eb mem-ruby: Use runtime protocol instead of #defines
This removes two #defines: PARTIAL_FUNC_READS and PROTOCOL_<protocol>.
Instead, update the code to use the runtime information about which
protocol we are using.

Change-Id: Icb6f10fc2d3fd59128c62f9f6e37b52ef2581b61
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2024-11-19 10:53:59 -08:00
Jason Lowe-Power
b7ce3040de mem-ruby: Add ProtocolInfo class
Add a ProtocolInfo class that is specialized (through inheritance) for
each protocol. This class currently has the protocol's name and any
protocol-specific options (partial_func_reads is the only one so far).
Note that the SLICC language has been updated so that you can specify
the options in the `protocol` statement in the `.slicc` file.
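The inheritance pattern can be sketched in Python as a rough analog of the C++ class. The method names on `ProtocolInfo`, the `ExampleProtocolInfo` subclass, and `functional_read_path` are all hypothetical. The sketch also shows the kind of runtime check that can replace a compile-time `PARTIAL_FUNC_READS` #define.

```python
class ProtocolInfo:
    """Protocol name plus protocol-specific options (hypothetical analog)."""

    def __init__(self, name, partial_func_reads=False):
        self._name = name
        self._partial_func_reads = partial_func_reads

    def get_name(self):
        return self._name

    def get_partial_func_reads(self):
        return self._partial_func_reads


class ExampleProtocolInfo(ProtocolInfo):
    """One subclass per protocol; the option values would come from the
    `protocol` statement in that protocol's .slicc file."""

    def __init__(self):
        super().__init__("Example_Protocol", partial_func_reads=True)


def functional_read_path(info):
    # Runtime query in place of a compile-time #ifdef.
    if info.get_partial_func_reads():
        return "partial"
    return "full-block"
```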

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2024-11-19 10:53:59 -08:00
Jason Lowe-Power
3ba16adeff scons: Change scons for multiple protocols in SLICC
This change is a step toward multiple protocols building at the same
time in scons. Add functions and use lists instead of single protocol.

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2024-11-19 10:53:58 -08:00
Jason Lowe-Power
feb45c9cb9 mem-ruby: Move protocol files to subdir
Move all generated protocol-specific files to a subdirectory with the
protocol's name.

This change also updates SLICC to have separate variables for the
filename, c identifier and python identifier instead of just using
variations of the c identifier.

Change-Id: I62f69a4606b030ee23cb2d96493f3257a6923748
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2024-11-19 10:53:58 -08:00
Jason Lowe-Power
3a4465d908 mem-ruby: Use namespaces for protocol types
Wrap all protocol-specific types in `namespace <protocol>`. This will
facilitate compiling multiple protocols into one binary.

There is a one-time hack to the generated `MachineType.cc` file to use
the namespace for the protocol until we generalize the machine types.

Change-Id: I5947e8ac69afe6f7ed257d7c5980ad65e9338acf
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2024-11-19 10:53:58 -08:00
Jason Lowe-Power
1b84fbbeae mem-ruby: Use shared and per-protocol SLICC files
This change extends SLICC to understand two different kinds of SLICC
files: files that are protocol-specific and files that are shared or
included between different protocols.

Each declaration in SLICC can now be shared or not. If it is shared,
then we can take a different action in the code generation (e.g., wrap
in a namespace).

*Developer facing change*
Removes the RubySlicc_interfaces.slicc file from the SLICC includes of
every protocol.

Changes required: If you have a custom protocol, you will need to remove
the line `include "RubySlicc_interfaces.slicc"` from your .slicc file.

Change-Id: Ia6c2dafe2b8fe86749a13d17daa885bddd166855
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2024-11-19 10:53:47 -08:00
aperais
b82ab5ac89 misc: Do not share the random number generator across components (#1534)
Components that require randomness should not share their randomness
source with other components, to avoid simulation noise. For instance,
the branch predictor of one core should not affect the random
cache replacement policy of another core's cache. This currently
happens because all components share a single random number generator.

This PR gives relevant components their own generators, although
a couple of components still use rand().
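This minimal Python sketch illustrates why a shared generator adds noise. It uses `random.Random` instances as stand-ins for gem5's generators. `Component` and `victim_draws` are hypothetical names, and the seeds are arbitrary. With private generators, the replacement policy's draws do not depend on how many values the branch predictor consumed.

```python
import random


class Component:
    """A simulated component that owns a private random number generator."""

    def __init__(self, name, seed):
        self.name = name
        self.rng = random.Random(seed)  # independent stream per component

    def draw(self):
        return self.rng.random()


def victim_draws(predictor_draws, shared):
    """Return the replacement policy's next three draws after the branch
    predictor has consumed `predictor_draws` values."""
    if shared:
        rng = random.Random(42)  # one generator used by both components
        for _ in range(predictor_draws):
            rng.random()
        return [rng.random() for _ in range(3)]
    predictor = Component("bpred", seed=1)
    replacement = Component("repl", seed=2)
    for _ in range(predictor_draws):
        predictor.draw()
    return [replacement.draw() for _ in range(3)]
```

`victim_draws(0, shared=False)` equals `victim_draws(5, shared=False)`. In the shared case, the result changes with the predictor's activity.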
    
Change-Id: I3fb7226111c9194ee457af0f0f2b83f8c7b69d1e

Co-authored-by: Arthur Perais <arthur.perais@univ-grenoble-alpes.fr>
2024-11-18 01:37:12 -08:00
Marleson Graf
c31bc284a8 mem-ruby,sim-se: Fix functional reads for MESI protocols
This commit fixes three issues in the MESI_Three_Level and MESI_Two_Level
implementations (MESI_Three_Level_HTM might still have issues).

1) Define functional read priorities for the cache controllers which
have states with Maybe_Stale access permission (L1 > L2 > Directory).

2) Fix incorrect access permissions in MESI_Three_Level-L1cache:
* S_IL0 is Read_Only, it is waiting for L0 to acknowledge the
  invalidation request before moving to SS, also a Read_Only state.
* E_IL0 is Maybe_Stale, its contents might be valid, since there is a
  transition (E_IL0, L0_Ack, EE) with no writeback data.
* M_IL0 is Maybe_Stale, its contents might be valid, since there is a
  transition (M_IL0, L0_Ack, MM) with no writeback data.

3) Add missing message types carrying valid data in functional reads:
* INV_DATA is a writeback from L0 to L1.
* DATA is a response to GET_S, but there are scenarios where it might
  be the only place with valid data (e.g. during L2 replacement).

Change-Id: Ie44fa317027f9ede272967e7461d337e14355eec
2024-11-18 00:22:45 -08:00
Marleson Graf
63d110fb7a mem-ruby,sim-se: Support Maybe_Stale in functional reads
Functional reads can be satisfied by one of the following, in order:
1. Main memory (when the data is not present in the cache hierarchy);
2. Valid data block in cache;
3. Valid data block in coherence message;
4. Valid data block marked as Maybe_Stale;

Number 4 is not handled by the current implementation. A Maybe_Stale
block can be either truly stale or actually valid. When it is stale,
the memory read will be satisfied by either number 2 or number 3. When
it is valid, there will be no coherence message with valid data inside,
and the Maybe_Stale block will transition to a valid state after
receiving some kind of acknowledgement.

The main challenge in handling number 4 is knowing which Maybe_Stale
block the data should be read from. For instance, in a two
level cache hierarchy, we might have a block marked as Maybe_Stale in
both L1 and L2. In this case, we should prioritize the cache controller
that is closest to the CPU. To define this priority, a new virtual
function 'functionalReadPriority' was added to the AbstractController
class.
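The lookup order and the priority rule can be sketched as follows. This is a hypothetical Python model, not the AbstractController code. `read_priority` stands in for `functionalReadPriority`. In this sketch a lower value means closer to the CPU, which is an assumption about the convention.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    name: str
    read_priority: int  # hypothetical: lower value = closer to the CPU
    blocks: dict = field(default_factory=dict)  # addr -> (state, data)


def functional_read(addr, controllers, in_flight_msgs, memory):
    def state_of(ctrl):
        return ctrl.blocks.get(addr, ("Invalid", None))[0]

    # A valid copy in any cache wins.
    for ctrl in controllers:
        if state_of(ctrl) == "Valid":
            return ctrl.blocks[addr][1]
    # Next, valid data carried by an in-flight coherence message.
    for msg_addr, data in in_flight_msgs:
        if msg_addr == addr:
            return data
    # Only Maybe_Stale copies remain: read from the highest-priority one.
    stale = [c for c in controllers if state_of(c) == "Maybe_Stale"]
    if stale:
        best = min(stale, key=lambda c: c.read_priority)
        return best.blocks[addr][1]
    # Not present in the cache hierarchy: fall back to main memory.
    return memory[addr]
```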

Change-Id: I4774cd01aab7bb9ca53694cd9dc4f9416a8e4025
2024-11-18 00:22:36 -08:00
Vishnu Ramadas
d463868f28 dev-amdgpu, gpu-compute, mem-ruby: Add support for writeback L2 in GPU (#1692)
Previously, GPU L2 caches could be configured in either writeback or
writethrough mode when used in an APU. However, in a CPU+dGPU system,
only writethrough worked. This is mainly because in a CPU+dGPU system,
the CPU sends either PCI or SDMA requests to transfer data from GPU
memory to the CPU. When the L2 cache is configured as writeback, the
dirty data resides in the L2 when the CPU transfers data from GPU memory,
so a stale version is transferred. A similar issue also crops up when
the GPU command processor reads kernel information before kernel
dispatch and ends up reading incorrect data. This PR contains a set of
commits that fix both issues.
2024-11-05 10:45:46 -08:00
Matthew Poremba
2ed724b670 mem-ruby: Fix two NetDest locals using default constructor (#1746)
Two locally declared NetDest variables use the default constructor
instead of the constructor that takes a RubySystem pointer. This causes asserts
when (1) garnet is used or (2) a protocol that uses `broadcast()` is
built.

Fix these two by passing the appropriate RubySystem pointers.
2024-11-02 08:37:04 -07:00
Bobby R. Bruce
d8e7c91127 mem-ruby: Remove unused variables/mark [maybe unused] (#1650)
PR gem5#1453 left some unused variables in the Ruby code that triggered
"unused variable" warnings when compiling ALL/gem5.opt with the CHI
protocol. These have been removed.
2024-10-29 14:31:20 -07:00
Marleson Graf
7bddc764cc mem-ruby: Prevent LL/SC livelock in MESI protocols (#1384) (#1399)
Fix #1384.

MESI_Two_Level and MESI_Three_Level protocols are susceptible to LL/SC
livelocks when simulating boards with high core count.

This fix is based on MOESI_CMP_directory's implementation of locked
states, but tailors the solution to only apply it when a Load-Linked is
initiated.

There are two new states to act as locked states and stall any messages
leading to eviction:
* LLSC_E: equivalent to E state, go to E after timeout.
* LLSC_M: equivalent to M state, go to M after timeout.

The main new event is Load_Linked, which is very similar (in behavior)
to a Store, reusing several transient states. When a controller receives
the exclusive data, it differentiates a Load_Linked from a Store by
checking a new field added to the TBE: 'isLoadLinked'. It triggers a
different event when it is a Load_Linked, which in turn causes the
transition to one of the locked states.

The entire mechanism can be turned off by setting 'use_llsc_lock' to
false, and the amount of time to keep locked is defined by
'llsc_lock_timeout_latency'.
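Here is a minimal Python model of one line using the locked states. It assumes simplified state names and a caller-supplied `stable_state` (`"E"` or `"M"`) for the target state. `use_llsc_lock` and `llsc_lock_timeout_latency` mirror the parameters named above, but `LLSCLine` and its methods are hypothetical. While the line is locked, eviction requests are refused, which models stalling them.

```python
class LLSCLine:
    """Hypothetical model of one L1 line with locked LLSC_E/LLSC_M states."""

    def __init__(self, use_llsc_lock=True, llsc_lock_timeout_latency=100):
        self.state = "I"
        self.use_llsc_lock = use_llsc_lock
        self.timeout = llsc_lock_timeout_latency
        self.unlock_at = None

    def exclusive_data_arrived(self, now, stable_state, is_load_linked):
        # stable_state is "E" or "M"; a Load_Linked enters the locked variant.
        if is_load_linked and self.use_llsc_lock:
            self.state = "LLSC_" + stable_state
            self.unlock_at = now + self.timeout
        else:
            self.state = stable_state

    def tick(self, now):
        # After the timeout, a locked line reverts to its stable state.
        if self.state.startswith("LLSC_") and now >= self.unlock_at:
            self.state = self.state[len("LLSC_"):]
            self.unlock_at = None

    def eviction_request(self, now):
        """Return True if the eviction proceeds, False if it must stall."""
        self.tick(now)
        if self.state.startswith("LLSC_"):
            return False
        self.state = "I"
        return True
```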

Change-Id: I13f415b6b7890d51d01f23001047d2363467a814
2024-10-28 09:57:10 -07:00
Matthew Poremba
16217f843f mem-ruby: Fix issues in protocols due to multi-RubySystem (#1690)
Starting with https://github.com/gem5/gem5/pull/1453, some Ruby
structures require a block size to be set and others require a pointer
to the Ruby system. This fixes some cases which were not covered by the
per-checkin tests but were seen in the daily+ tests. In particular:

 - WriteMasks and PerfectCacheMemory must explicitly set a block size.
 - NetDest and RubyProxyPort require RubySystem pointer.
 - Classes inheriting Message now have a setRubySystem collecting all
   objects that need a RubySystem pointer and this should be called in
   the constructor of the Message.

This commit makes sure all of these happen. This should fix daily
arm_boot_tests and daily learning_gem5 tests.
2024-10-21 12:30:03 -07:00
Bobby R. Bruce
db47d20371 mem-ruby,misc: Remove redundant assignment (#1685)
This caused a warning to be thrown in Clang 19.
2024-10-20 13:02:53 -07:00
Matthew Poremba
4f7b3ed827 mem-ruby: Remove static methods from RubySystem (#1453)
There are several parts to this PR, all working towards #1349.

(1) Make RubySystem::getBlockSizeBytes non-static by providing ways to
access the block size or passing the block size explicitly to classes.

The main changes are:
 - DataBlocks must be explicitly allocated. A default ctor still exists
   to avoid needing to heavily modify SLICC. The size can be set using a
   realloc function, operator=, or copy ctor. This is handled completely
   transparently meaning no protocol or config changes are required.
 - WriteMask now requires block size to be set. This is also handled
   transparently by modifying the SLICC parser to identify WriteMask
   types and call setBlockSize().
 - AbstractCacheEntry and TBE classes now require block size to be set.
   This is handled transparently by modifying the SLICC parser to
   identify these classes and call initBlockSize() which calls
   setBlockSize() for any DataBlock or WriteMask.
 - All AbstractControllers now have a pointer to RubySystem. This is
   assigned in SLICC generated code and requires no changes to protocol
   or configs.
 - The Ruby Message class now requires block size in all constructors.
   This is added to the argument list automatically by the SLICC parser.
   
(2) Relax dependence on common functions in src/mem/ruby/common/Address.hh
so that RubySystem::getBlockSizeBits is no longer static. Many classes
already have a way to get the block size from the previous commit, so they
simply multiply by 8 to get the number of bits. To handle SLICC and
reduce the number of changes, define makeCacheLine, getOffset, etc. in
RubyPort and AbstractController. The only protocol changes required are
to replace any "RubySystem::foo()" calls with "m_ruby_system->foo()".
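This sketch shows address helpers that take the block size from the object instead of a static RubySystem method. `BlockAddressHelper` and its method names are hypothetical Python stand-ins for makeLineAddress and getOffset, and the helpers assume a power-of-two block size. Two helpers with different block sizes can coexist, which a static block size rules out.

```python
class BlockAddressHelper:
    """Hypothetical per-object address helpers with an instance block size."""

    def __init__(self, block_size_bytes):
        if block_size_bytes <= 0 or block_size_bytes & (block_size_bytes - 1):
            raise ValueError("block size must be a power of two")
        self.block_size_bytes = block_size_bytes

    def make_line_address(self, addr):
        # Clear the offset bits to get the block-aligned address.
        return addr & ~(self.block_size_bytes - 1)

    def get_offset(self, addr):
        # Byte offset of addr within its block.
        return addr & (self.block_size_bytes - 1)


h64 = BlockAddressHelper(64)
h128 = BlockAddressHelper(128)
print(hex(h64.make_line_address(0x12F4)), hex(h128.make_line_address(0x12F4)))
# 0x12c0 0x1280
```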

For classes which do not have a way to access the block size but
still use makeLineAddress, getOffset, etc., the block size must be
passed to that class. This requires some changes to the SimObject
interface for two commonly used classes, DirectoryMemory and
RubyPrefetcher, resulting in user-facing API changes.

User-facing API changes:
 - DirectoryMemory and RubyPrefetcher now require the cache line size as
   a non-optional argument.
 - RubySequencer SimObjects now require RubySystem as a non-optional
   argument.
 - TesterThread in the GPU ruby tester now requires the cache line size
   as a non-optional argument.

(3) Removes static member variables in RubySystem which control
randomization, cooldown, and warmup. These are mostly used by the Ruby
Network. The network classes are modified to take these former static
variables as parameters which are passed to the corresponding method
(e.g., enqueue, delayHead, etc.) rather than needing a RubySystem object
at all.

Change-Id: Ia63c2ad5cf0bf9d1cbdffba5d3a679bb4d3b1220

(4) There are two major SLICC-generated static methods: getNumControllers()
on each cache controller, which returns the number of controllers created
by the configs at run time, and the functions which access this method,
which are MachineType_base_count and MachineType_base_number. These need
to be removed to allow multiple RubySystem objects; otherwise, NetDest,
version values, and other objects are incorrect.

To remove the static requirement, MachineType_base_count and
MachineType_base_number are moved to RubySystem. Any class which needs
to call these methods must now have a pointer to a RubySystem. To enable
that, several changes are made:
 - RubyRequest and Message now require a RubySystem pointer in the
   constructor. The pointer is passed to fields in the Message class
   which require a RubySystem pointer (e.g., NetDest). SLICC is modified
   to do this automatically.
 - SLICC structures may now optionally take an "implicit constructor"
   which can be used to call a non-default constructor for locally
   defined variables (e.g., temporary variables within SLICC actions). A
   statement such as "NetDest bcast_dest;" in SLICC will implicitly
   append a call to the NetDest constructor taking RubySystem, for
   example.
 - RubySystem gets passed to Ruby network objects (Network, Topology).
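The per-system bookkeeping can be sketched in Python. The sketch assumes that a machine type's base number is the sum of the controller counts of the machine types ordered before it. `RubySystemModel` and its methods are hypothetical stand-ins for the RubySystem versions of MachineType_base_count and MachineType_base_number. Two instances can hold different counts, which a single static function could not.

```python
class RubySystemModel:
    """Hypothetical per-instance machine-type counts and base numbers."""

    def __init__(self, controller_counts):
        # Ordered mapping: machine type -> controllers created by the config.
        self.counts = dict(controller_counts)
        self.order = list(self.counts)

    def base_count(self, machine_type):
        return self.counts[machine_type]

    def base_number(self, machine_type):
        # Index of the first controller of this type: the sum of the counts
        # of every machine type ordered before it.
        idx = self.order.index(machine_type)
        return sum(self.counts[t] for t in self.order[:idx])


sys_a = RubySystemModel({"L1Cache": 4, "L2Cache": 1, "Directory": 1})
sys_b = RubySystemModel({"L1Cache": 2, "L2Cache": 2, "Directory": 1})
print(sys_a.base_number("Directory"), sys_b.base_number("Directory"))  # 5 4
```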
2024-10-08 08:14:50 -07:00
Erin (Jianghua) Le
c10feed524 tests, configs, util, mem, python, systemc: Change base 10 units to base 2 (#1605)
This commit changes metric units (e.g. kB, MB, and GB) to binary units
(KiB, MiB, GiB) in various files. This PR covers files that were missed
by a previous PR that also made these changes.
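For reference, the two unit families differ as shown below. The sketch uses the plain SI and IEC definitions, which are not necessarily how gem5's own size parser interprets each suffix. `UNITS` and `to_bytes` are illustrative names.

```python
UNITS = {
    "kB": 10**3, "MB": 10**6, "GB": 10**9,     # decimal (SI) prefixes
    "KiB": 2**10, "MiB": 2**20, "GiB": 2**30,  # binary (IEC) prefixes
}


def to_bytes(value, unit):
    return value * UNITS[unit]


print(to_bytes(32, "KiB"), to_bytes(32, "kB"))  # 32768 32000
```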
2024-10-01 11:18:05 -07:00
Jarvis Jia
9dfd66aca4 mem-ruby: Fix replacement policy in GPU_VIPER
The current GPU_VIPER protocol's TCC cache updates the MRU information
twice, by calling both a_allocateBlock and ut_updateTag, which affects the
LIP and RRIP replacement policies. Removing ut_updateTag fixes the LIP and
RRIP replacement policies.
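A small model of one set shows why a second MRU update breaks LIP. This hypothetical Python sketch assumes that allocation inserts at the LRU position and that the redundant update promotes the block to MRU. It does not reproduce the GPU_VIPER actions themselves.

```python
class LIPSet:
    """One cache set under an LRU Insertion Policy (hypothetical model)."""

    def __init__(self, ways):
        self.ways = ways
        self.stack = []  # index 0 = MRU, last element = LRU

    def allocate(self, tag):
        if len(self.stack) == self.ways:
            self.stack.pop()      # evict the LRU block
        self.stack.append(tag)    # LIP: insert new blocks at the LRU position

    def touch(self, tag):
        self.stack.remove(tag)
        self.stack.insert(0, tag)  # promote to MRU

    def fill(self, tag, extra_update):
        self.allocate(tag)
        if extra_update:
            self.touch(tag)       # the redundant second MRU update
```

Without the extra update, streaming fills only churn the LRU slot and `"A"` survives. With it, the set behaves like plain LRU and `"A"` is evicted.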

Change-Id: I79ad9392593e00425a7fe8828048465b2c2c2e1f
2024-09-14 23:22:22 -05:00
Marco Kurzynski
a8447b7fc0 arch-vega: Pass s_memtime through smem pipe (#1350)
The Vega ISA's s_memtime instruction is used to obtain a cycle value
from the GPU. Previously, this was implemented to obtain the cycle count
when the memtime instruction reached the execute stage of the GPU
pipeline. However, microbenchmarking showed that this underreports
the latency of memtime instructions relative to real hardware.
Thus, we changed its behavior to go through the scalar memory pipeline
and obtain a latency value from the SQC (L1 I$). This mirrors the
suggestion of the AMD Vega ISA manual that s_memtime should be treated
like an s_load_dwordx2.

The default latency was set based on microbenchmarking.

Change-Id: I5e251dde28c06fe1c492aea4abf9f34f05784420
2024-08-26 19:47:04 -07:00
Marleson Graf
b8001a861b mem-ruby,sim-se: Clear LL/SC locks after functional writes (#1404)
Functional writes atomically update all copies of a data block, so they
should invalidate any pending LL/SC locks, just like a conventional
write would.

Change-Id: Ic79d2d8d24901f1b6a2ce81dc0e2decc84c0ebbc
2024-08-09 09:30:37 -07:00
Jarvis Jia
341c72839b Fix hit issue
Change-Id: I28745489de693591d5ad8453b035a8c782adaf1f
2024-06-24 11:19:51 -07:00
Jarvis Jia
21b69975a6 Fix compilation error
Change-Id: I8273472b8d0cff8c02f2d1e1a9d66599af7c4866
2024-06-24 11:19:51 -07:00
Jarvis Jia
e957a882ed gpu-compute,mem-ruby: Add RubyHitMiss flag for TCP and TCC cache
Print hits and misses for the TCP and TCC caches with the RubyHitMiss debug flag

Change-Id: I40ae3449020b917f39ac91d29fa4e1dd7c791e7b
2024-06-24 11:19:51 -07:00
Bobby R. Bruce
3138c8a8b1 gpu-compute,mem-ruby: Revert "Add RubyHitMiss flag for TCP and TCC cache" (#1254)
Reverts gem5/gem5#1226
2024-06-18 07:58:54 -07:00
hahaxxz
fef6a97f93 mem-ruby: This commit fixes MI_example protocol (#1236)
fix two bugs in MI_example-dir.sm:
1. Directory cannot handle DMA_READ & DMA_WRITE events in M_DRDI state.
2. Directory cannot handle PUTX_NotOwner events in {M_DWR, M_DRD,
M_DRDI, M_DWRI} state.

Github Issue: https://github.com/gem5/gem5/issues/1210

Change-Id: I52a9d674ce0688dcfbbcc2b583f17de95afdeb87
2024-06-17 12:45:11 -07:00
Jarvis Jia
3a2bf47d57 Add default value and change Ruby address format specifier
Change-Id: I8fbaf34745e90589e610d3b9bd423937e7ebdc3d
2024-06-17 03:27:25 -05:00
Jarvis Jia
87c0d7732c Merge branch 'develop' into rubyhitmiss 2024-06-12 17:30:35 -04:00
Jarvis Jia
edfc139c40 Change black format
Change-Id: I3733b31baf187e0d3d38d971d9423a1b1afe2296

gpu-compute: add GPU RubyHitMiss for TCP and TCC

Change-Id: I4430532b901811e03d9b077b61e2eca4557b34e1

gpu-compute: Add RubyHitMiss flag for TCP and TCC cache

Change-Id: I4e5d1127c84b9eb1060ec9ba0b6638267449eda5

Remove space

Change-Id: I401f528c6f128ba0956bdbc232e8f2ae37bf648c
2024-06-12 16:04:36 -05:00
Matthew Poremba
be0a7937c1 mem-ruby: Fix deadlock in GPU_VIPER when issuing atomic requests (#1216)
When a compute unit issues several requests to the same line,
the requests wait in the L2 if it is a writeback cache. If the line is
invalid initially and the first request is atomic in nature, the L2
cache issues a request to main memory. On data return, the cache line
transitions to M but doesn't wake up the other requests, resulting in
a deadlock. This commit adds a wakeup call on data return for atomics
and fixes potential deadlocks.
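Here is a minimal Python model of the deadlock and the fix. It assumes a single line, a single outstanding memory fetch, and that non-atomic misses already wake their waiters. `StalledLine` and its methods are hypothetical. Without the wakeup, requests queued behind an atomic miss stay stalled after the data returns.

```python
class StalledLine:
    """Hypothetical model of requests to one L2 line waiting on a fetch."""

    def __init__(self, wake_on_atomic_return):
        self.state = "I"
        self.fetch_outstanding = False
        self.first = None        # request that triggered the fetch
        self.stalled = []        # requests waiting for the data
        self.completed = []
        self.wake_on_atomic_return = wake_on_atomic_return

    def request(self, kind):
        if self.fetch_outstanding:
            self.stalled.append(kind)      # wait behind the in-flight miss
        elif self.state == "I":
            self.fetch_outstanding = True  # miss: fetch from main memory
            self.first = kind
        else:
            self.completed.append(kind)    # hit in M

    def data_return(self):
        self.fetch_outstanding = False
        self.state = "M"
        self.completed.append(self.first)
        if self.first != "atomic" or self.wake_on_atomic_return:
            # Wake every stalled request so it is retried against the line.
            waiting, self.stalled = self.stalled, []
            for kind in waiting:
                self.request(kind)
```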
2024-06-12 10:10:32 -07:00
Vishnu Ramadas
42b9a9666e mem-ruby: Add instSeqNum to atomic responses from GPU L2 caches
This commit adds instSeqNum to the atomic responses in
GPU_VIPER-TCC.sm. This will be useful when debugging issues related to
GPU atomic transactions.

Change-Id: Ic05c8e1a1cb230abfca2759b51e5603304aadaa3
2024-06-11 20:35:43 -05:00
Vishnu Ramadas
943d1f1453 mem-ruby: Fix deadlock in GPU_VIPER when issuing atomic requests
When a compute unit issues several requests to the same line,
the requests wait in the L2 if it is a writeback cache. If the line is
invalid initially and the first request is atomic in nature, the L2
cache issues a request to main memory. On data return, the cache line
transitions to M but doesn't wake up the other requests, resulting in
a deadlock. This commit adds a wakeup call on data return for atomics
and fixes potential deadlocks.

Change-Id: I8200ce6e77da7c8b4db285c0cc8b8ca0dfa7d720
2024-06-11 20:33:46 -05:00
NSurawar
efbfdeabd7 mem-ruby: Reduce handshaking between CorePair and dir (#1117)
Currently, when data is downgraded by MOESI_AMD_Base-CorePair (e.g., due
to a replacement), this requires a 4-way handshake between the CorePair
and the dir. Specifically, the CorePair sends a message telling the dir
it would like to downgrade, then the dir sends an ACK back, then the
CorePair writes the data back, and finally the dir ACKs the writeback.
This is very inefficient and not representative of how modern protocols
downgrade a request. Accordingly, this commit updates the downgrade
support such that the CorePair writes back the data immediately and then
the dir ACKs it. Thus, this approach requires only a 2-way handshake.
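The two handshakes can be written out as message sequences. Each tuple is (sender, receiver, message), and the message labels are descriptive placeholders, not the protocol's actual message type names.

```python
def downgrade_messages(immediate_writeback):
    """Return the downgrade message sequence (placeholder message labels)."""
    if not immediate_writeback:
        # Original 4-way handshake.
        return [
            ("CorePair", "Dir", "downgrade request"),
            ("Dir", "CorePair", "ack"),
            ("CorePair", "Dir", "writeback data"),
            ("Dir", "CorePair", "writeback ack"),
        ]
    # New 2-way handshake: the data travels with the first message.
    return [
        ("CorePair", "Dir", "writeback data"),
        ("Dir", "CorePair", "writeback ack"),
    ]
```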

Change-Id: I7ebc85bb03e8ce46a8847e3240fc170120e9fcd6

Co-authored-by: Neeraj Surawar <neerajs@hyrule.cs.wisc.edu>
2024-05-30 09:36:29 -07:00
Matthew Poremba
e82cf20150 mem-ruby: Remove VIPER StoreThrough temp cache storage (#1156)
StoreThrough in VIPER, when the TCP is disabled or the GLC or SLC bit is
set, bypasses the TCP but temporarily allocates a cache entry, seemingly
to handle write coalescing with valid blocks. It does not attempt to
evict a block if the set is full and the address is invalid. This causes
a panic if the set is full, as there is no spare cache entry to use
temporarily for DataBlk manipulation. However, a cache block is not
required for this.

This commit removes the use of a cache block for StoreThrough with
invalid blocks, as there is no existing data to coalesce with. It creates
no-allocate variants of the actions needed in StoreThrough and pulls the
DataBlk information from the in_msg instead. Non-invalid blocks do not
have this panic as they already have a cache entry.

Fixes issues with StoreThroughs on more aggressive architectures like
MI300.

Change-Id: Id8687eccb991e967bb5292068cbe7686e0930d7d
2024-05-28 11:02:00 -07:00
Ivana Mitrovic
233135da81 mem-ruby: Fix NullPointerException in RubyRequest (#1118)
This PR includes a check for `m_pkt` being null and appropriately
handles that case. This issue was causing the Daily tests to fail.

Change-Id: I87142ca14ca4ab3d8306153a1cf34c2629a119ba
2024-05-09 08:46:13 -07:00
Giacomo Travaglini
0df5635bdf mem-ruby: Implement NS bit for CHI transactions (#1100)
This patch adds the NS bit to CHI requests to make sure they are
properly tagged according to their security state.

Change-Id: I33d3610edefbb5a05a6090e9125c35d4fb8bca58
Reviewed-by: Tiago Muck <tiago.muck@arm.com>

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-05-08 07:46:50 +02:00
Giacomo Travaglini
36c1ea9c61 mem-ruby: Implement MakeReadUnique in CHI (#1101)
Change-Id: I64cd3c62804cca184d68287fc099534e9205f2b8
Reviewed-by: Tiago Muck <tiago.muck@arm.com>

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-05-06 08:30:59 +02:00
Nicholas Mosier
66decb2e93 mem-ruby: Fix functional reads for MESI Three-Level messages (#1045)
Fix #1044. This patch adds checks for message types (PUTX_COPY, DATA,
DATA_EXCLUSIVE) that contain data blocks but were missing from the
original `functionalRead` method in MESI Three-Level messages.

Change-Id: I0cedc314166c9cc037bf20f5b7fef5552dd1253c
2024-04-25 11:14:37 -07:00
Ivana Mitrovic
42ffa52907 mem-ruby: Implement no_alloc Far Atomics in CHI (#994)
This PR introduces a missing piece of the far atomic implementation and
incorporates several changes:

- Enable 2-level and 4-level (and N-level) cache hierarchies, removing
Atomic_NoWait transactions
- Fix the Unique Near policy implementation that raised an abort
- Add support for alloc_on_atomic == False. Enables Far Atomics on
systems where the HNF does not allocate evicted lines at the LLC (as in
WriteUpdate).
2024-04-18 11:35:47 -07:00
Matthew Poremba
1d64669473 mem,gpu-compute: Implement GPU TCC directed invalidate
The GPU device currently supports large BAR which means that the driver
can write directly to GPU memory over the PCI bus without using SDMA or
PM4 packets. The gem5 PCI interface only provides an atomic interface
for BAR reads/writes, which means the values cannot go through timing
mode Ruby caches. This causes bugs as the TCC cache is allowed to keep
clean data between kernels for performance reasons. If there is a BAR
write directly to memory bypassing the cache, the value in the cache is
stale and must be invalidated.

In this commit a TCC invalidate is generated for all writes over PCI
that go directly to GPU memory. This will also invalidate TCP along the
way if necessary. This currently relies on driver synchronization,
which only allows BAR writes in between kernels. Therefore, the cache
should only be in I or V state.

To handle a race condition between invalidates and launching the next
kernel, the invalidates return a response and the GPU command processor
will wait for all TCC invalidates to be complete before launching the
next kernel.
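The launch gating can be sketched as a counter of outstanding invalidates. `CommandProcessorModel` and its methods are hypothetical Python names, not the gem5 command processor API. The sketch assumes one invalidate per BAR write.

```python
class CommandProcessorModel:
    """Hypothetical model: block kernel launch until TCC invalidates finish."""

    def __init__(self):
        self.outstanding_invalidates = 0
        self.launched = []

    def bar_write(self, addr):
        # Each BAR write that goes directly to GPU memory issues an invalidate.
        self.outstanding_invalidates += 1

    def invalidate_done(self, addr):
        if self.outstanding_invalidates == 0:
            raise RuntimeError("unexpected invalidate response")
        self.outstanding_invalidates -= 1

    def try_launch(self, kernel):
        if self.outstanding_invalidates:
            return False  # a stale line might still be cached in the TCC
        self.launched.append(kernel)
        return True
```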

This fixes issues with stale data in nanoGPT and possibly PENNANT.

Change-Id: I8e1290f842122682c271e5508a48037055bfbcdf
2024-04-10 11:35:25 -07:00
Matthew Poremba
833392e7b2 mem-ruby,gpu-compute: Allow memory reqs without inst
The GPUDynInst for sending memory requests through the CU's data port
is required but only used for DPRINTFs. Relax this constraint so that
the methods can be reused for requests such as probes generated by the
GPU device.

Change-Id: I16094e400968225596370b684d6471580888d98a
2024-04-10 11:35:24 -07:00
Víctor Soria Pardos
98358da968 mem-ruby: Implement Atomic No Alloc Policy
Add an alternative implementation of far atomics when the flag
alloc_on_atomic is false. The implementation fetches the data, performs
the atomic, and writes the cache line back to main memory.
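The fetch, operate, write-back sequence can be sketched as a single function over a dictionary that stands in for main memory. `far_atomic_no_alloc` is a hypothetical name. Returning the old value, as a fetch-and-op does, is an assumption, and no cache state is modeled.

```python
import operator


def far_atomic_no_alloc(memory, addr, op, operand):
    """Apply an atomic without allocating a cache line (hypothetical)."""
    old = memory[addr]      # 1. fetch the data
    new = op(old, operand)  # 2. perform the atomic
    memory[addr] = new      # 3. write the result back to main memory
    return old


memory = {0x40: 5}
old = far_atomic_no_alloc(memory, 0x40, operator.add, 3)
print(old, memory[0x40])  # 5 8
```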

Co-authored-by: Fabian Schätzle <f.schaetzle@fz-juelich.de>
Change-Id: I8797fbc68448e1866a292f4afeedd3613113dddd
2024-04-06 18:51:11 +02:00
Víctor Soria Pardos
5a6a3be6da mem-ruby: Fix policy_type condition in CHI
Fix if-else condition in CHI-cache-actions to correctly
support policy_type Present Near (2)

Change-Id: Ib776d847a908a8ac7693c2d10405bc0c4a9d767d
2024-04-04 10:55:56 +02:00
Víctor Soria Pardos
7ee574b309 mem-ruby: Remove AtomicReturn_NoWait from CHI
To make Atomic transactions recursive and enable 2-level configurations,
remove AtomicReturn_NoWait and other level-dependent code.

GitHub Issue: https://github.com/gem5/gem5/issues/882

Change-Id: Iac468cdb8a3b5914c8f05c5cedde866ce85f359a
2024-04-04 10:54:42 +02:00
Minje Jun
ffd0680a2c mem-ruby: Copyback UD_RU line when evicted in CHI protocol (#945)
This is a follow-up fix to #791, "mem-ruby: Fix possible dirty line loss
in CHI when ReadShared hit on UD line".
A UD_RU line may have stale data since the upstream could have updated the
line, so its local cache line data is treated as invalid
(dataValid=false). But when the line is evicted, it must be written back
to downstream because the upstream may have the line in a clean state
(UC). This change fixes it by copying back the UD_RU line while
keeping its dataValid as false.

Example error case:
- L3 was in UD_RSC and being evicted without back-invalidation. LLC (HN)
was in RU state.
- Because there's still upstream sharer, L3 sends WriteClean.
- Because the data state was unique and dirty, L3 sends CBWrData_UD_PD.
- LLC becomes UD_RU.
- When the line is evicted from LLC (LocalHN_Eviction), the line is just
dropped, causing the loss of the dirty copy.

Co-authored-by: Minje Jun <minje.jun@samsung.com>
2024-04-03 08:33:22 -07:00
Matt Sinclair
777ac91bb0 mem-ruby: Add categorization of bypassed atomics in TCC (#899)
Adds categorization of bypassed atomics in TCC to the TBE as either
return or no-return, which gets consumed in pa_performAtomic to
determine if atomic logs should be stored.

Reestablishes TCC bypassed atomics after #546.

Change-Id: Ibc1fa2b795ef1c47c3893a0b1911fa7993522d38
2024-02-28 14:26:09 -06:00
Daniel Kouchekinia
de615836f0 mem-ruby: Add categorization of bypassed atomics in TCC
Adds categorization of bypassed atomics in TCC to the TBE as either return
or no-return, which gets consumed in pa_performAtomic to determine if
atomic logs should be stored.

Reestablishes TCC bypassed atomics after #546.

Change-Id: Ibc1fa2b795ef1c47c3893a0b1911fa7993522d38
2024-02-27 23:12:45 -06:00
Daniel Kouchekinia
6374697a20 mem-ruby: Add missing transition for SLC writes to VIPER TCC
Bypassed write-through requests on invalid lines in the TCC should be
written through to the directory. This transition was previously
missing.

Change-Id: I16b117c4e085ce6be0ed5297aa0129d52cd35a51
2024-02-26 13:13:06 -06:00
Ivana Mitrovic
61ee36eee6 mem-ruby: Fix possible dirty line loss in CHI when ReadShared hit on UD line (#791)
When a ReadShared hits on a UD line and there are no sharers, this change
makes the downstream pass Dirty to the requestor whenever possible,
even though it doesn't deallocate the line. The requestor then becomes
SD and the downstream becomes UD_RSD.
In the previous implementation, a loosely exclusive intermediate cache could
cause loss of dirty data. An example error condition is as follows.
   
Configurations
L2 cache: Roughly inclusive to L1 without back-invalidation
- dealloc_on_* = false
- dealloc_backinv_* = false
L3 cache: Roughly exclusive to L2 without back-invalidation
- alloc_on_readshared = true
- alloc_on_readunique = false
- dealloc_on_shared = false
- dealloc_on_unique = true
- dealloc_backinv_* = false
- is_HN = false
LLC: Same clusivity as L3 except is_HN = true
For all caches, allow_SD = true and fwd_unique_on_readshared = false
    
Example problem sequence:
1. L1 sends ReadUnique then becomes UD. L2 is UC_RU. L3 and LLC are RU.
2. L1 evicts the line to L2 by WriteBackFull (UD_PD). L2 becomes UD.
3. L2 evicts the line to L3 using WriteBackFull (UD_PD). L3 becomes UD.
4. L1 reads the line with ReadShared which misses on L2.
5. L2 reads the line with ReadShared which hits on L3. L3 becomes UD_RSC
because it doesn't deallocate the line (dataToBeInvalid=false)
6. L3 evicts the line to LLC by WriteCleanFull (UD_PD) because L3
doesn't back-invalidate and still has sharer. The local cache line is
invalidated by Deallocate_CacheBlock. L3 becomes RUSC and LLC becomes
UD_RU.
7. When UD_RU is evicted at LLC, the UD_RU line is dropped expecting the
upstream to write back, causing loss of dirty data.
2024-02-26 10:06:17 -08:00
Vishnu Ramadas
690b2b9462 gpu-compute, mem-ruby: Add comments and reformat code
Change-Id: Id2b3886dce347fdcfcad22009a42b92febc00a6c
2024-02-09 12:17:24 -06:00