Starting with https://github.com/gem5/gem5/pull/1453 , some Ruby
structures require a block size be set
and other require a pointer to the Ruby system. This fixes some cases
which were not covered by the per-checkin tests but seen in daily+
tests. In particular:
- WriteMasks and PerfectCacheMemory must explicitly set a block size.
- NetDest and RubyProxyPort require RubySystem pointer.
- Classes inheriting Message now have a setRubySystem collecting all
objects that need a RubySystem pointer and this should be called in
the constructor of the Message.
This commit makes sure all of these happen. This should fix daily
arm_boot_tests and daily learning_gem5 tests.
There are several parts to this PR to work towards #1349 .
(1) Make RubySystem::getBlockSizeBytes non-static by providing ways to
access the block size or passing the block size explicitly to classes.
The main changes are:
- DataBlocks must be explicitly allocated. A default ctor still exists
to avoid needing to heavily modify SLICC. The size can be set using a
realloc function, operator=, or copy ctor. This is handled completely
transparently meaning no protocol or config changes are required.
- WriteMask now requires block size to be set. This is also handled
transparently by modifying the SLICC parser to identify WriteMask
types and call setBlockSize().
- AbstractCacheEntry and TBE classes now require block size to be set.
This is handled transparently by modifying the SLICC parser to
identify these classes and call initBlockSize() which calls
setBlockSize() for any DataBlock or WriteMask.
- All AbstractControllers now have a pointer to RubySystem. This is
assigned in SLICC generated code and requires no changes to protocol
or configs.
- The Ruby Message class now requires block size in all constructors.
This is added to the argument list automatically by the SLICC parser.
(2) Relax dependence on common functions in
src/mem/ruby/common/Address.hh
so that RubySystem::getBlockSizeBits is no longer static. Many classes
already have a way to get block size from the previous commit, so they
simply multiple by 8 to get the number of bits. For handling SLICC and
reducing the number of changes, define makeCacheLine, getOffset, etc. in
RubyPort and AbstractController. The only protocol changes required are
to change any "RubySystem::foo()" calls with "m_ruby_system->foo()".
For classes which do not have a way to get access to block size but
still used makeLineAddress, getOffset, etc., the block size must be
passed to that class. This requires some changes to the SimObject
interface for two commonly used classes: DirectoryMemory and
RubyPrefecther, resulting in user-facing API changes
User-facing API changes:
- DirectoryMemory and RubyPrefetcher now require the cache line size as
a non-optional argument.
- RubySequencer SimObjects now require RubySystem as a non-optional
argument.
- TesterThread in the GPU ruby tester now requires the cache line size
as a non-optional argument.
(3) Removes static member variables in RubySystem which control
randomization, cooldown, and warmup. These are mostly used by the Ruby
Network. The network classes are modified to take these former static
variables as parameters which are passed to the corresponding method
(e.g., enqueue, delayHead, etc.) rather than needing a RubySystem object
at all.
Change-Id: Ia63c2ad5cf0bf9d1cbdffba5d3a679bb4d3b1220
(4) There are two major SLICC generated static methods:
getNumControllers()
on each cache controller which returns the number of controllers created
by the configs at run time and the functions which access this method,
which are MachineType_base_count and MachineType_base_number. These need
to be removed to create multiple RubySystem objects otherwise NetDest,
version value, and other objects are incorrect.
To remove the static requirement, MachineType_base_count and
MachineType_base_number are moved to RubySystem. Any class which needs
to call these methods must now have a pointer to a RubySystem. To enable
that, several changes are made:
- RubyRequest and Message now require a RubySystem pointer in the
constructor. The pointer is passed to fields in the Message class
which require a RubySystem pointer (e.g., NetDest). SLICC is modified
to do this automatically.
- SLICC structures may now optionally take an "implicit constructor"
which can be used to call a non-default constructor for locally
defined variables (e.g., temporary variables within SLICC actions). A
statement such as "NetDest bcast_dest;" in SLICC will implicitly
append a call to the NetDest constructor taking RubySystem, for
example.
- RubySystem gets passed to Ruby network objects (Network, Topology).
This commit changes metric units (e.g. kB, MB, and GB) to binary units
(KiB, MiB, GiB) in various files. This PR covers files that were missed
by a previous PR that also made these changes.
The current GPU_VIPER protocol's TCC cache update the MRU information
twice with calling a_allocateBlock and ut_updateTag which affectgs the
LIP and RRIP replacement polies. Remove ut_updateTag fixes the LIP and
RRIP replacement polies.
Change-Id: I79ad9392593e00425a7fe8828048465b2c2c2e1f
The Vega ISA's s_memtime instruction is used to obtain a cycle value
from the GPU. Previously, this was implemented to obtain the cycle count
when the memtime instruction reached the execute stage of the GPU
pipeline. However, from microbenchmarking we have found that this under
reports the latency for memtime instructions relative to real hardware.
Thus, we changed its behavior to go through the scalar memory pipeline
and obtain a latency value from the the SQC (L1 I$). This mirrors the
suggestion of the AMD Vega ISA manual that s_memtime should be treated
like a s_load_dwordx2.
The default latency was set based on microbenchmarking.
Change-Id: I5e251dde28c06fe1c492aea4abf9f34f05784420
Functional writes atomically update all copies of a data block, so they
should invalidate any pending LL/SC locks, just like a conventional
write would.
Change-Id: Ic79d2d8d24901f1b6a2ce81dc0e2decc84c0ebbc
Change-Id: I3733b31baf187e0d3d38d971d9423a1b1afe2296
gpu-compute: add GPU RubyHitMiss for TCP and TCC
Change-Id: I4430532b901811e03d9b077b61e2eca4557b34e1
gpu-compute: Add RubyHitMiss flag for TCP and TCC cache
Change-Id: I4e5d1127c84b9eb1060ec9ba0b6638267449eda5
gpu-compute: Add RubyHitMiss flag for TCP and TCC cache
Change-Id: I4e5d1127c84b9eb1060ec9ba0b6638267449eda5
Remove space
Change-Id: I401f528c6f128ba0956bdbc232e8f2ae37bf648c
When a compute unit issues several requests to the same line,
the requests wait in the L2 if it is a writeback cache. If the line is
invalid initially and the first request is atomic in nature, the L2
cache issues a request to main memory. On data return, the cache line
transitions to M but doesn't wake up the other requests, resulting in
a deadlock. This commit adds a wakeup call on data return for atomics
and fixes potential deadlocks.
This commit adds instSeqNum to the atomic responses in
GPU_VIPER-TCC.sm. This will be useful when debugging issues related to
GPU atomic transactions
Change-Id: Ic05c8e1a1cb230abfca2759b51e5603304aadaa3
When a compute unit issues several requests to the same line,
the requests wait in the L2 if it is a writeback cache. If the line is
invalid initially and the first request is atomic in nature, the L2
cache issues a request to main memory. On data return, the cache line
transitions to M but doesn't wake up the other requests, resulting in
a deadlock. This commit adds a wakeup call on data return for atomics
and fixes potential deadlocks.
Change-Id: I8200ce6e77da7c8b4db285c0cc8b8ca0dfa7d720
Currently when data is downgraded by MOESI_AMD_Base-CorePair (e.g. due
to a replacement) this requires a 4-way handshake between the CorePair
and the dir. Specifically, the CorePair send a message telling the dir
it'd like to downgrade then, the dir sends an ACK back and then, the
CorePair writes the data back, and finally, the dir ACKs the writeback.
This is very inefficient and not representative of how modern protocols
downgrade a request. Accordingly, this commits updates the downgrade
support such that the CorePair writes back the data immediately and then
the dir ACKs it.
Thus, this approach requires only a 2-way handshake.
Change-Id: I7ebc85bb03e8ce46a8847e3240fc170120e9fcd6
Co-authored-by: Neeraj Surawar <neerajs@hyrule.cs.wisc.edu>
StoreThrough in VIPER when the TCP is disabled, GLC bit is set, or SLC
bit is set will bypass the TCP, but will temporarily allocate a cache
entry seemingly to handle write coalescing with valid blocks. It does
not attempt to evict a block if the set is full and the address is
invalid. This causes a panic if the set is full as there is no spare
cache entry to use temporarily to use for DataBlk manipulation. However,
a cache block is not required for this.
This commit removes using a cache block for StoreThrough with invalid
blocks as there is no existing data to coalesce with. It creates no
allocate variants of the actions needed in StoreThrough and pulls the
DataBlk information from the in_msg instead. Non-invalid blocks do not
have this panic as they have a cache entry already.
Fixes issues with StoreThroughs on more aggressive architectures like
MI300.
Change-Id: Id8687eccb991e967bb5292068cbe7686e0930d7d
This PR includes a check for `m_pkt` being null and appropriately
handles that case. This issue was causing the Daily tests to fail.
Change-Id: I87142ca14ca4ab3d8306153a1cf34c2629a119ba
This patch is adding the NS bit to CHI requests to make sure they are
properly tagged according to their security
Change-Id: I33d3610edefbb5a05a6090e9125c35d4fb8bca58
Reviewed-by: Tiago Muck <tiago.muck@arm.com>
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Fix#1044. This patch adds checks for message types (PUTX_COPY, DATA,
DATA_EXCLUSIVE) that contain data blocks but were missing from the
original `functionalRead` method in MESI Three-Level messages.
Change-Id: I0cedc314166c9cc037bf20f5b7fef5552dd1253c
This PR introduces a missing pice of far atomic implementation. This
pull request incorporates several changes:
- Enable 2-level and 4-level (and N-level) cache hierarchies, removing
Atomic_NoWait transactions
- Fix Unique Near policy implementation that raised abort
- Add support for alloc_on_atomic == False. Enables Far Atomics on
systems where the HNF does not allocate evicted lines at LLC (Like in
WriteUpdate).
The GPU device currently supports large BAR which means that the driver
can write directly to GPU memory over the PCI bus without using SDMA or
PM4 packets. The gem5 PCI interface only provides an atomic interface
for BAR reads/writes, which means the values cannot go through timing
mode Ruby caches. This causes bugs as the TCC cache is allowed to keep
clean data between kernels for performance reasons. If there is a BAR
write directly to memory bypassing the cache, the value in the cache is
stale and must be invalidated.
In this commit a TCC invalidate is generated for all writes over PCI
that go directly to GPU memory. This will also invalidate TCP along the
way if necessary. This currently relies on the driver synchonization
which only allows BAR writes in between kernels. Therefore, the cache
should only be in I or V state.
To handle a race condition between invalidates and launching the next
kernel, the invalidates return a response and the GPU command processor
will wait for all TCC invalidates to be complete before launching the
next kernel.
This fixes issues with stale data in nanoGPT and possibly PENNANT.
Change-Id: I8e1290f842122682c271e5508a48037055bfbcdf
The GPUDynInst for sending memory requests through the CUs data port
is required but only used for DPRINTFs. Relax this constraint so that
the methods can be reused for requests such as probes generated by the
GPU device.
Change-Id: I16094e400968225596370b684d6471580888d98a
Add alternative implementation to far atomics when the flag alloc_on_commit
is false. The implementation fetches the data, performs the atomic and
writes back the cache line to main memory.
Co-authored-by: Fabian Schätzle <f.schaetzle@fz-juelich.de>
Change-Id: I8797fbc68448e1866a292f4afeedd3613113dddd
To make Atomic transaction recursive and enable 2-level config,
remove AtomicReturn_NoWait and other level-dependent code
GitHub Issue: https://github.com/gem5/gem5/issues/882
Change-Id: Iac468cdb8a3b5914c8f05c5cedde866ce85f359a
This is a followed up fix to #791 mem-ruby: Fix possible dirty line loss
in CHI when ReadShared hit on UD line.
UD_RU line may have stale data since the upstream could have updated the
line, so its local cache line data is treated as invalid
(dataValid=false). But when the line is evicted, it must be written back
to downstream because the upstream may have the line in clean state
(UC). This change fixes it by performing copy back the UD_RU line while
keeping its dataValid as false.
Example error case:
- L3 was in UD_RSC and being evicted without back-invalidation. LLC (HN)
was in RU state.
- Because there's still upstream sharer, L3 sends WriteClean.
- Because the data state was unique and dirty, L3 sends CBWrData_UD_PD.
- LLC becomes UD_RU.
- When the line is evicted from LLC (LocalHN_Eviction), the line is just
dropped, causing the loss of the dirty copy
Co-authored-by: Minje Jun <minje.jun@samsung.com>
Adds categorization of bypassed atomics in TCC to the TBE as either
return or no-return, which gets consumed in pa_performAtomic to
determine if atomic logs should be stored.
Reestablishes TCC bypassed atomics after #546.
Change-Id: Ibc1fa2b795ef1c47c3893a0b1911fa7993522d38
Adds categorization of bypassed atomics in TCC to the TBE as either return
or no-return, which gets consumed in pa_performAtomic to determine if
atomic logs should be stored.
Reestablishes TCC bypassed atomics after #546.
Change-Id: Ibc1fa2b795ef1c47c3893a0b1911fa7993522d38
Bypassed write though requests on invalid lines in the TCC should be
written though to the directory. This transition was previously
missing.
Change-Id: I16b117c4e085ce6be0ed5297aa0129d52cd35a51
In case ReadShared hit on a UD line and there's no sharers, this chage
makes the downstream passes Dirty to the requestor whenever possible
even though it doesn't deallocate the line. This will make the requestor
to SD and the downstream to UD_RSD.
In the previous implementation, loosely exclusive intermediate cache can
cause loss of dirty data. Example error condition is as below.
Configurations
L2 cache: Roughly inclusive to L1 without back-invalidation
- dealloc_on_* = false
- dealloc_backinv_* = false
L3 cache: Roughly exclusive to L2 without back-invalidation
- alloc_on_readshared = tue
- alloc_on_readunique = false
- dealloc_on_shared = false
- dealloc_on_unique = true
- dealloc_backinv_* = false
- is_HN = false
LLC: Same clusivity as L3 except is_HN = true
For all caches, allow_SD = true and fwd_unique_on_readshared = false
Example problem sequence:
1. L1 sends ReadUnique then becomes UD. L2 is UC_RU. L3 and LLC are RU.
2. L1 evicts the line to L2 by WriteBackFull (UD_PD). L2 becomes UD.
3. L2 evicts the line to L3 using WriteBackFull (UD_PD). L3 becomes UD.
4. L1 reads the line with ReadShared which misses on L2.
5. L2 reads the line with ReadShared which hits on L3. L3 becomes UD_RSC
because it doesn't deallocate the line (dataToBeInvalid=false)
6. L3 evicts the line to LLC by WriteCleanFull (UD_PD) because L3
doesn't back-invalidate and still has sharer. The local cache line is
invalidated by Deallocate_CacheBlock. L3 becomes RUSC and LLC becomes
UD_RU.
7. When UD_RU is evicted at LLC, the UD_RU line is dropped expecting the
upstream to writeback, causing loss of dirty data
This commit adds support for cache invalidation in GPU VIPER protocol's
SQC cache. To support this, the commit also adds L1 cache invalidation
framework in the Sequencer such that the Sequencer sends out an
invalidation request for each line in the cache and declares completion
once all lines are evicted.
Change-Id: I2f52eacabb2412b16f467f994e985c378230f841
This commit allows CompData_SD be sent when ReadShared hits on UD line and
the local cache keeps the line, unless the request doesn't allow SD.
Change-Id: I337f24c871cc4c19c5b5fb11f9b35c0a8eb7911c
In case ReadShared hit on a UD line and there's no sharers, this chage
makes the downstream respond with Unique even though it doesn't deallocate
the line. This will make the requestor to UD and the downstream to UD_RU.
In the previous implementation, loosely exclusive intermediate cache can
cause loss of dirty data. Example sequence is as below.
Configurations
L2 cache: Roughly inclusive to L1 without back-invalidation
- dealloc_on_* = false
- dealloc_backinv_* = false
L3 cache: Roughly exclusive to L2 without back-invalidation
- alloc_on_readshared = tue
- alloc_on_readunique = false
- dealloc_on_shared = false
- dealloc_on_unique = true
- dealloc_backinv_* = false
- is_HN = false
LLC: Same clusivity as L3 except is_HN = true
For all caches, allow_SD = true and fwd_unique_on_readshared = false
Example problem sequence:
1. L1 sends ReadUnique then becomes UD. L2 is UC_RU. L3 and LLC are RU.
2. L1 evicts the line to L2 by WriteBackFull (UD_PD). L2 becomes UD.
3. L2 evicts the line to L3 using WriteBackFull (UD_PD). L3 becomes UD.
4. L1 reads the line with ReadShared which misses on L2.
5. L2 reads the line with ReadShared which hits on L3. L3 becomes UD_RSC
because it doesn't deallocate the line (dataToBeInvalid=false)
6. L3 evicts the line to LLC by WriteCleanFull (UD_PD) because L3 doesn't
back-invalidate and still has sharer. The local cache line is
invalidated by Deallocate_CacheBlock.
L3 becomes RUSC and LLC becomes UD_RU.
7. When UD_RU is evicted at LLC, the UD_RU line is dropped expecting the
upstream to writeback, causing loss of dirty data.
Change-Id: Ic9bee27f2ec8906dd5df8bd3be60e5a9a76c782f
In Ruby CHI protocol UD_RU state means the line is in UD state in
the local cache and the upstream may have it in UD or UC state.
In the previous implementation UD_RU line was just dropped without
WriteBack which can cause loss of dirty data when the upstream has it
in UC state.
This commit fixes it by performing WriteBack when evciting UD_RU line.
Change-Id: I1db9b4f95cc576e71dcef38b01de24775df514ba
Related to issue #703 , this PR removes GCN3 related files and updates
source code, documentation, and tests to switch over to Vega is that was
not done already. Highlights are:
- Remove all src/arch/amdgpu/gcn3 files and update Kconfigs.
- Remove references to GCN3 and replace with Vega where applicable.
- Update the build targets in the gcn-gpu Docker. This will need to be
rebuilt but not urgently.
- Remove the GCN3 tag in testlib. Most tests seem to be using Vega
already, so that commit is small.
Replace instances of "GCN3" with Vega. Remove gfx801 and gfx803. Rename
FIJI to Vega and Carrizo to Raven.
Using misc since there is not enough room to fit all the tags.
Change-Id: Ibafc939d49a69be9068107a906e878408c7a5891
When restoring the simulate_limit_event pointer is not
restored after running the dry simulation run which ends up in
"Panic: event not found!"
In this commit we fix this issue by correctly restoring
the pointer value along with the event queue head
Change-Id: Id5ad4d2a270a6cd34eec1dc5c9b170b2b84610d4
---------
Co-authored-by: narya <nitish.arya@bsc.es>
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
The WriteUniqueZero is an immediate write to a Snoopable address region
that does not require any data transfer (cacheline is zeroed)
Change-Id: Ia8c9b40e08a3b7d613f0b62ce0ac4b0547860871
Reviewed-by: Tiago Muck <tiago.muck@arm.com>
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Stash requests will simply be discarded by the Home Node This will
return a CompI response to the RNF
Change-Id: I9c2ce5d4d42f380d1a554933d381cf8a8590ba22
Reviewed-by: Tiago Muck <tiago.muck@arm.com>
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
clang correctly found that the functions `inCache`, `hasBeenPrefetched`
and `inMissQueue` had the wrong signatures in the DVM funcs files. These
functions are unused, so this change just updates their signatures.
Change-Id: Id669ff661e1c6c46eaf04ea1f17cd9866a9e49ed
Update the RubyCache debug flag prints in CacheMemory to be more
descriptive and make clearer what is happening in a given function.
This makes it easier to determine what is happening when looking at the
RubyCache debug flags prints.
Change-Id: Ieee172b6df0d100f4b1e8fe4bba872fc9cf65854