Previously, GPU L2 caches could be configured in either writeback or
writethrough mode when used in an APU. However, in a CPU+dGPU system,
only writethrough worked. This is mainly because in CPU+dGPU system, the
CPU sends either PCI or SDMA requests to transfer data from the GPU
memory to CPU. When L2 cache is configured to be writeback, the dirty
data resides in L2 when CPU transfers data from GPU memory. This leads
to the wrong version being transferred. A similar issue also crops up
when the GPU command processor reads kernel information before kernel
dispatch, only to incorrect data. This PR contains a set of commits that
fix both these issues.
Two NetDest locally declared variables are using default constructor
instead of constructor with RubySystem pointer. This will cause asserts
when (1) garnet is used or (2) a protocol that uses `broadcast()` is
built.
Fix these two by passing the appropriate RubySystem pointers.
PR gem5#1453 left some unused variables in the ruby code that triggered
"unused variable" warnings found comiling ALL/gem5.opt to use the CHI
protocol. These have been removed.
In #1453, an `implicit_ctor` option was added for SLICC structures. This
was done to allow statements such as `NetDest tmp;` which now require a
non-default constructor without modifying every protocol. The new
`implicit_ctor` option converts the statement `NetDest tmp;` in SLICC to
`NetDest tmp(<implicit_ctor>);` in C++. This is problematic when doing
something like `NetDest tmp := getMachines(...);` which gets converted
to `NetDest tmp(<implicit_ctor) = getMachines(...);` as the constructor
doesn't return an object. Before #1453 NetDest had a default constructor
so there we no difference between a local variable definition and local
variable assignment.
This commit fixes this issue by checking in the LocalVariableAST if the
local variable is part of an assignment or not. If it is not part of an
assignment, the implicit_ctor is used. Otherwise, the assignment is
printed to the generated code.
Note that this is not done anywhere in the public code but should be
allowed for folks writing their own Ruby protocols who might otherwise
be confused why a simple assignment presents a compile error.
Fix#1384.
MESI_Two_Level and MESI_Three_Level protocols are susceptible to LL/SC
livelocks when simulating boards with high core count.
This fix is based on MOESI_CMP_directory's implementation of locked
states, but tailors the solution to only apply it when a Load-Linked is
initiated.
There are two new states to act as locked states and stall any messages
leading to eviction:
* LLSC_E: equivalent to E state, go to E after timeout.
* LLSC_M: equivalent to M state, go to M after timeout.
The main new event is Load_Linked, which is very similar (in behavior)
to a Store, reusing several transient states. When a controller receives
the exclusive data, it differentiates a Load_Linked from a Store by
checking a new field added to the TBE: 'isLoadLinked'. It triggers a
different event when it is a Load_Linked, which in turn causes the
transition to one of the locked states.
The entire mechanism can be turned off by setting 'use_llsc_lock' to
false, and the amount of time to keep locked is defined by
'llsc_lock_timeout_latency'.
Change-Id: I13f415b6b7890d51d01f23001047d2363467a814
Starting with https://github.com/gem5/gem5/pull/1453 , some Ruby
structures require a block size be set
and other require a pointer to the Ruby system. This fixes some cases
which were not covered by the per-checkin tests but seen in daily+
tests. In particular:
- WriteMasks and PerfectCacheMemory must explicitly set a block size.
- NetDest and RubyProxyPort require RubySystem pointer.
- Classes inheriting Message now have a setRubySystem collecting all
objects that need a RubySystem pointer and this should be called in
the constructor of the Message.
This commit makes sure all of these happen. This should fix daily
arm_boot_tests and daily learning_gem5 tests.
This PR adds the SMS prefetcher described in [this
](https://web.eecs.umich.edu/~twenisch/papers/isca06.pdf) paper.
This work was done in collaboration with @Setu-Gupta, and @xmlizhao
On branch sms
Changes to be committed:
modified: src/mem/cache/prefetch/Prefetcher.py
modified: src/mem/cache/prefetch/SConscript
new file: src/mem/cache/prefetch/sms.cc
new file: src/mem/cache/prefetch/sms.hh
Change-Id: I68d3bb6cf07385177d0f776fb958f652cfc41489
- This change updates syntax of constructors of Template Classes from
`class<T>()` to `class()`
- Initializes coherence to 0 in `src/mem/cache_blk.hh`
The above changes are made to solve the errors when compiling gem5 in
gcc 14
There are several parts to this PR to work towards #1349 .
(1) Make RubySystem::getBlockSizeBytes non-static by providing ways to
access the block size or passing the block size explicitly to classes.
The main changes are:
- DataBlocks must be explicitly allocated. A default ctor still exists
to avoid needing to heavily modify SLICC. The size can be set using a
realloc function, operator=, or copy ctor. This is handled completely
transparently meaning no protocol or config changes are required.
- WriteMask now requires block size to be set. This is also handled
transparently by modifying the SLICC parser to identify WriteMask
types and call setBlockSize().
- AbstractCacheEntry and TBE classes now require block size to be set.
This is handled transparently by modifying the SLICC parser to
identify these classes and call initBlockSize() which calls
setBlockSize() for any DataBlock or WriteMask.
- All AbstractControllers now have a pointer to RubySystem. This is
assigned in SLICC generated code and requires no changes to protocol
or configs.
- The Ruby Message class now requires block size in all constructors.
This is added to the argument list automatically by the SLICC parser.
(2) Relax dependence on common functions in
src/mem/ruby/common/Address.hh
so that RubySystem::getBlockSizeBits is no longer static. Many classes
already have a way to get block size from the previous commit, so they
simply multiple by 8 to get the number of bits. For handling SLICC and
reducing the number of changes, define makeCacheLine, getOffset, etc. in
RubyPort and AbstractController. The only protocol changes required are
to change any "RubySystem::foo()" calls with "m_ruby_system->foo()".
For classes which do not have a way to get access to block size but
still used makeLineAddress, getOffset, etc., the block size must be
passed to that class. This requires some changes to the SimObject
interface for two commonly used classes: DirectoryMemory and
RubyPrefecther, resulting in user-facing API changes
User-facing API changes:
- DirectoryMemory and RubyPrefetcher now require the cache line size as
a non-optional argument.
- RubySequencer SimObjects now require RubySystem as a non-optional
argument.
- TesterThread in the GPU ruby tester now requires the cache line size
as a non-optional argument.
(3) Removes static member variables in RubySystem which control
randomization, cooldown, and warmup. These are mostly used by the Ruby
Network. The network classes are modified to take these former static
variables as parameters which are passed to the corresponding method
(e.g., enqueue, delayHead, etc.) rather than needing a RubySystem object
at all.
Change-Id: Ia63c2ad5cf0bf9d1cbdffba5d3a679bb4d3b1220
(4) There are two major SLICC generated static methods:
getNumControllers()
on each cache controller which returns the number of controllers created
by the configs at run time and the functions which access this method,
which are MachineType_base_count and MachineType_base_number. These need
to be removed to create multiple RubySystem objects otherwise NetDest,
version value, and other objects are incorrect.
To remove the static requirement, MachineType_base_count and
MachineType_base_number are moved to RubySystem. Any class which needs
to call these methods must now have a pointer to a RubySystem. To enable
that, several changes are made:
- RubyRequest and Message now require a RubySystem pointer in the
constructor. The pointer is passed to fields in the Message class
which require a RubySystem pointer (e.g., NetDest). SLICC is modified
to do this automatically.
- SLICC structures may now optionally take an "implicit constructor"
which can be used to call a non-default constructor for locally
defined variables (e.g., temporary variables within SLICC actions). A
statement such as "NetDest bcast_dest;" in SLICC will implicitly
append a call to the NetDest constructor taking RubySystem, for
example.
- RubySystem gets passed to Ruby network objects (Network, Topology).
This commit changes metric units (e.g. kB, MB, and GB) to binary units
(KiB, MiB, GiB) in various files. This PR covers files that were missed
by a previous PR that also made these changes.
This will allow gem5 to configure the maximum capacity of a
partition dynamically during simulation, rather than
having it statically defined at construction time
Change-Id: Ib55c9990a6bc2930abaf2438c13337acc643520f
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
In this way we actually need to store one unsigned integer instead of
two. We also won't need to recompute the total number of cache blocks
whenever we will adapt this policy to be dynamically modified
Change-Id: Ia8cf906539d1891b6cdb821f2a74628127dc68c6
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This PR is adjusting the constructor to relax template
requirements. In this way child classes are free to provide
their own way of calculating the number of entries and the
shifting required to extract the set
Why do we need this?
Up to this patch we have been configuring the indexing policy
by setting up the cache/table size (in bytes) and the entry size.
Those parameters make a lot of sense in caching structures
where:
a) We want to configure the caching structure using
the amount of storage (in bytes) provided (e.g. 4kB of Cache)
b) the content of a single entry is addressable therefore
we need the entry size to know how many bits in the indexing
process we need to shift to extract the set
In those cases the number of cache entries is derived from the formula
num_entries = size / entry_size
The adoption of the IndexingPolicy for different kinds
of caching structures (e.g. prefetcher tables) make this
way of configuring the IP a bit quirky.
For some tables directly setting the number of entries is a far more
intuitive way of configuring the IP, instead of allocating the desired
number of entries by working things out with the formula above
This PR changes memory and cache sizes in various parts of the gem5
codebase to use binary units (e.g. KiB) instead of metric units (e.g.
kB). This makes the codebase more consistent, as gem5 automatically
converts memory and cache sizes that are in metric units to binary
units.
This PR also adds a warning message to let users know when an
auto-conversion from base 10 to base 2 units occurs.
There were a few places in configs and in the comments of various files
where I didn't change the metric units, as I couldn't figure out where
the parameters with those units were being used.
The current GPU_VIPER protocol's TCC cache update the MRU information
twice with calling a_allocateBlock and ut_updateTag which affectgs the
LIP and RRIP replacement polies. Remove ut_updateTag fixes the LIP and
RRIP replacement polies.
Change-Id: I79ad9392593e00425a7fe8828048465b2c2c2e1f
This commit is adjusting the constructor to relax template
requirements. In this way child classes are free to provide
their own way of calculating the number of entries and the
shifting required to extract the set
Why do we need this?
Up to this patch we have been configuring the indexing policy
by setting up the cache/table size (in bytes) and the entry size.
Those parameters make a lot of sense in caching structures
where:
a) We want to configure the caching structure using
the amount of storage (in bytes) provided (e.g. 4kB of Cache)
b) the content of a single entry is addressable therefore
we need the entry size to know how many bits in the indexing
process we need to shift to extract the set
In those cases the number of cache entries is derived from the formula
num_entries = size / entry_size
The adoption of the IndexingPolicy for different kinds
of caching structures (e.g. prefetcher tables) make this
way of configuring the IP a bit quirky.
For some tables directly setting the number of entries is a far more
intuitive way of configuring the IP, instead of allocating the desired
number of entries by working things out with the formula above
Change-Id: Ic7994c129196d6ba83dc99ce397ad43393d35252
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit refactors the error message added to convert.py.
A mapping between the base 10 and base 2 suffix magnitudes
(e.g. k: ki, M: Mi, etc.) and a new function that extracts the
magnitude and numerical value have been added. Also, a warning
message has been added to the toMemoryBandwidth function in
addition to the one in toMemorySize.
Change-Id: I3ae157d13c7089d38a34a6e4c35a2b58978106d0
The Vega ISA's s_memtime instruction is used to obtain a cycle value
from the GPU. Previously, this was implemented to obtain the cycle count
when the memtime instruction reached the execute stage of the GPU
pipeline. However, from microbenchmarking we have found that this under
reports the latency for memtime instructions relative to real hardware.
Thus, we changed its behavior to go through the scalar memory pipeline
and obtain a latency value from the the SQC (L1 I$). This mirrors the
suggestion of the AMD Vega ISA manual that s_memtime should be treated
like a s_load_dwordx2.
The default latency was set based on microbenchmarking.
Change-Id: I5e251dde28c06fe1c492aea4abf9f34f05784420
This commit contains the rest of the base 2 vs base 10 cache/memory
size clarifications. It also changes the warning message to use
warn(). With these changes, the warning message should now no
longer show up during a fresh compilation of gem5.
Change-Id: Ia63f841bdf045b76473437f41548fab27dc19631
We don't store a pointer to the indexing policy anymore.
Instead, we register a tag extractor callback when we
construct the TaggedEntry
Change-Id: I79dbc1bc5c5ce90d350e83451f513c05da9f0d61
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
We don't store a pointer to the indexing policy anymore.
Instead, we register a tag extractor callback when we
construct the CacheEntry
Change-Id: I06dc58e2f67e01f3f9bcd9f0c641505d3aec82ff
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
As detailed by a previous commit, AssociativeSet is not needed anymore.
The class is effectively the same as AssociativeCache
Change-Id: I24bfb98fbf0826c0a2ea6ede585576286f093318
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The only difference between the TaggedEntry and the newly defined
CacheEntry is the presence of the secure flag in the first case. The
need to tag a cache entry according to the security bit required the
overloading of the matching methods in the TaggedEntry class to take
security into account (See matchTag [1]), and the persistance after
PR #745 of the AssociativeSet class which is basically identical
to its AssociativeCache superclass, only it overrides its virtual
method to match the tag according to the secure bit as well.
The introduction of the KeyType parameter in the previous commit
will smoothe the differences and help unifying the interface.
Rather than overloading and overriding to account for a different
signature, we embody the difference in the KeyType class. A
CacheEntry will match with KeyType = Addr,
whereas a TaggedEntry will use the following lookup type proposed in this
patch:
struct KeyType {
Addr address;
bool secure;
}
This patch is partly reverting the changes in #745 which were
reimplementing TaggedEntry on top of the CacheEntry. Instead
we keep them separate as the plan is to allow different
entry types with templatization rather than polymorphism.
As a final note, I believe a separate commit will have to
change the naming of our entries; the CacheEntry should
probably be renamed into TaggedEntry and the current TaggedEntry
into something that reflect the presence of the security bit
alongside the traditional address tag
[1]: https://github.com/gem5/gem5/blob/stable/\
src/mem/cache/tags/tagged_entry.hh#L81
Change-Id: Ifc104c8d0c1d64509f612d87b80d442e0764f7ca
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The KeyType data type is the type of the lookup and the cache extracts
it from the Entry template parameter
Change-Id: I147d7c2503abc11becfeebe6336e7f90989ad4e8
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Exposing the tag of a cache entry through the associative
cache APIs makes it hard to generalize the cache for
structured tags. Ultimately the tag should be a property
of the cache entry and any tag extraction logic (if needed)
should reside there. In this we can reuse the associative
cache for different Entry params, each one bearing a different
representation of a tag
Change-Id: I51b4526be64683614e01d763b1656e5be23a611b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Some compilers (gcc version 12.3.0) will start complaining when perfect
forwarding the StrideEntry argument constructed with an extra parameter
(see later patches).
Using a pointer seems to fix the gcc bug.
The commit is also changing the signature of findTable and allocateContext
so that a reference rather than a pointer is return. In this way we don't
deal with the hack of returning a raw ptr from a unique_ptr
Change-Id: Idd451208aae80bbfae76110c859e93084bcb2635
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The code is already assuming a fully associative cache. Rather than
calling getPossibleEntries with a random value and therefore needlessly
passing a vector of pointers, we use the AssociativeCache iterator to
loop over the cache entries
Change-Id: Ic99cbd39ee9f12eef9091d9d62ca24d0c3e61300
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This PR fixes the issues with the implementation of the Best Offset
Prefetcher described in issue #1402
On branch bop
Changes to be committed:
modified: src/mem/cache/prefetch/bop.cc
modified: src/mem/cache/prefetch/bop.hh
---------
Co-authored-by: Setu Gupta <setu.gupta.2020@gamil.com>
Co-authored-by: Abhishek Shailendra Singh <abs218@leigh.edu>
Co-authored-by: Setu Gupta <setu.gupta@partner.samsung.com>
This PR fixes the issues mentioned in #1448.
**Note that this contribution is the result of a joint collaboration
with @AbhishekUoR**
This PR introduces the following 4 changes:
1. It changes the addresses which are used to compute the stride to
cache line aligned addresses (the current version uses word aligned
addresses)
2. It correctly returns if the stride does not match (as opposed to
issuing prefetches using the new stride incorrectly)
3. It returns if the new stride is 0, indicating multiple reads from the
same cache line.
4. It removes code which is no longer necessary after the addition of
changes number 1 and 3.
Change-Id: Ic346d0e15df6d07e2b93289c8d6b89b4c2f45a34
---------
Co-authored-by: Abhishek Shailendra Singh <abs218@leigh.edu>
Functional writes atomically update all copies of a data block, so they
should invalidate any pending LL/SC locks, just like a conventional
write would.
Change-Id: Ic79d2d8d24901f1b6a2ce81dc0e2decc84c0ebbc
This commit removes some comments and adds constexpr in front
of "bool is_secure..." in pif.cc, signature_path.cc, and
signature_path_v2.cc
Change-Id: Icafe1d7c97d1d3fbf6abc12ba87ebb596255b96f
This modifies the crash fix so that the function calls that were
modified use a local variables called `is_secure` instead of a
hardcoded `false`. Some of these existed previously so it made
more sense to use them, while others were newly added in to mark
where the code might need to be changed later.
Change-Id: I0c0d14b74f0ccf70ee5fe7c8b01ed0266353b3c1
This commit fixes the "Need is_secure arg" crash that occurs when
using the IndirectMemoryPrefetcher, SignaturePathPrefetcher,
SignaturePathPrefetcherV2, STeMSPrefetcher, and PIFPrefetcher. This
was done by changing some variables to be AssociativeSet<...>
instead of AssociativeCache<...> and changing the affected function
calls.
Change-Id: I61808c877514efeb73ad041de273ae386711acae
At the moment the method simply returns the rank number. This is not
particularly useful when enabling debug flags as the beginning of the
line prints something like:
1: <debug_message>
whereas it should really be:
system.dram.rank1: <debug_message>
Change-Id: I0136dc3d182afa4ae2e5a719cb366d8d0f444667
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This changes `long`s in src/mem/physical.cc, which are 32 bits or more,
to `uint64_t`s, which are exactly 64 bits.
Change-Id: I64e089a2ac087bcf58b9c3c918c59dc5ff75d010
I noticed while using the stable branch that there were a few typos of
the word 'cache' and so I've corrected a few files where I found such
typos.
Change-Id: I7c7f64812039f34fe39d0c45c4f5ce921cba06d0