This commit is building over the CHI-TLM wrapping introduced
by the previous commit and it is adding a CHI traffic generator
as a SimObject.
This will get the python objects as input and it will forward
them to the TlmController to convert them into ruby CHI
messages
Change-Id: Ia67094c9bb880e37b24184313df546ecbaa3289f
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit is wrapping the external AMBA CHI-TLM with pybind11
so that it will be possible to use its data structures/functions
from python.
More specifically we will be able to instantiate a ARM::CHI::Payload
and ARM::CHI::Phase from a gem5 config, with the end goal of being
able to configure a CHI transaction from python
Change-Id: I9587b445c21df44161fa3d9e09fc2651541b38bd
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit is extending the previously defined CHIGenericController
to implement a CacheController which acts as a bridge between the
AMBA TLM 2.0 implementation of CHI [1][2] with the gem5 (ruby) one.
In other words it translates AMBA CHI transactions into ruby
messages (which are then forwarded to the MessageQueues)
and viceversa.
ARM::CHI::Payload, CHIRequestMsg
<--> CHIDataMsg
ARM::CHI::Phase CHIResponseMsg
CHIDataMsg
[1]: https://developer.arm.com/documentation/101459/latest
[2]: https://developer.arm.com/Architectures/AMBA#Downloads
Change-Id: I6f35e7b4ade4d0de1b5e5d2dbf73ce796a9f9fb6
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Component implementing a generic controller that allow classic caches
interaction with Ruby/CHI.
The CHIGenericController provides an interface to send/receive CHI
messages to/from the interconnect. This is implement in C++ rather then
SLICC. This controller is seen as a MachineType:Cache by the CHI
implementation in SLICC.
Change-Id: I3afc4363f4290095c2f7428c8487bccd932e0300
This snoop reponse is not generated internally by the SLICC
implementation, but is required for compatibility with classic caches
which may remain in SD state while returning SC data upon receiving
a converted SnpShared.
Change-Id: I5270b29c8863c7afd8abc39b3c7978b95330c183
This change does many things, but they must all be atomically done.
**USER FACING CHANGE**: The Ruby protocols in Kconfig have changed names
(they are now the same case as the SLICC file names). So, after this
commit, your build configurations need to be updated. You can do so by
running `scons menuconfig <build dir>` and selecting the right ruby
options. Alternatively, if you're using a `build_opts` file, you can run
`scons defconfig build/<ISA> build_opts/<ISA>` which should update your
config correctly.
Detailed changes are described below.
Kconfig changes:
- Kconfig files in ruby now must all be declared in the ruby/Kconfig
file
- All of the protocol names are changed to match their slicc file names
including the case
- A new option is available called "Use multiple protocols" which should
be selected if multiple protocols are selected. This is only used to
set the PROTOCOL variable to "MULTIPLE" when in multiple mode.
- The PROTOCOL variable can now be "MULTIPLE" which means it will be
ignored. If it's not "MULTIPLE" then it holds the "main" protocol,
which is necessary for backwards compatibility with the Ruby.py files.
Ruby config changes:
To make this change backwards compatible with Ruby.py, this change adds
a new "protocol" config called MULTIPLE.py which is used to allow the
user to set a "--protocol" option on the command line. This is only
needed if you are using a gem5 binary with multiple protocols but need
to use Ruby.py.
stdlib changes:
- Make the coherence protocol file behave like the ISA file
- Add a function to get the coherence protocol from the `CacheHierarchy`
like we do with the ISA in the `Processor`.
- Use this function where `get_runtime_coherence_protocol` was used
- Update the requires code to work with the ne CoherenceProtocol
- Fix a typo in the AMD Hammer name and also add the missing MSI
protocol
Scons changes:
- In Ruby we now gather up all of the protocols and build them all if
there are multiple protocols
- There's some bending over backwards to tell the user if they are using
an out of date gem5.build/config file and how to update it
- Note that multiple ruby protocols adds a significant amount of time to
the build since we have to run slicc twice for each file.
build_opts:
- Update all files with new names
- Add a new NULL_All_Ruby that will be used for testing
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This removes two #defines: PARTIAL_FUNC_READS and PROTOCOL_<protocol>.
Instead, update the code to use the runtime information about which
protocol we are using.
Change-Id: Icb6f10fc2d3fd59128c62f9f6e37b52ef2581b61
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Add a ProtocolInfo class that is specialized (through inheritance) for
each protocol. This class currently has the protocol's name and any
protocol-specific options (partial_func_reads is the only one so far).
Note that the SLICC language has been updated so that you can specify
the options in the `protocol` statement in the `.slicc` file.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This change is a step toward multiple protocols building at the same
time in scons. Add functions and use lists instead of single protocol.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This change makes it so that the MachineType.cc/hh file are not unique
for each protocol. All of the machine types are now tracked.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Rename all SLICC generated SimObjects to have the protocol in their
name. This will allow for two different protocols to have the same
machine names (e.g., L1Cache). For compatiblity, we check to see if the
current or main protocol that is built matches the SimObject's protocol
and export the backwards-compatible name.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Move the html output to be in a subdirectory with the protocol name.
Change-Id: I1510d2d5a531cc6db74d10a0478c23bc8a836a26
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Move all generated protocol-specific files to a subdirectory with the
protocol's name.
This change also updates SLICC to have separate variables for the
filename, c identifier and python identifier instead of just using
variations of the c identifier.
Change-Id: I62f69a4606b030ee23cb2d96493f3257a6923748
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Wrap all protocol-specific types in `namespace <protocol>`. This will
facilitate compiling multiple protocols into one binary.
There is a one-time hack to the generated `MachineType.cc` file to use
the namespace for the protocol until we generalize the machine types.
Change-Id: I5947e8ac69afe6f7ed257d7c5980ad65e9338acf
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
This changes extends SLICC to understand two different kinds of slicc
files: files that are protocol-specific and files that are shared or
included between different protocols.
Each declaration in SLICC can now be shared or not. If it is shared,
then we can take a different action in the code generation (e.g., wrap
in a namespace).
*Developer facing change*
Removes the RubySlicc_interfaces.slicc file from the SLICC includes of
every protocol.
Changes required: If you have a custom protocol, you will need to remove
the line `include "RubySlicc_interfaces.slicc" from your .slicc file.
Change-Id: Ia6c2dafe2b8fe86749a13d17daa885bddd166855
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Component that require randomness should not share their randomness
source with other components to avoid simulation noise. For instance,
the branch predictor of one core should not impact the random
cache replacement policy of the cache of another core. This currently
happens as all components share a single random number generator.
This PR provides their own generators to relevant components, although
a couple components still use rand().
Change-Id: I3fb7226111c9194ee457af0f0f2b83f8c7b69d1e
Co-authored-by: Arthur Perais <arthur.perais@univ-grenoble-alpes.fr>
This commit fixes three issues in MESI_Three_Level and MESI_Two_Level
implementations (MEI_Three_Level_HTM might still have issues).
1) Define functional read priorities for the cache controllers which
have states with Maybe_Stale access permission (L1 > L2 > Directory).
2) Fix incorrect access permissions in MESI_Three_Level-L1cache:
* S_IL0 is Read_Only, it is waiting for L0 to acknowledge the
invalidation request before moving to SS, also a Read_Only state.
* E_IL0 is Maybe_Stale, its contents might be valid, since there is a
transition (E_IL0, L0_Ack, EE) with no writeback data.
* M_IL0 is Maybe_Stale, its contents might be valid, since there is a
transition (M_IL0, L0_Ack, MM) with no writeback data.
3) Add missing message types carrying valid data in functional reads:
* INV_DATA is a writeback from L0 to L1.
* DATA is a response to GET_S, but there are scenarios where it might
be the only place with valid data (e.g. during L2 replacement).
Change-Id: Ie44fa317027f9ede272967e7461d337e14355eec
Functional reads can be satisfied by one of the following, in order:
1. Main memory (when the data is not present in the cache hierarchy);
2. Valid data block in cache;
3. Valid data block in coherence message;
4. Valid data block marked as Maybe_Stale;
Number 4 is not handled by the current implementation. A Maybe_Stale
block can be either truly stale or actually valid. When it is stale,
the memory read will be satisfied by either number 2 or number 3. When
it is valid, there will be no coherence message with valid data inside,
and the Maybe_Stale block will transition to a valid state after
receiving some kind of acknowledgement.
The main challenge to handle number 4 is how to know from which
Maybe_Stale block the data should be read from. For instance, in a two
level cache hierarchy, we might have a block marked as Maybe_Stale in
both L1 and L2. In this case, we should prioritize the cache controller
that is closest to the CPU. To define this priority, a new virtual
function 'functionalReadPriority' was added to the AbstractController
class.
Change-Id: I4774cd01aab7bb9ca53694cd9dc4f9416a8e4025
Previously, GPU L2 caches could be configured in either writeback or
writethrough mode when used in an APU. However, in a CPU+dGPU system,
only writethrough worked. This is mainly because in CPU+dGPU system, the
CPU sends either PCI or SDMA requests to transfer data from the GPU
memory to CPU. When L2 cache is configured to be writeback, the dirty
data resides in L2 when CPU transfers data from GPU memory. This leads
to the wrong version being transferred. A similar issue also crops up
when the GPU command processor reads kernel information before kernel
dispatch, only to incorrect data. This PR contains a set of commits that
fix both these issues.
Two NetDest locally declared variables are using default constructor
instead of constructor with RubySystem pointer. This will cause asserts
when (1) garnet is used or (2) a protocol that uses `broadcast()` is
built.
Fix these two by passing the appropriate RubySystem pointers.
PR gem5#1453 left some unused variables in the ruby code that triggered
"unused variable" warnings found comiling ALL/gem5.opt to use the CHI
protocol. These have been removed.
In #1453, an `implicit_ctor` option was added for SLICC structures. This
was done to allow statements such as `NetDest tmp;` which now require a
non-default constructor without modifying every protocol. The new
`implicit_ctor` option converts the statement `NetDest tmp;` in SLICC to
`NetDest tmp(<implicit_ctor>);` in C++. This is problematic when doing
something like `NetDest tmp := getMachines(...);` which gets converted
to `NetDest tmp(<implicit_ctor) = getMachines(...);` as the constructor
doesn't return an object. Before #1453 NetDest had a default constructor
so there we no difference between a local variable definition and local
variable assignment.
This commit fixes this issue by checking in the LocalVariableAST if the
local variable is part of an assignment or not. If it is not part of an
assignment, the implicit_ctor is used. Otherwise, the assignment is
printed to the generated code.
Note that this is not done anywhere in the public code but should be
allowed for folks writing their own Ruby protocols who might otherwise
be confused why a simple assignment presents a compile error.
Fix#1384.
MESI_Two_Level and MESI_Three_Level protocols are susceptible to LL/SC
livelocks when simulating boards with high core count.
This fix is based on MOESI_CMP_directory's implementation of locked
states, but tailors the solution to only apply it when a Load-Linked is
initiated.
There are two new states to act as locked states and stall any messages
leading to eviction:
* LLSC_E: equivalent to E state, go to E after timeout.
* LLSC_M: equivalent to M state, go to M after timeout.
The main new event is Load_Linked, which is very similar (in behavior)
to a Store, reusing several transient states. When a controller receives
the exclusive data, it differentiates a Load_Linked from a Store by
checking a new field added to the TBE: 'isLoadLinked'. It triggers a
different event when it is a Load_Linked, which in turn causes the
transition to one of the locked states.
The entire mechanism can be turned off by setting 'use_llsc_lock' to
false, and the amount of time to keep locked is defined by
'llsc_lock_timeout_latency'.
Change-Id: I13f415b6b7890d51d01f23001047d2363467a814
Starting with https://github.com/gem5/gem5/pull/1453 , some Ruby
structures require a block size be set
and other require a pointer to the Ruby system. This fixes some cases
which were not covered by the per-checkin tests but seen in daily+
tests. In particular:
- WriteMasks and PerfectCacheMemory must explicitly set a block size.
- NetDest and RubyProxyPort require RubySystem pointer.
- Classes inheriting Message now have a setRubySystem collecting all
objects that need a RubySystem pointer and this should be called in
the constructor of the Message.
This commit makes sure all of these happen. This should fix daily
arm_boot_tests and daily learning_gem5 tests.
This PR adds the SMS prefetcher described in [this
](https://web.eecs.umich.edu/~twenisch/papers/isca06.pdf) paper.
This work was done in collaboration with @Setu-Gupta, and @xmlizhao
On branch sms
Changes to be committed:
modified: src/mem/cache/prefetch/Prefetcher.py
modified: src/mem/cache/prefetch/SConscript
new file: src/mem/cache/prefetch/sms.cc
new file: src/mem/cache/prefetch/sms.hh
Change-Id: I68d3bb6cf07385177d0f776fb958f652cfc41489
- This change updates syntax of constructors of Template Classes from
`class<T>()` to `class()`
- Initializes coherence to 0 in `src/mem/cache_blk.hh`
The above changes are made to solve the errors when compiling gem5 in
gcc 14
There are several parts to this PR to work towards #1349 .
(1) Make RubySystem::getBlockSizeBytes non-static by providing ways to
access the block size or passing the block size explicitly to classes.
The main changes are:
- DataBlocks must be explicitly allocated. A default ctor still exists
to avoid needing to heavily modify SLICC. The size can be set using a
realloc function, operator=, or copy ctor. This is handled completely
transparently meaning no protocol or config changes are required.
- WriteMask now requires block size to be set. This is also handled
transparently by modifying the SLICC parser to identify WriteMask
types and call setBlockSize().
- AbstractCacheEntry and TBE classes now require block size to be set.
This is handled transparently by modifying the SLICC parser to
identify these classes and call initBlockSize() which calls
setBlockSize() for any DataBlock or WriteMask.
- All AbstractControllers now have a pointer to RubySystem. This is
assigned in SLICC generated code and requires no changes to protocol
or configs.
- The Ruby Message class now requires block size in all constructors.
This is added to the argument list automatically by the SLICC parser.
(2) Relax dependence on common functions in
src/mem/ruby/common/Address.hh
so that RubySystem::getBlockSizeBits is no longer static. Many classes
already have a way to get block size from the previous commit, so they
simply multiple by 8 to get the number of bits. For handling SLICC and
reducing the number of changes, define makeCacheLine, getOffset, etc. in
RubyPort and AbstractController. The only protocol changes required are
to change any "RubySystem::foo()" calls with "m_ruby_system->foo()".
For classes which do not have a way to get access to block size but
still used makeLineAddress, getOffset, etc., the block size must be
passed to that class. This requires some changes to the SimObject
interface for two commonly used classes: DirectoryMemory and
RubyPrefecther, resulting in user-facing API changes
User-facing API changes:
- DirectoryMemory and RubyPrefetcher now require the cache line size as
a non-optional argument.
- RubySequencer SimObjects now require RubySystem as a non-optional
argument.
- TesterThread in the GPU ruby tester now requires the cache line size
as a non-optional argument.
(3) Removes static member variables in RubySystem which control
randomization, cooldown, and warmup. These are mostly used by the Ruby
Network. The network classes are modified to take these former static
variables as parameters which are passed to the corresponding method
(e.g., enqueue, delayHead, etc.) rather than needing a RubySystem object
at all.
Change-Id: Ia63c2ad5cf0bf9d1cbdffba5d3a679bb4d3b1220
(4) There are two major SLICC generated static methods:
getNumControllers()
on each cache controller which returns the number of controllers created
by the configs at run time and the functions which access this method,
which are MachineType_base_count and MachineType_base_number. These need
to be removed to create multiple RubySystem objects otherwise NetDest,
version value, and other objects are incorrect.
To remove the static requirement, MachineType_base_count and
MachineType_base_number are moved to RubySystem. Any class which needs
to call these methods must now have a pointer to a RubySystem. To enable
that, several changes are made:
- RubyRequest and Message now require a RubySystem pointer in the
constructor. The pointer is passed to fields in the Message class
which require a RubySystem pointer (e.g., NetDest). SLICC is modified
to do this automatically.
- SLICC structures may now optionally take an "implicit constructor"
which can be used to call a non-default constructor for locally
defined variables (e.g., temporary variables within SLICC actions). A
statement such as "NetDest bcast_dest;" in SLICC will implicitly
append a call to the NetDest constructor taking RubySystem, for
example.
- RubySystem gets passed to Ruby network objects (Network, Topology).
This commit changes metric units (e.g. kB, MB, and GB) to binary units
(KiB, MiB, GiB) in various files. This PR covers files that were missed
by a previous PR that also made these changes.
This will allow gem5 to configure the maximum capacity of a
partition dynamically during simulation, rather than
having it statically defined at construction time
Change-Id: Ib55c9990a6bc2930abaf2438c13337acc643520f
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
In this way we actually need to store one unsigned integer instead of
two. We also won't need to recompute the total number of cache blocks
whenever we will adapt this policy to be dynamically modified
Change-Id: Ia8cf906539d1891b6cdb821f2a74628127dc68c6
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This PR is adjusting the constructor to relax template
requirements. In this way child classes are free to provide
their own way of calculating the number of entries and the
shifting required to extract the set
Why do we need this?
Up to this patch we have been configuring the indexing policy
by setting up the cache/table size (in bytes) and the entry size.
Those parameters make a lot of sense in caching structures
where:
a) We want to configure the caching structure using
the amount of storage (in bytes) provided (e.g. 4kB of Cache)
b) the content of a single entry is addressable therefore
we need the entry size to know how many bits in the indexing
process we need to shift to extract the set
In those cases the number of cache entries is derived from the formula
num_entries = size / entry_size
The adoption of the IndexingPolicy for different kinds
of caching structures (e.g. prefetcher tables) make this
way of configuring the IP a bit quirky.
For some tables directly setting the number of entries is a far more
intuitive way of configuring the IP, instead of allocating the desired
number of entries by working things out with the formula above
This PR changes memory and cache sizes in various parts of the gem5
codebase to use binary units (e.g. KiB) instead of metric units (e.g.
kB). This makes the codebase more consistent, as gem5 automatically
converts memory and cache sizes that are in metric units to binary
units.
This PR also adds a warning message to let users know when an
auto-conversion from base 10 to base 2 units occurs.
There were a few places in configs and in the comments of various files
where I didn't change the metric units, as I couldn't figure out where
the parameters with those units were being used.
The current GPU_VIPER protocol's TCC cache update the MRU information
twice with calling a_allocateBlock and ut_updateTag which affectgs the
LIP and RRIP replacement polies. Remove ut_updateTag fixes the LIP and
RRIP replacement polies.
Change-Id: I79ad9392593e00425a7fe8828048465b2c2c2e1f
This commit is adjusting the constructor to relax template
requirements. In this way child classes are free to provide
their own way of calculating the number of entries and the
shifting required to extract the set
Why do we need this?
Up to this patch we have been configuring the indexing policy
by setting up the cache/table size (in bytes) and the entry size.
Those parameters make a lot of sense in caching structures
where:
a) We want to configure the caching structure using
the amount of storage (in bytes) provided (e.g. 4kB of Cache)
b) the content of a single entry is addressable therefore
we need the entry size to know how many bits in the indexing
process we need to shift to extract the set
In those cases the number of cache entries is derived from the formula
num_entries = size / entry_size
The adoption of the IndexingPolicy for different kinds
of caching structures (e.g. prefetcher tables) make this
way of configuring the IP a bit quirky.
For some tables directly setting the number of entries is a far more
intuitive way of configuring the IP, instead of allocating the desired
number of entries by working things out with the formula above
Change-Id: Ic7994c129196d6ba83dc99ce397ad43393d35252
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit refactors the error message added to convert.py.
A mapping between the base 10 and base 2 suffix magnitudes
(e.g. k: ki, M: Mi, etc.) and a new function that extracts the
magnitude and numerical value have been added. Also, a warning
message has been added to the toMemoryBandwidth function in
addition to the one in toMemorySize.
Change-Id: I3ae157d13c7089d38a34a6e4c35a2b58978106d0
The Vega ISA's s_memtime instruction is used to obtain a cycle value
from the GPU. Previously, this was implemented to obtain the cycle count
when the memtime instruction reached the execute stage of the GPU
pipeline. However, from microbenchmarking we have found that this under
reports the latency for memtime instructions relative to real hardware.
Thus, we changed its behavior to go through the scalar memory pipeline
and obtain a latency value from the the SQC (L1 I$). This mirrors the
suggestion of the AMD Vega ISA manual that s_memtime should be treated
like a s_load_dwordx2.
The default latency was set based on microbenchmarking.
Change-Id: I5e251dde28c06fe1c492aea4abf9f34f05784420
This commit contains the rest of the base 2 vs base 10 cache/memory
size clarifications. It also changes the warning message to use
warn(). With these changes, the warning message should now no
longer show up during a fresh compilation of gem5.
Change-Id: Ia63f841bdf045b76473437f41548fab27dc19631
We don't store a pointer to the indexing policy anymore.
Instead, we register a tag extractor callback when we
construct the TaggedEntry
Change-Id: I79dbc1bc5c5ce90d350e83451f513c05da9f0d61
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
We don't store a pointer to the indexing policy anymore.
Instead, we register a tag extractor callback when we
construct the CacheEntry
Change-Id: I06dc58e2f67e01f3f9bcd9f0c641505d3aec82ff
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
As detailed by a previous commit, AssociativeSet is not needed anymore.
The class is effectively the same as AssociativeCache
Change-Id: I24bfb98fbf0826c0a2ea6ede585576286f093318
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>