Classes CHI_RNF and CHI_MN can be specialized to override base
class/subclass attributes, like it happens in CustomMesh with
router_list (see configs/example/noc_config/2x4.py). To avoid missing
these attributes, it is needed to generalize the class types when
instantiating the objects in the recently added generators.
This commit allows top level configs making use of the Ruby module
to define node generation callbacks.
The config_ruby function will check the system object for two
factory methods
1) _rnf_gen, if defined, will be called to generate RNFs
2) _mn_gen, if defined, will be called to generate MNs
Change-Id: I9daeece646e7cdb2d3bfefa761a9650562f8eb4b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This change does many things, but they must all be atomically done.
**USER FACING CHANGE**: The Ruby protocols in Kconfig have changed names
(they are now the same case as the SLICC file names). So, after this
commit, your build configurations need to be updated. You can do so by
running `scons menuconfig <build dir>` and selecting the right ruby
options. Alternatively, if you're using a `build_opts` file, you can run
`scons defconfig build/<ISA> build_opts/<ISA>` which should update your
config correctly.
Detailed changes are described below.
Kconfig changes:
- Kconfig files in ruby now must all be declared in the ruby/Kconfig
file
- All of the protocol names are changed to match their slicc file names
including the case
- A new option is available called "Use multiple protocols" which should
be selected if multiple protocols are selected. This is only used to
set the PROTOCOL variable to "MULTIPLE" when in multiple mode.
- The PROTOCOL variable can now be "MULTIPLE" which means it will be
ignored. If it's not "MULTIPLE" then it holds the "main" protocol,
which is necessary for backwards compatibility with the Ruby.py files.
Ruby config changes:
To make this change backwards compatible with Ruby.py, this change adds
a new "protocol" config called MULTIPLE.py which is used to allow the
user to set a "--protocol" option on the command line. This is only
needed if you are using a gem5 binary with multiple protocols but need
to use Ruby.py.
stdlib changes:
- Make the coherence protocol file behave like the ISA file
- Add a function to get the coherence protocol from the `CacheHierarchy`
like we do with the ISA in the `Processor`.
- Use this function where `get_runtime_coherence_protocol` was used
- Update the requires code to work with the ne CoherenceProtocol
- Fix a typo in the AMD Hammer name and also add the missing MSI
protocol
Scons changes:
- In Ruby we now gather up all of the protocols and build them all if
there are multiple protocols
- There's some bending over backwards to tell the user if they are using
an out of date gem5.build/config file and how to update it
- Note that multiple ruby protocols adds a significant amount of time to
the build since we have to run slicc twice for each file.
build_opts:
- Update all files with new names
- Add a new NULL_All_Ruby that will be used for testing
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Update the MOESI_AMD_Base and GPU_VIPER configuration files with the new
full protocol-specific names for the controllers instead of the
deprecated names.
Note: If you have any files which use the `CntrlBase` base, you will
likely need to update the class names that you are inheriting from.
Change-Id: I623fea7dd4cd151f7b15fe7cb43f8a4c45492d89
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Use the protocol-specific controller names in CHI.
**Important**: This could change some scripts. As long as people use
CHI_config (likely), this shouldn't be a problem, but if you have a
different version of CHI_config.py locally, you will need to make the
following updates:
`Cache_Controller` -> `CHI_Cache_Controller`
`Memory_Controller` -> `CHI_Memory_Controller`
Website updates coming soon!
Change-Id: I7afdcede884ac5f9a9a76cc3d3dd35941e4e2faa
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
There are several parts to this PR to work towards #1349 .
(1) Make RubySystem::getBlockSizeBytes non-static by providing ways to
access the block size or passing the block size explicitly to classes.
The main changes are:
- DataBlocks must be explicitly allocated. A default ctor still exists
to avoid needing to heavily modify SLICC. The size can be set using a
realloc function, operator=, or copy ctor. This is handled completely
transparently meaning no protocol or config changes are required.
- WriteMask now requires block size to be set. This is also handled
transparently by modifying the SLICC parser to identify WriteMask
types and call setBlockSize().
- AbstractCacheEntry and TBE classes now require block size to be set.
This is handled transparently by modifying the SLICC parser to
identify these classes and call initBlockSize() which calls
setBlockSize() for any DataBlock or WriteMask.
- All AbstractControllers now have a pointer to RubySystem. This is
assigned in SLICC generated code and requires no changes to protocol
or configs.
- The Ruby Message class now requires block size in all constructors.
This is added to the argument list automatically by the SLICC parser.
(2) Relax dependence on common functions in
src/mem/ruby/common/Address.hh
so that RubySystem::getBlockSizeBits is no longer static. Many classes
already have a way to get block size from the previous commit, so they
simply multiple by 8 to get the number of bits. For handling SLICC and
reducing the number of changes, define makeCacheLine, getOffset, etc. in
RubyPort and AbstractController. The only protocol changes required are
to change any "RubySystem::foo()" calls with "m_ruby_system->foo()".
For classes which do not have a way to get access to block size but
still used makeLineAddress, getOffset, etc., the block size must be
passed to that class. This requires some changes to the SimObject
interface for two commonly used classes: DirectoryMemory and
RubyPrefecther, resulting in user-facing API changes
User-facing API changes:
- DirectoryMemory and RubyPrefetcher now require the cache line size as
a non-optional argument.
- RubySequencer SimObjects now require RubySystem as a non-optional
argument.
- TesterThread in the GPU ruby tester now requires the cache line size
as a non-optional argument.
(3) Removes static member variables in RubySystem which control
randomization, cooldown, and warmup. These are mostly used by the Ruby
Network. The network classes are modified to take these former static
variables as parameters which are passed to the corresponding method
(e.g., enqueue, delayHead, etc.) rather than needing a RubySystem object
at all.
Change-Id: Ia63c2ad5cf0bf9d1cbdffba5d3a679bb4d3b1220
(4) There are two major SLICC generated static methods:
getNumControllers()
on each cache controller which returns the number of controllers created
by the configs at run time and the functions which access this method,
which are MachineType_base_count and MachineType_base_number. These need
to be removed to create multiple RubySystem objects otherwise NetDest,
version value, and other objects are incorrect.
To remove the static requirement, MachineType_base_count and
MachineType_base_number are moved to RubySystem. Any class which needs
to call these methods must now have a pointer to a RubySystem. To enable
that, several changes are made:
- RubyRequest and Message now require a RubySystem pointer in the
constructor. The pointer is passed to fields in the Message class
which require a RubySystem pointer (e.g., NetDest). SLICC is modified
to do this automatically.
- SLICC structures may now optionally take an "implicit constructor"
which can be used to call a non-default constructor for locally
defined variables (e.g., temporary variables within SLICC actions). A
statement such as "NetDest bcast_dest;" in SLICC will implicitly
append a call to the NetDest constructor taking RubySystem, for
example.
- RubySystem gets passed to Ruby network objects (Network, Topology).
This PR changes memory and cache sizes in various parts of the gem5
codebase to use binary units (e.g. KiB) instead of metric units (e.g.
kB). This makes the codebase more consistent, as gem5 automatically
converts memory and cache sizes that are in metric units to binary
units.
This PR also adds a warning message to let users know when an
auto-conversion from base 10 to base 2 units occurs.
There were a few places in configs and in the comments of various files
where I didn't change the metric units, as I couldn't figure out where
the parameters with those units were being used.
The Vega ISA's s_memtime instruction is used to obtain a cycle value
from the GPU. Previously, this was implemented to obtain the cycle count
when the memtime instruction reached the execute stage of the GPU
pipeline. However, from microbenchmarking we have found that this under
reports the latency for memtime instructions relative to real hardware.
Thus, we changed its behavior to go through the scalar memory pipeline
and obtain a latency value from the the SQC (L1 I$). This mirrors the
suggestion of the AMD Vega ISA manual that s_memtime should be treated
like a s_load_dwordx2.
The default latency was set based on microbenchmarking.
Change-Id: I5e251dde28c06fe1c492aea4abf9f34f05784420
This commit contains the rest of the base 2 vs base 10 cache/memory
size clarifications. It also changes the warning message to use
warn(). With these changes, the warning message should now no
longer show up during a fresh compilation of gem5.
Change-Id: Ia63f841bdf045b76473437f41548fab27dc19631
Rather than adding the options to *every* config that might be using
GPU_VIPER.py, just change the Ruby config to check if the option is
available before trying to use it. Otherwise, reverts to what was the
default on stable.
Change-Id: Ia6f1d0827d489ee2a35c598b644461cbff59e247
Adding RP_choose function to change replacement policies among
TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem replacement
policies for TCC, TCP and SQC caches for GPU
Adding RP_choose functions to change replacement policies among
TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem replacement
policies for TCC, TCP and SQC caches for GPU
Adding RP_choose functions to change replacement policies among
TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem replacement
policies for TCC, TCP and SQC caches for GPU
The scalar cache is not being invalidated which causes stale data to be
left in the scalar cache between GPU kernels. This commit sends
invalidates to the scalar cache when the SQC is invalidated. This is a
sufficient baseline for simulation.
Since the number of invalidates might be larger than the mandatory queue
can hold and no flash invalidate mechanism exists in the VIPER protocol,
the command line option for the mandatory queue size is removed, which
is the same behavior as the SQC.
Change-Id: I1723f224711b04caa4c88beccfa8fb73ccf56572
After removing `get_runtime_isa`, the `send_evicts` function in the ruby
configs assumes that there is an ISA built. This change short-circuits
that logic if the current build is the NULL (none) ISA.
Change-Id: I75fefe3d70649b636b983c4d2145c63a9e1342f7
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
configs/ruby/Ruby.py fails when `DerivO3CPU` is not compiled into the
gem5 binary. The `isinstance` check fails. This fix addds a guard.
Change-Id: I1e5503ab18ec94683056c6eb28cebeda6632ae8e
1. Fix the wrong ISA detect of get_isa function
2. Fix the typo ObjectLIst.cpu_list
3. Fix missing PageTableWalkerCache
4. Fix the invalid default cpu_type paramter
Change-Id: I217ea8da8a6d8e712743a5b32c4c0669216ce6c4
`get_runtime_isa()` has been deprecated for some time. It is a leftover
piece of code from when gem5 was compiled to a single ISA and that ISA
used to configure the simulated system to use that ISA. Since multi-ISA
compilations are possible, `get_runtime_isa()` should not be used.
Unless the gem5 binary is compiled to a single ISA, a failure will
occur.
The new proceedure for specify which ISA to use is by the setting of the
correct `BaseCPU` implementation. E.g., `X86SimpleTimingCPU` of
`ArmO3CPU`.
This patch removes the remaining `get_runtime_isa()` instances and
removes the function itself. The `SimpleCore` class has been updated to
allow for it's CPU factory to return a class, needed by scripts in
"configs/common".
The deprecated functionality in the standard library, which allowed for
the specifying of an ISA when setting up a processor and/or core has
also been removed. Setting an ISA is now manditory.
Fixes#216.
This patch adds supports for using the "classic" prefetchers with ruby
cache controllers.
This pull request includes a few commits making the changes in this
order:
- Refactor decouples the classic cache and prefetchers interfaces
- Extras probes for later integration with ruby
- General ruby-side support
- Adds support for the CHI protocol
Commit [mem-ruby: support prefetcher in CHI
protocol](2bdb65653b)
may be used as example on how to add support for other protocols.
JIRA issues that may be related to this pull request:
https://gem5.atlassian.net/browse/GEM5-457https://gem5.atlassian.net/browse/GEM5-1112
Added a resource constraint, AtomicALUOperation, to GLC atomics
performed in the TCC.
The resource constraint uses a new class, ALUFreeList array. The class
assumes the following:
- There are a fixed number of atomic ALU pipelines
- While a new cache line can be processed in each pipeline each cycle,
if a cache line is currently going through a pipeline, it can't be
processed again until it's finished
Two configuration parameters have been used to tune this behavior:
- tcc-num-atomic-alus corresponds to the number of atomic ALU pipelines
- atomic-alu-latency corresponds to the latency of atomic ALU pipelines
Change-Id: I25bdde7dafc3877590bb6536efdf57b8c540a939
PR #367 adds an option to configs/ruby/GPU_VIPER.py that was not added
to the corresponding dGPU equal for GPUFS and thus all GPUFS runs are
failing. Fixed in this patch.
Added checks to ensure that atomics are not performed in the TCC when it
is configured as a write-through cache. Also added SLC bit overwrite to
ensure directory preforms atomics when there is a write-through TCC.
Change-Id: I4514e6c8022aeb7785f2c59871cd9acec8161ed8
Added a new feature to CHI protocol (in collaboration with @tiagormk).
Here is the Jira Ticket
[https://gem5.atlassian.net/browse/GEM5-1326](https://gem5.atlassian.net/browse/GEM5-1326
). As described in CHI specs, far atomic transactions enable remote
execution of Atomic Memory Operations. This pull request incorporates
several changes:
* Fix Arm ISA definition of Swap instructions. These instructions should
return an operand, so their ISA definition should be Return Operation.
* Enable AMOs in Ruby Mem Test to verify that AMOs work
* Enable near and far AMO in the Cache Controler of CHI
Three configuration parameters have been used to tune this behavior:
* policy_type: sets the atomic policy to one of the described in [our
paper](https://dl.acm.org/doi/10.1145/3579371.3589065)
* atomic_op_latency: simulates the AMO ALU operation latency
* comp_anr: configures the Atomic No return transaction to split
CompDBIDResp into two different messages DBIDResp and Comp
Introduce far atomic operations in CHI protocol.
Three configuration parameters have been used to tune this behavior:
policy_type: sets the atomic policy to one of the described in our paper
atomic_op_latency: simulates the AMO ALU operation latency
comp_anr: configures the Atomic No return transaction to split
CompDBIDResp into two different messages DBIDResp and Comp
Change-Id: I087afad9ad9fcb9df42d72893c9e32ad5a5eb478
Previously, the L1, L2 number of banks and L2 latencies were not
configurable through command line arguments. This commit adds support to
configure them through the arguments '--tcp-num-banks' for number of
banks in L1, '--tcc-num-banks' for number of banks in L2, and
'--tcc-tag-access-latency', and '--tcc-data-access-latency'
Change-Id: Ie3b713ead16865fd7120e2d809ebfa56b69bc4a1
Added a GLC atomic latency parameter (glc-atomic-latency) used when
enqueueing response messages regarding atomics directly performed in
the TCC. This latency is added in addition to the L2 response latency
(TCC_latency). This represents the latency of performing an atomic
within the L2.
With this change, the TCC response queue will receive enqueues with
varying latencies as GLC atomic responses will have this added GLC
atomic latency while data responses will not. To accommodate this in
light of the queue having strict FIFO ordering (which would be violated
here), this change also adds an optional parameter bypassStrictFIFO to
the SLICC enqueue function which allows overriding strict FIFO
requirements for individual messages on a case-by-case basis. This
parameter is only being used in the TCC's atomic response enqueue call.
Change-Id: Iabd52cbd2c0cc385c1fb3fe7bcd0cc64bdb40aac
The TARGET_ISA variable would let you select one ISA from a list of
possible ISAs. That has now been replaced with USE_ARM_ISA, USE_X86_ISA,
etc, variables which are boolean on or off. That will allow any number
of ISAs to be enabled or disabled individually. Enabling something other
than exactly one of these will probably prevent you from getting a
working gem5 binary, but those problems are being addressed in other,
parallel change series.
I decided to use the USE_ prefix since it was consistent with most other
on/off variables we have in gem5. One noteable exception is the
BUILD_GPU setting which, you could convincingly argue, is a better
prefix than USE_. Another option would be to use CONFIG_, in
anticipation of using a kconfig style config mechanism in gem5.
It seemed premature to start using a CONFIG_ prefix here, and if we
decide to switch to some other prefix like BUILD_, it should be a
purposeful choice and not something somebody just starts using.
Change-Id: I90fef2835aa4712782e6c1313fbf564d0ed45538
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/52491
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Update the TCP_latency input arg to reflect what it does -- in
combination with the number of banks, it determines the number of
accesses that can happen in the L1 (TCP) in a given cycle. It does
not directly affect the L1 latency as the name implies. Instead,
the mandatory_queue_latency does this.
Change-Id: Ib6cbc8367ce2b1f30005d137384f53650a403b49
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61309
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>