This PR improves the legacy RISC-V FS Linux script in the following
ways:
- Adds an argument to specify the bootloader, to (optionally) use the
`RiscvBootloaderKernelWorkload` class (sketched below).
- Updates the DTB generation function, adding the Chosen node. This
fixes execution with recent Linux kernels.
- Checks if the `--kernel` required argument is set.
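A minimal sketch of how the bootloader argument might drive workload
selection (the argument names and the `RiscvLinux` fallback are
assumptions based on this description, not the script's exact code):
```py
import argparse

from m5.objects import RiscvBootloaderKernelWorkload, RiscvLinux

parser = argparse.ArgumentParser()
parser.add_argument("--kernel", required=True)     # now checked as required
parser.add_argument("--bootloader", default=None)  # new optional argument
args = parser.parse_args()

# system = ... (board and DTB construction elided)
if args.bootloader:
    # Boot via a bootloader (e.g., OpenSBI) that jumps to the kernel.
    workload = RiscvBootloaderKernelWorkload(
        bootloader=args.bootloader, kernel=args.kernel
    )
else:
    # Direct kernel boot, as before.
    workload = RiscvLinux(object_file=args.kernel)
```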
This PR adds support for command line arguments in GPU-FS runs to allow
the user to configure several parts of the GPU. It also increases the
bits per set in the build_opts/VEGA_X86 file to enable GPU-FS
simulations to use 64 directories or more.
This PR adds a RiscvDemoBoard that can be used with both SE and FS
mode. This was tested using the workloads riscv-matrix-multiply-run for
SE and riscv-ubuntu-20.04-boot for FS. Two example config scripts have
also been added.
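A minimal usage sketch (the `RiscvDemoBoard` import path and
constructor are assumptions; the two added example scripts show the
real usage):
```py
from gem5.resources.resource import obtain_resource
from gem5.simulate.simulator import Simulator

# Import path is an assumption.
from gem5.prebuilt.riscv_demo_board import RiscvDemoBoard

board = RiscvDemoBoard()
# SE mode shown here; the same board also accepts an FS workload such
# as "riscv-ubuntu-20.04-boot".
board.set_workload(obtain_resource("riscv-matrix-multiply-run"))
Simulator(board=board).run()
```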
Starting with https://github.com/gem5/gem5/pull/1453 , some Ruby
structures require a block size to be set and others require a pointer
to the Ruby system. This fixes some cases which were not covered by the
per-checkin tests but were seen in daily+ tests. In particular:
- WriteMasks and PerfectCacheMemory must explicitly set a block size.
- NetDest and RubyProxyPort require a RubySystem pointer.
- Classes inheriting from Message now have a setRubySystem method that
passes the pointer to all member objects needing a RubySystem pointer;
it should be called in the constructor of the Message.
This commit makes sure all of these happen. This should fix daily
arm_boot_tests and daily learning_gem5 tests.
This missing parameter was causing the Learning gem5 tests to fail.
**Note:** We need to update the website's learning gem5 examples to
reflect this change.
This demo board is a preset Arm board that can be used to run example
gem5 simulations. This board doesn't simulate any known hardware.
The board will be used to run benchmarks such as gapbs and npb to
collect stats. The plan is to show these stats on the gem5 resources
website to provide more details about the resources.
* Deprecates the setting of FS/SE mode via the `Simulator` module.
* Moves the creation of the `Root` object from the `Simulator` to the
board.
* Moves the setting of `sim_quantum` from the `Simulator` to the
processor.
* Allows for easier development of boards which support both SE and FS
mode simulation by moving board setup function calls to occur after the
set_workload function is called, which sets a board's `is_fs` status
(see the sketch below).
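In config terms, the before/after looks roughly like this (a sketch;
`full_system` is assumed to be the deprecated keyword):
```py
from gem5.simulate.simulator import Simulator

# board = ... (any stdlib board, constructed earlier)

# Deprecated: explicitly telling the Simulator the mode.
simulator = Simulator(board=board, full_system=True)

# Preferred: set_workload()/set_kernel_disk_workload() already set the
# board's is_fs status, so the Simulator infers the mode.
simulator = Simulator(board=board)
```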
There are several parts to this PR to work towards #1349.
(1) Make RubySystem::getBlockSizeBytes non-static by providing ways to
access the block size or passing the block size explicitly to classes.
The main changes are:
- DataBlocks must be explicitly allocated. A default ctor still exists
to avoid needing to heavily modify SLICC. The size can be set using a
realloc function, operator=, or copy ctor. This is handled completely
transparently, meaning no protocol or config changes are required.
- WriteMask now requires block size to be set. This is also handled
transparently by modifying the SLICC parser to identify WriteMask
types and call setBlockSize().
- AbstractCacheEntry and TBE classes now require block size to be set.
This is handled transparently by modifying the SLICC parser to
identify these classes and call initBlockSize() which calls
setBlockSize() for any DataBlock or WriteMask.
- All AbstractControllers now have a pointer to RubySystem. This is
assigned in SLICC generated code and requires no changes to protocol
or configs.
- The Ruby Message class now requires block size in all constructors.
This is added to the argument list automatically by the SLICC parser.
(2) Relax dependence on common functions in
src/mem/ruby/common/Address.hh
so that RubySystem::getBlockSizeBits is no longer static. Many classes
already have a way to get the block size from the previous commit, so
they simply multiply by 8 to get the number of bits. For handling SLICC
and reducing the number of changes, makeCacheLine, getOffset, etc. are
defined in RubyPort and AbstractController. The only protocol changes
required are to change any "RubySystem::foo()" calls to
"m_ruby_system->foo()".
For classes which do not have a way to get access to the block size but
still used makeLineAddress, getOffset, etc., the block size must be
passed to that class. This requires some changes to the SimObject
interface for two commonly used classes, DirectoryMemory and
RubyPrefetcher, resulting in user-facing API changes.
User-facing API changes (sketched after this list):
- DirectoryMemory and RubyPrefetcher now require the cache line size as
a non-optional argument.
- RubySequencer SimObjects now require RubySystem as a non-optional
argument.
- TesterThread in the GPU ruby tester now requires the cache line size
as a non-optional argument.
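A hedged sketch of the updated instantiations (the keyword names are
assumptions based on the list above; the point is that these arguments
are now non-optional):
```py
from m5.objects import DirectoryMemory, RubyPrefetcher, RubySequencer

# ruby_system = RubySystem(...), constructed earlier.
# Keyword names below are assumptions.
dir_mem = DirectoryMemory(block_size=64)
prefetcher = RubyPrefetcher(block_size=64)
sequencer = RubySequencer(version=0, ruby_system=ruby_system)
```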
(3) Removes static member variables in RubySystem which control
randomization, cooldown, and warmup. These are mostly used by the Ruby
Network. The network classes are modified to take these former static
variables as parameters which are passed to the corresponding method
(e.g., enqueue, delayHead, etc.) rather than needing a RubySystem object
at all.
Change-Id: Ia63c2ad5cf0bf9d1cbdffba5d3a679bb4d3b1220
(4) SLICC generates a static method, getNumControllers(), on each cache
controller, which returns the number of controllers created by the
configs at run time, plus the functions which access this method:
MachineType_base_count and MachineType_base_number. These need to be
made non-static to support multiple RubySystem objects; otherwise
NetDest, version values, and other objects are incorrect.
To remove the static requirement, MachineType_base_count and
MachineType_base_number are moved to RubySystem. Any class which needs
to call these methods must now have a pointer to a RubySystem. To enable
that, several changes are made:
- RubyRequest and Message now require a RubySystem pointer in the
constructor. The pointer is passed to fields in the Message class
which require a RubySystem pointer (e.g., NetDest). SLICC is modified
to do this automatically.
- SLICC structures may now optionally take an "implicit constructor"
which can be used to call a non-default constructor for locally
defined variables (e.g., temporary variables within SLICC actions). A
statement such as "NetDest bcast_dest;" in SLICC will implicitly
append a call to the NetDest constructor taking RubySystem, for
example.
- RubySystem gets passed to Ruby network objects (Network, Topology).
The ROM field was originally intended as a future alternate way to load
VBIOS without the ROM being on the disk image. This code path is never
taken for the devices gem5 supports and there is no gem5 implementation.
Deprecate the rom_binary field for this reason.
Similarly, MMIO traces were only used for Vega10. Deprecate this as
Vega10 is now deprecated. The MMIO trace reader is kept as it may still
be useful in the future. It is still the primary way to handle devices
which have graphics capability. None of the devices supported by gem5
have graphics now that Vega10 is deprecated.
It makes much more sense for the Root object to be created within the
board and passed where required. Creating it in the Simulator class is
not required.
For this to work, the signature of the `_pre_instantiate` function in
`AbstractBoard` has been updated to return the Root object.
Invalidate requests align to the system cache line size. This causes
problems if the GPU cache hierarchy's cache line size differs from the
system's, as the unaligned requests never return, leading to deadlock
on deferred dispatch.
This commit uses the cache line size from the GPU memory manager and
makes the cache line size there non-optional.
Tested with multiple RubySystems where CPU side was 64B and GPU side was
128B cache lines.
Vega10 is no longer officially supported by ROCm, and ROCm is starting
to use some packet types gem5 does not support. These were originally
kept to allow
users to use older disk images with newer gem5. Going forward the gem5
version and gem5-resources releases will be required to be the same to
prevent lingering old configs.
As a replacement for vega10*.py, mi300.py or mi200.py should be used.
HIP examples, cookbook, and rodinia configs can be replaced with the
standard flow of building / obtaining the GPU application and running
using mi300.py or mi200.py as they do not require any input options and
therefore do not require changes to the disk image.
This commit changes metric units (e.g. kB, MB, and GB) to binary units
(KiB, MiB, GiB) in various files. This PR covers files that were missed
by a previous PR that also made these changes.
This PR changes memory and cache sizes in various parts of the gem5
codebase to use binary units (e.g. KiB) instead of metric units (e.g.
kB). This makes the codebase more consistent, as gem5 automatically
converts memory and cache sizes that are in metric units to binary
units.
This PR also adds a warning message to let users know when an
auto-conversion from base 10 to base 2 units occurs.
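For example, a config that previously relied on the auto-conversion now
states the binary unit explicitly (illustrative):
```py
from m5.objects import AddrRange

# Before: metric unit, auto-converted to base 2 with a warning.
mem_range = AddrRange("512MB")

# After: binary unit, explicit and warning-free.
mem_range = AddrRange("512MiB")
```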
There were a few places in configs and in the comments of various files
where I didn't change the metric units, as I couldn't figure out where
the parameters with those units were being used.
A bug was uncovered in which, for various syscalls that used 64-bit
parameters, the ABI for 32-bit operating systems was passing the wrong
values to the syscalls, due to discrepancies between the target and
guest OS. This commit fixes that by replacing 64-bit types, or types
that are platform specific in size, with the exact correspondent for
the guest OS, thus producing the correct signature for the respective
syscalls. On top of this, the --param argument is added to the
starter_se script, in order to support attachment of remote debuggers.
The Vega ISA's s_memtime instruction is used to obtain a cycle value
from the GPU. Previously, this was implemented to obtain the cycle count
when the memtime instruction reached the execute stage of the GPU
pipeline. However, from microbenchmarking we have found that this
underreports the latency for memtime instructions relative to real
hardware. Thus, we changed its behavior to go through the scalar memory
pipeline and obtain a latency value from the SQC (L1 I$). This mirrors
the suggestion of the AMD Vega ISA manual that s_memtime should be
treated like an s_load_dwordx2.
The default latency was set based on microbenchmarking.
Change-Id: I5e251dde28c06fe1c492aea4abf9f34f05784420
This commit contains the rest of the base 2 vs base 10 cache/memory
size clarifications. It also changes the warning message to use
warn(). With these changes, the warning message should now no
longer show up during a fresh compilation of gem5.
Change-Id: Ia63f841bdf045b76473437f41548fab27dc19631
This commit adds the --param option to the starter_se
configuration script for the Arm ISA. This is in order
to support attaching remote debugger sessions.
Change-Id: I2d8cc9f677f731948872003cca6066d1072ad570
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
A new host tag `gcn_gpu` has been added. This allows for selection of
those GPU tests which depend upon the gcn-gpu docker image to run.
In addition to this, the square GPU test has been moved to the CI
tests. This ensures some GPU code is compiled and run on every PR.
The GPU TLB maxOutstandingReqs field gets limited by the associativity.
In the current setup, this means that the max outstanding requests is
32 even though the setup is for 64 entries. Update the associativity
to cover all 64 entries.
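In config terms (class and parameter names are assumptions), the fix
amounts to making the TLB fully associative:
```py
from m5.objects import VegaGPUTLB

# maxOutstandingReqs is capped by the associativity: a 64-entry TLB
# with assoc=32 could only keep 32 requests in flight. Making it fully
# associative lifts the cap to 64. (Class/param names assumed.)
tlb = VegaGPUTLB(size=64, assoc=64)
```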
Change-Id: I2104e4647d97bf4d1cf5ac447e38ad6ac6a1a0d8
This is on by default in gem5 (see src/cpu/kvm/BaseKvmCPU.py); however,
the perf counters only measure host instruction counts, and GPUFS is
not concerned about the accuracy of KVM CPU stats. There is also a
larger set of users who have access to KVM but do not have the paranoid
level low enough to attach performance counters.
Therefore, make the performance counters OFF by default. They can still
be enabled, but this will allow for a larger set of users to follow the
upcoming GPUFS documentation without needing to read through a
troubleshooting section after seeing a gem5 error about the KVM paranoid
level.
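Users who still want the counters can opt back in; a sketch (assuming
`usePerf` is the parameter defined in src/cpu/kvm/BaseKvmCPU.py):
```py
from m5.objects import X86KvmCPU

# Perf counters are now off by default for GPUFS; opt back in per CPU.
cpu = X86KvmCPU(usePerf=True)
```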
Change-Id: I6b465559edf3ce17e7117ada049c60bd39aecd83
This change adds a new utility function for processing Spatter traces
into SpatterKernels under parse_kernels.
Additionally, it adds documentation for all the utility functions in
spatter_kernel.py.
Lastly, it adds an example script for running one spatter trace using
SpatterGenerator to the examples.
Rather than adding the options to *every* config that might be using
GPU_VIPER.py, just change the Ruby config to check if the option is
available before trying to use it. Otherwise, it reverts to what was
the default on stable.
Change-Id: Ia6f1d0827d489ee2a35c598b644461cbff59e247
This allows for multiple gem5 simulations to be spawned from a single
parent gem5 process, as defined in a single gem5 configuration. In this
design _all_ the `Simulator`s are defined in the simulation script and
then added to the multisim module. For example:
```py
from gem5.simulate.Simulator import Simulator
import gem5.utils.multisim as multisim
# Construct the board[0] and board[1] as you wish here...
simulator1 = Simulator(board=board[0], id="board-1")
simulator2 = Simulator(board=board[1], id="board-2")
multisim.add_simulator(simulator1)
multisim.add_simulator(simulator2)
```
This specifies that two simulations are to be run in parallel in
separate threads: one specified by `simulator1` and another by
`simulator2`. They are then added to MultiSim via the
`multisim.add_simulator` function. The user can specify an id via the
Simulator constructor. This is used to give each process a unique id
and output directory name. Given this, the id should be a helpful name
describing the simulation being specified. If not specified, one is
automatically generated.
To run these simulators we use `<gem5 binary> -m gem5.utils.multisim
<script> -p <num_processes>`. Note: multisim is an executable module in
gem5. This is the same module we import into our scripts to add the
simulators. This is an intentionally modular, encapsulated design. When
the module processes a script it will schedule multiple gem5 jobs and,
depending on the number of processes specified, will create child gem5
processes to process these jobs (jobs are just gem5 simulations in this
case). The `--processes` (`-p`) argument is optional; if not specified,
the max number of processes which can run concurrently will be the
number of available threads on the host system.
The id for each process is used to create a subdirectory of that name
inside the output directory (`m5out`). E.g., in the example above the
ids are `board-1` and `board-2`, so the m5out directory will look as
follows:
```sh
- m5out
- board-1
- stats.txt
- config.ini
- config.json
- terminal.out
- board-2
- stats.txt
- config.ini
- config.json
- terminal.out
```
Each simulation's output is encapsulated inside the subdirectory named
after its id.
If the multisim configuration script is passed directly to gem5 (like a
traditional gem5 configuration script, i.e., `<gem5 binary> <script>`),
the user may run a single simulation specified in that script by
passing its id as an argument. E.g., `<gem5 binary> <script> board-1`
will run the `board-1` simulation specified in `script`. If no argument
is passed, an Exception is raised asking the user to either specify an
id or use the MultiSim module if multiprocessing is needed.
If the user desires a list of the ids of the simulations specified in a
given MultiSim script, they can obtain it by passing the `--list`
(`-l`) parameter to the config script. I.e., `<gem5 binary> <script>
--list` will list the ids of all the simulations specified in `script`.
This change comes with two new example scripts, found in
`configs/example/gem5_library/multsim`, to demonstrate multisim in both
an SE and FS mode simulation. Tests have been added which run these
scripts as part of gem5's daily suite of tests.
Notes
=====
* **Bug fixed**: The `NoCache` classic cache hierarchy has been
modified so the Xbar is no longer set with a `__func__` call. This
interfered with multiprocessing as this structure is not serializable
via Pickle. This was poor design anyway and needed to be changed.
* **Change**: The `readfile_contents` parameter previously wrote its
value to a file called "readfile" in the output directory. This has
been changed to write to a file called "readfile_{hash}", with "{hash}"
being a hash of the `readfile_contents`. This ensures that, when
multisim is running, the file is not overwritten by other processes (a
sketch of the naming scheme follows these notes).
* **Removal note**: This implementation supersedes the functionality
outlined in `src/python/gem5/utils/multiprocessing`. As such, that code
has been removed.
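A sketch of the readfile naming scheme mentioned in the notes above
(the actual hash function gem5 uses is an assumption):
```py
import hashlib
import os

def readfile_path(outdir: str, readfile_contents: str) -> str:
    # Hash the contents so concurrent multisim processes never clobber
    # each other's readfile. (md5 here is an assumption.)
    digest = hashlib.md5(readfile_contents.encode()).hexdigest()
    return os.path.join(outdir, f"readfile_{digest}")
```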
Limitations/Things to Fix/Improve
=================================
* Though each Simulator process has its own output directory (a
subdirectory within m5out, with an id set by the user unique to that
Simulator), the stdout and stderr are still output to the terminal, not
the output directory. This results in: 1. stdout and stderr data being
lost and not recorded for these runs; 2. incredibly noisy terminal
output.
* Each process uses the same cached resources. While there are locks on
resources when downloading, each process will hash the resources it
requires to ensure they are valid. This is very inefficient in cases
where resources are common between processes (e.g., you may have 10
processes each using the same disk image, with each process hashing the
disk image independently to produce the same result when validating the
resources).
Change-Id: Ief5a3b765070c622d1f0de53ebd545c85a3f0eee
---------
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
The GPUFS scripts include support for dumping and resetting
stats at kernel boundaries by identifying specific GPU kernel
exit events. This commit extends that support to work with
GPU SE-mode support.
Change-Id: I662233ae71e2987d90af1fd0100e29036b2ef1c6
GPU_VIPER.py was modified to use these options but they did not exist,
breaking GPUFS. This commit adds them to fix the issue.
Change-Id: I0095f400ea606c4e8d91a41870ef208465cef803
Adds an RP_choose function to change the replacement policy among
TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance, and
ShiPMem for the TCC, TCP, and SQC GPU caches (sketched below).
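A hedged sketch of what such a helper might look like (the PR's actual
signature may differ; the classes are gem5's standard replacement
policies):
```py
from m5.objects import (
    FIFORP, LFURP, LIPRP, LRURP, MRURP, NRURP, RRIPRP,
    SecondChanceRP, SHiPMemRP, TreePLRURP,
)

_POLICIES = {
    "TreePLRU": TreePLRURP, "LRU": LRURP, "FIFO": FIFORP, "LFU": LFURP,
    "LIP": LIPRP, "MRU": MRURP, "NRU": NRURP, "RRIP": RRIPRP,
    "SecondChance": SecondChanceRP, "ShiPMem": SHiPMemRP,
}

def RP_choose(name):
    """Return a replacement-policy object for a TCC/TCP/SQC cache."""
    return _POLICIES[name]()
```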
Add a config capable of simulating the MI300X ISA (gfx942). This is
similar to the mi200.py config and uses the same scripts, followed by
some tunable parameters. This config optionally lets the user call the
runMI300GPUFS function with gem5 resources. This allows for something
like the following before a VIPER stdlib python is available:
```py
import mi300
from gem5.resources.resource import obtain_resource
disk = obtain_resource("x86-gpu-fs-img")
kernel = obtain_resource("x86-linux-kernel-5.4.0-105-generic")
app = obtain_resource("square-gpu-test")
mi300.runMI300GPUFS("X86KvmCPU", disk, kernel, app)
```
Tested cold boot config, checkpoint create and restore, and using gem5
resources.
Change-Id: I50a13d7a3d207786b779bf7fd47a5645256b1e6a
This is the version for MI300. For the most part, it is the same as
MI200, with the exception of architected flat scratch (not yet
implemented in gem5), and therefore a new version enum is required.
Change-Id: Id18cd7b57c4eebd467c010a3f61e3117beb8d58a
These changes to sweep and sweep_hybrid for NVM allow them to run. I'm
not an expert on this, so I'm not sure if these are technically
correct, but they no longer fail when running
`build/X86/gem5.opt configs/nvm/sweep.py` and `build/X86/gem5.opt
configs/nvm/sweep_hybrid.py`.
GitHub Issue: #669
The extended control registers were not being updated in the KVM thread
context nor updated in the KVM state. This was causing issues when
checkpointing, since the XCR0 value was reverting to the default value
rather than what it was before the checkpoint. This was causing
multiple applications to crash due to executing instructions which
became illegal due to XCR0 being incorrect.
This commit adds the XCR0 as a misc register similar to the existing
x86
control registers and adds all of the helper functions to access and set
the register value. It also adds support for updating the KVM CPU's
state with the register value and updating the thread context's misc reg
value so that it is checkpointed along with the other misc regs.
Note that this does *not* add support for XSAVE of the AVX state (i.e.,
the upper 128 bits of YMM registers). It does, however, fix the
immediate problem in issue #958.
Change-Id: I97456c8b57cbc7b381bd4be94944ce6567a43c76
Includes fixes for several bugs reported via email, self found, and
internal reports. Also includes runs through Valgrind and UBsan. See
individual commits for more details.
The scalar cache is not being invalidated, which causes stale data to
be left in the scalar cache between GPU kernels. This commit sends
invalidates to the scalar cache when the SQC is invalidated. This is a
sufficient baseline for simulation.
Since the number of invalidates might be larger than the mandatory queue
can hold and no flash invalidate mechanism exists in the VIPER protocol,
the command line option for the mandatory queue size is removed, which
is the same behavior as the SQC.
Change-Id: I1723f224711b04caa4c88beccfa8fb73ccf56572