Add a config capable of simulating MI300X ISA (gfx942). This is similar
to the mi200.py config and uses the same scripts followed by some
tuneable parameters. This config optionally lets the user call the
runMI300GPU function with gem5 resources. This allows for something like
the following before a VIPER stdlib python is available:
```
import mi300
from gem5.resources.resource import obtain_resource
disk = obtain_resource("x86-gpu-fs-img")
kernel = obtain_resource("x86-linux-kernel-5.4.0-105-generic")
app = obtain_resource("square-gpu-test")
mi300.runMI300GPUFS("X86KvmCPU", disk, kernel, app)
```
Tested cold boot config, checkpoint create and restore, and using gem5
resources.
Change-Id: I50a13d7a3d207786b779bf7fd47a5645256b1e6a
This is the version for MI300. For the most part, it is the same as
MI200 with the exception of architected flat scratch (not yet
implemented in gem5) and therefore a new version enum is required.
Change-Id: Id18cd7b57c4eebd467c010a3f61e3117beb8d58a
The extended control registers were not being updated in the KVM thread
context nor updated in the KVM state. This was causing issues when
checkpointing since the XCR0 value was reverting to the default value
rather than what it was previously before the checkpoint. THis was
causing multiple applications to crash due to executing instructions
which are now illegal instructions due to XCR0 being incorrect.
This commit adds the XCR0 as a misc register similar to the exiting x86
control registers and adds all of the helper functions to access and set
the register value. It also adds support for updating the KVM CPU's
state with the register value and updating the thread context's misc reg
value so that it is checkpointed along with the other misc regs.
Note that this does *not* add support for XSAVE of the AVX state (i.e.,
the upper 128 bits of YMM registers). It does however fix the immediate
problem in issue #958 .
Change-Id: I97456c8b57cbc7b381bd4be94944ce6567a43c76
The address has one too many zeros and is therefore placed in a memory
region usually used for system memory. As a result this causes failure
when trying to run a simulation with a huge amount of memory.
Change the address to be within the C000'0000h - FFFF'FFFFh X86 I/O hole
as was intended.
Change-Id: I5d03ac19ea3b2c01a8c431073c12fa1868b3df24
A user reported a bug with the SSE4.1 version of memcmp in libc. When
enabled the simulated program crashes with SIGILL. After attempting all
fixes recommended by Intel SDM and still not working, turning the bit
off instead.
Similar, the default XSAVE functionality is not completely implemented
for AVX and newer ISA extensions. Therefore, there is not much point to
claiming to support the more advanced versions of XSAVE (XSAVEOPT,
XSAVEC, XSAVES, and XGETBV with ECX=1).
Note that none of these bits are enabled for non-GPU full system
simulations (see src/arch/x86/X86ISA.py). This only impacts GPUFS
simulations.
Change-Id: I8eb7bf0f2a0a29226095e7889fec9c1e8a65f88f
This PR is offloading some of the partitioning logic to the partitioning
manager, effectively changing
the partitioning interface. Rather than always relying on the
PartitionFieldExtention data structure to
convey partition IDs, we make it implementation defined by introducing
the partitioning manager abstraction.
We want user to be able to extract the partitionId more flexibly and
this requires using a SimObject.
Users can extend the PartitioningManager, overriding the
readPacketPartitionId, therefore providing their
own mean of injecting/extracting partitioning data from a packet
This change sets the `release` of the ARM board at the config file
instead of overriding the release on the ArmBoard. This change partially
solves issue 932 as the system taking and restoring the checkpoint is
consistent across KVM and timing CPUs respectively.
Signed-off-by: Kaustav Goswami <kggoswami@ucdavis.edu>
Currently gem5 assumes that there is only one command processor (CP)
which contains the PM4 packet processor. Some GPU devices have multiple
CPs which the driver tests individually during POST if they are used or
not. Therefore, these additional CPs need to be supported.
This commit allows for multiple PM4 packet processors which represent
multiple CPs. Each of these processors will have its own independent
MMIO address range. To more easily support ranges, the MMIO addresses
now use AddrRange to index a PM4 packet processor instead of the
hard-coded constexpr MMIO start and size pairs.
By default only one PM4 packet processor is created, meaning the
functionality of the simulation is unchanged for devices currently
supported in gem5.
Change-Id: I977f4fd3a169ef4a78671a4fb58c8ea0e19bf52c
The top level AMDGPUDevice currently reads/writes all unknown registers
to/from a map containing the previously written value. This is intended
as a way to handle registers that are not part of the model but the
driver requires for functionality. Since this is at the top level, it
can mask changes to register values which do not go through the same
interface. For example, reading an MMIO, changing via PM4 queue, and
reading again returns the stale cached value.
This commit removes the usage of the regs map in AMDGPUDevice,
implements some important MMIOs that were previously handled by it, and
moves the unknown register handling to the NBIO aperture only. To reduce
the number of additional MMIOs to implement, the display manager in
vega10 is now disabled.
Change-Id: Iff0a599dd82d663c7e710b79c6ef6d0ad1fc44a2
gpu-compute: Add support for skipping GPU kernels
This commit adds two new command-line options:
--skip-until-gpu-kernel N
Skips (non-blit) GPU kernels until the target kernel is reached.
Execution continues normally from there. Blit kernels are not skipped
because they are responsible for copying the kernel code and metadata
for the non-blit kernels. Note that skipping kernels can impact
correctness; this feature is only useful if the kernel of interest has
no data-dependent behavior, or its data-dependent behavior is not based
on data generated by the skipped kernels.
--exit-after-gpu-kernel N
Ends the simulation after completing (non-blit) GPU kernel N.
This commit also renames two existing command-line options:
--debug-at-gpu-kernel -> --debug-at-gpu-task
--exit-at-gpu-kernel -> --exit-at-gpu-task
These were renamed because they count GPU tasks, which include both
kernels launched by the application as well as blit kernels.
Change-Id: If250b3fd2db05c1222e369e9e3f779c4422074bc
This change fixes two issues:
1) The --cacheline_size option was setting the system cache line size
but not the Ruby cache line size, and the mismatch was causing assertion
failures.
2) The submitDispatchPkt() function accesses the kernel object in
chunks, with the chunk size equal to the cache line size. For cache line
sizes >64B (e.g. 128B), the kernel object is not guaranteed to be
aligned to a cache line and it was possible for a chunk to be partially
contained in two separate device memories, causing the memory access to
fail.
Change-Id: I8e45146901943e9c2750d32162c0f35c851e09e1
Co-authored-by: Michael Boyer <Michael.Boyer@amd.com>
As X86 and RISCV are relying on a Table Walker cache, we
change their stdlib configs to use the newly defined
PrivateL1PrivateL2WalkCacheHierarchy
Change-Id: I63c3f70a9daa3b2c7a8306e51af8065bf1bea92b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
* Add Cache partitioning policies to manage and enforce cache partitioning:
* Add Way partition policy
* Add MaxCapacity partition policy
* Add PartitionFieldsExtension Extension class for Packets to store
Partition IDs for cache partitioning and monitoring
* Modify Cache Tags SimObjects to store partition policies
* Modify Cache Tags block eviction logic to use new partitioning policies
* Add example system and TrafficGen configurations for testing Cache
Partitioning Policies
Change-Id: Ic3fb0f35cf060783fbb9289380721a07e18fad49
Co-authored-by: Adrian Herrera <adrian.herrera@arm.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This change adds support to use KVM cores on the ARM board. The board
simulates gic to enable KVM, similar to the gem5 ARM FS configs. The
limitation is that it only supports VExpress_GEM5_V1.
Signed-off-by: Kaustav Goswami <kggoswami@ucdavis.edu>
Replace instances of "GCN3" with Vega. Remove gfx801 and gfx803. Rename
FIJI to Vega and Carrizo to Raven.
Using misc since there is not enough room to fit all the tags.
Change-Id: Ibafc939d49a69be9068107a906e878408c7a5891
This is matching what we are already doing in the starter_fs.py script
Change-Id: I50239050be9bd151a607ec892f8dd9322b24040b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The RFC is defaulted to a size of 0 which removes it completely. To use
the RFC set the --register-file-cache-size to a non-zero multiple of
two. In addition, rfc_pipe_length may be altered to increase or decrease
RFC latency benefit.
The RFC is defaulted to a size of 0 which removes it completely. To use
the RFC set the --register-file-cache-size to a non-zero multiple of
two. In addition, rfc_pipe_length may be altrered to increase or
decrease RFC latency benefit.
Change-Id: I6f5bf5b750eb64155fbc8c8343e9feadce5c9f79
1. Fix the wrong ISA detect of get_isa function
2. Fix the typo ObjectLIst.cpu_list
3. Fix missing PageTableWalkerCache
4. Fix the invalid default cpu_type paramter
Change-Id: I217ea8da8a6d8e712743a5b32c4c0669216ce6c4
`get_runtime_isa()` has been deprecated for some time. It is a leftover
piece of code from when gem5 was compiled to a single ISA and that ISA
used to configure the simulated system to use that ISA. Since multi-ISA
compilations are possible, `get_runtime_isa()` should not be used.
Unless the gem5 binary is compiled to a single ISA, a failure will
occur.
The new proceedure for specify which ISA to use is by the setting of the
correct `BaseCPU` implementation. E.g., `X86SimpleTimingCPU` of
`ArmO3CPU`.
This patch removes the remaining `get_runtime_isa()` instances and
removes the function itself. The `SimpleCore` class has been updated to
allow for it's CPU factory to return a class, needed by scripts in
"configs/common".
The deprecated functionality in the standard library, which allowed for
the specifying of an ISA when setting up a processor and/or core has
also been removed. Setting an ISA is now manditory.
Fixes#216.
- updated the x86-npb-benchmarks.py to use npb workloads and suites.
The suites and workloads are not in the database are also waiting
feedback. I am attaching the JSON file here.
[npb_workloads_suite.json](https://github.com/gem5/gem5/files/13431116/npb_workloads_suite.json)
To run the x86-npb-benchmarks.py script use the
GEM5_RESOURCE_JSON_APPEND env variable. The full command is:
```
GEM5_RESOURCE_JSON_APPEND=[path to npb_workloads_suite.json] ./build/X86/gem5.opt configs/example/gem5_library/x86-npb-benchmarks.py --benchmark [benchmark]
```
Change-Id: I248e6452ea4122e9260e34e4368847660edae577
Recent breaking changes in the DRAMSys API require user code to be
updated. These updates have been applied to the gem5 integration.
Furthermore, as DRAMSys started to use CMake dependency management,
it is no longer sensible to maintain two separate build systems for
DRAMSys. The use of the DRAMSys integration in gem5 will therefore
from now on require that CMake is installed on the target machine.
Additionally, support for snapshots have been implemented into DRAMSys
and coupled with gem5's checkpointing API.
This change updates the gem5 SST bridge to call m5.instantiate()
in the gem5 config script instead of in the SST component. This
allows more flexibility for the gem5-SST setup, as we can now write
traffic generators using the bridge.
Change-Id: I510a8c15f8fb00bdbdd60dafa2d9f5ad011e48f2
Signed-off-by: Kaustav Goswami <kggoswami@ucdavis.edu>
Add a --no-kvm-perf option to disable KVM perf counters for GPUFS
scripts. This is useful for users who have KVM enabled but configured
with more restrictive settings, which seems to be the default in newer
Linux distros.
Change-Id: I7508113d0f7c74deb21ea7b2770522885a0ec822
This change updates the gem5 SST Bridge to use SST 13.0.0. Changes are
made to replace SimpleMem class to StandardMem class as SimpleMem will
be deprecated in SST 14 and above. In addition, the translator.hh is
updated to translate more types of gem5 packets. A new parameter `ports`
was added on SST's side when invoking the gem5 component which does not
require recompiling the gem5 component whenever a new outgoing bridge is
added in a gem5 config.
This change updates the gem5 SST Bridge to use SST 13.0.0. Changes
are made to replace SimpleMem class to StandardMem class as
SimpleMem will be deprecated in SST 14 and above. In addition, the
translator.hh is updated to translate more types of gem5 packets.
A new parameter `ports` was added on SST's side when invoking the
gem5 component which does not require recompiling the gem5
component whenever a new outgoing bridge is added in a gem5 config.
Change-Id: I45f0013bc35d088df0aa5a71951422cabab4d7f7
Signed-off-by: Kaustav Goswami <kggoswami@ucdavis.edu>
Earlier, GPU checkpointing was working only if a checkpoint was created
before the first kernel execution. This pull request adds support to
checkpoint in-between any two kernel calls. It does so by doing the
following.
- Adds flush support in the GPU_VIPER protocol
- Adds flush support in the GPUCoalescer
- Updates cache recorder to use the GPUCoalescer during simulation
cooldown and cache warmup times.
This `mkdir` is problematic as it doesn't create the directory
recursively. This casues errors if `dir` is `X/Y/Z` and both `Y` and `Z`
has not been created. An error will be returned (`No such file or
directory`).
This issue was fixed with: https://github.com/gem5/gem5/pull/263. The
checkpointing code already recursively creates directories as needed.
Ergo was can remove this `mkdir` statement.
This `mkdir` is problematic as it doesn't create the directory
recursively. This casues errors if `dir` is `X/Y/Z` and both `Y` and `Z`
has not been created. An error will be returned (`No such file or
directory`).
This issue was fixed with: https://github.com/gem5/gem5/pull/263. The
checkpointing code already recursively creates directories as needed.
Ergo was can remove this `mkdir` statement.
Change-Id: Ibae38267c8ee1eba76d7834367aa1c54013365bc
The new script will automatically use the newly
defined O3_ARM_v7a_3_Etrace CPU to run a simple SE simulation while
generating elastic trace files.
The script is based on starter_se.py, but contains the following
limitations:
1) No L2 cache as it might affect computational delay calculations
2) Supporting SimpleMemory only with minimal memory latency
There restrictions were imported by the existing elastic trace
generation logic in the common library (collected by grepping
elastic_trace_en) [1][2][3]
Example usage:
build/ARM/gem5.opt configs/example/arm/etrace_se.py \
--inst-trace-file [INSTRUCTION TRACE] \
--data-trace-file [DATA TRACE] \
[WORKLOAD]
[1]: https://github.com/gem5/gem5/blob/stable/\
configs/common/MemConfig.py#L191
[2]: https://github.com/gem5/gem5/blob/stable/\
configs/common/MemConfig.py#L232
[3]: https://github.com/gem5/gem5/blob/stable/\
configs/common/CacheConfig.py#L130
Change-Id: I021fc84fa101113c5c2f0737d50a930bb4750f76
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
We define a new parent (ClusterSystem) to model a system
with one or more cpu clusters within it.
The idea is to make this new base class reusable by SE
systems/scripts as well (like starter_se.py)
Change-Id: I1398d773813db565f6ad5ce62cb4c022cb12a55a
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Added a new feature to CHI protocol (in collaboration with @tiagormk).
Here is the Jira Ticket
[https://gem5.atlassian.net/browse/GEM5-1326](https://gem5.atlassian.net/browse/GEM5-1326
). As described in CHI specs, far atomic transactions enable remote
execution of Atomic Memory Operations. This pull request incorporates
several changes:
* Fix Arm ISA definition of Swap instructions. These instructions should
return an operand, so their ISA definition should be Return Operation.
* Enable AMOs in Ruby Mem Test to verify that AMOs work
* Enable near and far AMO in the Cache Controler of CHI
Three configuration parameters have been used to tune this behavior:
* policy_type: sets the atomic policy to one of the described in [our
paper](https://dl.acm.org/doi/10.1145/3579371.3589065)
* atomic_op_latency: simulates the AMO ALU operation latency
* comp_anr: configures the Atomic No return transaction to split
CompDBIDResp into two different messages DBIDResp and Comp