As pointed out here [1], the expected M5OP_ADDR for arm64 arch
is 0x10010000. This change reflects that.
[1] https://github.com/gem5/gem5/pull/725
Change-Id: I7e72f5ea20d4aacf3115a485ba7cd664d33d037e
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Remove the following files:
* src/dev/virtio/rng 2.cc
* src/dev/virtio/rng 2.hh
Which were a copy of rng.hh and rng.cc. Probably added to the repository
by accident. They were not compiled by scons
Change-Id: I9d1da19cc243c513ab7af887b1b6260d8e361b57
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Now that we are able to provide a view of the cache hierarchy from
the python world, we can start generating DTB entries for caches
and more specifically to properly fill the next-level-cache and
cache-level properties
Change-Id: Iba9ea08fe605f77a353c9e64d62b04b80478b4e2
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
One of things we miss in gem5 is the capability to neatly compose the
cache hierarchy of CPUs and clusters of CPUs. The BaseCPU
addPrivateSplitL1Caches and addTwoLevelCacheHierarchy APIs have
historically been used to bind cache levels together.
These APIs have been superseeded by the introduction of the Cache
hierarchy abstraction in the standard library. The standard library
makes it cleaner for a user to quickly instantiate a hierarchy of caches
with few lines of code. While this removes a lot of complexity for a
user, the Hierarchy objects still have little information about their
internal topology.
To address this problem, this patch adds a tree data structure to the
AbstractCacheHierarchy class, where every node of the tree represent
a cache in the hierarchy. In this way we will expose APIs for traversing
and querying the tree.
For example a 2 CPUs system with private L1, private L2 and shared L3
will contain the following tree:
[root]
|
[L3]
/\
/ \
[L2] [L2]
| |
[L1] [L1]
Change-Id: I78fe6ad094f0938ff9bed191fb10b9e841418692
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This is a first step towards offloading some of the partitioning
logic to the partitioning manager. We start with this patch
by replacing the static readPacketPartitionId into a virtual
method owned by the manager.
The issue with readPacketPartitionId as of now is that it relies
on the fixed PartitionFieldExtention.
We want user to be able to extract the partitionId more flexibly
and this requires using a SimObject
Change-Id: I3bd2e81e2a97c55fc83548956fc59f422c8049a6
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit updates cpu by removing VectorXXX types and updates
FUs according to the newer SimdXXX ones. This is part of the
homogenization of RISCV Vector instruction types, which moved
from VectorXXX to SimdXXX.
Change-Id: I84baccd099b73a11cf26dd714487a9f272671d3d
This commit adds some more detailed instruction types for RISC-V
Vector. Concretely, it substitutes VectorIntegerArith,
VectorFloatArith, VectorIntegerReduce and VectorFloatReduce with
more specific types related to the operation that each instruction
performs, being consistent with SimdXXX ones.
Change-Id: Iaffa74871ccc56d8c3627e1f1e111b9bc9e864af
This commit fixes two RISC-V instruction types (VectorXXX) that
were used in ARM SVE to the proper SimdXXX ones.
Change-Id: Id632926a89ae2395234f3cf34adeab63844bdd57
This PR fixes#948 in which running KVM CPUs through the updated gem5
interface in SE mode causes an immediate crash.
To fix this, I added a check to set_se_binary_workload that checks if
any of the cores are KVM, and if so, sets a couple of knobs for the
board and process that are required to make KVM work. The depecated
se.py script, which sets these knobs, is able to run KVM in SE mode just
fine, so doing the same here fixed the bug.
When running `scons build/ALL/gem5.opt --with-ubsan`, with GCC, the
following error was returned:
```
[ CXX] src/base/loader/image_file_data.cc -> ALL/base/loader/image_file_data.o
In file included from /usr/include/string.h:535,
from /usr/include/c++/11/cstring:42,
from src/base/cprintf_formats.hh:33,
from src/base/cprintf.hh:38,
from src/base/logging.hh:49,
from src/base/loader/image_file_data.cc:40:
In function ‘char* strcpy(char*, const char*)’,
inlined from ‘int gem5::loader::doGzipLoad(int)’ at src/base/loader/image_file_data.cc:70:11,
inlined from ‘gem5::loader::ImageFileData::ImageFileData(const string&)’ atsrc/base/loader/image_file_data.cc:116:24:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:79:33: error: ‘void* __builtin_memcpy(void*, const void*, long unsigned int)’ offset [0, 19] is out of the bounds [0, 0] [-Werror=array-bounds]
79 | return __builtin___strcpy_chk (__dest, __src, __glibc_objsize (__dest));
| ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
scons: *** [build/ALL/base/loader/image_file_data.o] Error 1
scons: building terminated because of errors.
```
I do not know the exact issue but using strcpy in this way (i.e.
`strcpy(char_pointer + offset, string)`) appears to trigger this error
with the undefined behavior sanitizer. The fix in this patch replaces
this with `strcat`.
Change-Id: I1a0c50c9022adc841e175aad0fe2247bfcb29d71
This commit adds support for vector unit-stride segment store operations
for RISC-V (vssegXeXX). This implementation is based in two types of
microops:
- VsSegIntrlv microops that properly interleave source registers into
structs.
- VsSeg microops that store data in memory as contiguous structs of
several fields.
Change-Id: Id80dd4e781743a60eb76c18b6a28061f8e9f723d
Gem5 issue: https://github.com/gem5/gem5/issues/382
Implement several features new in ROCm 6.0 and features required for
future devices. Includes the following:
- Support for multiple command processors
- Improve handling of unknown register addresses
- Use AddrRange for MMIO address regions
- Handle GART writes through SDMA copy
- Implement PCIe indirect reads and writes
- Improve PM4 write to check dword count
- Implement common MI300X instruction
This update modifies the test configuration to specify the versions of
resources used, rather than automatically using the latest versions.
Previously, if a resource was updated for a change, it could potentially
cause tests to fail if those tests were incompatible with the new
version of the resource.
Now, with this change, tests are tied to specific versions of resources,
ensuring that any updates to resources will require corresponding
updates to the tests to maintain compatibility.
Change-Id: I9633b1749f6c6c82af6aa6697b7e7656020f62fa
Currently gem5 assumes that there is only one command processor (CP)
which contains the PM4 packet processor. Some GPU devices have multiple
CPs which the driver tests individually during POST if they are used or
not. Therefore, these additional CPs need to be supported.
This commit allows for multiple PM4 packet processors which represent
multiple CPs. Each of these processors will have its own independent
MMIO address range. To more easily support ranges, the MMIO addresses
now use AddrRange to index a PM4 packet processor instead of the
hard-coded constexpr MMIO start and size pairs.
By default only one PM4 packet processor is created, meaning the
functionality of the simulation is unchanged for devices currently
supported in gem5.
Change-Id: I977f4fd3a169ef4a78671a4fb58c8ea0e19bf52c
PCIe can read/write to any 32-bit address using the PCI index/index2
registers as an address and then reading/writing the corresponding
data/data2 register.
This commit adds this functionality and removes one magic value being
written to support GPU POST. This feature is disabled for Vega10 which
relies on an MMIO trace for too many values to implement in the MMIO
interface.
Change-Id: Iacfdd1294a7652fc3e60304b57df536d318c847b
The SRBM write packets where previously not required. This commit
implements SRBM writes to set a register by using the new setRegVal
interface. SRBM writes seem to be used for SRIOV enabled devices.
Change-Id: I202653d339e882e8de59d69a995f65332b2dfb8c
The top level AMDGPUDevice currently reads/writes all unknown registers
to/from a map containing the previously written value. This is intended
as a way to handle registers that are not part of the model but the
driver requires for functionality. Since this is at the top level, it
can mask changes to register values which do not go through the same
interface. For example, reading an MMIO, changing via PM4 queue, and
reading again returns the stale cached value.
This commit removes the usage of the regs map in AMDGPUDevice,
implements some important MMIOs that were previously handled by it, and
moves the unknown register handling to the NBIO aperture only. To reduce
the number of additional MMIOs to implement, the display manager in
vega10 is now disabled.
Change-Id: Iff0a599dd82d663c7e710b79c6ef6d0ad1fc44a2
The SDMA engine can potentially be used to write to the GART address
range. Since gem5 has a shadow copy of the GART table to avoid sending
functional reads to device memory, the GART table must be updated when
copying to the GART range.
This changeset adds a check in the VM for GART range and implements the
SDMA copy packet writing to the GART range. A fatal is added to write
and ptePde, which are the only other two ways to write to memory, as
using these packets to update the GART table has not been observed.
Change-Id: I1e62dfd9179cc9e987659e68414209fd77bba2bd
The write data packet can write multiple dwords but currently always
assumes there is one dword, which can cause some write data to be
missed. This case is not common, but the number of dwords is implicitly
defined in the PM4 header.
This changeset passes the PM4 header to write data so that the correct
number of dwords can be determined. For now we assume no page crossing
when writing multiple dwords as the driver should be checking for that.
Change-Id: I0e8c3cbc28873779f468c2a11fdcf177210a22b7
The ROCm 6.0 driver adds a node_id field to interrupts which must match
before passing on the interrupt to be cleared by the cookie from gem5's
interrupt handler implementation. Add this field and enable for gfx942.
The usage of the field can be seen in event_interrupt_isr_v9_4_3 at
https://github.com/ROCm/ROCK-Kernel-Driver/blob/roc-6.0.x/drivers/
gpu/drm/amd/amdkfd/kfd_int_process_v9.c#L449
Change-Id: Iae8b8f0386a5ad2852b4a3c69f2c161d965c4922
The main decoder for GPU instructions looks at the first 9 bits of a
dword to determine either the instruction or a subDecode table with more
information for specific instructions types. For flat instructions the
first 9 bits currently consist of 6 fixed encoding bits, a reserved bit,
and the first two bits of the opcode. Hence to support all opcodes there
are four indirections to the flat subDecode table. In MI300 the reserved
bit is part of a field to determine memory scope and therefore may be
non-zero.
This commit adds four addition calls to the subDecode table for the
cases where the scope bit is 1. See page 468 (PDF page 478) below:
https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/
instruction-set-architectures/
amd-instinct-mi300-cdna3-instruction-set-architecture.pdf
Change-Id: Ic3c786f0ca00a758cbe87f42c5e3470576f73a32
gpu-compute: Add support for skipping GPU kernels
This commit adds two new command-line options:
--skip-until-gpu-kernel N
Skips (non-blit) GPU kernels until the target kernel is reached.
Execution continues normally from there. Blit kernels are not skipped
because they are responsible for copying the kernel code and metadata
for the non-blit kernels. Note that skipping kernels can impact
correctness; this feature is only useful if the kernel of interest has
no data-dependent behavior, or its data-dependent behavior is not based
on data generated by the skipped kernels.
--exit-after-gpu-kernel N
Ends the simulation after completing (non-blit) GPU kernel N.
This commit also renames two existing command-line options:
--debug-at-gpu-kernel -> --debug-at-gpu-task
--exit-at-gpu-kernel -> --exit-at-gpu-task
These were renamed because they count GPU tasks, which include both
kernels launched by the application as well as blit kernels.
Change-Id: If250b3fd2db05c1222e369e9e3f779c4422074bc
MI200 adds support for four FP32 packed math instructions. These are
VOP3P instructions which have a negative input modifier field. The
description made it unclear if these were used for F32 packed math
however the assembly of some Tensile kernels are using these modifiers
therefore adding support for them. Tested with PyTorch nn.Dropout kernel
which is using negative modifiers.
Change-Id: I568a18c084f93dd2a88439d8f451cf28a51dfe79
The datatype is U32 but should be F32. This is causing an implicit cast
leading to incorrect results. This fixes nn.Dropout in PyTorch.
Change-Id: I546aa917fde1fd6bc832d9d0fa9ffe66505e87dd
This change fixes two issues:
1) The --cacheline_size option was setting the system cache line size
but not the Ruby cache line size, and the mismatch was causing assertion
failures.
2) The submitDispatchPkt() function accesses the kernel object in
chunks, with the chunk size equal to the cache line size. For cache line
sizes >64B (e.g. 128B), the kernel object is not guaranteed to be
aligned to a cache line and it was possible for a chunk to be partially
contained in two separate device memories, causing the memory access to
fail.
Change-Id: I8e45146901943e9c2750d32162c0f35c851e09e1
Co-authored-by: Michael Boyer <Michael.Boyer@amd.com>
From [1] The PrivateL1PrivateL2Cache hierarchy has been amended with an
MMUCache, which is basically a small cache in front of the page table
walker.
Not every ISA makes use of it.
Arm for example already implements caching of page table walks, via the
partial_levels parameter in the ArmTLB.
With this patch we define a new module which explicitly makes use of the
WalkCache. Configurations that do not require
another cache in the first level of the memsys (for the ptw) can use the
PrivateL1PrivateL2CacheHierarchy
[1]: https://gem5-review.googlesource.com/c/public/gem5/+/49364
In the RISC-V unprivileged spec[1], the misaligned load/store support is
depend on the EEI.
In the RISC-V privileged spec ver1.12[2], the PMA specify wether the
misaligned access is support for each data width and the memory region.
In the [3] of `mcause` spec, we cloud directly raise misalign exception
if there is no memory region misalignment support. If the part of memory
region support misaligned-access, we need to translate the `vaddr` to
`paddr` first then check the `paddr` later. The page-fault or
access-fault is rose before misalign-fault.
The benefit of moving check_alignment option from ISA option to PMA
option is we can specify the part region of memory support misalign
load/store.
MMU will check alignment with virtual addresss if there is no misaligned
memory region specified. If there are some misaligned memory region
supported, translate address first and check alignment at final.
[1]
https://github.com/riscv/riscv-isa-manual/blob/main/src/rv32.adoc#base-instruction-formats
[2]
https://github.com/riscv/riscv-isa-manual/blob/main/src/machine.adoc#physical-memory-attributes
[3]
https://github.com/riscv/riscv-isa-manual/blob/main/src/machine.adoc#machine-cause-register-mcause
As X86 and RISCV are relying on a Table Walker cache, we
change their stdlib configs to use the newly defined
PrivateL1PrivateL2WalkCacheHierarchy
Change-Id: I63c3f70a9daa3b2c7a8306e51af8065bf1bea92b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
From [1] The PrivateL1PrivateL2Cache hierarchy has been amended
with an MMUCache, which is basically a small cache in front
of the page table walker. Not every ISA makes use of it.
Arm for example already implements caching of page table
walks, via the partial_levels parameter in the ArmTLB.
With this patch we define a new module which explicitly makes
use of the WalkCache. Configurations that do not require
another cache in the first level of the memsys (for the ptw)
can use the PrivateL1PrivateL2CacheHierarchy
[1]: https://gem5-review.googlesource.com/c/public/gem5/+/49364
Change-Id: I17f7e68940ee947ca5b30e6ab3a01dafeed0f338
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit adds two additional whitespaces in the definition of
GuestAddr as well as in the operator << overload.
Change-Id: Ifb371a09b378fcf4862a768f113b5963b45bd167
There were two errors causing the Weekly tests to fail. Each has a patch
in this PR:
1. Fixed incorrect version for the `artifact-downloader` (v4 instead of
v3).
2. Fixed incorrect use of `working-directory` which use of
`build/VEGA_X86/gem5.opt` to fail (not accessible in set
`working-directory`. The default `${github.workspace}` is sufficient.
Only the lower 32 bit of return values of pseudo instructions are
stored (in a0). Therefore, the upper 32 bit are stored in a1 to
enable a correct return value.
Change-Id: Idf33c325033281fc191a9285eb5d34fd4965cde9