Release of MI300X simulation capability:
- Implements the required MI300X features over MI200 (currently only
architecture flat scratch).
- Make the gpu-compute model use MI200 features when MI300X / gfx942 is
configured.
- Fix up the scratch_ instructions which are seem to be preferred in
debug hipcc builds over buffer_.
- Add mi300.py config similar to mi200.py. This config can optionally
use resources instead of command line args.
It appears we have been trying to read 64-bit arguments for ARM32 since
695583709b. I noticed that SYS_OPEN was
trying to read a really long string as the pathname argument and it
turned out it was reading from the wrong stack offset. With this change
I can successfully run some of the semihosting tests for ARM32.
Change-Id: Ie154052dac4211993fb6c4c99d93990123c2eacf
In BaseSemihosting::readString() we were using the len argument to
allocate a std::vector without checking whether the value makes any
sense. This resulted in a std::bad_alloc exception being raised prior to
https://github.com/gem5/gem5/pull/1142 for my semihosting tests. This
commit prevents semihosting from reading more than 64K for string
arguments which should be more than sufficient for any valid code.
Change-Id: I059669016ee2c5721fedb914595d0494f6cfd4cd
This commit fixes the implementation of vrgather instruction based on
rvv 1.0.
In section 16.4. Vector Register Gather Instructions,
> Vector-scalar and vector-immediate forms of the register gather are
also provided. These read one element from the source vector at the
given index, and write this value to the active elements of the
destination vector register. The index value in the scalar register and
the immediate, zero-extended to XLEN bits, are treated as unsigned
integers. If XLEN > SEW, the index value is not truncated to SEW bits.
The fix zero-extends the index value in the scalar register and the
immediate.
Architected flat scratch is added in MI300 which store the scratch base
address in dedicated registers rather than in SGPRs. These registers are
used by scratch_ instructions. These are flat instruction which
explicitly target the private memory aperture. These instructions have a
different address calculation than global_ instructions.
This change implements architected flat scratch support, fixes the
address calculation of scratch_ instructions, and implements decodings
for some scratch_ instructions. Previous flat_ instructions which happen
to access the private memory aperture have no change in address
calculation. Since scratch_ instructions are identical to flat_
instruction except for address calculation, the decodings simply reuse
existing flat_ instruction definitions.
Change-Id: I1e1d15a2fbcc7a4a678157c35608f4f22b359e21
Add support for the following two extensions:
[Zvfh](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#185-zvfh-vector-extension-for-half-precision-floating-point):
Vector Extension for Half-Precision Floating-Point
[Zvfhmin](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#184-zvfhmin-vector-extension-for-minimal-half-precision-floating-point):
Vector Extension for Minimal Half-Precision Floating-Point
For instructions (`vfncvt[.rtz].x[u].f.w`) and (`vfwcvt.f.x[u].v`) which
will become defined when `SEW = 8`, a new template
`VectorFloatWideningAndNarrowingCvtDecodeBlock` is added and 8-bit
floating point type (`float8_t`) is defined.
The data type `float8_t` is introduced in the newer `3e` version of the
SoftFloat Package, however, the current version in use is `3d` which
does not include this definition. Despite this, `float8_t` is utilized
solely for constructing the `vfncvt[.rtz].x[u].f.w` and
`vfwcvt.f.x[u].v` instructions when `SEW = 8`. There are no operations
that directly manipulate data of the `float8_t` type.
This PR implements FEAT_MPAM on the CPU side. We define a MPAM system
registers and a mechanism
for tagging memory requests with the MPAM information bundle as
specified in existing documentation [1].
What this PR is *not* covering is the MPAM implementation in a MSC
(Memory System Component).
Which means at the moment it's only possible to have static partitioning
schemes (via the PartitioningPolicies
already part of gem5) and there is currently no way to dynamically
program partitions at runtime.
[1]: https://developer.arm.com/documentation/ddi0487/latest/
Quote from change[1]
> The RISC-V spec clarifies the CSR instruction operation, some of them
shall not read or write CSR by the hints of RD/RS1/uimm, but the
original version use the 'data != oldData' condition to determine
whether write or not, and always read CSR first.
See CSR instruction in spec:
Section 9.1 Page 56 of
https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf
|||Register operand|||
|--- |--- |--- |--- |--- |
|Instruction|rd is x0|rs1 is x0|Reads CSR|Writes CSR|
|CSRRW|Yes|-|No|Yes|
|CSRRW|No|-|Yes|Yes|
|CSRRS/CSRRC|-|Yes|Yes|No|
|CSRRS/CSRRC|-|No|Yes|Yes|
|||Immediate operand|||
|Instruction|rd is x0|uimm = 0|Reads CSR|Writes
CSR|
|CSRRWI|Yes|-|No|Yes|
|CSRRWI|No|-|Yes|Yes|
|CSRRSI/CSRRCI|-|Yes|Yes|No|
|CSRRSI/CSRRCI|-|No|Yes|Yes|
The issue cause the ubuntu hanging because we shared the same status CSR
with `mstatus`, `sstatus` and `ustatus` and interrupt enabling CSR with
mip, sip and uip. We may need to read origin CSR without effect of
unmask bits to avoid override the bits of other CSR. Now the ubuntu can
work after the patch merged.
[1] https://gem5-review.googlesource.com/c/public/gem5/+/67717
The destination select should take a value of the selection size (dword,
word, or byte) starting at bit 0, move that to the selected destination,
and then apply the unused constraint (DST_U) to the remaining word or
bytes. Currently the code is selecting the word/byte currently being
iterated over, rather than the least significant word/byte. As a result,
any selection that is not word 0 or byte 0 will be replaced with the
original destination value at those bits. This results in the wrong
value.
This commit changes the orig bits to be the original dest value at the
lowest word / byte location. Tested with the mfma_i32_16x16x16i8 example
which uses an SDWA V_OR_B32 to pack i8 values into VGPRs for the MFMA.
Change-Id: I54ed819479a25fa9276d29a8f14f0fea7fd71afe
The extended control registers were not being updated in the KVM thread
context nor updated in the KVM state. This was causing issues when
checkpointing since the XCR0 value was reverting to the default value
rather than what it was previously before the checkpoint. THis was
causing multiple applications to crash due to executing instructions
which are now illegal instructions due to XCR0 being incorrect.
This commit adds the XCR0 as a misc register similar to the exiting x86
control registers and adds all of the helper functions to access and set
the register value. It also adds support for updating the KVM CPU's
state with the register value and updating the thread context's misc reg
value so that it is checkpointed along with the other misc regs.
Note that this does *not* add support for XSAVE of the AVX state (i.e.,
the upper 128 bits of YMM registers). It does however fix the immediate
problem in issue #958 .
Change-Id: I97456c8b57cbc7b381bd4be94944ce6567a43c76
Includes fixes for several bugs reported via email, self found, and
internal reports. Also includes runs through Valgrind and UBsan. See
individual commits for more details.
The implementation of SYS_FLEN was missing, which caused picolibc to
treat this file as not implemented. Additionally, there was a bug in the
SYS_READ call that was comparing the wrong variable against the passed
buffer length. It was comparing the current file position against the
buffer length instead of the number of written bytes. Finally, pos was
unititialized which could result in spurious errors.
Change-Id: I8b487a79df5970a5001d3fef08d5579bb4aa0dd0
The VOP3 instruction encoding generally states that ABS/NEG modifiers in
the instruction encoding are only valid on floating point data types.
This is currently coded in gem5 to mean floating point *instructions*.
For untyped instructions like V_CNDMASK_B32, we don't actually know what
the data type is. We must trust that the compiler did not attempt to
apply these bits to non-FP data types.
This commit simply removes the asserts. The ABS/NEG modifiers are
therefore ignored which is consistent with the ISA documentation.
This is done on the lane manipulation instructions V_CNDMASK_B32,
V_READLINE_B32, and V_WRITELANE_B32 which are typically used to mask off
or move data between registers. Other bitwise instructions (e.g.,
V_OR_B32) keep the asserts as bitwise operations on FP types are
genernally illegal in languages like C++.
Change-Id: I478c5272ba96383a063b2828de21d60948b25c8f
Three main fixes:
- Remove the initDynOperandInfo. UBSAN errors and exits due to things
not being captured properly. After a few failed attempts playing with
the capture list, just move the lambda to a new method.
- Invalid data type size for some thread mask instructions. This might
actually have caused silent bugs when the thread id was > 31.
- Alignment issues with the operands.
Change-Id: I0297e10df0f0ab9730b6f1bd132602cd36b5e7ac
From sepc
> Instructions that access a non-existent CSR are reserved. Attempts to
access a CSR without appropriate privilege level raise
illegal-instruction exceptions or, as described in Section 13.6.1,
virtual-instruction exceptions. Attempts to write a read-only register
raise illegal-instruction exceptions. A read/write register might also
contain some bits that are read-only, in which case writes to the
read-only bits are ignored.
Setting the bit not in the mask should be ignore rather than raise the
illegal exception. The unmask bits of xstatus CSR are `WPRI`, the
unmasks bits of xie are `RO`(above priv v1.12) or `WPRI`(priv v1.11 and
priv v1.10), the unmask bits of xip CSR are `RO`(above priv v1.12) or
`WPRI`(priv v1.11) or `WIRI` (priv v1.10).
Note: The workload of `riscv-ubuntu-20.04-boot` uses the priv v1.10.
More details please see the `RISC-V spec: Privileged Architecture`
v1.10:
https://github.com/riscv/riscv-isa-manual/releases/tag/riscv-priv-1.10
v1.11(20190608):
https://github.com/riscv/riscv-isa-manual/releases/tag/Ratified-IMFDQC-and-Priv-v1.11
v1.12(20211213):
https://github.com/riscv/riscv-isa-manual/releases/tag/Priv-v1.12
Change-Id: I5d6e964e99b30b71da3dc267cd1575665d922633
Currently, the filesRootDir is prepended for all paths that do not start
with '/'. However, we should not be doing this for the special files :tt
and :semihosting-features. Noticed this while testing semihosting with a
non-empty filesRootDir.
Change-Id: I156c8b680cb71cdc88788be3b0e93fc1d52e11e5
The implementation of SYS_FLEN was missing, which caused picolibc to
treat this file as not implemented. Additionally, there was a bug in
the SYS_READ call that was comparing the wrong variable against the
passed buffer length. It was comparing the current file position against
the buffer length instead of the number of written bytes.
Finally, pos was unititialized which could result in spurious errors.
Change-Id: I8b487a79df5970a5001d3fef08d5579bb4aa0dd0
This PR added RISC-V Integer Conditional Operations Extension, which is
in the RVA23U64 Profile Mandatory Base. And the performance of
conditional move instructions in micro-architecture is an interesting
point to explore.
Zicond instructions added: czero.eqz, czero.nez
Changes based on spec:
https://github.com/riscvarchive/riscv-zicond/releases/download/v1.0.1/riscv-zicond_1.0.1.pdf
With this patch we are adding the specific logic required to tag memory
requests from the PE with MPAM data. This is happening within the MMU
before a translation is completed
Change-Id: Ifecb285244dd469a639150d69a7e884fe3c441be
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
This is for fast retrieval of the highest implemented
exception level
Change-Id: Id631c2b999d46a8b79570e4043ae04bc2b2e7531
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
The extended control registers were not being updated in the KVM thread
context nor updated in the KVM state. This was causing issues when
checkpointing since the XCR0 value was reverting to the default value
rather than what it was previously before the checkpoint. THis was
causing multiple applications to crash due to executing instructions
which are now illegal instructions due to XCR0 being incorrect.
This commit adds the XCR0 as a misc register similar to the exiting x86
control registers and adds all of the helper functions to access and set
the register value. It also adds support for updating the KVM CPU's
state with the register value and updating the thread context's misc reg
value so that it is checkpointed along with the other misc regs.
Note that this does *not* add support for XSAVE of the AVX state (i.e.,
the upper 128 bits of YMM registers). It does however fix the immediate
problem in issue #958 .
A checkpoint upgrader is also provided to add the default value of XCR0
if the checkpoint tag is missing.
Change-Id: I97456c8b57cbc7b381bd4be94944ce6567a43c76
Use it accordingly in the faulting/exception logic
Change-Id: I2f6360d04698b6fb7188e776f1d6966e99ce19b1
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
This is adding two templated functions for reading/writing
system registers (MiscRegs). It is introducing them inside
a new misc_regs namespace.
Change-Id: I21233337c057673d46d1147971ebabbfc2c2bb6a
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Now that the Request has been made an Extensible object, it
can carry within itself much more data. It makes sense
to pass it to the TlbTestInterface as more information about
the table walk can be extracted from it.
This is also aligning with the testTranslation utility which
is expecting a request reference as first argument.
Change-Id: I3dbc9a81d6b4bcc1801246ba7eb4136774d8f3c7
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
They both make final checks to the VA->PA translation before
relinquishing control back to the translate client (usually
CPU code)
Change-Id: Ib0a9da25404248c22c6a240817d2f50f0913fdf7
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
The finalizePhysical is just checking if the physical
address falls within the m5op region (if using mmapped
m5ops). There's not reason why we shouldn't enable it
with virtual memory off
Change-Id: I5ab80fd4e7886743abd4b7d85937b72253b578d3
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
We also unify the fault handling logic; rather than cleaning
up the WalkerState in several places scattered throughout the
walking code, we handle faults in the top level method
Change-Id: Ia22fb6f27044ff445fffbab228777a48efa473cb
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
It's more efficient to pass a reference of the tester to the
TableWalkers. In this way a table walk check is tested directly
from the walkers instead of going through the MMU every time.
Change-Id: I9820dbabb8b551981005a65efa54a76b1a027541
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
This is done in order to differentiate between EL0 (unprivileged) and
EL1. Effectively it won't change much as most of the decisions are
now taken according to the translation regime which will be the
same regardless (EL10)
Change-Id: I218037e9c19cf638aff05c51869e439204d9af69
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Movfp instruction did not account for only copying the lower half of src
register if dataSize is 4.
GitHub Issue: #893
I used the test code in issue #893 to verify the fix is working.
gem5.fast does not currently build if the GPU model is built. This fixes
the array-bounds warnings allowing gem5.fast to build again.
Change-Id: I463c2847c3ecfd2257a70418fa247090b0493f9b
This commit adds more detailed instruction types for RISC-V Vector.
Concretely, it substitutes VectorIntegerArith, VectorFloatArith,
VectorIntegerReduce and VectorFloatReduce with more specific types
related to the operation that each instruction (e.g., VectorIntegerAdd
or VectorIntegerMult).
Additionaly, fixes two RISC-V instruction types (VectorXXX) that were
used in ARM SVE, placing the proper SimdXXX ones.
Change-Id: I31774fa6a7cd249abfffec68d11d3d77f08ad70b
CC @adriaarmejach
This PR is offloading some of the partitioning logic to the partitioning
manager, effectively changing
the partitioning interface. Rather than always relying on the
PartitionFieldExtention data structure to
convey partition IDs, we make it implementation defined by introducing
the partitioning manager abstraction.
We want user to be able to extract the partitionId more flexibly and
this requires using a SimObject.
Users can extend the PartitioningManager, overriding the
readPacketPartitionId, therefore providing their
own mean of injecting/extracting partitioning data from a packet
M5Ops C / C++ functions partially use 64 bit arguments and return value.
In general, 64 bit arguments and return values are possible for 32 bit
RISC-V systems as well, since the arguments and the return value is
split into two registers. However, at the moment, this does not work for
32 bit RISC-V systems on the simulator side, since there is a one to one
mapping between argument registers and m5op function parameters.
To solve this problem, the get() function of the RISC-V reg_abi is
updated. It now will merge two registers if there is a 64 bit argument.
For this, the function code has to be passed to the get() function. The
default value of this function code is set to 0xF00, since 0x00 is
already used for M5_ARM. The parameter list of other get() functions for
argument return is also extended by this function code parameter with
the keyword [[maybe_unused]].
To enable a return value of size 64 bit, a0 is assigned with the lower
32 bit and a1 with the higher 32 bit.
Related Issue: https://github.com/gem5/gem5/issues/881