`assert(interruptID >=0)` is always true as `interruptID` is an unsigned
int.
This was causing compilation test failures in GCC-8 with the following
error:
```sh
src/arch/riscv/interrupts.cc:47:32: error: comparison is always true due to limited range of data type [-Werror=type-limits]
assert(interruptID >= 0);
```
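The warning can be reproduced outside gem5. The following is a minimal sketch, not the code from interrupts.cc: the alias `InterruptID`, the bound `NumInterruptsAssumed`, and both helper functions are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical alias; the commit only states that interruptID is unsigned.
using InterruptID = unsigned int;

// An unsigned value can never be negative, so `id >= 0` is a tautology and
// GCC rejects it under -Werror=type-limits.
static_assert(std::is_unsigned<InterruptID>::value,
              "a lower-bound check is redundant for unsigned types");

// Assumed upper bound, for illustration only.
constexpr InterruptID NumInterruptsAssumed = 64;

// Only the upper bound can ever fail for an unsigned ID.
inline bool
isValidInterruptID(InterruptID id)
{
    return id < NumInterruptsAssumed;
}

// Assert-style wrapper, shaped like the check the commit removes.
inline void
checkInterruptID(InterruptID id)
{
    assert(isValidInterruptID(id));
}
```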
Change-Id: I356be78d7f75ea5d20d34768fb8ece0f746be2fc
Previously, the S_ICACHE_INV instruction was unimplemented and
simulation panicked if it was encountered. This commit adds support for
executing the instruction by injecting a memory barrier in the scalar
pipeline and invalidating the ICACHE (or SQC).
Change-Id: I0fbd4e53f630a267971a23cea6f17d4fef403d15
The vlm.v and vsm.v unit-stride mask load/store instructions are
constructed with an incorrect VL when the current one is larger than
VLEN/EEW (i.e. when LMUL > 1). This commit fixes the issue for both
instructions.
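For reference, the RVV specification gives these two instructions an effective length of ceil(vl/8) bytes with EEW=8, regardless of LMUL. The sketch below shows only that calculation; `maskEvl` is a hypothetical helper name and this is not the gem5 constructor code.

```cpp
#include <cassert>
#include <cstdint>

// Effective length, in bytes, of a unit-stride mask load/store: one mask
// bit per element, rounded up to whole bytes.
inline uint32_t
maskEvl(uint32_t vl)
{
    assert(vl <= UINT32_MAX - 7);  // guard the rounding addition
    return (vl + 7) / 8;
}
```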
This patch provides unit-stride fault-only-first loads (i.e. vle*ff) for
the RISC-V architecture.
They are implemented within the regular unit-stride load (i.e. vle*). A
snippet named `fault_code` is inserted with templating to change their
behaviour to fault-only-first.
Apart from this, a new micro based on the vset\*vl\* instructions
(VlFFTrimVlMicroOp) is inserted as the last micro in the macro
constructor to trim the VL to its corresponding length based on the
faulting index.
This trimming micro waits for the load micros to finish (via data
dependency) and has a reference to the other micros to check whether
they faulted or not. The new VL is calculated with the VL of each micro,
stopping on the first faulting one (if there's such a fault).
I've tested this with VLEN=128,256,...,16384 and all the corresponding
SEW+LMUL configurations.
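The VL calculation described above can be sketched roughly as follows. This is an illustration under assumptions, not the VlFFTrimVlMicroOp code: `LoadMicroResult` and `trimmedVl` are hypothetical names, and each micro is assumed to report its own element count and the index of its faulting element.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-micro summary; the real micro-ops carry more state.
struct LoadMicroResult
{
    uint32_t vl;        // elements this micro was asked to load
    bool faulted;       // whether this micro hit a fault
    uint32_t faultIdx;  // index, within this micro, of the faulting element
};

// Accumulate the VL of each micro and stop at the first faulting one.
// A fault on element 0 of the whole load traps instead of trimming; that
// case is outside the scope of this sketch.
inline uint32_t
trimmedVl(const std::vector<LoadMicroResult> &micros)
{
    uint32_t new_vl = 0;
    for (const auto &m : micros) {
        if (m.faulted) {
            assert(m.faultIdx < m.vl);
            return new_vl + m.faultIdx;  // keep elements before the fault
        }
        new_vl += m.vl;
    }
    return new_vl;  // no fault: VL is unchanged
}
```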
Change-Id: I7b937f6bcb396725461bba4912d2667f3b22f955
Vector unit-stride instructions have an EEW encoded directly in the instruction.
We should use that instead of the SEW in vtype.
Change-Id: I282041ce8ed57fbcca899f7497ef6c6fb2dfcf85
Besides the standard RISC-V interrupts (software, timer, and external),
the RISC-V specification also offers the possibility to
implement local interrupts. With this patch, we contribute an extension
of RiscvInterrupts that enables connecting interrupt sources to the
local interrupt controller. We assigned the local interrupts to
machine-level and gave them the highest priority. If two local
interrupts are pending, their exception code will be the tie-breaker
(higher ID > lower ID). 32-bit systems only recognize the local
interrupts 16 to 31, 64-bit systems 16 to 63.
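The tie-breaking rule can be sketched as a scan over a pending bitmask. This is an illustration only: `highestPendingLocal` is a hypothetical helper, and the bitmask representation is an assumption rather than the RiscvInterrupts data structure.

```cpp
#include <cassert>
#include <cstdint>

// Returns the ID of the local interrupt to take, or -1 if none is pending.
// Bit N of `pending` is set when interrupt N is pending. IDs below 16 are
// the standard interrupts and are ignored here.
inline int
highestPendingLocal(uint64_t pending, int xlen)
{
    assert(xlen == 32 || xlen == 64);
    // Scan from the highest recognized ID down: higher ID wins.
    for (int id = xlen - 1; id >= 16; --id) {
        if (pending & (uint64_t{1} << id))
            return id;
    }
    return -1;
}
```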
Change-Id: Iff8d34e740b925dce351c0c6f54f4bd37a647e0c
---------
Co-authored-by: Robert Hauser <robert.hauser@uni-rostock.de>
The RISC-V privileged spec doesn't specify the implementation of
PMA (physical memory attributes), which is addressed in the previous
CL[1].
This CL creates the BasePMAChecker to support customized PMA so that we
can focus only on the features wanted in the study. The CL also leaves
the common methods `check` and `takeOverFrom` to make MMU easy to
interact with PMA.
[1] https://gem5-review.googlesource.com/c/public/gem5/+/40596
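A customized checker built on such a base class might look like the sketch below. The signatures are simplified assumptions and do not reproduce the gem5 interface: here `check` takes a bare address and returns a flag, and `SingleRangePMAChecker` is a hypothetical subclass.

```cpp
#include <cassert>
#include <cstdint>

using Addr = uint64_t;

// Hypothetical, simplified base interface with the two common methods.
class BasePMAChecker
{
  public:
    virtual ~BasePMAChecker() = default;
    // Assumed meaning: true if paddr falls in a region with special
    // attributes (for example, uncacheable).
    virtual bool check(Addr paddr) const = 0;
    // Copy state from the checker of the CPU being replaced.
    virtual void takeOverFrom(const BasePMAChecker *old) = 0;
};

// Hypothetical custom checker that tracks one address range.
class SingleRangePMAChecker : public BasePMAChecker
{
  public:
    SingleRangePMAChecker(Addr start, Addr end) : start(start), end(end) {}

    bool
    check(Addr paddr) const override
    {
        return paddr >= start && paddr < end;
    }

    void
    takeOverFrom(const BasePMAChecker *old) override
    {
        auto *prev = dynamic_cast<const SingleRangePMAChecker *>(old);
        assert(prev != nullptr);
        start = prev->start;
        end = prev->end;
    }

  private:
    Addr start;
    Addr end;
};
```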
Change-Id: I9725e3a8f7f9276e41f0d06988259456149d2a77
Crypto instructions will cause an undefined instruction exception when
executed with SIMD disabled. The PR also refactors their implementation
by checking the release object instead of the ID register field, which
improves readability.
This commit adjusts the logic in VectorFloatMaskMacroConstructor so that
the %(copy_old_vd)s section is not skipped when vl = 0, ensuring
correct values in the destination vector register.
Change-Id: I2478722d6f003a0f2e4b3cd0ba3e845bed938ee6
This is the same problem as #715.
We not only check for the presence of the relevant FEAT_*,
we also check if AdvSIMD is enabled; we throw an undefined
instruction otherwise.
Change-Id: I1fd0cdc8057c5a7901774802dc076817f06c8e66
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Check directly if extension is enabled instead of looking
for ID register field value. This makes the code more readable.
Change-Id: If0b882ac3464c3587731b72a7edb3b8b65ea86c7
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
We therefore rename it to exceptionLevel
Change-Id: I2a3aabaefa315d95bd034b13d95d5a5b0b8e9319
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
With the old code, the MAIR_EL1 register was checked when inserting
an EL2&0 TLB entry.
Change-Id: I064032fb2946777c2f4c50c06a124f828245e18a
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The problem with:
    ELIs64(tc, aarch64EL == EL0 ? EL1 : aarch64EL);
is that when we are executing at EL0 in host (EL2&0 translation
regime), the execution mode (AArch32 vs AArch64) is dictated
by EL2 and not by EL1 (which is the guest).
Change-Id: I463a2a9461c94d0886990ae3d0a6e22aeb4b9ea3
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This is the final step in the transformation process.
We limit the use of the "managing Exception Level" for
a translation in favour of the more standard "Translation
Regime".
This greatly simplifies our code, especially with VHE
where the managing EL (EL2) could handle two different
translation regimes (EL2 and EL2&0).
We can therefore remove the isHost flag wherever it got
used. That case is automatically handled by the proper
regime value (EL2&0).
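To illustrate why a regime value makes a separate host flag unnecessary, here is a minimal sketch. The enum and function names are illustrative and not necessarily the ones used in gem5; `e2h` and `tge` stand for the HCR_EL2.E2H and HCR_EL2.TGE bits, and security state is ignored for brevity.

```cpp
#include <cassert>

// Illustrative enums, not necessarily the gem5 definitions.
enum ExceptionLevel { EL0 = 0, EL1, EL2, EL3 };
enum class TranslationRegime { EL10, EL20, EL2, EL3 };

// Pick the translation regime for an access made at `el`.
inline TranslationRegime
translationRegime(ExceptionLevel el, bool e2h, bool tge)
{
    switch (el) {
      case EL0:
        // EL0 in host (VHE with TGE set) belongs to the EL2&0 regime.
        return (e2h && tge) ? TranslationRegime::EL20
                            : TranslationRegime::EL10;
      case EL1:
        return TranslationRegime::EL10;
      case EL2:
        return e2h ? TranslationRegime::EL20 : TranslationRegime::EL2;
      case EL3:
        return TranslationRegime::EL3;
    }
    assert(false && "invalid exception level");
    return TranslationRegime::EL10;
}

// EL whose controls, including AArch32 vs AArch64, govern the regime.
inline ExceptionLevel
regimeEL(TranslationRegime regime)
{
    switch (regime) {
      case TranslationRegime::EL10:
        return EL1;
      case TranslationRegime::EL20:
      case TranslationRegime::EL2:
        return EL2;
      case TranslationRegime::EL3:
        return EL3;
    }
    assert(false && "invalid translation regime");
    return EL1;
}
```

With this shape, EL0 running in host maps to the EL2&0 regime, so the controlling EL comes out as EL2 without any extra flag.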
Change-Id: Iafd1d2ce4757cfa6598656759694e5e7b05267ad
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The Xt is not part of the architectural name of the register
and it was likely added with the introduction of extended
register (Xt) TLBIs in Armv8 to differentiate them from
the old Armv7 ones.
The use of _Xt was not consistent anyway: newer TLBIs were
already omitting it.
Change-Id: Ic805340ffa7b5770e3b75a71bfb76e055e651f8b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
We should stop using isHyp. A hypervisor entry is already flagged
by the EL of the entry (el == EL2).
Change-Id: I20c3d06fa2b04e0b938a380ca917d0b596eddcf2
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The isHyp descriptor is an old artifact of Armv7 and it flags PL2
(AArch32) or EL2 and EL2&0 (AArch64) translations.
It is commonly set according to the EL/mode [1] but it may differ from
the execution state in case of explicit translation requests (via
the AT instruction as an example [2]).
There is really no need to complicate the masking of isHyp. We should
just make use of the tranType method (in charge of setting aarch64EL)
to properly set aarch64EL, and make isHyp coincide with the case of
aarch64EL == EL2.
This is a step towards the removal of the isHyp flag.
More specifically the patch does the following:
* HypMode translation type moved in the EL2 case
The translation is used by
ATS1HR/ATS1HW:
Performs stage 1 address translation as defined for PL2 and the
Non-secure state
* S1S2NsTran translation type moved in the EL1 case
The translation is used by
ATS12NSOPR/ATS12NSOPW:
Performs stage 1 and 2 address translations as defined for PL1 and the
Non-secure state
* S1CTran translation type can be at either EL1 or EL3
The translation is used by
ATS1CPR/ATS1CPW
Performs stage 1 address translation as defined for PL1 and the current
Security state
[1]: https://github.com/gem5/gem5/blob/stable/src/arch/arm/mmu.cc#L1281
[2]: https://github.com/gem5/gem5/blob/stable/src/arch/arm/mmu.cc#L1282
Change-Id: Ie653170f6053c5d8141a2de9f50febf5bf53ab9c
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
We should clean the instruction buffer after fence.i is executed
to avoid executing old instructions in self-modifying code.
Change-Id: Iece0ee0d10631fcd9bd17ee67cf0c92f72acdbd8
This commit adds a new template, Vector1Vs1VdMaskDeclare, to replace
the use of Vector1Vs1RdMaskDeclare in Vector1Vs1VdMaskFormat.
The change addresses the issue with the number of indices in srcRegIdxArr.
Only two indices are available in Vector1Vs1RdMaskDeclare, but instructions
that use Vector1Vs1VdMaskFormat, like 'vmsbf', require three indices
(for vs1, vs2(old_vd), and vm) to function correctly.
Change-Id: I0c966e11289ce07efcc3b0cc56948311289530ad
This commit simplifies the conditional logic in vmsbf/vmsof/vmsif
by removing an unnecessary variable and condition.
The updated logic checks 'this->vm' or the result of 'elem_mask(v0, i)'
directly, which prevents a segmentation fault regardless of
whether 'vm' is set or not.
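The simplified condition can be illustrated with a standalone model of vmsbf (set-before-first). This is a sketch under assumptions, not the gem5 template: mask registers are modelled as `std::vector<bool>`, `vm` set means the instruction is unmasked, and inactive elements keep their old value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of vmsbf: active elements before the first set bit of vs2 become 1,
// the first set bit and everything after it become 0.
inline std::vector<bool>
vmsbf(const std::vector<bool> &vs2, const std::vector<bool> &v0,
      const std::vector<bool> &old_vd, bool vm)
{
    assert(vs2.size() == v0.size() && vs2.size() == old_vd.size());
    std::vector<bool> vd = old_vd;  // inactive elements stay undisturbed
    bool found = false;
    for (std::size_t i = 0; i < vs2.size(); ++i) {
        // Single condition: unmasked, or this element's mask bit is set.
        if (vm || v0[i]) {
            if (vs2[i])
                found = true;
            vd[i] = !found;
        }
    }
    return vd;
}
```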
Change-Id: I799fa7b684ff98959a64f9694ef9c854f3a1f43a
These are:
FEAT_AES,
FEAT_PMULL,
FEAT_SHA256,
FEAT_SHA1,
FEAT_CRC32
With this patch we are also enabling them by default by adding them to
the Armv8 release object. Some of them are mandatory anyway since
Armv8.1
Change-Id: I221ae8646d91151fdfaf97a4815168a4fe3d8c5a
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Related to issue #703, this PR removes GCN3 related files and updates
source code, documentation, and tests to switch over to Vega if that was
not done already. Highlights are:
- Remove all src/arch/amdgpu/gcn3 files and update Kconfigs.
- Remove references to GCN3 and replace with Vega where applicable.
- Update the build targets in the gcn-gpu Docker. This will need to be
rebuilt but not urgently.
- Remove the GCN3 tag in testlib. Most tests seem to be using Vega
already, so that commit is small.
Vega (gfx900) introduced new memory aperture registers to get the base
address and limit for LDS and private (scratch) memory. These have not
commonly been used by the compiler until ROCm 6. Now that the compiler
is generating reads from these special registers, implement the support
for them.
Tested with LULESH which is using the SHARED_BASE register (LDS) with
ROCm 6.0. This assembly seems to replace S_GETREG_B32 emitted by the
ROCm 5 compiler.
Change-Id: Id2bd26ce8ef687c84a647fa2ac2da54d657913e5
The files registers.cc, isa.cc, and decoder.cc do not match the header
name. This is a minor cleanup to make development more straightforward.
Change-Id: Ibab18dfe315b0ce84359939b490f8227ea43cac0
The Vega instructions.cc file is 47k lines long which results in both
large compilation times whenever it is modified and long style check
times. This makes iterating over more complex instruction
implementations very time consuming.
This commit moves the instruction definitions to multiple files based on
the instruction encoding (SOP2, VOP2, FLAT, DS, etc.). The resulting
files are much smaller (max is 8k lines) and compilation and style check
times are much more reasonable. Other than moving code around, there are
no functional changes in this commit.
Change-Id: Id4ac8e98ef11a58de5fd328f8a0cd7ce60a11819
The addition of std::optional in #732 caused a compile error. This
change fixes the error by checking to see if the value is present and
panicking otherwise.
Change-Id: I46c3fb76eb0e14ba7bede7c336293fbe9add8c84
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
1. Add the new double width for int64_t and uint64_t
2. Use the wider type to get the upper result of multiplication
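The two steps can be sketched as a small type trait plus a high-half multiply. This is an assumption-laden illustration: `double_width` and `mulHigh` are hypothetical names, and the 128-bit types rely on the `__int128` extension available in GCC and Clang.

```cpp
#include <cassert>
#include <cstdint>

// Step 1: map a 64-bit type to a type of twice its width.
template <typename T> struct double_width;
template <> struct double_width<int64_t> { using type = __int128; };
template <> struct double_width<uint64_t> { using type = unsigned __int128; };

// Step 2: multiply in the wider type and keep the upper half.
template <typename T>
T
mulHigh(T a, T b)
{
    using Wide = typename double_width<T>::type;
    static_assert(sizeof(Wide) == 2 * sizeof(T), "need a double-width type");
    const Wide product = static_cast<Wide>(a) * static_cast<Wide>(b);
    // For the signed case this relies on an arithmetic right shift.
    return static_cast<T>(product >> (sizeof(T) * 8));
}
```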
Change-Id: Id6cfa6f274c65592b2b3e2b70c00f82954b41f1a
I’ve been working on a fix for the issue #759 where ‘vd’ incorrectly
stores all zeros when ‘vl’ is set to 0 in VectorIntMaskMacroConstructor.
My solution seems to work, but it behaves differently from other macros
when ‘vl’ = 0. Instead of pushing a ‘nop’ to ‘microops’, it pushes a
micro operation that remains ineffective due to ‘vl’ being 0.
Newer compilers error on -Warray-length in the recent MI200 patches due
to casting from a 32-bit data type to a 64-bit type. Change it to cast
to the 32-bit integer first and to the 64-bit integer later to remove
the warning.
Rerun of validation tests on the three instructions passed.
Change-Id: I0309e5f7b5b8cc8ce1651660ddddb120fa6e7666
Currently, the TLB enforces that bit 63 of a physical address is
zero. This check stems from the riscv-tests, which check bit 63
of a physical address [1]. This is due to the fact that the ISA
implicitly says that the physical address must be zero-extended on the
most significant bits that are not translated [2]. More details on this
issue are here [3].
The check for bit 63 of a physical address in the TLB is rather too
specific, and I believe the check for an invalid physical address is
already implemented in PMA. Thus, this change proposes to remove this
check from the RISC-V TLB.
[1]
bd0a19c136/isa/rv64mi/access.S (L18)
[2] https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/8kO7X0y4ubo
[3] https://github.com/gem5/gem5/issues/238
Change-Id: I247e4d4c75c1ef49a16882c431095f6e83f30383
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
The PMAChecker and PMP are only used in the RiscvISA and they should be
in the RiscvISA to simplify the implementation.
Change-Id: I4968e2de4c028cb2dceed977f2173fc8b1efd175
This patch is amending encodeAArch64SysReg so that it covers the case
where there are no arch numbers available for the misc index passed as
an argument.
This could happen if the register ID is a gem5 pseudo register which is
not associated with any architected op1/op2/crn/crm tuple.
Rather than panicking we return a nullopt.
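The pattern of returning an empty optional, and leaving it to the caller to decide whether a missing encoding is fatal, can be sketched as below. Every name here is hypothetical (`SysRegEncoding`, `encodeSysReg`, `requireSysReg`), the table contents are illustrative, and the exception stands in for gem5's panic.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <unordered_map>

// Hypothetical stand-ins for the misc register index and its encoding.
using MiscRegIndex = int;
struct SysRegEncoding { unsigned op0, op1, crn, crm, op2; };

// Returns the encoding, or nullopt for registers without one
// (for example, simulator-only pseudo registers).
inline std::optional<SysRegEncoding>
encodeSysReg(MiscRegIndex idx)
{
    // Illustrative table with a single architected entry.
    static const std::unordered_map<MiscRegIndex, SysRegEncoding> table = {
        {0, {3, 0, 1, 0, 0}},
    };
    const auto it = table.find(idx);
    if (it == table.end())
        return std::nullopt;
    return it->second;
}

// A caller that cannot proceed without an encoding checks for presence
// first; the exception is a stand-in for a panic.
inline SysRegEncoding
requireSysReg(MiscRegIndex idx)
{
    const auto enc = encodeSysReg(idx);
    if (!enc)
        throw std::runtime_error("no architected encoding for register");
    return *enc;
}
```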
Change-Id: I7ab70467105ef93c0c78ac4e999c7dc8e5e09925
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Implemented according to the ISA spec. Validated with silicon. In
particular, the sign extend is important for the signed variants, and the
unsigned variants seem to overflow lanes (hence why there is no mask()
in the unsigned variants). FP16 -> FP32 continues using ARM's fplib.
Tested vs. an MI210. Clamp has not been verified.
Change-Id: Ifc09aecbc1ef2c92a5524a43ca529983018a6d59
Starting with MI200, packed math can operate on double dword inputs. In
this case, 64-bits of inputs (two VGPRs per lane) contain two FP32
values.
Add instructions to perform add, multiply, and FMA on packed FP32 types.
Change-Id: Ib838bff91a10e02e013cc7c33ec3d91ff08647b0
This change adds all of the missing flat/global atomics up to and including
the new atomics in gfx90a (MI200). Adds all decodings and instruction
implementations with the exception of __half2 which does not have a
corresponding data type in gem5. This refactors the execute() and
completeAcc() methods by creating helper functions similar to what
initiateAcc() uses. This reduces redundant code for global atomic
instruction implementations.
Validated all except PK_ADD_F16, ADD_F32, and ADD_F64 which will be done
shortly. Verified the source/dest register sizes in the header are
correct and the template parameters for the new execute()/completeAcc()
methods are correct.
Change-Id: I4b3351229af401a1a4cbfb97166801aac67b74e4
Use the opSelectorToRegSym which will print the full range of VGPRs
(e.g., will now print v[2:3] instead of v2 when the source / dest is
64-bits). Fixes atomic disassembly prints. Now shows "glc" if GLC bit is
enabled. Fixes some VGPR fields being printed as an SGPR in places where
the 9-bit register index bit is implied (e.g., VDST).
This makes it easier to use a GPUExec trace to match with LLVM
disassembly when debugging.
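The register-range formatting can be sketched with a small helper. `vgprSym` is a hypothetical function written for illustration; it is not the opSelectorToRegSym implementation.

```cpp
#include <cassert>
#include <string>

// Format a VGPR operand: a single register as "vN", a multi-dword operand
// as the full range "v[first:last]".
inline std::string
vgprSym(int idx, int numDwords)
{
    assert(idx >= 0 && numDwords >= 1);
    if (numDwords == 1)
        return "v" + std::to_string(idx);
    return "v[" + std::to_string(idx) + ":" +
           std::to_string(idx + numDwords - 1) + "]";
}
```

For a 64-bit operand starting at register 2, `vgprSym(2, 2)` yields `v[2:3]`, which matches the form used in the example above.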
Change-Id: Ia163774850f0054243907aca8fc8d0361e37fdd5
This adds the VOP3P and VOP3P_MAI encodings from the MI200 spec. These
instructions are used for packed math and miSIMD instructions. The first
19 VOP3P opcodes are implemented and validated against hardware. This
includes all instructions which operate on one dword containing two
packed 16-bit values of fp16, int16_t, or uint16_t.
Implement one MFMA instruction for now which was also validated against
hardware.
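The packed 16-bit layout can be illustrated with a signed per-lane maximum. This is a sketch, not the gem5 implementation: the helper names are hypothetical, and it assumes the low 16 bits of the dword hold lane 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Extract one 16-bit lane as a signed value. The conversion to int16_t is
// what sign-extends the lane before any comparison or arithmetic.
inline int16_t
lane16(uint32_t dword, int lane)
{
    assert(lane == 0 || lane == 1);
    return static_cast<int16_t>(static_cast<uint16_t>(dword >> (16 * lane)));
}

// Signed maximum of each 16-bit lane, repacked into one dword.
inline uint32_t
pkMaxI16(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 2; ++lane) {
        const int16_t m = std::max(lane16(a, lane), lane16(b, lane));
        result |= static_cast<uint32_t>(static_cast<uint16_t>(m))
                  << (16 * lane);
    }
    return result;
}
```

Treating the lanes as unsigned would pick 0xFFFF over 0x0001 in lane 0, which is why the signed interpretation matters here.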