Updated the bib information of the local RISC-V interrupts.
Change-Id: I666c3df4529e159bd1946ca1a9623e47f84d5d9e
Signed-off-by: Robert Hauser <robert.hauser@uni-rostock.de>
Vector reduce float (widening and non-widening) and integer (widening)
instructions initialize the reduce loop operation with the first element
of the destination register (i.e. `Vd[0]`).
Since all reductions per spec seem to be `Vd[0] = Vs1[0] + Vs2[*]`
(where `+` is an arbitrary binary op and `*` indicates all active
elements) gem5 will calculate this incorrectly if `Vd[0]` and/or
`Vs1[0]` are non-neutral for the operation (the later case being because
it's not taken into account at all).
To solve this we just have to initialize the reduction loop to `Vs1[0]`
(the non-widening integer reduction already does this).
The gem5 crashed when user try to update register value from GDB because
PR[1] changes the index of CSR_XSTATUS to MISCREG_XSTATUS, which is out
of NUM_PHYS_MISCREGS.
The CSR_XSTATUS should use setRegWithMask to update it.
[1] : https://github.com/gem5/gem5/pull/1099
gem5 issue: https://github.com/gem5/gem5/issues/1299
Change-Id: Iefc0d1f5adfb98ecfda0e74907964b47d1864b6d
These two commits add agnostic capability for both tail/mask policies,
for vector memory and arithmetic instructions respectively. The common
policy for instructions is to act as undisturbed if one is (i.e. tail or
mask), or write all 1s if none.
For those instructions in which multiple micro instructions are
instantiated to write to the same register (`VlStride` and `VlIndex` for
memory, and `VectorGather`, `VectorSlideUp` and `VectorSlideDown` for
arithmetic), a (new) micro instruction named `VPinVdCpyVsMicroInst` has
been used to pin the destination register so that there's no need to
copy the partial results between them. This idea is similar to what's on
ARM's SVE code. This micro also implements the tail/mask policy for this
cases.
Finally, it's worth noting that while now using an agnostic policy for
both tail/mask should remove all dependencies with old destination
registers, there's an exception with `VectorSlideUp`. The
`vslideup_{vx,vi}` instructions need the elements in the offset to be
unchanged. The current implementation overrides the current vta/vma and
makes them act as undisturbed, since they require the old destination
register anyways. There's a minor issue with this though, as
`v{,f}slide1up` variants do not need this, but since they share the same
constructor, will act all the same.
Related issue #997.
Implement small translation table extension.
This feature relaxes the lower limit on the size of the translation
tables, by increasing the maximum permitted values of the T1SZ and T0SZ
field in: TCR_EL1, TCR_EL2, TCR_EL3,VTCR_EL2 and VSTCR_EL2
Change-Id: I4c2187815b2d7f14407edb38095c6bcc2004b62a
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
We are currently using the LongDecriptor for both stage1
and stage2 translations. There are several cases where
the bitfield meaning changes depending on the translation
stage.
Change-Id: Ic33d9ef225a57fd79ce2b4bf47896aeb6bdd8d9c
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
For ARMv8 CPUs this register allows reading a 64-bit cycle counter in
from 32-bit execution state.
Change-Id: I7cd9e2711ada5156920440cc3c89e7a74ca54a49
This patch is adding a functional implementation of FEAT_XS. Unless we
operate with DVM enabled, TLBIs broadcasting is accomplished in 0 time;
so there is no timing benefit introduced by enabling FEAT_XS other than
the way it affects TLB management (invalidation)
Change-Id: I067cb8b7702c59c40c9bbb8da536a0b7f3337b5d
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The acquire-release flavor of the ldadd instruction should read ldaddalx
(eg. ldaddalb/ldaddalh) according to specification. However, this is
currently noted as ldadd"la"x (eg. ldaddlab/ldaddlah).
Issue: https://github.com/gem5/gem5/issues/1224
Change-Id: Ib932fa0e572207729c923c27f24c34cc21dff0e5
Co-authored-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Previously, all of the TLB lookup/insert functions were using the full
virtual addresses even though the variables in the functions said "vpn."
This change explicitly converts the virtual address to the VPN without
any least significant zeros for the offset. I.e., vpn >> page_size.
The main bug solved in this changeset is the asid was |'d with the upper
bits of the virtual address, but sometimes there were all 1's.
Therefore, you could get a TLB hit even if the ASID was different.
Interestingly, the page that seemed to cause these issues was a 1 GiB
page.
This change also starts refactoring some of the page table details to
support sv46 and sv57 page table formats.
In my testing, the Linux kernel boot uses large pages (even OpenSBI uses
large pages), so it seems that large pages also work. However, this
seems like magic to me, so I'm not sure if it's correct.
This change also updates some asserts, and debug statements with more
useful debugging information.
Partially fixes#1235. More testing needs to be done to be confident.
Currently, gem5's inst tracer prints the whole vector register container
by default. The size of vector register containers in gem5 is the
maximum size allowed by the ISA. For vector-length agnostic (VLA) vector
registers, this means ARM SVE vector container is 2048 bits long, and
RISC-V vector container is 65535 bits long. Note that VLA implementation
in gem5 allows the vector length to be varied within the limit specified
by the ISAs.
However, in most use cases of gem5, the vector length is much less than
65535 bits. This causes two issues: (1) the vector container requires
allocating and moving around a large amount of unused data while only a
fraction of it is used, and (2) printing the execution trace of a vector
register results in a wall of text with a small amount of useful data.
This change addresses the problem (2) by providing a mechanism to limit
the amount data printed by the instruction tracer. This is done by
adding a function printing the first X bits of a vector register
container, where X is the vector length determined at runtime, as
opposed to the vector container size, which is determined at compilation
time.
Change-Id: I815fa5aa738373510afcfb0d544a5b19c40dc0c7
---------
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
This is a follow-up on the discussion here [1].
The IsInvalid flag was previously defined as an instruction that does
not appear in the ISA. However, a micro-architecture can choose to not
recognize an instruction in and raise illegal instruction fault even if
the instruction is in the ISA.
This change modifies the definition of a Invalid instruction such that,
if a StaticInst instruction is marked as IsInvalid, it means the
instruction is not recognized by the decoder. This means that any
instruction recognized by the decoder are not invalid, even if the
instruction is not in the official ISA spec; e.g., m5
pseudo-instructions.
Note that instructions that are recognized by the decoder but are chosen
to act as a nop are not invalid. This applies to WarnUnimplemented
instructions, e.g. hint instructions.
[1] https://github.com/gem5/gem5/pull/1071
Change-Id: I1371b222d8b06793d47f434d0f148c5571672068
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
For scratch instructions only, this bit specifies if an offset in a VGPR
should be used for address calculation. This is new in MI300 and was
previously the LDS bit. The LDS bit is rarely used and in fact gem5 does
not even check this bit.
This fixes a bug when SADDR == 0x7f (i.e., no SGPR should be used) where
a VGPR was being added to the address when it should have been ignored.
Change-Id: I9864379692df6795b25b58b98825da05d18fc5db
When accessing memory using functionalAccess(), the MMU could tell us to
use a nonsecure access even though the CPU is operating in secure mode.
I noticed this while trying to run a simple semihosting hello world with
the MMU+caches enabled and the semihosting calls ended up reading from
memory instead of the caches due to an S/NS mismatch.
See also https://github.com/gem5/gem5/pull/1198 which happens to also
mask the issue I saw, but I believe both changes are needed.
Change-Id: I9e6b9839b194fbd41938e2225449c74701ea7fee
Some arithmetic instructions of the riscv vector extension where still
using the default VLEN=256 instead of the dynamic one through the
inherited `vlen` attribute. Most of them only use this to calculate the
effective index for the mask element like so:
```
uint32_t ei = i + vtype_VLMAX(vtype, vlen, true) * this->microIdx;
if (this->vm || elem_mask(v0, ei)) {
...
```
This means that instructions will wrongly compute the mask index in the
second and subsequent micro instructions (`microIdx` > 0). This commit
fixes this by adding the corresponding `set_vlen` snippet to the
affected instruction formats.
Change-Id: Ib041de972d6938490741a9fb4c214a6a5172c34e
While running a simple Arm32 binary, I noticed that all memory
transactions were being marked as NS instead of S once I turn on the MMU
(even though the page tables have the NS bit set to zero). The result of
this was that semihosting calls were failing since they were using
functional accesses with the SECURE flag set, but the caches only
contained NS tagged entries so these accesses always read stale values
from DRAM.
Digging through the Arm MMU code it appears that the NS bit lookup was
being keyed of the `secureLookup` flag which is only used for long
descriptors. I believe 0c28712f51 should
have used isSecure instead of secureLookup. To avoid using these
uninitialized values in the future I wrapped the LPAE state in a
std::optional to ensure that it is only accessed once initialized.
Change-Id: Ibc406ed3f4cfa768f470e34a5eca3c1a2bf45cd8
The SYS_WRITEC and SYS_WRITE0 calls are specified as writing to the
debug channel, so it is a reasonable expectation for these messages to
be visibile immediately after the semihosting call.
Change-Id: I8e6e9a7aab593a59e82ecb9cf4603c18c7a8acbe
This feature has been available since Vega10 but was never implemented.
MI300 adds a few new instructions that make use of this more often
(e.g., v_mov_b64).
Change-Id: Ieeb7834462b76d77c0030f49622d0de09f90c9e4
This instruction is a simple move from accumulation register to
accumulation register. It is essentially a move with the accumulation
offset added to the register index.
Change-Id: Ic93ae72599b75c91213f56ebafe5bbd7b2867089
Flat, scratch, and global share the same instruction implementation with
different address calculations essentially. These instructions were
already implemented but not added to the decoder. This commit adds the
remaining scratch instructions which have a shared instruction
implementation.
Change-Id: I8f2e9ceb221294dce1b81c45745b642f0592d985
The lines `constexpr int B_I = std::ceil(64.0f / (N * M / H));` caused
the following compilation error in clang Version 16:
```
error: constexpr variable 'B_I' must be initialized by a constant
expression
```
`std::ceil` is not a const expression. Therefore instances of this
expression in instructions.hh have been replaced with a constant
expression friendly alternative.
This is calling our compiler tests to fail:
https://github.com/gem5/gem5/actions/runs/9288296434/job/25559409142
Change-Id: I74da1dab08b335c59bdddef6581746a94107f370
This PR is doing the following:
1) Fixing memory attributes of partial translation entries (table walks)
2) Properly setting the cacheability of table walks
Fix#1168. Prevent logical instructions like AND, OR, and TEST from
having input dependencies on the previous value of the Zaps register
(ZF+AF+PF+SF) by having them set AF=0, rather than not modifying AF.
Fix#1169. Break the input dependency of 32-bit and 64-bit 'mov'
micro-ops on the prior value in the destination register. Such a
dependency is required for 8-bit and 16-bit moves, as they do not
completely overwrite the value in the destination register. However, it
is unnecessary for 32-bit moves (which implicitly zero the upper 32
bits) and 64-bit moves.
This patch implements the fix by adding a new code template field inside
the generated constructors of X86StaticInst's, called `invalidate_srcs`,
which instruction implementations like `mov` can use to conditionally
invalidate particular source registers as needed. In `mov`'s case, this
is when the data size is 32 or 64 bits.
Change-Id: Ib2aef6be6da08752640ea3414b90efb7965be924
According to the Arm architecture reference manual, it is possible to
force the broadcast of the following TLBIs:
AArch64: TLBI VMALLE1, TLBI VAE1, TLBI ASIDE1, TLBI VAAE1, TLBI VALE1,
TLBI VAALE1, IC IALLU, TLBI RVAE1, TLBI RVAAE1, TLBI RVALE1, and TLBI
RVAALE1.
AArch32: BPIALL, TLBIALL, TLBIMVA, TLBIASID, DTLBIALL, DTLBIMVA,
DTLBIASID, ITLBIALL, ITLBIMVA, ITLBIASID, TLBIMVAA, ICIALLU, TLBIMVAL,
and TLBIMVAAL.
Via the HCR_EL2.FB bit
Change-Id: Ib11aa05cd202fadfbd9221db7a2043051196ecbd
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
When determining the cacheability of table walks,
SCTLR.C should only be used in stage1 EL1&0 translations.
Stage2 translations should rely on HCR_EL2.CD instead
Change-Id: I1b0830bc3fb5086f68d7a7a1560c7fed5d126d28
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Make table walks uncacheable if marked as uncacheable
in either inner or outer shareable domain
Change-Id: I5898a3b91b5b919e0beda6c6fe896394e3ab94df
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Those AArch64 instructions/registers were labelled as executable
from EL3 only if SCR_EL3.NS == 1. This is not valid anymore
after the introduction of FEAT_SEL2
The new static analysis in GCC 13 finds issues with operand.hh. This
commit fixes the error so that gem5 compiles when BUILD_GPU is true.
Change-Id: I6f4b0d350f0cabb6e356de20a46e1ca65fd0da55
Those AArch64 instructions/registers were labelled as executable
from EL3 only if SCR_EL3.NS == 1. This is not valid anymore
after the introduction of FEAT_SEL2
Change-Id: Ie7b56f3fe779c3a99d4f0ef937c7c8ec0530b00e
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This is making it easier for TLBI instructions to share code. Common
code (under the form of tlbi* functions) are closely matching the
instruction description in the Arm pseudocode
Change-Id: If10c22fb4a7df2bcd0335e9761286ad3c458722b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The bit 0 of register should be 0 for jump address. Wrong handling the
jump address may cause infinite run or segment fault.
gem5 issue: https://github.com/gem5/gem5/issues/981
Those were not part of the performTlbi switch and simulation was
therefore panicking when they were encountered
Change-Id: Ifbe0b89e45539df4abc147ac5970b0caf0d9dfdc
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This commit fixes and refactors the implementation of viota. It also
overrides the generateDisassembly function in viota's macro/micro to
correctly print out the instruction when tacing/debugging.
For example, it changes from:
viota_m vd, vd, vs2, v0.t
to:
viota_m vd, vs2, v0.t