6001 Commits

Author SHA1 Message Date
Matthew Poremba
7d46c50663 arch-vega: Swizzle multi-dword scratch requests (#1445)
Scratch memory requests that are larger than one dword are using a
different memory layout than global instructions. Rather than being
placed contiguously, each dword is interleaved 64 lanes * 4 bytes away
as described in Section 9.1.5.2. "Swizzled Buffer Addressing" in the
MI300 specification. This was verified by comparing MI300 output (which
uses scratch_ instructions) with MI200 (which uses buffer instructions).
MI300 FashionMNIST bs=1 now matches CPU reference.

This requires several changes to the instruction implementations:
- For stores, data in the GPUDynInst can be swizzled before the data is
written to memory. This is easy to do using a helper method. This is
done in the template<int N> variant of initMemWrite. To use this, x2
stores are changed to use template<int N> rather than loading a U64. The
swizzle function is renamed to swizzleAddr to avoid confusion with
swizzleData.
- For loads, data is unswizzled in completeAcc when writing register
values. This is not as easy to implement as a helper and is thus
implemented for the three load instructions that load more than one
dword.
- Accessing swizzled data requires at least one packet per dword. A new
GPU memory helper is added to create these packets for scratch requests
specifically. This is called in the template<int N> variant of
initMemRead / initMemWrite. Loads and stores of x2 are changed to use
this variant instead of accessing a U64.

The GPUDynInst status vector restrictions are increased to allow for
swizzled x4 accesses. For simplicity this does not currently support
misaligned swizzled accesses and will panic upon seeing such a case.
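
The layout described above can be sketched roughly as follows. This is a minimal illustration that assumes a 64-lane wavefront and 4-byte dwords; the function names are hypothetical and are not gem5's swizzleAddr or swizzleData.

```cpp
#include <cstdint>

// Hypothetical helpers, not the gem5 implementation.
// Assumes 64 lanes per wavefront and 4-byte dwords, as in the commit text.
constexpr uint64_t kLanes = 64;
constexpr uint64_t kDwordBytes = 4;

// Byte offset of dword `dword` of lane `lane` in a swizzled scratch region:
// consecutive dwords of one lane sit 64 * 4 bytes apart.
constexpr uint64_t swizzledOffset(uint64_t lane, uint64_t dword)
{
    return dword * kLanes * kDwordBytes + lane * kDwordBytes;
}

// For comparison: contiguous layout for an access of `n` dwords per lane.
constexpr uint64_t contiguousOffset(uint64_t lane, uint64_t dword, uint64_t n)
{
    return (lane * n + dword) * kDwordBytes;
}
```

Because each dword of a lane lands in a different 256-byte row, one packet per dword is the natural granularity for such accesses.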

Change-Id: Ic686c51e28e0af029a043d5a5b3d4069f2cb94f9
2024-08-12 06:58:48 -07:00
Robert Hauser
e980780efd arch-riscv: Extend wfi behavior (#1364)
At the moment, a hart does not halt if there are pending interrupts.
However, an implementation can also consider the enable status of the
individual interrupts, i.e., a halted hart would only resume if there
are locally enabled pending interrupts. This commit introduces this
behavior. The wfi behavior is controlled by the new configuration
variable wfi_pending_resume of RiscvISA.
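
A minimal sketch of the two behaviors, assuming mip/mie-style bit masks. The function name and the exact polarity of the flag are assumptions made for illustration, not taken from the gem5 sources.

```cpp
#include <cstdint>

// Hypothetical model, not gem5 code. `pending` and `enabled` are bit masks
// in the style of the mip and mie CSRs.
constexpr bool resumesFromWfi(uint64_t pending, uint64_t enabled,
                              bool wfi_pending_resume)
{
    // Assumed meaning of the flag: true => any pending interrupt resumes
    // the hart; false => only pending interrupts that are also enabled.
    return wfi_pending_resume ? pending != 0 : (pending & enabled) != 0;
}
```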

Change-Id: I316239f9732c6e73e6ad692491bca08d773dd995

---------

Signed-off-by: Robert Hauser <robert.hauser@uni-rostock.de>
2024-08-09 11:28:15 -07:00
Yangyu Chen
ce07203c5f arch-riscv: use sign-extend for all address generation (#1316)
In gem5, we use the same code base for RISC-V 32 and 64.

However, if we need to allow modifiable XLEN control on CSR.mstatus in
the future, we should follow the RISC-V ISA manual to sign-extend all
the register results, including PC and GPR. If this feature is
implemented, the simulator needs to handle user mode in RV32 while
CSR.SATP is set to Sv39. In this case, 0x80000000 and 0xffffffff80000000
are different addresses from the 64-bit S-Mode perspective, but they are
the same from the 32-bit U-Mode perspective. We should prevent this
incorrect behavior before we implement this feature.

Thus, we need to sign-extend the results of all the addresses, including
the PC and memory addresses, which currently use zero-extend. As
specified in the RISC-V ISA manual, we use zero-extend in narrow XLEN
mode for the physical address implemented in TLB.
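
The difference between the two extensions can be illustrated with a small sketch. The helper names are hypothetical and do not correspond to gem5 functions.

```cpp
#include <cstdint>

// Illustrative only; names are not taken from gem5.
// Sign-extend the low 32 bits of an address to 64 bits.
constexpr uint64_t sext32(uint64_t addr)
{
    return static_cast<uint64_t>(static_cast<int64_t>(
        static_cast<int32_t>(static_cast<uint32_t>(addr))));
}

// Zero-extend the low 32 bits of an address to 64 bits.
constexpr uint64_t zext32(uint64_t addr)
{
    return addr & 0xffffffffULL;
}
```

With these definitions, the 32-bit address 0x80000000 sign-extends to 0xffffffff80000000 and zero-extends to 0x0000000080000000, which are the two distinct 64-bit addresses mentioned above.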

Changes based on spec:
1. Sign-extend narrow XLEN:
https://github.com/riscv/riscv-isa-manual/blob/riscv-isa-release-b7a445a-2024-07-02/src/machine.adoc?plain=1#L567
2. Zero-extend physical address:
https://github.com/riscv/riscv-isa-manual/blob/riscv-isa-release-b7a445a-2024-07-02/src/supervisor.adoc?plain=1#L1670

Signed-off-by: Yangyu Chen <cyy@cyyself.name>
2024-08-08 08:41:35 -07:00
Yu-Cheng Chang
5df08fdb08 arch-riscv: Move pmpReset implementation to MMU::reset() (#1406)
The PMP is part of the RISC-V MMU subsystem, so its reset should be put in
RiscvISA::MMU::reset().
2024-08-05 14:21:48 -07:00
Alexander Richardson
267817eaa1 arch-riscv: Fix implicit int-to-float conversion in .isa files (#1319)
Explicitly convert to float/double to fix compiler warnings that I have
turned on locally.
2024-07-31 04:24:54 -07:00
Yu-Cheng Chang
c13f895af0 arch,cpu: Implement generic reset method for MMU (#1342)
Implementing a generic reset method for the MMU allows each ISA to
implement its own reset method. The default MMU reset method flushes all
TLB entries. For example, RISC-V needs to reset the PMP when it receives
the reset signal, but the TLBs do not need to be flushed.
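
The override pattern described above might look roughly like this. The class and member names are hypothetical stand-ins, not the actual gem5 BaseMMU or RiscvISA::MMU interfaces.

```cpp
// Hypothetical class names; a sketch of the override pattern only.
struct BaseMmuSketch
{
    int tlbEntries = 4;
    virtual ~BaseMmuSketch() = default;
    // Default behaviour: flush all TLB entries.
    virtual void reset() { tlbEntries = 0; }
};

struct RiscvMmuSketch : BaseMmuSketch
{
    bool pmpConfigured = true;
    // ISA-specific behaviour: reset the PMP, leave the TLB untouched.
    void reset() override { pmpConfigured = false; }
};
```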

Change-Id: I158261570fb6e5216ec105fbdc53460f83f88d15
2024-07-30 09:47:55 +01:00
Alexander Richardson
b64aa0b9b3 arch: Dump semihosting write buffer in debug output (#1389)
This makes it easier to debug unexpected semihosting outputs (in my case
a wrong buffer argument was being passed).

Change-Id: I342610a92fb8efe121d030f7b9ea3307efc4fec3
2024-07-30 09:39:05 +01:00
Alexander Richardson
b23a4c7806 arch-arm: Add support for AArch32 PMEVCNTR*/PMEVTYPER*/PMCCFILTR (#1388)
These registers were only handled in AArch64 mode but are also
accessible as c14 registers for AArch32.

Change-Id: I62fe54427e96265df0589308afa1b5d665dbf210
2024-07-29 18:22:00 +01:00
Alexander Richardson
b51927e7a8 arch-arm: return 64-bit cycle counter for MISCREG_PMCCNTR (#1390)
In AArch32 mode it is possible to read a 64-bit counter using mrrc.
Instead of truncating in the PMU code, just allow the instruction
implementation to truncate to 32 bits if accessed using mrc.

Change-Id: I77620f6d1852a7d9e79c1ecee50f4297b4103b1c
2024-07-29 16:57:48 +01:00
Matthew Poremba
21f6e166b7 arch-vega: Panic on SDWAB / DPP VOPC unimplemented
If SDWAB or DPP are used on a VOPC instruction and those are not
implemented, it is highly likely to be a problem for the application.
Rather than continue to execute and cause undefined behavior, exit the
simulation with a panic showing the line of the instruction causing the
issue.

Change-Id: Ib3f94df7445d068b26907470c1f733be16cd2fc2
2024-07-25 16:18:14 -07:00
Matthew Poremba
b75fe56da5 arch-vega: Panic unimplemented SDWA/DPP for VOP1/VOP2
Add a panic if SDWA or DPP is used for an instruction which does not
implement support for it. If an application uses SDWA or DPP it likely
does not operate in the same way as the base instruction and therefore
gem5 should panic rather than continue. It is likely data is incorrect
which will make it more difficult to debug an application.

Change-Id: I68ac448b0d62941761ef4efa0169f95796270f48
2024-07-25 16:18:14 -07:00
Matthew Poremba
6558821e2d arch-vega: Add SDWAB for v_cmp_{eq,ne}_u16
This shows an example of how to use the previous commit, which adds an
SDWAB helper. The execute() methods of both instructions are the same
except for the lambda function passed to the helper method.

Change-Id: I5ffe361440b4020b9f7669c0ed946aa6b3bbec25
2024-07-25 16:18:14 -07:00
Matthew Poremba
69338703e7 arch-vega: Implement SDWAB helper
Implement a SDWAB helper which accepts a dynamic instruction and a
lambda function defining a comparison function taking two values and
returning a comparison result of 0 or 1 for false or true.

Current instructions which implement SDWA do so on a per-instruction
basis which adds a lot of redundant code. This allows for generic SDWAB
implementations for VOPC instructions.

All modifiers are implemented assuming that SDWAB VOPC instruction
comparison types may be U32, I32, F32, U16, I16, F16 (which exist), but
the helper is extensible to I8, U8, or F8.
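
The lambda-based structure can be sketched as below. This is a deliberately simplified, hypothetical helper: it ignores operand selection and all SDWA modifiers, and its name and signature are not those of the gem5 helper.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, far simpler than real SDWAB handling: it only shows
// how one generic routine can serve many comparisons via a lambda.
template <typename T, std::size_t N, typename Cmp>
uint64_t compareLanes(const std::array<T, N> &src0,
                      const std::array<T, N> &src1, Cmp cmp)
{
    static_assert(N <= 64, "result mask holds at most 64 lanes");
    uint64_t mask = 0;
    for (std::size_t lane = 0; lane < N; ++lane) {
        // cmp returns 0 or 1; place the result in this lane's bit.
        mask |= static_cast<uint64_t>(cmp(src0[lane], src1[lane]) & 1)
                << lane;
    }
    return mask;
}
```

Two instructions such as an equality and an inequality compare would then differ only in the lambda they pass.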

Change-Id: Idab58a327c29dd19a1a5457237f3799a04f2031b
2024-07-25 16:18:13 -07:00
Matthew Poremba
a7bc4ca19a arch-vega: Fix unconditional clamps in VOP3 (#1379)
Some instructions are clamping floating point outputs unconditionally,
leading to incorrect results. This commit finds instructions with this
issue and checks the clamp bit before applying clamp.
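
A minimal sketch of the fixed behavior, assuming the floating-point clamp range is [0.0, 1.0]. The helper name is hypothetical.

```cpp
#include <algorithm>

// Hypothetical helper: clamp a float result to [0.0, 1.0] only when the
// instruction's clamp bit is set; otherwise pass the value through.
inline float applyClamp(float value, bool clampBit)
{
    return clampBit ? std::min(std::max(value, 0.0f), 1.0f) : value;
}
```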

Change-Id: Ibc6de3813d81fd4f9d2c98dd497d19dd34cf6bde
2024-07-25 08:06:00 -07:00
Matthew Poremba
7dae1a1d25 arch-vega: Multiple SOPC fixes (#1366)
Make S_CMP_LT_U32 use < instead of <=. Change types of EQ / LG for U64
to be U64.

Change-Id: Ib0b3b7a46ba1aff16a6d439302ca087d988d6417
2024-07-23 12:45:52 -07:00
Ivana Mitrovic
82c91e8edb arch-riscv: Improve widening/narrowing vectors overlap check (#1331)
This PR improves the vector register groups overlap check in
widening/narrowing instructions.

- Fix wrong illegal overlap condition between VS2 and VD vector register
groups.
- Also check VS1 vector register group for overlap with VD in
vector-vector instructions.
- Parametrize widening/narrowing factors in overlap check function to
potentially handle more cases.

Fixes issue #442.
2024-07-22 10:54:02 -07:00
Alexander Richardson
fc59109429 arch,arch-arm: Fix remaining implicit float conversion warnings in .isa (#1327)
This fixes the remaining implicit int/float conversions and enables the
float conversion warnings for clang when building the Arm instruction
execution logic. This depends on the previous fixes.

Change-Id: I51aac94644a483175842c36da2d49d308aaceb49
2024-07-18 10:43:12 -07:00
Robert Hauser
9b8c84cb5d arch-riscv: Overwrite getEMI() for timing expr (#1346)
TimingExpression enables runtime calculation of the commit latency in
MinorCPU. For this, machInst is obtained by getEMI() to match it with a
given instruction. By default, getEMI() always returns 0, so it is
overridden here to enable timing expressions for RISC-V. This was
already done for ARM (see src/arch/arm/insts/static_inst.hh).

Change-Id: I03d669b3439fd24e00cbf893f5db9951dfe56b1f

Signed-off-by: Robert Hauser <robert.hauser@uni-rostock.de>
2024-07-12 20:52:24 -07:00
Robert Hauser
5e5e8fb9c6 arch-riscv: Update local interrupts citation (#1347)
Updated the bib information of the local RISC-V interrupts.

Change-Id: I666c3df4529e159bd1946ca1a9623e47f84d5d9e

Signed-off-by: Robert Hauser <robert.hauser@uni-rostock.de>
2024-07-12 20:51:49 -07:00
Tommaso Marinelli
e3b41291da arch-riscv: Check VS1 group for overlap when widening/narrowing
Currently, only the VS2 register group is checked for overlap with VD
when executing a widening/narrowing instruction. This commit extends
the check to VS1, when applicable (i.e. vector-vector operations).

Change-Id: I892b7717c01e25546fb41e05afbd08fc40c60c59
2024-07-12 01:17:14 +00:00
Tommaso Marinelli
a8b7e9727d arch-riscv: Generalize widening/narrowing vectors overlap check
As of now, the widening/narrowing vector register groups overlap check
always assumes a SEW multiplication factor equal to 2 (for either VD or
VS2). This commit makes the check more generic.

Change-Id: I4311fc3624cd324ccfdf2a1920a19efc85357120
2024-07-12 01:17:14 +00:00
Tommaso Marinelli
5b693fd8b6 arch-riscv: Remove duplicate line
Change-Id: I32200aad5a59c9fd85f6ed783a4cebb841bf6ff1
2024-07-12 01:17:14 +00:00
Tommaso Marinelli
fbe6985365 arch-riscv: Fix widening instructions vectors overlap check
This commit fixes the overlap check between VS2 and VD register groups
in vector widening instructions. While the narrowing instructions check
is correct, the widening one has to differentiate between two cases
(Vs2 EEW = 2*SEW and Vs2 EEW = SEW). In the first case, overlap is
allowed, as the EEW is the same as Vd. In the second case, the overlap
legality check has to be adapted to use the Vs2 EMUL to calculate the
boundaries. The rule has been derived again from Section 5.2 of RISC-V
"V" Vector Extension specifications, version 1.0.

The patch also includes some small code refactoring, e.g. using
already defined vlmul and constants for vector operands.
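
The widening case of the overlap rule can be sketched as follows, based on my reading of Section 5.2: when the destination EEW is greater than the source EEW, overlap is legal only if the source EMUL is at least 1 and the overlap is in the highest-numbered part of the destination group. The function names are hypothetical and only whole-register groups are modelled.

```cpp
// Hypothetical sketch with whole-register groups only.
// True if register groups [a, a + aRegs) and [b, b + bRegs) share a register.
constexpr bool groupsOverlap(unsigned a, unsigned aRegs,
                             unsigned b, unsigned bRegs)
{
    return a < b + bRegs && b < a + aRegs;
}

// Legality of a widening instruction whose destination group starts at vd
// and whose narrower source group starts at vs.
constexpr bool wideningOverlapLegal(unsigned vd, unsigned vdRegs,
                                    unsigned vs, unsigned vsRegs,
                                    bool vsEmulAtLeastOne)
{
    if (!groupsOverlap(vd, vdRegs, vs, vsRegs))
        return true;
    // Overlap is only allowed at the top end of the destination group.
    return vsEmulAtLeastOne && vs >= vd && vs + vsRegs == vd + vdRegs;
}
```

For example, with a destination group v2-v3 and a one-register source, a source of v3 is accepted while a source of v2 is rejected.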

Fixes issue #442.

Change-Id: Ic87095fb9079e6c8f53b9a0d79fbf531a85dc71d
2024-07-12 01:17:14 +00:00
Saúl
8dde32d2dc arch-riscv: fix initialization for some vector reduction insts (#1340)
Vector reduce float (widening and non-widening) and integer (widening)
instructions initialize the reduce loop operation with the first element
of the destination register (i.e. `Vd[0]`).

Since all reductions per spec seem to be `Vd[0] = Vs1[0] + Vs2[*]`
(where `+` is an arbitrary binary op and `*` indicates all active
elements) gem5 will calculate this incorrectly if `Vd[0]` and/or
`Vs1[0]` are non-neutral for the operation (the latter case being because
it's not taken into account at all).

To solve this we just have to initialize the reduction loop to `Vs1[0]`
(the non-widening integer reduction already does this).
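
A minimal sketch of the corrected initialization for a sum reduction. The function is hypothetical and models masking with a plain vector of booleans.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of a sum reduction: Vd[0] = Vs1[0] + sum(active Vs2).
inline int64_t reduceSum(int64_t vs1_0, const std::vector<int64_t> &vs2,
                         const std::vector<bool> &active)
{
    int64_t acc = vs1_0; // start from Vs1[0], not from the old Vd[0]
    for (std::size_t i = 0; i < vs2.size(); ++i) {
        if (active[i])
            acc += vs2[i];
    }
    return acc;
}
```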
2024-07-10 22:08:49 -07:00
Yu-Cheng Chang
d54dcac393 arch-riscv: Fix setRegs from GDB failed after #1099 (#1291)
gem5 crashed when a user tried to update a register value from GDB because
PR [1] changed the index of CSR_XSTATUS to MISCREG_XSTATUS, which is out
of NUM_PHYS_MISCREGS.

CSR_XSTATUS should be updated using setRegWithMask.

[1] : https://github.com/gem5/gem5/pull/1099

gem5 issue: https://github.com/gem5/gem5/issues/1299

Change-Id: Iefc0d1f5adfb98ecfda0e74907964b47d1864b6d
2024-07-09 15:55:35 -07:00
Jason Lowe-Power
d20512c291 arch-riscv: add agnostic option to vector tail/mask policy for mem and arith instructions (#1135)
These two commits add agnostic capability for both tail/mask policies,
for vector memory and arithmetic instructions respectively. The common
policy for instructions is to act as undisturbed if either policy (tail or
mask) is undisturbed, or to write all 1s if neither is.

For those instructions in which multiple micro instructions are
instantiated to write to the same register (`VlStride` and `VlIndex` for
memory, and `VectorGather`, `VectorSlideUp` and `VectorSlideDown` for
arithmetic), a (new) micro instruction named `VPinVdCpyVsMicroInst` has
been used to pin the destination register so that there's no need to
copy the partial results between them. This idea is similar to what is
done in ARM's SVE code. This micro instruction also implements the
tail/mask policy for these cases.

Finally, it's worth noting that while now using an agnostic policy for
both tail/mask should remove all dependencies with old destination
registers, there's an exception with `VectorSlideUp`. The
`vslideup_{vx,vi}` instructions need the elements in the offset to be
unchanged. The current implementation overrides the current vta/vma and
makes them act as undisturbed, since they require the old destination
register anyway. There's a minor issue with this though: the
`v{,f}slide1up` variants do not need this, but since they share the same
constructor, they behave the same way.

Related issue #997.
2024-07-08 11:47:11 -07:00
Giacomo Travaglini
d825103df2 arch-arm: Implement FEAT_TTST (#1323)
Implement the small translation table extension.
This feature relaxes the lower limit on the size of the translation
tables by increasing the maximum permitted values of the T1SZ and T0SZ
fields in TCR_EL1, TCR_EL2, TCR_EL3, VTCR_EL2 and VSTCR_EL2.

Change-Id: I4c2187815b2d7f14407edb38095c6bcc2004b62a

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-04 09:37:41 +01:00
Giacomo Travaglini
c9d9108978 arch-arm: MISCREG_AT_S1E2R/W are executable from S state (#1322)
Change-Id: Ieaebdf0d62b5115f8085f478b2da105633b6a26a

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-04 09:37:17 +01:00
Giacomo Travaglini
f3e3c60805 arch-arm: Proper support for NonSecure IPA space in Secure state
Change-Id: Ie2e2278ecdc5213db74999e3561b2918937c2c2e
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-02 13:16:13 +01:00
Giacomo Travaglini
eb400e773b arch-arm: Remove makeStage2 from TLBIOp
Change-Id: I25276e4b5b7c491e69208044ceb193c67ddfd91c
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-02 13:15:49 +01:00
Giacomo Travaglini
49ca08b01a arch-arm: Add isStage2 qualifier to the LongDescriptor
We are currently using the LongDescriptor for both stage1
and stage2 translations. There are several cases where
the bitfield meaning changes depending on the translation
stage.

Change-Id: Ic33d9ef225a57fd79ce2b4bf47896aeb6bdd8d9c
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-02 13:15:31 +01:00
Giacomo Travaglini
9cce68ca71 arch-arm: Replace isSecure boolean with SecurityState enum
Change-Id: If01b8b2811b2c028e669ea3700174c7945b07a06
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-02 12:45:24 +01:00
Alexander Richardson
d5c0383887 arch-arm: support 64-bit PMCCNTR from AArch32 (#1304)
For ARMv8 CPUs this register allows reading a 64-bit cycle counter
from 32-bit execution state.

Change-Id: I7cd9e2711ada5156920440cc3c89e7a74ca54a49
2024-07-02 08:59:44 +01:00
Giacomo Travaglini
b28659d4f9 arch-arm: Implement FEAT_XS (#1303)
This patch adds a functional implementation of FEAT_XS. Unless we
operate with DVM enabled, TLBI broadcasting is accomplished in zero time,
so there is no timing benefit introduced by enabling FEAT_XS other than
the way it affects TLB management (invalidation).

Change-Id: I067cb8b7702c59c40c9bbb8da536a0b7f3337b5d

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-07-02 08:52:59 +01:00
Rajesh Shashi Kumar
3ce5e0584a arch-arm: This commit fixes a typo in the ARM ldaddalx instruction (#1279)
The acquire-release flavor of the ldadd instruction should read ldaddalx
(eg. ldaddalb/ldaddalh) according to specification. However, this is
currently noted as ldadd"la"x (eg. ldaddlab/ldaddlah).

Issue: https://github.com/gem5/gem5/issues/1224
Change-Id: Ib932fa0e572207729c923c27f24c34cc21dff0e5

Co-authored-by: Bobby R. Bruce <bbruce@ucdavis.edu>
2024-06-26 09:03:50 -07:00
Saúl Adserias
99f58d37da arch-riscv: add agnostic opt to vector tail/mask for arith insts
Change-Id: I693b5f3a6cc8a8f320be26b214fd9b359e541f14
2024-06-24 10:03:52 -07:00
Saúl Adserias
73c364519a arch-riscv: add agnostic opt to vector tail/mask for mem insts
Change-Id: I567a110806b77d5576810706bd3e30185b0e0b75
2024-06-24 10:03:52 -07:00
Jason Lowe-Power
013f773d31 arch-riscv: Fix TLB lookup with vaddrs (#1264)
Previously, all of the TLB lookup/insert functions were using the full
virtual addresses even though the variables in the functions said "vpn."
This change explicitly converts the virtual address to the VPN without
any least significant zeros for the offset. I.e., vpn >> page_size.
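
The idea can be sketched as follows, assuming 4 KiB base pages (12 offset bits) and a 16-bit ASID. The key layout and function names are illustrative assumptions, not gem5's actual TLB key format.

```cpp
#include <cstdint>

// Hypothetical sketch; assumes 4 KiB base pages and a 16-bit ASID.
constexpr unsigned kPageShift = 12;

// Drop the page-offset bits so no low-order zeros remain.
constexpr uint64_t vpnOf(uint64_t vaddr)
{
    return vaddr >> kPageShift;
}

// Build a lookup key in which ASID and VPN occupy disjoint bit ranges.
// Masking the VPN to 48 bits keeps sign-extension ones out of the ASID field.
constexpr uint64_t tlbKey(uint64_t vaddr, uint16_t asid)
{
    constexpr uint64_t vpnMask = (1ULL << 48) - 1;
    return (static_cast<uint64_t>(asid) << 48) | (vpnOf(vaddr) & vpnMask);
}
```

Keeping the two fields disjoint means an address whose upper bits are all ones cannot make keys with different ASIDs compare equal.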

The main bug fixed in this changeset is that the ASID was ORed with the
upper bits of the virtual address, but sometimes those bits were all 1s.
Therefore, you could get a TLB hit even if the ASID was different.
Interestingly, the page that seemed to cause these issues was a 1 GiB
page.

This change also starts refactoring some of the page table details to
support sv46 and sv57 page table formats.

In my testing, the Linux kernel boot uses large pages (even OpenSBI uses
large pages), so it seems that large pages also work. However, this
seems like magic to me, so I'm not sure if it's correct.

This change also updates some asserts, and debug statements with more
useful debugging information.

Partially fixes #1235. More testing needs to be done to be confident.
2024-06-20 13:24:50 -07:00
Hoa Nguyen
15e0236a8b arch,cpu,sim: Add mechanism to partially print vector regs (#1234)
Currently, gem5's inst tracer prints the whole vector register container
by default. The size of vector register containers in gem5 is the
maximum size allowed by the ISA. For vector-length agnostic (VLA) vector
registers, this means ARM SVE vector container is 2048 bits long, and
RISC-V vector container is 65535 bits long. Note that VLA implementation
in gem5 allows the vector length to be varied within the limit specified
by the ISAs.

However, in most use cases of gem5, the vector length is much less than
65535 bits. This causes two issues: (1) the vector container requires
allocating and moving around a large amount of unused data while only a
fraction of it is used, and (2) printing the execution trace of a vector
register results in a wall of text with a small amount of useful data.

This change addresses problem (2) by providing a mechanism to limit
the amount of data printed by the instruction tracer. This is done by
adding a function printing the first X bits of a vector register
container, where X is the vector length determined at runtime, as
opposed to the vector container size, which is determined at compilation
time.
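
A rough sketch of such a function, assuming the container is a byte vector and the runtime vector length is a multiple of 8 bits. The name and output format are hypothetical and do not reflect the tracer's actual format.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: format only the first `vlenBits` bits of a register
// container as hex bytes. Assumes vlenBits is a multiple of 8.
inline std::string printFirstBits(const std::vector<uint8_t> &container,
                                  std::size_t vlenBits)
{
    const std::size_t bytes = std::min(vlenBits / 8, container.size());
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < bytes; ++i)
        out << std::setw(2) << static_cast<unsigned>(container[i]);
    return out.str();
}
```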

Change-Id: I815fa5aa738373510afcfb0d544a5b19c40dc0c7

---------

Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2024-06-17 14:05:47 -07:00
Hoa Nguyen
500da4306b arch: Mark FailUnimplemented instructions as Invalid instructions (#1247)
This is a follow-up on the discussion here [1].

The IsInvalid flag was previously defined as marking an instruction that
does not appear in the ISA. However, a micro-architecture can choose not
to recognize an instruction and raise an illegal instruction fault even
if the instruction is in the ISA.

This change modifies the definition of an Invalid instruction such that,
if a StaticInst instruction is marked as IsInvalid, it means the
instruction is not recognized by the decoder. This means that any
instruction recognized by the decoder is not invalid, even if the
instruction is not in the official ISA spec; e.g., m5
pseudo-instructions.

Note that instructions that are recognized by the decoder but are chosen
to act as a nop are not invalid. This applies to WarnUnimplemented
instructions, e.g. hint instructions.

[1] https://github.com/gem5/gem5/pull/1071

Change-Id: I1371b222d8b06793d47f434d0f148c5571672068

Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2024-06-17 12:44:05 -07:00
Matthew Poremba
2f5842d253 arch-vega: Add valid flag to ds_swizzle_b32
Currently the flag is just Load and there is a long comment explaining
why. This does not meet any of the scoreboard check requirements:

https://github.com/gem5/gem5/blob/develop/src/gpu-compute/scoreboard_check_stage.cc#L230-L241

Add a generic ALU flag as well so the instruction executes instead of
panicking.

Change-Id: I54b2d20d47fad5e8f05f927328433aab7db7d862
2024-06-15 14:28:59 -07:00
Matthew Poremba
42369eab2c arch-vega: Implement MI300 FLAT SVE bit
For scratch instructions only, this bit specifies if an offset in a VGPR
should be used for address calculation. This is new in MI300 and was
previously the LDS bit. The LDS bit is rarely used and in fact gem5 does
not even check this bit.

This fixes a bug when SADDR == 0x7f (i.e., no SGPR should be used) where
a VGPR was being added to the address when it should have been ignored.

Change-Id: I9864379692df6795b25b58b98825da05d18fc5db
2024-06-15 14:28:59 -07:00
Matthew Poremba
1dab4be002 arch-vega: Implement VOP3 V_FMAC_F32
A version of V_FMAC_F32 with extra modifiers from VOP3 format.

Change-Id: Ib6b41b0a3ceb91269b91a0287dfc94bc73e4d217
2024-06-15 14:28:58 -07:00
Matthew Poremba
f91d14fe46 gpu-compute: Add MFMA stats (#1248)
Add dynamic instruction counts for MFMAs.

Change-Id: I976b01344577cf011aeb3dd648a8c0017281c4e3
2024-06-15 13:04:00 -07:00
Hoa Nguyen
d528a6bd2d arch: Flag all ISAs Unknown instruction as IsInvalid
Change-Id: I096138a157c4e2063c5f4f4324c21c1463dddb65
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2024-06-11 18:48:29 +00:00
Alexander Richardson
3cfc550fc0 arch-arm,mem: Don't hardcode secure mode accesses for semihosting (#1200)
When accessing memory using functionalAccess(), the MMU could tell us to
use a nonsecure access even though the CPU is operating in secure mode.
I noticed this while trying to run a simple semihosting hello world with
the MMU+caches enabled and the semihosting calls ended up reading from
memory instead of the caches due to an S/NS mismatch.

See also https://github.com/gem5/gem5/pull/1198 which happens to also
mask the issue I saw, but I believe both changes are needed.

Change-Id: I9e6b9839b194fbd41938e2225449c74701ea7fee
2024-06-09 14:08:54 -07:00
Saúl
5cfad84a98 arch-riscv: correctly set dynamic VLEN for all arith instructions (#1187)
Some arithmetic instructions of the RISC-V vector extension were still
using the default VLEN=256 instead of the dynamic one through the
inherited `vlen` attribute. Most of them only use this to calculate the
effective index for the mask element like so:

```
uint32_t ei = i + vtype_VLMAX(vtype, vlen, true) * this->microIdx;
if (this->vm || elem_mask(v0, ei)) {
...
```

This means that instructions will wrongly compute the mask index in the
second and subsequent micro instructions (`microIdx` > 0). This commit
fixes this by adding the corresponding `set_vlen` snippet to the
affected instruction formats.
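
The effect of a stale VLEN on the mask index can be seen in a small sketch. The helpers below are hypothetical simplifications that assume the per-register element count is VLEN / SEW; they are not gem5's vtype_VLMAX.

```cpp
#include <cstdint>

// Hypothetical simplification: elements per vector register for a given
// VLEN and SEW (both in bits). Not gem5's vtype_VLMAX.
constexpr uint32_t elemsPerReg(uint32_t vlen, uint32_t sew)
{
    return vlen / sew;
}

// Mask element index for element `i` of micro-op `microIdx`.
constexpr uint32_t effectiveIndex(uint32_t i, uint32_t microIdx,
                                  uint32_t vlen, uint32_t sew)
{
    return i + elemsPerReg(vlen, sew) * microIdx;
}
```

With SEW=32, element 3 of the second micro-op maps to mask index 11 under VLEN=256 but to 19 under VLEN=512, while the first micro-op is unaffected.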

Change-Id: Ib041de972d6938490741a9fb4c214a6a5172c34e
2024-06-07 22:33:56 -07:00
Alexander Richardson
ec5881ec4e arch-arm: avoid using an uninitialized variable use in MMU walks (#1198)
While running a simple Arm32 binary, I noticed that all memory
transactions were being marked as NS instead of S once I turn on the MMU
(even though the page tables have the NS bit set to zero). The result of
this was that semihosting calls were failing since they were using
functional accesses with the SECURE flag set, but the caches only
contained NS tagged entries so these accesses always read stale values
from DRAM.

Digging through the Arm MMU code, it appears that the NS bit lookup was
being keyed off the `secureLookup` flag, which is only used for long
descriptors. I believe 0c28712f51 should have used isSecure instead of
secureLookup. To avoid using these uninitialized values in the future I
wrapped the LPAE state in a std::optional to ensure that it is only
accessed once initialized.
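
The std::optional approach can be sketched as below. The struct and field names are hypothetical stand-ins for the walker state, and the short-descriptor logic is a simplified assumption.

```cpp
#include <optional>

// Hypothetical stand-in for walker state that only exists for long
// descriptors (LPAE); field names are illustrative.
struct LongDescStateSketch
{
    bool secureLookup = false;
};

struct WalkStateSketch
{
    bool isSecure = false;
    std::optional<LongDescStateSketch> longDesc; // empty until initialized
};

// Short-descriptor path: derive NS from isSecure, never from longDesc.
inline bool nonSecureForShortDesc(const WalkStateSketch &ws)
{
    return !ws.isSecure;
}

// Long-descriptor path: value() throws std::bad_optional_access if the
// state was never initialized, instead of silently reading garbage.
inline bool secureLookupForLongDesc(const WalkStateSketch &ws)
{
    return ws.longDesc.value().secureLookup;
}
```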

Change-Id: Ibc406ed3f4cfa768f470e34a5eca3c1a2bf45cd8
2024-06-07 08:59:28 +01:00
Alexander Richardson
8e5fbcbbbb arch-generic: flush streams after semihosting write calls (#1202)
The SYS_WRITEC and SYS_WRITE0 calls are specified as writing to the
debug channel, so it is a reasonable expectation for these messages to
be visible immediately after the semihosting call.

Change-Id: I8e6e9a7aab593a59e82ecb9cf4603c18c7a8acbe
2024-06-06 09:57:36 +01:00
Yu-Cheng Chang
5d3f1c3316 arch-riscv: Add rvZext to BranchTarget (#1173)
Ensure the upper xlen bits are all zeros

Change-Id: Id81330eced907d21320bc1af85ad38fb6e95f6b1
2024-06-03 10:03:51 -07:00