M5Ops C / C++ functions partially use 64 bit arguments and return value.
In general, 64 bit arguments and return values are possible for 32 bit
RISC-V systems as well, since the arguments and the return value is
split into two registers. However, at the moment, this does not work for
32 bit RISC-V systems on the simulator side, since there is a one to one
mapping between argument registers and m5op function parameters.
To solve this problem, the get() function of the RISC-V reg_abi is
updated. It now will merge two registers if there is a 64 bit argument.
For this, the function code has to be passed to the get() function. The
default value of this function code is set to 0xF00, since 0x00 is
already used for M5_ARM. The parameter list of other get() functions for
argument return is also extended by this function code parameter with
the keyword [[maybe_unused]].
To enable a return value of size 64 bit, a0 is assigned with the lower
32 bit and a1 with the higher 32 bit.
Related Issue: https://github.com/gem5/gem5/issues/881
The `reset_vect` has exist for a long time and `reset_vect` will not
effect if the user gonna to use customized reset_vect. The CL added the
`auto_reset_vect` to let the config determine the `reset_vect` from
workload entry point or user-specified
Ref: https://gem5-review.googlesource.com/c/public/gem5/+/42053
Change-Id: I928c0dc42aaa85ceabf8d75f9654486496e0ffee
Currently, the citation string has a Unicode character. This works well
in gem5, but it breaks the gem5+SST simulation [1]. This change modifies
the letter "u" with umlaut to use TeX's escape sequence for this letter
instead of using the UTF-8 character.
[1] https://github.com/gem5/gem5/issues/982
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
A constructor is added to GuestAddr as suggested in the pull request
feedback. This allows a cast conversion from uint64_t GuestAddr. Hence,
the casting from uint64_t to GuestAddr by reinterpret_cast is removed
(was added in a previous commit).
using namespace pseudo_inst is also removed as requested.
Comments are added to GuestAddr.
Change-Id: Ib76de2bff285f4e53ad03361969c27f7bb2dfe9e
AMD's MI100 introduced a new register file called accumulation registers
for the matrix cores. In MI200 these were recombined into the same
register file according to the documentation. The accumulation register
file is the same size as the architectural register file, hence the size
is doubled.
The ISA spec does not explicitly state the register selector values,
however it does say that the accumulation offset from the kernel
dispatch packet should be added to the architecture register file
selector number when an instruction sets the ACC bit. Therefore we can
infer that the value must simply be an extension beyond the
architectural VGPRs.
This fixes errors of the form "invalid register selector: 512" (or
higher value). This was tested with the Learn the Basics tutorial
example on pytorch.org
Change-Id: I48ced1532fc166d2f5032fe21fbeba70ac77f258
The existing implementation of vfmv instruction did not type cast the
first element of the source vector, which caused the "freg" to interpret
the result as a NaN.
With the type cast to f32, the value is correctly recognized as float
and sign extended to be stored in the fd register.
Git issue: https://github.com/gem5/gem5/issues/827
Change-Id: Ibe9873910827594c0ec11cb51ac0438428c3b54e
---------
Co-authored-by: Debjyoti B <bhatta53@imec.be>
Co-authored-by: Tommaso Marinelli <tommarin@ucm.es>
This commit adds support for vector unit-stride segment store operations
for RISC-V (vssegXeXX). This implementation is based in two types of
microops:
- VsSegIntrlv microops that properly interleave source registers into
structs.
- VsSeg microops that store data in memory as contiguous structs of
several fields.
Change-Id: Id80dd4e781743a60eb76c18b6a28061f8e9f723d
Gem5 issue: https://github.com/gem5/gem5/issues/382
Implement several features new in ROCm 6.0 and features required for
future devices. Includes the following:
- Support for multiple command processors
- Improve handling of unknown register addresses
- Use AddrRange for MMIO address regions
- Handle GART writes through SDMA copy
- Implement PCIe indirect reads and writes
- Improve PM4 write to check dword count
- Implement common MI300X instruction
The main decoder for GPU instructions looks at the first 9 bits of a
dword to determine either the instruction or a subDecode table with more
information for specific instructions types. For flat instructions the
first 9 bits currently consist of 6 fixed encoding bits, a reserved bit,
and the first two bits of the opcode. Hence to support all opcodes there
are four indirections to the flat subDecode table. In MI300 the reserved
bit is part of a field to determine memory scope and therefore may be
non-zero.
This commit adds four addition calls to the subDecode table for the
cases where the scope bit is 1. See page 468 (PDF page 478) below:
https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/
instruction-set-architectures/
amd-instinct-mi300-cdna3-instruction-set-architecture.pdf
Change-Id: Ic3c786f0ca00a758cbe87f42c5e3470576f73a32
MI200 adds support for four FP32 packed math instructions. These are
VOP3P instructions which have a negative input modifier field. The
description made it unclear if these were used for F32 packed math
however the assembly of some Tensile kernels are using these modifiers
therefore adding support for them. Tested with PyTorch nn.Dropout kernel
which is using negative modifiers.
Change-Id: I568a18c084f93dd2a88439d8f451cf28a51dfe79
The datatype is U32 but should be F32. This is causing an implicit cast
leading to incorrect results. This fixes nn.Dropout in PyTorch.
Change-Id: I546aa917fde1fd6bc832d9d0fa9ffe66505e87dd
In the RISC-V unprivileged spec[1], the misaligned load/store support is
depend on the EEI.
In the RISC-V privileged spec ver1.12[2], the PMA specify wether the
misaligned access is support for each data width and the memory region.
In the [3] of `mcause` spec, we cloud directly raise misalign exception
if there is no memory region misalignment support. If the part of memory
region support misaligned-access, we need to translate the `vaddr` to
`paddr` first then check the `paddr` later. The page-fault or
access-fault is rose before misalign-fault.
The benefit of moving check_alignment option from ISA option to PMA
option is we can specify the part region of memory support misalign
load/store.
MMU will check alignment with virtual addresss if there is no misaligned
memory region specified. If there are some misaligned memory region
supported, translate address first and check alignment at final.
[1]
https://github.com/riscv/riscv-isa-manual/blob/main/src/rv32.adoc#base-instruction-formats
[2]
https://github.com/riscv/riscv-isa-manual/blob/main/src/machine.adoc#physical-memory-attributes
[3]
https://github.com/riscv/riscv-isa-manual/blob/main/src/machine.adoc#machine-cause-register-mcause
Only the lower 32 bit of return values of pseudo instructions are
stored (in a0). Therefore, the upper 32 bit are stored in a1 to
enable a correct return value.
Change-Id: Idf33c325033281fc191a9285eb5d34fd4965cde9
With the previously introduced struct wrapper GuestAddr, the asm
tests fail. This patch substitutes implements SyscallABI32 similar
to RegABI32, i.e., as a struct based on GenericSyscallABI32.
Furthermore, a get function for arguments is implemented for wide
arguments. It returns the lower 32 bits of a register.
Change-Id: I233a67a5d5c15ab0d019a63bc57f1225288e33cc
In this patch, Addr is subtituted by a struct wrapper (uint64_t) in the
pseudo instruction functions. This enables a correct argument handling
in systems where pointer size != 64 bit.
Change-Id: Ie84b43b4ab8e6c0d38c7b6b16e19fc043110681b
This commit adds support for vector unit-stride segment load operations
for RISC-V (vlseg<NF>e<X>). This implementation is based in two types of
microops:
- VlSeg microops that load data as it is organized in memory in structs
of several fields.
- VectorDeIntrlv microops that properly deinterleave structs into
destination registers.
Gem5 issue: https://github.com/gem5/gem5/issues/382
With the introduction of multi-ISA gem5, we don't store the TARGET_ISA
anymore as a string in the root section of the checkpoint [1]. There is
therefore no way at the moment to asses the ISA of a CPU/ThreadContext.
This is a problem when it comes to checkpoint updates which are ISA
specific.
By explicitly serializing the ISA as a string under the cpu.isa section
we avoid this problem and we let cpt_upgraders be aware of the ISA in
use.
[1]: https://gem5-review.googlesource.com/c/public/gem5/+/48884
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Change-Id: I1e75230cbc370cab84f4a54141b1e425af2dbfac
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
This commit adds statistics showing completed page walks for 4KB and 2MB
pages. This will add to stats.txt the variables num_4kb_walks,
num_2mb_walks and the corresponding values. This is done based on the
level of page table walk traversed specific to Sv39 Virtual Memory
System.
This PR implements a few changes related to the accumulation offset
which is new in MI200. Previously MI100 contained two vector register
files: the architectural and accumulation register files. These have now
been unified and the architectural register file is twice the size. As a
result of this the dispatch packet set an offset into the unified vector
register file for where the former accumulation registers would go. The
changes are:
- Calculate the accumulation offset from dispatch packet and store in
HSA task.
- Update the accumulation move instructions (v_accvgpr_read/write) to
use it.
- Update the current MFMA instructions to use it.
- Make the MFMA examples more clean.
Initialize x86 process' max stack size to the value given in the process
params, rather than hard-coding it to 8 MB, which made it impossible to
run x86 programs requiring more than 8 MB of stack.
Change-Id: I0b17fe60b016b1e4a82d704ef7ad367974ea6a08
This commit update the two exiting MFMA instructions to support the
accumulation offset for A, B, and C/D matrix. Additionally uses array
indexed C/D matrix registers to reduce duplicate code. Future MFMA
instructions have up to 16 registers for C/D and this reduces the amount
of code being written.
Change-Id: Ibdc3b6255234a3bab99f115c79e8a0248c800400
The accum offset is used as an index into the unified VGPR register file
in MI200 and is not the same as a move if accum_offset in the dispatch
packet is non-zero.
Change these instructions to use the stored accum_offset value.
Change-Id: Ib661804f8f5b8392e4c586082c423645f539e641
According to the RISC-V spec [1]. Any float-point instructions
accumulate FFLAGS register rather than write it to reflect the CSR
behavior.
In the previous implementation. We read the FFLAGS, set the exception
flags, and write the result back to the FFLAGS. This works in the gem5
simple and minor CPU model as they are actually written to `regFile`
after executing the instructions. However, in the gem5 O3 CPU model, it
will record in the `destMiscReg` buffer until the commit stage when
writing to the `miscReg` in the execution stage. The next instruction
will get the old FFLAGS and cause the incorrect result.
The CL introduced the `MISCREG_FFLAGS_EXE` and used the same size of
`miscRegFile` because the `MISCREG_FFLAGS_EXE` and `MISCREG_FFLAGS`
shared the same space. When executing the float-pointing instruction,
any exception flags should be updated via `MISCREG_FFLAGS_EXE` to
accumulate the FFLAGS in `setMiscReg` method. For the MISCREG_FFLAGS, it
should only be called in the CSROp.
[1] Syntactic Dependencies: Appendix A
c80ecada1c/src/mm-eplan.adoc (syntactic-dependencies-rules-9-11)
gem5 issue: https://github.com/gem5/gem5/issues/755
Change-Id: Ib7f13d95b8a921c37766a54a217a5a4b1ef17c6f
Fix#876. The x87 floating-point control word (FCW) was not initialized
at process startup in syscall emulation mode. This resulted in floating
point exceptions in KVM mode when executing x87 floating-point
instructions.
This patch fixes the bug by initializing FCW to its reset value, 0x37F.
Change-Id: Idd1573c6951524ef59466cc5c9f1e640ea7658ae
This PR is fixing https://github.com/gem5/gem5/issues/668. It fixes it
for all ISAs other than Arm with the first commit, which is setting the
number of architectural Matrix registers to 0 for those ISA which are
not using them.
It then partly fixes it for Arm as well with the 2nd commit: by removing
RenameMap::numFreeEntries we don't stall renaming unless a matrix
instruction is encountered... This means most binaries will run with SMT
as long as they don't use FEAT_SME instructions. Please note: this is
not simply a SMT fix, it will generally address a shortcoming in the way
we were renaming instructions.
If an Arm binary wants to use SMT with FEAT_SME, the 4th commit will
make sure the lack of physical registers is notified explicitly at the
beginning of simulation, rather than silently blocking renaming
Those are supposed to control trapping for accesses to debug registers
Change-Id: I4a25a379e718ea6d5ea8ae22ac7edbeb452d1836
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
`assert(interruptID >=0)` is always true as `interruptID` is an unsigned
int.
This was causing compilation tests failures in GCC-8 with the following
error:
```sh
src/arch/riscv/interrupts.cc:47:32: error: comparison is always true due to limited range of data type [-Werror=type-limits]
assert(interruptID >= 0);
```
Change-Id: I356be78d7f75ea5d20d34768fb8ece0f746be2fc
Previously, the S_ICACHE_INV instruction was unimplemented and
simulation panicked if it was encountered. This commit adds support for
executing the instruction by injecting a memory barrier in the scalar
pipeline and invalidating the ICACHE (or SQC)
Change-Id: I0fbd4e53f630a267971a23cea6f17d4fef403d15
The vlm.v and vsm.v unit-stride mask load/store instructions are
constructed with an incorrect VL when the current one is larger than
than VLEN/EEW (i.e. when LMUL > 1). This commit fixes the issue for both
instructions.
This patch provides unit-stride fault-only-first loads (i.e. vle*ff) for
the RISC-V architecture.
They are implemented within the regular unit-stride load (i.e. vle*). A
snippet named `fault_code` is inserted with templating to change their
behaviour to fault-only-first.
A part from this, a new micro based on the vset\*vl\* instructions
(VlFFTrimVlMicroOp) is inserted as the last micro in the macro
constructor to trim the VL to it's corresponding length based on the
faulting index.
This trimming micro waits for the load micros to finish (via data
dependency) and has a reference to the other micros to check whether
they faulted or not. The new VL is calculated with the VL of each micro,
stopping on the first faulting one (if there's such a fault).
I've tested this with VLEN=128,256,...,16384 and all the corresponding
SEW+LMUL configurations.
Change-Id: I7b937f6bcb396725461bba4912d2667f3b22f955
Vector unit-stride instructions have an EEW encoded directly in the instruction,
We should use that instead of SEW in vtype.
Change-Id: I282041ce8ed57fbcca899f7497ef6c6fb2dfcf85
Besides the standard RISC-V interrupts software, timer, and external
interrupt, the RISC-V specification also offers the possibility to
implement local interrupts. With this patch, we contribute an extension
of RiscvInterrupts that enables connecting interrupt sources to the
local interrupt controller. We assigned the local interrupts to
machine-level and gave them the highest priority. If two local
interrupts are pending, there exception code will be the tie-breaker
(higher ID > lower ID). 32 Bit systems only recognize the local
interrupts 16 to 31, 64 Bit systems 16 to 63.
Change-Id: Iff8d34e740b925dce351c0c6f54f4bd37a647e0c
---------
Co-authored-by: Robert Hauser <robert.hauser@uni-rostock.de>
The RISC-V privilege spec don't specify the implementation of
PMA(physical memory attribute), which is addressed in the previous
CL[1].
This CL creates the BasePMAChecker to support customized PMA so that we
can only focus on the features wanted in the study. The CL also leaves
the common methods `check` and `takeOverFrom` to make MMU easy to
interact with PMA.
[1] https://gem5-review.googlesource.com/c/public/gem5/+/40596
Change-Id: I9725e3a8f7f9276e41f0d06988259456149d2a77