The Vega instructions.cc file is 47k lines long which results in both
large compilation times whenever it is modified and long style check
times. This makes iterating over more complex instruction
implementations very time consuming.
This commit moves the instruction definitions to multiple files based on
the instruction encoding (SOP2, VOP2, FLAT, DS, etc.). The resulting
files are much smaller (max is 8k lines) and compilation and style check
times are much more reasonable. Other than moving code around, there are
no functional changes in this commit.
Change-Id: Id4ac8e98ef11a58de5fd328f8a0cd7ce60a11819
The addition of std::optional in #732 caused a compile error. This
change fixes the error by checking to see if the value is present and
panicing otherwise.
Change-Id: I46c3fb76eb0e14ba7bede7c336293fbe9add8c84
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
1. Add the new double width for int64_t and uint64_t
2. Use the wider type to get the upper result of multiplication
Change-Id: Id6cfa6f274c65592b2b3e2b70c00f82954b41f1a
I’ve been working on a fix for the issue #759 where ‘vd’ incorrectly
stores all zeros when ‘vl’ is set to 0 in VectorIntMaskMacroConstructor.
My solution seems to work, but it behaves differently from other macros
when ‘vl’ = 0. Instead of pushing a ‘nop’ to ‘microops’, it pushes a
micro operation that remains ineffective due to ‘vl’ being 0.
Newer compilers error on -Warray-length in the recent MI200 patches due
to casting from a 32-bit data type to a 64-bit type. Change it to cast
the 32-bit integer first then 64-bit integer latter to remove the
warning.
Rerun of validation tests on the three instructions passed.
Change-Id: I0309e5f7b5b8cc8ce1651660ddddb120fa6e7666
Currently, the TLB enforces that the bit 63 of a physical address to be
zero. This check stems from the riscv-tests that checks for the bit 63
of a physical address [1]. This is due to the fact that the ISA
implicitly says that the physical address must be zero-extended on the
most significant bits that are not translated [2]. More details on this
issue is here [3].
The check for bit 63 of a physical address in the TLB is rather too
specific, and I believe the check of invalid physical address is alread
implemented in PMA. Thus, this change proposes to remove this check from
RISC-V TLB.
[1]
bd0a19c136/isa/rv64mi/access.S (L18)
[2] https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/8kO7X0y4ubo
[3] https://github.com/gem5/gem5/issues/238
Change-Id: I247e4d4c75c1ef49a16882c431095f6e83f30383
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
The PMAChecker and PMP are only used in the RisvISA and it should be in
the RiscvISA to simply the implementation
Change-Id: I4968e2de4c028cb2dceed977f2173fc8b1efd175
This patch is amending encodeAArch64SysReg so that it covers the case
where there are no arch numbers available for the misc index passed as
an argument.
This could happen if the register ID is a gem5 pseudo register which is
not associated with any architected op1/op2/crn/crm tuple.
Rather than panicking we return a nullopt.
Change-Id: I7ab70467105ef93c0c78ac4e999c7dc8e5e09925
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Implemented according to the ISA spec. Validated with silion. In
particular the sign extend is important for the signed variants and the
unsigned variants seem to overflow lanes (hence why there is no mask()
in the unsigned varints. FP16 -> FP32 continues using ARM's fplib.
Tested vs. an MI210. Clamp has not been verified.
Change-Id: Ifc09aecbc1ef2c92a5524a43ca529983018a6d59
Starting with MI200, packed math can operate on double dword inputs. In
this case, 64-bits of inputs (two VGPRs per lane) contain two FP32
values.
Add instructions to perform add, multiply, and FMA on packed FP32 types.
Change-Id: Ib838bff91a10e02e013cc7c33ec3d91ff08647b0
This change adds all of the missing flat/global atomics up to including
the new atomics in gfx90a (MI200). Adds all decodings and instruction
implementations with the exception of __half2 which does not have a
corresponding data type in gem5. This refactors the execute() and
completeAcc() methods by creating helper functions similar to what
initiateAcc() uses. This reduces redundant code for global atomic
instruction implementations.
Validated all except PK_ADD_F16, ADD_F32, and ADD_F64 which will be done
shortly. Verified the source/dest register sizes in the header are
correct and the template parameters for the new execute()/completeAcc()
methods are correct.
Change-Id: I4b3351229af401a1a4cbfb97166801aac67b74e4
Use the opSelectorToRegSym which will print the full range of VGPRs
(e.g., will now print v[2:3] instead of v2 when the source / dest is
64-bits). Fixes atomic disassembly prints. Now shows "glc" if GLC bit is
enabled. Fixes some VGPR fields being printed as an SGPR in places where
the 9-bit register index bit is implied (e.g., VDST).
This makes it easier to use a GPUExec trace to match with LLVM
disassembly when debugging.
Change-Id: Ia163774850f0054243907aca8fc8d0361e37fdd5
This adds the VOP3P and VOP3P_MAI encodings from the MI200 spec. These
instructions are used for packed math and miSIMD instructions. The first
19 VOP3P opcodes are implemented and validated against hardware. This
includes all instructions which operate on one dword containing two
packed 16-bit values of fp16, int16_t, or uint16_t.
Implement one MFMA instruction for now which was also validated against
hardware.
This is useful in other ISAs to implement FP16 computation. For example,
it can be used in the GPU model. The ARM specific misc register is
ignored in that case.
Change-Id: I339ac0ccd9be4371b0f220ad99068e5e12b3d263
Because each vector load is fragmented into 64 byte cache-aligned
chunks, and one page-table walk is issued per fragment on tlb miss,
walks start to accumulate on a pending queue, which is processed in a
blocking way (no pending walks can be issued while one is being
processed). This adds noticeable latency on vector loads when VLEN is
sufficiently large.
This commit fixes the issue by allowing walks to be squashed if a TLB
lookup hits just before starting the walk on `startWalkWrapper`. This
idea was taken from the ARM walker.
Correct the instruction flags of RISC-V vector store instructions, such
as `vse64_v`, `vse32_v`. The `vse64_v` in `decoder.isa` is
`Mem_vc.as<uint64_t>()[i] = Vs3_ud[i];` and it will generate the code
`Mem.as<uint64_t>()[i] = Vs3[i];`. The current regex of assignRE only
mark the operand `Mem` as `dest` only if meet the formats like `Mem =
Rd` or `Mem[i] = Rd` because the code ` = Rd` or `[i] = Rd` match the
`assignRE` respectively. For the expression `Mem.as<uint64_t>()[i]`, the
operand `Mem` will falsely mark the operand as `src` because the code
`.as<uint64_t>()[i]` is not match the `assignRE`.
The PR will ensure the operand `Mem` is dest for the format like
`Mem.as<xxx>()[i] = yyy`.
Correct the instruction flags of RISC-V vector store instructions, such
as `vse64_v`, `vse32_v`. The `vse64_v` in `decoder.isa` is
`Mem_vc.as<uint64_t>()[i] = Vs3_ud[i];` and it will generate the code
`Mem.as<uint64_t>()[i] = Vs3[i];`. The current regex of assignRE only
mark the operand `Mem` as `dest` only if meet the formats like `Mem = Rd`
or `Mem[i] = Rd` because the code ` = Rd` or `[i] = Rd` match the
`assignRE` respectively. For the expression `Mem.as<uint64_t>()[i]`,
the operand `Mem` will falsely mark the operand as `src` because the
code `.as<uint64_t>()[i]` is not match the `assignRE`.
The PR will ensure the operand `Mem` is dest for the format like
`Mem.as<xxx>()[i] = yyy`.
Change-Id: I9c57986a64f1efb81eb9c7ade90712b118e0788d
This includes mla/s index version implementation using the existing template code
to avoid code repeatition.
Change-Id: If1de84e01dec638e206c979ca832308ebc904212
Inspired by a similar feature in ARM's full system workload, this change adds
an option to halt gem5 simulation if the guest system encounter kernel panic
or kernel oops.
On RiscvISA::BootloaderKernelWorkload, by default, the simulation
will exit upon kernel panic, while kernel oops will not induce simulation halt.
This is because the system will essentially do nop after a kernel panic, while the
system might be still functional after a kernel oops.
Dumping kernel's dmesg is useful for diagonizing the cause of kernel panic, so
ideally, we want to dump the guest's dmesg to the host. However, due to a bug
described in [1], kernel v5.18+ dmesg might not be dumped properly. Hence, the
dmesg will not be dumped to the host.
On RiscvISA::FsLinux, this feature is turned off by default as the symbols from the
official RISC-V kernel resource are stripped from the binary. However, if this feature
is enable, the dmesg will be dumped to the host system.
[1] https://github.com/gem5/gem5/issues/550
Change-Id: I8f52257727a3a789ebf99fdd4dffe5b3d89f1ebf
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
Currently, if the Capstone header file is found in the host system,
scons will try to build the ArmCapstoneDisassembler regardless of the
gem5 target ISA. This is causing problem when the host has Capstone, but
the gem5 target ISA is not arm. Compiling gem5 in this case will cause
errors, e.g., ArmISA and ArmSystem is not found.
This change aims to prevent building the ArmCapstoneDisassembler when
the gem5 target ISA is not arm.
Ref:
[1] The Arm Capstone PR https://github.com/gem5/gem5/pull/494
Change-Id: I1e714d34aec8fe2a2af8cd351536951053a4d8a5
This Pull-Request addresses gem5 Issue #550. The code that dumps the
Dmesg buffer is now templated on the two variants of the `Metadata`
structure, and the correct one is chosen based on the detected Kernel
version.
To support this functionality, the pull request also adds Symbol Size
data to the loader Symbol Table, and adds a method to query the Kernel
Version from the image in guest memory. The new attributes in the Symbol
class are de-serialized speculatively, so no checkpoint upgrader is
required to support this change.
This patch reworks the Linux Kernel panic and oops events. The code has
been re-factored to provide re-usable events that can be applied to all
ISAs from the base `KernelWorkload` `SimObject`. At the moment they are
installed for the Arm workloads.
This update also provides more configuration options that can be
specified using the new `KernelPanicOopsBehaviour` enum. The options are
applied to the Kernel Workload parameters `on_panic` and `on_oops` which
are available to all subclasses of `KernelWorkload`.
The main rationale for this reworking is to add the option to cleanly
exit the simulation after dumping the Dmesg buffer. Without this option,
the simulation would continue running after a Kernel panic. If system
components (e.g. a system timer) keep the event queue alive, this causes
the simulation to run slowly to the maximum allowed tick.
This commit converts `gem5::loader::Symbol` to a full class with
private members, enforcing encapsulation. Until now client code has
been able to (and does) access members directly.
This change will enable class invariants to be enforced via accessor
methods.
Change-Id: Ia0b5b080d4f656637a211808e13dce1ddca74541
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
The gem5 standard library hardcoded some parameters of the workload.
E.g., the kernel filename must be `object_file`.
Change-Id: I5eeb7359be399138693eaba0738eaf524c59408f
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
The change will only add include and library path if the fastmodel is
required to build. The change will benefit for most of gem5 build.
Change-Id: I98c20bd1470b7227940036199e02bc001e307eac
Converts CLFLUSHOPT/WB/FLUSH operations from Write to Read operations
during address translation so that they don't trigger a page fault when
done on write-protected pages.
Solves #226
Some variables hava narrow datatypes that overflow on large VLEN values.
For example, the maximum number of microops for LMUL=8 SEW=8 and
VLEN=64K is 2^16.
Change-Id: I5cce759f040884e09ce83bee7e54a62c4b42c5aa
Co-authored-by: Adrià Armejach <adria.armejach@bsc.es>
MOV instructions 8C and 8E can be prefixed with a REX prefix to extend
the source/destination register.
However, the R bit in REX will be applied to the segment register.
The decoder file checks for valid segment registers, checking the
MODRM_REG only, however, later this will be extended with the REX_R when
adding the register to the sources/destinations of the instruction.
This will trigger an assert.
Additionally, MOV instructions of various miscelaneous registers are
also not check for being valid when taking into account the REX_R bit.
This patch checks that the REX_R is not set, otherwise, UD2 will be
generated.
The memory translation require supervisor mode implement. If the
supervisor mode is not implemented, the satp CSR is not exists and
should not do address translation
Change-Id: Ie6c8a1a130d0aab0647b35e0f731f6b930834176