Commit Graph

5780 Commits

Author SHA1 Message Date
Matthew Poremba
31e63b01ad arch-vega: Add vop3p DOT instructions
Implemented according to the ISA spec. Validated with silicon. In
particular the sign extend is important for the signed variants and the
unsigned variants seem to overflow lanes (hence why there is no mask()
in the unsigned variants). FP16 -> FP32 continues using ARM's fplib.

Tested vs. an MI210. Clamp has not been verified.

Change-Id: Ifc09aecbc1ef2c92a5524a43ca529983018a6d59
2024-01-03 15:41:06 -06:00
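As a rough illustration of the sign-extension point in 31e63b01ad, here is a minimal Python sketch of a 4-way 8-bit dot product. The helper names (`sext8`, `lanes8`, `dot4_i8`, `dot4_u8`) and the 32-bit wraparound of the result are assumptions made for this example; this is not gem5 code and does not model clamping.

```python
def sext8(byte):
    """Interpret the low 8 bits of `byte` as a signed two's-complement value."""
    byte &= 0xFF
    return byte - 0x100 if byte & 0x80 else byte


def lanes8(dword):
    """Split a 32-bit value into four 8-bit lanes, lowest lane first."""
    return [(dword >> (8 * i)) & 0xFF for i in range(4)]


def dot4_i8(a, b, acc):
    """Signed variant: every lane is sign-extended before multiplying."""
    total = acc + sum(sext8(x) * sext8(y) for x, y in zip(lanes8(a), lanes8(b)))
    return total & 0xFFFFFFFF  # assumption: the result wraps to 32 bits


def dot4_u8(a, b, acc):
    """Unsigned variant: lanes are used as-is, without sign extension."""
    total = acc + sum(x * y for x, y in zip(lanes8(a), lanes8(b)))
    return total & 0xFFFFFFFF  # assumption: the result wraps to 32 bits
```

With lane 0 of `a` set to 0xFF and lane 0 of `b` set to 2, the signed variant treats 0xFF as -1 while the unsigned variant treats it as 255, so the two functions return different results for the same bit patterns.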
Matthew Poremba
420cda1bef arch-vega: Implement FP32 packed math
Starting with MI200, packed math can operate on double dword inputs. In
this case, 64-bits of inputs (two VGPRs per lane) contain two FP32
values.

Add instructions to perform add, multiply, and FMA on packed FP32 types.

Change-Id: Ib838bff91a10e02e013cc7c33ec3d91ff08647b0
2024-01-03 15:41:06 -06:00
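The packed FP32 layout described in 420cda1bef can be sketched in Python as follows. The function names are hypothetical, and placing the first FP32 value in the low dword is an assumption made for illustration; the commit message only states that 64 bits of input hold two FP32 values.

```python
import struct


def unpack_pk_f32(qword):
    """Split a 64-bit value into its (low dword, high dword) FP32 values."""
    raw = struct.pack("<Q", qword & 0xFFFFFFFFFFFFFFFF)
    return struct.unpack("<ff", raw)


def pack_pk_f32(lo, hi):
    """Pack two FP32 values into one 64-bit value (low dword first)."""
    return struct.unpack("<Q", struct.pack("<ff", lo, hi))[0]


def pk_add_f32(a, b):
    """Element-wise add of two packed FP32 pairs."""
    a_lo, a_hi = unpack_pk_f32(a)
    b_lo, b_hi = unpack_pk_f32(b)
    # struct.pack("<ff", ...) rounds each sum back to single precision.
    return pack_pk_f32(a_lo + b_lo, a_hi + b_hi)
```

For example, `pk_add_f32(pack_pk_f32(1.5, -4.0), pack_pk_f32(2.0, 0.25))` equals `pack_pk_f32(3.5, -3.75)`.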
Matthew Poremba
7b0c47d52f arch-vega: Implement all global atomics up to gfx90a
This change adds all of the missing flat/global atomics up to and including
the new atomics in gfx90a (MI200). Adds all decodings and instruction
implementations with the exception of __half2 which does not have a
corresponding data type in gem5. This refactors the execute() and
completeAcc() methods by creating helper functions similar to what
initiateAcc() uses. This reduces redundant code for global atomic
instruction implementations.

Validated all except PK_ADD_F16, ADD_F32, and ADD_F64 which will be done
shortly. Verified the source/dest register sizes in the header are
correct and the template parameters for the new execute()/completeAcc()
methods are correct.

Change-Id: I4b3351229af401a1a4cbfb97166801aac67b74e4
2024-01-03 15:41:06 -06:00
Matthew Poremba
472c697d88 arch-vega: Implement v_mfma_i32_16x16x16i8
Tested using AMD lab notes examples located on GitHub:

https://github.com/amd/amd-lab-notes/blob/release/matrix-cores/
    src/mfma_i32_16x16x16i8.cpp

Change-Id: Ib0e50162288528012b6d3395e1f629ebf12e8e54
2024-01-03 15:41:06 -06:00
Matthew Poremba
7e1b27969f arch-vega: Improve FLAT disassembly
Use the opSelectorToRegSym which will print the full range of VGPRs
(e.g., will now print v[2:3] instead of v2 when the source / dest is
64-bits). Fixes atomic disassembly prints. Now shows "glc" if GLC bit is
enabled. Fixes some VGPR fields being printed as an SGPR in places where
the 9-bit register index bit is implied (e.g., VDST).

This makes it easier to use a GPUExec trace to match with LLVM
disassembly when debugging.

Change-Id: Ia163774850f0054243907aca8fc8d0361e37fdd5
2024-01-03 10:40:34 -06:00
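The register-range formatting described in 7e1b27969f (printing v[2:3] instead of v2 for a 64-bit operand) can be sketched like this. `vgpr_sym` is a hypothetical helper written for this example; it is not the `opSelectorToRegSym` function from gem5.

```python
def vgpr_sym(index, num_dwords):
    """Format a VGPR operand, showing the full range for multi-dword operands."""
    if num_dwords == 1:
        return f"v{index}"
    return f"v[{index}:{index + num_dwords - 1}]"
```

Here `vgpr_sym(2, 1)` gives `v2` and `vgpr_sym(2, 2)` gives `v[2:3]`.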
Matthew Poremba
bc69ab0a1f arch-vega: Add VOP3P encodings and packed 16b insts
This adds the VOP3P and VOP3P_MAI encodings from the MI200 spec. These
instructions are used for packed math and miSIMD instructions. The first
19 VOP3P opcodes are implemented and validated against hardware. This
includes all instructions which operate on one dword containing two
packed 16-bit values of fp16, int16_t, or uint16_t.

Implement one MFMA instruction for now which was also validated against
hardware.
2024-01-03 10:40:34 -06:00
Matthew Poremba
4903fe2db1 arch-arm: Allow fplib to be used outside of ARM build
This is useful in other ISAs to implement FP16 computation. For example,
it can be used in the GPU model. The ARM specific misc register is
ignored in that case.

Change-Id: I339ac0ccd9be4371b0f220ad99068e5e12b3d263
2024-01-03 10:40:34 -06:00
Bobby R. Bruce
da3e3b806d arch-riscv: squash walks with tlb hits in startWalkWrapper (#672)
Because each vector load is fragmented into 64 byte cache-aligned
chunks, and one page-table walk is issued per fragment on tlb miss,
walks start to accumulate on a pending queue, which is processed in a
blocking way (no pending walks can be issued while one is being
processed). This adds noticeable latency on vector loads when VLEN is
sufficiently large.

This commit fixes the issue by allowing walks to be squashed if a TLB
lookup hits just before starting the walk on `startWalkWrapper`. This
idea was taken from the ARM walker.
2023-12-13 12:45:40 -08:00
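The squashing idea in da3e3b806d can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the TLB is a plain dictionary keyed by virtual page number, walks complete synchronously, and `drain_pending_walks` is a hypothetical name rather than the gem5 walker API.

```python
from collections import deque


def drain_pending_walks(pending, tlb, walk):
    """Process queued walks in order, squashing those the TLB can now serve.

    pending: deque of virtual page numbers waiting for a page-table walk
    tlb:     dict mapping virtual page number -> translation
    walk:    callable performing the (expensive) page-table walk
    """
    started, squashed = [], []
    while pending:
        vpn = pending.popleft()
        if vpn in tlb:
            # A previous walk already filled the TLB: no need to walk again.
            squashed.append(vpn)
            continue
        tlb[vpn] = walk(vpn)
        started.append(vpn)
    return started, squashed
```

If one vector load produces several fragments on the same page, only the first fragment starts a walk in this model; the remaining ones are squashed because the lookup hits before their walk begins.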
Saúl Adserias
78f23ad2df arch-riscv: squash walks with tlb hits in startWalkWrapper
Change-Id: I1bdfd7b2ee02ddee5a2d4c13bafc8c472f555f61
2023-12-13 16:40:46 +01:00
Giacomo Travaglini
8d09e95420 arch-arm: Partial SVE2 Implementation (#657)
Instructions added:

BGRP, RAX1, EOR3, BCAX,
XAR & TBX, PMUL, PMULLB/T, SMULLB/T and UMULLB/T

Move from gerrit [1]

[1]: https://gem5-review.googlesource.com/c/public/gem5/+/70277

Change-Id: Ia135ba9300eae312b24342bcbda835fef6867113
2023-12-13 10:27:19 +00:00
Bobby R. Bruce
c8cc193db8 arch,arch-riscv: Fix inst flag of RISC-V vector store macro instructions (#674)
Correct the instruction flags of RISC-V vector store instructions, such
as `vse64_v`, `vse32_v`. The `vse64_v` in `decoder.isa` is
`Mem_vc.as<uint64_t>()[i] = Vs3_ud[i];` and it will generate the code
`Mem.as<uint64_t>()[i] = Vs3[i];`. The current `assignRE` regex marks the
operand `Mem` as `dest` only for formats like `Mem = Rd` or `Mem[i] = Rd`,
because the code ` = Rd` or `[i] = Rd` matches `assignRE`. For the
expression `Mem.as<uint64_t>()[i]`, the operand `Mem` is falsely marked as
`src` because the code `.as<uint64_t>()[i]` does not match `assignRE`.

This PR ensures the operand `Mem` is marked as `dest` for formats like
`Mem.as<xxx>()[i] = yyy`.
2023-12-12 13:07:50 -08:00
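The matching problem described in c8cc193db8 can be reproduced with a small Python sketch. The two patterns below are hypothetical stand-ins written for this example; they are not copied from gem5's ISA parser, and `is_dest` is an illustrative helper.

```python
import re

# Hypothetical "before" pattern: optional [index], then a single '='.
OLD_ASSIGN_RE = re.compile(r"(\[[^\[\]]+\])?\s*=(?!=)")
# Hypothetical "after" pattern: also accepts an optional .as<type>() cast.
NEW_ASSIGN_RE = re.compile(r"(\.\w+<\w+>\(\))?(\[[^\[\]]+\])?\s*=(?!=)")


def is_dest(code, operand, assign_re):
    """Return True if the text right after `operand` looks like an assignment."""
    pos = code.find(operand)
    if pos < 0:
        return False
    return assign_re.match(code, pos + len(operand)) is not None
```

With the first pattern, `Mem = Rd;` and `Mem[i] = Rd;` are detected as destinations but `Mem.as<uint64_t>()[i] = Vs3[i];` is not. The second pattern accepts all three.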
Bobby R. Bruce
37e4173351 arch-x86: Fix two_byte_opcodes.isa 0x6 -> 0x0 (#666)
This bug was introduced by https://github.com/gem5/gem5/pull/593 and
caused Issue https://github.com/gem5/gem5/issues/664.

Change-Id: Ia55de364ee8260e1fe315e37e1cffbc71ab229fb
2023-12-12 08:21:27 -08:00
Roger Chang
bedc3c597c arch: Fix inst flag of RISC-V vector store macro instructions
Correct the instruction flags of RISC-V vector store instructions, such
as `vse64_v`, `vse32_v`. The `vse64_v` in `decoder.isa` is
`Mem_vc.as<uint64_t>()[i] = Vs3_ud[i];` and it will generate the code
`Mem.as<uint64_t>()[i] = Vs3[i];`. The current `assignRE` regex marks the
operand `Mem` as `dest` only for formats like `Mem = Rd` or `Mem[i] = Rd`,
because the code ` = Rd` or `[i] = Rd` matches `assignRE`. For the
expression `Mem.as<uint64_t>()[i]`, the operand `Mem` is falsely marked as
`src` because the code `.as<uint64_t>()[i]` does not match `assignRE`.

This PR ensures the operand `Mem` is marked as `dest` for formats like
`Mem.as<xxx>()[i] = yyy`.

Change-Id: I9c57986a64f1efb81eb9c7ade90712b118e0788d
2023-12-12 17:04:31 +08:00
Roger Chang
10d344a942 arch-riscv: Fix the vector store indexed instructions declaration
Change-Id: I6f8701ef0819c22eda8cb20d09c40101f2d001a0
2023-12-12 16:36:49 +08:00
Giacomo Travaglini
81d3c6307d arch-arm: add Sve mla and mls indexed (#596)
This contains the implementation of the indexed versions of the mla and
mls instructions from the ARM SVE2 ISA specification.
2023-12-07 21:47:35 +00:00
Nitesh Narayana
d962d2588d arch-arm: This commit cleans .isa files
This commit removes extra newlines from the .isa files on this branch.

Change-Id: I4087ed230aa041747038b49360c2aba3f82c0790
2023-12-06 16:03:21 +01:00
Matthias Boettcher
e4dccbea8a arch-arm: Partial SVE2 Implementation
Instructions added:

BGRP, RAX1, EOR3, BCAX,
XAR & TBX, PMUL, PMULLB/T, SMULLB/T and UMULLB/T

Change-Id: Ia135ba9300eae312b24342bcbda835fef6867113
2023-12-06 14:26:31 +00:00
Nitesh Narayana
db8e1652e8 arch-arm: This commit uses existing template code for mla/s index
This implements the mla/s indexed versions using the existing template code
to avoid code repetition.

Change-Id: If1de84e01dec638e206c979ca832308ebc904212
2023-12-05 23:40:06 +01:00
Hoa Nguyen
cf087d4d11 arch-riscv: Add PCEvent for RISCV FS Workload kernel panic/oops
Inspired by a similar feature in ARM's full system workload, this change adds
an option to halt the gem5 simulation if the guest system encounters a kernel
panic or kernel oops.

On RiscvISA::BootloaderKernelWorkload, by default, the simulation
will exit upon kernel panic, while a kernel oops will not halt the simulation.
This is because the system essentially does nothing after a kernel panic, while
the system might still be functional after a kernel oops.

Dumping the kernel's dmesg is useful for diagnosing the cause of a kernel panic, so
ideally, we want to dump the guest's dmesg to the host. However, due to a bug
described in [1], kernel v5.18+ dmesg might not be dumped properly. Hence, the
dmesg will not be dumped to the host.

On RiscvISA::FsLinux, this feature is turned off by default as the symbols from the
official RISC-V kernel resource are stripped from the binary. However, if this feature
is enabled, the dmesg will be dumped to the host system.

[1] https://github.com/gem5/gem5/issues/550

Change-Id: I8f52257727a3a789ebf99fdd4dffe5b3d89f1ebf
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
2023-12-04 14:59:26 -08:00
Harshil Patel
5eba3941f4 arch-riscv: fix o3 cpu stuck in spinlock bug (#641)
2023-12-03 13:22:46 -08:00
Hoa Nguyen
7a5052b3a0 arch-arm: Only build ArmCapstoneDisassembler when ISA is arm (#553)
Currently, if the Capstone header file is found in the host system,
scons will try to build the ArmCapstoneDisassembler regardless of the
gem5 target ISA. This causes problems when the host has Capstone but
the gem5 target ISA is not arm. Compiling gem5 in this case will cause
errors, e.g., ArmISA and ArmSystem are not found.

This change aims to prevent building the ArmCapstoneDisassembler when
the gem5 target ISA is not arm.

Ref:
[1] The Arm Capstone PR https://github.com/gem5/gem5/pull/494

Change-Id: I1e714d34aec8fe2a2af8cd351536951053a4d8a5
2023-12-03 13:22:11 -08:00
Bobby R. Bruce
21919addca Fix for gem5 Issue #550 (#636)
This Pull-Request addresses gem5 Issue #550. The code that dumps the
Dmesg buffer is now templated on the two variants of the `Metadata`
structure, and the correct one is chosen based on the detected Kernel
version.

To support this functionality, the pull request also adds Symbol Size
data to the loader Symbol Table, and adds a method to query the Kernel
Version from the image in guest memory. The new attributes in the Symbol
class are de-serialized speculatively, so no checkpoint upgrader is
required to support this change.
2023-12-01 18:06:20 -08:00
Richard Cooper
d9c870f641 sim: Rework the Linux Kernel exit events (#639)
This patch reworks the Linux Kernel panic and oops events. The code has
been re-factored to provide re-usable events that can be applied to all
ISAs from the base `KernelWorkload` `SimObject`. At the moment they are
installed for the Arm workloads.

This update also provides more configuration options that can be
specified using the new `KernelPanicOopsBehaviour` enum. The options are
applied to the Kernel Workload parameters `on_panic` and `on_oops` which
are available to all subclasses of `KernelWorkload`.

The main rationale for this reworking is to add the option to cleanly
exit the simulation after dumping the Dmesg buffer. Without this option,
the simulation would continue running after a Kernel panic. If system
components (e.g. a system timer) keep the event queue alive, this causes
the simulation to run slowly to the maximum allowed tick.
2023-12-01 17:33:59 -08:00
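The behaviour options described in d9c870f641 can be sketched as a small dispatch function. The enum below is deliberately named `PanicOopsAction` and its members are hypothetical: the commit message does not list the values of `KernelPanicOopsBehaviour`, so none of these names should be read as the real gem5 options.

```python
from enum import Enum


class PanicOopsAction(Enum):
    # Hypothetical members, chosen only to illustrate the idea.
    CONTINUE = "continue"
    DUMP_DMESG_AND_CONTINUE = "dump_dmesg_and_continue"
    DUMP_DMESG_AND_EXIT = "dump_dmesg_and_exit"


def handle_kernel_event(action, dump_dmesg, exit_simulation):
    """Apply `action`; return True if the simulation was asked to exit."""
    if action is PanicOopsAction.CONTINUE:
        return False
    dump_dmesg()
    if action is PanicOopsAction.DUMP_DMESG_AND_EXIT:
        # Exiting cleanly avoids running on to the maximum allowed tick.
        exit_simulation()
        return True
    return False
```

In this sketch the same handler could serve both the panic and the oops event, with a separate action configured for each.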
Richard Cooper
2fbbdad618 base: Add encapsulation to the loader::Symbol class
This commit converts `gem5::loader::Symbol` to a full class with
private members, enforcing encapsulation. Until now client code has
been able to (and does) access members directly.

This change will enable class invariants to be enforced via accessor
methods.

Change-Id: Ia0b5b080d4f656637a211808e13dce1ddca74541
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2023-12-01 22:00:26 +00:00
Hoa Nguyen
bbe5216d88 arch-riscv: Rename BootloaderKernelWorkload parameters
The gem5 standard library hardcoded some parameters of the workload.
E.g., the kernel filename must be `object_file`.

Change-Id: I5eeb7359be399138693eaba0738eaf524c59408f
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2023-12-01 07:28:30 +00:00
Yu-Cheng Chang
a16fd8a592 scons: Limit adding fastmodel files and libpath (#629)
This change adds the include and library paths only if the fastmodel is
required for the build. This benefits most gem5 builds.

Change-Id: I98c20bd1470b7227940036199e02bc001e307eac
2023-11-30 07:36:26 -08:00
Andreas Sandberg
dcdebec0f6 misc,python: Add isort hook to pre-commit (#431)
2023-11-30 09:54:12 +00:00
Bobby R. Bruce
d11c40dcac misc: Run pre-commit run --all-files
This ensures `isort` is applied to all files in the repo.

Change-Id: Ib7ced1c924ef1639542bf0d1a01c5737f6ba43e9
2023-11-29 22:06:41 -08:00
Bobby R. Bruce
fcbcd1ce72 arch-x86: Fixes page fault for CLFLUSH on write-protected pages (#592)
Converts CLFLUSHOPT/WB/FLUSH operations from Write to Read operations
during address translation so that they don't trigger a page fault when
done on write-protected pages.

Solves #226
2023-11-29 14:25:21 -08:00
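The effect of fcbcd1ce72 can be sketched with a toy permission check. The names `Mode`, `translation_mode`, and `faults` are hypothetical and the page model is reduced to a single writable flag; this is not the gem5 x86 TLB code.

```python
from enum import Enum


class Mode(Enum):
    READ = 0
    WRITE = 1


def translation_mode(requested, is_cache_flush):
    """Treat cache-flush operations as reads for address translation."""
    return Mode.READ if is_cache_flush else requested


def faults(mode, page_writable):
    """Toy check: only a write to a write-protected page faults."""
    return mode is Mode.WRITE and not page_writable
```

A flush that would previously be translated as a write no longer faults on a write-protected page in this model, while an ordinary store still does.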
Yu-Cheng Chang
57ba3fccb7 scons: Move CPPPATH systemc_home to "src/systemc" folder (#617)
Files under src/systemc require the include path of systemc_home

Change-Id: Ibcbac2762259a0b997ac444b2c63a218c27af9ee
2023-11-29 13:56:23 -08:00
Bobby R. Bruce
a2e7bd4698 arch-riscv: Support combination of privilege modes configuration (#522)
The user can select which privilege modes are included in the system,
rather than always enabling the user and supervisor privilege modes.
2023-11-29 10:12:57 -08:00
Adrià Armejach
b0cefac9b2 arch-riscv: Fix narrow datatypes in RVV isa files (#606)
Some variables have narrow datatypes that overflow on large VLEN values.
For example, the maximum number of microops for LMUL=8 SEW=8 and
VLEN=64K is 2^16.

Change-Id: I5cce759f040884e09ce83bee7e54a62c4b42c5aa

Co-authored-by: Adrià Armejach <adria.armejach@bsc.es>
2023-11-29 10:11:06 -08:00
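The 2^16 figure in b0cefac9b2 can be checked with a short calculation. This sketch assumes one microop per vector element, so the count equals VLMAX = LMUL * VLEN / SEW; `vlmax` and `fits_unsigned` are hypothetical helper names, and the commit message does not say which integer widths were used before the fix.

```python
def vlmax(vlen_bits, lmul, sew_bits):
    """Maximum number of elements in a register group: LMUL * VLEN / SEW."""
    return (vlen_bits * lmul) // sew_bits


def fits_unsigned(value, bits):
    """Return True if `value` is representable as an unsigned `bits`-bit integer."""
    return 0 <= value < (1 << bits)


# VLEN = 64K bits, LMUL = 8, SEW = 8, as in the commit message.
count = vlmax(64 * 1024, 8, 8)
```

Here `count` is 65536, which is 2^16. That is one more than the largest 16-bit unsigned value, so a counter narrower than 32 bits cannot hold it.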
Harshil Patel
089b82b2e9 arch-riscv: fix tlb bug (#610)
- one tlb miss was getting counted twice by the lookup function.

Change-Id: I5fee08bd6e936896704e7dbbd242720b8d23b547
2023-11-29 08:39:02 -08:00
Jason Lowe-Power
3fe5e58f28 arch-x86: Fix misc registers in mov instructions (#593)
MOV instructions 8C and 8E can be prefixed with a REX prefix to extend
the source/destination register.
However, the R bit in REX will be applied to the segment register.  
The decoder file checks for valid segment registers, checking the
MODRM_REG only, however, later this will be extended with the REX_R when
adding the register to the sources/destinations of the instruction.
This will trigger an assert.

Additionally, MOV instructions of various miscellaneous registers are
also not checked for validity when taking the REX_R bit into account.

This patch checks that the REX_R is not set, otherwise, UD2 will be
generated.
2023-11-28 11:14:53 -08:00
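The check described in 3fe5e58f28 can be sketched as follows. `decode_mov_sreg` is a hypothetical function written for this example, not the gem5 decoder; it only models the segment register selection and ignores other restrictions such as which segment registers may be written.

```python
# Segment register encodings in the ModRM reg field.
SEGMENT_REGS = ["ES", "CS", "SS", "DS", "FS", "GS"]


def decode_mov_sreg(modrm_reg, rex_r):
    """Return the segment register name, or "UD2" for an invalid encoding."""
    if rex_r:
        # Extending the index with REX.R would give modrm_reg + 8, which is
        # not a valid segment register, so reject it up front.
        return "UD2"
    if modrm_reg >= len(SEGMENT_REGS):
        return "UD2"
    return SEGMENT_REGS[modrm_reg]
```

Checking `modrm_reg` alone would accept `modrm_reg = 3` with REX.R set, even though the extended index 11 does not name a segment register; rejecting REX.R first avoids that mismatch.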
Roger Chang
9a0c671cce arch-riscv: Handle the exception following the privilege mode set
Change-Id: I4867941ec286fe485e01db848b8c7357488f6cf4
2023-11-28 09:26:27 +08:00
Roger Chang
d56801c240 arch-riscv: Add misa rvs check for memory translation
Memory translation requires supervisor mode to be implemented. If
supervisor mode is not implemented, the satp CSR does not exist and
address translation should not be performed.

Change-Id: Ie6c8a1a130d0aab0647b35e0f731f6b930834176
2023-11-28 09:26:27 +08:00
Roger Chang
6fd4feb797 arch-riscv: fatal_if the process run without SU modes
Change-Id: Ifce7eec6cea10881964c29d206a92f3d10271de6
2023-11-28 09:26:27 +08:00
Roger Chang
9e738a65ea arch-riscv: Add isaExts field for CSR registers
Change-Id: Idd94af57f3a721d455ea7fb9d335fab7b16a0f7e
2023-11-28 09:26:27 +08:00
Roger Chang
0e4f82a119 arch-riscv: define the CSR masks for each privilege modes
Change-Id: I9936d9bc816921a827b94550847d4898b3aa3292
2023-11-28 09:26:27 +08:00
Roger Chang
f745e8cf89 arch-riscv: Initial the privilege modes configuration
1. Declare the new enum type PrivilegeModes
2. Disallow setting the MISA register RVU and RVS.

Change-Id: I932d714bc70c9720a706353c557a5be76c950f81
2023-11-28 09:26:27 +08:00
Aditya K Kamath
9a0566e295 arch-x86: Fixes page fault for CLFLUSH on write-protected pages
Converts CLFLUSHOPT/WB/FLUSH operations from Write to Read operations
during address translation so that they don't trigger a page fault
when done on write-protected pages.

Change-Id: I20e89cc0cb2b288b36ba1f0ba39a2e1bf0f728af
2023-11-28 00:42:17 +00:00
Bobby R. Bruce
d4b7c8a26d Merge branch 'develop' into develop-kconfig
2023-11-27 09:39:08 -08:00
Matthew Poremba
cc9f81b08a arch-vega,arch-gcn3: Bugfix V_PERM_B32 and V_OR3_B32 (#599)
The V_PERM_B32 instruction is selecting the correct byte, but is
shifting into place moving by bits instead of bytes. The V_OR3_B32
instruction is calling the wrong instruction implementation in the
decoder.

This patch fixes both issues plus a bonus fix for GCN3's V_PERM_B32.
(GCN3 does not have V_OR3_B32).

Change-Id: Ied66c43981bc4236f680db42a9868f760becc284
2023-11-26 23:22:01 -08:00
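The byte-versus-bit shift issue fixed in cc9f81b08a can be illustrated with a simplified byte permute. This sketch makes several assumptions: the 64-bit source is formed as {S0, S1} with S0 in the upper half, only selector values 0 to 7 are handled (the special selector values are left out), and `perm_b32` is a hypothetical name rather than the gem5 implementation.

```python
def perm_b32(s0, s1, sel):
    """Build a 32-bit result by picking one source byte per result byte."""
    combined = ((s0 & 0xFFFFFFFF) << 32) | (s1 & 0xFFFFFFFF)
    result = 0
    for i in range(4):
        sel_byte = (sel >> (8 * i)) & 0xFF
        if sel_byte > 7:
            raise ValueError("selector values above 7 are not modelled here")
        byte = (combined >> (8 * sel_byte)) & 0xFF
        # Shift the chosen byte into place by whole bytes (8 * i bits).
        # Shifting by i bits instead would overlap the selected bytes.
        result |= byte << (8 * i)
    return result
```

With `s0 = 0x11223344` and `s1 = 0x55667788`, the selector `0x03020100` returns `s1` unchanged and `0x00010203` returns `0x88776655`, the bytes of `s1` in reverse order.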
Nitesh Narayana
35ccd7f907 arch-arm: This commit adds the mla/s indexed versions
This includes the isa and instruction implementations
of mla and mls indexed versions from ARM SVE2 ISA spec.

Change-Id: I4fbd0382f23d8611e46411f74dc991f5a211a313
2023-11-24 15:20:30 +01:00
Eduardo José Gómez Hernández
670bf6a488 arch-x86: Check REX_R for MOV misc registers
Change-Id: I08ea37ffe695df500ea84cbddd94be246f916caf
2023-11-24 13:41:24 +01:00
Eduardo José Gómez Hernández
cea169f5e7 arch-x86: Fix segment registers in instructions 8C and 8E
MOV instructions 8C and 8E can be prefixed with a REX prefix to extend
the source/destination register. However, the R bit in REX will be
applied to the segment register.  The decoder file checks for valid
segment registers, checking the MODRM_REG only, however, later this
will be extended with the REX_R when adding the register to the
sources/destinations of the instruction.  This will trigger an assert.

This patch checks that the REX_R is not set, otherwise, UD2 will be
generated.

Change-Id: I78a93c35116232fe37e5ec50025e721b8c633c5f
2023-11-23 10:18:17 +01:00
Roger Chang
92670e9745 fastmodel: Simply the logic of USE_ARM_FASTMODEL setting
Change-Id: Ib00cf83ca881727987050a987a2adb1e9f9d31ef
2023-11-23 14:15:28 +08:00
Roger Chang
d758df4b5c scons: Update the Kconfig build options
The CL updates the Kconfig:
1. Replace the USE_NULL_ISA with BUILD_ISA
2. The USE_XXX_ISAs depend on BUILD_ISA
3. If the BUILD_ISA is set, at least one of USE_XXX_ISAs must be set
4. Refactor the USE_KVM option

Change-Id: I2a600dea9fb671263b0191c46c5790ebbe91a7b8
2023-11-23 08:26:11 +08:00
Gabe Black
db3a6e8e84 scons: Use Kconfig to configure gem5.
These are not yet consumed by anything, but convert all the settings
from SCons variables to Kconfig variables.

If you have existing SConsopts files which need to be converted, you
should take a look at KCONFIG.md to learn about how kconfig is used in
gem5. You should decide if any variables need to be available to C++ or
kconfig itself, and whether those are options which should be detected
automatically, or should be up to the user. Options which should be
measured automatically should still be in SConsopts files, while user
facing options should be added to new or existing Kconfig files.

Generally, make sure you're storing c++/kconfig visible options in
env['CONF'][...]. Also remove references to sticky_vars since persistent
options should now be handled with kconfig, and export_vars since
everything in env['CONF'] is now exported automatically.

Switch SCons/gem5 to use Kconfig for configuration, except EXTRAS which
is still a sticky SCons variable. This is necessary because EXTRAS also
controls what config options exist. If it came from Kconfig itself, then
there would be a circular dependency. This dependency could
theoretically be handled by reparsing the Kconfig when EXTRAS
directories were added or removed, but that would be complicated, and
isn't supported by kconfiglib. It wouldn't be worth the significant
effort it would take to add it, just to use Kconfig more purely.

Change-Id: I29ab1940b2d7b0e6635a490452d05befe5b4a2c9
2023-11-23 08:26:10 +08:00
Giacomo Travaglini
098feb4042 arch-arm: Fix WFI sleeping in secure mode
The CPU should not sleep with a pending virtual interrupt
if secure mode EL2 is supported (FEAT_SEL2)

Change-Id: Ib71c4a09d76a790331cf6750da45f83694946aee
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-11-21 13:39:41 +00:00