Commit Graph

21633 Commits

Author SHA1 Message Date
Matthew Poremba
e82cf20150 mem-ruby: Remove VIPER StoreThrough temp cache storage (#1156)
StoreThrough in VIPER when the TCP is disabled, GLC bit is set, or SLC
bit is set will bypass the TCP, but will temporarily allocate a cache
entry seemingly to handle write coalescing with valid blocks. It does
not attempt to evict a block if the set is full and the address is
invalid. This causes a panic if the set is full, as there is no spare
cache entry to use temporarily for DataBlk manipulation. However,
a cache block is not required for this.

This commit removes using a cache block for StoreThrough with invalid
blocks as there is no existing data to coalesce with. It creates
no-allocate variants of the actions needed in StoreThrough and pulls the
DataBlk information from the in_msg instead. Non-invalid blocks do not
have this panic as they have a cache entry already.

Fixes issues with StoreThroughs on more aggressive architectures like
MI300.

Change-Id: Id8687eccb991e967bb5292068cbe7686e0930d7d
2024-05-28 11:02:00 -07:00
Ivana Mitrovic
5ec1acaf5f arch-arm: TLBIs targeting EL2 regime are executable from S state (#1176)
Those AArch64 instructions/registers were labelled as executable
from EL3 only if SCR_EL3.NS == 1. This is not valid anymore
after the introduction of FEAT_SEL2
2024-05-28 10:54:18 -07:00
Matthew Poremba
1dfaa224ff arch-vega: Fix GCC 13 build errors (#1162)
The new static analysis in GCC 13 finds issues with operand.hh. This
commit fixes the error so that gem5 compiles when BUILD_GPU is true.

Change-Id: I6f4b0d350f0cabb6e356de20a46e1ca65fd0da55
2024-05-28 07:58:28 -07:00
Giacomo Travaglini
27c7647fee arch-arm: Use monWrite a shorter version
Change-Id: I8da8a39238eb100315d3df496f55a6bf3da948c6
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-05-28 11:20:52 +01:00
Giacomo Travaglini
6995a99d77 arch-arm: TLBIs targeting EL2 regime are executable from S state
Those AArch64 instructions/registers were labelled as executable
from EL3 only if SCR_EL3.NS == 1. This is not valid anymore
after the introduction of FEAT_SEL2

Change-Id: Ie7b56f3fe779c3a99d4f0ef937c7c8ec0530b00e
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-05-28 11:20:32 +01:00
Giacomo Travaglini
10dbfb8bb7 arch-arm: Rewrite performTlbi to use map instead of switch (#1166)
This makes it easier for TLBI instructions to share code. The common
code (in the form of tlbi* functions) closely matches the
instruction description in the Arm pseudocode
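
The switch-to-map idea can be sketched as a lookup table from instruction name to a shared handler. This is a minimal Python illustration, not gem5's C++ code: the handler names (`tlbi_all`, `tlbi_vmall`), their arguments, and the table keys are hypothetical.

```python
# Hypothetical sketch of map-based dispatch; not the gem5 implementation.
from functools import partial


def tlbi_all(regime, shareable):
    # Shared handler: several instructions differ only in their arguments.
    return f"invalidate all entries, regime={regime}, shareable={shareable}"


def tlbi_vmall(regime, shareable):
    return f"invalidate current VMID entries, regime={regime}, shareable={shareable}"


# One table entry per instruction; adding an instruction is a one-line change.
TLBI_TABLE = {
    "TLBI_ALLE2": partial(tlbi_all, "EL2", False),
    "TLBI_ALLE2IS": partial(tlbi_all, "EL2", True),
    "TLBI_VMALLE1": partial(tlbi_vmall, "EL1", False),
}


def perform_tlbi(name):
    handler = TLBI_TABLE.get(name)
    if handler is None:
        # Unlisted instructions fail loudly in this sketch.
        raise KeyError(f"unhandled TLBI: {name}")
    return handler()
```

With a table, instructions that share behaviour reuse one function with different bound arguments instead of duplicating a switch case.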

Change-Id: If10c22fb4a7df2bcd0335e9761286ad3c458722b

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-05-28 11:03:07 +01:00
Yu-Cheng Chang
4f6fdbf8bf arch-riscv: Fix c.jalr and c.jr instruction (#1163)
Bit 0 of the register value should be cleared to form the jump address.
Mishandling the jump address may cause an infinite run or a segmentation fault.
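
The rule can be sketched as follows. This is an illustrative Python model rather than the gem5 ISA description; the function name and the `xlen` parameter are assumptions made for the example.

```python
# Illustrative model of the c.jr / c.jalr target calculation (not gem5 code).
def compressed_jump_target(rs1_value: int, xlen: int = 64) -> int:
    """Return the jump target: the register value with bit 0 cleared."""
    mask = (1 << xlen) - 1
    # Keep the value within XLEN bits, then clear the least significant bit.
    return (rs1_value & mask) & ~1
```

For example, a register holding `0x80001235` yields the target `0x80001234`.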

gem5 issue: https://github.com/gem5/gem5/issues/981
2024-05-25 20:18:42 -07:00
Lukas Zenick
96fbc2068a util, ext: Fix building TLM (#1105)
Fixed the issue that did not allow building TLM.

Build commands:
```bash
scons build/ARM/gem5.opt
scons setconfig build/ARM USE_SYSTEMC=n
scons --with-cxx-config --without-python --without-tcmalloc build/ARM/libgem5_opt.so
cd util/tlm
scons
```
Following this README, I tested it successfully with the simple examples:
https://gem5.googlesource.com/public/gem5/+/master/util/tlm/README

GitHub Issue: #591 
Change-Id: If07fae2eb20ad62627e733573f61bc42d594f970

---------

Co-authored-by: Ivana Mitrovic <ivanamit91@gmail.com>
2024-05-24 13:29:58 -07:00
Matthew Poremba
1616d34003 arch-vega: Template MFMA instructions (#1128)
templated
- v_mfma_f64_16x16x4f64

added support for
- v_mfma_f32_32x32x2f32
- v_mfma_f32_4x4x1_16b_f32
- v_mfma_f32_16x16x4f32

[formula for gprs
needed](https://github.com/ROCm/amd_matrix_instruction_calculator)

[formulas for register layouts and lanes used in
computation](https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf)

Change-Id: I15d6c0a5865d58323ae8dbcb3f6dcb701a9ab3c7
2024-05-22 08:53:25 -07:00
Ivana Mitrovic
1a68d71f07 util: Update gem5-resource-manager requirements (#1154)
Bumps [requests](https://github.com/psf/requests) from 2.31.0 to 2.32.0.

Change-Id: I34df01fdd32cb300c4efc8cf072c0aa1137371bc
2024-05-22 07:32:52 -07:00
Bobby R. Bruce
52fbc8ebcf misc: Revert Dramsys Ubuntu to 22.04 to compile in gcc <13 (#1146)
Until https://github.com/gem5/gem5/issues/1121 is fixed, this change
will ensure our Weekly tests pass.
2024-05-21 10:57:16 -07:00
Bobby R. Bruce
6adb7a8637 misc: Remove gcc 8 support, gem5 support GCC >= v10 (#1145)
note: Due to #556 / #555, we don't support GCC 9. This PR removes gcc-8
which means gem5 would support GCC >= version 10.

The reason for removing gcc-8:

1. We already dropped support for gcc-9. I don't see any good reason to
support anything <9 as a result.
2. gcc-8 is relatively old, and we're probably supporting a few too many
compiler versions anyway. In Ubuntu 22.04, gcc-11 is installed by
default with `apt`. It doesn't seem that many systems are still using gcc-8.
3. There is a weird compiler bug in gcc-8 which causes a failure when
compiling gem5 since the inclusion of #1123. The error received is as
follows:

```sh
In file included from src/arch/riscv/tlb.hh:42,
                 from src/arch/riscv/mmu.hh:45,
                 from build/ALL/arch/riscv/generated/exec-g.cc.inc:14,
                 from build/ALL/arch/riscv/generated/generic_cpu_exec.cc:5:
src/arch/riscv/utility.hh: In instantiation of ‘FloatType gem5::RiscvISA::ftype(IntType) [with FloatType = float8_t; IntType = unsigned char]’:
build/ALL/arch/riscv/generated/exec-ns.cc.inc:38839:42:   required from ‘gem5::Fault gem5::RiscvISAInst::Vfwcvt_xu_f_vMicro<ElemType>::execute(gem5::ExecContext*, gem5::trace::InstRecord*) const [with ElemType = float8_t; gem5::Fault = std::shared_ptr<gem5::FaultBase>]’
build/ALL/arch/riscv/generated/exec-ns.cc.inc:38856:16:   required from here
src/arch/riscv/utility.hh:327:15: error: parameter ‘a’ set but not used [-Werror=unused-but-set-parameter]
 ftype(IntType a) -> FloatType
       ~~~~~~~~^
src/arch/riscv/utility.hh: In instantiation of ‘IntType gem5::RiscvISA::f_to_wui(FloatType, uint_fast8_t) [with FloatType = float8_t; IntType = short unsigned int; uint_fast8_t = unsigned char]’:
build/ALL/arch/riscv/generated/exec-ns.cc.inc:38838:49:   required from ‘gem5::Fault gem5::RiscvISAInst::Vfwcvt_xu_f_vMicro<ElemType>::execute(gem5::ExecContext*, gem5::trace::InstRecord*) const [with ElemType = float8_t; gem5::Fault = std::shared_ptr<gem5::FaultBase>]’
build/ALL/arch/riscv/generated/exec-ns.cc.inc:38856:16:   required from here
src/arch/riscv/utility.hh:570:20: error: parameter ‘a’ set but not used [-Werror=unused-but-set-parameter]
 f_to_wui(FloatType a, uint_fast8_t mode)
```

Note: This is currently causing our SST Daily tests to fail, and our
compiler tests to fail.
2024-05-21 10:56:41 -07:00
Harshil Patel
33cebe9376 dev: add reset wrap mode to mouse.cc (#1149)
This change fixes #1148 

I have only added an acknowledged return, as we don't have remote and
wrap mode, so it can only be in stream mode.

Change-Id: I1882042d873ff0e9465c9491238554c8fbb9aa76
2024-05-21 10:55:03 -07:00
Robert Hauser
688f8fb03b arch-riscv: add exception code to DPRINTFS msg (#1153)
Change-Id: Ib5d1dc991f18256ec634c604c776629ea31317a9
2024-05-21 09:59:25 -07:00
Yu-Cheng Chang
5e20438c1c arch-riscv: Fix GDB connection failed after #1099 (#1152)
The GDB connection failed after the PR[1] changed the index of CSR_FCSR to
MISCREG_FCSR itself, which caused an out-of-bounds error.

[1]: https://github.com/gem5/gem5/pull/1099

gem5 issue: https://github.com/gem5/gem5/issues/1151
Change-Id: I402febe5a3a9addf3d4821ad716ade14e227d5d7
2024-05-21 09:58:15 -07:00
Harshil Patel
0824d7f2cd Revert "cpu-kvm: Support perf counters on hybrid host architectures" (#1127)
Reverts gem5/gem5#1065

Reverting this change because this PR breaks X86 kvm as mentioned in the
issue #1126.
2024-05-21 08:14:10 -07:00
Giacomo Travaglini
6f4ba0b422 arch-arm: Add missing outer-shareable TLBIs to the list (#1147)
Those were not part of the performTlbi switch and simulation was
therefore panicking when they were encountered

Change-Id: Ifbe0b89e45539df4abc147ac5970b0caf0d9dfdc

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-05-20 19:24:45 -07:00
Chong-Teng Wang
13924336b1 arch-riscv: Fix viota instruction (#1137)
This commit fixes and refactors the implementation of viota. It also
overrides the generateDisassembly function in viota's macro/micro to
correctly print out the instruction when tracing/debugging.

For example, it changes from:
viota_m vd, vd, vs2, v0.t
to:
viota_m vd, vs2, v0.t
2024-05-20 12:19:22 -07:00
Matthew Poremba
82318e85af arch-x86: Improve KVM set XCR (#1138)
This adds two failsafes for cases which may cause a panic on some machines.
First, check that the host machine has the KVM XCR capability before calling
getXCRs or setXCRs. Second, ensure the x87 bit, which must always be one, is
always read as one by modifying the return value in readMiscReg.
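
The second failsafe amounts to forcing bit 0 of the value that is read back. A minimal Python sketch, assuming the x87 state bit is bit 0 of XCR0; the function and constant names are hypothetical and this is not the gem5 readMiscReg code:

```python
# Hypothetical sketch: force the x87 bit on every read of XCR0.
XCR0_X87_BIT = 1 << 0  # x87 state is bit 0 of XCR0


def read_xcr0(stored_value: int) -> int:
    # Whatever was stored, the value returned always has the x87 bit set.
    return stored_value | XCR0_X87_BIT
```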

Change-Id: I5e778acc926a47443ef6cef29fabd84eb69bb9ba
2024-05-20 10:22:48 -07:00
Matthew Poremba
b91c9be102 arch-vega: Load/stores commonly used with 16b MFMA
This implements some missing loads and stores that are commonly used in
applications with MFMA instructions to load 16-bit data types into
specific register locations: DS_READ_U16_D16, DS_READ_U16_D16_HI,
BUFFER_LOAD_SHORT_D16, BUFFER_LOAD_SHORT_D16_HI.

Change-Id: Ie22d81ef010328f4541553a9a674764dc16a9f4d
2024-05-20 09:29:46 -05:00
Matthew Poremba
a4f0d9e6be arch-vega: Implement v_mfma_f32_32x32x8_bf16
Implement a bfloat16 MFMA. This was tested with PyTorch using
dtype=torch.bfloat16.

Change-Id: I35b4e60e71477553a93020ef0ee31d1bcae9ca5d
2024-05-20 09:28:58 -05:00
Matthew Poremba
10f8fdcd14 arch-vega: Unit test for MXFP types
Add a unit test for the MXFP types (bf16, fp16, fp8, bf8). These types
are not currently operated on directly. Instead they are cast to float
values and then arithmetic is performed. As a result, the unit test
simply checks that when we convert a value from MXFP type to float and
back that the values of the MXFP type match. Exact values are used to
avoid discrepancies with rounding.

Can be run using scons build/VEGA_X86/unittests.opt .
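
The round-trip check can be sketched for bf16. This is a Python illustration, not the gem5 unit test: it assumes bf16 is stored as the upper 16 bits of an IEEE 754 float32 and converts by truncation, and the helper names are made up for the example.

```python
# Sketch of a bf16 -> float -> bf16 round-trip check (not the gem5 test).
import struct


def float_to_bf16_bits(value: float) -> int:
    """Keep the upper 16 bits of the float32 encoding (truncation)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    return bits >> 16


def bf16_bits_to_float(bits: int) -> float:
    """Widen bf16 to float32 by appending 16 zero bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))
    return value


def round_trips(bits: int) -> bool:
    # Exact values survive the conversion in both directions.
    return float_to_bf16_bits(bf16_bits_to_float(bits)) == bits
```

Because every bf16 value is exactly representable as a float32, starting from the bf16 bit pattern avoids any rounding discrepancy.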

Change-Id: I596e9368eb929d239dd2d917e3abd7927b15b71e
2024-05-20 09:28:58 -05:00
Matthew Poremba
de11daec5f arch-vega: Implement F32 <-> F16 conversions
These instructions are used in some of the F16 MFMA example applications
to convert to/from floating point types.

Change-Id: I7426ea663ce11a39fe8c60c8006d8cca11cfaf07
2024-05-20 09:28:58 -05:00
Matthew Poremba
a062229ac3 arch-vega: Implement v_mov_b64
This instruction is new in MI300 and is used in some of the example
applications used to test MFMAs.

Change-Id: I739f8ab2be6a93ee3b6bdc4120d0117724edb0d4
2024-05-20 09:27:12 -05:00
Matthew Poremba
91955ae879 arch-vega: Decodings for all MFMA/SMFMACs up to MI300
This adds the decodings for all of the matrix fused multiply add (MFMA)
and sparse matrix fused multiply accumulate (SMFMAC) instructions up to
and including MI300. This does not yet provide the implementation for
these instructions, however it is easier and less tedious to add them in
bulk rather than one at a time.

Change-Id: I5acd23ca8a26bdec843bead545d1f8820ad95b41
2024-05-20 09:27:12 -05:00
Matthew Poremba
ce578c8831 arch-vega: MFMA templates for MXFP and INT8 types
The microscaling formats (MXFP) and INT8 types require additional size
checks which are not needed for the current MFMA template. The size
check is done using a constexpr method exclusive to the MXFP type,
therefore create a special class for MXFP types. This is preferable to
attempting to shoehorn into the existing template as it helps with
readability. Similarly, INT8 requires a size check to determine number of
elements per VGPR, but it is not an MXFP type. Create a special template
for that as well.

This additionally implements all of the MFMA types which have test cases
in the amd-lab-notes repository (https://github.com/amd/amd-lab-notes/).
The implementations were tested using the applications in the
matrix-cores subfolder and achieve L2 norms equivalent or better than
MI200 hardware.

Change-Id: Ia5ae89387149928905e7bcd25302ed3d1df6af38
2024-05-20 09:27:12 -05:00
Matthew Poremba
994c5ad1cc arch-vega: Add PackedReg helper class
This class can be used to load multiple operand dwords into an array and
then select bits from the span of that array. It handles cases where the
bits span two dwords (e.g., you have four dwords for a 128-bit value and
want to select bits 35:30) and cases where multiple values < 32-bits are
packed into a single dword (e.g., two bf16 values).

This is most useful for packed arrays and instructions which have more
than two dwords. Beyond two dwords, the operator[] overload of
VectorOperand is not available requiring additional logic to select from
an operand. This helper class handles that additional logic itself.
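
The bit selection described above can be sketched in Python. This is an illustration of the idea only: the class name `PackedDwords` and its `get_bits` method are hypothetical and do not reflect the interface of the C++ helper.

```python
# Hypothetical sketch of selecting bits from a span of 32-bit dwords.
class PackedDwords:
    def __init__(self, dwords):
        # Dword 0 holds bits 31:0, dword 1 holds bits 63:32, and so on.
        self.value = 0
        for i, dword in enumerate(dwords):
            self.value |= (dword & 0xFFFFFFFF) << (32 * i)
        self.total_bits = 32 * len(dwords)

    def get_bits(self, hi, lo):
        """Return bits hi:lo (inclusive), which may span two dwords."""
        if not (0 <= lo <= hi < self.total_bits):
            raise ValueError(f"bad bit range {hi}:{lo}")
        width = hi - lo + 1
        return (self.value >> lo) & ((1 << width) - 1)
```

For instance, `PackedDwords([0xC0000000, 0xA, 0, 0]).get_bits(35, 30)` combines the top two bits of dword 0 with the low four bits of dword 1, and `get_bits(31, 16)` on a single dword picks the upper of two packed 16-bit values.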

Change-Id: I74856d0f312f7549b3b6c405ab71eb2b174c70ac
2024-05-20 09:27:12 -05:00
Matthew Poremba
2bb62a05e1 arch-vega: Implement v_cvt_pk_fp8_f32
This instruction serves as a test for the MXFP8 type.

Change-Id: I2ce30bf7f3a3ecc850a445aebdf971c37c39a79e
2024-05-20 09:27:12 -05:00
Matthew Poremba
d420a0a1e7 arch-vega: Add OCP microscaling formats
The open compute project (OCP) microscaling formats (MX) are used in the
GPU model. The specification is available at [1]. This implements a C++
version of MXFP formats with many constraints that conform to the
specification.

Actual arithmetic is not performed directly on the MXFP types. They
are instead converted to fp32 and the computation is performed on that.
For most of these types this is acceptable for the GPU model as there are
no instructions which directly perform arithmetic on them. For example, the
DOT/MFMA instructions may first convert to FP32 and then
perform arithmetic.

Change-Id: I7235722627f7f66c291792b5dbf9e3ea2f67883e
2024-05-20 09:27:12 -05:00
Marco Kurzynski
d5a734c252 arch-vega: Template MFMA instructions
templated
- v_mfma_f64_16x16x4f64

added support for
- v_mfma_f32_32x32x2f32
- v_mfma_f32_4x4x1_16b_f32
- v_mfma_f32_16x16x4f32

[formula for gprs needed](https://github.com/ROCm/amd_matrix_instruction_calculator)

[formulas for register layouts and lanes used in computation](https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf)

Change-Id: I15d6c0a5865d58323ae8dbcb3f6dcb701a9ab3c7
2024-05-20 09:27:12 -05:00
Bobby R. Bruce
8b30d848e9 scons: Setup scons for gem5 only supporting gcc >=10
Change-Id: I66f83498a38def3d00d1c9e981aa90706ee20bbb
2024-05-20 07:05:08 -07:00
Bobby R. Bruce
ba1c22f143 misc,tests: Remove gcc-8 from compiler tests
GCC Version 8 is no longer supported by the gem5 project.

Change-Id: If657654299c1a018764d5f92e814ed5cd18c50f0
2024-05-20 06:27:45 -07:00
Bobby R. Bruce
d011fe47a9 util-docker: Upgrade sst-env docker image to use GCC 10
Previously was GCC 9 which is no longer supported by gem5.

Change-Id: Ife715446e3f1179d19db544953fbd6ded25f5b4d
2024-05-20 06:24:14 -07:00
Bobby R. Bruce
321c34d0bd util-docker: Remove GCC-8 from docker-compose.yaml
Change-Id: Ia1aba03412b138b05b569b08a146a2123f7142e4
2024-05-20 06:23:28 -07:00
Matthew Poremba
2b3beb92ff dev-amdgpu,gpu-compute,configs: MI300X (#1141)
Release of MI300X simulation capability:

- Implements the required MI300X features over MI200 (currently only
architecture flat scratch).
- Make the gpu-compute model use MI200 features when MI300X / gfx942 is
configured.
- Fix up the scratch_ instructions which seem to be preferred in
debug hipcc builds over buffer_.
- Add mi300.py config similar to mi200.py. This config can optionally
use resources instead of command line args.
2024-05-17 09:26:04 -07:00
Alexander Richardson
716fe6d31d arch-arm: Fix 32-bit semihosting ABI (#1142)
It appears we have been trying to read 64-bit arguments for ARM32 since
695583709b. I noticed that SYS_OPEN was
trying to read a really long string as the pathname argument and it
turned out it was reading from the wrong stack offset. With this change
I can successfully run some of the semihosting tests for ARM32.

Change-Id: Ie154052dac4211993fb6c4c99d93990123c2eacf
2024-05-16 10:28:45 -07:00
Alexander Richardson
6b34765d5d arch-generic: Avoid out-of-memory errors for bad semihosting calls (#1143)
In BaseSemihosting::readString() we were using the len argument to
allocate a std::vector without checking whether the value makes any
sense. This resulted in a std::bad_alloc exception being raised prior to
https://github.com/gem5/gem5/pull/1142 for my semihosting tests. This
commit prevents semihosting from reading more than 64K for string
arguments which should be more than sufficient for any valid code.
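
The guard can be sketched as a length check before any allocation. This is a Python illustration, not the C++ code in BaseSemihosting::readString(): the function signature, the flat `bytes` memory model, and the choice to raise an exception on a bad length are all assumptions.

```python
# Sketch of a bounded string read (assumed behaviour, not gem5 code).
MAX_STRING_LEN = 64 * 1024  # the 64K limit mentioned in the commit message


def read_string(memory: bytes, addr: int, length: int) -> str:
    # Reject nonsensical lengths before allocating or copying anything.
    if length < 0 or length > MAX_STRING_LEN:
        raise ValueError(f"string length {length} out of range")
    if addr < 0 or addr + length > len(memory):
        raise ValueError("string lies outside simulated memory")
    return memory[addr:addr + length].decode("ascii", errors="replace")
```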

Change-Id: I059669016ee2c5721fedb914595d0494f6cfd4cd
2024-05-16 10:28:10 -07:00
Chong-Teng Wang
adb177dab6 arch-riscv: Fix vrgather instruction (#1134)
This commit fixes the implementation of vrgather instruction based on
rvv 1.0.

In section 16.4. Vector Register Gather Instructions,

> Vector-scalar and vector-immediate forms of the register gather are
also provided. These read one element from the source vector at the
given index, and write this value to the active elements of the
destination vector register. The index value in the scalar register and
the immediate, zero-extended to XLEN bits, are treated as unsigned
integers. If XLEN > SEW, the index value is not truncated to SEW bits.

The fix zero-extends the index value in the scalar register and the
immediate.
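
The vector-scalar form can be sketched in Python. This is a simplified model, not the gem5 implementation: it assumes vl equals VLMAX (the length of `vs2`), represents the mask as a list of booleans, and leaves inactive elements undisturbed.

```python
# Simplified model of vrgather.vx with a zero-extended index (not gem5 code).
def vrgather_vx(vd, vs2, rs1_value, mask, xlen=64):
    # Treat the scalar as an unsigned XLEN-bit integer (zero-extension).
    index = rs1_value & ((1 << xlen) - 1)
    vlmax = len(vs2)
    # Out-of-range indices read as zero.
    gathered = vs2[index] if index < vlmax else 0
    # Only active elements receive the gathered value.
    return [gathered if active else old for old, active in zip(vd, mask)]
```

A scalar of -1 becomes a very large unsigned index and therefore gathers zero, rather than wrapping around to the last element.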
2024-05-16 10:12:35 -07:00
Hossam ElAtali
97a87a7c84 util: Fixed gem5img.py script (#990)
Made the script more robust to different names.

Co-authored-by: Hossam ElAtali <hossam.elatali@uwaterloo.ca>
2024-05-16 10:09:27 -07:00
Yu-Cheng Chang
321bd07163 cpu: Don't change to suspend if the thread status is halted (#1039)
In our gem5 model, there are four states that represent the thread context
status: Active, Suspend, Halting and Halted


5641c5e464/src/cpu/thread_context.hh (L99-L117)

When initializing the gem5 instance, all of the thread contexts are set
to Halted. The status of a thread context does not become Active until the
Workload initializes startup, except with the StubWorkload. So if the user
uses the StubWorkload and the CPU is connected to the model_reset
port, the thread context of the CPU may be activated.

The following are the steps that activate the thread context of the CPU
without Workload[1] initialization or lowering the model_reset port[2].

1. Raise the model_reset port (Change the state from Halted to Suspend)
5641c5e464/src/cpu/base.cc (L671-L673)

2. Post the interrupt to CPU (Change the state from Suspend to Active)
5641c5e464/src/cpu/base.cc (L231-L239)

Implementation of wakeup

SimpleCPU:

5641c5e464/src/cpu/simple/base.cc (L251-L259)

MinorCPU:

5641c5e464/src/cpu/minor/cpu.cc (L143-L151)

O3CPU:

5641c5e464/src/cpu/o3/cpu.cc (L1337-L1346)

This CL fixes the issue when raising the model reset port to the CPU (to let
the CPU sleep) if the CPU has not been activated by the workload. If the CPU
status is Halted, it should not change to Suspend, to avoid a wakeup
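
The intended state handling can be sketched as a small state machine. This is a Python illustration only: the enum and function names are hypothetical, and the transitions are taken from the two steps listed above rather than from the gem5 sources.

```python
# Hypothetical state-machine sketch of the fix (not gem5 code).
from enum import Enum


class Status(Enum):
    ACTIVE = "Active"
    SUSPEND = "Suspend"
    HALTING = "Halting"
    HALTED = "Halted"


def raise_model_reset(status: Status) -> Status:
    # The fix: a halted thread context stays halted.
    if status is Status.HALTED:
        return status
    return Status.SUSPEND


def post_interrupt(status: Status) -> Status:
    # An interrupt only wakes a suspended thread context.
    return Status.ACTIVE if status is Status.SUSPEND else status
```

With this guard, raising the reset port and then posting an interrupt leaves a never-activated CPU in Halted instead of waking it.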

Reference

The model_reset is introduced in the CL:
https://gem5-review.googlesource.com/c/public/gem5/+/67574/4

[1] Activate by workload (ARM example):

5641c5e464/src/arch/arm/fs_workload.cc (L101-L114)

[2] Lower the model_reset:

5641c5e464/src/cpu/base.cc (L191-L192)
5641c5e464/src/cpu/base.cc (L674-L685)

Change-Id: I5bfc0b7491d14369fff77b98b71c0ac763fb7c42
2024-05-16 10:02:53 -07:00
Matthew Poremba
6164835230 configs: GPUFS: MI300X
Add a config capable of simulating MI300X ISA (gfx942). This is similar
to the mi200.py config and uses the same scripts followed by some
tuneable parameters. This config optionally lets the user call the
runMI300GPU function with gem5 resources. This allows for something like
the following before a VIPER stdlib python is available:

```
import mi300
from gem5.resources.resource import obtain_resource

disk = obtain_resource("x86-gpu-fs-img")
kernel = obtain_resource("x86-linux-kernel-5.4.0-105-generic")
app = obtain_resource("square-gpu-test")

mi300.runMI300GPUFS("X86KvmCPU", disk, kernel, app)
```

Tested cold boot config, checkpoint create and restore, and using gem5
resources.

Change-Id: I50a13d7a3d207786b779bf7fd47a5645256b1e6a
2024-05-16 09:23:03 -07:00
Matthew Poremba
c1803eafac arch-vega: Architected flat scratch and scratch insts
Architected flat scratch is added in MI300, which stores the scratch base
address in dedicated registers rather than in SGPRs. These registers are
used by scratch_ instructions. These are flat instructions which
explicitly target the private memory aperture. These instructions have a
different address calculation than global_ instructions.

This change implements architected flat scratch support, fixes the
address calculation of scratch_ instructions, and implements decodings
for some scratch_ instructions. Previous flat_ instructions which happen
to access the private memory aperture have no change in address
calculation. Since scratch_ instructions are identical to flat_
instructions except for address calculation, the decodings simply reuse
existing flat_ instruction definitions.

Change-Id: I1e1d15a2fbcc7a4a678157c35608f4f22b359e21
2024-05-16 09:23:03 -07:00
Chong-Teng Wang
d48191d608 arch-riscv: Add RVV FP16 support (Zvfh & Zvfhmin) (#1123)
Add support for the following two extensions:

[Zvfh](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#185-zvfh-vector-extension-for-half-precision-floating-point):
Vector Extension for Half-Precision Floating-Point

[Zvfhmin](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#184-zvfhmin-vector-extension-for-minimal-half-precision-floating-point):
Vector Extension for Minimal Half-Precision Floating-Point

For instructions (`vfncvt[.rtz].x[u].f.w`) and (`vfwcvt.f.x[u].v`) which
will become defined when `SEW = 8`, a new template
`VectorFloatWideningAndNarrowingCvtDecodeBlock` is added and 8-bit
floating point type (`float8_t`) is defined.

The data type `float8_t` is introduced in the newer `3e` version of the
SoftFloat Package, however, the current version in use is `3d` which
does not include this definition. Despite this, `float8_t` is utilized
solely for constructing the `vfncvt[.rtz].x[u].f.w` and
`vfwcvt.f.x[u].v` instructions when `SEW = 8`. There are no operations
that directly manipulate data of the `float8_t` type.
2024-05-16 08:37:00 -07:00
Matthew Poremba
8be5ce6fc9 dev-amdgpu,configs,gpu-compute: Add gfx942 version
This is the version for MI300. For the most part, it is the same as
MI200 with the exception of architected flat scratch (not yet
implemented in gem5) and therefore a new version enum is required.

Change-Id: Id18cd7b57c4eebd467c010a3f61e3117beb8d58a
2024-05-15 12:08:41 -07:00
Harshil Patel
65976e4c6d util: Add GNU non executable line to x86 m5 (#1116)
- Adding this line, as not specifying a GNU non-executable stack was
throwing warnings when building m5 for Ubuntu 24.04

Change-Id: I620c508be4090804698391cff671ba5091b053d7
2024-05-14 11:06:13 -07:00
Lukas Zenick
b279e40cb7 configs: nvm sweep fix (#1114)
These changes to sweep and sweep_hybrid for NVM allow them to run. I'm
not an expert on this, so I'm not sure if these are technically correct,
but they no longer fail when running
`build/X86/gem5.opt configs/nvm/sweep.py` and `build/X86/gem5.opt
configs/nvm/sweep_hybrid.py`

GitHub Issue: #669
2024-05-13 14:51:39 -07:00
Zhantong Qiu
6b427a84f7 stdlib: change default exit event for SIMPOINT_BEGIN (#1085)
The SIMPOINT_BEGIN should do nothing by default since it might be used
in various cases.

In the [mailing list](https://www.mail-archive.com/gem5-users@gem5.org/msg22383.html),
a user discovered a bug with the current
`simpoints-se-restore.py` example.
The bug is caused by the default behavior of the SIMPOINT_BEGIN exit
event.
When taking a checkpoint with `simpoints-se-checkpoint.py`, it stores
the future exit event scheduled at the beginning of the simulation. I
did not notice this when I wrote and tested the example script due to
the long print out log and my custom handler of the SIMPOINT_BEGIN exit
event.
During the restore, the SIMPOINT_BEGIN exit event was triggered right
before the region end, so it resets the stats before the final stats
dump. Therefore, the simulation time is 0 as the user discovered.
This patch should fix this bug.

Change-Id: I800dfbd28d7b2c842864a1ab7d84b8f8e17b9b3c
2024-05-13 14:11:00 -07:00
Ivana Mitrovic
10b24dc9a4 arch-arm: Implement FEAT_MPAM in CPU (#1082)
This PR implements FEAT_MPAM on the CPU side. We define the MPAM system
registers and a mechanism for tagging memory requests with the MPAM
information bundle as specified in existing documentation [1].

What this PR is *not* covering is the MPAM implementation in an MSC
(Memory System Component). This means that at the moment it's only
possible to have static partitioning schemes (via the PartitioningPolicies
already part of gem5) and there is currently no way to dynamically
program partitions at runtime.

[1]: https://developer.arm.com/documentation/ddi0487/latest/
2024-05-13 08:56:23 -07:00
Ivana Mitrovic
53245fa0e8 arch-riscv: Fix CSR instruction behavior 2nd attempts (#1099)
Quote from change[1]

> The RISC-V spec clarifies the CSR instruction operation, some of them
shall not read or write CSR by the hints of RD/RS1/uimm, but the
original version use the 'data != oldData' condition to determine
whether write or not, and always read CSR first.
See CSR instruction in spec:
Section 9.1 Page 56 of
https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf

|||Register operand|||
|--- |--- |--- |--- |--- |
|Instruction|rd is x0|rs1 is x0|Reads CSR|Writes CSR|
|CSRRW|Yes|-|No|Yes|
|CSRRW|No|-|Yes|Yes|
|CSRRS/CSRRC|-|Yes|Yes|No|
|CSRRS/CSRRC|-|No|Yes|Yes|
|||Immediate operand|||
|Instruction|rd is x0|uimm = 0|Reads CSR|Writes CSR|
|CSRRWI|Yes|-|No|Yes|
|CSRRWI|No|-|Yes|Yes|
|CSRRSI/CSRRCI|-|Yes|Yes|No|
|CSRRSI/CSRRCI|-|No|Yes|Yes|
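
The table can be expressed as a small decision function. This is a Python sketch for illustration, not the gem5 ISA description; the function name and argument names are assumptions.

```python
# Sketch of the CSR read/write rules from the table above (not gem5 code).
def csr_access(op: str, rd_is_x0: bool, src_is_zero: bool):
    """Return (reads_csr, writes_csr).

    src_is_zero means "rs1 is x0" for register forms and "uimm = 0"
    for immediate forms.
    """
    if op in ("CSRRW", "CSRRWI"):
        # The write always happens; the read is skipped when rd is x0.
        return (not rd_is_x0, True)
    if op in ("CSRRS", "CSRRC", "CSRRSI", "CSRRCI"):
        # The read always happens; the write is skipped for a zero source.
        return (True, not src_is_zero)
    raise ValueError(f"not a CSR instruction: {op}")
```

Note that the decision depends only on the operand encoding, not on whether the new value differs from the old one.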

The issue caused Ubuntu to hang because the same status CSR is shared
between `mstatus`, `sstatus` and `ustatus`, and the same interrupt enabling
CSR between mip, sip and uip. We may need to read the original CSR without
the effect of the unmasked bits to avoid overriding the bits of other CSRs.
Ubuntu now works with the patch merged.

[1] https://gem5-review.googlesource.com/c/public/gem5/+/67717
2024-05-10 10:21:48 -07:00
Matthew Poremba
e3c2a322a1 arch-vega: Fix SDWA dst select (#1120)
The destination select should take a value of the selection size (dword,
word, or byte) starting at bit 0, move that to the selected destination,
and then apply the unused constraint (DST_U) to the remaining word or
bytes. Currently the code is selecting the word/byte currently being
iterated over, rather than the least significant word/byte. As a result,
any selection that is not word 0 or byte 0 will be replaced with the
original destination value at those bits. This results in the wrong
value.

This commit changes the orig bits to be the original dest value at the
lowest word / byte location. Tested with the mfma_i32_16x16x16i8 example
which uses an SDWA V_OR_B32 to pack i8 values into VGPRs for the MFMA.
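
The corrected destination select can be sketched in Python. This is a simplified illustration rather than the gem5 code: it models only two unused-bits policies, here called "pad" (zero the rest) and "preserve" (keep the original destination bits), and all names are hypothetical.

```python
# Simplified sketch of SDWA destination select (not gem5 code).
def sdwa_dst_select(result, orig_dst, sel_offset, sel_width, unused="preserve"):
    field_mask = (1 << sel_width) - 1
    # Always take the field from bit 0 of the result ...
    field = result & field_mask
    # ... and move it to the selected byte/word of the destination.
    placed = field << sel_offset
    dst_mask = field_mask << sel_offset
    if unused == "pad":
        return placed
    if unused == "preserve":
        return (orig_dst & ~dst_mask & 0xFFFFFFFF) | placed
    raise ValueError(f"unsupported unused policy: {unused}")
```

For example, selecting byte 2 (`sel_offset=16`, `sel_width=8`) with a result of `0xAB` and an original destination of `0x11223344` gives `0x11AB3344` under "preserve".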

Change-Id: I54ed819479a25fa9276d29a8f14f0fea7fd71afe
2024-05-10 08:49:13 -07:00