Commit Graph

21660 Commits

Author SHA1 Message Date
Ivana Mitrovic
a764b9be1c Revert "arch-x86: Fix TLB Assertion Error on CFLUSH" (#1196)
Reverts gem5/gem5#1080 as it is not a good fix.
2024-06-04 10:26:53 -07:00
Hoa Nguyen
40ef8f3afb dev: Remove an extra file in virtio (#1191)
`src/dev/virtio/VirtIORng 2.py` is identical to
`src/dev/virtio/VirtIORng.py`, and the former does not appear in any
build script.

Change-Id: I9c5f1b1a3809d1c7028b630c32310e540613e232

Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2024-06-04 08:40:41 -07:00
dependabot[bot]
500bdc5302 misc: bump tqdm from 4.66.3 to 4.66.4 (#1192)
Bumps [tqdm](https://github.com/tqdm/tqdm) from 4.66.3 to 4.66.4.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-04 06:35:35 -07:00
dependabot[bot]
8c98dcb7cf misc: bump pre-commit from 3.7.0 to 3.7.1 (#1193)
Bumps [pre-commit](https://github.com/pre-commit/pre-commit) from 3.7.0
to 3.7.1.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-04 06:34:53 -07:00
Lukas Zenick
dad5c7b6f7 arch-x86: Fix TLB Assertion Error on CFLUSH (#1080)
Fixed the assertion statement in the cpu's translation.hh file so that
it doesn't fail the assertion if the cache is clean.

I compile this c code to `test`
```c
#include <stdio.h>

static inline void clflush(volatile void *p) {
    __asm__ volatile ("clflush (%0)" : : "r"(p) : "memory");
}

int main() {
    int data = 42;  // Example variable

    printf("Value before clflush: %d\n", data);

    clflush(&data);

    printf("Value after clflush: %d\n", data);

    return 0;
}
```
And run it with this script
`./build/X86/gem5.opt configs/learning_gem5/part1/two_level.py ./test`
In order to verify that it no longer fails the assertion check.

GitHub Issue: #862 
Change-Id: I6004662e7c99f637ba0ddb07d205d1657708e99f
2024-06-03 10:17:10 -07:00
Yu-Cheng Chang
5d3f1c3316 arch-riscv: Add rvZext to BranchTarget (#1173)
Ensure the upper xlen bits are all zeros

Change-Id: Id81330eced907d21320bc1af85ad38fb6e95f6b1
2024-06-03 10:03:51 -07:00
Ivana Mitrovic
fe8daa85d6 arch-vega: More scratch, accvgpr instructions (#1190)
- Implements the remaining scratch instruction which have corresponding
flat implementations
- Implements the remaining v_accvgpr instructions.
2024-06-03 08:56:32 -07:00
Matthew Poremba
00dcd5b0bc arch-vega: Implement literals for 64b dest operands
This feature has been available since Vega10 but was never implemented.
MI300 adds a few new instructions that make use of this more often
(e.g., v_mov_b64).

Change-Id: Ieeb7834462b76d77c0030f49622d0de09f90c9e4
2024-05-31 13:41:46 -07:00
Matthew Poremba
6c8caf83c6 arch-vega: Implement V_ACCVGPR_MOV_B32 instruction
This instruction is a simple move from accumulation register to
accumulation register. It is essentially a move with the accumulation
offset added to the register index.

Change-Id: Ic93ae72599b75c91213f56ebafe5bbd7b2867089
2024-05-31 09:32:35 -07:00
Matthew Poremba
7cdb69bf21 arch-vega: Fill in scratch insts to match flat/global
Flat, scratch, and global share the same instruction implementation with
different address calculations essentially. These instructions were
already implemented but not added to the decoder. This commit adds the
remaining scratch instructions which have a shared instruction
implementation.

Change-Id: I8f2e9ceb221294dce1b81c45745b642f0592d985
2024-05-31 09:32:34 -07:00
Bobby R. Bruce
3b7307182f misc: Fix daily-tests
1. Typo in container.
2. Add compression level to minimize size of generated artifact.

Change-Id: I854e814162fb434ad50a64e3070b943905e4134b
2024-05-30 10:15:35 -07:00
Bobby R. Bruce
a0de33110b arch-vega: Fix clang comp error due to constant exp (#1183)
The lines `constexpr int B_I = std::ceil(64.0f / (N * M / H));` caused
the following compilation error in clang Version 16:

```
error: constexpr variable 'B_I' must be initialized by a constant
expression
```

`std::ceil` is not a const expression. Therefore instances of this
expression in instructions.hh have been replaced with a constant
expression friendly alternative.

This is calling our compiler tests to fail:
https://github.com/gem5/gem5/actions/runs/9288296434/job/25559409142

Change-Id: I74da1dab08b335c59bdddef6581746a94107f370
2024-05-30 09:44:34 -07:00
NSurawar
efbfdeabd7 mem-ruby: Reduce handshaking between CorePair and dir (#1117)
Currently when data is downgraded by MOESI_AMD_Base-CorePair (e.g. due
to a replacement) this requires a 4-way handshake between the CorePair
and the dir. Specifically, the CorePair send a message telling the dir
it'd like to downgrade then, the dir sends an ACK back and then, the
CorePair writes the data back, and finally, the dir ACKs the writeback.
This is very inefficient and not representative of how modern protocols
downgrade a request. Accordingly, this commits updates the downgrade
support such that the CorePair writes back the data immediately and then
the dir ACKs it.
Thus, this approach requires only a 2-way handshake.

Change-Id: I7ebc85bb03e8ce46a8847e3240fc170120e9fcd6

Co-authored-by: Neeraj Surawar <neerajs@hyrule.cs.wisc.edu>
2024-05-30 09:36:29 -07:00
Bobby R. Bruce
7c1207d5c4 misc: Another attempt to fix the merge-upload in for daily (#1188)
Change-Id: I6a6064ec3b5be4ac1f7d6cd3c2f6c0ca62d2cfcd
2024-05-30 07:45:35 -07:00
Bobby R. Bruce
65b86cfac9 misc: Fix daily tests merge-artifacts (#1184) 2024-05-30 04:27:40 -07:00
ylldummy
7fa0342a7c mem-cache: Fix maybe-uninitialized warning (#1179)
When compiler tries to inline a vector construction with a default value
as default constructed ReplaceableEntry. It can complain about the
uninitialized member.

Let's provide basic initialization to the members.

Example codepath:
 SignaturePathV2 constructor
 -> GlobalHistoryEntry() as init_value to AssociativeSet
 -> AssociativeSet initialize vector<Entry> with init_value
2024-05-29 10:41:35 -07:00
Bobby R. Bruce
b161172f65 arch-arm: Fix memory attributes of table walks (#1180)
This PR is doing the following:

1) Fixing memory attributes of partial translation entries (table walks)
2) Properly setting the cacheability of table walks
2024-05-29 08:07:44 -07:00
Nicholas Mosier
9027d5c3e2 arch-x86: set AF=0 when logical instructions execute (#1171)
Fix #1168. Prevent logical instructions like AND, OR, and TEST from
having input dependencies on the previous value of the Zaps register
(ZF+AF+PF+SF) by having them set AF=0, rather than not modifying AF.
2024-05-29 08:04:44 -07:00
shinezyy
7d339ee79b util: allow to override ARCH in cxx config's Makefile (#1165)
allow to override ARCH in cxx config's Makefile

gem5 issue: #1164
2024-05-29 07:55:48 -07:00
Bobby R. Bruce
ce0bb4655c util-docker,gpu,gpu-compute: Improve GCN-GPU Dockerfile (#1170)
* The GCC used in the GCN-GPU images was increase from version 8 to 
  version 10. This was necessary due to PR #1145 which made GCC require
GCC >=10. This patch was previously part of #1161 but has been merged
into
  this PR.
* A patch has been applied to ROCm-OpenCL-Runtime to fix a linking error
  in which there were multiple definitions of `ret_val`. This issue is
highlighted here:
https://github.com/ROCm/ROCm-OpenCL-Runtime/issues/113.
  This was previously part #1161 but has been moved into this PR.
* The Dockerfile's `RUN` command (built to layers in the Docker image)
  have been refactored so sources and built objects are deleted in the
  same RUN command as where they were built and installed. This reduces
  the size of the image substantially: from 16.3GB down to 6.6GB.
* The `apt upgrade` has been removed. This step (previously at the start
  of the file) did nothing of importance. Removing it saves both time
building the image and reduces the size of the image by a small amount.
* `--depth=1` is used when cloning repositories so the entire commit
tree
  tree is not pulled each time. This saves some time when building the
  image.
* `apt -y update` has been added  where `apt -y install` is used so
  CACHED image layers do not become an issue in the future if the image
  were to be rebuilt.
2024-05-29 07:54:28 -07:00
Nicholas Mosier
a54d3198a8 arch-x86: break 32/64-bit mov's input dependency on prior dest value (#1172)
Fix #1169. Break the input dependency of 32-bit and 64-bit 'mov'
micro-ops on the prior value in the destination register. Such a
dependency is required for 8-bit and 16-bit moves, as they do not
completely overwrite the value in the destination register. However, it
is unnecessary for 32-bit moves (which implicitly zero the upper 32
bits) and 64-bit moves.

This patch implements the fix by adding a new code template field inside
the generated constructors of X86StaticInst's, called `invalidate_srcs`,
which instruction implementations like `mov` can use to conditionally
invalidate particular source registers as needed. In `mov`'s case, this
is when the data size is 32 or 64 bits.

Change-Id: Ib2aef6be6da08752640ea3414b90efb7965be924
2024-05-29 07:54:03 -07:00
Matthew Poremba
07f6b7c59c dev-amdgpu: Fix pending PCI RLC doorbell (#1157)
SDMA RLC queues do not currently remove their doorbell mapping. This can
cause issues re-registering the queue and prevents the pending doorbells
feature from working. In addition the data value of the doorbell (the
ring buffer rptr) is not saved, leading to UB when this workaround is
used.

This commit removes the doorbell mapping from the gpu device when the
SDMA engine unmaps an RLC queue and copies the next doorbell value to
the pending packet as was originally intended.

Change-Id: Ifd551450f439c065579afcf916f8ff192e7598ab
2024-05-29 07:15:46 -07:00
Giacomo Travaglini
c4ed23a10b arch-arm: Implement HCR_EL2 force broadcast for EL1&0 TLBIs (#1175)
According to the Arm architecture reference manual, it is possible to
force the broadcast of the following TLBIs:

AArch64: TLBI VMALLE1, TLBI VAE1, TLBI ASIDE1, TLBI VAAE1, TLBI VALE1,
TLBI VAALE1, IC IALLU, TLBI RVAE1, TLBI RVAAE1, TLBI RVALE1, and TLBI
RVAALE1.

AArch32: BPIALL, TLBIALL, TLBIMVA, TLBIASID, DTLBIALL, DTLBIMVA,
DTLBIASID, ITLBIALL, ITLBIMVA, ITLBIASID, TLBIMVAA, ICIALLU, TLBIMVAL,
and TLBIMVAAL.

Via the HCR_EL2.FB bit

Change-Id: Ib11aa05cd202fadfbd9221db7a2043051196ecbd

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-05-29 11:54:24 +01:00
Giacomo Travaglini
e9dcb906b4 arch-arm: Set memory attributes for partial table entries
Change-Id: I80adcead410f226c323e4d781adb1ff17a386986
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-05-29 09:30:58 +01:00
Giacomo Travaglini
09f0c20be2 arch-arm: Use HCR_EL2.CD for stage2 table walks
When determining the cacheability of table walks,
SCTLR.C should only be used in stage1 EL1&0 translations.
Stage2 translations should rely on HCR_EL2.CD instead

Change-Id: I1b0830bc3fb5086f68d7a7a1560c7fed5d126d28
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-05-29 09:30:58 +01:00
Giacomo Travaglini
854662f48f arch-arm: Check OSH domain as well for cacheability attribute
Make table walks uncacheable if marked as uncacheable
in either inner or outer shareable domain

Change-Id: I5898a3b91b5b919e0beda6c6fe896394e3ab94df
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-05-29 09:30:58 +01:00
Bobby R. Bruce
4acc20dac1 misc,tests: Download all gem5 bins via one artifact (#1178)
The Daily Tests are failing when downloading artifacts as part of the
`testlib-long-tests` matrix:
https://github.com/gem5/gem5/actions/runs/9250821764/job/25448583827.

It _could_ be that since upgrading to `actions/download@v4`, we're
hitting a limit as the `testlib-long-tests` are downloading every gem5
binary compiled in the `build-gem5` step, each with it's own
`actions/download` step, for every test.

This change adds a small job after `build-gem5` which creates a merged
artifact containing all the gem5 binaries then uses this to lessen the
number of times this action is called in such a short period of time.

Even if the bug still persists, this solution is neater than what was
there previously.
2024-05-28 12:55:30 -07:00
Matthew Poremba
e82cf20150 mem-ruby: Remove VIPER StoreThrough temp cache storage (#1156)
StoreThrough in VIPER when the TCP is disabled, GLC bit is set, or SLC
bit is set will bypass the TCP, but will temporarily allocate a cache
entry seemingly to handle write coalescing with valid blocks. It does
not attempt to evict a block if the set is full and the address is
invalid. This causes a panic if the set is full as there is no spare
cache entry to use temporarily to use for DataBlk manipulation. However,
a cache block is not required for this.

This commit removes using a cache block for StoreThrough with invalid
blocks as there is no existing data to coalesce with. It creates no
allocate variants of the actions needed in StoreThrough and pulls the
DataBlk information from the in_msg instead. Non-invalid blocks do not
have this panic as they have a cache entry already.

Fixes issues with StoreThroughs on more aggressive architectures like
MI300.

Change-Id: Id8687eccb991e967bb5292068cbe7686e0930d7d
2024-05-28 11:02:00 -07:00
Ivana Mitrovic
5ec1acaf5f arch-arm: TLBIs targeting EL2 regime are executable from S state (#1176)
Those AArch64 instructions/registers were labelled as executable
from EL3 only if SCR_EL3.NS == 1. This is not valid anymore
after the introduction of FEAT_SEL2
2024-05-28 10:54:18 -07:00
Matthew Poremba
1dfaa224ff arch-vega: Fix GCC 13 build errors (#1162)
The new static analysis in GCC 13 finds issues with operand.hh. This
commit fixes the error so that gem5 compiles when BUILD_GPU is true.

Change-Id: I6f4b0d350f0cabb6e356de20a46e1ca65fd0da55
2024-05-28 07:58:28 -07:00
Giacomo Travaglini
27c7647fee arch-arm: Use monWrite a shorter version
Change-Id: I8da8a39238eb100315d3df496f55a6bf3da948c6
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-05-28 11:20:52 +01:00
Giacomo Travaglini
6995a99d77 arch-arm: TLBIs targeting EL2 regime are executable from S state
Those AArch64 instructions/registers were labelled as executable
from EL3 only if SCR_EL3.NS == 1. This is not valid anymore
after the introduction of FEAT_SEL2

Change-Id: Ie7b56f3fe779c3a99d4f0ef937c7c8ec0530b00e
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-05-28 11:20:32 +01:00
Giacomo Travaglini
10dbfb8bb7 arch-arm: Rewrite performTlbi to use map instead of switch (#1166)
This is making it easier for TLBI instructions to share code. Common
code (under the form of tlbi* functions) are closely matching the
instruction description in the Arm pseudocode

Change-Id: If10c22fb4a7df2bcd0335e9761286ad3c458722b

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-05-28 11:03:07 +01:00
Yu-Cheng Chang
4f6fdbf8bf arch-riscv: Fix c.jalr and c.jr instruction (#1163)
The bit 0 of register should be 0 for jump address. Wrong handling the
jump address may cause infinite run or segment fault.

gem5 issue: https://github.com/gem5/gem5/issues/981
2024-05-25 20:18:42 -07:00
Lukas Zenick
96fbc2068a util, ext: Fix building TLM (#1105)
Fixed the issue that did not allow building TLM.

Build commands:
```bash
scons build/ARM/gem5.opt
scons setconfig build/ARM USE_SYSTEMC=n
scons --with-cxx-config --without-python --without-tcmalloc build/ARM/libgem5_opt.so
cd util/tlm
scons
```
Following this README, I tested it successfully with the simple examples:
https://gem5.googlesource.com/public/gem5/+/master/util/tlm/README

GitHub Issue: #591 
Change-Id: If07fae2eb20ad62627e733573f61bc42d594f970

---------

Co-authored-by: Ivana Mitrovic <ivanamit91@gmail.com>
2024-05-24 13:29:58 -07:00
Matthew Poremba
1616d34003 arch-vega: Template MFMA instructions (#1128)
templated
- v_mfma_f64_16x16x4f64

added support for
- v_mfma_f32_32x32x2f32
- v_mfma_f32_4x4x1_16b_f32
- v_mfma_f32_16x16x4f32

[formula for gprs
needed](https://github.com/ROCm/amd_matrix_instruction_calculator)

[formulas for register layouts and lanes used in
computation](https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf)

Change-Id: I15d6c0a5865d58323ae8dbcb3f6dcb701a9ab3c7
2024-05-22 08:53:25 -07:00
Ivana Mitrovic
1a68d71f07 util: Update gem5-resource-manager requirements (#1154)
Bumps [requests](https://github.com/psf/requests) from 2.31.0 to 2.32.0.

Change-Id: I34df01fdd32cb300c4efc8cf072c0aa1137371bc
2024-05-22 07:32:52 -07:00
Bobby R. Bruce
52fbc8ebcf misc: Revert Dramsys Ubuntu to 22.04 to compile in gcc <13 (#1146)
Until https://github.com/gem5/gem5/issues/1121 is fixed, this change
will ensure our Weekly tests pass.
2024-05-21 10:57:16 -07:00
Bobby R. Bruce
6adb7a8637 misc: Remove gcc 8 support, gem5 support GCC >= v10 (#1145)
note: Due to #556 / #555, we don't support GCC 9. This PR removes gcc-8
which means gem5 would support GCC >= version 10.

The reason for removing gcc-8:

1. We already dropped support for gcc-9. I don't see any good reason to
support anything <9 as a result.
2. GCC is relatively old, and we're probably supporting a bit too many
compiler versions anyway. In Ubuntu 22.04, gcc-11 is downloaded by
default with `apt`. It doesn't seem many system are still using gcc.
3. There is a weird compiler bug in gcc-8 which is causes failure when
compiling gem5 since the inclusion of #1123. The error received is as
follows:

```sh
In file included from src/arch/riscv/tlb.hh:42,
                 from src/arch/riscv/mmu.hh:45,
                 from build/ALL/arch/riscv/generated/exec-g.cc.inc:14,
                 from build/ALL/arch/riscv/generated/generic_cpu_exec.cc:5:
src/arch/riscv/utility.hh: In instantiation of ‘FloatType gem5::RiscvISA::ftype(IntType) [with FloatType = float8_t; IntType = unsigned char]’:
build/ALL/arch/riscv/generated/exec-ns.cc.inc:38839:42:   required from ‘gem5::Fault gem5::RiscvISAInst::Vfwcvt_xu_f_vMicro<ElemType>::execute(gem5::ExecContext*, gem5::trace::InstRecord*) const [with ElemType = float8_t; gem5::Fault = std::shared_ptr<gem5::FaultBase>]’
build/ALL/arch/riscv/generated/exec-ns.cc.inc:38856:16:   required from here
src/arch/riscv/utility.hh:327:15: error: parameter ‘a’ set but not used [-Werror=unused-but-set-parameter]
 ftype(IntType a) -> FloatType
       ~~~~~~~~^
src/arch/riscv/utility.hh: In instantiation of ‘IntType gem5::RiscvISA::f_to_wui(FloatType, uint_fast8_t) [with FloatType = float8_t; IntType = short unsigned int; uint_fast8_t = unsigned char]’:
build/ALL/arch/riscv/generated/exec-ns.cc.inc:38838:49:   required from ‘gem5::Fault gem5::RiscvISAInst::Vfwcvt_xu_f_vMicro<ElemType>::execute(gem5::ExecContext*, gem5::trace::InstRecord*) const [with ElemType = float8_t; gem5::Fault = std::shared_ptr<gem5::FaultBase>]’
build/ALL/arch/riscv/generated/exec-ns.cc.inc:38856:16:   required from here
src/arch/riscv/utility.hh:570:20: error: parameter ‘a’ set but not used [-Werror=unused-but-set-parameter]
 f_to_wui(FloatType a, uint_fast8_t mode)
```

Note: This is currently causing our SST Daily tests to fail, and our
compiler tests to fail.
2024-05-21 10:56:41 -07:00
Harshil Patel
33cebe9376 dev: add reset wrap mode to mouse.cc (#1149)
This change fixes #1148 

I have only added an acknowledged return, as we dont ahve remote and
wrap mode so it can only be in stream mode.

Change-Id: I1882042d873ff0e9465c9491238554c8fbb9aa76
2024-05-21 10:55:03 -07:00
Robert Hauser
688f8fb03b arch-riscv: add exception code to DPRINTFS msg (#1153)
Change-Id: Ib5d1dc991f18256ec634c604c776629ea31317a9
2024-05-21 09:59:25 -07:00
Yu-Cheng Chang
5e20438c1c arch-riscv: Fix GDB connection failed after #1099 (#1152)
GDB connection failed after the PR[1] changed the index of CSR_FCSR to
MISCREG_FCSR itself. It cause the out of bound error.

[1]: https://github.com/gem5/gem5/pull/1099

gem5 issue: https://github.com/gem5/gem5/issues/1151
Change-Id: I402febe5a3a9addf3d4821ad716ade14e227d5d7
2024-05-21 09:58:15 -07:00
Harshil Patel
0824d7f2cd Revert "cpu-kvm: Support perf counters on hybrid host architectures" (#1127)
Reverts gem5/gem5#1065

Reverting this change because this PR breaks X86 kvm as mentioned in the
issue #1126.
2024-05-21 08:14:10 -07:00
Giacomo Travaglini
6f4ba0b422 arch-arm: Add missing outer-shareable TLBIs to the list (#1147)
Those were not part of the performTlbi switch and simulation was
therefore panicking when they were encountered

Change-Id: Ifbe0b89e45539df4abc147ac5970b0caf0d9dfdc

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2024-05-20 19:24:45 -07:00
Chong-Teng Wang
13924336b1 arch-riscv: Fix viota instruction (#1137)
This commit fixes and refactors the implementation of viota. It also
overrides the generateDisassembly function in viota's macro/micro to
correctly print out the instruction when tacing/debugging.

For example, it changes from:
viota_m vd, vd, vs2, v0.t
to:
viota_m vd, vs2, v0.t
2024-05-20 12:19:22 -07:00
Matthew Poremba
82318e85af arch-x86: Improve KVM set XCR (#1138)
This adds two failsafes which may cause a panic on some machines. First,
check the host machine has the KVM XCR capability before calling getXCRs
or setXCRs. Second, ensure the x87 bit, which must always be one, will
always return at least one by modifying the return value in readMiscReg.

Change-Id: I5e778acc926a47443ef6cef29fabd84eb69bb9ba
2024-05-20 10:22:48 -07:00
Matthew Poremba
b91c9be102 arch-vega: Load/stores commonly used with 16b MFMA
This implements some missing loads and store that are commonly used in
applications with MFMA instructions to load 16-bit data types into
specific register locations: DS_READ_U16_D16, DS_READ_U16_D16_HI,
BUFFER_LOAD_SHORT_D16, BUFFER_LOAD_SHORT_D16_HI.

Change-Id: Ie22d81ef010328f4541553a9a674764dc16a9f4d
2024-05-20 09:29:46 -05:00
Matthew Poremba
a4f0d9e6be arch-vega: Implement v_mfma_f32_32x32x8_bf16
Implement a bfloat16 MFMA. This was tested with PyTorch using
dtype=torch.bfloat16.

Change-Id: I35b4e60e71477553a93020ef0ee31d1bcae9ca5d
2024-05-20 09:28:58 -05:00
Matthew Poremba
10f8fdcd14 arch-vega: Unit test for MXFP types
Add a unit test for the MXFP types (bf16, fp16, fp8, bf8). These types
are not currently operated on directly. Instead the are cast to float
values and then arithmetic is performed. As a result, the unit test
simply checks that when we convert a value from MXFP type to float and
back that the values of the MXFP type match. Exact values are used to
avoid discrepancies with rounding.

Can be run using scons build/VEGA_X86/unittests.opt .

Change-Id: I596e9368eb929d239dd2d917e3abd7927b15b71e
2024-05-20 09:28:58 -05:00
Matthew Poremba
de11daec5f arch-vega: Implement F32 <-> F16 conversions
These instructions are used in some of the F16 MFMA example applications
to convert to/from floating point types.

Change-Id: I7426ea663ce11a39fe8c60c8006d8cca11cfaf07
2024-05-20 09:28:58 -05:00