
derek/gem5

Author	SHA1	Message	Date
Saúl	5cfad84a98	arch-riscv: correctly set dynamic VLEN for all arith instructions (#1187 ) Some arithmetic instructions of the riscv vector extension were still using the default VLEN=256 instead of the dynamic one through the inherited `vlen` attribute. Most of them only use this to calculate the effective index for the mask element like so: ``` uint32_t ei = i + vtype_VLMAX(vtype, vlen, true) * this->microIdx; if (this->vm || elem_mask(v0, ei)) { ... ``` This means that instructions will wrongly compute the mask index in the second and subsequent micro instructions (`microIdx` > 0). This commit fixes this by adding the corresponding `set_vlen` snippet to the affected instruction formats. Change-Id: Ib041de972d6938490741a9fb4c214a6a5172c34e	2024-06-07 22:33:56 -07:00
Alexander Richardson	ec5881ec4e	arch-arm: avoid using an uninitialized variable in MMU walks (#1198 ) While running a simple Arm32 binary, I noticed that all memory transactions were being marked as NS instead of S once I turn on the MMU (even though the page tables have the NS bit set to zero). The result of this was that semihosting calls were failing since they were using functional accesses with the SECURE flag set, but the caches only contained NS tagged entries so these accesses always read stale values from DRAM. Digging through the Arm MMU code it appears that the NS bit lookup was being keyed off the `secureLookup` flag which is only used for long descriptors. I believe `0c28712f51` should have used isSecure instead of secureLookup. To avoid using these uninitialized values in the future I wrapped the LPAE state in a std::optional to ensure that it is only accessed once initialized. Change-Id: Ibc406ed3f4cfa768f470e34a5eca3c1a2bf45cd8	2024-06-07 08:59:28 +01:00
Alexander Richardson	8e5fbcbbbb	arch-generic: flush streams after semihosting write calls (#1202 ) The SYS_WRITEC and SYS_WRITE0 calls are specified as writing to the debug channel, so it is a reasonable expectation for these messages to be visible immediately after the semihosting call. Change-Id: I8e6e9a7aab593a59e82ecb9cf4603c18c7a8acbe	2024-06-06 09:57:36 +01:00
Yu-Cheng Chang	5d3f1c3316	arch-riscv: Add rvZext to BranchTarget (#1173 ) Ensure the upper xlen bits are all zeros Change-Id: Id81330eced907d21320bc1af85ad38fb6e95f6b1	2024-06-03 10:03:51 -07:00
Matthew Poremba	00dcd5b0bc	arch-vega: Implement literals for 64b dest operands This feature has been available since Vega10 but was never implemented. MI300 adds a few new instructions that make use of this more often (e.g., v_mov_b64). Change-Id: Ieeb7834462b76d77c0030f49622d0de09f90c9e4	2024-05-31 13:41:46 -07:00
Matthew Poremba	6c8caf83c6	arch-vega: Implement V_ACCVGPR_MOV_B32 instruction This instruction is a simple move from accumulation register to accumulation register. It is essentially a move with the accumulation offset added to the register index. Change-Id: Ic93ae72599b75c91213f56ebafe5bbd7b2867089	2024-05-31 09:32:35 -07:00
Matthew Poremba	7cdb69bf21	arch-vega: Fill in scratch insts to match flat/global Flat, scratch, and global share the same instruction implementation with different address calculations essentially. These instructions were already implemented but not added to the decoder. This commit adds the remaining scratch instructions which have a shared instruction implementation. Change-Id: I8f2e9ceb221294dce1b81c45745b642f0592d985	2024-05-31 09:32:34 -07:00
Bobby R. Bruce	a0de33110b	arch-vega: Fix clang comp error due to constant exp (#1183 ) The lines `constexpr int B_I = std::ceil(64.0f / (N * M / H));` caused the following compilation error in clang Version 16: ``` error: constexpr variable 'B_I' must be initialized by a constant expression ``` `std::ceil` is not a const expression. Therefore instances of this expression in instructions.hh have been replaced with a constant expression friendly alternative. This is causing our compiler tests to fail: https://github.com/gem5/gem5/actions/runs/9288296434/job/25559409142 Change-Id: I74da1dab08b335c59bdddef6581746a94107f370	2024-05-30 09:44:34 -07:00
Bobby R. Bruce	b161172f65	arch-arm: Fix memory attributes of table walks (#1180 ) This PR is doing the following: 1) Fixing memory attributes of partial translation entries (table walks) 2) Properly setting the cacheability of table walks	2024-05-29 08:07:44 -07:00
Nicholas Mosier	9027d5c3e2	arch-x86: set AF=0 when logical instructions execute (#1171 ) Fix #1168. Prevent logical instructions like AND, OR, and TEST from having input dependencies on the previous value of the Zaps register (ZF+AF+PF+SF) by having them set AF=0, rather than not modifying AF.	2024-05-29 08:04:44 -07:00
Nicholas Mosier	a54d3198a8	arch-x86: break 32/64-bit mov's input dependency on prior dest value (#1172 ) Fix #1169. Break the input dependency of 32-bit and 64-bit 'mov' micro-ops on the prior value in the destination register. Such a dependency is required for 8-bit and 16-bit moves, as they do not completely overwrite the value in the destination register. However, it is unnecessary for 32-bit moves (which implicitly zero the upper 32 bits) and 64-bit moves. This patch implements the fix by adding a new code template field inside the generated constructors of X86StaticInst's, called `invalidate_srcs`, which instruction implementations like `mov` can use to conditionally invalidate particular source registers as needed. In `mov`'s case, this is when the data size is 32 or 64 bits. Change-Id: Ib2aef6be6da08752640ea3414b90efb7965be924	2024-05-29 07:54:03 -07:00
Giacomo Travaglini	c4ed23a10b	arch-arm: Implement HCR_EL2 force broadcast for EL1&0 TLBIs (#1175 ) According to the Arm architecture reference manual, it is possible to force the broadcast of the following TLBIs: AArch64: TLBI VMALLE1, TLBI VAE1, TLBI ASIDE1, TLBI VAAE1, TLBI VALE1, TLBI VAALE1, IC IALLU, TLBI RVAE1, TLBI RVAAE1, TLBI RVALE1, and TLBI RVAALE1. AArch32: BPIALL, TLBIALL, TLBIMVA, TLBIASID, DTLBIALL, DTLBIMVA, DTLBIASID, ITLBIALL, ITLBIMVA, ITLBIASID, TLBIMVAA, ICIALLU, TLBIMVAL, and TLBIMVAAL. Via the HCR_EL2.FB bit Change-Id: Ib11aa05cd202fadfbd9221db7a2043051196ecbd Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-05-29 11:54:24 +01:00
Giacomo Travaglini	e9dcb906b4	arch-arm: Set memory attributes for partial table entries Change-Id: I80adcead410f226c323e4d781adb1ff17a386986 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-05-29 09:30:58 +01:00
Giacomo Travaglini	09f0c20be2	arch-arm: Use HCR_EL2.CD for stage2 table walks When determining the cacheability of table walks, SCTLR.C should only be used in stage1 EL1&0 translations. Stage2 translations should rely on HCR_EL2.CD instead Change-Id: I1b0830bc3fb5086f68d7a7a1560c7fed5d126d28 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-05-29 09:30:58 +01:00
Giacomo Travaglini	854662f48f	arch-arm: Check OSH domain as well for cacheability attribute Make table walks uncacheable if marked as uncacheable in either inner or outer shareable domain Change-Id: I5898a3b91b5b919e0beda6c6fe896394e3ab94df Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-05-29 09:30:58 +01:00
Ivana Mitrovic	5ec1acaf5f	arch-arm: TLBIs targeting EL2 regime are executable from S state (#1176 ) Those AArch64 instructions/registers were labelled as executable from EL3 only if SCR_EL3.NS == 1. This is not valid anymore after the introduction of FEAT_SEL2	2024-05-28 10:54:18 -07:00
Matthew Poremba	1dfaa224ff	arch-vega: Fix GCC 13 build errors (#1162 ) The new static analysis in GCC 13 finds issues with operand.hh. This commit fixes the error so that gem5 compiles when BUILD_GPU is true. Change-Id: I6f4b0d350f0cabb6e356de20a46e1ca65fd0da55	2024-05-28 07:58:28 -07:00
Giacomo Travaglini	27c7647fee	arch-arm: Use monWrite a shorter version Change-Id: I8da8a39238eb100315d3df496f55a6bf3da948c6 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-05-28 11:20:52 +01:00
Giacomo Travaglini	6995a99d77	arch-arm: TLBIs targeting EL2 regime are executable from S state Those AArch64 instructions/registers were labelled as executable from EL3 only if SCR_EL3.NS == 1. This is not valid anymore after the introduction of FEAT_SEL2 Change-Id: Ie7b56f3fe779c3a99d4f0ef937c7c8ec0530b00e Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-05-28 11:20:32 +01:00
Giacomo Travaglini	10dbfb8bb7	arch-arm: Rewrite performTlbi to use map instead of switch (#1166 ) This is making it easier for TLBI instructions to share code. Common code (under the form of tlbi* functions) are closely matching the instruction description in the Arm pseudocode Change-Id: If10c22fb4a7df2bcd0335e9761286ad3c458722b Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-05-28 11:03:07 +01:00
Yu-Cheng Chang	4f6fdbf8bf	arch-riscv: Fix c.jalr and c.jr instruction (#1163 ) Bit 0 of the register should be 0 for the jump address. Incorrect handling of the jump address may cause an infinite run or a segmentation fault. gem5 issue: https://github.com/gem5/gem5/issues/981	2024-05-25 20:18:42 -07:00
Matthew Poremba	1616d34003	arch-vega: Template MFMA instructions (#1128 ) templated - v_mfma_f64_16x16x4f64 added support for - v_mfma_f32_32x32x2f32 - v_mfma_f32_4x4x1_16b_f32 - v_mfma_f32_16x16x4f32 [formula for gprs needed](https://github.com/ROCm/amd_matrix_instruction_calculator) [formulas for register layouts and lanes used in computation](https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf) Change-Id: I15d6c0a5865d58323ae8dbcb3f6dcb701a9ab3c7	2024-05-22 08:53:25 -07:00
Robert Hauser	688f8fb03b	arch-riscv: add exception code to DPRINTFS msg (#1153 ) Change-Id: Ib5d1dc991f18256ec634c604c776629ea31317a9	2024-05-21 09:59:25 -07:00
Yu-Cheng Chang	5e20438c1c	arch-riscv: Fix GDB connection failed after #1099 (#1152 ) GDB connection failed after the PR[1] changed the index of CSR_FCSR to MISCREG_FCSR itself. This causes an out-of-bounds error. [1]: https://github.com/gem5/gem5/pull/1099 gem5 issue: https://github.com/gem5/gem5/issues/1151 Change-Id: I402febe5a3a9addf3d4821ad716ade14e227d5d7	2024-05-21 09:58:15 -07:00
Giacomo Travaglini	6f4ba0b422	arch-arm: Add missing outer-shareable TLBIs to the list (#1147 ) Those were not part of the performTlbi switch and simulation was therefore panicking when they were encountered Change-Id: Ifbe0b89e45539df4abc147ac5970b0caf0d9dfdc Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-05-20 19:24:45 -07:00
Chong-Teng Wang	13924336b1	arch-riscv: Fix viota instruction (#1137 ) This commit fixes and refactors the implementation of viota. It also overrides the generateDisassembly function in viota's macro/micro to correctly print out the instruction when tracing/debugging. For example, it changes from: viota_m vd, vd, vs2, v0.t to: viota_m vd, vs2, v0.t	2024-05-20 12:19:22 -07:00
Matthew Poremba	82318e85af	arch-x86: Improve KVM set XCR (#1138 ) This adds two failsafes which may cause a panic on some machines. First, check the host machine has the KVM XCR capability before calling getXCRs or setXCRs. Second, ensure the x87 bit, which must always be one, will always return at least one by modifying the return value in readMiscReg. Change-Id: I5e778acc926a47443ef6cef29fabd84eb69bb9ba	2024-05-20 10:22:48 -07:00
Matthew Poremba	b91c9be102	arch-vega: Load/stores commonly used with 16b MFMA This implements some missing loads and stores that are commonly used in applications with MFMA instructions to load 16-bit data types into specific register locations: DS_READ_U16_D16, DS_READ_U16_D16_HI, BUFFER_LOAD_SHORT_D16, BUFFER_LOAD_SHORT_D16_HI. Change-Id: Ie22d81ef010328f4541553a9a674764dc16a9f4d	2024-05-20 09:29:46 -05:00
Matthew Poremba	a4f0d9e6be	arch-vega: Implement v_mfma_f32_32x32x8_bf16 Implement a bfloat16 MFMA. This was tested with PyTorch using dtype=torch.bfloat16. Change-Id: I35b4e60e71477553a93020ef0ee31d1bcae9ca5d	2024-05-20 09:28:58 -05:00
Matthew Poremba	10f8fdcd14	arch-vega: Unit test for MXFP types Add a unit test for the MXFP types (bf16, fp16, fp8, bf8). These types are not currently operated on directly. Instead they are cast to float values and then arithmetic is performed. As a result, the unit test simply checks that when we convert a value from MXFP type to float and back that the values of the MXFP type match. Exact values are used to avoid discrepancies with rounding. Can be run using scons build/VEGA_X86/unittests.opt . Change-Id: I596e9368eb929d239dd2d917e3abd7927b15b71e	2024-05-20 09:28:58 -05:00
Matthew Poremba	de11daec5f	arch-vega: Implement F32 <-> F16 conversions These instructions are used in some of the F16 MFMA example applications to convert to/from floating point types. Change-Id: I7426ea663ce11a39fe8c60c8006d8cca11cfaf07	2024-05-20 09:28:58 -05:00
Matthew Poremba	a062229ac3	arch-vega: Implement v_mov_b64 This instruction is new in MI300 and is used in some of the example applications used to test MFMAs. Change-Id: I739f8ab2be6a93ee3b6bdc4120d0117724edb0d4	2024-05-20 09:27:12 -05:00
Matthew Poremba	91955ae879	arch-vega: Decodings for all MFMA/SMFMACs up to MI300 This adds the decodings for all of the matrix fused multiply add (MFMA) and sparse matrix fused multiply accumulate (SMFMAC) instructions up to and including MI300. This does not yet provide the implementation for these instructions, however it is easier and less tedious to add them in bulk rather than one at a time. Change-Id: I5acd23ca8a26bdec843bead545d1f8820ad95b41	2024-05-20 09:27:12 -05:00
Matthew Poremba	ce578c8831	arch-vega: MFMA templates for MXFP and INT8 types The microscaling formats (MXFP) and INT8 types require additional size checks which are not needed for the current MFMA template. The size check is done using a constexpr method exclusive to the MXFP type, therefore create a special class for MXFP types. This is preferable to attempting to shoehorn into the existing template as it helps with readability. Similarly, INT8 requires a size check to determine number of elements per VGPR, but it is not an MXFP type. Create a special template for that as well. This additionally implements all of the MFMA types which have test cases in the amd-lab-notes repository (https://github.com/amd/amd-lab-notes/). The implementations were tested using the applications in the matrix-cores subfolder and achieve L2 norms equivalent or better than MI200 hardware. Change-Id: Ia5ae89387149928905e7bcd25302ed3d1df6af38	2024-05-20 09:27:12 -05:00
Matthew Poremba	994c5ad1cc	arch-vega: Add PackedReg helper class This class can be used to load multiple operand dwords into an array and then select bits from the span of that array. It handles cases where the bits span two dwords (e.g., you have four dwords for a 128-bit value and want to select bits 35:30) and cases where multiple values < 32-bits are packed into a single dword (e.g., two bf16 values). This is most useful for packed arrays and instructions which have more than two dwords. Beyond two dwords, the operator[] overload of VectorOperand is not available requiring additional logic to select from an operand. This helper class handles that additional logic itself. Change-Id: I74856d0f312f7549b3b6c405ab71eb2b174c70ac	2024-05-20 09:27:12 -05:00
Matthew Poremba	2bb62a05e1	arch-vega: Implement v_cvt_pk_fp8_f32 This instruction serves as a test for the MXFP8 type. Change-Id: I2ce30bf7f3a3ecc850a445aebdf971c37c39a79e	2024-05-20 09:27:12 -05:00
Matthew Poremba	d420a0a1e7	arch-vega: Add OCP microscaling formats The open compute project (OCP) microscaling formats (MX) are used in the GPU model. The specification is available at [1]. This implements a C++ version of MXFP formats with many constraints that conform to the specification. Actual arithmetic is not performed directly on the MXFP types. They are rather converted to fp32 and the computation is performed. For most of these types this is acceptable for the GPU model as there are no instructions which directly perform arithmetic on them. For example, the DOT/MFMA instructions operating may first convert to FP32 and then perform arithmetic. Change-Id: I7235722627f7f66c291792b5dbf9e3ea2f67883e	2024-05-20 09:27:12 -05:00
Marco Kurzynski	d5a734c252	arch-vega: Template MFMA instructions templated - v_mfma_f64_16x16x4f64 added support for - v_mfma_f32_32x32x2f32 - v_mfma_f32_4x4x1_16b_f32 - v_mfma_f32_16x16x4f32 [formula for gprs needed](https://github.com/ROCm/amd_matrix_instruction_calculator) [formulas for register layouts and lanes used in computation](https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf) Change-Id: I15d6c0a5865d58323ae8dbcb3f6dcb701a9ab3c7	2024-05-20 09:27:12 -05:00
Matthew Poremba	2b3beb92ff	dev-amdgpu,gpu-compute,configs: MI300X (#1141 ) Release of MI300X simulation capability: - Implements the required MI300X features over MI200 (currently only architecture flat scratch). - Make the gpu-compute model use MI200 features when MI300X / gfx942 is configured. - Fix up the scratch_ instructions which seem to be preferred in debug hipcc builds over buffer_. - Add mi300.py config similar to mi200.py. This config can optionally use resources instead of command line args.	2024-05-17 09:26:04 -07:00
Alexander Richardson	716fe6d31d	arch-arm: Fix 32-bit semihosting ABI (#1142 ) It appears we have been trying to read 64-bit arguments for ARM32 since `695583709b`. I noticed that SYS_OPEN was trying to read a really long string as the pathname argument and it turned out it was reading from the wrong stack offset. With this change I can successfully run some of the semihosting tests for ARM32. Change-Id: Ie154052dac4211993fb6c4c99d93990123c2eacf	2024-05-16 10:28:45 -07:00
Alexander Richardson	6b34765d5d	arch-generic: Avoid out-of-memory errors for bad semihosting calls (#1143 ) In BaseSemihosting::readString() we were using the len argument to allocate a std::vector without checking whether the value makes any sense. This resulted in a std::bad_alloc exception being raised prior to https://github.com/gem5/gem5/pull/1142 for my semihosting tests. This commit prevents semihosting from reading more than 64K for string arguments which should be more than sufficient for any valid code. Change-Id: I059669016ee2c5721fedb914595d0494f6cfd4cd	2024-05-16 10:28:10 -07:00
Chong-Teng Wang	adb177dab6	arch-riscv: Fix vrgather instruction (#1134 ) This commit fixes the implementation of vrgather instruction based on rvv 1.0. In section 16.4. Vector Register Gather Instructions, > Vector-scalar and vector-immediate forms of the register gather are also provided. These read one element from the source vector at the given index, and write this value to the active elements of the destination vector register. The index value in the scalar register and the immediate, zero-extended to XLEN bits, are treated as unsigned integers. If XLEN > SEW, the index value is not truncated to SEW bits. The fix zero-extends the index value in the scalar register and the immediate.	2024-05-16 10:12:35 -07:00
Matthew Poremba	c1803eafac	arch-vega: Architected flat scratch and scratch insts Architected flat scratch is added in MI300 which stores the scratch base address in dedicated registers rather than in SGPRs. These registers are used by scratch_ instructions. These are flat instructions which explicitly target the private memory aperture. These instructions have a different address calculation than global_ instructions. This change implements architected flat scratch support, fixes the address calculation of scratch_ instructions, and implements decodings for some scratch_ instructions. Previous flat_ instructions which happen to access the private memory aperture have no change in address calculation. Since scratch_ instructions are identical to flat_ instructions except for address calculation, the decodings simply reuse existing flat_ instruction definitions. Change-Id: I1e1d15a2fbcc7a4a678157c35608f4f22b359e21	2024-05-16 09:23:03 -07:00
Chong-Teng Wang	d48191d608	arch-riscv: Add RVV FP16 support (Zvfh & Zvfhmin) (#1123 ) Add support for the following two extensions: [Zvfh](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#185-zvfh-vector-extension-for-half-precision-floating-point): Vector Extension for Half-Precision Floating-Point [Zvfhmin](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#184-zvfhmin-vector-extension-for-minimal-half-precision-floating-point): Vector Extension for Minimal Half-Precision Floating-Point For instructions (`vfncvt[.rtz].x[u].f.w`) and (`vfwcvt.f.x[u].v`) which will become defined when `SEW = 8`, a new template `VectorFloatWideningAndNarrowingCvtDecodeBlock` is added and 8-bit floating point type (`float8_t`) is defined. The data type `float8_t` is introduced in the newer `3e` version of the SoftFloat Package, however, the current version in use is `3d` which does not include this definition. Despite this, `float8_t` is utilized solely for constructing the `vfncvt[.rtz].x[u].f.w` and `vfwcvt.f.x[u].v` instructions when `SEW = 8`. There are no operations that directly manipulate data of the `float8_t` type.	2024-05-16 08:37:00 -07:00
Ivana Mitrovic	10b24dc9a4	arch-arm: Implement FEAT_MPAM in CPU (#1082 ) This PR implements FEAT_MPAM on the CPU side. We define the MPAM system registers and a mechanism for tagging memory requests with the MPAM information bundle as specified in existing documentation [1]. What this PR does not cover is the MPAM implementation in an MSC (Memory System Component). This means that at the moment it's only possible to have static partitioning schemes (via the PartitioningPolicies already part of gem5) and there is currently no way to dynamically program partitions at runtime. [1]: https://developer.arm.com/documentation/ddi0487/latest/	2024-05-13 08:56:23 -07:00
Ivana Mitrovic	53245fa0e8	arch-riscv: Fix CSR instruction behavior 2nd attempts (#1099 ) Quote from change[1] > The RISC-V spec clarifies the CSR instruction operation, some of them shall not read or write CSR by the hints of RD/RS1/uimm, but the original version use the 'data != oldData' condition to determine whether write or not, and always read CSR first. See CSR instruction in spec: Section 9.1 Page 56 of https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf |||Register operand||| |--- |--- |--- |--- |--- | |Instruction|rd is x0|rs1 is x0|Reads CSR|Writes CSR| |CSRRW|Yes|-|No|Yes| |CSRRW|No|-|Yes|Yes| |CSRRS/CSRRC|-|Yes|Yes|No| |CSRRS/CSRRC|-|No|Yes|Yes| |||Immediate operand||| |Instruction|rd is x0|uimm = 0|Reads CSR|Writes CSR| |CSRRWI|Yes|-|No|Yes| |CSRRWI|No|-|Yes|Yes| |CSRRSI/CSRRCI|-|Yes|Yes|No| |CSRRSI/CSRRCI|-|No|Yes|Yes| This issue caused Ubuntu to hang because we share the same status CSR between `mstatus`, `sstatus` and `ustatus`, and the interrupt-enabling CSR between mip, sip and uip. We may need to read the original CSR without the effect of the unmask bits to avoid overriding the bits of other CSRs. Ubuntu now works with this patch merged. [1] https://gem5-review.googlesource.com/c/public/gem5/+/67717	2024-05-10 10:21:48 -07:00
Matthew Poremba	e3c2a322a1	arch-vega: Fix SDWA dst select (#1120 ) The destination select should take a value of the selection size (dword, word, or byte) starting at bit 0, move that to the selected destination, and then apply the unused constraint (DST_U) to the remaining word or bytes. Currently the code is selecting the word/byte currently being iterated over, rather than the least significant word/byte. As a result, any selection that is not word 0 or byte 0 will be replaced with the original destination value at those bits. This results in the wrong value. This commit changes the orig bits to be the original dest value at the lowest word / byte location. Tested with the mfma_i32_16x16x16i8 example which uses an SDWA V_OR_B32 to pack i8 values into VGPRs for the MFMA. Change-Id: I54ed819479a25fa9276d29a8f14f0fea7fd71afe	2024-05-10 08:49:13 -07:00
Chong-Teng Wang	8c4d5f8e27	arch-riscv: Fix narrowing/widening type-convert instructions (#1079 ) Correct ei calculation under VectorFloatWideningCvtFormat and VectorFloatNarrowingCvtFormat. Change-Id: I08699ffe3b9f8a7d4543023437626cc054344053	2024-05-09 10:17:15 -07:00
Roger Chang	c1713a0b18	arch-riscv: Fix CSR instruction behavior 2nd attempts Change-Id: Id0a9a374281445c7821863f0f74564857d3d8fa2	2024-05-07 20:32:56 +08:00
Roger Chang	1a81144985	arch-riscv: Move FCSR implementation to isa.cc Change-Id: I132edfe2c0ae4caecaa9e6209249662895b5c608	2024-05-07 20:32:56 +08:00
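
Commit a0de33110b above replaces `std::ceil` in a `constexpr` initializer with a "constant expression friendly alternative" but does not show it. The following is a minimal sketch of one such alternative, assuming positive integer operands; `ceilDiv` and `Blocks` are hypothetical names for illustration and are not taken from gem5's instructions.hh.

```cpp
// Hypothetical helper (not the gem5 code): ceiling of a / b for positive
// integers, using integer arithmetic only so it is a valid constant
// expression on any conforming compiler.
constexpr int
ceilDiv(int a, int b)
{
    return (a + b - 1) / b;
}

// Illustrative stand-in for a templated instruction class.
template <int M, int N, int H>
struct Blocks
{
    // Mirrors std::ceil(64.0f / (N * M / H)). The inner N * M / H is an
    // integer division, as in the expression quoted in the commit message,
    // and is assumed to be greater than zero.
    static constexpr int B_I = ceilDiv(64, N * M / H);
};

static_assert(ceilDiv(64, 48) == 2, "64 / 48 rounds up to 2");
static_assert(Blocks<16, 16, 4>::B_I == 1, "16 * 16 / 4 = 64, so one block");
static_assert(Blocks<4, 4, 1>::B_I == 4, "4 * 4 / 1 = 16, so four blocks");
```

Because the `static_assert`s are evaluated at compile time, the sketch fails to build if the helper ever stops being usable in a constant expression.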


5955 Commits
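
The read/write table quoted in commit 53245fa0e8 can be expressed as a small decision function. This is only a sketch of the table as quoted, not the gem5 implementation; `CsrOp`, `CsrAccess` and `csrAccess` are hypothetical names introduced for illustration.

```cpp
#include <cstdint>

// Hypothetical grouping of the six CSR instructions by operation:
// Write = CSRRW/CSRRWI, Set = CSRRS/CSRRSI, Clear = CSRRC/CSRRCI.
enum class CsrOp { Write, Set, Clear };

// Whether the instruction reads and/or writes the CSR.
struct CsrAccess
{
    bool reads;
    bool writes;
};

// rd:  destination register index (0 means x0).
// src: rs1 register index for the register forms, or the uimm value for
//      the immediate forms; the table treats "rs1 is x0" and "uimm = 0"
//      the same way.
constexpr CsrAccess
csrAccess(CsrOp op, uint32_t rd, uint32_t src)
{
    return op == CsrOp::Write
        ? CsrAccess{rd != 0, true}    // CSRRW[I]: always writes, reads unless rd is x0
        : CsrAccess{true, src != 0};  // CSRRS/C[I]: always reads, writes unless src is 0
}
```

For example, `csrAccess(CsrOp::Set, 5, 0)` describes `csrrs x5, <csr>, x0`, which reads the CSR without writing it.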