derek/gem5

Author	SHA1	Message	Date
Matthew Poremba	b91c9be102	arch-vega: Load/stores commonly used with 16b MFMA This implements some missing loads and stores that are commonly used in applications with MFMA instructions to load 16-bit data types into specific register locations: DS_READ_U16_D16, DS_READ_U16_D16_HI, BUFFER_LOAD_SHORT_D16, BUFFER_LOAD_SHORT_D16_HI. Change-Id: Ie22d81ef010328f4541553a9a674764dc16a9f4d	2024-05-20 09:29:46 -05:00
Matthew Poremba	a4f0d9e6be	arch-vega: Implement v_mfma_f32_32x32x8_bf16 Implement a bfloat16 MFMA. This was tested with PyTorch using dtype=torch.bfloat16. Change-Id: I35b4e60e71477553a93020ef0ee31d1bcae9ca5d	2024-05-20 09:28:58 -05:00
Matthew Poremba	10f8fdcd14	arch-vega: Unit test for MXFP types Add a unit test for the MXFP types (bf16, fp16, fp8, bf8). These types are not currently operated on directly. Instead they are cast to float values and then arithmetic is performed. As a result, the unit test simply checks that when we convert a value from an MXFP type to float and back, the values of the MXFP type match. Exact values are used to avoid discrepancies with rounding. Can be run using scons build/VEGA_X86/unittests.opt . Change-Id: I596e9368eb929d239dd2d917e3abd7927b15b71e	2024-05-20 09:28:58 -05:00
Matthew Poremba	de11daec5f	arch-vega: Implement F32 <-> F16 conversions These instructions are used in some of the F16 MFMA example applications to convert to/from floating point types. Change-Id: I7426ea663ce11a39fe8c60c8006d8cca11cfaf07	2024-05-20 09:28:58 -05:00
Matthew Poremba	a062229ac3	arch-vega: Implement v_mov_b64 This instruction is new in MI300 and is used in some of the example applications used to test MFMAs. Change-Id: I739f8ab2be6a93ee3b6bdc4120d0117724edb0d4	2024-05-20 09:27:12 -05:00
Matthew Poremba	91955ae879	arch-vega: Decodings for all MFMA/SMFMACs up to MI300 This adds the decodings for all of the matrix fused multiply add (MFMA) and sparse matrix fused multiply accumulate (SMFMAC) instructions up to and including MI300. This does not yet provide the implementation for these instructions, however it is easier and less tedious to add them in bulk rather than one at a time. Change-Id: I5acd23ca8a26bdec843bead545d1f8820ad95b41	2024-05-20 09:27:12 -05:00
Matthew Poremba	ce578c8831	arch-vega: MFMA templates for MXFP and INT8 types The microscaling formats (MXFP) and INT8 types require additional size checks which are not needed for the current MFMA template. The size check is done using a constexpr method exclusive to the MXFP type, therefore create a special class for MXFP types. This is preferable to attempting to shoehorn into the existing template as it helps with readability. Similarly, INT8 requires a size check to determine the number of elements per VGPR, but it is not an MXFP type. Create a special template for that as well. This additionally implements all of the MFMA types which have test cases in the amd-lab-notes repository (https://github.com/amd/amd-lab-notes/). The implementations were tested using the applications in the matrix-cores subfolder and achieve L2 norms equivalent or better than MI200 hardware. Change-Id: Ia5ae89387149928905e7bcd25302ed3d1df6af38	2024-05-20 09:27:12 -05:00
Matthew Poremba	994c5ad1cc	arch-vega: Add PackedReg helper class This class can be used to load multiple operand dwords into an array and then select bits from the span of that array. It handles cases where the bits span two dwords (e.g., you have four dwords for a 128-bit value and want to select bits 35:30) and cases where multiple values < 32-bits are packed into a single dword (e.g., two bf16 values). This is most useful for packed arrays and instructions which have more than two dwords. Beyond two dwords, the operator[] overload of VectorOperand is not available requiring additional logic to select from an operand. This helper class handles that additional logic itself. Change-Id: I74856d0f312f7549b3b6c405ab71eb2b174c70ac	2024-05-20 09:27:12 -05:00
Matthew Poremba	2bb62a05e1	arch-vega: Implement v_cvt_pk_fp8_f32 This instruction serves as a test for the MXFP8 type. Change-Id: I2ce30bf7f3a3ecc850a445aebdf971c37c39a79e	2024-05-20 09:27:12 -05:00
Matthew Poremba	d420a0a1e7	arch-vega: Add OCP microscaling formats The open compute project (OCP) microscaling formats (MX) are used in the GPU model. The specification is available at [1]. This implements a C++ version of MXFP formats with many constraints that conform to the specification. Actual arithmetic is not performed directly on the MXFP types. They are instead converted to fp32 and the computation is performed on the fp32 values. For most of these types this is acceptable for the GPU model as there are no instructions which directly perform arithmetic on them. For example, the DOT/MFMA instructions may first convert to FP32 and then perform arithmetic. Change-Id: I7235722627f7f66c291792b5dbf9e3ea2f67883e	2024-05-20 09:27:12 -05:00
Marco Kurzynski	d5a734c252	arch-vega: Template MFMA instructions templated - v_mfma_f64_16x16x4f64 added support for - v_mfma_f32_32x32x2f32 - v_mfma_f32_4x4x1_16b_f32 - v_mfma_f32_16x16x4f32 [formula for gprs needed](https://github.com/ROCm/amd_matrix_instruction_calculator) [formulas for register layouts and lanes used in computation](https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf) Change-Id: I15d6c0a5865d58323ae8dbcb3f6dcb701a9ab3c7	2024-05-20 09:27:12 -05:00
Matthew Poremba	2b3beb92ff	dev-amdgpu,gpu-compute,configs: MI300X (#1141 ) Release of MI300X simulation capability: - Implements the required MI300X features over MI200 (currently only architected flat scratch). - Make the gpu-compute model use MI200 features when MI300X / gfx942 is configured. - Fix up the scratch_ instructions which seem to be preferred in debug hipcc builds over buffer_. - Add mi300.py config similar to mi200.py. This config can optionally use resources instead of command line args.	2024-05-17 09:26:04 -07:00
Alexander Richardson	716fe6d31d	arch-arm: Fix 32-bit semihosting ABI (#1142 ) It appears we have been trying to read 64-bit arguments for ARM32 since `695583709b`. I noticed that SYS_OPEN was trying to read a really long string as the pathname argument and it turned out it was reading from the wrong stack offset. With this change I can successfully run some of the semihosting tests for ARM32. Change-Id: Ie154052dac4211993fb6c4c99d93990123c2eacf	2024-05-16 10:28:45 -07:00
Alexander Richardson	6b34765d5d	arch-generic: Avoid out-of-memory errors for bad semihosting calls (#1143 ) In BaseSemihosting::readString() we were using the len argument to allocate a std::vector without checking whether the value makes any sense. This resulted in a std::bad_alloc exception being raised prior to https://github.com/gem5/gem5/pull/1142 for my semihosting tests. This commit prevents semihosting from reading more than 64K for string arguments which should be more than sufficient for any valid code. Change-Id: I059669016ee2c5721fedb914595d0494f6cfd4cd	2024-05-16 10:28:10 -07:00
Chong-Teng Wang	adb177dab6	arch-riscv: Fix vrgather instruction (#1134 ) This commit fixes the implementation of vrgather instruction based on rvv 1.0. In section 16.4. Vector Register Gather Instructions, > Vector-scalar and vector-immediate forms of the register gather are also provided. These read one element from the source vector at the given index, and write this value to the active elements of the destination vector register. The index value in the scalar register and the immediate, zero-extended to XLEN bits, are treated as unsigned integers. If XLEN > SEW, the index value is not truncated to SEW bits. The fix zero-extends the index value in the scalar register and the immediate.	2024-05-16 10:12:35 -07:00
Hossam ElAtali	97a87a7c84	util: Fixed gem5img.py script (#990 ) Made the script more robust to different names. Co-authored-by: Hossam ElAtali <hossam.elatali@uwaterloo.ca>	2024-05-16 10:09:27 -07:00
Yu-Cheng Chang	321bd07163	cpu: Don't change to suspend if the thread status is halted (#1039 ) In the gem5 model, a thread context has one of four statuses: Active, Suspend, Halting and Halted `5641c5e464/src/cpu/thread_context.hh (L99-L117)` When the gem5 instance is initialized, all of the thread contexts are set to Halted. The status of a thread context does not become Active until the Workload initializes at startup, except for the StubWorkload. So if the user uses the StubWorkload and the CPU is connected to the model_reset port, the thread context of the CPU can be activated unintentionally. The following steps activate the thread context of the CPU without Workload[1] initialization or lowering the model_reset port[2]. 1. Raise the model_reset port (Change the state from Halted to Suspend) `5641c5e464/src/cpu/base.cc (L671-L673)` 2. Post the interrupt to CPU (Change the state from Suspend to Active) `5641c5e464/src/cpu/base.cc (L231-L239)` Implementation of wakeup: SimpleCPU: `5641c5e464/src/cpu/simple/base.cc (L251-L259)` MinorCPU: `5641c5e464/src/cpu/minor/cpu.cc (L143-L151)` O3CPU: `5641c5e464/src/cpu/o3/cpu.cc (L1337-L1346)` This CL fixes the issue that occurs when the model_reset port of a CPU is raised (letting the CPU sleep) while the CPU has not been activated by a workload. If the CPU status is Halted, it should not change to Suspend, to avoid a later wakeup. Reference: The model_reset was introduced in the CL: https://gem5-review.googlesource.com/c/public/gem5/+/67574/4 [1] Activate by workload (ARM example): `5641c5e464/src/arch/arm/fs_workload.cc (L101-L114)` [2] Lower the model_reset: `5641c5e464/src/cpu/base.cc (L191-L192)` `5641c5e464/src/cpu/base.cc (L674-L685)` Change-Id: I5bfc0b7491d14369fff77b98b71c0ac763fb7c42	2024-05-16 10:02:53 -07:00
Matthew Poremba	6164835230	configs: GPUFS: MI300X Add a config capable of simulating MI300X ISA (gfx942). This is similar to the mi200.py config and uses the same scripts followed by some tuneable parameters. This config optionally lets the user call the runMI300GPU function with gem5 resources. This allows for something like the following before a VIPER stdlib python is available: ``` import mi300 from gem5.resources.resource import obtain_resource disk = obtain_resource("x86-gpu-fs-img") kernel = obtain_resource("x86-linux-kernel-5.4.0-105-generic") app = obtain_resource("square-gpu-test") mi300.runMI300GPUFS("X86KvmCPU", disk, kernel, app) ``` Tested cold boot config, checkpoint create and restore, and using gem5 resources. Change-Id: I50a13d7a3d207786b779bf7fd47a5645256b1e6a	2024-05-16 09:23:03 -07:00
Matthew Poremba	c1803eafac	arch-vega: Architected flat scratch and scratch insts Architected flat scratch is added in MI300, which stores the scratch base address in dedicated registers rather than in SGPRs. These registers are used by scratch_ instructions. These are flat instructions which explicitly target the private memory aperture. These instructions have a different address calculation than global_ instructions. This change implements architected flat scratch support, fixes the address calculation of scratch_ instructions, and implements decodings for some scratch_ instructions. Previous flat_ instructions which happen to access the private memory aperture have no change in address calculation. Since scratch_ instructions are identical to flat_ instructions except for address calculation, the decodings simply reuse existing flat_ instruction definitions. Change-Id: I1e1d15a2fbcc7a4a678157c35608f4f22b359e21	2024-05-16 09:23:03 -07:00
Chong-Teng Wang	d48191d608	arch-riscv: Add RVV FP16 support (Zvfh & Zvfhmin) (#1123 ) Add support for the following two extensions: [Zvfh](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#185-zvfh-vector-extension-for-half-precision-floating-point): Vector Extension for Half-Precision Floating-Point [Zvfhmin](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#184-zvfhmin-vector-extension-for-minimal-half-precision-floating-point): Vector Extension for Minimal Half-Precision Floating-Point For instructions (`vfncvt[.rtz].x[u].f.w`) and (`vfwcvt.f.x[u].v`) which will become defined when `SEW = 8`, a new template `VectorFloatWideningAndNarrowingCvtDecodeBlock` is added and 8-bit floating point type (`float8_t`) is defined. The data type `float8_t` is introduced in the newer `3e` version of the SoftFloat Package, however, the current version in use is `3d` which does not include this definition. Despite this, `float8_t` is utilized solely for constructing the `vfncvt[.rtz].x[u].f.w` and `vfwcvt.f.x[u].v` instructions when `SEW = 8`. There are no operations that directly manipulate data of the `float8_t` type.	2024-05-16 08:37:00 -07:00
Matthew Poremba	8be5ce6fc9	dev-amdgpu,configs,gpu-compute: Add gfx942 version This is the version for MI300. For the most part, it is the same as MI200 with the exception of architected flat scratch (not yet implemented in gem5) and therefore a new version enum is required. Change-Id: Id18cd7b57c4eebd467c010a3f61e3117beb8d58a	2024-05-15 12:08:41 -07:00
Harshil Patel	65976e4c6d	util: Add GNU non executable line to x86 m5 (#1116 ) - Adding this line because not specifying a GNU non-executable stack was throwing warnings when building m5 for Ubuntu 24.04 Change-Id: I620c508be4090804698391cff671ba5091b053d7	2024-05-14 11:06:13 -07:00
Lukas Zenick	b279e40cb7	configs: nvm sweep fix (#1114 ) These changes to sweep and sweep_hybrid for NVM allow them to run. I'm not an expert on this, so I'm not sure if these are technically correct, but they no longer fail when running `build/X86/gem5.opt configs/nvm/sweep.py` and `build/X86/gem5.opt configs/nvm/sweep_hybrid.py` GitHub Issue: #669	2024-05-13 14:51:39 -07:00
Zhantong Qiu	6b427a84f7	stdlib: change default exit event for SIMPOINT_BEGIN (#1085 ) The SIMPOINT_BEGIN exit event should do nothing by default since it might be used in various cases. On the [mailing list](https://www.mail-archive.com/gem5-users@gem5.org/msg22383.html), a user discovered a bug with the current `simpoints-se-restore.py` example. The bug is caused by the default behavior of the SIMPOINT_BEGIN exit event. When taking a checkpoint with `simpoints-se-checkpoint.py`, it stores the future exit event scheduled at the beginning of the simulation. I did not notice this when I wrote and tested the example script due to the long printed log and my custom handler of the SIMPOINT_BEGIN exit event. During the restore, the SIMPOINT_BEGIN exit event was triggered right before the region end, so it reset the stats before the final stats dump. Therefore, the simulation time is 0, as the user discovered. This patch should fix this bug. Change-Id: I800dfbd28d7b2c842864a1ab7d84b8f8e17b9b3c	2024-05-13 14:11:00 -07:00
Ivana Mitrovic	10b24dc9a4	arch-arm: Implement FEAT_MPAM in CPU (#1082 ) This PR implements FEAT_MPAM on the CPU side. We define the MPAM system registers and a mechanism for tagging memory requests with the MPAM information bundle as specified in existing documentation [1]. What this PR does not cover is the MPAM implementation in an MSC (Memory System Component). This means that at the moment it is only possible to have static partitioning schemes (via the PartitioningPolicies already part of gem5) and there is currently no way to dynamically program partitions at runtime. [1]: https://developer.arm.com/documentation/ddi0487/latest/	2024-05-13 08:56:23 -07:00
Ivana Mitrovic	53245fa0e8	arch-riscv: Fix CSR instruction behavior 2nd attempts (#1099 ) Quote from change[1] > The RISC-V spec clarifies the CSR instruction operation: some of them shall not read or write the CSR depending on the hints of RD/RS1/uimm, but the original version uses the 'data != oldData' condition to determine whether to write or not, and always reads the CSR first. See CSR instruction in spec: Section 9.1 Page 56 of https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf \|\|\|Register operand\|\|\| \|--- \|--- \|--- \|--- \|--- \| \|Instruction\|rd is x0\|rs1 is x0\|Reads CSR\|Writes CSR\| \|CSRRW\|Yes\|-\|No\|Yes\| \|CSRRW\|No\|-\|Yes\|Yes\| \|CSRRS/CSRRC\|-\|Yes\|Yes\|No\| \|CSRRS/CSRRC\|-\|No\|Yes\|Yes\| \|\|\|Immediate operand\|\|\| \|Instruction\|rd is x0\|uimm = 0\|Reads CSR\|Writes CSR\| \|CSRRWI\|Yes\|-\|No\|Yes\| \|CSRRWI\|No\|-\|Yes\|Yes\| \|CSRRSI/CSRRCI\|-\|Yes\|Yes\|No\| \|CSRRSI/CSRRCI\|-\|No\|Yes\|Yes\| The issue caused Ubuntu to hang because we share the same status CSR between `mstatus`, `sstatus` and `ustatus`, and the same interrupt enabling CSR between mip, sip and uip. We may need to read the original CSR without the effect of the unmask bits to avoid overriding the bits of other CSRs. Ubuntu now works with the patch merged. [1] https://gem5-review.googlesource.com/c/public/gem5/+/67717	2024-05-10 10:21:48 -07:00
Matthew Poremba	e3c2a322a1	arch-vega: Fix SDWA dst select (#1120 ) The destination select should take a value of the selection size (dword, word, or byte) starting at bit 0, move that to the selected destination, and then apply the unused constraint (DST_U) to the remaining word or bytes. Currently the code is selecting the word/byte currently being iterated over, rather than the least significant word/byte. As a result, any selection that is not word 0 or byte 0 will be replaced with the original destination value at those bits. This results in the wrong value. This commit changes the orig bits to be the original dest value at the lowest word / byte location. Tested with the mfma_i32_16x16x16i8 example which uses an SDWA V_OR_B32 to pack i8 values into VGPRs for the MFMA. Change-Id: I54ed819479a25fa9276d29a8f14f0fea7fd71afe	2024-05-10 08:49:13 -07:00
Chong-Teng Wang	8c4d5f8e27	arch-riscv: Fix narrowing/widening type-convert instructions (#1079 ) Correct ei calculation under VectorFloatWideningCvtFormat and VectorFloatNarrowingCvtFormat. Change-Id: I08699ffe3b9f8a7d4543023437626cc054344053	2024-05-09 10:17:15 -07:00
Harshil Patel	5c82447653	misc: Add resource versions to examples (#1110 ) - Explicitly defining resource version in obtain resource calls in examples. Change-Id: I74ab5d2f5e9bc73a0145585a0fe75f2ec905472f	2024-05-09 10:16:27 -07:00
Matthew Poremba	e4ebe29f43	util: Bump gpu-fs docker to ROCm 6.1 (#1097 ) This version matches the disk image on gem5-resources. Change-Id: I69a45ef290f0fdf2167ead4d67d4d789d30e0e91	2024-05-09 10:11:54 -07:00
Ivana Mitrovic	233135da81	mem-ruby: Fix NullPointerException in RubyRequest (#1118 ) This PR includes a check for `m_pkt` being null and appropriately handles that case. This issue was causing the Daily tests to fail. Change-Id: I87142ca14ca4ab3d8306153a1cf34c2629a119ba	2024-05-09 08:46:13 -07:00
Giacomo Travaglini	0df5635bdf	mem-ruby: Implement NS bit for CHI transactions (#1100 ) This patch is adding the NS bit to CHI requests to make sure they are properly tagged according to their security Change-Id: I33d3610edefbb5a05a6090e9125c35d4fb8bca58 Reviewed-by: Tiago Muck <tiago.muck@arm.com> Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-05-08 07:46:50 +02:00
Ivana Mitrovic	bc0f388316	util: Update gem5-resource-manager requirements (#1115 ) Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.3 to 3.1.4. Bumps [werkzeug](https://github.com/pallets/werkzeug) from 2.3.8 to 3.0.3. Change-Id: I88e97c3c546c8dcfaa8c310a537def850177f0b9	2024-05-07 17:33:51 -07:00
Ivana Mitrovic	06ab3f9b18	misc: Update version in optional-requirements (#1109 ) Bumps [tqdm](https://github.com/tqdm/tqdm) from 4.66.2 to 4.66.3.	2024-05-07 17:33:30 -07:00
Roger Chang	c1713a0b18	arch-riscv: Fix CSR instruction behavior 2nd attempts Change-Id: Id0a9a374281445c7821863f0f74564857d3d8fa2	2024-05-07 20:32:56 +08:00
Roger Chang	1a81144985	arch-riscv: Move FCSR implementation to isa.cc Change-Id: I132edfe2c0ae4caecaa9e6209249662895b5c608	2024-05-07 20:32:56 +08:00
Matthew Poremba	6ed446e546	arch-x86: Add XCR0 register and add to X86KvmCPU (#1040 ) The extended control registers were not being updated in the KVM thread context nor updated in the KVM state. This was causing issues when checkpointing since the XCR0 value was reverting to the default value rather than what it was before the checkpoint. This was causing multiple applications to crash due to executing instructions which are now illegal instructions due to XCR0 being incorrect. This commit adds the XCR0 as a misc register similar to the existing x86 control registers and adds all of the helper functions to access and set the register value. It also adds support for updating the KVM CPU's state with the register value and updating the thread context's misc reg value so that it is checkpointed along with the other misc regs. Note that this does not add support for XSAVE of the AVX state (i.e., the upper 128 bits of YMM registers). It does however fix the immediate problem in issue #958 . Change-Id: I97456c8b57cbc7b381bd4be94944ce6567a43c76	2024-05-06 09:58:07 -07:00
Matthew Poremba	cb47755e15	gpu: Consolidated fixes for v24.0 (#1103 ) Includes fixes for several bugs reported via email, self found, and internal reports. Also includes runs through Valgrind and UBsan. See individual commits for more details.	2024-05-06 07:35:57 -07:00
Matthew Poremba	0d3d456894	gpu-compute: Invalidate Scalar cache when SQC invalidates (#1093 ) The scalar cache is not being invalidated which causes stale data to be left in the scalar cache between GPU kernels. This commit sends invalidates to the scalar cache when the SQC is invalidated. This is a sufficient baseline for simulation. Since the number of invalidates might be larger than the mandatory queue can hold and no flash invalidate mechanism exists in the VIPER protocol, the command line option for the mandatory queue size is removed, which is the same behavior as the SQC. Change-Id: I1723f224711b04caa4c88beccfa8fb73ccf56572	2024-05-06 07:35:38 -07:00
Giacomo Travaglini	36c1ea9c61	mem-ruby: Implement MakeReadUnique in CHI (#1101 ) Change-Id: I64cd3c62804cca184d68287fc099534e9205f2b8 Reviewed-by: Tiago Muck <tiago.muck@arm.com> Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-05-06 08:30:59 +02:00
Giacomo Travaglini	7c9925bafa	arch-generic: Fix reading from special :semihosting-features file (#1089 ) The implementation of SYS_FLEN was missing, which caused picolibc to treat this file as not implemented. Additionally, there was a bug in the SYS_READ call that was comparing the wrong variable against the passed buffer length. It was comparing the current file position against the buffer length instead of the number of written bytes. Finally, pos was uninitialized, which could result in spurious errors. Change-Id: I8b487a79df5970a5001d3fef08d5579bb4aa0dd0	2024-05-06 07:30:13 +01:00
dependabot[bot]	d834e8bf4e	misc: bump mypy from 1.9.0 to 1.10.0 (#1092 ) Bumps [mypy](https://github.com/python/mypy) from 1.9.0 to 1.10.0. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-05-03 21:00:51 -07:00
Matthew Poremba	3490d5bf18	gpu-compute: Add DebugFlag for LDS This prints what values are read/written to LDS and the previous value on write. This is useful for debugging problems with LDS instructions. Change-Id: I30063327bec1a1a808914a018467d5d78d5d58b4	2024-05-03 14:31:17 -07:00
Matthew Poremba	29f63f630b	dev-amdgpu: Correct missing GART warning SDMA ptePde packets are generating a warning that a GART address is missing, causing a wrong address to be clobbered by the operation. This commit fixes this by converting the GART address when the queue is running in privileged mode, which is the only mode allowed to use GART addresses. This removes the warnings and writes to the correct memory region. Change-Id: I64acac308db2431c5996b876bf4cda704f51cf25	2024-05-03 14:31:17 -07:00
Matthew Poremba	8249d6d1cd	arch-vega: Remove FP asserts in VOP3 lane manip insts The VOP3 instruction encoding generally states that ABS/NEG modifiers in the instruction encoding are only valid on floating point data types. This is currently coded in gem5 to mean floating point instructions. For untyped instructions like V_CNDMASK_B32, we don't actually know what the data type is. We must trust that the compiler did not attempt to apply these bits to non-FP data types. This commit simply removes the asserts. The ABS/NEG modifiers are therefore ignored, which is consistent with the ISA documentation. This is done on the lane manipulation instructions V_CNDMASK_B32, V_READLANE_B32, and V_WRITELANE_B32, which are typically used to mask off or move data between registers. Other bitwise instructions (e.g., V_OR_B32) keep the asserts as bitwise operations on FP types are generally illegal in languages like C++. Change-Id: I478c5272ba96383a063b2828de21d60948b25c8f	2024-05-03 14:31:17 -07:00
Matthew Poremba	2703fb5699	gpu-compute: Fix valgrind memleak complaints Fixes several memory leaks, mostly of small and medium severity. Fixes mismatched new/new[] and delete/delete[] calls. Change-Id: Iedafc409389bd94e45f330bc587d6d72d1971219	2024-05-03 14:29:31 -07:00
Matthew Poremba	386fb3d1cc	configs: Fix HSA packet processor address The address has one too many zeros and is therefore placed in a memory region usually used for system memory. As a result, this causes a failure when trying to run a simulation with a huge amount of memory. Change the address to be within the C000'0000h - FFFF'FFFFh X86 I/O hole as was intended. Change-Id: I5d03ac19ea3b2c01a8c431073c12fa1868b3df24	2024-05-03 14:29:30 -07:00
Matthew Poremba	0faa9510f9	arch-vega,gpu-compute: Fix misc ubsan runtime errors Three main fixes: - Remove the initDynOperandInfo. UBSAN errors and exits due to things not being captured properly. After a few failed attempts playing with the capture list, just move the lambda to a new method. - Invalid data type size for some thread mask instructions. This might actually have caused silent bugs when the thread id was > 31. - Alignment issues with the operands. Change-Id: I0297e10df0f0ab9730b6f1bd132602cd36b5e7ac	2024-05-03 14:26:46 -07:00
Harshil Patel	1164f9b81e	tests: update resource to use new checkpoint - Updated the id of the simpoint-se-checkpoint resource. Change-Id: Iab0b10da87b9790c24407e0edce7a18c38e0f48a	2024-05-03 10:55:04 -07:00
Yu-Cheng Chang	3a2a917a53	arch-riscv: Fix VCSR read behavior (#1076 ) The VCSR should read the value with VXSAT and VXRM. Table 40. vcsr layout: bits XLEN-1:3 are reserved; bits 2:1 hold vxrm[1:0], the fixed-point rounding mode; bit 0 holds vxsat, the fixed-point accrued saturation flag. Change-Id: I1227b920da78026951dfa548e41c8cc56da6caac	2024-05-03 09:53:43 -07:00
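
The read/write rules quoted in commit 53245fa0e8 (arch-riscv: Fix CSR instruction behavior) can be sketched as a small lookup. This is only an illustration of the table in that commit message, not gem5's implementation; the names `CsrAccess` and `csr_access` are hypothetical and do not appear in the gem5 source.

```python
from typing import NamedTuple


class CsrAccess(NamedTuple):
    """Whether a CSR instruction reads and/or writes the CSR (hypothetical type)."""
    reads: bool
    writes: bool


def csr_access(op: str, rd_is_x0: bool, src_is_zero: bool) -> CsrAccess:
    """Apply the rules from the table quoted in the commit message.

    op          -- one of CSRRW, CSRRS, CSRRC or their immediate forms
    rd_is_x0    -- True when the destination register is x0
    src_is_zero -- True when rs1 is x0 (register form) or uimm is 0 (immediate form)
    """
    base = op.upper()
    if base.endswith("I"):
        # The immediate forms follow the same rules as the register forms.
        base = base[:-1]
    if base == "CSRRW":
        # Always writes; the read is skipped only when rd is x0.
        return CsrAccess(reads=not rd_is_x0, writes=True)
    if base in ("CSRRS", "CSRRC"):
        # Always reads; the write is skipped when the source is x0 / zero.
        return CsrAccess(reads=True, writes=not src_is_zero)
    raise ValueError(f"not a CSR instruction: {op}")
```

For example, `csr_access("CSRRS", rd_is_x0=False, src_is_zero=True)` reports a read without a write, which is the case the original `data != oldData` check could not distinguish from a write of an unchanged value.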

21610 Commits