derek/gem5

Author	SHA1	Message	Date
Giacomo Travaglini	27c7647fee	arch-arm: Use monWrite a shorter version Change-Id: I8da8a39238eb100315d3df496f55a6bf3da948c6 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-05-28 11:20:52 +01:00
Giacomo Travaglini	6995a99d77	arch-arm: TLBIs targeting EL2 regime are executable from S state Those AArch64 instructions/registers were labelled as executable from EL3 only if SCR_EL3.NS == 1. This is not valid anymore after the introduction of FEAT_SEL2 Change-Id: Ie7b56f3fe779c3a99d4f0ef937c7c8ec0530b00e Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-05-28 11:20:32 +01:00
Giacomo Travaglini	10dbfb8bb7	arch-arm: Rewrite performTlbi to use map instead of switch (#1166) This makes it easier for TLBI instructions to share code. Common code (in the form of tlbi* functions) closely matches the instruction description in the Arm pseudocode. Change-Id: If10c22fb4a7df2bcd0335e9761286ad3c458722b Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-05-28 11:03:07 +01:00
Bobby R. Bruce	8f0ed46061	stdlib: Move `_m5.stats.processDumpQueue` to call-once This commit addresses Jason's comment (https://github.com/gem5/gem5/pull/996#discussion_r1613870880), which highlighted that putting the `_m5.stats.processDumpQueue` call in the iteration through the `root` object in `get_simstat` caused this function to be potentially called many times when it only needs to be called once. This change moves the call to just before the iteration, so it will be called only once (if required) per `get_simstat` execution. Change-Id: I16908b6dee063a0df7877a19e215883963bfb081	2024-05-27 08:35:21 -07:00
Yu-Cheng Chang	4f6fdbf8bf	arch-riscv: Fix c.jalr and c.jr instruction (#1163) Bit 0 of the register value should be 0 for the jump address. Handling the jump address incorrectly may cause an infinite run or a segmentation fault. gem5 issue: https://github.com/gem5/gem5/issues/981	2024-05-25 20:18:42 -07:00
Lukas Zenick	96fbc2068a	util, ext: Fix building TLM (#1105 ) Fixed the issue that did not allow building TLM. Build commands: ```bash scons build/ARM/gem5.opt scons setconfig build/ARM USE_SYSTEMC=n scons --with-cxx-config --without-python --without-tcmalloc build/ARM/libgem5_opt.so cd util/tlm scons ``` Following this README, I tested it successfully with the simple examples: https://gem5.googlesource.com/public/gem5/+/master/util/tlm/README GitHub Issue: #591 Change-Id: If07fae2eb20ad62627e733573f61bc42d594f970 --------- Co-authored-by: Ivana Mitrovic <ivanamit91@gmail.com>	2024-05-24 13:29:58 -07:00
Bobby R. Bruce	0f6bd24c95	stdlib: Fix get_simstat to accept lists of SimObjects Change-Id: Iae12a0ac88f9646acb00e73d70f83b1e2ff94ac9	2024-05-23 14:54:59 -07:00
Bobby R. Bruce	c509615ec9	tests: Pretty print Dict when comparing for PyStats Change-Id: I1d93453072d12aa2dd40066f364723de1225b4e0	2024-05-23 14:54:59 -07:00
Bobby R. Bruce	45b26ce465	stdlib: Specialize scalar tests; use 'pystat', not 'simstat' 1. The tests here for the Scalar tasks are named appropriately, not just generic "SimStats tests". 2. We remove 'simstat' terminology. The correct word is "Pystats". Change-Id: Idebc4e750f4be7f140ad6bff9c6772f580a24861	2024-05-23 14:54:59 -07:00
Bobby R. Bruce	c0a1fa33fe	stdlib: Improve PyStat support for SimObject Vectors Change-Id: Iba0c93ffa5c4b18acf75af82965c63a8881df189	2024-05-23 14:54:59 -07:00
Bobby R. Bruce	178679cbfd	stdlib: Add SparseHist to PyStats This is inclusive of tests to ensure it has been implemented correctly. Change-Id: I5c84d5ffdb7b914936cfd86ca012a7b141eeaf42	2024-05-23 14:54:59 -07:00
Bobby R. Bruce	b5e8804cd4	stdlib: Remove 'Vector' group subclass This was not used and easily confused with the other 'Vector' in PyStats. Change-Id: I9294bb0ae04db0537c87a5f50ce023fc83d587b8	2024-05-23 14:54:59 -07:00
Bobby R. Bruce	6ae3692057	stdlib: Add Vector2d to PyStats Change-Id: Icb2f691abf88ef4bac8d277e421329edb000209b	2024-05-23 14:54:59 -07:00
Bobby R. Bruce	a3af819d82	stdlib: Remove PyStats Accumulator This appears to have no equivalent type in the CPP stats and was never utilized in PyStats. Change-Id: Ia9afc83b4159eb1ab2c6f44ec0ad86cd73f2a4f8	2024-05-23 14:54:59 -07:00
Bobby R. Bruce	940e1d2063	stdlib: Fix PyStats Distribution to be vector of Scalars As Distribution inherits from Vector, it should be constructed with a Dictionary of scalars (in our implementation, a dictionary mapping the vector position's unique id for each bin and the value of that bin). Change-Id: Ie603c248e5db4b6dd7f71cc453eebd78793f69a3	2024-05-23 14:54:59 -07:00
Bobby R. Bruce	252dbe9c72	stdlib: Add tests for PyStats's Vector and fix bugs The big thing missing from the Vector stats was that each position in the vector could have its own unique id (a str, float, or int) and each position in the vector can have its own description. Therefore, to add this, the Vector is represented as a dictionary mapping the unique ID to a Pystat Scalar (which can have its own unique description). Change-Id: I3a8634f43298f6491300cf5a4f9d25dee8101808	2024-05-23 14:54:59 -07:00
Bobby R. Bruce	3c86175d08	stdlib: Rename BaseScalarVector -> Vector This isn't a true Base class, it's just a Vector. In gem5 all Vectors are Scalar Vectors. This change simplifies the naming. Change-Id: Ib8881d854ab18de6acbf0fb200db2de6a43621a7	2024-05-23 14:54:58 -07:00
Matthew Poremba	1616d34003	arch-vega: Template MFMA instructions (#1128 ) templated - v_mfma_f64_16x16x4f64 added support for - v_mfma_f32_32x32x2f32 - v_mfma_f32_4x4x1_16b_f32 - v_mfma_f32_16x16x4f32 [formula for gprs needed](https://github.com/ROCm/amd_matrix_instruction_calculator) [formulas for register layouts and lanes used in computation](https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf) Change-Id: I15d6c0a5865d58323ae8dbcb3f6dcb701a9ab3c7	2024-05-22 08:53:25 -07:00
Ivana Mitrovic	1a68d71f07	util: Update gem5-resource-manager requirements (#1154 ) Bumps [requests](https://github.com/psf/requests) from 2.31.0 to 2.32.0. Change-Id: I34df01fdd32cb300c4efc8cf072c0aa1137371bc	2024-05-22 07:32:52 -07:00
Bobby R. Bruce	0b2243bb0a	misc: Sync stable .github dir with develop (#1155 )	2024-05-21 11:56:17 -07:00
Bobby R. Bruce	52fbc8ebcf	misc: Revert Dramsys Ubuntu to 22.04 to compile in gcc <13 (#1146 ) Until https://github.com/gem5/gem5/issues/1121 is fixed, this change will ensure our Weekly tests pass.	2024-05-21 10:57:16 -07:00
Bobby R. Bruce	6adb7a8637	misc: Remove gcc 8 support, gem5 supports GCC >= v10 (#1145) note: Due to #556 / #555, we don't support GCC 9. This PR removes gcc-8 which means gem5 would support GCC >= version 10. The reason for removing gcc-8: 1. We already dropped support for gcc-9. I don't see any good reason to support anything <9 as a result. 2. gcc-8 is relatively old, and we're probably supporting a bit too many compiler versions anyway. In Ubuntu 22.04, gcc-11 is downloaded by default with `apt`. It doesn't seem many systems are still using gcc-8. 3. There is a weird compiler bug in gcc-8 which causes a failure when compiling gem5 since the inclusion of #1123. The error received is as follows: ```sh In file included from src/arch/riscv/tlb.hh:42, from src/arch/riscv/mmu.hh:45, from build/ALL/arch/riscv/generated/exec-g.cc.inc:14, from build/ALL/arch/riscv/generated/generic_cpu_exec.cc:5: src/arch/riscv/utility.hh: In instantiation of ‘FloatType gem5::RiscvISA::ftype(IntType) [with FloatType = float8_t; IntType = unsigned char]’: build/ALL/arch/riscv/generated/exec-ns.cc.inc:38839:42: required from ‘gem5::Fault gem5::RiscvISAInst::Vfwcvt_xu_f_vMicro<ElemType>::execute(gem5::ExecContext, gem5::trace::InstRecord) const [with ElemType = float8_t; gem5::Fault = std::shared_ptr<gem5::FaultBase>]’ build/ALL/arch/riscv/generated/exec-ns.cc.inc:38856:16: required from here src/arch/riscv/utility.hh:327:15: error: parameter ‘a’ set but not used [-Werror=unused-but-set-parameter] ftype(IntType a) -> FloatType ~~~~~~~~^ src/arch/riscv/utility.hh: In instantiation of ‘IntType gem5::RiscvISA::f_to_wui(FloatType, uint_fast8_t) [with FloatType = float8_t; IntType = short unsigned int; uint_fast8_t = unsigned char]’: build/ALL/arch/riscv/generated/exec-ns.cc.inc:38838:49: required from ‘gem5::Fault gem5::RiscvISAInst::Vfwcvt_xu_f_vMicro<ElemType>::execute(gem5::ExecContext, gem5::trace::InstRecord) const [with ElemType = float8_t; gem5::Fault = std::shared_ptr<gem5::FaultBase>]’ build/ALL/arch/riscv/generated/exec-ns.cc.inc:38856:16: required from here src/arch/riscv/utility.hh:570:20: error: parameter ‘a’ set but not used [-Werror=unused-but-set-parameter] f_to_wui(FloatType a, uint_fast8_t mode) ``` Note: This is currently causing our SST Daily tests to fail, and our compiler tests to fail.	2024-05-21 10:56:41 -07:00
Harshil Patel	33cebe9376	dev: add reset wrap mode to mouse.cc (#1149) This change fixes #1148. I have only added an acknowledged return, as we don't have remote and wrap mode, so it can only be in stream mode. Change-Id: I1882042d873ff0e9465c9491238554c8fbb9aa76	2024-05-21 10:55:03 -07:00
Robert Hauser	688f8fb03b	arch-riscv: add exception code to DPRINTFS msg (#1153 ) Change-Id: Ib5d1dc991f18256ec634c604c776629ea31317a9	2024-05-21 09:59:25 -07:00
Yu-Cheng Chang	5e20438c1c	arch-riscv: Fix GDB connection failed after #1099 (#1152) GDB connection failed after the PR[1] changed the index of CSR_FCSR to MISCREG_FCSR itself. This causes an out-of-bounds error. [1]: https://github.com/gem5/gem5/pull/1099 gem5 issue: https://github.com/gem5/gem5/issues/1151 Change-Id: I402febe5a3a9addf3d4821ad716ade14e227d5d7	2024-05-21 09:58:15 -07:00
Harshil Patel	0824d7f2cd	Revert "cpu-kvm: Support perf counters on hybrid host architectures" (#1127 ) Reverts gem5/gem5#1065 Reverting this change because this PR breaks X86 kvm as mentioned in the issue #1126.	2024-05-21 08:14:10 -07:00
Giacomo Travaglini	6f4ba0b422	arch-arm: Add missing outer-shareable TLBIs to the list (#1147 ) Those were not part of the performTlbi switch and simulation was therefore panicking when they were encountered Change-Id: Ifbe0b89e45539df4abc147ac5970b0caf0d9dfdc Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-05-20 19:24:45 -07:00
Chong-Teng Wang	13924336b1	arch-riscv: Fix viota instruction (#1137) This commit fixes and refactors the implementation of viota. It also overrides the generateDisassembly function in viota's macro/micro to correctly print out the instruction when tracing/debugging. For example, it changes from: viota_m vd, vd, vs2, v0.t to: viota_m vd, vs2, v0.t	2024-05-20 12:19:22 -07:00
Matthew Poremba	82318e85af	arch-x86: Improve KVM set XCR (#1138) This adds two failsafes for cases which may cause a panic on some machines. First, check the host machine has the KVM XCR capability before calling getXCRs or setXCRs. Second, ensure the x87 bit, which must always be one, will always return at least one by modifying the return value in readMiscReg. Change-Id: I5e778acc926a47443ef6cef29fabd84eb69bb9ba	2024-05-20 10:22:48 -07:00
Matthew Poremba	b91c9be102	arch-vega: Load/stores commonly used with 16b MFMA This implements some missing loads and stores that are commonly used in applications with MFMA instructions to load 16-bit data types into specific register locations: DS_READ_U16_D16, DS_READ_U16_D16_HI, BUFFER_LOAD_SHORT_D16, BUFFER_LOAD_SHORT_D16_HI. Change-Id: Ie22d81ef010328f4541553a9a674764dc16a9f4d	2024-05-20 09:29:46 -05:00
Matthew Poremba	a4f0d9e6be	arch-vega: Implement v_mfma_f32_32x32x8_bf16 Implement a bfloat16 MFMA. This was tested with PyTorch using dtype=torch.bfloat16. Change-Id: I35b4e60e71477553a93020ef0ee31d1bcae9ca5d	2024-05-20 09:28:58 -05:00
Matthew Poremba	10f8fdcd14	arch-vega: Unit test for MXFP types Add a unit test for the MXFP types (bf16, fp16, fp8, bf8). These types are not currently operated on directly. Instead they are cast to float values and then arithmetic is performed. As a result, the unit test simply checks that when we convert a value from MXFP type to float and back that the values of the MXFP type match. Exact values are used to avoid discrepancies with rounding. Can be run using scons build/VEGA_X86/unittests.opt . Change-Id: I596e9368eb929d239dd2d917e3abd7927b15b71e	2024-05-20 09:28:58 -05:00
Matthew Poremba	de11daec5f	arch-vega: Implement F32 <-> F16 conversions These instructions are used in some of the F16 MFMA example applications to convert to/from floating point types. Change-Id: I7426ea663ce11a39fe8c60c8006d8cca11cfaf07	2024-05-20 09:28:58 -05:00
Matthew Poremba	a062229ac3	arch-vega: Implement v_mov_b64 This instruction is new in MI300 and is used in some of the example applications used to test MFMAs. Change-Id: I739f8ab2be6a93ee3b6bdc4120d0117724edb0d4	2024-05-20 09:27:12 -05:00
Matthew Poremba	91955ae879	arch-vega: Decodings for all MFMA/SMFMACs up to MI300 This adds the decodings for all of the matrix fused multiply add (MFMA) and sparse matrix fused multiply accumulate (SMFMAC) instructions up to and including MI300. This does not yet provide the implementation for these instructions, however it is easier and less tedious to add them in bulk rather than one at a time. Change-Id: I5acd23ca8a26bdec843bead545d1f8820ad95b41	2024-05-20 09:27:12 -05:00
Matthew Poremba	ce578c8831	arch-vega: MFMA templates for MXFP and INT8 types The microscaling formats (MXFP) and INT8 types require additional size checks which are not needed for the current MFMA template. The size check is done using a constexpr method exclusive to the MXFP type, therefore create a special class for MXFP types. This is preferable to attempting to shoehorn into the existing template as it helps with readability. Similarly, INT8 requires a size check to determine the number of elements per VGPR, but it is not an MXFP type. Create a special template for that as well. This additionally implements all of the MFMA types which have test cases in the amd-lab-notes repository (https://github.com/amd/amd-lab-notes/). The implementations were tested using the applications in the matrix-cores subfolder and achieve L2 norms equivalent or better than MI200 hardware. Change-Id: Ia5ae89387149928905e7bcd25302ed3d1df6af38	2024-05-20 09:27:12 -05:00
Matthew Poremba	994c5ad1cc	arch-vega: Add PackedReg helper class This class can be used to load multiple operand dwords into an array and then select bits from the span of that array. It handles cases where the bits span two dwords (e.g., you have four dwords for a 128-bit value and want to select bits 35:30) and cases where multiple values < 32-bits are packed into a single dword (e.g., two bf16 values). This is most useful for packed arrays and instructions which have more than two dwords. Beyond two dwords, the operator[] overload of VectorOperand is not available requiring additional logic to select from an operand. This helper class handles that additional logic itself. Change-Id: I74856d0f312f7549b3b6c405ab71eb2b174c70ac	2024-05-20 09:27:12 -05:00
Matthew Poremba	2bb62a05e1	arch-vega: Implement v_cvt_pk_fp8_f32 This instruction serves as a test for the MXFP8 type. Change-Id: I2ce30bf7f3a3ecc850a445aebdf971c37c39a79e	2024-05-20 09:27:12 -05:00
Matthew Poremba	d420a0a1e7	arch-vega: Add OCP microscaling formats The Open Compute Project (OCP) microscaling formats (MX) are used in the GPU model. The specification is available at [1]. This implements a C++ version of MXFP formats with many constraints that conform to the specification. Actual arithmetic is not performed directly on the MXFP types. Instead, they are converted to fp32 and the computation is performed on the converted values. For most of these types this is acceptable for the GPU model as there are no instructions which directly perform arithmetic on them. For example, the DOT/MFMA instructions may first convert to FP32 and then perform arithmetic. Change-Id: I7235722627f7f66c291792b5dbf9e3ea2f67883e	2024-05-20 09:27:12 -05:00
Marco Kurzynski	d5a734c252	arch-vega: Template MFMA instructions templated - v_mfma_f64_16x16x4f64 added support for - v_mfma_f32_32x32x2f32 - v_mfma_f32_4x4x1_16b_f32 - v_mfma_f32_16x16x4f32 [formula for gprs needed](https://github.com/ROCm/amd_matrix_instruction_calculator) [formulas for register layouts and lanes used in computation](https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf) Change-Id: I15d6c0a5865d58323ae8dbcb3f6dcb701a9ab3c7	2024-05-20 09:27:12 -05:00
Bobby R. Bruce	8b30d848e9	scons: Setup scons for gem5 only supporting gcc >=10 Change-Id: I66f83498a38def3d00d1c9e981aa90706ee20bbb	2024-05-20 07:05:08 -07:00
Bobby R. Bruce	ba1c22f143	misc,tests: Remove gcc-8 from compiler tests GCC Version 8 is no longer supported by the gem5 project. Change-Id: If657654299c1a018764d5f92e814ed5cd18c50f0	2024-05-20 06:27:45 -07:00
Bobby R. Bruce	d011fe47a9	util-docker: Upgrade sst-env docker image to use GCC 10 Previously was GCC 9 which is no longer supported by gem5. Change-Id: Ife715446e3f1179d19db544953fbd6ded25f5b4d	2024-05-20 06:24:14 -07:00
Bobby R. Bruce	321c34d0bd	util-docker: Remove GCC-8 from docker-compose.yaml Change-Id: Ia1aba03412b138b05b569b08a146a2123f7142e4	2024-05-20 06:23:28 -07:00
Matthew Poremba	2b3beb92ff	dev-amdgpu,gpu-compute,configs: MI300X (#1141) Release of MI300X simulation capability: - Implements the required MI300X features over MI200 (currently only architecture flat scratch). - Make the gpu-compute model use MI200 features when MI300X / gfx942 is configured. - Fix up the scratch_ instructions which seem to be preferred in debug hipcc builds over buffer_. - Add mi300.py config similar to mi200.py. This config can optionally use resources instead of command line args.	2024-05-17 09:26:04 -07:00
Alexander Richardson	716fe6d31d	arch-arm: Fix 32-bit semihosting ABI (#1142 ) It appears we have been trying to read 64-bit arguments for ARM32 since `695583709b`. I noticed that SYS_OPEN was trying to read a really long string as the pathname argument and it turned out it was reading from the wrong stack offset. With this change I can successfully run some of the semihosting tests for ARM32. Change-Id: Ie154052dac4211993fb6c4c99d93990123c2eacf	2024-05-16 10:28:45 -07:00
Alexander Richardson	6b34765d5d	arch-generic: Avoid out-of-memory errors for bad semihosting calls (#1143 ) In BaseSemihosting::readString() we were using the len argument to allocate a std::vector without checking whether the value makes any sense. This resulted in a std::bad_alloc exception being raised prior to https://github.com/gem5/gem5/pull/1142 for my semihosting tests. This commit prevents semihosting from reading more than 64K for string arguments which should be more than sufficient for any valid code. Change-Id: I059669016ee2c5721fedb914595d0494f6cfd4cd	2024-05-16 10:28:10 -07:00
Chong-Teng Wang	adb177dab6	arch-riscv: Fix vrgather instruction (#1134 ) This commit fixes the implementation of vrgather instruction based on rvv 1.0. In section 16.4. Vector Register Gather Instructions, > Vector-scalar and vector-immediate forms of the register gather are also provided. These read one element from the source vector at the given index, and write this value to the active elements of the destination vector register. The index value in the scalar register and the immediate, zero-extended to XLEN bits, are treated as unsigned integers. If XLEN > SEW, the index value is not truncated to SEW bits. The fix zero-extends the index value in the scalar register and the immediate.	2024-05-16 10:12:35 -07:00
Hossam ElAtali	97a87a7c84	util: Fixed gem5img.py script (#990 ) Made the script more robust to different names. Co-authored-by: Hossam ElAtali <hossam.elatali@uwaterloo.ca>	2024-05-16 10:09:27 -07:00
Yu-Cheng Chang	321bd07163	cpu: Don't change to suspend if the thread status is halted (#1039) In our gem5 model, a thread context can be in one of four states: Active, Suspend, Halting and Halted `5641c5e464/src/cpu/thread_context.hh (L99-L117)` When the gem5 instance is initialized, all of the thread contexts are set to Halted. A thread context does not become active until the Workload initializes it at startup, except with the StubWorkload. So if the user uses the StubWorkload and the CPU is connected to the model_reset port, the thread context of the CPU may be activated. The following steps activate the thread context of the CPU without Workload[1] initialization or lowering the model_reset port[2]. 1. Raise the model_reset port (changes the state from Halted to Suspend) `5641c5e464/src/cpu/base.cc (L671-L673)` 2. Post the interrupt to the CPU (changes the state from Suspend to Active) `5641c5e464/src/cpu/base.cc (L231-L239)` Implementation of wakeup SimpleCPU: `5641c5e464/src/cpu/simple/base.cc (L251-L259)` MinorCPU: `5641c5e464/src/cpu/minor/cpu.cc (L143-L151)` O3CPU: `5641c5e464/src/cpu/o3/cpu.cc (L1337-L1346)` This CL fixes the issue that occurs when the model_reset port is raised to the CPU (letting the CPU sleep) while the CPU has not been activated by the workload. If the CPU status is Halted, it should not change to Suspend, to avoid a wakeup. Reference The model_reset is introduced in the CL: https://gem5-review.googlesource.com/c/public/gem5/+/67574/4 [1] Activate by workload (ARM example): `5641c5e464/src/arch/arm/fs_workload.cc (L101-L114)` [2] Lower the model_reset: `5641c5e464/src/cpu/base.cc (L191-L192)` `5641c5e464/src/cpu/base.cc (L674-L685)` Change-Id: I5bfc0b7491d14369fff77b98b71c0ac763fb7c42	2024-05-16 10:02:53 -07:00
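
The status transitions described in commit 321bd07163 can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyThread`, `Status`, and `raise_reset_then_interrupt` are hypothetical names invented for illustration and are not gem5's `ThreadContext` API, and placing the guard inside `suspend()` is an assumption about where the check lives.

```python
from enum import Enum, auto


class Status(Enum):
    """The four thread-context states named in the commit message."""
    ACTIVE = auto()
    SUSPEND = auto()
    HALTING = auto()
    HALTED = auto()


class ToyThread:
    """Hypothetical stand-in for a thread context (not gem5 code)."""

    def __init__(self):
        # All thread contexts start out Halted at initialization.
        self.status = Status.HALTED

    def activate(self):
        # Normally performed by the workload during startup.
        self.status = Status.ACTIVE

    def suspend(self):
        # Behaviour described by the fix: a Halted thread stays Halted,
        # so a later interrupt cannot wake it up.
        if self.status is Status.HALTED:
            return
        self.status = Status.SUSPEND

    def wakeup(self):
        # Posting an interrupt only wakes a suspended thread.
        if self.status is Status.SUSPEND:
            self.status = Status.ACTIVE


def raise_reset_then_interrupt(thread):
    """Raise model_reset (suspend), then post an interrupt (wakeup)."""
    thread.suspend()
    thread.wakeup()
    return thread.status
```

With the guard in place, a thread that was never activated by a workload remains Halted after the reset-then-interrupt sequence, while a thread that was active is suspended and then woken as before.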

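
The bit selection described for the PackedReg helper class (commit 994c5ad1cc) can be sketched in a few lines. This is not the gem5 implementation: `select_bits` is a hypothetical function, and the dword ordering (element 0 holds bits 31:0) is an assumption made for illustration.

```python
def select_bits(dwords, hi, lo):
    """Return bits hi:lo (inclusive) of the value formed by 32-bit dwords.

    Assumption: dwords[0] holds bits 31:0, dwords[1] holds bits 63:32, etc.
    """
    value = 0
    for i, dword in enumerate(dwords):
        # Mask each element to 32 bits and place it at its bit offset.
        value |= (dword & 0xFFFFFFFF) << (32 * i)
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


# Four dwords forming a 128-bit value; bits 35:30 span dwords 0 and 1.
spanning = select_bits([0xC0000000, 0x0000000F, 0, 0], 35, 30)

# Two 16-bit values packed into a single dword.
upper_half = select_bits([0x3F804000], 31, 16)
lower_half = select_bits([0x3F804000], 15, 0)
```

Combining the dwords into one integer first keeps the spanning case and the packed case on the same code path, which is the convenience the commit message attributes to the helper class.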

21852 Commits