derek/gem5 - gem5 - Gitea: Git with a cup of tea

derek/gem5

Author	SHA1	Message	Date
Matthew Poremba	4d336c0636	arch-vega: Implement buffer_atomic_cmpswap (#439 ) This is a standard compare and swap but implemented on vector memory buffer instructions (i.e., it is the same as FLAT_ATOMIC_CMPSWAP with MUBUF's special address calculation). This was tested using a Tensile kernel, a backend for rocBLAS, which is used by PyTorch and Tensorflow. Prior to this patch both ML frameworks crashed. With this patch they both make forward progress. Change-Id: Ie76447a72d210f81624e01e1fa374e41c2c21e06	2023-10-12 07:33:40 -07:00
Matthew Poremba	4b7f25fcb6	arch-vega: Ignore s_setprio instruction instead of panic This instruction is used by ML frameworks to prioritize certain wavefronts. Since gem5 does not have any support for wavefront scheduling based on priority (besides wavefront age), we ignore this instruction and warn_once rather than calling panic. Since hardware can override this priority anyways, we can be sure that ignoring the value will not inhibit forward progress resulting in application hangs. Change-Id: Ic5eef14f9685dd2b316c5cf76078bb78d5bfe3cc	2023-10-11 15:55:16 -05:00
Matthew Poremba	4b85a1710e	arch-vega: Implement buffer_atomic_cmpswap This is a standard compare and swap but implemented on vector memory buffer instructions (i.e., it is the same as FLAT_ATOMIC_CMPSWAP with MUBUF's special address calculation). This was tested using a Tensile kernel, a backend for rocBLAS, which is used by PyTorch and Tensorflow. Prior to this patch both ML frameworks crashed. With this patch they both make forward progress. Change-Id: Ie76447a72d210f81624e01e1fa374e41c2c21e06	2023-10-11 15:42:50 -05:00
Bobby R. Bruce	70b6b53e54	misc,python: Add `pyupgrade` to pre-commit (#424 ) This adds the [pyupgrade](https://github.com/asottile/pyupgrade) hook to pre-commit. This hook automatically upgrades the syntax to the recommended standards for the newer version of the language.	2023-10-11 09:07:09 -07:00
Matthew Poremba	da11427ba6	gpu-compute: Update tokens for flat global/scratch (#408 ) Memory instructions acquire coalescer tokens in the schedule stage. Currently this is only done for buffer and flat instructions, but not flat global or flat scratch. This change now acquires tokens for flat global and flat scratch instructions. This provides back-pressure to the CUs and helps to avoid deadlocks in Ruby. The change also handles returning tokens for buffer, flat global, and flat scratch instructions. This was previously only being done for normal flat instructions leading to deadlocks in some applications when the tokens were exhausted. To simplify the logic, added a needsToken() method to GPUDynInst which return if the instruction is buffer or any flat segment. The waitcnts were also incorrect for flat global and flat scratch. We should always decrement vmem and exp count for stores and only normal flat instructions should decrement lgkm. Currently vmem/exp are not decremented for flat global and flat scratch which can lead to deadlock. This change set fixes this by always decrementing vmem/exp and lgkm only for normal flat instructions. Change-Id: I673f4ac6121e4b5a5e8491bc9130c6d825d95fc5	2023-10-11 09:00:10 -07:00
Andreas Sandberg	891250192d	arch-arm: Implement FEAT_TCR2 and FEAT_SCTLR2 (#416 ) This is simply adding the new Armv8.9 registers defined in the related features: - FEAT_TCR2 - FEAT_SCTLR2	2023-10-11 10:14:31 +01:00
Bobby R. Bruce	298119e402	misc,python: Run `pre-commit run --all-files` Applies the `pyupgrade` hook to all files in the repo. Change-Id: I9879c634a65c5fcaa9567c63bc5977ff97d5d3bf	2023-10-10 21:47:07 -07:00
Bobby R. Bruce	ddf6cb88e4	misc: Run `pre-commit run --all-files` This is reflect the updates made to black when running `pre-commit autoupdate`. Change-Id: Ifb7fea117f354c7f02f26926a5afdf7d67bc5919	2023-10-10 14:01:58 -07:00
Yu-Cheng Chang	141b06d335	arch,arch-riscv: Remove setRegOperand in VecRegOperand (#341 ) The RISC-V vector instructions still work without setRegOperand. We should fix the register statistic issue by https://github.com/gem5/gem5/pull/360 to avoid duplicate statistic register write count Change-Id: Ib6a52935e00c3e557b366abfcf60450dca05614d	2023-10-10 08:00:10 -07:00
Matthew Poremba	9f4d334644	gpu-compute: Update tokens for flat global/scratch Memory instructions acquire coalescer tokens in the schedule stage. Currently this is only done for buffer and flat instructions, but not flat global or flat scratch. This change now acquires tokens for flat global and flat scratch instructions. This provides back-pressure to the CUs and helps to avoid deadlocks in Ruby. The change also handles returning tokens for buffer, flat global, and flat scratch instructions. This was previously only being done for normal flat instructions leading to deadlocks in some applications when the tokens were exhausted. To simplify the logic, added a needsToken() method to GPUDynInst which return if the instruction is buffer or any flat segment. The waitcnts were also incorrect for flat global and flat scratch. We should always decrement vmem and exp count for stores and only normal flat instructions should decrement lgkm. Currently vmem/exp are not decremented for flat global and flat scratch which can lead to deadlock. This change set fixes this by always decrementing vmem/exp and lgkm only for normal flat instructions. Change-Id: I673f4ac6121e4b5a5e8491bc9130c6d825d95fc5	2023-10-10 09:48:16 -05:00
Giacomo Travaglini	8acf49b6fa	arch-arm: Revamp takeInt to take VHE/SEL2 into account The new implementation matches the table in the ARM Architecture Reference Manual (version DDI 0487J.a, section D1.3.6, table R_SXLWJ) It takes into consideration features like FEAT_SEL2 (scr.eel2 bit) and FEAT_VHE (hcr.e2h bit) which affect the masking of interrupts under certain circumstances Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Change-Id: I07ebd8d859651475bd32fd201eea0f4e64a7dd5f	2023-10-10 09:46:47 +01:00
Giacomo Travaglini	e412ddddbd	arch-arm: Split takeInt into AArch64/32 versions We pay a small duplication cost but we make the code more readable and we enable further modifications to the AArch64 code without forcing the same code on the AArch32 method Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Change-Id: I1efa33cf19f91094fd33bd48b6a0a57d8df8f89f	2023-10-10 09:45:59 +01:00
Bobby R. Bruce	bbe05b0cba	tests,misc: Fix compilation tests failures (#400 ) Exposed in our failing compiler tests: https://github.com/gem5/gem5/actions/runs/6348223508, this PR: * Adds missing overrides to `PCState`'s `set` function. * Removes `std::binary_function` from DramPower (it was deprecated in CPP-11 and officially removed in CPP-17).	2023-10-09 11:20:52 -07:00
Giacomo Travaglini	eac5a8b215	arch-arm: Implement FEAT_TCR2 Change-Id: I0396f5938c09b68fcc3303a6fdda1e4dde290869 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-09 17:19:57 +01:00
Giacomo Travaglini	49cbb24351	arch-arm: Implement FEAT_SCTLR2 Change-Id: Ifb8c8dc1729cc21007842b950273fe38129d9539 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-09 17:12:53 +01:00
Giacomo Travaglini	c4c5d2e172	arch-arm: Implement ID_AA64MMFR3_EL1 register Change-Id: If8c37bdccf35a070870900c06dc4640348f0f063 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-09 17:12:53 +01:00
Andreas Sandberg	ec7921305b	arch-arm: Implement FEAT_TLBIRANGE extension (#414 )	2023-10-09 17:09:31 +01:00
Giacomo Travaglini	39fdfaea5a	arch-arm: Implement FEAT_TLBIRANGE Change-Id: I7eb020573420e49a8a54e1fc7a89eb6e2236dacb Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-09 13:59:47 +01:00
Giacomo Travaglini	6b698630a2	arch-arm: Check VMID in secure mode as well (NS=0) This is still trying to completely remove any artifact which implies virtualization is only supported in non-secure mode (NS=1) Change-Id: I83fed1c33cc745ecdf3c5ad60f4f356f3c58aad5 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-09 13:56:57 +01:00
Giacomo Travaglini	a8efded644	arch-arm: Include Granule Size in a TLB entry This info can be used during TLB invalidation Change-Id: I81247e40b11745f0207178b52c47845ca1b92870 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-10-09 13:56:57 +01:00
Nicholas Mosier	7a0e84d853	cpu-kvm, arch-x86: flush TLB after syscalls Modified the x86 KVM-in-SE syscall handler to flush the TLB following each syscall, in case the page table has been modified. This is done by reloading the value in %cr3. Doing this requires an intermediate GPR, which we store in a new scratch buffer following the syscall code at address `syscallDataBuf`. GitHub issue: https://github.com/gem5/gem5/issues/409 Change-Id: Ibc20018c97ebb1794fa31a0c71e0857d661c7c9d	2023-10-06 20:41:59 +00:00
Giacomo Travaglini	ae104cc431	mem-ruby: Add new feature far atomics in CHI (#177 ) Added a new feature to CHI protocol (in collaboration with @tiagormk). Here is the Jira Ticket [https://gem5.atlassian.net/browse/GEM5-1326](https://gem5.atlassian.net/browse/GEM5-1326 ). As described in CHI specs, far atomic transactions enable remote execution of Atomic Memory Operations. This pull request incorporates several changes: * Fix Arm ISA definition of Swap instructions. These instructions should return an operand, so their ISA definition should be Return Operation. * Enable AMOs in Ruby Mem Test to verify that AMOs work * Enable near and far AMO in the Cache Controler of CHI Three configuration parameters have been used to tune this behavior: * policy_type: sets the atomic policy to one of the described in [our paper](https://dl.acm.org/doi/10.1145/3579371.3589065) * atomic_op_latency: simulates the AMO ALU operation latency * comp_anr: configures the Atomic No return transaction to split CompDBIDResp into two different messages DBIDResp and Comp	2023-10-06 10:09:58 +01:00
Bobby R. Bruce	761f6b73a0	arch-arm: Implement FEAT_FGT (#334 ) This PR implements FEAT_FGT (Fine Grain Traps)	2023-10-05 10:44:26 -07:00
Bobby R. Bruce	39c7e7d1ed	arch: Adding missing `override` to `PCState.set` As highlighed in this failing compiler test: https://github.com/gem5/gem5/actions/runs/6348223508/job/17389057995 Clang was failing when compiling "build/ALL/gem5.opt" due missing overrides in `PCState`'s "set" function. This was observed in Clang-14 and, stangely, Clang-8. Change-Id: I240c1087e8875fd07630e467e7452c62a5d14d5b	2023-10-05 10:18:19 -07:00
Roger Chang	ea3ee880aa	arch-riscv: Implement Zcb instructions Added the following instructions: c.lbu c.lh c.lhu c.sb c.sh c.zext.b c.sext.b c.zext.h c.sext.h c.zext.w c.not c.mul Reference: https://github.com/riscv/riscv-code-size-reduction Change-Id: Ib04820bf5591b365a3bfbbd8b90655a8a1d844cf	2023-10-05 18:46:35 +08:00
Víctor Soria	12dada2dc5	arch-arm: Correct return operand in swap instructions Swap instructions are configured as non returning AMO operations. This is wrong because they return the previous value stored in the target memory position Change-Id: I84d75a571a8eaeaee0dbfac344f7b34c72b47d53	2023-10-04 19:11:01 +02:00
Andreas Sandberg	7806eaad51	arch: Add instruction size and PC set methods (#357 ) Add the instruction size of a static instruction. x86 and arm decoders add now the instruction size to the macro instruction. However, microops are still handled by the fetch stage which is not nice. Furthermore, we add a set method to the PC state. It allows setting a PC state to acertain address. Both methods are required for the decoupled front-end. Change-Id: I311fe3f637e867c42dee7781f5373ea2e69e2072	2023-10-04 10:49:30 +01:00
David Schall	7d2e1ee789	arch: Add instruction size and PC set methods Adds the instruction size to all static instruction. x86, arm and RISC-V decoders add the instruction size to every decoded macro instruction. As microops should reflect the size of the their parent macroop the set method is overwritten to pass the size to all microops. Furthermore, we add a set method to the PC state. It allows setting a PC state to a certain address. Both methods are required for the decoupled front-end. Change-Id: I311fe3f637e867c42dee7781f5373ea2e69e2072 Signed-off-by: David Schall <david.schall@ed.ac.uk>	2023-10-02 20:10:57 +00:00
Hoa Nguyen	da72590c19	arch-riscv: FS bits -> DIRTY for more floating point loads The affected instructions are, - c.flw - c.flwsp - flh - flw This change is related to [1] [2], which also aim to change the FS bits to DIRTY when the state of any floating point register might change. [1] https://gem5-review.googlesource.com/c/public/gem5/+/65272 [2] https://github.com/gem5/gem5/pull/370 Change-Id: I098e1b1812fb352bd5d3614ff5d3547e58903b65 Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2023-10-01 23:12:25 -07:00
Hoa Nguyen	6640447c1e	arch-riscv: Update FS bits when doing floating point loads This problem is similar to the problem described in [1]. This problem produces symptoms as described in [2]. In short, the Linux kernel relies on the CSR_STATUS's FS bits to decide whether to save the floating point registers. If the FS bits are set to DIRTY, the floating point registers will be saved during context switching / task switching. Currently, with the patch in [1], we only change the FS bits upon every floating arithmetic instruction. However, since floating load instructions also mutate the state of floating point registers, the FS bits should be updated to DIRTY. The problem in [2] arose when the program populates the content of one floating register to an array by repeatedly using `fld fa5, EA`. A context switch occured upon a page fault, and while handling that page fault, the kernel might have to handle an interrupt. This caused the kernel to task switch between handling page fault and handling interrupt. This caused __switch_to() to be called, which will save the floating point registers only if the SD (indirectly set by FS) bits are set to DIRTY, while restoring the floating point registers to the switch-to task [3]. This caused the floating point registers to be zeroed out when it was restored as it was never saved before. [1] https://gem5-review.googlesource.com/c/public/gem5/+/65272 [2] https://github.com/gem5/gem5/issues/349 [3] https://github.com/torvalds/linux/blob/v6.5/arch/riscv/include/asm/switch_to.h#L56 Change-Id: Ia5656da5a589a8e29fb699d2ee12885b8f3fa2d2 Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2023-09-28 19:14:29 -07:00
Bobby R. Bruce	49a1d48264	arch-x86: properly initialize the auxv platform string (#347 ) The auxv platform string was not copied to the same location that was pointed to by the value of AT_PLATFORM; instead, it was copied over the auxv random buffer. This patch fixes this by copying the auxv platform string to the right offset in the initial program stack. GitHub issue: https://github.com/gem5/gem5/issues/346	2023-09-27 14:31:19 -07:00
Bobby R. Bruce	4638434b97	arch-x86: make popx87 micro-op actually pop st(0) (#345 ) The popx87 micro-op did not in fact pop the st(0) floating-point register off the stack; it acted as a no-op. This patch fixes the bug by passing the spm=1 argument to PopX87's superclass to indicate the floating-point stack pointer should be incremented. GitHub issue: https://github.com/gem5/gem5/issues/344	2023-09-27 14:31:00 -07:00
Jason Lowe-Power	010ac43369	arch-riscv: Make RISC-V decodeInst overridable (#350 ) The change will allow developers to implement and decode their non-standard instructions to the CPU models	2023-09-25 06:43:56 -07:00
Giacomo Travaglini	df60b0f5c9	arch-arm: Implement FEAT_FGT Change-Id: I89391f17f353ab6ce555d65783977c1f30f64fc5 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-09-22 16:33:58 +01:00
Giacomo Travaglini	37b6824c4c	arch-arm: Fix disassembly for NZCV read/writes At the moment the instruction is disassembled as an integer operation: msrNZCV x547, x0 Instead of msr nzcv x0 Change-Id: I3f6576dccbe86db401c73747750ca3cfdf4055d5 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-09-22 16:33:58 +01:00
Roger Chang	d55f8f2716	arch: Enable customized decoder class name Developers can make the own ISADesc action in the SConscript with their decoder class name. Change-Id: I011cf059642e178913e1f62df4e5c02401cc132e	2023-09-22 15:45:56 +08:00
Roger Chang	5b41112e03	arch-riscv: Make RISC-V decodeInst overridable The change will allow developers to implement and decode their non-standard instructions to the CPU models Bug: 289467440 Test: None Change-Id: I67f4abc71596f819c1265e325784f51c8e9bb359	2023-09-22 11:38:22 +08:00
Nicholas Mosier	7298ebd49b	arch-x86: properly initialize the auxv platform string The auxv platform string was not copied to the same location that was pointed to by the value of AT_PLATFORM; instead, it was copied over the auxv random buffer. This patch fixes this by copying the auxv platform string to the right offset in the initial program stack. GitHub issue: https://github.com/gem5/gem5/issues/346 Change-Id: Ied4b660d5fc444a94acb97b799be0a3722438b5e	2023-09-21 05:16:17 +00:00
Nicholas Mosier	5697bf26a8	arch-x86: make popx87 micro-op actually pop st(0) The popx87 micro-op did not in fact pop the st(0) floating-point register off the stack; it acted as a no-op. This patch fixes the bug by passing the spm=1 argument to PopX87's superclass to indicate the floating-point stack pointer should be incremented. GitHub issue: https://github.com/gem5/gem5/issues/344 Change-Id: I6e731882b6bcf8f0e06ebd2f66f673bf9da80717	2023-09-21 04:29:05 +00:00
Bobby R. Bruce	958eda6961	arch-riscv: Fix inst flags for jal and jalr (#325 ) The jal and jalr share the same instruction format JumpConstructor, which sets the IsCall and IsReturn flags by the register ID. However, it may cause wrong instruction flags set for jal because the section "handle the 'Jalr' instruction" misses the opcode checking. The PR fix the issue to ensure the IsReturn can be only set in Jalr.	2023-09-20 16:25:21 -07:00
Roger Chang	70c1d762c7	arch-riscv: Fix inst flags for jal and jalr The jal and jalr share the same instruction format JumpConstructor, which sets the IsCall and IsReturn flags by the register ID. However, it may cause wrong instruction flags set for jal because the section "handle the 'Jalr' instruction" misses the opcode checking. The PR fix the issue to ensure the IsReturn can be only set in Jalr. Change-Id: I9ad867a389256f9253988552e6567d2b505a6901	2023-09-20 14:27:23 +08:00
Nicholas Mosier	741a901d8d	arch-x86: fix negative overflow check bug in PACK micro-op The implementation of the x86 PACK micro-op had a logical bug that caused the `PACKSSWB` and `PACKSSDW` instructions to produce incorrect results. Specifically, due to a signedness error, the overflow check for negative integers being packed always evaluated to true, resulting in all negative integers being packed as -1 in the output. This patch fixes the signedness error that causes the bug. GitHub issue: https://github.com/gem5/gem5/issues/331 Change-Id: I44b7328a8ce31742a3c0dfaebd747f81751e8851	2023-09-20 05:09:32 +00:00
Nicholas Mosier	2178e26bf2	arch-x86: initialize and correct bitwidth for FPU tag word The x87 FPU tag word (FTW) was not explicitly initialized in {X86_64,i386}Process::initState(), resulting in holding an initial value of zero, resulting in an invalid x87 FPU state. This commit initializes FTW to 0xFFFF, indicating the FPU is empty at program start during syscall emulation. The 16-bit FTW register was also incorrectly masked down to 8-bits in X86ISA::ISA::setMiscRegNoEffect(), leading to an invalid X87 FPU state that later caused crashes in the X86KvmCPU. This commit corrects the bitwidth of the mask to 16. GitHub issue: https://github.com/gem5/gem5/issues/303 Change-Id: I97892d707998a87c1ff8546e08c15fede7eed66f	2023-09-12 15:39:29 +00:00
Bobby R. Bruce	d67a6603c1	cpu-kvm: properly set x86 xsave header on gem5->KVM transition (#298 ) If the XSAVE KVM capability is available (KVM_CAP_XSAVE), the X86KvmCPU will try to set the x87 FPU + SSE state using KVM_SET_XSAVE, which expects a buffer (struct kvm_xsave) in XSAVE area format (Vol. 1, Sec. 13.4 of Intel x86 SDM). The original implementation of `X86KvmCPU::updateKvmStateFPUXSave()`, however, improperly sets the xsave header, which contains a bitmap of state components present in the xsave area. This patch defines `XSaveHeader` structure to model the xsave header, which is expected directly following the legacy FPU region (defined in the `FXSave` structure) in the xsave area. It then sets two bist in the xsave header to indicate the presence of x86 FPU and SSE state components. GitHub issue: https://github.com/gem5/gem5/issues/296	2023-09-12 08:32:20 -07:00
Roger Chang	def89745bc	arch-riscv: Allow Minor and O3 CPU execute RVV Change-Id: I4780b42c25d349806254b5053fb0da3b6993ca2f	2023-09-12 13:56:22 +08:00
Roger Chang	0f54cb0593	arch-riscv: Remove check vconf done implementation Change-Id: If633cef209390d0500c4c2c5741d56158ef26c00	2023-09-12 13:56:22 +08:00
Roger Chang	31b95987da	arch-riscv: Change the instruction family to jump like The method that get the vl, vtype from PCState in the next changes Change-Id: I022b47b7a96572f6434eed30dd9f7caa79854c31	2023-09-12 13:56:22 +08:00
Roger Chang	282765234b	arch-riscv: Implement the branchTarget for vsetvl Change-Id: I10bf6be736ce2b99323ace410bff1d8e1e2a4123	2023-09-12 13:56:22 +08:00
Roger Chang	a3aaad2ecd	arch-riscv: Refactor the execution part of vsetvl Change-Id: Ie0d9671242481a85bb0fe5728748b16c3ef62592	2023-09-12 13:56:21 +08:00
Roger Chang	1bde42760f	arch-riscv: Get vl, vtype and vlenb from PCState Change-Id: I0ded57a3dc2db6fcc7121f147bcaf6d8a8873f6a	2023-09-12 13:56:21 +08:00

1 2 3 4 5 ...

5680 Commits