Commit Graph

5680 Commits

Author SHA1 Message Date
Matthew Poremba
4d336c0636 arch-vega: Implement buffer_atomic_cmpswap (#439)
This is a standard compare and swap but implemented on vector memory
buffer instructions (i.e., it is the same as FLAT_ATOMIC_CMPSWAP with
MUBUF's special address calculation).

This was tested using a Tensile kernel, a backend for rocBLAS, which is
used by PyTorch and Tensorflow. Prior to this patch both ML frameworks
crashed. With this patch they both make forward progress.

Change-Id: Ie76447a72d210f81624e01e1fa374e41c2c21e06
2023-10-12 07:33:40 -07:00
Matthew Poremba
4b7f25fcb6 arch-vega: Ignore s_setprio instruction instead of panic
This instruction is used by ML frameworks to prioritize certain
wavefronts. Since gem5 does not have any support for wavefront
scheduling based on priority (besides wavefront age), we ignore this
instruction and warn_once rather than calling panic. Since hardware can
override this priority anyways, we can be sure that ignoring the value
will not inhibit forward progress resulting in application hangs.

Change-Id: Ic5eef14f9685dd2b316c5cf76078bb78d5bfe3cc
2023-10-11 15:55:16 -05:00
Matthew Poremba
4b85a1710e arch-vega: Implement buffer_atomic_cmpswap
This is a standard compare and swap but implemented on vector memory
buffer instructions (i.e., it is the same as FLAT_ATOMIC_CMPSWAP with
MUBUF's special address calculation).

This was tested using a Tensile kernel, a backend for rocBLAS, which is
used by PyTorch and Tensorflow. Prior to this patch both ML frameworks
crashed. With this patch they both make forward progress.

Change-Id: Ie76447a72d210f81624e01e1fa374e41c2c21e06
2023-10-11 15:42:50 -05:00
Bobby R. Bruce
70b6b53e54 misc,python: Add pyupgrade to pre-commit (#424)
This adds the [pyupgrade](https://github.com/asottile/pyupgrade) hook to
pre-commit.

This hook automatically upgrades the syntax to the recommended standards
for the newer version of the language.
2023-10-11 09:07:09 -07:00
Matthew Poremba
da11427ba6 gpu-compute: Update tokens for flat global/scratch (#408)
Memory instructions acquire coalescer tokens in the schedule stage.
Currently this is only done for buffer and flat instructions, but not
flat global or flat scratch. This change now acquires tokens for flat
global and flat scratch instructions. This provides back-pressure to the
CUs and helps to avoid deadlocks in Ruby.

The change also handles returning tokens for buffer, flat global, and
flat scratch instructions. This was previously only being done for
normal flat instructions leading to deadlocks in some applications when
the tokens were exhausted.

To simplify the logic, added a needsToken() method to GPUDynInst which
return if the instruction is buffer or any flat segment.

The waitcnts were also incorrect for flat global and flat scratch. We
should always decrement vmem and exp count for stores and only normal
flat instructions should decrement lgkm. Currently vmem/exp are not
decremented for flat global and flat scratch which can lead to deadlock.
This change set fixes this by always decrementing vmem/exp and lgkm only
for normal flat instructions.

Change-Id: I673f4ac6121e4b5a5e8491bc9130c6d825d95fc5
2023-10-11 09:00:10 -07:00
Andreas Sandberg
891250192d arch-arm: Implement FEAT_TCR2 and FEAT_SCTLR2 (#416)
This is simply adding the new Armv8.9 registers defined in the related
features:

- FEAT_TCR2
- FEAT_SCTLR2
2023-10-11 10:14:31 +01:00
Bobby R. Bruce
298119e402 misc,python: Run pre-commit run --all-files
Applies the `pyupgrade` hook to all files in the repo.

Change-Id: I9879c634a65c5fcaa9567c63bc5977ff97d5d3bf
2023-10-10 21:47:07 -07:00
Bobby R. Bruce
ddf6cb88e4 misc: Run pre-commit run --all-files
This is reflect the updates made to black when running `pre-commit
autoupdate`.

Change-Id: Ifb7fea117f354c7f02f26926a5afdf7d67bc5919
2023-10-10 14:01:58 -07:00
Yu-Cheng Chang
141b06d335 arch,arch-riscv: Remove setRegOperand in VecRegOperand (#341)
The RISC-V vector instructions still work without setRegOperand.
We should fix the register statistic issue by
https://github.com/gem5/gem5/pull/360 to avoid duplicate statistic
register write count



Change-Id: Ib6a52935e00c3e557b366abfcf60450dca05614d
2023-10-10 08:00:10 -07:00
Matthew Poremba
9f4d334644 gpu-compute: Update tokens for flat global/scratch
Memory instructions acquire coalescer tokens in the schedule stage.
Currently this is only done for buffer and flat instructions, but not
flat global or flat scratch. This change now acquires tokens for flat
global and flat scratch instructions. This provides back-pressure to the
CUs and helps to avoid deadlocks in Ruby.

The change also handles returning tokens for buffer, flat global, and
flat scratch instructions. This was previously only being done for
normal flat instructions leading to deadlocks in some applications when
the tokens were exhausted.

To simplify the logic, added a needsToken() method to GPUDynInst which
return if the instruction is buffer or any flat segment.

The waitcnts were also incorrect for flat global and flat scratch. We
should always decrement vmem and exp count for stores and only normal
flat instructions should decrement lgkm. Currently vmem/exp are not
decremented for flat global and flat scratch which can lead to deadlock.
This change set fixes this by always decrementing vmem/exp and lgkm only
for normal flat instructions.

Change-Id: I673f4ac6121e4b5a5e8491bc9130c6d825d95fc5
2023-10-10 09:48:16 -05:00
Giacomo Travaglini
8acf49b6fa arch-arm: Revamp takeInt to take VHE/SEL2 into account
The new implementation matches the table in the ARM Architecture
Reference Manual (version DDI 0487J.a, section D1.3.6, table R_SXLWJ)

It takes into consideration features like FEAT_SEL2 (scr.eel2 bit) and
FEAT_VHE (hcr.e2h bit) which affect the masking of interrupts under
certain circumstances

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Change-Id: I07ebd8d859651475bd32fd201eea0f4e64a7dd5f
2023-10-10 09:46:47 +01:00
Giacomo Travaglini
e412ddddbd arch-arm: Split takeInt into AArch64/32 versions
We pay a small duplication cost but we make the code
more readable and we enable further modifications to the
AArch64 code without forcing the same code on the AArch32
method

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Change-Id: I1efa33cf19f91094fd33bd48b6a0a57d8df8f89f
2023-10-10 09:45:59 +01:00
Bobby R. Bruce
bbe05b0cba tests,misc: Fix compilation tests failures (#400)
Exposed in our failing compiler tests:
https://github.com/gem5/gem5/actions/runs/6348223508, this PR:

* Adds missing overrides to `PCState`'s `set` function.
* Removes `std::binary_function` from DramPower (it was deprecated in
CPP-11 and officially removed in CPP-17).
2023-10-09 11:20:52 -07:00
Giacomo Travaglini
eac5a8b215 arch-arm: Implement FEAT_TCR2
Change-Id: I0396f5938c09b68fcc3303a6fdda1e4dde290869
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-10-09 17:19:57 +01:00
Giacomo Travaglini
49cbb24351 arch-arm: Implement FEAT_SCTLR2
Change-Id: Ifb8c8dc1729cc21007842b950273fe38129d9539
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-10-09 17:12:53 +01:00
Giacomo Travaglini
c4c5d2e172 arch-arm: Implement ID_AA64MMFR3_EL1 register
Change-Id: If8c37bdccf35a070870900c06dc4640348f0f063
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-10-09 17:12:53 +01:00
Andreas Sandberg
ec7921305b arch-arm: Implement FEAT_TLBIRANGE extension (#414) 2023-10-09 17:09:31 +01:00
Giacomo Travaglini
39fdfaea5a arch-arm: Implement FEAT_TLBIRANGE
Change-Id: I7eb020573420e49a8a54e1fc7a89eb6e2236dacb
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-10-09 13:59:47 +01:00
Giacomo Travaglini
6b698630a2 arch-arm: Check VMID in secure mode as well (NS=0)
This is still trying to completely remove any artifact
which implies virtualization is only supported in
non-secure mode (NS=1)

Change-Id: I83fed1c33cc745ecdf3c5ad60f4f356f3c58aad5
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-10-09 13:56:57 +01:00
Giacomo Travaglini
a8efded644 arch-arm: Include Granule Size in a TLB entry
This info can be used during TLB invalidation

Change-Id: I81247e40b11745f0207178b52c47845ca1b92870
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-10-09 13:56:57 +01:00
Nicholas Mosier
7a0e84d853 cpu-kvm, arch-x86: flush TLB after syscalls
Modified the x86 KVM-in-SE syscall handler to flush the TLB following
each syscall, in case the page table has been modified. This is done
by reloading the value in %cr3. Doing this requires an intermediate
GPR, which we store in a new scratch buffer following the syscall code
at address `syscallDataBuf`.

GitHub issue: https://github.com/gem5/gem5/issues/409

Change-Id: Ibc20018c97ebb1794fa31a0c71e0857d661c7c9d
2023-10-06 20:41:59 +00:00
Giacomo Travaglini
ae104cc431 mem-ruby: Add new feature far atomics in CHI (#177)
Added a new feature to CHI protocol (in collaboration with @tiagormk).
Here is the Jira Ticket
[https://gem5.atlassian.net/browse/GEM5-1326](https://gem5.atlassian.net/browse/GEM5-1326
). As described in CHI specs, far atomic transactions enable remote
execution of Atomic Memory Operations. This pull request incorporates
several changes:

* Fix Arm ISA definition of Swap instructions. These instructions should
return an operand, so their ISA definition should be Return Operation.
* Enable AMOs in Ruby Mem Test to verify that AMOs work
* Enable near and far AMO in the Cache Controler of CHI

Three configuration parameters have been used to tune this behavior:
* policy_type: sets the atomic policy to one of the described in [our
paper](https://dl.acm.org/doi/10.1145/3579371.3589065)
* atomic_op_latency: simulates the AMO ALU operation latency
* comp_anr: configures the Atomic No return transaction to split
CompDBIDResp into two different messages DBIDResp and Comp
2023-10-06 10:09:58 +01:00
Bobby R. Bruce
761f6b73a0 arch-arm: Implement FEAT_FGT (#334)
This PR implements FEAT_FGT (Fine Grain Traps)
2023-10-05 10:44:26 -07:00
Bobby R. Bruce
39c7e7d1ed arch: Adding missing override to PCState.set
As highlighed in this failing compiler test:
https://github.com/gem5/gem5/actions/runs/6348223508/job/17389057995

Clang was failing when compiling "build/ALL/gem5.opt" due missing
overrides in `PCState`'s "set" function.

This was observed in Clang-14 and, stangely, Clang-8.

Change-Id: I240c1087e8875fd07630e467e7452c62a5d14d5b
2023-10-05 10:18:19 -07:00
Roger Chang
ea3ee880aa arch-riscv: Implement Zcb instructions
Added the following instructions:
c.lbu
c.lh
c.lhu
c.sb
c.sh
c.zext.b
c.sext.b
c.zext.h
c.sext.h
c.zext.w
c.not
c.mul

Reference: https://github.com/riscv/riscv-code-size-reduction
Change-Id: Ib04820bf5591b365a3bfbbd8b90655a8a1d844cf
2023-10-05 18:46:35 +08:00
Víctor Soria
12dada2dc5 arch-arm: Correct return operand in swap instructions
Swap instructions are configured as non returning AMO operations. This is wrong because they
return the previous value stored in the target memory position

Change-Id: I84d75a571a8eaeaee0dbfac344f7b34c72b47d53
2023-10-04 19:11:01 +02:00
Andreas Sandberg
7806eaad51 arch: Add instruction size and PC set methods (#357)
Add the instruction size of a static instruction. x86 and arm decoders
add now the instruction size to the macro instruction. However, microops
are still handled by the fetch stage which is not nice.
Furthermore, we add a set method to the PC state. It allows setting a PC
state to acertain address.
Both methods are required for the decoupled front-end.

Change-Id: I311fe3f637e867c42dee7781f5373ea2e69e2072
2023-10-04 10:49:30 +01:00
David Schall
7d2e1ee789 arch: Add instruction size and PC set methods
Adds the instruction size to all static instruction. x86, arm
and RISC-V decoders add the instruction size to every decoded
macro instruction. As microops should reflect the size of the
their parent macroop the set method is overwritten to pass the
size to all microops.
Furthermore, we add a set method to the PC state. It allows
setting a PC state to a certain address.
Both methods are required for the decoupled front-end.

Change-Id: I311fe3f637e867c42dee7781f5373ea2e69e2072
Signed-off-by: David Schall <david.schall@ed.ac.uk>
2023-10-02 20:10:57 +00:00
Hoa Nguyen
da72590c19 arch-riscv: FS bits -> DIRTY for more floating point loads
The affected instructions are,
- c.flw
- c.flwsp
- flh
- flw

This change is related to [1] [2], which also aim to change the
FS bits to DIRTY when the state of any floating point register
might change.

[1] https://gem5-review.googlesource.com/c/public/gem5/+/65272
[2] https://github.com/gem5/gem5/pull/370

Change-Id: I098e1b1812fb352bd5d3614ff5d3547e58903b65
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2023-10-01 23:12:25 -07:00
Hoa Nguyen
6640447c1e arch-riscv: Update FS bits when doing floating point loads
This problem is similar to the problem described in [1].
This problem produces symptoms as described in [2].

In short, the Linux kernel relies on the CSR_STATUS's FS bits
to decide whether to save the floating point registers. If
the FS bits are set to DIRTY, the floating point registers will
be saved during context switching / task switching.

Currently, with the patch in [1], we only change the FS bits
upon every floating arithmetic instruction. However, since
floating load instructions also mutate the state of floating
point registers, the FS bits should be updated to DIRTY.

The problem in [2] arose when the program populates the content
of one floating register to an array by repeatedly using
`fld fa5, EA`. A context switch occured upon a page fault, and
while handling that page fault, the kernel might have to handle
an interrupt. This caused the kernel to task switch between
handling page fault and handling interrupt. This caused
__switch_to() to be called, which will save the floating point
registers only if the SD (indirectly set by FS) bits are set to
DIRTY, while restoring the floating point registers to the
switch-to task [3]. This caused the floating point registers to
be zeroed out when it was restored as it was never saved before.

[1] https://gem5-review.googlesource.com/c/public/gem5/+/65272
[2] https://github.com/gem5/gem5/issues/349
[3] https://github.com/torvalds/linux/blob/v6.5/arch/riscv/include/asm/switch_to.h#L56

Change-Id: Ia5656da5a589a8e29fb699d2ee12885b8f3fa2d2
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2023-09-28 19:14:29 -07:00
Bobby R. Bruce
49a1d48264 arch-x86: properly initialize the auxv platform string (#347)
The auxv platform string was not copied to the same location that was
pointed to by the value of AT_PLATFORM; instead, it was copied over the
auxv random buffer. This patch fixes this by copying the auxv platform
string to the right offset in the initial program stack.

GitHub issue: https://github.com/gem5/gem5/issues/346
2023-09-27 14:31:19 -07:00
Bobby R. Bruce
4638434b97 arch-x86: make popx87 micro-op actually pop st(0) (#345)
The popx87 micro-op did not in fact pop the st(0) floating-point
register off the stack; it acted as a no-op. This patch fixes the bug by
passing the spm=1 argument to PopX87's superclass to indicate the
floating-point stack pointer should be incremented.

GitHub issue: https://github.com/gem5/gem5/issues/344
2023-09-27 14:31:00 -07:00
Jason Lowe-Power
010ac43369 arch-riscv: Make RISC-V decodeInst overridable (#350)
The change will allow developers to implement and decode their
non-standard instructions to the CPU models
2023-09-25 06:43:56 -07:00
Giacomo Travaglini
df60b0f5c9 arch-arm: Implement FEAT_FGT
Change-Id: I89391f17f353ab6ce555d65783977c1f30f64fc5
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-09-22 16:33:58 +01:00
Giacomo Travaglini
37b6824c4c arch-arm: Fix disassembly for NZCV read/writes
At the moment the instruction is disassembled as an integer
operation:

msrNZCV   x547, x0

Instead of

msr nzcv x0

Change-Id: I3f6576dccbe86db401c73747750ca3cfdf4055d5
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-09-22 16:33:58 +01:00
Roger Chang
d55f8f2716 arch: Enable customized decoder class name
Developers can make the own ISADesc action in the SConscript with
their decoder class name.

Change-Id: I011cf059642e178913e1f62df4e5c02401cc132e
2023-09-22 15:45:56 +08:00
Roger Chang
5b41112e03 arch-riscv: Make RISC-V decodeInst overridable
The change will allow developers to implement and decode their
non-standard instructions to the CPU models

Bug: 289467440
Test: None
Change-Id: I67f4abc71596f819c1265e325784f51c8e9bb359
2023-09-22 11:38:22 +08:00
Nicholas Mosier
7298ebd49b arch-x86: properly initialize the auxv platform string
The auxv platform string was not copied to the same location that was
pointed to by the value of AT_PLATFORM; instead, it was copied over
the auxv random buffer. This patch fixes this by copying the auxv
platform string to the right offset in the initial program stack.

GitHub issue: https://github.com/gem5/gem5/issues/346

Change-Id: Ied4b660d5fc444a94acb97b799be0a3722438b5e
2023-09-21 05:16:17 +00:00
Nicholas Mosier
5697bf26a8 arch-x86: make popx87 micro-op actually pop st(0)
The popx87 micro-op did not in fact pop the st(0) floating-point
register off the stack; it acted as a no-op. This patch fixes the bug
by passing the spm=1 argument to PopX87's superclass to indicate the
floating-point stack pointer should be incremented.

GitHub issue: https://github.com/gem5/gem5/issues/344

Change-Id: I6e731882b6bcf8f0e06ebd2f66f673bf9da80717
2023-09-21 04:29:05 +00:00
Bobby R. Bruce
958eda6961 arch-riscv: Fix inst flags for jal and jalr (#325)
The jal and jalr share the same instruction format JumpConstructor,
which sets the IsCall and IsReturn flags by the register ID. However, it
may cause wrong instruction flags set for jal because the section
"handle the 'Jalr' instruction" misses the opcode checking. The PR fix
the issue to ensure the IsReturn can be only set in Jalr.
2023-09-20 16:25:21 -07:00
Roger Chang
70c1d762c7 arch-riscv: Fix inst flags for jal and jalr
The jal and jalr share the same instruction format JumpConstructor,
which sets the IsCall and IsReturn flags by the register ID.
However, it may cause wrong instruction flags set for jal because
the section "handle the 'Jalr' instruction" misses the opcode
checking. The PR fix the issue to ensure the IsReturn can be only
set in Jalr.

Change-Id: I9ad867a389256f9253988552e6567d2b505a6901
2023-09-20 14:27:23 +08:00
Nicholas Mosier
741a901d8d arch-x86: fix negative overflow check bug in PACK micro-op
The implementation of the x86 PACK micro-op had a logical bug that
caused the `PACKSSWB` and `PACKSSDW` instructions to produce
incorrect results. Specifically, due to a signedness error, the
overflow check for negative integers being packed always evaluated
to true, resulting in all negative integers being packed as -1 in
the output.

This patch fixes the signedness error that causes the bug.

GitHub issue: https://github.com/gem5/gem5/issues/331

Change-Id: I44b7328a8ce31742a3c0dfaebd747f81751e8851
2023-09-20 05:09:32 +00:00
Nicholas Mosier
2178e26bf2 arch-x86: initialize and correct bitwidth for FPU tag word
The x87 FPU tag word (FTW) was not explicitly initialized in
{X86_64,i386}Process::initState(), resulting in holding an initial
value of zero, resulting in an invalid x87 FPU state. This commit
initializes FTW to 0xFFFF, indicating the FPU is empty at program
start during syscall emulation.

The 16-bit FTW register was also incorrectly masked down to 8-bits
in X86ISA::ISA::setMiscRegNoEffect(), leading to an invalid X87 FPU
state that later caused crashes in the X86KvmCPU. This commit
corrects the bitwidth of the mask to 16.

GitHub issue: https://github.com/gem5/gem5/issues/303

Change-Id: I97892d707998a87c1ff8546e08c15fede7eed66f
2023-09-12 15:39:29 +00:00
Bobby R. Bruce
d67a6603c1 cpu-kvm: properly set x86 xsave header on gem5->KVM transition (#298)
If the XSAVE KVM capability is available (KVM_CAP_XSAVE), the X86KvmCPU
will try to set the x87 FPU + SSE state using KVM_SET_XSAVE, which
expects a buffer (struct kvm_xsave) in XSAVE area format (Vol. 1, Sec.
13.4 of Intel x86 SDM). The original implementation of
`X86KvmCPU::updateKvmStateFPUXSave()`, however, improperly sets the
xsave header, which contains a bitmap of state components present in the
xsave area.

This patch defines `XSaveHeader` structure to model the xsave header,
which is expected directly following the legacy FPU region (defined in
the `FXSave` structure) in the xsave area. It then sets two bist in the
xsave header to indicate the presence of x86 FPU and SSE state
components.

GitHub issue: https://github.com/gem5/gem5/issues/296
2023-09-12 08:32:20 -07:00
Roger Chang
def89745bc arch-riscv: Allow Minor and O3 CPU execute RVV
Change-Id: I4780b42c25d349806254b5053fb0da3b6993ca2f
2023-09-12 13:56:22 +08:00
Roger Chang
0f54cb0593 arch-riscv: Remove check vconf done implementation
Change-Id: If633cef209390d0500c4c2c5741d56158ef26c00
2023-09-12 13:56:22 +08:00
Roger Chang
31b95987da arch-riscv: Change the instruction family to jump like
The method that get the vl, vtype from PCState in the next changes

Change-Id: I022b47b7a96572f6434eed30dd9f7caa79854c31
2023-09-12 13:56:22 +08:00
Roger Chang
282765234b arch-riscv: Implement the branchTarget for vset*vl*
Change-Id: I10bf6be736ce2b99323ace410bff1d8e1e2a4123
2023-09-12 13:56:22 +08:00
Roger Chang
a3aaad2ecd arch-riscv: Refactor the execution part of vset*vl*
Change-Id: Ie0d9671242481a85bb0fe5728748b16c3ef62592
2023-09-12 13:56:21 +08:00
Roger Chang
1bde42760f arch-riscv: Get vl, vtype and vlenb from PCState
Change-Id: I0ded57a3dc2db6fcc7121f147bcaf6d8a8873f6a
2023-09-12 13:56:21 +08:00