Commit Graph

5666 Commits

Author SHA1 Message Date
Giacomo Travaglini
e412ddddbd arch-arm: Split takeInt into AArch64/32 versions
We pay a small duplication cost but we make the code
more readable and we enable further modifications to the
AArch64 code without forcing the same code on the AArch32
method

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Change-Id: I1efa33cf19f91094fd33bd48b6a0a57d8df8f89f
2023-10-10 09:45:59 +01:00
Bobby R. Bruce
bbe05b0cba tests,misc: Fix compilation tests failures (#400)
Exposed in our failing compiler tests:
https://github.com/gem5/gem5/actions/runs/6348223508, this PR:

* Adds missing overrides to `PCState`'s `set` function.
* Removes `std::binary_function` from DramPower (it was deprecated in
CPP-11 and officially removed in CPP-17).
2023-10-09 11:20:52 -07:00
Andreas Sandberg
ec7921305b arch-arm: Implement FEAT_TLBIRANGE extension (#414) 2023-10-09 17:09:31 +01:00
Giacomo Travaglini
39fdfaea5a arch-arm: Implement FEAT_TLBIRANGE
Change-Id: I7eb020573420e49a8a54e1fc7a89eb6e2236dacb
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-10-09 13:59:47 +01:00
Giacomo Travaglini
6b698630a2 arch-arm: Check VMID in secure mode as well (NS=0)
This is still trying to completely remove any artifact
which implies virtualization is only supported in
non-secure mode (NS=1)

Change-Id: I83fed1c33cc745ecdf3c5ad60f4f356f3c58aad5
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-10-09 13:56:57 +01:00
Giacomo Travaglini
a8efded644 arch-arm: Include Granule Size in a TLB entry
This info can be used during TLB invalidation

Change-Id: I81247e40b11745f0207178b52c47845ca1b92870
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-10-09 13:56:57 +01:00
Nicholas Mosier
7a0e84d853 cpu-kvm, arch-x86: flush TLB after syscalls
Modified the x86 KVM-in-SE syscall handler to flush the TLB following
each syscall, in case the page table has been modified. This is done
by reloading the value in %cr3. Doing this requires an intermediate
GPR, which we store in a new scratch buffer following the syscall code
at address `syscallDataBuf`.

GitHub issue: https://github.com/gem5/gem5/issues/409

Change-Id: Ibc20018c97ebb1794fa31a0c71e0857d661c7c9d
2023-10-06 20:41:59 +00:00
Giacomo Travaglini
ae104cc431 mem-ruby: Add new feature far atomics in CHI (#177)
Added a new feature to CHI protocol (in collaboration with @tiagormk).
Here is the Jira Ticket
[https://gem5.atlassian.net/browse/GEM5-1326](https://gem5.atlassian.net/browse/GEM5-1326
). As described in CHI specs, far atomic transactions enable remote
execution of Atomic Memory Operations. This pull request incorporates
several changes:

* Fix Arm ISA definition of Swap instructions. These instructions should
return an operand, so their ISA definition should be Return Operation.
* Enable AMOs in Ruby Mem Test to verify that AMOs work
* Enable near and far AMO in the Cache Controler of CHI

Three configuration parameters have been used to tune this behavior:
* policy_type: sets the atomic policy to one of the described in [our
paper](https://dl.acm.org/doi/10.1145/3579371.3589065)
* atomic_op_latency: simulates the AMO ALU operation latency
* comp_anr: configures the Atomic No return transaction to split
CompDBIDResp into two different messages DBIDResp and Comp
2023-10-06 10:09:58 +01:00
Bobby R. Bruce
761f6b73a0 arch-arm: Implement FEAT_FGT (#334)
This PR implements FEAT_FGT (Fine Grain Traps)
2023-10-05 10:44:26 -07:00
Bobby R. Bruce
39c7e7d1ed arch: Adding missing override to PCState.set
As highlighed in this failing compiler test:
https://github.com/gem5/gem5/actions/runs/6348223508/job/17389057995

Clang was failing when compiling "build/ALL/gem5.opt" due missing
overrides in `PCState`'s "set" function.

This was observed in Clang-14 and, stangely, Clang-8.

Change-Id: I240c1087e8875fd07630e467e7452c62a5d14d5b
2023-10-05 10:18:19 -07:00
Roger Chang
ea3ee880aa arch-riscv: Implement Zcb instructions
Added the following instructions:
c.lbu
c.lh
c.lhu
c.sb
c.sh
c.zext.b
c.sext.b
c.zext.h
c.sext.h
c.zext.w
c.not
c.mul

Reference: https://github.com/riscv/riscv-code-size-reduction
Change-Id: Ib04820bf5591b365a3bfbbd8b90655a8a1d844cf
2023-10-05 18:46:35 +08:00
Víctor Soria
12dada2dc5 arch-arm: Correct return operand in swap instructions
Swap instructions are configured as non returning AMO operations. This is wrong because they
return the previous value stored in the target memory position

Change-Id: I84d75a571a8eaeaee0dbfac344f7b34c72b47d53
2023-10-04 19:11:01 +02:00
Andreas Sandberg
7806eaad51 arch: Add instruction size and PC set methods (#357)
Add the instruction size of a static instruction. x86 and arm decoders
add now the instruction size to the macro instruction. However, microops
are still handled by the fetch stage which is not nice.
Furthermore, we add a set method to the PC state. It allows setting a PC
state to acertain address.
Both methods are required for the decoupled front-end.

Change-Id: I311fe3f637e867c42dee7781f5373ea2e69e2072
2023-10-04 10:49:30 +01:00
David Schall
7d2e1ee789 arch: Add instruction size and PC set methods
Adds the instruction size to all static instruction. x86, arm
and RISC-V decoders add the instruction size to every decoded
macro instruction. As microops should reflect the size of the
their parent macroop the set method is overwritten to pass the
size to all microops.
Furthermore, we add a set method to the PC state. It allows
setting a PC state to a certain address.
Both methods are required for the decoupled front-end.

Change-Id: I311fe3f637e867c42dee7781f5373ea2e69e2072
Signed-off-by: David Schall <david.schall@ed.ac.uk>
2023-10-02 20:10:57 +00:00
Hoa Nguyen
da72590c19 arch-riscv: FS bits -> DIRTY for more floating point loads
The affected instructions are,
- c.flw
- c.flwsp
- flh
- flw

This change is related to [1] [2], which also aim to change the
FS bits to DIRTY when the state of any floating point register
might change.

[1] https://gem5-review.googlesource.com/c/public/gem5/+/65272
[2] https://github.com/gem5/gem5/pull/370

Change-Id: I098e1b1812fb352bd5d3614ff5d3547e58903b65
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2023-10-01 23:12:25 -07:00
Hoa Nguyen
6640447c1e arch-riscv: Update FS bits when doing floating point loads
This problem is similar to the problem described in [1].
This problem produces symptoms as described in [2].

In short, the Linux kernel relies on the CSR_STATUS's FS bits
to decide whether to save the floating point registers. If
the FS bits are set to DIRTY, the floating point registers will
be saved during context switching / task switching.

Currently, with the patch in [1], we only change the FS bits
upon every floating arithmetic instruction. However, since
floating load instructions also mutate the state of floating
point registers, the FS bits should be updated to DIRTY.

The problem in [2] arose when the program populates the content
of one floating register to an array by repeatedly using
`fld fa5, EA`. A context switch occured upon a page fault, and
while handling that page fault, the kernel might have to handle
an interrupt. This caused the kernel to task switch between
handling page fault and handling interrupt. This caused
__switch_to() to be called, which will save the floating point
registers only if the SD (indirectly set by FS) bits are set to
DIRTY, while restoring the floating point registers to the
switch-to task [3]. This caused the floating point registers to
be zeroed out when it was restored as it was never saved before.

[1] https://gem5-review.googlesource.com/c/public/gem5/+/65272
[2] https://github.com/gem5/gem5/issues/349
[3] https://github.com/torvalds/linux/blob/v6.5/arch/riscv/include/asm/switch_to.h#L56

Change-Id: Ia5656da5a589a8e29fb699d2ee12885b8f3fa2d2
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2023-09-28 19:14:29 -07:00
Bobby R. Bruce
49a1d48264 arch-x86: properly initialize the auxv platform string (#347)
The auxv platform string was not copied to the same location that was
pointed to by the value of AT_PLATFORM; instead, it was copied over the
auxv random buffer. This patch fixes this by copying the auxv platform
string to the right offset in the initial program stack.

GitHub issue: https://github.com/gem5/gem5/issues/346
2023-09-27 14:31:19 -07:00
Bobby R. Bruce
4638434b97 arch-x86: make popx87 micro-op actually pop st(0) (#345)
The popx87 micro-op did not in fact pop the st(0) floating-point
register off the stack; it acted as a no-op. This patch fixes the bug by
passing the spm=1 argument to PopX87's superclass to indicate the
floating-point stack pointer should be incremented.

GitHub issue: https://github.com/gem5/gem5/issues/344
2023-09-27 14:31:00 -07:00
Jason Lowe-Power
010ac43369 arch-riscv: Make RISC-V decodeInst overridable (#350)
The change will allow developers to implement and decode their
non-standard instructions to the CPU models
2023-09-25 06:43:56 -07:00
Giacomo Travaglini
df60b0f5c9 arch-arm: Implement FEAT_FGT
Change-Id: I89391f17f353ab6ce555d65783977c1f30f64fc5
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-09-22 16:33:58 +01:00
Giacomo Travaglini
37b6824c4c arch-arm: Fix disassembly for NZCV read/writes
At the moment the instruction is disassembled as an integer
operation:

msrNZCV   x547, x0

Instead of

msr nzcv x0

Change-Id: I3f6576dccbe86db401c73747750ca3cfdf4055d5
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-09-22 16:33:58 +01:00
Roger Chang
d55f8f2716 arch: Enable customized decoder class name
Developers can make the own ISADesc action in the SConscript with
their decoder class name.

Change-Id: I011cf059642e178913e1f62df4e5c02401cc132e
2023-09-22 15:45:56 +08:00
Roger Chang
5b41112e03 arch-riscv: Make RISC-V decodeInst overridable
The change will allow developers to implement and decode their
non-standard instructions to the CPU models

Bug: 289467440
Test: None
Change-Id: I67f4abc71596f819c1265e325784f51c8e9bb359
2023-09-22 11:38:22 +08:00
Nicholas Mosier
7298ebd49b arch-x86: properly initialize the auxv platform string
The auxv platform string was not copied to the same location that was
pointed to by the value of AT_PLATFORM; instead, it was copied over
the auxv random buffer. This patch fixes this by copying the auxv
platform string to the right offset in the initial program stack.

GitHub issue: https://github.com/gem5/gem5/issues/346

Change-Id: Ied4b660d5fc444a94acb97b799be0a3722438b5e
2023-09-21 05:16:17 +00:00
Nicholas Mosier
5697bf26a8 arch-x86: make popx87 micro-op actually pop st(0)
The popx87 micro-op did not in fact pop the st(0) floating-point
register off the stack; it acted as a no-op. This patch fixes the bug
by passing the spm=1 argument to PopX87's superclass to indicate the
floating-point stack pointer should be incremented.

GitHub issue: https://github.com/gem5/gem5/issues/344

Change-Id: I6e731882b6bcf8f0e06ebd2f66f673bf9da80717
2023-09-21 04:29:05 +00:00
Bobby R. Bruce
958eda6961 arch-riscv: Fix inst flags for jal and jalr (#325)
The jal and jalr share the same instruction format JumpConstructor,
which sets the IsCall and IsReturn flags by the register ID. However, it
may cause wrong instruction flags set for jal because the section
"handle the 'Jalr' instruction" misses the opcode checking. The PR fix
the issue to ensure the IsReturn can be only set in Jalr.
2023-09-20 16:25:21 -07:00
Roger Chang
70c1d762c7 arch-riscv: Fix inst flags for jal and jalr
The jal and jalr share the same instruction format JumpConstructor,
which sets the IsCall and IsReturn flags by the register ID.
However, it may cause wrong instruction flags set for jal because
the section "handle the 'Jalr' instruction" misses the opcode
checking. The PR fix the issue to ensure the IsReturn can be only
set in Jalr.

Change-Id: I9ad867a389256f9253988552e6567d2b505a6901
2023-09-20 14:27:23 +08:00
Nicholas Mosier
741a901d8d arch-x86: fix negative overflow check bug in PACK micro-op
The implementation of the x86 PACK micro-op had a logical bug that
caused the `PACKSSWB` and `PACKSSDW` instructions to produce
incorrect results. Specifically, due to a signedness error, the
overflow check for negative integers being packed always evaluated
to true, resulting in all negative integers being packed as -1 in
the output.

This patch fixes the signedness error that causes the bug.

GitHub issue: https://github.com/gem5/gem5/issues/331

Change-Id: I44b7328a8ce31742a3c0dfaebd747f81751e8851
2023-09-20 05:09:32 +00:00
Nicholas Mosier
2178e26bf2 arch-x86: initialize and correct bitwidth for FPU tag word
The x87 FPU tag word (FTW) was not explicitly initialized in
{X86_64,i386}Process::initState(), resulting in holding an initial
value of zero, resulting in an invalid x87 FPU state. This commit
initializes FTW to 0xFFFF, indicating the FPU is empty at program
start during syscall emulation.

The 16-bit FTW register was also incorrectly masked down to 8-bits
in X86ISA::ISA::setMiscRegNoEffect(), leading to an invalid X87 FPU
state that later caused crashes in the X86KvmCPU. This commit
corrects the bitwidth of the mask to 16.

GitHub issue: https://github.com/gem5/gem5/issues/303

Change-Id: I97892d707998a87c1ff8546e08c15fede7eed66f
2023-09-12 15:39:29 +00:00
Bobby R. Bruce
d67a6603c1 cpu-kvm: properly set x86 xsave header on gem5->KVM transition (#298)
If the XSAVE KVM capability is available (KVM_CAP_XSAVE), the X86KvmCPU
will try to set the x87 FPU + SSE state using KVM_SET_XSAVE, which
expects a buffer (struct kvm_xsave) in XSAVE area format (Vol. 1, Sec.
13.4 of Intel x86 SDM). The original implementation of
`X86KvmCPU::updateKvmStateFPUXSave()`, however, improperly sets the
xsave header, which contains a bitmap of state components present in the
xsave area.

This patch defines `XSaveHeader` structure to model the xsave header,
which is expected directly following the legacy FPU region (defined in
the `FXSave` structure) in the xsave area. It then sets two bist in the
xsave header to indicate the presence of x86 FPU and SSE state
components.

GitHub issue: https://github.com/gem5/gem5/issues/296
2023-09-12 08:32:20 -07:00
Roger Chang
def89745bc arch-riscv: Allow Minor and O3 CPU execute RVV
Change-Id: I4780b42c25d349806254b5053fb0da3b6993ca2f
2023-09-12 13:56:22 +08:00
Roger Chang
0f54cb0593 arch-riscv: Remove check vconf done implementation
Change-Id: If633cef209390d0500c4c2c5741d56158ef26c00
2023-09-12 13:56:22 +08:00
Roger Chang
31b95987da arch-riscv: Change the instruction family to jump like
The method that get the vl, vtype from PCState in the next changes

Change-Id: I022b47b7a96572f6434eed30dd9f7caa79854c31
2023-09-12 13:56:22 +08:00
Roger Chang
282765234b arch-riscv: Implement the branchTarget for vset*vl*
Change-Id: I10bf6be736ce2b99323ace410bff1d8e1e2a4123
2023-09-12 13:56:22 +08:00
Roger Chang
a3aaad2ecd arch-riscv: Refactor the execution part of vset*vl*
Change-Id: Ie0d9671242481a85bb0fe5728748b16c3ef62592
2023-09-12 13:56:21 +08:00
Roger Chang
1bde42760f arch-riscv: Get vl, vtype and vlenb from PCState
Change-Id: I0ded57a3dc2db6fcc7121f147bcaf6d8a8873f6a
2023-09-12 13:56:21 +08:00
Roger Chang
8918302239 arch-riscv: Change the implementation of vset*vl*
The changes includes:

1. Add VL, Vtype and VlenbBits operands
2. Change R/W methods of VL, Vtype and VlenbBits from PCState

Change-Id: I0531ddc14344f2cca94d0e750a3b4291e0227d54
2023-09-12 13:56:21 +08:00
Roger Chang
7b5d8b4e5b arch-riscv: Add vlenb, vtype and vl in PCState
Change-Id: I7c2aed7dda34a1a449253671d7b86aa615c28464
2023-09-12 13:56:21 +08:00
Roger Chang
f94658098d arch-riscv: Remove checked_type in StaticInst Constructor
We should not try to check vtype when decoding the instruction.
It should be checked in vset{i}vl{i} since the register can be
modified via vset{i}vl{i}

Change-Id: I403e5c4579bc5b8e6af10f93eac20c14662e4d2d
2023-09-12 13:56:21 +08:00
Roger Chang
3f0475321a arch-riscv: Change VTYPE to BitUnion64
Change-Id: I7620ad1ef3ee0cc045bcd02b3c9a2d83f93bf3fe
2023-09-12 13:56:21 +08:00
Roger Chang
dfc725838e arch-riscv: Refactor PCState class
Change-Id: I1d25350ba2a3c7c366f42340c20b4488c33cde6f
2023-09-12 13:56:21 +08:00
Nicholas Mosier
2b9d558cef cpu-kvm: properly set x86 xsave header on gem5->KVM transition
If the XSAVE KVM capability is available (KVM_CAP_XSAVE), the X86KvmCPU
will try to set the x87 FPU + SSE state using KVM_SET_XSAVE, which
expects a buffer (struct kvm_xsave) in XSAVE area format (Vol. 1,
Sec. 13.4 of Intel x86 SDM). The original implementation of
`X86KvmCPU::updateKvmStateFPUXSave()`, however, improperly sets the
xsave header, which contains a bitmap of state components present
in the xsave area.

This patch defines `XSaveHeader` structure to model the xsave header,
which is expected directly following the legacy FPU region (defined in
the `FXSave` structure) in the xsave area. It then sets two bist in
the xsave header to indicate the presence of x86 FPU and SSE state
components.

GitHub issue: https://github.com/gem5/gem5/issues/296

Change-Id: I5c5c7925fa7f78a7b5e2adc209187deff53ac039
2023-09-10 15:16:50 +00:00
Hoa Nguyen
4ff1f160ec arch-x86: Fix wrong x86 assembly
The RM field of ModRM was printed as Reg field for several instructions.

For reference, this change fixes typos introduced by [1].

[1] https://gem5-review.googlesource.com/c/public/gem5/+/40339

Change-Id: I41eb58e6a70845c4ddd6774ccba81b8069888be5
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2023-09-01 00:26:51 +00:00
Matthew Poremba
60f071d09a gpu-compute,arch-vega: Implement flat scratch insts
Flat scratch instructions (aka private) are the 3rd and final segment of
flat instructions in gfx9 (Vega) and beyond. These are used for things
like spills/fills and thread local storage. This commit enables two
forms of flat scratch instructions: (1) flat_load/flat_store
instructions where the memory address resolves to private memory and (2)
the new scratch_load/scratch_store instructions in Vega. The first are
similar to older generation ISAs where the aperture is unknown until
address translation. The second are instructions guaranteed to go to
private memory.

Since these are very similar to flat global instructions there are
minimal changes needed:

- Ensure a flat instruction is either regular flat, global, XOR scratch
- Rename the global op_encoding methods to GlobalScratch to indicate
  they are for both and are intentionally used.
- Flat instructions in segment 1 output scratch_ in the disassembly
- Flat instruction executed as private use similar mem helpers as global
- Flat scratch cannot be an atomic

This was tested using a modified version of the 'square' application:

template <typename T>
__global__ void
scratch_square(T *C_d, T *A_d, size_t N)
{
    size_t offset = (blockIdx.x * blockDim.x + threadIdx.x);
    size_t stride = blockDim.x * gridDim.x ;

    volatile int foo; // Volatile ensures scratch / unoptimized code

    for (size_t i=offset; i<N; i+=stride) {
        foo = A_d[i];
        C_d[i] = foo * foo;
    }
}

Change-Id: Icc91a7f67836fa3e759fefe7c1c3f6851528ae7d
2023-08-26 13:40:12 -05:00
Matthew Poremba
90a518e885 gpu-compute,arch-vega: Fix ALU-only LDS counters
There are a few LDS instructions that perform local ALU operations and
writeback which are marked as loads. These are marked as loads because
they fit in the pipeline logic better, according to a several year old
comment. In the VEGA ISA these instructions (swizzle, permute, bpermute)
are not decrementing the LDS load counter. As a result, the counter will
gradually increase over time. Since wavefront slots are persistent, this
can cause applications with a few thousand kernels to eventually hang
thinking there are not enough resources.

This changeset fixes this by decrementing the LDS load counter for these
instructions. This fix was already integrated in the GCN3 ISA in the
exact same way. This changeset moves it near a similar comment about
scheduling register file writes.

Change-Id: Ife5237a2cae7213948c32ef266f4f8f22917351c
2023-08-23 19:30:24 -05:00
Jason Lowe-Power
22c52f4fba Fix reporting traps (faults) to GDB in SE mode (#166)
This addresses #123
2023-08-17 16:08:49 -07:00
Jan Vrany
3564348eec arch-riscv: Report traps to GDB in SE mode
This commit add code to report illegal instruction and breakpoint traps
to GDB (if connected). This merely follows what POWER does.
2023-08-17 15:55:04 +01:00
Jan Vrany
546b3eac7d arch-riscv: Do not advance PC when handling faults in SE mode
On RISC-V when trap occurs the contents of PC register contains the
address of instruction that caused that trap (as opposed to the address
of instruction following it in instruction stream). Therefore this commit
does not advance the PC before reporting trap in SE mode.

Change-Id: I83f3766cff276312cefcf1b4ac6e78a6569846b9
2023-08-17 15:55:04 +01:00
Jan Vrany
fde58a4365 arch-power: Fix reporting traps to GDB
Due to inverted logic in POWER fault handlers, unimplemented opcode and
trap faults did not report trap to GDB (if connected). This commit fixes
the problem.

While at it, I opted to use `if (! ...) { panic(...) }` rather than
`panic_if(...)`. I find it easier to understand in this case.

Change-Id: I6cd5dfd5f6546b8541d685e877afef21540d6824
2023-08-17 15:55:04 +01:00
Roger Chang
fe142f485a arch-riscv: Add missing vector required check for vmem instructions
The mem instructions usually executed from initiateAcc. We also need
to check vector required in those instructions

Change-Id: I97b4fec7fada432abb55ca58050615e12e00d1ca
2023-08-17 09:53:30 +08:00