We pay a small duplication cost but we make the code
more readable and we enable further modifications to the
AArch64 code without forcing the same code on the AArch32
method
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Change-Id: I1efa33cf19f91094fd33bd48b6a0a57d8df8f89f
Exposed in our failing compiler tests:
https://github.com/gem5/gem5/actions/runs/6348223508, this PR:
* Adds missing overrides to `PCState`'s `set` function.
* Removes `std::binary_function` from DramPower (it was deprecated in
CPP-11 and officially removed in CPP-17).
This is still trying to completely remove any artifact
which implies virtualization is only supported in
non-secure mode (NS=1)
Change-Id: I83fed1c33cc745ecdf3c5ad60f4f356f3c58aad5
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
This info can be used during TLB invalidation
Change-Id: I81247e40b11745f0207178b52c47845ca1b92870
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Modified the x86 KVM-in-SE syscall handler to flush the TLB following
each syscall, in case the page table has been modified. This is done
by reloading the value in %cr3. Doing this requires an intermediate
GPR, which we store in a new scratch buffer following the syscall code
at address `syscallDataBuf`.
GitHub issue: https://github.com/gem5/gem5/issues/409
Change-Id: Ibc20018c97ebb1794fa31a0c71e0857d661c7c9d
Added a new feature to CHI protocol (in collaboration with @tiagormk).
Here is the Jira Ticket
[https://gem5.atlassian.net/browse/GEM5-1326](https://gem5.atlassian.net/browse/GEM5-1326
). As described in CHI specs, far atomic transactions enable remote
execution of Atomic Memory Operations. This pull request incorporates
several changes:
* Fix Arm ISA definition of Swap instructions. These instructions should
return an operand, so their ISA definition should be Return Operation.
* Enable AMOs in Ruby Mem Test to verify that AMOs work
* Enable near and far AMO in the Cache Controler of CHI
Three configuration parameters have been used to tune this behavior:
* policy_type: sets the atomic policy to one of the described in [our
paper](https://dl.acm.org/doi/10.1145/3579371.3589065)
* atomic_op_latency: simulates the AMO ALU operation latency
* comp_anr: configures the Atomic No return transaction to split
CompDBIDResp into two different messages DBIDResp and Comp
As highlighed in this failing compiler test:
https://github.com/gem5/gem5/actions/runs/6348223508/job/17389057995
Clang was failing when compiling "build/ALL/gem5.opt" due missing
overrides in `PCState`'s "set" function.
This was observed in Clang-14 and, stangely, Clang-8.
Change-Id: I240c1087e8875fd07630e467e7452c62a5d14d5b
Swap instructions are configured as non returning AMO operations. This is wrong because they
return the previous value stored in the target memory position
Change-Id: I84d75a571a8eaeaee0dbfac344f7b34c72b47d53
Add the instruction size of a static instruction. x86 and arm decoders
add now the instruction size to the macro instruction. However, microops
are still handled by the fetch stage which is not nice.
Furthermore, we add a set method to the PC state. It allows setting a PC
state to acertain address.
Both methods are required for the decoupled front-end.
Change-Id: I311fe3f637e867c42dee7781f5373ea2e69e2072
Adds the instruction size to all static instruction. x86, arm
and RISC-V decoders add the instruction size to every decoded
macro instruction. As microops should reflect the size of the
their parent macroop the set method is overwritten to pass the
size to all microops.
Furthermore, we add a set method to the PC state. It allows
setting a PC state to a certain address.
Both methods are required for the decoupled front-end.
Change-Id: I311fe3f637e867c42dee7781f5373ea2e69e2072
Signed-off-by: David Schall <david.schall@ed.ac.uk>
This problem is similar to the problem described in [1].
This problem produces symptoms as described in [2].
In short, the Linux kernel relies on the CSR_STATUS's FS bits
to decide whether to save the floating point registers. If
the FS bits are set to DIRTY, the floating point registers will
be saved during context switching / task switching.
Currently, with the patch in [1], we only change the FS bits
upon every floating arithmetic instruction. However, since
floating load instructions also mutate the state of floating
point registers, the FS bits should be updated to DIRTY.
The problem in [2] arose when the program populates the content
of one floating register to an array by repeatedly using
`fld fa5, EA`. A context switch occured upon a page fault, and
while handling that page fault, the kernel might have to handle
an interrupt. This caused the kernel to task switch between
handling page fault and handling interrupt. This caused
__switch_to() to be called, which will save the floating point
registers only if the SD (indirectly set by FS) bits are set to
DIRTY, while restoring the floating point registers to the
switch-to task [3]. This caused the floating point registers to
be zeroed out when it was restored as it was never saved before.
[1] https://gem5-review.googlesource.com/c/public/gem5/+/65272
[2] https://github.com/gem5/gem5/issues/349
[3] https://github.com/torvalds/linux/blob/v6.5/arch/riscv/include/asm/switch_to.h#L56
Change-Id: Ia5656da5a589a8e29fb699d2ee12885b8f3fa2d2
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
The auxv platform string was not copied to the same location that was
pointed to by the value of AT_PLATFORM; instead, it was copied over the
auxv random buffer. This patch fixes this by copying the auxv platform
string to the right offset in the initial program stack.
GitHub issue: https://github.com/gem5/gem5/issues/346
The popx87 micro-op did not in fact pop the st(0) floating-point
register off the stack; it acted as a no-op. This patch fixes the bug by
passing the spm=1 argument to PopX87's superclass to indicate the
floating-point stack pointer should be incremented.
GitHub issue: https://github.com/gem5/gem5/issues/344
At the moment the instruction is disassembled as an integer
operation:
msrNZCV x547, x0
Instead of
msr nzcv x0
Change-Id: I3f6576dccbe86db401c73747750ca3cfdf4055d5
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The change will allow developers to implement and decode their
non-standard instructions to the CPU models
Bug: 289467440
Test: None
Change-Id: I67f4abc71596f819c1265e325784f51c8e9bb359
The auxv platform string was not copied to the same location that was
pointed to by the value of AT_PLATFORM; instead, it was copied over
the auxv random buffer. This patch fixes this by copying the auxv
platform string to the right offset in the initial program stack.
GitHub issue: https://github.com/gem5/gem5/issues/346
Change-Id: Ied4b660d5fc444a94acb97b799be0a3722438b5e
The popx87 micro-op did not in fact pop the st(0) floating-point
register off the stack; it acted as a no-op. This patch fixes the bug
by passing the spm=1 argument to PopX87's superclass to indicate the
floating-point stack pointer should be incremented.
GitHub issue: https://github.com/gem5/gem5/issues/344
Change-Id: I6e731882b6bcf8f0e06ebd2f66f673bf9da80717
The jal and jalr share the same instruction format JumpConstructor,
which sets the IsCall and IsReturn flags by the register ID. However, it
may cause wrong instruction flags set for jal because the section
"handle the 'Jalr' instruction" misses the opcode checking. The PR fix
the issue to ensure the IsReturn can be only set in Jalr.
The jal and jalr share the same instruction format JumpConstructor,
which sets the IsCall and IsReturn flags by the register ID.
However, it may cause wrong instruction flags set for jal because
the section "handle the 'Jalr' instruction" misses the opcode
checking. The PR fix the issue to ensure the IsReturn can be only
set in Jalr.
Change-Id: I9ad867a389256f9253988552e6567d2b505a6901
The implementation of the x86 PACK micro-op had a logical bug that
caused the `PACKSSWB` and `PACKSSDW` instructions to produce
incorrect results. Specifically, due to a signedness error, the
overflow check for negative integers being packed always evaluated
to true, resulting in all negative integers being packed as -1 in
the output.
This patch fixes the signedness error that causes the bug.
GitHub issue: https://github.com/gem5/gem5/issues/331
Change-Id: I44b7328a8ce31742a3c0dfaebd747f81751e8851
The x87 FPU tag word (FTW) was not explicitly initialized in
{X86_64,i386}Process::initState(), resulting in holding an initial
value of zero, resulting in an invalid x87 FPU state. This commit
initializes FTW to 0xFFFF, indicating the FPU is empty at program
start during syscall emulation.
The 16-bit FTW register was also incorrectly masked down to 8-bits
in X86ISA::ISA::setMiscRegNoEffect(), leading to an invalid X87 FPU
state that later caused crashes in the X86KvmCPU. This commit
corrects the bitwidth of the mask to 16.
GitHub issue: https://github.com/gem5/gem5/issues/303
Change-Id: I97892d707998a87c1ff8546e08c15fede7eed66f
If the XSAVE KVM capability is available (KVM_CAP_XSAVE), the X86KvmCPU
will try to set the x87 FPU + SSE state using KVM_SET_XSAVE, which
expects a buffer (struct kvm_xsave) in XSAVE area format (Vol. 1, Sec.
13.4 of Intel x86 SDM). The original implementation of
`X86KvmCPU::updateKvmStateFPUXSave()`, however, improperly sets the
xsave header, which contains a bitmap of state components present in the
xsave area.
This patch defines `XSaveHeader` structure to model the xsave header,
which is expected directly following the legacy FPU region (defined in
the `FXSave` structure) in the xsave area. It then sets two bist in the
xsave header to indicate the presence of x86 FPU and SSE state
components.
GitHub issue: https://github.com/gem5/gem5/issues/296
The changes includes:
1. Add VL, Vtype and VlenbBits operands
2. Change R/W methods of VL, Vtype and VlenbBits from PCState
Change-Id: I0531ddc14344f2cca94d0e750a3b4291e0227d54
We should not try to check vtype when decoding the instruction.
It should be checked in vset{i}vl{i} since the register can be
modified via vset{i}vl{i}
Change-Id: I403e5c4579bc5b8e6af10f93eac20c14662e4d2d
If the XSAVE KVM capability is available (KVM_CAP_XSAVE), the X86KvmCPU
will try to set the x87 FPU + SSE state using KVM_SET_XSAVE, which
expects a buffer (struct kvm_xsave) in XSAVE area format (Vol. 1,
Sec. 13.4 of Intel x86 SDM). The original implementation of
`X86KvmCPU::updateKvmStateFPUXSave()`, however, improperly sets the
xsave header, which contains a bitmap of state components present
in the xsave area.
This patch defines `XSaveHeader` structure to model the xsave header,
which is expected directly following the legacy FPU region (defined in
the `FXSave` structure) in the xsave area. It then sets two bist in
the xsave header to indicate the presence of x86 FPU and SSE state
components.
GitHub issue: https://github.com/gem5/gem5/issues/296
Change-Id: I5c5c7925fa7f78a7b5e2adc209187deff53ac039
The RM field of ModRM was printed as Reg field for several instructions.
For reference, this change fixes typos introduced by [1].
[1] https://gem5-review.googlesource.com/c/public/gem5/+/40339
Change-Id: I41eb58e6a70845c4ddd6774ccba81b8069888be5
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
Flat scratch instructions (aka private) are the 3rd and final segment of
flat instructions in gfx9 (Vega) and beyond. These are used for things
like spills/fills and thread local storage. This commit enables two
forms of flat scratch instructions: (1) flat_load/flat_store
instructions where the memory address resolves to private memory and (2)
the new scratch_load/scratch_store instructions in Vega. The first are
similar to older generation ISAs where the aperture is unknown until
address translation. The second are instructions guaranteed to go to
private memory.
Since these are very similar to flat global instructions there are
minimal changes needed:
- Ensure a flat instruction is either regular flat, global, XOR scratch
- Rename the global op_encoding methods to GlobalScratch to indicate
they are for both and are intentionally used.
- Flat instructions in segment 1 output scratch_ in the disassembly
- Flat instruction executed as private use similar mem helpers as global
- Flat scratch cannot be an atomic
This was tested using a modified version of the 'square' application:
template <typename T>
__global__ void
scratch_square(T *C_d, T *A_d, size_t N)
{
size_t offset = (blockIdx.x * blockDim.x + threadIdx.x);
size_t stride = blockDim.x * gridDim.x ;
volatile int foo; // Volatile ensures scratch / unoptimized code
for (size_t i=offset; i<N; i+=stride) {
foo = A_d[i];
C_d[i] = foo * foo;
}
}
Change-Id: Icc91a7f67836fa3e759fefe7c1c3f6851528ae7d
There are a few LDS instructions that perform local ALU operations and
writeback which are marked as loads. These are marked as loads because
they fit in the pipeline logic better, according to a several year old
comment. In the VEGA ISA these instructions (swizzle, permute, bpermute)
are not decrementing the LDS load counter. As a result, the counter will
gradually increase over time. Since wavefront slots are persistent, this
can cause applications with a few thousand kernels to eventually hang
thinking there are not enough resources.
This changeset fixes this by decrementing the LDS load counter for these
instructions. This fix was already integrated in the GCN3 ISA in the
exact same way. This changeset moves it near a similar comment about
scheduling register file writes.
Change-Id: Ife5237a2cae7213948c32ef266f4f8f22917351c
On RISC-V when trap occurs the contents of PC register contains the
address of instruction that caused that trap (as opposed to the address
of instruction following it in instruction stream). Therefore this commit
does not advance the PC before reporting trap in SE mode.
Change-Id: I83f3766cff276312cefcf1b4ac6e78a6569846b9
Due to inverted logic in POWER fault handlers, unimplemented opcode and
trap faults did not report trap to GDB (if connected). This commit fixes
the problem.
While at it, I opted to use `if (! ...) { panic(...) }` rather than
`panic_if(...)`. I find it easier to understand in this case.
Change-Id: I6cd5dfd5f6546b8541d685e877afef21540d6824
The mem instructions usually executed from initiateAcc. We also need
to check vector required in those instructions
Change-Id: I97b4fec7fada432abb55ca58050615e12e00d1ca