Commit Graph

5643 Commits

Author SHA1 Message Date
Giacomo Travaglini
df60b0f5c9 arch-arm: Implement FEAT_FGT
Change-Id: I89391f17f353ab6ce555d65783977c1f30f64fc5
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-09-22 16:33:58 +01:00
Giacomo Travaglini
37b6824c4c arch-arm: Fix disassembly for NZCV read/writes
At the moment the instruction is disassembled as an integer
operation:

msrNZCV   x547, x0

Instead of

msr nzcv x0

Change-Id: I3f6576dccbe86db401c73747750ca3cfdf4055d5
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-09-22 16:33:58 +01:00
Bobby R. Bruce
958eda6961 arch-riscv: Fix inst flags for jal and jalr (#325)
The jal and jalr share the same instruction format JumpConstructor,
which sets the IsCall and IsReturn flags by the register ID. However, it
may cause wrong instruction flags set for jal because the section
"handle the 'Jalr' instruction" misses the opcode checking. The PR fix
the issue to ensure the IsReturn can be only set in Jalr.
2023-09-20 16:25:21 -07:00
Roger Chang
70c1d762c7 arch-riscv: Fix inst flags for jal and jalr
The jal and jalr share the same instruction format JumpConstructor,
which sets the IsCall and IsReturn flags by the register ID.
However, it may cause wrong instruction flags set for jal because
the section "handle the 'Jalr' instruction" misses the opcode
checking. The PR fix the issue to ensure the IsReturn can be only
set in Jalr.

Change-Id: I9ad867a389256f9253988552e6567d2b505a6901
2023-09-20 14:27:23 +08:00
Nicholas Mosier
741a901d8d arch-x86: fix negative overflow check bug in PACK micro-op
The implementation of the x86 PACK micro-op had a logical bug that
caused the `PACKSSWB` and `PACKSSDW` instructions to produce
incorrect results. Specifically, due to a signedness error, the
overflow check for negative integers being packed always evaluated
to true, resulting in all negative integers being packed as -1 in
the output.

This patch fixes the signedness error that causes the bug.

GitHub issue: https://github.com/gem5/gem5/issues/331

Change-Id: I44b7328a8ce31742a3c0dfaebd747f81751e8851
2023-09-20 05:09:32 +00:00
Nicholas Mosier
2178e26bf2 arch-x86: initialize and correct bitwidth for FPU tag word
The x87 FPU tag word (FTW) was not explicitly initialized in
{X86_64,i386}Process::initState(), resulting in holding an initial
value of zero, resulting in an invalid x87 FPU state. This commit
initializes FTW to 0xFFFF, indicating the FPU is empty at program
start during syscall emulation.

The 16-bit FTW register was also incorrectly masked down to 8-bits
in X86ISA::ISA::setMiscRegNoEffect(), leading to an invalid X87 FPU
state that later caused crashes in the X86KvmCPU. This commit
corrects the bitwidth of the mask to 16.

GitHub issue: https://github.com/gem5/gem5/issues/303

Change-Id: I97892d707998a87c1ff8546e08c15fede7eed66f
2023-09-12 15:39:29 +00:00
Bobby R. Bruce
d67a6603c1 cpu-kvm: properly set x86 xsave header on gem5->KVM transition (#298)
If the XSAVE KVM capability is available (KVM_CAP_XSAVE), the X86KvmCPU
will try to set the x87 FPU + SSE state using KVM_SET_XSAVE, which
expects a buffer (struct kvm_xsave) in XSAVE area format (Vol. 1, Sec.
13.4 of Intel x86 SDM). The original implementation of
`X86KvmCPU::updateKvmStateFPUXSave()`, however, improperly sets the
xsave header, which contains a bitmap of state components present in the
xsave area.

This patch defines `XSaveHeader` structure to model the xsave header,
which is expected directly following the legacy FPU region (defined in
the `FXSave` structure) in the xsave area. It then sets two bist in the
xsave header to indicate the presence of x86 FPU and SSE state
components.

GitHub issue: https://github.com/gem5/gem5/issues/296
2023-09-12 08:32:20 -07:00
Roger Chang
def89745bc arch-riscv: Allow Minor and O3 CPU execute RVV
Change-Id: I4780b42c25d349806254b5053fb0da3b6993ca2f
2023-09-12 13:56:22 +08:00
Roger Chang
0f54cb0593 arch-riscv: Remove check vconf done implementation
Change-Id: If633cef209390d0500c4c2c5741d56158ef26c00
2023-09-12 13:56:22 +08:00
Roger Chang
31b95987da arch-riscv: Change the instruction family to jump like
The method that get the vl, vtype from PCState in the next changes

Change-Id: I022b47b7a96572f6434eed30dd9f7caa79854c31
2023-09-12 13:56:22 +08:00
Roger Chang
282765234b arch-riscv: Implement the branchTarget for vset*vl*
Change-Id: I10bf6be736ce2b99323ace410bff1d8e1e2a4123
2023-09-12 13:56:22 +08:00
Roger Chang
a3aaad2ecd arch-riscv: Refactor the execution part of vset*vl*
Change-Id: Ie0d9671242481a85bb0fe5728748b16c3ef62592
2023-09-12 13:56:21 +08:00
Roger Chang
1bde42760f arch-riscv: Get vl, vtype and vlenb from PCState
Change-Id: I0ded57a3dc2db6fcc7121f147bcaf6d8a8873f6a
2023-09-12 13:56:21 +08:00
Roger Chang
8918302239 arch-riscv: Change the implementation of vset*vl*
The changes includes:

1. Add VL, Vtype and VlenbBits operands
2. Change R/W methods of VL, Vtype and VlenbBits from PCState

Change-Id: I0531ddc14344f2cca94d0e750a3b4291e0227d54
2023-09-12 13:56:21 +08:00
Roger Chang
7b5d8b4e5b arch-riscv: Add vlenb, vtype and vl in PCState
Change-Id: I7c2aed7dda34a1a449253671d7b86aa615c28464
2023-09-12 13:56:21 +08:00
Roger Chang
f94658098d arch-riscv: Remove checked_type in StaticInst Constructor
We should not try to check vtype when decoding the instruction.
It should be checked in vset{i}vl{i} since the register can be
modified via vset{i}vl{i}

Change-Id: I403e5c4579bc5b8e6af10f93eac20c14662e4d2d
2023-09-12 13:56:21 +08:00
Roger Chang
3f0475321a arch-riscv: Change VTYPE to BitUnion64
Change-Id: I7620ad1ef3ee0cc045bcd02b3c9a2d83f93bf3fe
2023-09-12 13:56:21 +08:00
Roger Chang
dfc725838e arch-riscv: Refactor PCState class
Change-Id: I1d25350ba2a3c7c366f42340c20b4488c33cde6f
2023-09-12 13:56:21 +08:00
Nicholas Mosier
2b9d558cef cpu-kvm: properly set x86 xsave header on gem5->KVM transition
If the XSAVE KVM capability is available (KVM_CAP_XSAVE), the X86KvmCPU
will try to set the x87 FPU + SSE state using KVM_SET_XSAVE, which
expects a buffer (struct kvm_xsave) in XSAVE area format (Vol. 1,
Sec. 13.4 of Intel x86 SDM). The original implementation of
`X86KvmCPU::updateKvmStateFPUXSave()`, however, improperly sets the
xsave header, which contains a bitmap of state components present
in the xsave area.

This patch defines `XSaveHeader` structure to model the xsave header,
which is expected directly following the legacy FPU region (defined in
the `FXSave` structure) in the xsave area. It then sets two bist in
the xsave header to indicate the presence of x86 FPU and SSE state
components.

GitHub issue: https://github.com/gem5/gem5/issues/296

Change-Id: I5c5c7925fa7f78a7b5e2adc209187deff53ac039
2023-09-10 15:16:50 +00:00
Hoa Nguyen
4ff1f160ec arch-x86: Fix wrong x86 assembly
The RM field of ModRM was printed as Reg field for several instructions.

For reference, this change fixes typos introduced by [1].

[1] https://gem5-review.googlesource.com/c/public/gem5/+/40339

Change-Id: I41eb58e6a70845c4ddd6774ccba81b8069888be5
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
2023-09-01 00:26:51 +00:00
Matthew Poremba
60f071d09a gpu-compute,arch-vega: Implement flat scratch insts
Flat scratch instructions (aka private) are the 3rd and final segment of
flat instructions in gfx9 (Vega) and beyond. These are used for things
like spills/fills and thread local storage. This commit enables two
forms of flat scratch instructions: (1) flat_load/flat_store
instructions where the memory address resolves to private memory and (2)
the new scratch_load/scratch_store instructions in Vega. The first are
similar to older generation ISAs where the aperture is unknown until
address translation. The second are instructions guaranteed to go to
private memory.

Since these are very similar to flat global instructions there are
minimal changes needed:

- Ensure a flat instruction is either regular flat, global, XOR scratch
- Rename the global op_encoding methods to GlobalScratch to indicate
  they are for both and are intentionally used.
- Flat instructions in segment 1 output scratch_ in the disassembly
- Flat instruction executed as private use similar mem helpers as global
- Flat scratch cannot be an atomic

This was tested using a modified version of the 'square' application:

template <typename T>
__global__ void
scratch_square(T *C_d, T *A_d, size_t N)
{
    size_t offset = (blockIdx.x * blockDim.x + threadIdx.x);
    size_t stride = blockDim.x * gridDim.x ;

    volatile int foo; // Volatile ensures scratch / unoptimized code

    for (size_t i=offset; i<N; i+=stride) {
        foo = A_d[i];
        C_d[i] = foo * foo;
    }
}

Change-Id: Icc91a7f67836fa3e759fefe7c1c3f6851528ae7d
2023-08-26 13:40:12 -05:00
Matthew Poremba
90a518e885 gpu-compute,arch-vega: Fix ALU-only LDS counters
There are a few LDS instructions that perform local ALU operations and
writeback which are marked as loads. These are marked as loads because
they fit in the pipeline logic better, according to a several year old
comment. In the VEGA ISA these instructions (swizzle, permute, bpermute)
are not decrementing the LDS load counter. As a result, the counter will
gradually increase over time. Since wavefront slots are persistent, this
can cause applications with a few thousand kernels to eventually hang
thinking there are not enough resources.

This changeset fixes this by decrementing the LDS load counter for these
instructions. This fix was already integrated in the GCN3 ISA in the
exact same way. This changeset moves it near a similar comment about
scheduling register file writes.

Change-Id: Ife5237a2cae7213948c32ef266f4f8f22917351c
2023-08-23 19:30:24 -05:00
Jason Lowe-Power
22c52f4fba Fix reporting traps (faults) to GDB in SE mode (#166)
This addresses #123
2023-08-17 16:08:49 -07:00
Jan Vrany
3564348eec arch-riscv: Report traps to GDB in SE mode
This commit add code to report illegal instruction and breakpoint traps
to GDB (if connected). This merely follows what POWER does.
2023-08-17 15:55:04 +01:00
Jan Vrany
546b3eac7d arch-riscv: Do not advance PC when handling faults in SE mode
On RISC-V when trap occurs the contents of PC register contains the
address of instruction that caused that trap (as opposed to the address
of instruction following it in instruction stream). Therefore this commit
does not advance the PC before reporting trap in SE mode.

Change-Id: I83f3766cff276312cefcf1b4ac6e78a6569846b9
2023-08-17 15:55:04 +01:00
Jan Vrany
fde58a4365 arch-power: Fix reporting traps to GDB
Due to inverted logic in POWER fault handlers, unimplemented opcode and
trap faults did not report trap to GDB (if connected). This commit fixes
the problem.

While at it, I opted to use `if (! ...) { panic(...) }` rather than
`panic_if(...)`. I find it easier to understand in this case.

Change-Id: I6cd5dfd5f6546b8541d685e877afef21540d6824
2023-08-17 15:55:04 +01:00
Roger Chang
fe142f485a arch-riscv: Add missing vector required check for vmem instructions
The mem instructions usually executed from initiateAcc. We also need
to check vector required in those instructions

Change-Id: I97b4fec7fada432abb55ca58050615e12e00d1ca
2023-08-17 09:53:30 +08:00
Roger Chang
35a6fe6f3d arhc-riscv: Check vill in vector mem instructions
Any vector instructions using vtype should check vill flag is set

Change-Id: Ia9a2695f3005a176422da78e6f413cc789116faa
2023-08-17 09:53:30 +08:00
Bobby R. Bruce
3ff6fe0e90 arch-x86,cpu-kvm: Fix gem5.fast due to unused variable (#189)
Detected via this failing workload:
https://github.com/gem5/gem5/actions/runs/5861958237

Ir caused the following compilation error to be thrown:

```
src/arch/x86/kvm/x86_cpu.cc:1462:22: error: unused variable ‘rv’ [-Werror=unused-variable]
 1462 |                 bool rv = isa->cpuid->doCpuid(tc, function, idx, cpuid);
      |                      ^~
```

`rv` is unused in the .fast compilation as it's only used in the
`assert` statement immediately after.

To fix this, the `[[maybe_unused]]` annotation is used.
2023-08-16 12:52:44 -07:00
Bobby R. Bruce
c835c9faa3 arch-x86,cpu-kvm: Fix gem5.fast due to unused variable
Detected via this failing workload:
https://github.com/gem5/gem5/actions/runs/5861958237

It caused the following compilation error to be thrown:

```
src/arch/x86/kvm/x86_cpu.cc:1462:22: error: unused variable ‘rv’ [-Werror=unused-variable]
 1462 |                 bool rv = isa->cpuid->doCpuid(tc, function, idx, cpuid);
      |                      ^~
```

`rv` is unused in the .fast compilation as it's only used in the
`assert` statement immediately after.

 To fix this, the `[[maybe_unused]]` annotation is used

Change-Id: Ib98dd859c62f171c8eeefae93502f92a8f133776
2023-08-16 10:06:39 -07:00
Nicolas Boichat
3ea7a792b0 fastmodel: Add option to retry licence server connection.
We're seeing some occasional connection timeouts in CI, possibly
when we aggressively hit the license server, so let's add a
parameter to retry the connection a few times.

Also, print the time required to connect to the server to help
debug issues.

Change-Id: I804af28f79f893fcdca615d7bf82dd9b8686a74c
2023-08-15 10:47:32 +00:00
Roger Chang
42c2ed6c2d arch-riscv: Add condition for setting misa and mstatus CSR
Change-Id: I7e03b60d0de32fe8169dd79ded485d560aca64aa
2023-08-09 19:32:04 +08:00
Roger Chang
43adc5309a arch-riscv: Add Illegal Instruction Fault Condition for RVV Config
Check the status.vs and misa.rvv CSR registers before executing
RVV instructions

Change-Id: I0355b94ea8ee4018be11a75aab8c19b10cb36126
2023-08-09 19:31:58 +08:00
Roger Chang
85549842c7 arch-riscv: Add Illegal Instruction Fault Condition for Mem RVV
Check the status.vs and misa.rvv CSR registers before executing
RVV instructions

Change-Id: If1f6a440713612b9a044de4f320997e99722c06c
2023-08-09 19:22:32 +08:00
Roger Chang
c18e43a0ab arch-riscv: Add Illegal Instruction Fault Condition for Arith RVV
Check the status.vs and misa.rvv CSR registers before executing
RVV instructions

Change-Id: Idc143e1ba90320254926de9fa7a7b343bb96ba88
2023-08-09 19:20:53 +08:00
zmckevitt
14c25a383c arch-riscv: Implemented zicbom/zicboz extensions for RISC V
Change-Id: I79d0e6059a2dbb5a0057c4f7489b999f9e803684
2023-08-04 10:05:15 +08:00
Harshil Patel
23f5535ef5 Merge branch 'develop' into riscv-fix-style 2023-08-03 13:32:53 -07:00
Harshil Patel
5cfac2cc94 stdlib: Fixed stype issue pcstate.hh
- Changed _rv_type to _rvType.
- Changed rv_type to rvType.

Change-Id: I27bdf342b038f5ebae78b104a29892684265584a
2023-08-03 13:04:17 -07:00
Harshil Patel
51d492487e stdlib: stlye fix rv_type to _rvType in isa.hh and isa.cc
Change-Id: I68e2b1be9150e6528693e68fb73470d158838885
2023-08-02 14:06:30 -07:00
Adrià Armejach
884d62b33a arch-riscv: Make vset*vl* instructions serialize
Current implementation of vset*vl* instructions serialize pipeline and
are non-speculative.

Change-Id: Ibf93b60133fb3340690b126db12827e36e2c202d
2023-08-02 14:46:36 +02:00
Jason Lowe-Power
98d68a7307 arch-riscv: Improve style
Minor style fixes in vector code

Change-Id: If0de45a2dbfb5d5aaa65ed3b5d91d9bee9bcc960
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2023-08-02 14:46:36 +02:00
Jason Lowe-Power
af1b2ec2d5 arch-riscv: Add fatal if RVV used with o3 or minor
Since the O3 and Minor CPU models do not support RVV right now as the
implementation stalls the decode until vsetvl instructions are exectued,
this change calls `fatal` if RVV is not explicitly enabled.

It is possible to override this if you explicitly enable RVV in the
config file.

Change-Id: Ia801911141bb2fb2bedcff3e139bf41ba8936085
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2023-08-02 14:46:36 +02:00
Xuan Hu
a9f9c4d6d3 arch-riscv: Add risc-v vector ext v1.0 arith insts support
TODOs:
  + vcompress.vm

Change-Id: I86eceae66e90380416fd3be2c10ad616512b5eba
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>
Co-authored-by: Jerin Joy <joy@rivosinc.com>

arch-riscv: Add LICENCE to template files

Change-Id: I825e72bffb84cce559d2e4c1fc2246c3b05a1243
2023-08-02 14:46:36 +02:00
Xuan Hu
91b1d50f59 arch-riscv: Add risc-v vector ext v1.0 mem insts support
* TODOs:
  + Vector Segment Load/Store
  + Vector Fault-only-first Load

Change-Id: I2815c76404e62babab7e9466e4ea33ea87e66e75
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>
Co-authored-by: Jerin Joy <joy@rivosinc.com>
2023-08-02 14:46:35 +02:00
Xuan Hu
e14e066fde arch-riscv: Add risc-v vector ext v1.0 vset insts support
Change-Id: I84363164ca327151101e8a1c3d8441a66338c909
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>

arch-riscv: Add a todo to fix vsetvl stall on decode

Change-Id: Iafb129648fba89009345f0c0ad3710f773379bf6
2023-08-02 14:46:35 +02:00
Xuan Hu
73892c9b47 arch-riscv: Add risc-v vector regs and configs
This commit add regs and configs for vector extension

* Add 32 vector arch regs as spec defined and 8 internal regs for
  uop-based vector implementation.
* Add default vector configs(VLEN = 256, ELEN = 64). These cannot
  be changed yet, since the vector implementation has only be tested
  with such configs.
* Add disassamble register name v0~v31 and vtmp0~vtmp7.
* Add CSR registers defined in RISCV Vector Spec v1.0.
* Add vector bitfields.
* Add vector operand_types and operands.

Change-Id: I7bbab1ee9e0aa804d6f15ef7b77fac22d4f7212a
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>
Co-authored-by: Jerin Joy <joy@rivosinc.com>

arch-riscv: enable rvv flags only for RV64

Change-Id: I6586e322dfd562b598f63a18964d17326c14d4cf
2023-08-02 14:46:35 +02:00
Matthew Poremba
3589a4c11f arch-vega: Implement translate further
Starting with ROCm 5.4+, MI100 and MI200 make use of the translate
further bit in the page table. This bit enables mixing 4kiB and 2MiB
pages and is functionally equivalent to mixing page sizes using the
PDE.P bit for which gem5 currently has support.

With PDE.P bit set, we stop walking and the page size is equal to the
level in the page table we stopped at. For example, stopping at level
2 would be a 1GiB page, stopping at level 3 would be a 2MiB page.
This assumes most pages are 4kiB.

When the F bit is used, it is assumed most pages are 2MiB and we will
stop walking at the 3rd level of the page table unless the F bit is set.
When the F bit is set, the 2nd level PDE contains a block fragment size
representing the page size of the next PDE in the form of 2^(12+size).
If the next page has the F bit set we continue walking to the 4th level.
The block fragment size is hardcoded to 9 in the driver therefore we
assert that the block fragment size must be 0 or 9.

This enables MI200 with ROCm 5.4+ in gem5. This functionality was
determine by examining the driver source code in Linux and there is no
public documentation about this feature or why the change is made in or
around ROCm 5.4.

Change-Id: I603c0208cd9e821f7ad6eeb1d94ae15eaa146fb9
2023-07-30 13:17:05 -05:00
Matthew Poremba
618b2a60de arch-vega, dev-amdgpu: Fix for memory leaks (#129)
When using the new operator, delete should be called
on any allocated memory after it's use is complete.

Change-Id: Id5fcfb264b6ddc252c0a9dcafc2d3b020f7b5019
2023-07-30 10:48:17 -07:00
Matthew Poremba
b35c2ba8c5 arch-vega: Fix vop2Helper scalar support (#142)
A previous change added a vop2Helper to remove 100s of lines of common
code from VOP2 instructions related to processing SDWA and DPP support.
That change inadvertently changed the type of operand source 0 from
const to non-const. The vector container operator[] does not allow
reading a scalar value such as a constant, a dword literal, etc. The
error shows up in the form of: assert(!scalar) in operand.hh.

Since the SDWA and DPP cases need to modify the source vector and
non-SDWA/DPP cases might require const, we make a non-const copy of the
const source 0 vector and place it in a temporary non-const vector. This
non-const vector is passed to the lambda function implementation of the
instruction. This prevents needing a const and non-const version of the
lambda and avoids needing to propagate the template parameters through
the various SDWA/DPP helper methods which seems like it will not work
anyways as they need to modify the vector.

As a result of this, as more VOP2 instructions are implemented using
this helper, they will need to specify the const and non-const template
parameters of the vector container needed for the instruction.

Change-Id: Ia0b3c550d7de32b830040007a110f4821e3385aa
2023-07-30 10:47:36 -07:00
Ranganath (Bujji) Selagamsetty
ede4d89a83 arch-vega, dev-amdgpu: Fix for memory leaks
When using the new operator, delete should be called
on any allocated memory after it's use is complete.

Change-Id: Id5fcfb264b6ddc252c0a9dcafc2d3b020f7b5019
2023-07-28 19:14:46 -05:00