Commit Graph

5608 Commits

Author SHA1 Message Date
zmckevitt
14c25a383c arch-riscv: Implemented zicbom/zicboz extensions for RISC V
Change-Id: I79d0e6059a2dbb5a0057c4f7489b999f9e803684
2023-08-04 10:05:15 +08:00
Harshil Patel
23f5535ef5 Merge branch 'develop' into riscv-fix-style 2023-08-03 13:32:53 -07:00
Harshil Patel
5cfac2cc94 stdlib: Fixed stype issue pcstate.hh
- Changed _rv_type to _rvType.
- Changed rv_type to rvType.

Change-Id: I27bdf342b038f5ebae78b104a29892684265584a
2023-08-03 13:04:17 -07:00
Harshil Patel
51d492487e stdlib: stlye fix rv_type to _rvType in isa.hh and isa.cc
Change-Id: I68e2b1be9150e6528693e68fb73470d158838885
2023-08-02 14:06:30 -07:00
Adrià Armejach
884d62b33a arch-riscv: Make vset*vl* instructions serialize
Current implementation of vset*vl* instructions serialize pipeline and
are non-speculative.

Change-Id: Ibf93b60133fb3340690b126db12827e36e2c202d
2023-08-02 14:46:36 +02:00
Jason Lowe-Power
98d68a7307 arch-riscv: Improve style
Minor style fixes in vector code

Change-Id: If0de45a2dbfb5d5aaa65ed3b5d91d9bee9bcc960
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2023-08-02 14:46:36 +02:00
Jason Lowe-Power
af1b2ec2d5 arch-riscv: Add fatal if RVV used with o3 or minor
Since the O3 and Minor CPU models do not support RVV right now as the
implementation stalls the decode until vsetvl instructions are exectued,
this change calls `fatal` if RVV is not explicitly enabled.

It is possible to override this if you explicitly enable RVV in the
config file.

Change-Id: Ia801911141bb2fb2bedcff3e139bf41ba8936085
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2023-08-02 14:46:36 +02:00
Xuan Hu
a9f9c4d6d3 arch-riscv: Add risc-v vector ext v1.0 arith insts support
TODOs:
  + vcompress.vm

Change-Id: I86eceae66e90380416fd3be2c10ad616512b5eba
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>
Co-authored-by: Jerin Joy <joy@rivosinc.com>

arch-riscv: Add LICENCE to template files

Change-Id: I825e72bffb84cce559d2e4c1fc2246c3b05a1243
2023-08-02 14:46:36 +02:00
Xuan Hu
91b1d50f59 arch-riscv: Add risc-v vector ext v1.0 mem insts support
* TODOs:
  + Vector Segment Load/Store
  + Vector Fault-only-first Load

Change-Id: I2815c76404e62babab7e9466e4ea33ea87e66e75
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>
Co-authored-by: Jerin Joy <joy@rivosinc.com>
2023-08-02 14:46:35 +02:00
Xuan Hu
e14e066fde arch-riscv: Add risc-v vector ext v1.0 vset insts support
Change-Id: I84363164ca327151101e8a1c3d8441a66338c909
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>

arch-riscv: Add a todo to fix vsetvl stall on decode

Change-Id: Iafb129648fba89009345f0c0ad3710f773379bf6
2023-08-02 14:46:35 +02:00
Xuan Hu
73892c9b47 arch-riscv: Add risc-v vector regs and configs
This commit add regs and configs for vector extension

* Add 32 vector arch regs as spec defined and 8 internal regs for
  uop-based vector implementation.
* Add default vector configs(VLEN = 256, ELEN = 64). These cannot
  be changed yet, since the vector implementation has only be tested
  with such configs.
* Add disassamble register name v0~v31 and vtmp0~vtmp7.
* Add CSR registers defined in RISCV Vector Spec v1.0.
* Add vector bitfields.
* Add vector operand_types and operands.

Change-Id: I7bbab1ee9e0aa804d6f15ef7b77fac22d4f7212a
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>
Co-authored-by: Jerin Joy <joy@rivosinc.com>

arch-riscv: enable rvv flags only for RV64

Change-Id: I6586e322dfd562b598f63a18964d17326c14d4cf
2023-08-02 14:46:35 +02:00
Matthew Poremba
3589a4c11f arch-vega: Implement translate further
Starting with ROCm 5.4+, MI100 and MI200 make use of the translate
further bit in the page table. This bit enables mixing 4kiB and 2MiB
pages and is functionally equivalent to mixing page sizes using the
PDE.P bit for which gem5 currently has support.

With PDE.P bit set, we stop walking and the page size is equal to the
level in the page table we stopped at. For example, stopping at level
2 would be a 1GiB page, stopping at level 3 would be a 2MiB page.
This assumes most pages are 4kiB.

When the F bit is used, it is assumed most pages are 2MiB and we will
stop walking at the 3rd level of the page table unless the F bit is set.
When the F bit is set, the 2nd level PDE contains a block fragment size
representing the page size of the next PDE in the form of 2^(12+size).
If the next page has the F bit set we continue walking to the 4th level.
The block fragment size is hardcoded to 9 in the driver therefore we
assert that the block fragment size must be 0 or 9.

This enables MI200 with ROCm 5.4+ in gem5. This functionality was
determine by examining the driver source code in Linux and there is no
public documentation about this feature or why the change is made in or
around ROCm 5.4.

Change-Id: I603c0208cd9e821f7ad6eeb1d94ae15eaa146fb9
2023-07-30 13:17:05 -05:00
Matthew Poremba
618b2a60de arch-vega, dev-amdgpu: Fix for memory leaks (#129)
When using the new operator, delete should be called
on any allocated memory after it's use is complete.

Change-Id: Id5fcfb264b6ddc252c0a9dcafc2d3b020f7b5019
2023-07-30 10:48:17 -07:00
Matthew Poremba
b35c2ba8c5 arch-vega: Fix vop2Helper scalar support (#142)
A previous change added a vop2Helper to remove 100s of lines of common
code from VOP2 instructions related to processing SDWA and DPP support.
That change inadvertently changed the type of operand source 0 from
const to non-const. The vector container operator[] does not allow
reading a scalar value such as a constant, a dword literal, etc. The
error shows up in the form of: assert(!scalar) in operand.hh.

Since the SDWA and DPP cases need to modify the source vector and
non-SDWA/DPP cases might require const, we make a non-const copy of the
const source 0 vector and place it in a temporary non-const vector. This
non-const vector is passed to the lambda function implementation of the
instruction. This prevents needing a const and non-const version of the
lambda and avoids needing to propagate the template parameters through
the various SDWA/DPP helper methods which seems like it will not work
anyways as they need to modify the vector.

As a result of this, as more VOP2 instructions are implemented using
this helper, they will need to specify the const and non-const template
parameters of the vector container needed for the instruction.

Change-Id: Ia0b3c550d7de32b830040007a110f4821e3385aa
2023-07-30 10:47:36 -07:00
Ranganath (Bujji) Selagamsetty
ede4d89a83 arch-vega, dev-amdgpu: Fix for memory leaks
When using the new operator, delete should be called
on any allocated memory after it's use is complete.

Change-Id: Id5fcfb264b6ddc252c0a9dcafc2d3b020f7b5019
2023-07-28 19:14:46 -05:00
Matthew Poremba
c722b0c73d arch-vega: Fix vop2Helper scalar support
A previous change added a vop2Helper to remove 100s of lines of common
code from VOP2 instructions related to processing SDWA and DPP support.
That change inadvertently changed the type of operand source 0 from
const to non-const. The vector container operator[] does not allow
reading a scalar value such as a constant, a dword literal, etc. The
error shows up in the form of: assert(!scalar) in operand.hh.

Since the SDWA and DPP cases need to modify the source vector and
non-SDWA/DPP cases might require const, we make a non-const copy of the
const source 0 vector and place it in a tempoary non-const vector. This
non-const vector is passed to the lambda function implementation of the
instruction. This prevents needing a const and non-const version of the
lambda and avoids needing to propagate the template parameters through
the various SDWA/DPP helper methods which seems like it will not work
anyways as they need to modify the vector.

As a result of this, as more VOP2 instructions are implemented using
this helper,they will need to specify the const and non-const template
parameters of the vector container needed for the instruction.

Change-Id: Ia0b3c550d7de32b830040007a110f4821e3385aa
2023-07-28 13:47:55 -05:00
Matthew Poremba
7c3c2b05f3 arch-x86: Add extended state CPUID function
The extended state CPUID function is used to set the values of the XCR0
register as well as specify the size of storage for context switching
storage for x87 and AVX+. This function is iterative and therefore
requires (1) marking it as such in the hsaSignificantIndex function (2)
setting multiple sets of 4-tuples for the default CPUID values where the
last 4-tuple ends with all zeros.

Change-Id: Ib6a43925afb1cae75f61d8acff52a3cc26ce17c8
2023-07-28 11:34:04 -05:00
Matthew Poremba
3584c3126c arch-x86: Expose CR4.osxsave bit
Related to the recent changes with moving CPUID values to python, this
value is needed to enable AVX and needs a way to be exposed to python as
well in order to set the bit and the corresponding CPUID values at the
same time.

Change-Id: I3cadb0fe61ff4ebf6de903018a8d8a411bfdb4e0
2023-07-28 11:34:04 -05:00
Matthew Poremba
3946f7ba2c arch-x86: Support CPUID functions with indexes
Various CPUID functions will return different values depending on the
value of ECX when executing the CPUID instruction. Add support for this
in the X86 KVM CPU. A subsequent patch will add a CPUID function which
requires iterating through multiple ECX values.

Change-Id: Ib44a52be52ea632d5e2cee3fb2ca390b60a7202a
2023-07-28 11:34:04 -05:00
Matthew Poremba
63d98018ea arch-x86: Move CPUID values to python
CPUID values for X86 are currently hard-coded in the C++ source file.
This makes it difficult to configure the bits if needed. Move these to
python instead. This will provide a few benefits:

1. We can enable features for certain configurations, for example AVX
can be enabled when the KVM CPU is used, but otherwise should not be
enabled as gem5 does not have full AVX support.
2. We can more accurately communicate things like cache/TLB sizes based
on the actual gem5 configuration. The CPUID values are can be used by
some libraries, e.g., MPI, to query system topology.
3. Enabling some bits breaks things in certain configurations and this
can be prevented by configuring in python. For example, enabling AVX
seems to currently be breaking SMP, meaning gem5 can only boot one CPU
in that configuration.

Change-Id: Ib3866f39c86d61374b9451e60b119a3155575884
2023-07-28 11:34:04 -05:00
Giacomo Travaglini
7dba30209a arch-arm: Hook TLBIOS instructions to the TlbiShareable obj
FEAT_TLBIOS has been introduced by a recent patch [1] which
was however missing to include the outer shareable case in the
Msr disambiguation switch. Which meant the TLBIOS instructions
were decoded as normal MSR instructions, with no effect whatsoever
on the TLBs

[1]: https://gem5-review.googlesource.com/c/public/gem5/+/70567

Change-Id: I41665a4634fbe0ee8cc30dbc5d88d63103082ae9
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-07-24 09:05:01 +01:00
rogerchang23424
5d2edca1e3 arch-riscv: Set default check alignment True (#98)
Raise misaligned trap if effective address if not aligned by default

Change-Id: I634aa7ddbf5282fc583316fc77ab1e37bfe415e3
2023-07-19 11:14:27 -07:00
rogerchang23424
52d9259396 arch-riscv: Fix clearLoadReservation merge (#81)
The previous change
(https://gem5-review.googlesource.com/c/public/gem5/+/71818) makes
the clearLoadReservation be RISC-V only.

Change-Id: I5df1a7fa688489d57fff8da937e3c8addfe4c299
2023-07-14 08:48:43 -07:00
Giacomo Travaglini
18470b4747 arch-arm: Fix assert fail when UQRSHL shiftAmt==0 (#75)
When shiftAmt is 0 for a UQRSHL instruction, the code called bits() with
incorrect arguments. This fixes a left-shift of 0 to be a NOP/mov, as
required.

Change-Id: Ic86ca40ac42bfb767a09e8c65a53cec56382a008

Co-authored-by: Marton Erdos <marton.erdos@arm.com>
2023-07-13 10:57:51 -07:00
Bobby R. Bruce
753933d471 gpu-compute, tests: Fix GPU_X86 compilation, add compiler tests (#64)
* gpu-compute: Remove use of 'std::random_shuffle'

This was deprecated in C++14 and removed in C++17. This has been
replaced with std::random. This has been implemented to ensure
reproducible results despite (pseudo)random behavior.

Change-Id: Idd52bc997547c7f8c1be88f6130adff8a37b4116

* dev-amdgpu: Add missing 'overrides'

This causes warnings/errors in some compilers.

Change-Id: I36a3548943c030d2578c2f581c8985c12eaeb0ae

* dev: Fix Linux specific includes to be portable

This allows for compilation in non-linux systems (e.g., Mac OS).

Change-Id: Ib6c9406baf42db8caaad335ebc670c1905584ea2

* tests: Add 'VEGA_X86' build target to compiler-tests.sh

Change-Id: Icbf1d60a096b1791a4718a7edf17466f854b6ae5

* tests: Add 'GCN3_X86' build target to compiler-tests.sh

Change-Id: Ie7c9c20bb090f8688e48c8619667312196a7c123
2023-07-11 14:35:03 -07:00
Yang Liu
35763bdfb2 arch: Add setRegOperand in VecRegOperand
VecRegOperand also need setRegOperand method to write back execution
result.

Change-Id: Ie50606014827c14a7219558dd003eb4747231649
Co-authored-by: Xuan Hu <huxuan@bosc.ac.cn>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67292
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-07-10 22:59:12 +00:00
Bobby R. Bruce
63bdde4f63 arch-riscv: Remove clearLoadReservation
This was added due to a bad merge from stable to develop.

Change-Id: I7adf9604ee4d6f1cf11c404af5e8e1c071461a4a
2023-07-10 15:28:41 -07:00
Bobby R. Bruce
54501c3e2b misc: Merge branch 'stable' into 'develop'
This ensures all commits in v23.0 are now in the develop branch.

Change-Id: I791346115dd123f3541a3c8060482e00cf4dbfb5
2023-07-10 12:24:27 -07:00
Adrià Armejach
fe7b18c2d7 arch-riscv: Make virtual method RISC-V private
* Prior commit defined a shared virtual method that is only used in
  RISC-V. This patch makes the method only visible to the RISC-V ISA.

Change-Id: Ie31e1e1e5933d7c3b9f5af0c20822d3a6a382eee
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71818
Reviewed-by: Roger Chang <rogerycchang@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-06-30 07:39:40 +00:00
Matthew Poremba
841e6fe978 arch-vega: Add Vega D16 decodings and fix V_SWAP_B32
Vega adds multiple new D16 instructions which load a byte or short into
the lower or upper 16 bits of a register for packed math. The decoder
table has subDecode tables for FLAT instructions which represents 32
opcodes in each subDecode table. The subDecode table for opcodes 32-63
is missing so it is added here.

The opcode for V_SWAP_B32 is also off by one- In the ISA manual this
instruction is opcode 81, the instruction before is 79, and there is no
opcode 80, so the decoder entry is swapped with the invalid decoding
below it.

Change-Id: I278fea574ea684ccc6302d5b4d0f5dd8813a88ad
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71899
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-06-29 19:56:56 +00:00
Adrià Armejach
d54b8f8475 arch-riscv: fix load reserved store conditional
* According to the manual, load reservations must be cleared on a
    failed or a successful SC attempt.
  * A load reservation can be arbitrarily large. The current
    implementation was reserving something different than cacheBlockSize
    which could lead to problems if snoop addresses are cache block
    aligned. This patch implementation assumes a cacheBlock granularity.
  * Load reservations should also be cleared on faults

Change-Id: I64513534710b5f269260fcb204f717801913e2f5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71520
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
2023-06-16 16:44:49 +00:00
Adrià Armejach
7398e1e401 arch-riscv: fix load reserved store conditional
* According to the manual, load reservations must be cleared on a
    failed or a successful SC attempt.
  * A load reservation can be arbitrarily large. The current
    implementation was reserving something different than cacheBlockSize
    which could lead to problems if snoop addresses are cache block
    aligned. This patch implementation assumes a cacheBlock granularity.
  * Load reservations should also be cleared on faults

Change-Id: I64513534710b5f269260fcb204f717801913e2f5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71558
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Roger Chang <rogerycchang@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-06-16 06:49:25 +00:00
Matthew Poremba
db903f4fd4 arch-vega: Helper methods for SDWA/DPP for VOP2
Many of the outstanding issues with the GPU model are related to
instructions not having SDWA/DPP implementations and executing by
ignoring the special registers leading to incorrect executiong.
Adding SDWA/DPP is current very cumbersome as there is a lot of
boilerplate code.

This changeset adds helper methods for VOP2 with one instruction
changed as an example. This review is intended to get feedback
before applying this change to all VOP2 instructions that support
SDWA/DPP.

Change-Id: I1edbc3f3bb166d34f151545aa9f47a94150e1406
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70738
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-06-15 23:02:39 +00:00
Roger Chang
328aaa626f arch-riscv: Fix unexpected behavior of float operations in Mac OS
The uint_fast16_t is the integer at least 16 bits size, it can be
32, 64 bits and more. Usually most of the simulations are in the
x86-64 linux host, the size of uint_fast16_t is 64 bits. Therefore,
there is no problem for double precision float operations and it can
pass FloatMM test. However, in the Mac OS, the size of uint_fast16_t
is 16 bits, it will lose the upper bits when converting float
register bits to freg_t and it will generate unexpected results for
FloatMM test.

The change can guarantee that the size of data in freg_t is at least
64 bits and it will not lose any data from floating point to freg_t.

Reference:
https://developer.apple.com/documentation/kernel/uint_fast16_t

https://codebrowser.dev/glibc/glibc/stdlib/stdint.h.html

Change-Id: I3df6610f0903cdee0f56584d6cbdb51ac26c86c8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71519
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
2023-06-15 20:10:04 +00:00
Yu-hsin Wang
694673f1d7 arch: set multiline re as default in isa_parser
In python3.11, it requires the global specifier should be the first
token of regex. However it's not possible when using ply library.
Instead, we set the rules are multiline regex by default and modifies
those single line rules.

Ref: https://github.com/dabeaz/ply/issues/282

Change-Id: I7bdbfeb97a9dd74f45c1890a76f8cc16100e5a42
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71019
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
2023-06-15 10:04:00 +00:00
Yu-hsin Wang
23a88d0400 fastmodel: only support single line literal when paring project file
In python3.11, it requires the global specifier should be the first
token of regex. However it's not possible when using ply library. In
fastmodel case, we actually don't need to support multiline string
literal. We fix this issue by just making the string literal single
line.

Ref: https://github.com/dabeaz/ply/issues/282

Change-Id: I746b628db7ad4c1d7834f1a1b2c1243cef68aa01
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71018
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
2023-06-15 10:03:47 +00:00
Roger Chang
9a27322c7b arch-riscv: Fix unexpected behavior of float operations in Mac OS
The uint_fast16_t is the integer at least 16 bits size, it can be
32, 64 bits and more. Usually most of the simulations are in the
x86-64 linux host, the size of uint_fast16_t is 64 bits. Therefore,
there is no problem for double precision float operations and it can
pass FloatMM test. However, in the Mac OS, the size of uint_fast16_t
is 16 bits, it will lose the upper bits when converting float
register bits to freg_t and it will generate unexpected results for
FloatMM test.

The change can guarantee that the size of data in freg_t is at least
64 bits and it will not lose any data from floating point to freg_t.

Reference:
https://developer.apple.com/documentation/kernel/uint_fast16_t

https://codebrowser.dev/glibc/glibc/stdlib/stdint.h.html

Change-Id: I3df6610f0903cdee0f56584d6cbdb51ac26c86c8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71578
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-06-13 23:20:47 +00:00
Roger Chang
0fef2300c0 arch-riscv: Refactor fmax and fmin instructions
Currently fmax and fmin instructions convert source float registers such as
Fs1_bits to float64_t(or float32_t and float16_t) many times in the single
instruction. It is not efficient for the future maintenance of these
instructions.

The change adds non-register float_t intermediate variables fs1 and fs2 to
keep converted results so that we don’t need to do it repeatedly. It also
added an intermediate variable fd for specific float type to assume the upper
bits of the packed float register are all one.

Change-Id: Ic508d5255db6c4b38ca4df6dd805df440c043fff
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71479
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-06-13 00:09:34 +00:00
Giacomo Travaglini
4434d48973 arch-arm: Apply FEAT_IDST to missing ID registers
When FEAT_IDST got implemented [1], we forgot to add the
logic for AArch64 ID registers tracking AArch32 state/capabilities

[1]: https://gem5-review.googlesource.com/c/public/gem5/+/70723

Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Change-Id: I19bddf67ecc379a14f91cfede385692536982101
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71178
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
2023-06-07 07:38:08 +00:00
Roger Chang
5e5e81d1c5 arch-riscv: Check FPU status for c.flwsp c.fldsp c.fswsp c.fsdsp
The change adds the missing FPU checking for these instructions.

Change-Id: I7f2ef89786af0d528f2029f1097cfeac6c7d65f2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71198
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
2023-06-03 00:58:25 +00:00
Yu-hsin Wang
bd1d72f61e fastmodel: add src include path by default
We have some customized protocols in gem5 repository and they require
the include path from src directory. It causes the users of those
protocols need to handle the include path correctly by theirselve. This
is tedious and unstable. We should add the default include path in
SIMGEN command line to prevent issues.

Change-Id: I2a3748646567635d131a8fb4099e02e332691e97
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71118
Reviewed-by: Wei-Han Chen <weihanchen@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
2023-05-31 23:47:30 +00:00
Giacomo Travaglini
5095e29c8e arch-arm: Implement FEAT_HCX
This is just making the HCRX_EL2 register read/writable;
trapping behaviour will be implemented with further extensions

Change-Id: Id1ec42a754b7d999782edde3a8ec6c6099e3331e
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70939
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-05-25 21:36:58 +00:00
Giacomo Travaglini
0fae6e8163 arch-arm: Implement FEAT_EVT
This extension is optional in Armv8.2 but mandatory since Armv8.5
We only implement this for AArch64

Change-Id: I063642ac24d27f0a81ba79b1d38f72468bb130eb
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70938
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
2023-05-25 21:36:58 +00:00
Richard Cooper
9de1443ebb arch-arm: Add support for Armv8.2-I8MM NEON extension.
Add support for the Armv8.2-I8MM NEON extension. This provides the
SUDOT and USDOT mixed-sign SIMD Dot Product instructions, as well as
the SMMLA, UMMLA, and USMMLA SIMD Matrix Multiply-Accumulate
instructions.

For more information please refer to the Arm Architecture Reference
Manual (https://developer.arm.com/documentation/ddi0487/latest/).

Additional Contributors: Giacomo Travaglini

Change-Id: I6fb9318f67cc9d2737079283e1a095630c4d2ad9
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70737
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-05-25 21:36:39 +00:00
Richard Cooper
eb4f83b178 arch-arm: Add support for Armv8.2-DotProd NEON extension.
Add support for the Armv8.2-DotProd NEON extension. This provides the
SDOT and UDOT SIMD Dot Product instructions.

For more information please refer to the Arm Architecture Reference
Manual (https://developer.arm.com/documentation/ddi0487/latest/).

Change-Id: I4caa3b97a74c65f32421487c55c3e36427194e61
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70736
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-05-25 21:36:39 +00:00
Richard Cooper
fab3d8a1c1 arch-arm: Fix too long lines in existing Arm NEON instructons.
These lines break the current gem5 coding guidelines.

Change-Id: I587fcb2d75c4ab9de47fa53b4ae96526a20afe3f
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70735
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-05-25 21:36:39 +00:00
Richard Cooper
d02ea0dfbb arch-arm, cpu, configs: Add new Op Classes for Matrix Multiply insts
Add SimdMatMultAcc and SimdFloatMatMultAcc Op Classes for the SVE
Matrix Multiply Accumulate instructions in the SVE F32MM, F64MM and
I8MM extensions.

Initial latencies have been set to be the same as SimdMultAcc and
SimdFloatMultAcc respectively.

Change-Id: Ifab63a0efbb0ccfbd272245e0b0b055279f66e3a
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70734
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2023-05-25 21:36:39 +00:00
Richard Cooper
560df49c28 arch-arm: Declare support for Armv8.2-I8MM.
Sets the appropriate bit in the ID_AA64ZFR0_EL1 sysreg that declares
support for ARMv8.2-I8MM.

This indicates that all pre-requisites for Armv8.2 SVE Int8 matrix
multiplication instructions have been met.

SMMLA, SUDOT, UMMLA, USMMLA, and USDOT instructions are implemented.

For more information please refer to the "ARM Architecture Reference
Manual Supplement - The Scalable Vector Extension (SVE), for ARMv8-A"
(https://developer.arm.com/architectures/cpu-architecture/a-profile/
docs/arm-architecture-reference-manual-supplement-armv8-a)

Additional Contributors: Giacomo Travaglini

Change-Id: Id97e1c5de8c23a25336a6b323034e9eca8e598e4
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70733
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
2023-05-25 21:36:39 +00:00
Richard Cooper
f8b60b7a1d arch-arm: Added Armv8.2-I8MM SVE mixed-sign dot product instrs.
Add support for the SVE mixed sign dot product instructions (USDOT,
SUDOT) required by the Armv8.2 SVE Int8 matrix multiplication
extension (ARMv8.2-I8MM).

For more information please refer to the "ARM Architecture Reference
Manual Supplement - The Scalable Vector Extension (SVE), for ARMv8-A"
(https://developer.arm.com/architectures/cpu-architecture/a-profile/
docs/arm-architecture-reference-manual-supplement-armv8-a)

Change-Id: I83841654cee74b940f967b3a37b99d87c01bd92c
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70732
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-05-25 21:36:39 +00:00
Richard Cooper
9421a46d71 arch-arm: Re-factor Arm decoder for SVE mixed-sign DOT insts.
Re-factored the Arm instruction decoder to add placeholders for the
SVE Integer mixed-sign DOT product instructions. This has involved
moving some existing decode helper functions.

Change-Id: I42b280d4bd1b4ab9d8c633bdc523bd08c281d218
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70731
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-05-25 21:36:39 +00:00