The O3 and Minor CPU models do not support RVV right now because the
implementation stalls decode until vsetvl instructions are executed, so
this change calls `fatal` if RVV is not explicitly enabled.
It is possible to override this if you explicitly enable RVV in the
config file.
Change-Id: Ia801911141bb2fb2bedcff3e139bf41ba8936085
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
TODOs:
+ vcompress.vm
Change-Id: I86eceae66e90380416fd3be2c10ad616512b5eba
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>
Co-authored-by: Jerin Joy <joy@rivosinc.com>
arch-riscv: Add LICENCE to template files
Change-Id: I825e72bffb84cce559d2e4c1fc2246c3b05a1243
* TODOs:
+ Vector Segment Load/Store
+ Vector Fault-only-first Load
Change-Id: I2815c76404e62babab7e9466e4ea33ea87e66e75
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>
Co-authored-by: Jerin Joy <joy@rivosinc.com>
Change-Id: I84363164ca327151101e8a1c3d8441a66338c909
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>
arch-riscv: Add a todo to fix vsetvl stall on decode
Change-Id: Iafb129648fba89009345f0c0ad3710f773379bf6
This commit adds registers and configs for the vector extension.
* Add 32 vector arch regs as spec defined and 8 internal regs for
uop-based vector implementation.
* Add default vector configs (VLEN = 256, ELEN = 64). These cannot
be changed yet, since the vector implementation has only been tested
with such configs.
* Add disassembly register names v0~v31 and vtmp0~vtmp7.
* Add CSR registers defined in RISCV Vector Spec v1.0.
* Add vector bitfields.
* Add vector operand_types and operands.
Change-Id: I7bbab1ee9e0aa804d6f15ef7b77fac22d4f7212a
Co-authored-by: Yang Liu <numbksco@gmail.com>
Co-authored-by: Fan Yang <1209202421@qq.com>
Co-authored-by: Jerin Joy <joy@rivosinc.com>
arch-riscv: enable rvv flags only for RV64
Change-Id: I6586e322dfd562b598f63a18964d17326c14d4cf
Starting with ROCm 5.4+, MI100 and MI200 make use of the translate
further bit in the page table. This bit enables mixing 4KiB and 2MiB
pages and is functionally equivalent to mixing page sizes using the
PDE.P bit for which gem5 currently has support.
With PDE.P bit set, we stop walking and the page size is equal to the
level in the page table we stopped at. For example, stopping at level
2 would be a 1GiB page, stopping at level 3 would be a 2MiB page.
This assumes most pages are 4KiB.
When the F bit is used, it is assumed most pages are 2MiB and we will
stop walking at the 3rd level of the page table unless the F bit is set.
When the F bit is set, the 2nd level PDE contains a block fragment size
representing the page size of the next PDE in the form of 2^(12+size).
If the next page has the F bit set we continue walking to the 4th level.
The block fragment size is hardcoded to 9 in the driver therefore we
assert that the block fragment size must be 0 or 9.
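The page-size arithmetic described above can be sketched as follows. This is
a minimal illustration, not the gem5 page table walker: the function names
are hypothetical, and the 4KiB size for level 4 is an assumption inferred
from the statement that a full walk yields 4KiB pages.

```python
PAGE_SHIFT = 12  # base page is 2^12 = 4KiB


def pde_p_page_size(level):
    """Page size when the walk stops at `level` because PDE.P is set.

    Levels 2 and 3 come from the description above; level 4 (4KiB) is
    an assumption for a walk that runs to the last level.
    """
    sizes = {2: 1 << 30, 3: 1 << 21, 4: 1 << 12}
    return sizes[level]


def fragment_page_size(block_fragment_size):
    """Page size encoded by a block fragment size, i.e. 2^(12 + size)."""
    # The driver hardcodes 9, so only 0 and 9 are expected here.
    assert block_fragment_size in (0, 9)
    return 1 << (PAGE_SHIFT + block_fragment_size)


print(fragment_page_size(0))  # 4096 bytes (4KiB)
print(fragment_page_size(9))  # 2097152 bytes (2MiB)
```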
This enables MI200 with ROCm 5.4+ in gem5. This functionality was
determined by examining the driver source code in Linux; there is no
public documentation about this feature or why the change was made in
or around ROCm 5.4.
Change-Id: I603c0208cd9e821f7ad6eeb1d94ae15eaa146fb9
When using the new operator, delete should be called
on any allocated memory after its use is complete.
Change-Id: Id5fcfb264b6ddc252c0a9dcafc2d3b020f7b5019
A previous change added a vop2Helper to remove 100s of lines of common
code from VOP2 instructions related to processing SDWA and DPP support.
That change inadvertently changed the type of operand source 0 from
const to non-const. The vector container operator[] does not allow
reading a scalar value such as a constant, a dword literal, etc. The
error shows up in the form of: assert(!scalar) in operand.hh.
Since the SDWA and DPP cases need to modify the source vector and
non-SDWA/DPP cases might require const, we make a non-const copy of the
const source 0 vector and place it in a temporary non-const vector. This
non-const vector is passed to the lambda function implementation of the
instruction. This prevents needing a const and non-const version of the
lambda and avoids needing to propagate the template parameters through
the various SDWA/DPP helper methods which seems like it will not work
anyways as they need to modify the vector.
As a result of this, as more VOP2 instructions are implemented using
this helper, they will need to specify the const and non-const template
parameters of the vector container needed for the instruction.
Change-Id: Ia0b3c550d7de32b830040007a110f4821e3385aa
The extended state CPUID function is used to set the values of the XCR0
register as well as to specify the size of the context-switch storage
for x87 and AVX+. This function is iterative and therefore requires
(1) marking it as such in the hasSignificantIndex function and (2)
setting multiple sets of 4-tuples for the default CPUID values, where
the last 4-tuple ends with all zeros.
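One way to picture an iterative CPUID function is a table that maps a
function number to a list of (EAX, EBX, ECX, EDX) 4-tuples, one per ECX
index. The sketch below is illustrative only: the table layout, the helper
names, and every register value are placeholders rather than gem5's actual
data, and reading "ends with all zeros" as an all-zero terminating tuple is
an assumption. Leaf 0xD is the x86 extended state enumeration leaf.

```python
# Hypothetical table: function -> list of (EAX, EBX, ECX, EDX) tuples.
# All register values below are placeholders, not real CPUID data.
CPUID_TABLE = {
    0x0D: [
        (0x7, 0x340, 0x340, 0x0),    # index 0
        (0x1, 0x0, 0x0, 0x0),        # index 1
        (0x100, 0x240, 0x0, 0x0),    # index 2
        (0x0, 0x0, 0x0, 0x0),        # terminator: all zeros
    ],
    0x01: [(0x600F20, 0x0, 0x0, 0x0)],
}

# Functions whose result depends on the ECX index (hypothetical set).
INDEXED_FUNCTIONS = {0x0D}


def has_significant_index(function):
    """Return True if ECX selects among several results for `function`."""
    return function in INDEXED_FUNCTIONS


def cpuid(function, index=0):
    """Look up the 4-tuple for a function and ECX index."""
    entries = CPUID_TABLE.get(function)
    if entries is None:
        return (0, 0, 0, 0)
    if not has_significant_index(function):
        return entries[0]  # ECX is ignored for non-iterative functions
    if index < len(entries):
        return entries[index]
    return (0, 0, 0, 0)  # past the terminator: report nothing


print(cpuid(0x0D, 2))
print(cpuid(0x0D, 3))
```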
Change-Id: Ib6a43925afb1cae75f61d8acff52a3cc26ce17c8
Related to the recent changes with moving CPUID values to python, this
value is needed to enable AVX and needs a way to be exposed to python as
well in order to set the bit and the corresponding CPUID values at the
same time.
Change-Id: I3cadb0fe61ff4ebf6de903018a8d8a411bfdb4e0
Various CPUID functions will return different values depending on the
value of ECX when executing the CPUID instruction. Add support for this
in the X86 KVM CPU. A subsequent patch will add a CPUID function which
requires iterating through multiple ECX values.
Change-Id: Ib44a52be52ea632d5e2cee3fb2ca390b60a7202a
CPUID values for X86 are currently hard-coded in the C++ source file.
This makes it difficult to configure the bits if needed. Move these to
python instead. This will provide a few benefits:
1. We can enable features for certain configurations, for example AVX
can be enabled when the KVM CPU is used, but otherwise should not be
enabled as gem5 does not have full AVX support.
2. We can more accurately communicate things like cache/TLB sizes based
on the actual gem5 configuration. The CPUID values can be used by
some libraries, e.g., MPI, to query system topology.
3. Enabling some bits breaks things in certain configurations and this
can be prevented by configuring in python. For example, enabling AVX
seems to currently be breaking SMP, meaning gem5 can only boot one CPU
in that configuration.
Change-Id: Ib3866f39c86d61374b9451e60b119a3155575884
FEAT_TLBIOS was introduced by a recent patch [1], which however
did not include the outer shareable case in the Msr disambiguation
switch. As a result, the TLBIOS instructions were decoded as normal
MSR instructions, with no effect whatsoever on the TLBs.
[1]: https://gem5-review.googlesource.com/c/public/gem5/+/70567
Change-Id: I41665a4634fbe0ee8cc30dbc5d88d63103082ae9
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
When shiftAmt is 0 for a UQRSHL instruction, the code called bits() with
incorrect arguments. This fixes a left-shift of 0 to be a NOP/mov, as
required.
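The intended behavior can be modeled with a small reference function. This
is a sketch of one UQRSHL lane under my reading of the Arm semantics
(unsigned, saturating, rounding on right shifts); it is not the gem5 ISA
description code, and the function name and return convention are
hypothetical.

```python
def uqrshl(value, shift, bits=8):
    """Model one lane of an unsigned saturating rounding shift left.

    Returns (result, saturated). A negative shift is a rounding right
    shift; a shift of 0 must leave the value unchanged.
    """
    mask = (1 << bits) - 1
    value &= mask
    if shift == 0:
        return value, False  # NOP/mov case
    if shift > 0:
        result = value << shift
        if result > mask:
            return mask, True  # saturate to the unsigned maximum
        return result, False
    rshift = -shift
    # Add half of the discarded range before shifting to round.
    rounded = (value + (1 << (rshift - 1))) >> rshift
    return rounded, False


print(uqrshl(0x81, 0))   # value passes through unchanged
print(uqrshl(0x81, 1))   # overflows 8 bits, so it saturates
print(uqrshl(5, -1))     # rounding right shift
```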
Change-Id: Ic86ca40ac42bfb767a09e8c65a53cec56382a008
Co-authored-by: Marton Erdos <marton.erdos@arm.com>
* gpu-compute: Remove use of 'std::random_shuffle'
This was deprecated in C++14 and removed in C++17. This has been
replaced with std::random. This has been implemented to ensure
reproducible results despite (pseudo)random behavior.
Change-Id: Idd52bc997547c7f8c1be88f6130adff8a37b4116
* dev-amdgpu: Add missing 'overrides'
The missing specifiers cause warnings/errors in some compilers.
Change-Id: I36a3548943c030d2578c2f581c8985c12eaeb0ae
* dev: Fix Linux specific includes to be portable
This allows for compilation on non-Linux systems (e.g., Mac OS).
Change-Id: Ib6c9406baf42db8caaad335ebc670c1905584ea2
* tests: Add 'VEGA_X86' build target to compiler-tests.sh
Change-Id: Icbf1d60a096b1791a4718a7edf17466f854b6ae5
* tests: Add 'GCN3_X86' build target to compiler-tests.sh
Change-Id: Ie7c9c20bb090f8688e48c8619667312196a7c123
Vega adds multiple new D16 instructions which load a byte or short into
the lower or upper 16 bits of a register for packed math. The decoder
table has subDecode tables for FLAT instructions, each of which covers
32 opcodes. The subDecode table for opcodes 32-63 was missing, so it is
added here.
The opcode for V_SWAP_B32 is also off by one. In the ISA manual this
instruction is opcode 81, the instruction before it is 79, and there is
no opcode 80, so the decoder entry is swapped with the invalid decoding
below it.
Change-Id: I278fea574ea684ccc6302d5b4d0f5dd8813a88ad
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71899
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
* According to the manual, load reservations must be cleared on a
failed or a successful SC attempt.
* A load reservation can be arbitrarily large. The current
implementation was reserving something different than cacheBlockSize,
which could lead to problems if snoop addresses are cache block
aligned. This patch assumes a cacheBlock granularity.
* Load reservations should also be cleared on faults.
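The three rules above can be summarized in a small model. This is a sketch
under stated assumptions, not the gem5 implementation: the class and method
names are hypothetical, the 64-byte block size is a placeholder, and letting
an SC succeed anywhere within the reserved block is a simplification.

```python
class ReservationModel:
    """Toy model of LR/SC reservations at cache-block granularity."""

    def __init__(self, block_size=64):
        self.block_size = block_size  # assumed power of two
        self.reserved_block = None

    def _block(self, addr):
        # Align the address down to its cache block.
        return addr & ~(self.block_size - 1)

    def load_reserved(self, addr):
        self.reserved_block = self._block(addr)

    def store_conditional(self, addr):
        ok = self.reserved_block == self._block(addr)
        # Cleared on both a successful and a failed SC attempt.
        self.reserved_block = None
        return ok

    def snoop(self, addr):
        # A snoop to the reserved block invalidates the reservation.
        if self.reserved_block == self._block(addr):
            self.reserved_block = None

    def fault(self):
        # Faults also clear any outstanding reservation.
        self.reserved_block = None


m = ReservationModel()
m.load_reserved(0x1008)
print(m.store_conditional(0x1008))  # first SC succeeds
print(m.store_conditional(0x1008))  # reservation was cleared, so this fails
```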
Change-Id: I64513534710b5f269260fcb204f717801913e2f5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71520
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
* According to the manual, load reservations must be cleared on a
failed or a successful SC attempt.
* A load reservation can be arbitrarily large. The current
implementation was reserving something different than cacheBlockSize,
which could lead to problems if snoop addresses are cache block
aligned. This patch assumes a cacheBlock granularity.
* Load reservations should also be cleared on faults.
Change-Id: I64513534710b5f269260fcb204f717801913e2f5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71558
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Roger Chang <rogerycchang@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Many of the outstanding issues with the GPU model are related to
instructions that lack SDWA/DPP implementations and execute while
ignoring the special registers, leading to incorrect execution.
Adding SDWA/DPP is currently very cumbersome as there is a lot of
boilerplate code.
This changeset adds helper methods for VOP2 with one instruction
changed as an example. This review is intended to get feedback
before applying this change to all VOP2 instructions that support
SDWA/DPP.
Change-Id: I1edbc3f3bb166d34f151545aa9f47a94150e1406
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70738
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
uint_fast16_t is an integer type at least 16 bits wide; it may be
32 bits, 64 bits, or more. Most simulations run on x86-64 Linux
hosts, where uint_fast16_t is 64 bits. Therefore, there is no problem
for double precision float operations and the FloatMM test passes.
However, on Mac OS, uint_fast16_t is 16 bits, so the upper bits are
lost when converting float register bits to freg_t, and the FloatMM
test produces unexpected results.
The change guarantees that the data in freg_t is at least 64 bits
wide, so no data is lost when converting floating point values to
freg_t.
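The effect of the narrower type can be illustrated by masking the raw bits
of a double to a given width. This is only a model of the truncation (the
helper names are hypothetical and it does not use gem5's freg_t); it shows
why a 16-bit container cannot hold a double precision value.

```python
import struct


def double_to_bits(x):
    """Return the 64-bit IEEE 754 pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_double(b):
    """Interpret a 64-bit pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def store_in_width(bits, width):
    """Model storing `bits` in an unsigned integer `width` bits wide."""
    return bits & ((1 << width) - 1)


raw = double_to_bits(1.5)          # 0x3FF8000000000000
wide = store_in_width(raw, 64)     # a 64-bit container keeps everything
narrow = store_in_width(raw, 16)   # a 16-bit container drops the upper bits

print(bits_to_double(wide))    # 1.5
print(bits_to_double(narrow))  # 0.0
```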
Reference:
https://developer.apple.com/documentation/kernel/uint_fast16_t
https://codebrowser.dev/glibc/glibc/stdlib/stdint.h.html
Change-Id: I3df6610f0903cdee0f56584d6cbdb51ac26c86c8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71519
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
uint_fast16_t is an integer type at least 16 bits wide; it may be
32 bits, 64 bits, or more. Most simulations run on x86-64 Linux
hosts, where uint_fast16_t is 64 bits. Therefore, there is no problem
for double precision float operations and the FloatMM test passes.
However, on Mac OS, uint_fast16_t is 16 bits, so the upper bits are
lost when converting float register bits to freg_t, and the FloatMM
test produces unexpected results.
The change guarantees that the data in freg_t is at least 64 bits
wide, so no data is lost when converting floating point values to
freg_t.
Reference:
https://developer.apple.com/documentation/kernel/uint_fast16_t
https://codebrowser.dev/glibc/glibc/stdlib/stdint.h.html
Change-Id: I3df6610f0903cdee0f56584d6cbdb51ac26c86c8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71578
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently the fmax and fmin instructions convert source float registers such
as Fs1_bits to float64_t (or float32_t and float16_t) many times within a
single instruction. This makes future maintenance of these instructions
harder.
The change adds non-register float_t intermediate variables fs1 and fs2 to
keep the converted results so that we don’t need to convert repeatedly. It
also adds an intermediate variable fd of the specific float type, on the
assumption that the upper bits of the packed float register are all ones.
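The "upper bits are all ones" convention mentioned above matches RISC-V
NaN boxing, where a narrower float is stored in a wider register with the
unused upper bits set. The sketch below illustrates that convention for a
32-bit value in a 64-bit register; the helper names are hypothetical and
this is not the code touched by the change.

```python
UPPER_ONES = 0xFFFFFFFF00000000
CANONICAL_NAN_F32 = 0x7FC00000


def box_f32(bits32):
    """Pack a 32-bit float pattern into a 64-bit register image."""
    return UPPER_ONES | (bits32 & 0xFFFFFFFF)


def unbox_f32(reg64):
    """Extract the 32-bit pattern, or canonical NaN if not boxed."""
    if (reg64 >> 32) != 0xFFFFFFFF:
        return CANONICAL_NAN_F32
    return reg64 & 0xFFFFFFFF


one = 0x3F800000  # 1.0f
print(hex(box_f32(one)))
print(hex(unbox_f32(box_f32(one))))
```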
Change-Id: Ic508d5255db6c4b38ca4df6dd805df440c043fff
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71479
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
We have some customized protocols in the gem5 repository, and they
require the include path from the src directory. As a result, users of
those protocols have to set up the include path correctly by themselves,
which is tedious and error-prone. We should add the default include path
to the SIMGEN command line to prevent such issues.
Change-Id: I2a3748646567635d131a8fb4099e02e332691e97
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71118
Reviewed-by: Wei-Han Chen <weihanchen@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>