The .mailmap file is designed to maintain a record of unique
contributors, aiming for a single identifier for each person. What is
included in this file does not impact or alter commits; rather, it just
merges the counts for all commits by one person under a single name.
Related to issue #703 , this PR removes GCN3 related files and updates
source code, documentation, and tests to switch over to Vega is that was
not done already. Highlights are:
- Remove all src/arch/amdgpu/gcn3 files and update Kconfigs.
- Remove references to GCN3 and replace with Vega where applicable.
- Update the build targets in the gcn-gpu Docker. This will need to be
rebuilt but not urgently.
- Remove the GCN3 tag in testlib. Most tests seem to be using Vega
already, so that commit is small.
Vega (gfx900) introduced new memory aperture registers to get the base
address and limit for LDS and private (scratch) memory. These have not
commonly been used by the compiler until ROCm 6. Now that the compiler
is generating reads from these special registers, implement the support
for them.
Tested with LULESH which is using the SHARED_BASE register (LDS) with
ROCm 6.0. This assembly seems to replace S_GETREG_B32 emitted by the
ROCm 5 compiler.
Change-Id: Id2bd26ce8ef687c84a647fa2ac2da54d657913e5
By default all SDMA queues are privileged queues, meaning the addresses
in SDMA packets use the privileged translation tables. RLC queues
(sometimes called user queues) are not necessarily privileged and might
use user translation tables. RLC queues are used more often in ROCm 6.0
exposing an issue with invalid translations with RLC queues.
This changeset checks the priv bit in the SDMA MQD when an RLC queue is
mapped. Each packet type which uses an address then checks the bit
before performing translation. Tested with daily/weekly tests with a
ROCm 6.0 disk image and tests are passing.
Change-Id: I6122fbc194e8d6f5d38e81f1b0e11646d90e0ea0
This PR reorganizes the instructions.cc into multiple files and renames
some files which do not match their corresponding header file names. The
intention is to make iterating on development of these files faster.
An issue raised in #240 where if an address range ends
at the last byte of a 64 bit address space, it will be
considered a subset of any other address range that starts
at the first byte of the range.
Change-Id: I517f4717052eda2504de971be0eb59ee9a623dd3
This separation was only for convenience while GPU tests were under
development and rapidly changing. This test merges the GPU tests into
the weekly tests where they belong.
The files registers.cc, isa.cc, and decoder.cc do not match the header
name. This is a minor cleanup to make development more straightforward.
Change-Id: Ibab18dfe315b0ce84359939b490f8227ea43cac0
The Vega instructions.cc file is 47k lines long which results in both
large compilation times whenever it is modified and long style check
times. This makes iterating over more complex instruction
implementations very time consuming.
This commit moves the instruction definitions to multiple files based on
the instruction encoding (SOP2, VOP2, FLAT, DS, etc.). The resulting
files are much smaller (max is 8k lines) and compilation and style check
times are much more reasonable. Other than moving code around, there are
no functional changes in this commit.
Change-Id: Id4ac8e98ef11a58de5fd328f8a0cd7ce60a11819
The addition of std::optional in #732 caused a compile error. This
change fixes the error by checking to see if the value is present and
panicing otherwise.
Change-Id: I46c3fb76eb0e14ba7bede7c336293fbe9add8c84
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
1. Add the new double width for int64_t and uint64_t
2. Use the wider type to get the upper result of multiplication
Change-Id: Id6cfa6f274c65592b2b3e2b70c00f82954b41f1a
Replace instances of "GCN3" with Vega. Remove gfx801 and gfx803. Rename
FIJI to Vega and Carrizo to Raven.
Using misc since there is not enough room to fit all the tags.
Change-Id: Ibafc939d49a69be9068107a906e878408c7a5891
I’ve been working on a fix for the issue #759 where ‘vd’ incorrectly
stores all zeros when ‘vl’ is set to 0 in VectorIntMaskMacroConstructor.
My solution seems to work, but it behaves differently from other macros
when ‘vl’ = 0. Instead of pushing a ‘nop’ to ‘microops’, it pushes a
micro operation that remains ineffective due to ‘vl’ being 0.
When restoring the simulate_limit_event pointer is not
restored after running the dry simulation run which ends up in
"Panic: event not found!"
In this commit we fix this issue by correctly restoring
the pointer value along with the event queue head
Change-Id: Id5ad4d2a270a6cd34eec1dc5c9b170b2b84610d4
---------
Co-authored-by: narya <nitish.arya@bsc.es>
Co-authored-by: Jason Lowe-Power <jason@lowepower.com>
Newer compilers error on -Warray-length in the recent MI200 patches due
to casting from a 32-bit data type to a 64-bit type. Change it to cast
the 32-bit integer first then 64-bit integer latter to remove the
warning.
Rerun of validation tests on the three instructions passed.
Change-Id: I0309e5f7b5b8cc8ce1651660ddddb120fa6e7666
Currently, the TLB enforces that the bit 63 of a physical address to be
zero. This check stems from the riscv-tests that checks for the bit 63
of a physical address [1]. This is due to the fact that the ISA
implicitly says that the physical address must be zero-extended on the
most significant bits that are not translated [2]. More details on this
issue is here [3].
The check for bit 63 of a physical address in the TLB is rather too
specific, and I believe the check of invalid physical address is alread
implemented in PMA. Thus, this change proposes to remove this check from
RISC-V TLB.
[1]
bd0a19c136/isa/rv64mi/access.S (L18)
[2] https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/8kO7X0y4ubo
[3] https://github.com/gem5/gem5/issues/238
Change-Id: I247e4d4c75c1ef49a16882c431095f6e83f30383
Signed-off-by: Hoa Nguyen <hn@hnpl.org>
This is matching what we are already doing in the starter_fs.py script
Change-Id: I50239050be9bd151a607ec892f8dd9322b24040b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The PMAChecker and PMP are only used in the RisvISA and it should be in
the RiscvISA to simply the implementation
Change-Id: I4968e2de4c028cb2dceed977f2173fc8b1efd175
The RFC is defaulted to a size of 0 which removes it completely. To use
the RFC set the --register-file-cache-size to a non-zero multiple of
two. In addition, rfc_pipe_length may be altered to increase or decrease
RFC latency benefit.
The RFC is defaulted to a size of 0 which removes it completely. To use
the RFC set the --register-file-cache-size to a non-zero multiple of
two. In addition, rfc_pipe_length may be altrered to increase or
decrease RFC latency benefit.
Change-Id: I6f5bf5b750eb64155fbc8c8343e9feadce5c9f79
This patch is amending encodeAArch64SysReg so that it covers the case
where there are no arch numbers available for the misc index passed as
an argument.
This could happen if the register ID is a gem5 pseudo register which is
not associated with any architected op1/op2/crn/crm tuple.
Rather than panicking we return a nullopt.
Change-Id: I7ab70467105ef93c0c78ac4e999c7dc8e5e09925
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Implemented according to the ISA spec. Validated with silion. In
particular the sign extend is important for the signed variants and the
unsigned variants seem to overflow lanes (hence why there is no mask()
in the unsigned varints. FP16 -> FP32 continues using ARM's fplib.
Tested vs. an MI210. Clamp has not been verified.
Change-Id: Ifc09aecbc1ef2c92a5524a43ca529983018a6d59
Starting with MI200, packed math can operate on double dword inputs. In
this case, 64-bits of inputs (two VGPRs per lane) contain two FP32
values.
Add instructions to perform add, multiply, and FMA on packed FP32 types.
Change-Id: Ib838bff91a10e02e013cc7c33ec3d91ff08647b0
This change adds all of the missing flat/global atomics up to including
the new atomics in gfx90a (MI200). Adds all decodings and instruction
implementations with the exception of __half2 which does not have a
corresponding data type in gem5. This refactors the execute() and
completeAcc() methods by creating helper functions similar to what
initiateAcc() uses. This reduces redundant code for global atomic
instruction implementations.
Validated all except PK_ADD_F16, ADD_F32, and ADD_F64 which will be done
shortly. Verified the source/dest register sizes in the header are
correct and the template parameters for the new execute()/completeAcc()
methods are correct.
Change-Id: I4b3351229af401a1a4cbfb97166801aac67b74e4
The AMDKernelCode struct is very outdated. Most of the fields are no
longer used and have been replaced with new fields that are used.
Therefore in order to support the new fields the code object needs to be
updated. The new structure is based on the table located at
https://llvm.org/docs/AMDGPUUsage.html#code-object-v3-kernel-descriptor
Most notably this adds the new compute_pgm_rsrc3 and kernarg preload
fields which are new features in gfx90a (MI200). The accum_offset in
compute_pgm_rsrc3 and kergarg preload values are necessary to run
application which enable those features and therefore a way to check
their values is needed.
Also noteable is the removal of enable_sgpr_workgroup_id_{X,Y,Z}. These
seem to be unused in all versions of ROCm that gem5 supports and
therefore these fields can be removed. They are replaced with a reserved
field in the new code object.
Change-Id: I5542442e1e5961b05e17affad0adb5186d6d9d1a
Use the opSelectorToRegSym which will print the full range of VGPRs
(e.g., will now print v[2:3] instead of v2 when the source / dest is
64-bits). Fixes atomic disassembly prints. Now shows "glc" if GLC bit is
enabled. Fixes some VGPR fields being printed as an SGPR in places where
the 9-bit register index bit is implied (e.g., VDST).
This makes it easier to use a GPUExec trace to match with LLVM
disassembly when debugging.
Change-Id: Ia163774850f0054243907aca8fc8d0361e37fdd5
This adds the VOP3P and VOP3P_MAI encodings from the MI200 spec. These
instructions are used for packed math and miSIMD instructions. The first
19 VOP3P opcodes are implemented and validated against hardware. This
includes all instructions which operate on one dword containing two
packed 16-bit values of fp16, int16_t, or uint16_t.
Implement one MFMA instruction for now which was also validated against
hardware.
This is useful in other ISAs to implement FP16 computation. For example,
it can be used in the GPU model. The ARM specific misc register is
ignored in that case.
Change-Id: I339ac0ccd9be4371b0f220ad99068e5e12b3d263
This initialization method is used in gfx90a (MI200). Rather than using
three VGPRs for X,Y,Z dimensions of the kernel, pack them into one
register with 10-bits for each dimensions.
Change-Id: I8e5b681c8287779ff9f80451d6028e862322294a
The version is necessary for determining the correct ABI init process.
Add it to the task queue so it is accessible when doing ABI init.
Change-Id: If77434b0f93614057b5c40fcf612d59b54e05dbb
this adds an option --with-libcxx, that adds the -stdlib=libc++ flag to
link against libc++ instead of libstdc++ on Linux. Currently this is
only possible with clang and may not work with all build configurations
(e.g. protobuf linked against libstdc++), so this needs to be opt-in
rather than being on by default for clang whenever libc++ is detected.
Change-Id: Ib4022a58bb2dbd32417c58f01c7443a02ff710fe