Commit Graph

119 Commits

Author SHA1 Message Date
Matthew Poremba
db903f4fd4 arch-vega: Helper methods for SDWA/DPP for VOP2
Many of the outstanding issues with the GPU model are related to
instructions not having SDWA/DPP implementations and executing by
ignoring the special registers leading to incorrect executiong.
Adding SDWA/DPP is current very cumbersome as there is a lot of
boilerplate code.

This changeset adds helper methods for VOP2 with one instruction
changed as an example. This review is intended to get feedback
before applying this change to all VOP2 instructions that support
SDWA/DPP.

Change-Id: I1edbc3f3bb166d34f151545aa9f47a94150e1406
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70738
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-06-15 23:02:39 +00:00
Matthew Poremba
ae7476bcdc arch-gcn3,arch-vega: Fix ds_read2st64_b32
This instruction has two issues. The first is that it should write two
consecutive registers, starting with vdst because it is writing two
dwords. The second is that the data assignment to the lanes from the
dynamic instruction should cast to a U32 type otherwise the array index
goes out of bounds and returns the wrong data.

The first issue was fixed in GCN3 a few years ago in this review:
https://gem5-review.googlesource.com/c/public/gem5/+/32236. This
changeset makes the same change for Vega and applies the U32 cast in
both ISAs.

Tested with rocPRIM unit test. The test was failing before this
changeset and now passes.

Change-Id: Ifb110fc9a36ad198da7eaf86b1e3e37eccd3bb10
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70577
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-05-13 20:09:37 +00:00
Matthew Poremba
cd76f92c94 arch-vega: Add decodings for new MI100 VOP2 insts
VOP2 with opcodes 55-61 were added in MI100 and are not in Vega10. This
changeset adds the decodings for these instructions.

The changeset does not implement the instructions, however the fatal
message is much more helpful for debugging compared so a generic
decode_invalid handler.

Change-Id: Ibde0880c35ff915bf8e50772df9ce263e55ca893
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70042
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2023-04-28 00:48:35 +00:00
Matthew Poremba
f028bd55e0 arch-vega: Update API for some flat atomics
Some recently submitted atomic instructions were using two older APIs.
Update these to use the newer APIs to support all apertures and avoid
compilation issue.

Change-Id: Ibd6bc00177d33236946f54ef8e5c7544af322852
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67977
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2023-02-16 03:28:25 +00:00
Matthew Poremba
ea9239ae09 dev-amdgpu: Update deprecated ports
Change-Id: Icbc5636c33b437c7396ee27363eed1cf006f8882
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67837
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2023-02-14 18:57:33 +00:00
Alexandru Dutu
bb8f370e4d arch-vega: Implementing global_atomic_smax
Change-Id: Id4053424c98eec1e98eb555bb35b48f0b5d2407b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64513
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-02-14 18:57:14 +00:00
Alexandru Dutu
8375058e73 arch-vega: Implementing global_atomic_smin
Change-Id: Iffb366190f9e3f7ffbacde5dbb3abc97226926d4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64512
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2023-02-14 18:57:14 +00:00
Alexandru Dutu
d7516a26dc arch-vega: Implementing global_atomic_or
Change-Id: I13065186313ca784054956e1165b1b2fd8ce4a19
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64511
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2023-02-14 18:57:14 +00:00
Matthew Poremba
bc9e90d65e arch-vega: Make VGPR-offset for global SGPR-base signed
The VGPR-offset used when SGPR-base addressing is used can be signed in
Vega. These are global instructions of the format:
`global_load_dword v0, v1, s[0:1]`. This is not explicitly stated in the
ISA manual however based on compiler output the offset can be negative.

This changeset assigns the offset to a signed 32-bit integer and the
compiler takes care of the signedness in the expression which calculates
the final address. This fixes a bad address calculation in a rocPRIM
unit test.

Change-Id: I271edfbb4c6344cb1a6a69a0fd3df58a6198d599
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67412
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-02-09 17:08:34 +00:00
Matthew Poremba
905b8ebd22 arch-vega: Implement ds_write_b8_d16_hi
Writes a byte to the upper 16-bit input word to an address.

Change-Id: I0bfd573526b9c46585d0008cde07c769b1d29ebd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67411
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-02-09 17:08:34 +00:00
Gabriel Busnot
7f4c92c910 mem,arch-arm,mem-ruby,cpu: Remove use of deprecated base port owner
Change-Id: I29214278c3dd4829c89a6f7c93214b8123912e74
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67452
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
2023-02-03 06:11:45 +00:00
Matthew Poremba
03083ba5e3 arch-vega: Implement ds_write2st64_b64
Write two qwords at offsets multiplied by 8 * 64 bytes.

Change-Id: I0d0e05f3e848c2fd02d32095e32b7f023bd8803b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67078
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2023-01-05 23:12:10 +00:00
Matthew Poremba
450bc254bd arch-vega: Read one dword for SGPR base global insts
Global instructions in Vega can either use a VGPR base address plus
instruction offset or SGPR base address plus VGPR offset plus
instruction offset. Currently the VGPR address/offset is always read as
two dwords. This causes problems if the VGPR number is the last VGPR
allocated to a wavefront since the second dword would be beyond the
allocation and trip an assert.

This changeset sets the operand size of the VGPR operand to one dword
when SGPR base is used and two dwords otherwise so initDynOperandInfo
does not assert. It also moves the read of the VGPR into the calcAddr
method so that the correct ConstVecOperandU## is used to prevent another
assertion failure when reading from the register file. These two changes
are made to all flat instructions, as global instructions are a
subsegement of flat instructions.

Change-Id: I79030771aa6deec05ffa5853ca2d8b68943ee0a0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67077
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-01-05 23:12:10 +00:00
Matthew Poremba
3bfa220e4e arch-vega: Implement ds_read_i8
Read one byte with sign extended from LDS.

Change-Id: I9cb9b4033c6f834241cba944bc7e6a7ebc5401be
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67076
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2023-01-05 23:12:10 +00:00
Matthew Poremba
b83457df0b arch-vega: Implement ds_add_u64
This instruction does an atomic add of an unsigned 64-bit data with a
VGPR and value in LDS atomically without return.

Change-Id: I6a7d6713b256607c4e69ddbdef5c83172493c077
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67075
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-01-05 23:12:10 +00:00
Matthew Poremba
892e3057f7 arch-vega: Implement ds_add_f32 atomic
This instruction does an atomic add of a 32-bit float with a VGPR and
value in LDS atomically without return.

Change-Id: Id4f23a1ab587a23edfd1d88ede1cbcc5bdedc0cb
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67074
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2023-01-05 23:12:10 +00:00
Matthew Poremba
022a48f9f6 arch-vega: Implement ds_add_u32 atomic
This instruction does an atomic add of unsigned 32-bit data with a VGPR
and value in LDS atomically, without return.

Change-Id: I87579a94f6200a9a066f8f7390e57fb5fb6eff8e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67072
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2023-01-03 22:53:30 +00:00
Matthew Poremba
e392603d81 arch-vega: Add missing operand size for ds_write2st64_b64
This instruction takes three operands (address, and two datas) but there
were only operand sizes for two operands tripping assert in default
case.

Change-Id: I3f505b6432aee5f3f265acac46b83c0c7daff3e7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67071
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2023-01-03 22:53:30 +00:00
Matthew Poremba
6449633827 arch-vega: Add DPP support for V_AND_B32
A DPP variant of V_AND_B32 was found in rocPRIM. With this changeset the
unit tests for rocPRIM scan_inclusive are passing.

Change-Id: I5a65f2cf6b56ac13609b191e3b3dfeb55e630942
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66753
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2023-01-03 22:53:30 +00:00
Matthew Poremba
bbdebc25da arch-vega: Fix several issues with DPP
DPP processing has several issues which are fixed in this changeset:

1) Incorrect comment is updated
2) newLane calculation for shift/rotate instructions is corrected
3) A copy of original data is made so that a copy of a copy is not made
4) Reset all booleans (OOB, zeroSrc, laneDisabled) after each lane
iteration

The shift, rotate, and broadcast variants were tested by implementing
them in assembly and running on silicon.

Change-Id: If86fbb26c87eaca4ef0587fd846978115858b168
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66752
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2023-01-03 22:53:30 +00:00
Matthew Poremba
f99a3c1f96 arch-vega: Fix signed BFE instructions
The bitfield extract instructions come in unsigned and signed variants.
The documentation on this is not correct, however the GCN3 documentation
gives some clues. The instruction should extract an N-bit integer where
N is defined in a source operand starting at some bit also defined by a
source operand. For signed variants of this instruction, the N-bit
integer should be sign extended but is currently not.

This changeset does sign extension using the runtime value of N by ORing
the upper bits with ones if the most significant bit is one. This was
verified by writing these instructions in assembly and running on a real
GPU. Changes are made to v_bfe_i32, s_bfe_i32, and s_bfe_i64.

Change-Id: Ia192f5940200c6de48867b02f709a7f1b2daa974
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66751
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2023-01-03 22:53:30 +00:00
Matthew Poremba
8693d725e2 arch-vega: Fix SOPK instruction sign extends
See: https://gem5-review.googlesource.com/c/public/gem5/+/37495

Same patch but for vega. This fixes issues with lulesh and probably
rodinia - heartwall as well in fullsystem.

Change-Id: I3af36bb9b60d32dc96cc3b439bb1167be1b0945d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65432
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-11-09 15:47:50 +00:00
Matthew Poremba
6a4a12ebbd arch-vega: Improve non-native page size support
Vega allows for any integer multiple of 4kB pages. However, the current
implementation is designed for 4kB page primarily. In order to support
variable page sizes, the physical address calculation needs to be
updated to add the virtual page offset to the base physical address
rather than bitwise-OR. Bitwise-OR assumes physical pages are at
aligned to the page size which is generally not the case for very
large pages (1GB+).

This changeset changes all of the physical address computations to add
the virtual offset to the physical page address. This fixes many GPUFS
applications which use larger pages. The support was tested by
hipMalloc'ing ~5GB to induce a large page being created. The test
application now passes verification with this change.

Change-Id: Ic8d1475e001def443f3e4ab609449bca0c40b638
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64751
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-10-31 14:30:13 +00:00
Matthew Poremba
e48285c244 arch-vega: Implement PDE2 and PDE1 as PTE
Page directory entries (PDEs) can be interpreted as leaf node page
table entries (PTEs) if the "p" bit is set. This is used for flexible
page sizes in Vega. Currently there is only support for PDE level 0
entries which can be interpreted as 2MB pages. This changeset adds
support for PDE1 and PDE2 which can be used to represent 1GB and 512GB
pages. PDE1-as-PTE entries can be tested and were verified on
applications by allocating >2GB of data. PDE0 is untested due to being
too large for simulation, but the implementation is similar to PDE0
and PDE1.

Change-Id: I801cbb5ec79110d57d2db760cc689c2e5778f9bb
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64451
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2022-10-13 20:17:00 +00:00
Matthew Poremba
925b1b5c8e arch-vega: Implement V_XAD_U32 instruction
Used in rodinia:heartwall.  This instruction is new in vega and does not
exist in GCN3.

Change-Id: I6127290d1c85688a7f82e149e97762ca55e05fc6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63972
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-10-06 16:42:40 +00:00
Matthew Poremba
b919d9c5c9 arch-vega: Improve disasm for GLOBAL insts with scalar offset
The previous print statement was not clear that a scalar offset was
being used when printing disassembly, which made it slightly more
difficult to track down bugs related to this (relatively) rare usage of
global load/store instructions.

This change improves the disassembly to closer match the output of
hipcc's assembly code output.

Change-Id: I8514aedacb5b1db93d0586c408c4cf1ce77a7db3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63175
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2022-09-09 04:13:49 +00:00
Matthew Poremba
f91abb9770 arch-vega: Allow unaligned large host pages
The virtual and physical address for device memory are typically aligned
to the page size. On the host (x86), however, the physical address may
not be aligned to page size for large page sizes when mixed with 4kB
pages. As a result, the physical address calculation must add, rather
than bitwise-OR, the virtual page offset to the physical page number.
The virtual page offset on the GPU continues to use the variable page
bytes for masking and shifting.

Change-Id: I6563a1eb43d9b59577d32268b8645a7436304bcb
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63034
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2022-09-09 04:13:49 +00:00
Alexandru Dutu
241023329d arch-vega: DS_OR_B32 does not return data
The DS_OR_B32 instruction should not return data. This
changeset updates its implementation.

Change-Id: Ib5fccdbb69cd1ad2639bbf54824dc363a3fbf599
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/62656
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2022-09-01 15:16:37 +00:00
Gabe Black
f4209bbdee misc: Remove lingering uses of TheISA::.
Change-Id: Ie55e0d79867fbc8f75a993fb456a58c84de5def4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/62196
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
2022-08-20 07:30:16 +00:00
Gabe Black
2da4a2cdca cpu: Stop including arch/isa.hh.
This header file is no longer needed.

Also fix some places where the isa header file was being transitively
included.

Change-Id: Ib9a9d7db0c9808b29d7614bbd68e2052ea345e9f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51239
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-08-20 01:17:06 +00:00
Alexandru Dutu
c6b38909e1 gpu-compute: Adding support for LDS atomics
This changeset is adding support for LDS atomics
and implementing DS_OR_B32 instruction.

Change-Id: I84c5cf6ce0e9494726dc7299f360551cd2a485f5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61791
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-08-19 16:44:31 +00:00
Bobby R. Bruce
787204c92d python: Apply Black formatter to Python files
The command executed was `black src configs tests util`.

Change-Id: I8dfaa6ab04658fea37618127d6ac19270028d771
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47024
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-08-03 09:10:41 +00:00
Matthew Poremba
f65f5a8981 gpu-compute,arch-vega: Overhaul HWRegs, setreg, getreg
These instructions are supposed to be read/writing special shader
hardware registers. Currently they are getting/setting to an SGPR. This
results in getting incorrect registers at best and clobbering an SGPR
being used by an application at worst. Furthermore, some registers need
to be set in the shader and the application will never (can never) set
them.

This patch overhauls the getreg/setreg instructions to use different
storage in the shader. The values will be updated either via setreg from
an application (e.g., mode register) or set by a PM4 MAP_PROCESS.

Change-Id: Ie5e5d552bd04dc47f5b35b5ee40a569ae345abac
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61655
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2022-07-28 14:10:33 +00:00
Matthew Poremba
5c7514c81c arch-vega: Fix S_GETREG_B32 masking/shifting
Here the mask should not be inverted. We also need to shift by the
offset to remove the padding as the consumer of the value expects the
offset to be removed.

This can be easily tested by running a GPU kernel with __shared__
variables. This will generate the following assembly:

s_getreg_b32 s6, hwreg(HW_REG_SH_MEM_BASES, 16, 16)

The current implementation returns the lower 16 bits (private memory
aperture) while the correct behavior is the uppter 16 bits (shared/LDS
memory aperture).

Change-Id: Iea8f0adceeadb24cdcf46ef4183fcaa8262ab9e7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61654
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2022-07-28 14:10:33 +00:00
Matthew Poremba
618d16d6fc arch-vega: Fix V_READFIRSTLANE_B32
This instruction appears to be the only VOP1 instruction that has a
scalar destination using VDST as the destination register number.
However, since VDST is only 8 bits it cannot encode all possible
registers. Therefore, use the opcode to determine if the destination is
a scalar or vector destination.

This issue manifests as a VGPR dest being out of range for a kernel
where the number of SGPRs is more than the number of VGPRs and the
intended SGPR dest is larger than the count of VGPRs

Change-Id: I95a7de1ddb97f7171f48331fed36aef776fa0cb4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61649
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-07-28 14:10:33 +00:00
Matthew Poremba
af9ecf7920 arch-vega,arch-gcn3: Add support for VCC_HI as scalar source
Currently there is only support for VCC_LO as a scalar source. Add
support for VCC_HI as well. The op selector symbol is also changed to be
vcc_hi/vcc_lo as it is in disassembly from LLVM.

Change-Id: I19ea8e23873049c33ffe2eb4ec8504a18f371c0e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61329
Reviewed-by: Matt Sinclair <mattdsinclair.wisc@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-07-18 17:27:50 +00:00
Matthew Poremba
3a73aa3ac1 arch-vega: Implement new VOP2 using VOP3 insts
Vega adds three new VOP2 instructions that may use VOP3 encoding that
are not part of the GCN3 ISA: v_add_u32, v_sub_u32, v_subrev_u32. This
changeset implements those three new instructions to fix errors related
to "invalid encoding" when those instructions are seen.

Tested using srad from Rodinia 3.0 HIP port which compiles a v_add_u32
instruction with VOP3 encoding.

Change-Id: I409a9f72f5c37895c3a0ab7ceb14a4dd121874a4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61330
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2022-07-15 14:55:33 +00:00
Matthew Poremba
40077055cf arch-vega: Fix disassembly for two dword VOPC
Calling opSelectorToRegSym in the disassembly for VOPC when there is a
second dword (SDWA, DPP, or Literal) causes a panic as those registers
do not have a string symbol. This is fixed by checking for a second
dword before printing similar to how VOP1, VOP2, SOP1, etc. function.

Change-Id: I97b33e1e45abcf3ff1d0bc5754773b4eee961a98
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61269
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-07-12 19:05:32 +00:00
Matt Sinclair
590719a383 arch-vega: explain when op encoder ignores src reg
Previously b40b361bee added support for the Vega operand encoder.  As
part of this, it made sure to check for the S_GETPC_B64 instruction,
which appears to be the only instruction in the Vega ISA that does not
use the source register.  However, at the time the commit used magic
numbers without comment, which can be difficult for users to interpret.

To resolve this, this commit adds a comment to explain where the magic
numbers come from (Table 58 in the Vega ISA manual).

Change-Id: Ic5007b510e0175558d21ede8eb6db273113187b2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/60650
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
2022-06-22 17:14:35 +00:00
Matt Sinclair
00008b725c arch-vega: some Vega instructions don't use dest reg
Some of the Vega scalar instructions (S_SETPC_B64, S_RFE_B64,
S_CBRANCH_JOIN, and S_SET_GPR_IDX_IDX) do not use the SDST scalar
destination register.  However, Vega's operand encoding function for the
SOP1 instruction type's class assumed all instructions used the
destination register, which results in an assert failure for these
instructions.

To resolve this, this commit updates the Vega SOP1 operand encoder to
ignore the destination register for these specific instructions.

Change-Id: I2f0d830f6264fc7f47c0694a2fd5da5d33d2ea0b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/60649
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
2022-06-22 17:14:35 +00:00
Michael Boyer
81058189af arch-vega,arch-gcn3: Implement S_MEMTIME instruction
Change-Id: I3e286eb6ff8af4097ad03d4066be79f73d938cea
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53603
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-06-21 20:19:46 +00:00
Charles Jamieson
7170c365be arch-vega: implement S_GETREG_B32 instruction
This commit adds support for the Vega GPU ISA's S_GETREG_B32
instruction.

This work was done by Charles Jamieson but I am committing.

Change-Id: Ic2e24f667ed1aec7b8b1404a06e17e7ffb192fba
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/60589
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-06-17 02:24:05 +00:00
Gabe Black
84ae0afa59 arch-x86: Put misc reg indexes into a name space.
Also make them match the style guide.

Change-Id: I845f141f85d4499a5acf56c2161240764906a232
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49758
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Gabe Black <gabe.black@gmail.com>
2022-06-08 07:09:05 +00:00
Gabe Black
88143c940b arch-x86: Convert segment indices to fit the style guide.
Capitalize only their first letter, and use a namespace to namespace
them instead of a SEGMENT_REG_ prefix.

Change-Id: I69778c8d052ad6cc0ffd9e74dd1c643e9d28048d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49756
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Reviewed-by: Gabe Black <gabe.black@gmail.com>
2022-06-08 07:08:37 +00:00
Matthew Poremba
b64467025d arch-vega: Implement SOP2 S_MUL_HI instructions
Two new 32-bit signed and unsigned variants of S_MUL were added in
gfx900 which operate similar to S_MUL expect they shift the product by
32 bits after multiplication. Tested with Histogram HIP-Sample and
b+tree in rodinia 3.0 HIP port.

Change-Id: I1bed32b17ccda7aa47f3b59528eb3304912d3610
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58473
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-04-11 17:06:43 +00:00
Matthew Poremba
e3f65393fd dev-amdgpu,arch-vega: Implement TLB invalidation logic
Add logic to collect pointers to all GPU TLBs in full system. Implement
the invalid TLBs PM4 packet. The invalidate is done functionally since
there is really no benefit to simulate it with timing and there is no
support in the TLB to do so. This allow application with much larger
data sets which may reuse device memory pages to work in gem5 without
possibly crashing due to a stale translation being leftover in the TLB.

Change-Id: Ia30cce02154d482d8f75b2280409abb8f8375c24
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58470
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-04-08 17:12:32 +00:00
Gabe Black
e6c0ba97db scons: Put all config variables in an env['CONF'] sub-dict.
This makes what are configuration and what are internal SCons variables
explicit and separate, and makes it unnecessary to call out what
variables to export to C++.

These variables will also be plumbed into and out of kconfiglib in later
changes.

Change-Id: Iaf5e098d7404af06285c421dbdf8ef4171b3f001
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56892
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-03-28 20:31:21 +00:00
Matthew Poremba
dd90417211 arch-vega: Bypass Ruby for functional page walks
Currently if a Ruby functional access fails to find an address in the
caches, it gives up. For functional page table walks we need to be able
to go all the way to memory. This adds a pointer to the system object
which allows the walker to get a pointer to device memory which can be
used to do a functional access directly to memory bypassing Ruby.

Change-Id: I0ead6e5e130a0d53021c44ae9221b167c6316ab2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57529
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-03-25 19:51:29 +00:00
Matthew Poremba
9cbdf75295 dev-amdgpu: Add VM class for apertures, TranslationGens
Create a VM class to reduce clutter in the amdgpu_device.* files. This
new file is in charge of reading/writting MMIOs related to VM contexts
and apertures. It also provides ranges checks for various apertures and
breaks out the MMIO interface so that there are not overloaded macro
definitions in the device MMIO methods.

The new translation generator classes for the various apertures are also
added to this class.

Change-Id: Ic224c1aa485685685b1136a46eed50bcf99d2350
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53066
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-03-24 14:59:57 +00:00
Matthew Poremba
f64f05eff6 arch-vega: Mark global instructions executed as global
The executed_as field is currently not set for global memory
instructions. This results in the default of SC_NONE, causing the status
vector to be all zeros. The GM pipe sees this and completes the
instruction immediately rather than issuing memory requests. This is
fixed by marking the instruction as executed as SC_GLOBAL always. Flat
instructions use resolvedFlatSegment for this, however since global
instructions are known to be global we can set this field directly. This
results in the expected issuing of memory requests to GPU memory.

Change-Id: Ic23102853ccd49a41e2f083b7bb24f033dfed18a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57829
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-03-18 22:44:28 +00:00