Many of the outstanding issues with the GPU model are related to
instructions not having SDWA/DPP implementations and executing by
ignoring the special registers leading to incorrect executiong.
Adding SDWA/DPP is current very cumbersome as there is a lot of
boilerplate code.
This changeset adds helper methods for VOP2 with one instruction
changed as an example. This review is intended to get feedback
before applying this change to all VOP2 instructions that support
SDWA/DPP.
Change-Id: I1edbc3f3bb166d34f151545aa9f47a94150e1406
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70738
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This instruction has two issues. The first is that it should write two
consecutive registers, starting with vdst because it is writing two
dwords. The second is that the data assignment to the lanes from the
dynamic instruction should cast to a U32 type otherwise the array index
goes out of bounds and returns the wrong data.
The first issue was fixed in GCN3 a few years ago in this review:
https://gem5-review.googlesource.com/c/public/gem5/+/32236. This
changeset makes the same change for Vega and applies the U32 cast in
both ISAs.
Tested with rocPRIM unit test. The test was failing before this
changeset and now passes.
Change-Id: Ifb110fc9a36ad198da7eaf86b1e3e37eccd3bb10
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70577
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
VOP2 with opcodes 55-61 were added in MI100 and are not in Vega10. This
changeset adds the decodings for these instructions.
The changeset does not implement the instructions, however the fatal
message is much more helpful for debugging compared so a generic
decode_invalid handler.
Change-Id: Ibde0880c35ff915bf8e50772df9ce263e55ca893
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70042
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
The VGPR-offset used when SGPR-base addressing is used can be signed in
Vega. These are global instructions of the format:
`global_load_dword v0, v1, s[0:1]`. This is not explicitly stated in the
ISA manual however based on compiler output the offset can be negative.
This changeset assigns the offset to a signed 32-bit integer and the
compiler takes care of the signedness in the expression which calculates
the final address. This fixes a bad address calculation in a rocPRIM
unit test.
Change-Id: I271edfbb4c6344cb1a6a69a0fd3df58a6198d599
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67412
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Global instructions in Vega can either use a VGPR base address plus
instruction offset or SGPR base address plus VGPR offset plus
instruction offset. Currently the VGPR address/offset is always read as
two dwords. This causes problems if the VGPR number is the last VGPR
allocated to a wavefront since the second dword would be beyond the
allocation and trip an assert.
This changeset sets the operand size of the VGPR operand to one dword
when SGPR base is used and two dwords otherwise so initDynOperandInfo
does not assert. It also moves the read of the VGPR into the calcAddr
method so that the correct ConstVecOperandU## is used to prevent another
assertion failure when reading from the register file. These two changes
are made to all flat instructions, as global instructions are a
subsegement of flat instructions.
Change-Id: I79030771aa6deec05ffa5853ca2d8b68943ee0a0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67077
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
DPP processing has several issues which are fixed in this changeset:
1) Incorrect comment is updated
2) newLane calculation for shift/rotate instructions is corrected
3) A copy of original data is made so that a copy of a copy is not made
4) Reset all booleans (OOB, zeroSrc, laneDisabled) after each lane
iteration
The shift, rotate, and broadcast variants were tested by implementing
them in assembly and running on silicon.
Change-Id: If86fbb26c87eaca4ef0587fd846978115858b168
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66752
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
The bitfield extract instructions come in unsigned and signed variants.
The documentation on this is not correct, however the GCN3 documentation
gives some clues. The instruction should extract an N-bit integer where
N is defined in a source operand starting at some bit also defined by a
source operand. For signed variants of this instruction, the N-bit
integer should be sign extended but is currently not.
This changeset does sign extension using the runtime value of N by ORing
the upper bits with ones if the most significant bit is one. This was
verified by writing these instructions in assembly and running on a real
GPU. Changes are made to v_bfe_i32, s_bfe_i32, and s_bfe_i64.
Change-Id: Ia192f5940200c6de48867b02f709a7f1b2daa974
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66751
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Vega allows for any integer multiple of 4kB pages. However, the current
implementation is designed for 4kB page primarily. In order to support
variable page sizes, the physical address calculation needs to be
updated to add the virtual page offset to the base physical address
rather than bitwise-OR. Bitwise-OR assumes physical pages are at
aligned to the page size which is generally not the case for very
large pages (1GB+).
This changeset changes all of the physical address computations to add
the virtual offset to the physical page address. This fixes many GPUFS
applications which use larger pages. The support was tested by
hipMalloc'ing ~5GB to induce a large page being created. The test
application now passes verification with this change.
Change-Id: Ic8d1475e001def443f3e4ab609449bca0c40b638
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64751
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Page directory entries (PDEs) can be interpreted as leaf node page
table entries (PTEs) if the "p" bit is set. This is used for flexible
page sizes in Vega. Currently there is only support for PDE level 0
entries which can be interpreted as 2MB pages. This changeset adds
support for PDE1 and PDE2 which can be used to represent 1GB and 512GB
pages. PDE1-as-PTE entries can be tested and were verified on
applications by allocating >2GB of data. PDE0 is untested due to being
too large for simulation, but the implementation is similar to PDE0
and PDE1.
Change-Id: I801cbb5ec79110d57d2db760cc689c2e5778f9bb
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64451
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
The previous print statement was not clear that a scalar offset was
being used when printing disassembly, which made it slightly more
difficult to track down bugs related to this (relatively) rare usage of
global load/store instructions.
This change improves the disassembly to closer match the output of
hipcc's assembly code output.
Change-Id: I8514aedacb5b1db93d0586c408c4cf1ce77a7db3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63175
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
The virtual and physical address for device memory are typically aligned
to the page size. On the host (x86), however, the physical address may
not be aligned to page size for large page sizes when mixed with 4kB
pages. As a result, the physical address calculation must add, rather
than bitwise-OR, the virtual page offset to the physical page number.
The virtual page offset on the GPU continues to use the variable page
bytes for masking and shifting.
Change-Id: I6563a1eb43d9b59577d32268b8645a7436304bcb
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63034
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
These instructions are supposed to be read/writing special shader
hardware registers. Currently they are getting/setting to an SGPR. This
results in getting incorrect registers at best and clobbering an SGPR
being used by an application at worst. Furthermore, some registers need
to be set in the shader and the application will never (can never) set
them.
This patch overhauls the getreg/setreg instructions to use different
storage in the shader. The values will be updated either via setreg from
an application (e.g., mode register) or set by a PM4 MAP_PROCESS.
Change-Id: Ie5e5d552bd04dc47f5b35b5ee40a569ae345abac
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61655
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Here the mask should not be inverted. We also need to shift by the
offset to remove the padding as the consumer of the value expects the
offset to be removed.
This can be easily tested by running a GPU kernel with __shared__
variables. This will generate the following assembly:
s_getreg_b32 s6, hwreg(HW_REG_SH_MEM_BASES, 16, 16)
The current implementation returns the lower 16 bits (private memory
aperture) while the correct behavior is the uppter 16 bits (shared/LDS
memory aperture).
Change-Id: Iea8f0adceeadb24cdcf46ef4183fcaa8262ab9e7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61654
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
This instruction appears to be the only VOP1 instruction that has a
scalar destination using VDST as the destination register number.
However, since VDST is only 8 bits it cannot encode all possible
registers. Therefore, use the opcode to determine if the destination is
a scalar or vector destination.
This issue manifests as a VGPR dest being out of range for a kernel
where the number of SGPRs is more than the number of VGPRs and the
intended SGPR dest is larger than the count of VGPRs
Change-Id: I95a7de1ddb97f7171f48331fed36aef776fa0cb4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61649
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Vega adds three new VOP2 instructions that may use VOP3 encoding that
are not part of the GCN3 ISA: v_add_u32, v_sub_u32, v_subrev_u32. This
changeset implements those three new instructions to fix errors related
to "invalid encoding" when those instructions are seen.
Tested using srad from Rodinia 3.0 HIP port which compiles a v_add_u32
instruction with VOP3 encoding.
Change-Id: I409a9f72f5c37895c3a0ab7ceb14a4dd121874a4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61330
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Previously b40b361bee added support for the Vega operand encoder. As
part of this, it made sure to check for the S_GETPC_B64 instruction,
which appears to be the only instruction in the Vega ISA that does not
use the source register. However, at the time the commit used magic
numbers without comment, which can be difficult for users to interpret.
To resolve this, this commit adds a comment to explain where the magic
numbers come from (Table 58 in the Vega ISA manual).
Change-Id: Ic5007b510e0175558d21ede8eb6db273113187b2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/60650
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Some of the Vega scalar instructions (S_SETPC_B64, S_RFE_B64,
S_CBRANCH_JOIN, and S_SET_GPR_IDX_IDX) do not use the SDST scalar
destination register. However, Vega's operand encoding function for the
SOP1 instruction type's class assumed all instructions used the
destination register, which results in an assert failure for these
instructions.
To resolve this, this commit updates the Vega SOP1 operand encoder to
ignore the destination register for these specific instructions.
Change-Id: I2f0d830f6264fc7f47c0694a2fd5da5d33d2ea0b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/60649
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Add logic to collect pointers to all GPU TLBs in full system. Implement
the invalid TLBs PM4 packet. The invalidate is done functionally since
there is really no benefit to simulate it with timing and there is no
support in the TLB to do so. This allow application with much larger
data sets which may reuse device memory pages to work in gem5 without
possibly crashing due to a stale translation being leftover in the TLB.
Change-Id: Ia30cce02154d482d8f75b2280409abb8f8375c24
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58470
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This makes what are configuration and what are internal SCons variables
explicit and separate, and makes it unnecessary to call out what
variables to export to C++.
These variables will also be plumbed into and out of kconfiglib in later
changes.
Change-Id: Iaf5e098d7404af06285c421dbdf8ef4171b3f001
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56892
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently if a Ruby functional access fails to find an address in the
caches, it gives up. For functional page table walks we need to be able
to go all the way to memory. This adds a pointer to the system object
which allows the walker to get a pointer to device memory which can be
used to do a functional access directly to memory bypassing Ruby.
Change-Id: I0ead6e5e130a0d53021c44ae9221b167c6316ab2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57529
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Create a VM class to reduce clutter in the amdgpu_device.* files. This
new file is in charge of reading/writting MMIOs related to VM contexts
and apertures. It also provides ranges checks for various apertures and
breaks out the MMIO interface so that there are not overloaded macro
definitions in the device MMIO methods.
The new translation generator classes for the various apertures are also
added to this class.
Change-Id: Ic224c1aa485685685b1136a46eed50bcf99d2350
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53066
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The executed_as field is currently not set for global memory
instructions. This results in the default of SC_NONE, causing the status
vector to be all zeros. The GM pipe sees this and completes the
instruction immediately rather than issuing memory requests. This is
fixed by marking the instruction as executed as SC_GLOBAL always. Flat
instructions use resolvedFlatSegment for this, however since global
instructions are known to be global we can set this field directly. This
results in the expected issuing of memory requests to GPU memory.
Change-Id: Ic23102853ccd49a41e2f083b7bb24f033dfed18a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57829
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>