Commit Graph

9 Commits

Author SHA1 Message Date
Matthew Poremba
c5feca8251 dev-amdgpu: Rework PM4 NOP packet
The PM4 NOP header is used to insert spaces in the PM4 ring and can
therefore be any size. This includes zero. A size of zero is denoted by
a value of 0x3fff in the NOP packet header. Currently we assume this
means the remainder of the PM4 queue up to the wptr is empty/NOPs. This
is not always true.

This changeset reworks the PM4 NOP packet to handle the value of 0x3fff
as a special value and advances the rptr by 0 bytes. This fixes issues
where there were additional packets in the queue which were being
skipped over by fast forwarding. Since those packets could be anything,
that leads to undefined behavior afterwards.

Change-Id: I3f5c3f4b7dd50f93ba503fea97454a9d41771e30
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65094
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2022-11-01 15:34:08 +00:00
Matthew Poremba
b623d26543 dev-amdgpu: Fix interrupt call for release mem
Both the client id and source id are incorrect for the release mem CP
packet. This changeset sets both to the correct value and adds asserts
that the value is declared in the client ID and source ID enums.

Change-Id: I4cc6c3a5f2a482e8f7dcd2a529c4a69bf71742c0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63177
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2022-09-09 04:13:49 +00:00
Matthew Poremba
4211962f8c dev-amdgpu: Fix translation reading SDMA MQD ("RLC queue")
The RLC queue MQD address is a GART address, not a system address, so it
must be translated through the GART first.

Change-Id: Ie52b0e65ebf57141b8ba6f88a49989813750eeec
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/62711
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2022-09-03 16:05:58 +00:00
Matthew Poremba
68115460d8 gpu-compute: Set LDS and Scratch apertures in FS
The LDS and scratch aperture base and limits are hardcoded to some
values that are useful for SE mode. In reality, these are chosen by the
driver so we need to honor whatever values the driver passes so that
when addresses are calculated they fall into the correct aperture to
route flat instructions to those apertures.

This overwrites the default hardcoded values for LDS and scratch base
and limit using the values providing by the driver in a MAP_PROCESS
packet.

Change-Id: I0e194a26631f697819d8aaecf1bf346a7b7c7026
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61656
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2022-07-28 14:10:33 +00:00
Matthew Poremba
f65f5a8981 gpu-compute,arch-vega: Overhaul HWRegs, setreg, getreg
These instructions are supposed to be read/writing special shader
hardware registers. Currently they are getting/setting to an SGPR. This
results in getting incorrect registers at best and clobbering an SGPR
being used by an application at worst. Furthermore, some registers need
to be set in the shader and the application will never (can never) set
them.

This patch overhauls the getreg/setreg instructions to use different
storage in the shader. The values will be updated either via setreg from
an application (e.g., mode register) or set by a PM4 MAP_PROCESS.

Change-Id: Ie5e5d552bd04dc47f5b35b5ee40a569ae345abac
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61655
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2022-07-28 14:10:33 +00:00
Matthew Poremba
54d2438066 dev-amdgpu: Removed hardcoded AQL queue size
The AQL queue size is currently hardcoded to 64kB. For longer running
applications this causes the circular queue to wrap before reaching the
real end of the queue. Add the computation for queue size instead.

Previously longer applications (e.g., bc in pannotia) were hanging
around 4k kernels. With change the application launches 10k+ kernels.

Change-Id: I6c31677c1799a3c9ce28cf4e7e79efcb987e3b7f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/59449
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2022-05-07 03:47:06 +00:00
Matthew Poremba
e3f65393fd dev-amdgpu,arch-vega: Implement TLB invalidation logic
Add logic to collect pointers to all GPU TLBs in full system. Implement
the invalid TLBs PM4 packet. The invalidate is done functionally since
there is really no benefit to simulate it with timing and there is no
support in the TLB to do so. This allow application with much larger
data sets which may reuse device memory pages to work in gem5 without
possibly crashing due to a stale translation being leftover in the TLB.

Change-Id: Ia30cce02154d482d8f75b2280409abb8f8375c24
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58470
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-04-08 17:12:32 +00:00
Bobby R. Bruce
ea9b7ef6a2 dev-amdgpu: Add braces to stop clang compilation braces error
Additional braces are needed due to a clang compilation bug that falsely
throws a "suggest braces around initialization of subject" error. More
info on this bug is available here:
https://stackoverflow.com/questions/31555584

Change-Id: Ide5cdd260716ba06f6da4663732e39d18e00af97
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58150
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-03-25 13:40:04 +00:00
Matthew Poremba
1be246bbe3 dev-amdgpu: Add PM4PP, VMID, Linux definitions
The PM4 packet processor is handling all non-HSA GPU packets such
as packets for (un)mapping HSA queues. This commit pulls many
Linux structs and defines out into their own files for clarity.
Finally, it implements the VMID related functions in AMDGPU device.

Change-Id: I5f0057209305404df58aff2c4cd07762d1a31690
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53068
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-03-24 14:59:57 +00:00