Commit Graph

13 Commits

Author SHA1 Message Date
Matthew Poremba
04767ddc62 dev-amdgpu: Fix SDMA ring buffer wrap around
The current SDMA wrap around handling only considers the ring buffer
location as seen by the GPU. Eventually when the end of the SDMA ring
buffer is reached, the driver waits until the rptr written back to the
host catches up to what the driver sees before wrapping around back to
the beginning of the buffer. This writeback currently does not happen at
all, causing hangs for applications with a lot of SDMA commands.

This changeset first fixes the sizes of the queues, especially RLC
queues, so that the wrap around occurs in the correct place. Second, we
now store the rptr writeback address and the absoluate (unwrapped) rptr
value in each SDMA queue. The absolulte rptr is what the driver sends to
the device and what it expects to be written back.

This was tested with an application which basically does a few hundred
thousand hipMemcpy() calls in a loop. It should also fix the issue with
pannotia BC in fullsystem mode.

Change-Id: I53ebdcc6b02fb4eb4da435c9a509544066a97069
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65351
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
(cherry picked from commit c8d687b05c)
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65451
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
2022-11-11 00:11:21 +00:00
Matthew Poremba
752b696883 dev-amdgpu: Fix SDMA trap ring ID, context
SDMA traps are used in the driver as a DMA fence. To pass a fence, the
SDMA sends the driver the interrupt context from a trap packet and the
ring ID which specifies which queue in the SDMA engine is passing a
fence. Currently the interrupt context is using the wrong value in the
packet and the ring ID is hard-coded to always be the gfx queue.

This changeset uses the correct interrupt context from the SDMA packet
and sets the ring ID to either 0 if the gfx queue is currently being
processed or 3 if the page queue is being processed.

The relevant interrupt service routine in the driver can be found at:
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/roc-4.3.x/
    drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c#L2129

Change-Id: Ie4a4a9d6ab1d3bf83bf76bb57a02a91100217b51
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65093
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-11-01 15:34:08 +00:00
Matthew Poremba
7b16b17e61 dev-amdgpu: Chunkify SDMA copies that use device memory
The current implementation of SDMA copy calls the GPU memory manager's
read/write method one time passing a physical address as the
source/destination. This implicitly assumes the physical addresses are
contiguous which is generally not true for large allocations. This
results in reading from/writing to the wrong address.

This changeset fixes the problem by copying large copies in chunks of
the minimum possible page size on the GPU (4kB). Each page is translated
seperately to ensure the correct physical address. The final copy "done"
callback is only used for the last transfer. The transfers should
complete in order so the copy command will not complete until all chunks
have been copied. Tested and verified on an application with a large
allocation (~5GB).

Change-Id: I27018a963da7133f5e49dec13b0475c3637c8765
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64752
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-10-31 14:30:24 +00:00
Matthew Poremba
a648be2338 dev-amdgpu: Add an SDMA data debug flag
This debug flag is used to print spammy SDMA DPRINTFs, such as an SDMA
copy printing the data of large transfers 8 bytes per line at a time. For
those prints, the SDMAEngine flag will now only print the first and last
qword of the transfer and the new SDMAData flag is needed for verbose
data printing. This makes the SDMAEngine flag still useful for verifying
copies in applications with predictable data such as square.

Additionally, the memory allocation/deallocation done solely for a print
statement is removed in favor of casting the data to the printed type.

Change-Id: I18c1918ef9085cca4570f79881ee63d510ccc32f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64452
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2022-10-13 20:17:00 +00:00
Matthew Poremba
6c935657fd dev-amdgpu: Implement SDMA atomic packet
SDMA atomic packets are used in conjunction with RLC queues in SDMA for
synchronization similar to how HSA signals are used with BLIT kernels
when SDMA is disabled. Implement a skeleton of the SDMA atomic packet
methods as well as the atomic add64 operation.

The atomic add operation appears to be the only operation used in ROCm,
so this implementation is fairly complete. See:

https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/
    rocm-4.2.x/src/core/runtime/amd_blit_sdma.cpp#L880

Change-Id: I62cc337f2ffe590bdb947b48053760ee8b3a6f32
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63174
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-09-09 04:13:49 +00:00
Matthew Poremba
9ea28bd782 dev-amdgpu: Implement SDMA RLC queue unmapping
The unmap queues packet specifies all non-static queues should be
unmapped which includes RLC queues in the SMDA. This functionality did
not exist before and is added in this changeset.

Fixes bug with rodinia_3.0/hip/bfs.

Change-Id: I80ca8cf8d89559625b5870745889b0a27916635e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63173
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2022-09-09 04:13:49 +00:00
Matthew Poremba
af4251f6ae dev-amdgpu: Rework SDMA RLC queue data structure
There can only ever be two RLC queues maximum. Use this information for
a simpler data structure to store doorbell information. The patch
changes the std::unordered_map previously used to std::array. This will
also be useful in avoiding erase-while-iterating issues needed to
unregister all queues at once.

Change-Id: I95600e40de51cb1a992a20bcebaf7580ea4d0be8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63172
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
2022-09-09 04:13:49 +00:00
Matthew Poremba
a5dfb0718d dev-amdgpu: Add user-mode TranslationGen to SDMA
RLC queue do translation using user mode addresses. To support this, add
the final aperture translation needed to the SDMA engine.

Change-Id: I25841e240e3b44f66d26d503ab52b54379daa49a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63032
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
2022-09-09 04:13:49 +00:00
Matthew Poremba
e0e2806fc4 dev-amdgpu: Add SDMA device translation helper
Adding a helper function to remove duplicate code in the copy packet
methods. Adds more comments on that code to explain what it is doing.
This could in theory also be used in other packets in the future.

Change-Id: Id0ed50c87260a2f12f53cb14e927f8c49bb99072
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/62718
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
2022-09-09 04:13:49 +00:00
Matthew Poremba
3465ff1e7d dev-amdgpu: Add callbacks for all SDMA GPUMemMgr reqs
SDMA write, copy, and ptePde use GPUMemMgr to write to device memory and
were dangerously not waiting for write completion which could result in
data not being completely written to memory, the data buffer being freed
and potentially reused in the simulator, or advancing to the next SDMA
packet before the previous one is complete.

This changeset adds callbacks for the corresponding "done" methods
similar to what the dmaVirt methods call when reading or writing to host
memory to fix this issue.

Change-Id: I44ce14c13f812ea2a7a76438e12a6ed7c6e0bff0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/62715
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-09-03 16:05:58 +00:00
Matthew Poremba
432329c853 dev-amdgpu: Allow device address source for SDMA COPY
Now that the memory manager can DMA read from device memory, allow the
linear copy SDMA packet to use device memory as a source. This is used
when copying memory from device to host when SDMA engines are enabled.
This improves simulation performance over using (simulated) BLIT kernels
with SDMA engines disabled.

Change-Id: I1f41b294022f0049d154a401c1dc885abb4f223b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/62713
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-09-03 16:05:58 +00:00
Matthew Poremba
1be246bbe3 dev-amdgpu: Add PM4PP, VMID, Linux definitions
The PM4 packet processor is handling all non-HSA GPU packets such
as packets for (un)mapping HSA queues. This commit pulls many
Linux structs and defines out into their own files for clarity.
Finally, it implements the VMID related functions in AMDGPU device.

Change-Id: I5f0057209305404df58aff2c4cd07762d1a31690
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53068
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-03-24 14:59:57 +00:00
Alexandru Dutu
f1772d3505 dev-amdgpu: Add SDMAEngine and GPU device methods
SDMAEngine handles copies to device memory. This commit
updates sdma_packets.hh style as well. Added several methods needed by
SDMAEngine to GPU device including GART table, various getters, and
aperture range checkers. Move the MMIO interface from GPUController to
SDMAEngine. Create an SDMA MMIO and commands header with only the macros
we use so that we don't need to check in multi-thousand line header
files from the linux kernel. Keep SOC15 IH client ID macros as that file
is small.

Change-Id: I986fede90cc1bc16ee56d4e8598cf9283bde034e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53064
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2022-03-24 14:59:57 +00:00