derek/gem5

Author	SHA1	Message	Date
Matthew Poremba	2703fb5699	gpu-compute: Fix valgrind memleak complaints Fixes several memory leaks, mostly of small and medium severity. Fixes mismatched new/new[] and delete/delete[] calls. Change-Id: Iedafc409389bd94e45f330bc587d6d72d1971219	2024-05-03 14:29:31 -07:00
Matthew Poremba	823b5a6eb8	dev-amdgpu: Support multiple CPs and MMIO AddrRanges Currently gem5 assumes that there is only one command processor (CP) which contains the PM4 packet processor. Some GPU devices have multiple CPs, which the driver tests individually during POST to determine whether they are used. Therefore, these additional CPs need to be supported. This commit allows for multiple PM4 packet processors which represent multiple CPs. Each of these processors will have its own independent MMIO address range. To more easily support ranges, the MMIO addresses now use AddrRange to index a PM4 packet processor instead of the hard-coded constexpr MMIO start and size pairs. By default, only one PM4 packet processor is created, meaning the functionality of the simulation is unchanged for devices currently supported in gem5. Change-Id: I977f4fd3a169ef4a78671a4fb58c8ea0e19bf52c	2024-03-21 10:13:55 -05:00
Matthew Poremba	998709d4fc	dev-amdgpu: Improve PM4 write data packet The write data packet can write multiple dwords but currently always assumes there is one dword, which can cause some write data to be missed. This case is not common, but the number of dwords is implicitly defined in the PM4 header. This changeset passes the PM4 header to write data so that the correct number of dwords can be determined. For now we assume no page crossing when writing multiple dwords as the driver should be checking for that. Change-Id: I0e8c3cbc28873779f468c2a11fdcf177210a22b7	2024-03-21 10:10:01 -05:00
Matthew Poremba	c045c68540	dev-amdgpu: Add node_id to interrupt handler The ROCm 6.0 driver adds a node_id field to interrupts which must match before passing on the interrupt to be cleared by the cookie from gem5's interrupt handler implementation. Add this field and enable it for gfx942. The usage of the field can be seen in event_interrupt_isr_v9_4_3 at https://github.com/ROCm/ROCK-Kernel-Driver/blob/roc-6.0.x/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c#L449 Change-Id: Iae8b8f0386a5ad2852b4a3c69f2c161d965c4922	2024-03-21 10:10:01 -05:00
Matthew Poremba	9e6a87e67a	dev-amdgpu: Writeback PM4 queue rptr when empty (#597) The GPU device keeps a local copy of each ring buffer's read pointer (rptr) to avoid constant DMAs to/from host memory. This means it needs to be periodically updated on the host side, as the driver uses this to determine how much space is left in the queue and may hang if it believes the queue is full. For user-mode queues, this already happens when queues are unmapped. For kernel-mode queues (e.g., HIQ, KIQ) the rptr is never updated, leading to a hang. In this patch the rptr for all queues is reported back to the kernel whenever the queue reaches an empty state (rptr == wptr). Additionally, to handle PM4 queue wrap-around, the queue processing function checks if the queue is not empty instead of rptr < wptr. This is safe because the driver fills PM4 queues with NOP packets on initialization and when wrap around occurs. Change-Id: Ie13a4354f82999208a75bb1eaec70513039ff30f	2023-11-27 11:02:11 -08:00
Matthew Poremba	37da1c45f3	dev-amdgpu: Better handling for queue remapping The amdgpu driver can, at any time, tell the device to unmap a queue to force the queue descriptor to be written back to main memory in the form of a memory queue descriptor (MQD). It will then immediately remap the queue and continue writing the doorbell to the queue. It is possible that the doorbell write occurs after the queue is unmapped but before it is remapped. In this situation, we need to check the updated value of the doorbell for the queue and write that to the queue after it is mapped. To handle this, a pending doorbell packet map is created to hold a packet to replay when the queue is mapped. Because PCI in gem5 implements only the atomic protocol port, we cannot use the original packet as it must respond in the same Tick. This patch fixes the doorbell maps not being cleared on unmapping, so that the doorbell is not found in writeDoorbell and is instead placed in the pending doorbell map. This includes fixing the doorbell offset value in the doorbell to VMID map, which is now multiplied by four as it is a dword address. This was tested using tensorflow 2.0's MNIST example which was seeing this issue consistently. With this patch it now makes progress and does issue pending doorbell writes. Change-Id: Ic6b401d3fe7fc46b7bcbf19a769cdea6814e7d1e	2023-11-01 14:52:39 -05:00
Vishnu Ramadas	f69191a31d	dev-amdgpu: Remove duplicate writes to PM4 queue pointers During checkpoint restoration, the unserialize() function writes rptr, wptr, and indirect buffer rptr, wptr to PM4 queue's rptr, wptr fields. This commit updates this to write only the relevant pointers to the queue structure. If indirect buffers are used, then it writes only the indirect buffer pointers to the queue. If they are not used, then it writes rptr, wptr values to the queue. Change-Id: Iedb25a726112e1af99cc1e7bc012de51c4ebfd45	2023-10-02 19:37:46 -05:00
Vishnu Ramadas	107e05266d	dev-amdgpu: Add aql, hsa queue information to checkpoint-restore GPUFS uses aql information from PM4 queues to initialize doorbells. This commit adds aql information to the checkpoint so that it can be used during restoration to correctly initialize all doorbells. Additionally, this commit also sets the hsa queue correctly during checkpoint-restoration Change-Id: Ief3ef6dc973f70f27255234872a12c396df05d89	2023-10-02 19:02:50 -05:00
Matthew Poremba	6b4a1020be	configs,dev-amdgpu: GPUFS MI200/gfx90a support Add support for MI200-like device. This includes adding PCI IDs and new MMIOs for the device, a different MAP_PROCESS packet, and a different calculation for the number of VGPRs. Change-Id: I0fb7b3ad928826beaa5386d52a94ba504369cb0d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70317 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-05-25 19:14:32 +00:00
Vishnu Ramadas	f5af8b5876	dev-amdgpu: Add a few MQD attributes to GPUFS checkpoint During GPUFS checkpoint restore, doorbell callbacks are created based on certain MQD attributes. These callbacks are required to create new SDMA doorbells. If these attributes are not present in the checkpoint, the restore hangs indefinitely waiting for ioctl calls that access these doorbells to finish execution. This commit adds the attributes required for checkpoint restore to proceed. Change-Id: Id3d1b7a2627d4c50133d923096495957a233f675 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70077 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matthew Poremba <matthew.poremba@amd.com>	2023-04-27 21:15:46 +00:00
Vishnu Ramadas	65e0bd6eb4	dev-amdgpu: Added PM4MapQueues to GPUFS checkpoint The GPUFS checkpoint restoration mechanism expects to find a PM4MapQueues packet in the checkpoint. Since this was not being checkpointed, the restore phase retrieved a null packet which led to a segmentation fault. This commit adds PM4MapQueues to the checkpoint and restores it when deserializing the checkpoint Change-Id: Ib74a9f36fe89d740a74f94314ada41ecc363abe9 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69298 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>	2023-04-03 22:28:57 +00:00
Matthew Poremba	eee42275ee	dev-amdgpu: Writeback RLC queue MQD when unmapped Currently when RLC queues (user mode queues) are mapped, the read/write pointers of the ring buffer are set to zero. However, these queues could be unmapped and then remapped later. In that situation the read/write pointers should keep their values from before unmapping occurred. Since the read pointer gets reset to zero, the queue begins reading from the start of the ring, which usually contains older packets. There is a 99% chance those packets contain addresses which are no longer in the page tables, which will cause a page fault. To fix this we update the MQD with the current read/write pointer values and then write back the MQD to memory when the queue is unmapped. This requires adding a pointer to the MQD and the host address of the MQD where it should be written back to. The interface for registering RLC queues is also simplified. Since we need to pass the MQD anyway, we can get values from it as well. Fixes b+tree and streamcluster from rodinia (when using RLC queues). Change-Id: Ie5dad4d7d90ea240c3e9f0cddf3e844a3cd34c4f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65791 Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2022-12-01 21:04:05 +00:00
Matthew Poremba	623e2d3dac	dev-amdgpu: Handle ring buffer wrap for PM4 queue Change-Id: I27bc274327838add709423b072d437c4e727a714 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65431 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2022-11-09 15:47:50 +00:00
Matthew Poremba	c8d687b05c	dev-amdgpu: Fix SDMA ring buffer wrap around The current SDMA wrap around handling only considers the ring buffer location as seen by the GPU. Eventually, when the end of the SDMA ring buffer is reached, the driver waits until the rptr written back to the host catches up to what the driver sees before wrapping around back to the beginning of the buffer. This writeback currently does not happen at all, causing hangs for applications with a lot of SDMA commands. This changeset first fixes the sizes of the queues, especially RLC queues, so that the wrap around occurs in the correct place. Second, we now store the rptr writeback address and the absolute (unwrapped) rptr value in each SDMA queue. The absolute rptr is what the driver sends to the device and what it expects to be written back. This was tested with an application which basically does a few hundred thousand hipMemcpy() calls in a loop. It should also fix the issue with pannotia BC in fullsystem mode. Change-Id: I53ebdcc6b02fb4eb4da435c9a509544066a97069 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65351 Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2022-11-09 04:11:35 +00:00
Matthew Poremba	489074fbfd	dev-amdgpu: Fix issues with PM4 queue map, fences The PM4 release_mem packet is used as a DMA fence in the driver. It specifies which queue the interrupt came from by encoding the me, pipe, and queue fields from the map_queue packet into the interrupt ring ID. Currently these fields are incorrect because (1) the order in the bitfield is backwards, (2) the queue constructor assigns a pointer to the PM4MapQueue packet containing this data to the dmaBuffer which gets deleted in short order, and (3) the order of the encoding of ring ID is incorrect. This change fixes these issues by (1) placing the struct values in the correct order, (2) creating a const copy of the dmaBuffer on construction, and (3) using the ring ID encoding expected by the driver: https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/roc-4.3.x/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c#L5989 Change-Id: I72c382980e57573f8a8a6879912c4139c7e2f505 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65095 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2022-11-01 15:34:17 +00:00
Matthew Poremba	c5feca8251	dev-amdgpu: Rework PM4 NOP packet The PM4 NOP header is used to insert spaces in the PM4 ring and can therefore be any size. This includes zero. A size of zero is denoted by a value of 0x3fff in the NOP packet header. Currently we assume this means the remainder of the PM4 queue up to the wptr is empty/NOPs. This is not always true. This changeset reworks the PM4 NOP packet to handle the value of 0x3fff as a special value and advances the rptr by 0 bytes. This fixes issues where there were additional packets in the queue which were being skipped over by fast forwarding. Since those packets could be anything, that leads to undefined behavior afterwards. Change-Id: I3f5c3f4b7dd50f93ba503fea97454a9d41771e30 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65094 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2022-11-01 15:34:08 +00:00
Matthew Poremba	b623d26543	dev-amdgpu: Fix interrupt call for release mem Both the client id and source id are incorrect for the release mem CP packet. This changeset sets both to the correct value and adds asserts that the value is declared in the client ID and source ID enums. Change-Id: I4cc6c3a5f2a482e8f7dcd2a529c4a69bf71742c0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63177 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2022-09-09 04:13:49 +00:00
Matthew Poremba	4211962f8c	dev-amdgpu: Fix translation reading SDMA MQD ("RLC queue") The RLC queue MQD address is a GART address, not a system address, so it must be translated through the GART first. Change-Id: Ie52b0e65ebf57141b8ba6f88a49989813750eeec Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/62711 Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2022-09-03 16:05:58 +00:00
Matthew Poremba	68115460d8	gpu-compute: Set LDS and Scratch apertures in FS The LDS and scratch aperture base and limits are hardcoded to some values that are useful for SE mode. In reality, these are chosen by the driver, so we need to honor whatever values the driver passes so that calculated addresses fall into the correct aperture and flat instructions are routed to those apertures. This overwrites the default hardcoded values for LDS and scratch base and limit using the values provided by the driver in a MAP_PROCESS packet. Change-Id: I0e194a26631f697819d8aaecf1bf346a7b7c7026 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61656 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2022-07-28 14:10:33 +00:00
Matthew Poremba	f65f5a8981	gpu-compute,arch-vega: Overhaul HWRegs, setreg, getreg These instructions are supposed to read/write special shader hardware registers. Currently they read from/write to an SGPR. This results in getting incorrect registers at best and clobbering an SGPR being used by an application at worst. Furthermore, some registers need to be set in the shader and the application will never (can never) set them. This patch overhauls the getreg/setreg instructions to use different storage in the shader. The values will be updated either via setreg from an application (e.g., mode register) or set by a PM4 MAP_PROCESS. Change-Id: Ie5e5d552bd04dc47f5b35b5ee40a569ae345abac Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61655 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2022-07-28 14:10:33 +00:00
Matthew Poremba	54d2438066	dev-amdgpu: Removed hardcoded AQL queue size The AQL queue size is currently hardcoded to 64kB. For longer-running applications, this causes the circular queue to wrap before reaching the real end of the queue. Add the computation for queue size instead. Previously, longer applications (e.g., bc in pannotia) were hanging at around 4k kernels. With this change, the application launches 10k+ kernels. Change-Id: I6c31677c1799a3c9ce28cf4e7e79efcb987e3b7f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/59449 Maintainer: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2022-05-07 03:47:06 +00:00
Matthew Poremba	e3f65393fd	dev-amdgpu,arch-vega: Implement TLB invalidation logic Add logic to collect pointers to all GPU TLBs in full system. Implement the invalidate TLBs PM4 packet. The invalidate is done functionally since there is really no benefit to simulating it with timing and there is no support in the TLB to do so. This allows applications with much larger data sets, which may reuse device memory pages, to work in gem5 without crashing due to a stale translation left over in the TLB. Change-Id: Ia30cce02154d482d8f75b2280409abb8f8375c24 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58470 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-08 17:12:32 +00:00
Bobby R. Bruce	ea9b7ef6a2	dev-amdgpu: Add braces to stop clang compilation braces error Additional braces are needed due to a clang compilation bug that falsely throws a "suggest braces around initialization of subobject" error. More info on this bug is available here: https://stackoverflow.com/questions/31555584 Change-Id: Ide5cdd260716ba06f6da4663732e39d18e00af97 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58150 Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-25 13:40:04 +00:00
Matthew Poremba	1be246bbe3	dev-amdgpu: Add PM4PP, VMID, Linux definitions The PM4 packet processor is handling all non-HSA GPU packets such as packets for (un)mapping HSA queues. This commit pulls many Linux structs and defines out into their own files for clarity. Finally, it implements the VMID related functions in AMDGPU device. Change-Id: I5f0057209305404df58aff2c4cd07762d1a31690 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53068 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-24 14:59:57 +00:00

24 Commits
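Commit 2703fb5699 mentions mismatched new/new[] and delete/delete[] pairs. The sketch below shows one common way to rule out that class of bug: let `std::unique_ptr<uint8_t[]>` own array allocations so `delete[]` is always used. `ScratchBuffer` is a hypothetical class for illustration, not gem5 code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical owner of a byte buffer (not a gem5 class).
// std::unique_ptr<uint8_t[]> always releases with delete[], so a
// new[]/delete mismatch cannot occur and nothing leaks on early return.
class ScratchBuffer
{
  public:
    explicit ScratchBuffer(std::size_t bytes)
        : size_(bytes), data_(std::make_unique<uint8_t[]>(bytes))
    {}

    uint8_t *data() { return data_.get(); }
    std::size_t size() const { return size_; }

  private:
    std::size_t size_;
    std::unique_ptr<uint8_t[]> data_; // value-initialized, i.e. zeroed
};
```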
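Commit 823b5a6eb8 replaces fixed MMIO start/size constants with address ranges, one per PM4 packet processor. A minimal sketch of that routing idea follows, assuming half-open ranges and a hypothetical `MmioRouter` keyed by range start; `SimpleRange` is a simplified stand-in, not gem5's actual `AddrRange`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Simplified stand-in for an address range: [start, end).
struct SimpleRange
{
    uint64_t start;
    uint64_t end;
    bool contains(uint64_t a) const { return a >= start && a < end; }
};

// Hypothetical router: each PM4 packet processor (one per CP) owns an
// independent MMIO range. Ranges are keyed by start address, so the
// owner of an address is found with one ordered-map lookup.
class MmioRouter
{
  public:
    void add(SimpleRange r, int processorId) { ranges_[r.start] = {r, processorId}; }

    // Returns the id of the processor whose range contains addr, or -1.
    int lookup(uint64_t addr) const
    {
        auto it = ranges_.upper_bound(addr); // first range starting past addr
        if (it == ranges_.begin())
            return -1;
        --it; // last range starting at or before addr
        return it->second.range.contains(addr) ? it->second.id : -1;
    }

  private:
    struct Entry
    {
        SimpleRange range;
        int id;
    };
    std::map<uint64_t, Entry> ranges_;
};
```

Keying by start address assumes the ranges do not overlap, which holds if each CP has an independent range as the commit describes.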
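Commit 998709d4fc derives the WRITE_DATA payload length from the PM4 header. A sketch of that arithmetic, assuming the usual PM4 type-3 header layout (count in bits [29:16] holding the number of body dwords minus one) and a WRITE_DATA body of one control dword plus two address dwords before the data; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed PM4 type-3 header layout: bits [31:30] packet type,
// [29:16] count (body dwords minus one), [15:8] opcode.
struct Pm4Header
{
    uint32_t raw;
    uint32_t type() const { return (raw >> 30) & 0x3; }
    uint32_t count() const { return (raw >> 16) & 0x3fff; }
    uint32_t opcode() const { return (raw >> 8) & 0xff; }
};

// Assumes the body is: control, dst_addr_lo, dst_addr_hi, data...
// so the number of data dwords is (count + 1) - 3.
uint32_t writeDataPayloadDwords(Pm4Header h)
{
    const uint32_t bodyDwords = h.count() + 1;
    const uint32_t fixedDwords = 3; // control + addr_lo + addr_hi
    return bodyDwords > fixedDwords ? bodyDwords - fixedDwords : 0;
}

// Test helper that builds a type-3 header (0x37 is WRITE_DATA).
Pm4Header makeType3Header(uint32_t opcode, uint32_t count)
{
    return Pm4Header{(3u << 30) | ((count & 0x3fffu) << 16) | ((opcode & 0xffu) << 8)};
}
```

With a count of 3 the body has four dwords, of which one is data; the single-dword case that was previously assumed is just the smallest instance.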
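Commit 9e6a87e67a switches the processing loop from `rptr < wptr` to "queue not empty" and writes the rptr back once the queue drains. The hypothetical ring model below, which is not gem5's queue class, shows why: once the writer wraps, `wptr` can be numerically below `rptr` while packets are still pending.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical ring model. rptr and wptr are byte offsets already
// wrapped into [0, size). After the writer wraps, wptr can be below
// rptr while work is pending, so the loop tests rptr != wptr.
struct Ring
{
    uint64_t size = 0;
    uint64_t rptr = 0;
    uint64_t wptr = 0;
    std::function<void(uint64_t)> writebackRptr; // reports rptr to the host

    bool empty() const { return rptr == wptr; }

    // Consumes fixed-size packets until empty, then reports the final
    // rptr. Assumes packetBytes evenly divides the distance to wptr.
    int drain(uint64_t packetBytes)
    {
        int processed = 0;
        while (!empty()) {
            rptr = (rptr + packetBytes) % size;
            ++processed;
        }
        if (writebackRptr)
            writebackRptr(rptr);
        return processed;
    }
};
```

In a 256-byte ring with `rptr = 224` and `wptr = 32`, a `rptr < wptr` check would process nothing, while `drain(32)` consumes two packets and reports 32.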
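Commit 37da1c45f3 parks doorbell writes that arrive between an unmap and the following remap, then replays them once the queue is mapped again. The sketch below models that flow with hypothetical names; as a simplifying assumption it stores only the latest doorbell value, whereas the commit describes keeping a copy of the packet to replay.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical model of doorbell handling across unmap/remap.
class DoorbellTracker
{
  public:
    // The doorbell index is a dword index; the byte offset is index * 4.
    static uint32_t dwordToByteOffset(uint32_t dwordIndex) { return dwordIndex * 4; }

    void mapQueue(uint32_t offset, int queueId)
    {
        mapped_[offset] = queueId;
        auto it = pending_.find(offset);
        if (it != pending_.end()) {
            lastWptr_[queueId] = it->second; // replay the parked write
            pending_.erase(it);
        }
    }

    // Clearing the map on unmap is what routes later writes to pending_.
    void unmapQueue(uint32_t offset) { mapped_.erase(offset); }

    void writeDoorbell(uint32_t offset, uint64_t wptr)
    {
        auto it = mapped_.find(offset);
        if (it == mapped_.end())
            pending_[offset] = wptr; // keep only the newest value
        else
            lastWptr_[it->second] = wptr;
    }

    uint64_t wptrOf(int queueId) const
    {
        auto it = lastWptr_.find(queueId);
        return it == lastWptr_.end() ? 0 : it->second;
    }

    bool hasPending(uint32_t offset) const { return pending_.count(offset) != 0; }

  private:
    std::unordered_map<uint32_t, int> mapped_;     // offset -> queue id
    std::unordered_map<uint32_t, uint64_t> pending_;
    std::unordered_map<int, uint64_t> lastWptr_;   // queue id -> wptr
};
```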
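Commit eee42275ee saves the ring pointers into the MQD on unmap and writes the MQD back to host memory, so a later remap resumes where it left off instead of at zero. A reduced sketch, assuming a hypothetical MQD that holds only `rptr`/`wptr` and a map standing in for host memory:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical, heavily reduced MQD that keeps only the ring pointers.
struct Mqd
{
    uint64_t rptr = 0;
    uint64_t wptr = 0;
};

// Stand-in for host memory: MQDs stored by their host address.
using HostMemory = std::unordered_map<uint64_t, Mqd>;

struct RlcQueue
{
    uint64_t mqdHostAddr = 0; // where the MQD lives in host memory
    uint64_t rptr = 0;
    uint64_t wptr = 0;

    // On (re)map, resume from the pointers saved in the MQD, if any,
    // instead of restarting at zero and replaying stale packets.
    void map(const HostMemory &mem)
    {
        auto it = mem.find(mqdHostAddr);
        if (it != mem.end()) {
            rptr = it->second.rptr;
            wptr = it->second.wptr;
        }
    }

    // On unmap, copy the live pointers into the MQD and write it back.
    void unmap(HostMemory &mem) const
    {
        Mqd m;
        m.rptr = rptr;
        m.wptr = wptr;
        mem[mqdHostAddr] = m;
    }
};
```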
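Commit c8d687b05c distinguishes the absolute (unwrapped) rptr the driver tracks from the wrapped position the device reads. A hypothetical sketch of keeping both views in one SDMA queue model, where the fetch offset is derived and the absolute value is what gets written back:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical SDMA queue model. The driver hands the device an
// absolute (never wrapped) rptr and expects that same value to be
// written back to rptrWbAddr; the device fetches at the wrapped offset.
struct SdmaQueue
{
    uint64_t ringBytes = 0;
    uint64_t absRptr = 0;    // unwrapped, as the driver sees it
    uint64_t rptrWbAddr = 0; // host address for the writeback

    uint64_t fetchOffset() const { return absRptr % ringBytes; }
    void consume(uint64_t bytes) { absRptr += bytes; }
    uint64_t writebackValue() const { return absRptr; }
};
```

Writing back `fetchOffset()` instead would make the driver see the pointer jump backwards after a wrap, which matches the kind of stall the commit describes.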
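Commit 489074fbfd encodes the `me`, `pipe`, and `queue` fields into the interrupt ring ID in the order the driver expects. The masks below reflect our reading of the decode in the linked `gfx_v9_0.c` (me in bits [3:2], pipe in bits [1:0], queue in bits [6:4]); they are an assumption to check against that source, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Packs me/pipe/queue into a ring ID using the assumed driver layout.
uint8_t encodeRingId(unsigned me, unsigned pipe, unsigned queue)
{
    return static_cast<uint8_t>(((queue & 0x7u) << 4) | ((me & 0x3u) << 2) |
                                (pipe & 0x3u));
}

struct RingIdFields
{
    unsigned me, pipe, queue;
};

// Inverse of encodeRingId, mirroring the assumed driver-side decode.
RingIdFields decodeRingId(uint8_t ringId)
{
    return {(ringId & 0x0cu) >> 2, ringId & 0x03u, (ringId & 0x70u) >> 4};
}
```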
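Commit c5feca8251 treats a NOP count of 0x3fff as a special value and advances the rptr by 0 bytes, rather than fast-forwarding to `wptr`. A sketch of that rule, assuming the header dword has already been consumed and that other counts follow the usual "body dwords minus one" convention; `nopBodyBytes` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Special NOP count value meaning "no body".
constexpr uint32_t kNopEmptyBody = 0x3fff;

// Bytes of NOP body to skip after the header dword has been consumed.
uint64_t nopBodyBytes(uint32_t count)
{
    if (count == kNopEmptyBody)
        return 0; // do not jump to wptr; later packets stay visible
    return (static_cast<uint64_t>(count) + 1) * sizeof(uint32_t);
}
```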
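Commit 68115460d8 routes flat addresses using LDS and scratch aperture bounds supplied by the driver in MAP_PROCESS, rather than SE-mode defaults. A minimal classification sketch, with hypothetical names and the assumption that limits are inclusive; the bounds in the test are made-up illustration values.

```cpp
#include <cassert>
#include <cstdint>

enum class Aperture { Lds, Scratch, Global };

// Hypothetical per-process aperture bounds, filled from MAP_PROCESS.
struct ApertureConfig
{
    uint64_t ldsBase = 0, ldsLimit = 0;
    uint64_t scratchBase = 0, scratchLimit = 0;
};

// Routes a flat address to the aperture that contains it; anything
// outside both apertures is treated as global memory.
Aperture classify(const ApertureConfig &c, uint64_t addr)
{
    if (addr >= c.ldsBase && addr <= c.ldsLimit)
        return Aperture::Lds;
    if (addr >= c.scratchBase && addr <= c.scratchLimit)
        return Aperture::Scratch;
    return Aperture::Global;
}
```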
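Commit f65f5a8981 gives getreg/setreg their own hardware-register storage instead of an SGPR. The sketch assumes the GCN/Vega SIMM16 operand layout (bits [5:0] register id, [10:6] bit offset, [15:11] size minus one) and uses a hypothetical `HwRegFile`; it illustrates bitfield access to separate storage, not gem5's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Decoded SIMM16 operand of s_getreg/s_setreg (assumed layout).
struct HwRegField
{
    unsigned id, offset, size;

    static HwRegField decode(uint16_t simm16)
    {
        return {simm16 & 0x3fu, (simm16 >> 6) & 0x1fu,
                ((simm16 >> 11) & 0x1fu) + 1};
    }

    uint32_t mask() const
    {
        const uint32_t low = size >= 32 ? 0xffffffffu : (1u << size) - 1;
        return low << offset;
    }
};

// Hypothetical dedicated storage, separate from the SGPR file, so
// getreg/setreg can no longer clobber an application's SGPR.
class HwRegFile
{
  public:
    uint32_t getreg(uint16_t simm16) const
    {
        const HwRegField f = HwRegField::decode(simm16);
        return (regs_[f.id] & f.mask()) >> f.offset;
    }

    void setreg(uint16_t simm16, uint32_t value)
    {
        const HwRegField f = HwRegField::decode(simm16);
        regs_[f.id] = (regs_[f.id] & ~f.mask()) | ((value << f.offset) & f.mask());
    }

    uint32_t raw(unsigned id) const { return regs_.at(id); }

  private:
    std::array<uint32_t, 64> regs_{};
};
```

A 4-bit field at offset 4 of register 1 masks a write of 0x1f down to 0xf, stored as 0xf0 in the raw register.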
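Commit e3f65393fd collects pointers to every GPU TLB so an invalidate packet can flush them functionally, with no timing model. A minimal sketch with hypothetical `SimpleTlb` and `TlbRegistry` classes; the registry holds non-owning pointers, so the TLBs must outlive it.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical TLB mapping virtual page numbers to physical ones.
class SimpleTlb
{
  public:
    void insert(uint64_t vpn, uint64_t ppn) { entries_[vpn] = ppn; }
    bool hit(uint64_t vpn) const { return entries_.count(vpn) != 0; }
    void flush() { entries_.clear(); }

  private:
    std::unordered_map<uint64_t, uint64_t> entries_;
};

// Non-owning list of all GPU TLBs; invalidateAll() flushes each one
// immediately, i.e. functionally rather than as timed requests.
class TlbRegistry
{
  public:
    void add(SimpleTlb *tlb) { tlbs_.push_back(tlb); }
    void invalidateAll()
    {
        for (SimpleTlb *tlb : tlbs_)
            tlb->flush();
    }

  private:
    std::vector<SimpleTlb *> tlbs_;
};
```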
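Commit ea9b7ef6a2 adds braces to quiet a clang "suggest braces around initialization of subobject" diagnostic. The pattern usually involved is `std::array`, an aggregate that wraps a built-in array; the example below is illustrative and not taken from the commit's diff.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// The outer braces initialize the std::array and the inner braces its
// member array. The single-brace form "= {1, 2, 3}" is valid C++ via
// brace elision, but some clang versions flag it with -Wmissing-braces.
constexpr std::array<uint32_t, 3> kExample = {{1, 2, 3}};

static_assert(kExample.size() == 3, "three elements");
```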