SDMA RLC queues do not currently remove their doorbell mapping. This
can cause issues when re-registering the queue and prevents the pending
doorbells feature from working. In addition, the data value of the
doorbell (the ring buffer rptr) is not saved, leading to undefined
behavior when this workaround is used.
This commit removes the doorbell mapping from the GPU device when the
SDMA engine unmaps an RLC queue and copies the next doorbell value to
the pending packet, as was originally intended.
Change-Id: Ifd551450f439c065579afcf916f8ff192e7598ab
This is the version for MI300. For the most part, it is the same as
MI200 with the exception of architected flat scratch (not yet
implemented in gem5) and therefore a new version enum is required.
Change-Id: Id18cd7b57c4eebd467c010a3f61e3117beb8d58a
SDMA ptePde packets are generating a warning that a GART address is
missing, causing a wrong address to be clobbered by the operation.
This commit fixes this by converting the GART address when the queue is
running in privileged mode, which is the only mode allowed to use GART
addresses. This removes the warnings and writes to the correct memory
region.
Change-Id: I64acac308db2431c5996b876bf4cda704f51cf25
Fixes several memory leaks, mostly of small and medium severity. Fixes
mismatched new/new[] and delete/delete[] calls.
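For illustration, a minimal example of the mismatched-pair bug class
being fixed (names are hypothetical, not taken from the gem5 source):

```cpp
#include <cstdint>

int main()
{
    // Array allocation paired with scalar delete is undefined
    // behavior and a common source of leaks and corruption.
    uint32_t *ring = new uint32_t[64];
    // delete ring;      // WRONG: scalar delete for new[]
    delete [] ring;      // correct: array delete matches new[]

    // Scalar allocation must use scalar delete.
    uint32_t *reg = new uint32_t(0);
    delete reg;          // correct: matches scalar new
    return 0;
}
```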
Change-Id: Iedafc409389bd94e45f330bc587d6d72d1971219
The GPU device currently supports large BAR which means that the driver
can write directly to GPU memory over the PCI bus without using SDMA or
PM4 packets. The gem5 PCI interface only provides an atomic interface
for BAR reads/writes, which means the values cannot go through timing
mode Ruby caches. This causes bugs as the TCC cache is allowed to keep
clean data between kernels for performance reasons. If there is a BAR
write directly to memory bypassing the cache, the value in the cache is
stale and must be invalidated.
In this commit a TCC invalidate is generated for all writes over PCI
that go directly to GPU memory. This will also invalidate TCP along the
way if necessary. This currently relies on the driver synchronization
which only allows BAR writes in between kernels. Therefore, the cache
should only be in I or V state.
To handle a race condition between invalidates and launching the next
kernel, the invalidates return a response and the GPU command processor
will wait for all TCC invalidates to be complete before launching the
next kernel.
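A minimal sketch of the completion-counting idea described above
(hypothetical names; the gem5 implementation is event-driven and
differs in detail):

```cpp
#include <cassert>
#include <functional>

// The command processor counts outstanding TCC invalidates and defers
// the next kernel launch until every response has returned.
class InvalidateTracker
{
  public:
    void issueInvalidate() { ++outstanding; }

    // Called when an invalidate response arrives from the cache.
    void
    invalidateDone()
    {
        assert(outstanding > 0);
        if (--outstanding == 0 && pendingLaunch) {
            auto launch = pendingLaunch;
            pendingLaunch = nullptr;
            launch(); // safe to dispatch: no stale lines remain
        }
    }

    // Defer the kernel launch while invalidates are still in flight.
    void
    tryLaunch(std::function<void()> launch)
    {
        if (outstanding == 0)
            launch();
        else
            pendingLaunch = std::move(launch);
    }

  private:
    int outstanding = 0;
    std::function<void()> pendingLaunch;
};

int main()
{
    InvalidateTracker t;
    t.issueInvalidate();
    t.tryLaunch([] { /* kernel dispatch would happen here */ });
    t.invalidateDone(); // the deferred launch fires here
    return 0;
}
```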
This fixes issues with stale data in nanoGPT and possibly PENNANT.
Change-Id: I8e1290f842122682c271e5508a48037055bfbcdf
Currently gem5 assumes that there is only one command processor (CP)
which contains the PM4 packet processor. Some GPU devices have multiple
CPs, which the driver tests individually during POST to determine
whether they are in use. Therefore, these additional CPs need to be
supported.
This commit allows for multiple PM4 packet processors which represent
multiple CPs. Each of these processors will have its own independent
MMIO address range. To more easily support ranges, the MMIO addresses
now use AddrRange to index a PM4 packet processor instead of the
hard-coded constexpr MMIO start and size pairs.
By default only one PM4 packet processor is created, meaning the
functionality of the simulation is unchanged for devices currently
supported in gem5.
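A rough sketch of the range-based routing (this AddrRange is a
simplified stand-in for gem5's class, and the names are illustrative):

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-in for gem5's AddrRange.
struct AddrRange
{
    uint64_t start, size;
    bool contains(uint64_t a) const { return a >= start && a < start + size; }
};

struct PM4PacketProcessor { AddrRange mmioRange; /* ... */ };

// MMIO accesses find their processor by range lookup instead of
// hard-coded constexpr start/size pairs.
PM4PacketProcessor *
findProcessor(std::vector<PM4PacketProcessor> &cps, uint64_t addr)
{
    for (auto &cp : cps)
        if (cp.mmioRange.contains(addr))
            return &cp;
    return nullptr; // address belongs to no PM4 packet processor
}

int main()
{
    std::vector<PM4PacketProcessor> cps = {{{0x0000, 0x100}},
                                           {{0x1000, 0x100}}};
    return findProcessor(cps, 0x1010) == &cps[1] ? 0 : 1;
}
```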
Change-Id: I977f4fd3a169ef4a78671a4fb58c8ea0e19bf52c
PCIe can read/write any 32-bit address by writing the address to the
PCI index/index2 registers and then reading/writing the corresponding
data/data2 register.
This commit adds this functionality and removes one magic value being
written to support GPU POST. This feature is disabled for Vega10 which
relies on an MMIO trace for too many values to implement in the MMIO
interface.
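A small sketch of the index/data indirection (hypothetical names; a
map stands in for the device's register file):

```cpp
#include <cstdint>
#include <unordered_map>

// The driver first writes a 32-bit address to the INDEX register, then
// accesses the DATA register, which the device forwards to that address.
class IndexDataWindow
{
  public:
    void writeIndex(uint32_t addr) { index = addr; }

    // A DATA write stores to whatever address INDEX currently holds.
    void writeData(uint32_t val) { regs[index] = val; }

    // A DATA read returns the value at the indexed address.
    uint32_t readData() { return regs[index]; }

  private:
    uint32_t index = 0;
    std::unordered_map<uint32_t, uint32_t> regs; // stand-in backing store
};

int main()
{
    IndexDataWindow win;
    win.writeIndex(0x12345);    // select address 0x12345
    win.writeData(0xdeadbeef);  // write through the window
    return win.readData() == 0xdeadbeef ? 0 : 1;
}
```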
Change-Id: Iacfdd1294a7652fc3e60304b57df536d318c847b
The SRBM write packets were previously not required. This commit
implements SRBM writes to set a register by using the new setRegVal
interface. SRBM writes seem to be used for SRIOV enabled devices.
Change-Id: I202653d339e882e8de59d69a995f65332b2dfb8c
The top level AMDGPUDevice currently reads/writes all unknown registers
to/from a map containing the previously written value. This is intended
as a way to handle registers that are not part of the model but the
driver requires for functionality. Since this is at the top level, it
can mask changes to register values which do not go through the same
interface. For example, reading an MMIO, changing it via a PM4 queue,
and reading it again returns the stale cached value.
This commit removes the usage of the regs map in AMDGPUDevice,
implements some important MMIOs that were previously handled by it, and
moves the unknown register handling to the NBIO aperture only. To reduce
the number of additional MMIOs to implement, the display manager in
Vega10 is now disabled.
Change-Id: Iff0a599dd82d663c7e710b79c6ef6d0ad1fc44a2
The SDMA engine can potentially be used to write to the GART address
range. Since gem5 has a shadow copy of the GART table to avoid sending
functional reads to device memory, the GART table must be updated when
copying to the GART range.
This changeset adds a check in the VM for GART range and implements the
SDMA copy packet writing to the GART range. A fatal is added to the
write and ptePde packets, which are the only other two ways to write to
memory, as
using these packets to update the GART table has not been observed.
Change-Id: I1e62dfd9179cc9e987659e68414209fd77bba2bd
The write data packet can write multiple dwords but currently always
assumes there is one dword, which can cause some write data to be
missed. This case is not common, but the number of dwords is implicitly
defined in the PM4 header.
This changeset passes the PM4 header to write data so that the correct
number of dwords can be determined. For now we assume no page crossing
when writing multiple dwords as the driver should be checking for that.
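A sketch of the header-driven copy (the bit layout below is an
assumption for illustration; the authoritative encoding is in the PM4
spec and gem5's headers):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: bit positions are assumptions, not taken
// verbatim from the gem5 PM4 headers.
struct PM4Header
{
    uint32_t raw;

    // Assumed type-3 layout: count in bits [29:16], encoding
    // (number of body dwords - 1).
    uint32_t count() const { return ((raw >> 16) & 0x3fff) + 1; }
};

// Copy the header-defined number of dwords instead of assuming one.
void
writeData(PM4Header hdr, const uint32_t *src, std::vector<uint32_t> &dst)
{
    // Assumes no page crossing, matching the commit's stated assumption.
    for (uint32_t i = 0; i < hdr.count(); ++i)
        dst.push_back(src[i]);
}

int main()
{
    PM4Header hdr{3u << 16};          // count field encodes 4 dwords
    uint32_t payload[4] = {1, 2, 3, 4};
    std::vector<uint32_t> mem;
    writeData(hdr, payload, mem);     // copies all four dwords
    return mem.size() == 4 ? 0 : 1;
}
```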
Change-Id: I0e8c3cbc28873779f468c2a11fdcf177210a22b7
The ROCm 6.0 driver adds a node_id field to interrupts which must match
before passing on the interrupt to be cleared by the cookie from gem5's
interrupt handler implementation. Add this field and enable it for
gfx942.
The usage of the field can be seen in event_interrupt_isr_v9_4_3 at
https://github.com/ROCm/ROCK-Kernel-Driver/blob/roc-6.0.x/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c#L449
Change-Id: Iae8b8f0386a5ad2852b4a3c69f2c161d965c4922
If the NOP count of an SDMA NOP packet goes beyond the wptr address, the
queue decode method will loop infinitely. If a packet comes in with a
bad count, this causes gem5 to hang. This change advances the rptr one
dword at a time until either the NOP count is reached or rptr == wptr,
preventing the hang.
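The bounded skip can be sketched as follows (hypothetical names;
wrap-around handling elided):

```cpp
#include <cstdint>

// Advance rptr one dword at a time, stopping at either the NOP count
// or the write pointer, so a corrupt count can no longer spin forever.
void
skipNop(uint64_t &rptr, uint64_t wptr, uint32_t nopCount)
{
    for (uint32_t i = 0; i < nopCount && rptr != wptr; ++i)
        rptr += sizeof(uint32_t); // one dword per iteration
}

int main()
{
    uint64_t rptr = 0, wptr = 16;
    skipNop(rptr, wptr, 0xffffffff); // bogus count: stops at wptr
    return rptr == wptr ? 0 : 1;
}
```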
Change-Id: Ib2c0f74a477bff27890c9c064bb4190e76e513bd
Related to issue #703 , this PR removes GCN3 related files and updates
source code, documentation, and tests to switch over to Vega if that
was not done already. Highlights are:
- Remove all src/arch/amdgpu/gcn3 files and update Kconfigs.
- Remove references to GCN3 and replace with Vega where applicable.
- Update the build targets in the gcn-gpu Docker. This will need to be
rebuilt but not urgently.
- Remove the GCN3 tag in testlib. Most tests seem to be using Vega
already, so that commit is small.
By default all SDMA queues are privileged queues, meaning the addresses
in SDMA packets use the privileged translation tables. RLC queues
(sometimes called user queues) are not necessarily privileged and might
use user translation tables. RLC queues are used more often in ROCm
6.0, exposing an issue with invalid translations for RLC queues.
This changeset checks the priv bit in the SDMA MQD when an RLC queue is
mapped. Each packet type which uses an address then checks the bit
before performing translation. Tested with daily/weekly tests with a
ROCm 6.0 disk image and tests are passing.
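A minimal sketch of the check (hypothetical names, assuming the priv
bit is latched from the MQD at map time):

```cpp
#include <cstdint>

enum class Tables { Privileged, User };

// The priv bit comes from the SDMA MQD when the RLC queue is mapped.
struct SDMAQueueCtx
{
    bool priv;
};

// Packets that carry an address pick their translation tables here,
// before performing any translation.
Tables
tablesFor(const SDMAQueueCtx &q)
{
    return q.priv ? Tables::Privileged : Tables::User;
}

int main()
{
    SDMAQueueCtx user{false};
    return tablesFor(user) == Tables::User ? 0 : 1;
}
```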
Change-Id: I6122fbc194e8d6f5d38e81f1b0e11646d90e0ea0
Replace instances of "GCN3" with Vega. Remove gfx801 and gfx803. Rename
FIJI to Vega and Carrizo to Raven.
Using misc since there is not enough room to fit all the tags.
Change-Id: Ibafc939d49a69be9068107a906e878408c7a5891
The GPU device keeps a local copy of each ring buffer's read pointer
(rptr) to avoid constant DMAs to/from host memory. This means it needs
to be periodically updated on the host side, as the driver uses this to
determine how much space is left in the queue and may hang if it
believes the queue is full. For user-mode queues, this already happens
when queues are unmapped. For kernel mode queues (e.g., HIQ, KIQ) the
rptr is never updated, leading to a hang.
In this patch the rptr for *all* queues is reported back to the kernel
whenever the queue reaches an empty state (rptr == wptr). Additionally,
to handle PM4 queue wrap-around, the queue processing function checks
whether the queue is non-empty instead of rptr < wptr. This is safe
because the driver fills PM4 queues with NOP packets on initialization
and when wrap-around occurs.
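A small sketch of the wrap-safe processing condition (hypothetical
names and sizes):

```cpp
#include <cstdint>

// Once the write pointer wraps past the end of the ring, rptr < wptr
// is false even though packets remain. Checking for a non-empty queue
// handles both the wrapped and unwrapped cases.
struct PM4Ring
{
    uint64_t rptr = 0, wptr = 0;
    uint64_t size = 0x1000; // ring size in bytes

    bool empty() const { return rptr == wptr; }

    void
    process()
    {
        while (!empty()) { // not: while (rptr < wptr)
            // ... decode one dword of the packet at rptr ...
            rptr = (rptr + sizeof(uint32_t)) % size;
        }
    }
};

int main()
{
    PM4Ring q;
    q.rptr = 0xff8;  // near the end of the ring
    q.wptr = 0x10;   // wrapped: rptr < wptr would skip these packets
    q.process();
    return q.empty() ? 0 : 1;
}
```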
Change-Id: Ie13a4354f82999208a75bb1eaec70513039ff30f
When restoring checkpoints for certain applications, gem5 tries to
create new doorbells with a pre-existing queue ID and simulation crashes
shortly after. This commit adds existing IDs to the GPU device's used
VMID map so that new doorbells are aware of existing queue IDs and use a
new ID. This ensures that queue IDs are unique after checkpoint
restoration.
Change-Id: I9bf89a2769db26ceab4441634ff2da936eea6d6f
The amdgpu driver can, at *any* time, tell the device to unmap a queue
to force the queue descriptor to be written back to main memory in the
form of a memory queue descriptor (MQD). It will then immediately remap
the queue and continue writing the doorbell to the queue. It is possible
that the doorbell write occurs after the queue is unmapped but before it
is remapped. In this situation, we need to check the updated value of
the doorbell for the queue and write that to the queue after it is
mapped.
To handle this, a pending doorbell packet map is created to hold a
packet to replay when the queue is mapped. Because PCI in gem5
implements only the atomic protocol port, we cannot use the original
packet as it must respond in the same Tick. This patch fixes issues
with the doorbell maps not being cleared on unmapping, ensuring the
doorbell is not found in writeDoorbell and is instead placed in the
pending doorbell map. This also fixes the doorbell offset value in the
doorbell-to-VMID map, which is now multiplied by four as it is a dword
address.
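A condensed sketch of the pending-doorbell bookkeeping (hypothetical
names; only the offset/value bookkeeping is shown):

```cpp
#include <cstdint>
#include <map>
#include <set>

// A doorbell write that arrives while its queue is unmapped is
// recorded and replayed once the queue is mapped again. The original
// packet cannot be kept, since the atomic PCI port must respond in the
// same Tick, so only the offset and value are stored.
class DoorbellReplay
{
  public:
    void
    writeDoorbell(uint32_t offset, uint64_t value)
    {
        if (mapped.count(offset)) {
            // Normal path: forward the value to the queue (elided).
        } else {
            pending[offset] = value; // hold the latest value for replay
        }
    }

    void
    mapQueue(uint32_t offset)
    {
        mapped.insert(offset);
        auto it = pending.find(offset);
        if (it != pending.end()) {
            writeDoorbell(offset, it->second); // replay saved doorbell
            pending.erase(it);
        }
    }

    // Unmapping must clear the mapping, or writeDoorbell keeps taking
    // the normal path and the pending map is never populated.
    void unmapQueue(uint32_t offset) { mapped.erase(offset); }

  private:
    std::set<uint32_t> mapped;
    std::map<uint32_t, uint64_t> pending;
};

int main()
{
    DoorbellReplay db;
    db.mapQueue(0x40);
    db.unmapQueue(0x40);
    db.writeDoorbell(0x40, 8); // queue unmapped: value is held
    db.mapQueue(0x40);         // saved doorbell replays here
    return 0;
}
```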
This was tested using tensorflow 2.0's MNIST example which was seeing
this issue consistently. With this patch it now makes progress and does
issue pending doorbell writes.
Change-Id: Ic6b401d3fe7fc46b7bcbf19a769cdea6814e7d1e
Earlier, GPU checkpointing worked only if a checkpoint was created
before the first kernel execution. This pull request adds support for
checkpointing in between any two kernel calls. It does so by doing the
following.
- Adds flush support in the GPU_VIPER protocol
- Adds flush support in the GPUCoalescer
- Updates cache recorder to use the GPUCoalescer during simulation
cooldown and cache warmup times.
The ROCr runtime uses a combination of HSA signal timestamps and
hardware MMIOs to calculate profiling times. At the beginning of an
application a timestamp is read from the GPU using MMIOs. The clock
MMIOs reside in the GFX MMIO region, so a new AMDGPUGfx class is added
to handle these MMIOs.
The timestamp value is expected to be in nanoseconds, so we simply use
the gem5 tick converted to ns.
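A minimal sketch of the conversion, assuming gem5's default tick rate
of one tick per picosecond:

```cpp
#include <cstdint>

// Assumption: default gem5 tick rate of 1 THz (1 tick = 1 ps), so
// nanoseconds are ticks divided by 1000.
constexpr uint64_t TicksPerNs = 1000;

uint64_t
gpuTimestampNs(uint64_t tick)
{
    return tick / TicksPerNs; // truncating tick -> ns conversion
}

int main()
{
    return gpuTimestampNs(5000000) == 5000 ? 0 : 1;
}
```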
Change-Id: I7d1cba40d5042a7f7a81fd4d132402dc11b71bd4
During checkpoint restoration, the unserialize() function writes both
the rptr/wptr and the indirect buffer rptr/wptr to the PM4 queue's rptr
and wptr fields.
This commit updates this to write only the relevant pointers to the
queue structure. If indirect buffers are used, then it writes only the
indirect buffer pointers to the queue. If they are not used, then it
writes rptr, wptr values to the queue.
Change-Id: Iedb25a726112e1af99cc1e7bc012de51c4ebfd45
GPUFS uses AQL information from PM4 queues to initialize doorbells.
This commit adds AQL information to the checkpoint so that it can be
used during restoration to correctly initialize all doorbells.
Additionally, this commit also sets the HSA queue correctly during
checkpoint restoration.
Change-Id: Ief3ef6dc973f70f27255234872a12c396df05d89
It is possible to execute a GPU atomic instruction using a memory
address that is in the host memory space (e.g., HMM, __managed__,
hipHostMalloc'd address). Since these are in host memory they are passed
to the SystemHub DmaDevice. However, this currently executes as a write
packet without modifying data. This leads to hangs in applications that
use atomics for forward progress (e.g., HeteroSync).
It is not clear where these are handled on a real GPU, but they are
certainly not handled by the software stack or the driver, so they must
be handled in hardware and therefore implemented in gem5. Handling for
atomics in the SystemHub makes the most sense.
To make atomics work, a few extra changes need to be made to the
SystemHub. (1) The atomic is implemented as a host memory read, followed
by calling the AtomicOpFunctor, followed by a write. This requires a
second event to handle read response, performing atomic, and issuing a
write. (2) Atomics must be serialized otherwise two atomics might return
the same value which is incorrect. This patch adds serialization logic
for all request types to the same address to handle this. (3) With the
added complexity of the SystemHub, a new debug flag explicitly for
SystemHub is added.
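A condensed, synchronous sketch of points (1) and (2) (hypothetical
names; the gem5 version splits the read and write across events):

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <unordered_map>

// Read host memory, apply the atomic functor, write back, and
// serialize requests to the same address so no two atomics can
// observe the same old value.
using AtomicOp = std::function<uint64_t(uint64_t)>;

class SystemHubAtomics
{
  public:
    // Issue an atomic; if one is already in flight to this address,
    // queue it so it starts only after the current one completes.
    void
    issue(uint64_t addr, AtomicOp op)
    {
        waiting[addr].push_back(std::move(op));
        if (waiting[addr].size() == 1)
            start(addr);
    }

  private:
    void
    start(uint64_t addr)
    {
        // (1) read, (2) apply functor, (3) write back. In gem5 the
        // read response arrives in a second event; here it is inline.
        uint64_t old = mem[addr];
        mem[addr] = waiting[addr].front()(old);
        finish(addr);
    }

    void
    finish(uint64_t addr)
    {
        waiting[addr].pop_front();
        if (!waiting[addr].empty())
            start(addr); // next serialized atomic to the same address
        else
            waiting.erase(addr);
    }

    std::unordered_map<uint64_t, uint64_t> mem; // stand-in host memory
    std::unordered_map<uint64_t, std::deque<AtomicOp>> waiting;
};

int main()
{
    SystemHubAtomics hub;
    // Two fetch-adds to the same lock address: serialization
    // guarantees they observe different old values.
    hub.issue(0x1000, [](uint64_t v) { return v + 1; });
    hub.issue(0x1000, [](uint64_t v) { return v + 1; });
    return 0;
}
```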
Testing done: The heterosync application with input "sleepMutex 10 16 4"
previously hung before this patch. It passes with the patch applied.
This application tests both (1) and (2) above, as it allocates locks
with hipHostMalloc and has multiple workgroups sending an atomic request
in the same Tick, verifying the serialization mechanism.
Change-Id: Ife84b30037d1447dd384340cfeb06fdfd472fff9
The ROCm stack requires PCI express atomics. Currently the first PCI
CapabilityPtr does not point to anything, which signals to the OS
(Linux) that this is an early generation PCI device. As PCI express
atomics were introduced later, the CapabilityPtr needs to point to at
least a PCI express capability structure. This capability is defined as
0x10 in Linux. We additionally set the PCIe atomic capability bits and
implement device-specific PCI configuration space reads and writes for
the amdgpu device.
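A sketch of wiring up the capability pointer (0x34 is the standard
CapabilityPtr offset and 0x10 is PCI_CAP_ID_EXP in Linux; the
structure's config-space offset and the atomic bits here are
assumptions):

```cpp
#include <cstdint>

void
addPcieCapability(uint8_t *config)
{
    constexpr uint8_t capOffset = 0x80; // assumed free config offset
    config[0x34] = capOffset;           // CapabilityPtr now points here
    config[capOffset + 0] = 0x10;       // capability ID: PCI Express
    config[capOffset + 1] = 0x00;       // next capability: none
    // ... PCIe capability registers, including the bits advertising
    // atomic op completion support, would follow here ...
}

int main()
{
    uint8_t config[256] = {};
    addPcieCapability(config);
    return config[0x34] == 0x80 ? 0 : 1;
}
```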
With this commit, the output of simulation when loading the amdgpu
driver no longer outputs "PCIE atomics not supported". Further, an
application which uses PCIe atomics (PyTorch with a reduce_sum kernel)
now makes further progress.
Change-Id: I5e3866979659a2657f558941106ef65c2f4d9988
This SDMA packet is much more common starting around ROCm 5.4.
Previously this was mostly used to clear page tables after an
application ended and was therefore left unimplemented. It is
now used for basic operations like device memsets.
This patch implements constant fill as it is now necessary.
Change-Id: I9b2cf076ec17f5ed07c20bb820e7db0c082bbfbc
When using the new operator, delete should be called on any allocated
memory after its use is complete.
Change-Id: Id5fcfb264b6ddc252c0a9dcafc2d3b020f7b5019
The PCI read/write functions are atomic functions in gem5, meaning they
expect a response with a latency value on the same simulation Tick. For
reads to a PCI device, the response must also include a data value read
from the device.
The AMDGPU device has a PCI BAR which mirrors the frame buffer memory.
Currently reads are done atomically, but writes are sent to a DMA device
without waiting for a write completion ACK. As a result, it is possible
that writes can be queued in the DMA device long enough that another
read for a queued address arrives. This happens very deterministically
with the AtomicSimpleCPU and causes GPUFS to break with that CPU.
This change makes writes to the frame buffer BAR atomic, the same as
reads. This
avoids that problem and as a result the AtomicSimpleCPU can now load the
driver for GPUFS simulations.
Change-Id: I9a8e8b172712c78b667ebcec81a0c5d0060234db
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71898
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Patch https://gem5-review.googlesource.com/c/public/gem5/+/70040 added
support for a variable number of SDMA engines to support newer GPU
models. As part of this an SDMA IDs map was added to map from SDMA ID
number to the SDMA SimObject pointer. In order to get the correct
pointer in unserialize now, we need to store the ID in the checkpoint
and use that to index the new map. We can't simply assign using the
loop variable, as the SDMAs might not be in order in the checkpoint.
Additionally, the checkpoint contains both the gfx and page offset for
the SDMA engines, so each SDMA is inserted into the SDMA offset map
(sdmaEngs) twice.
Change-Id: I08e9a8d785f467b6eebff8ab0a9336851c87258d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70878
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Currently gem5 assumes the amdgpu device to be Vega10. In order to
support more devices we need to handle situations where different
registers and addresses have the same functionality but different
offsets on different devices.
This changeset adds an NBIO class to handle device discovery and driver
initialization related tasks, pulling them out of the AMDGPUDevice
class. The offsets used for MMIOs are reworked slightly to use offsets
rather than absolute addresses. This is because we cannot determine the
absolute address in the constructor since the BAR has not been assigned
by the OS yet.
Change-Id: I14b364374e086e185978334425a4e265cf2760d0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70041
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently the amdgpu simulated device is assumed to be a Vega10. As a
result there are a few things that are hardcoded. One of those is the
number of SDMAs. In order to add a newer device, such as MI100+, we need
to enable a flexible number of SDMAs.
In order to support a variable number of SDMAs, and with the MMIO
offsets of each device being potentially different, the MMIO interface
for SDMAs is changed to use an SDMA class method dispatch table which
forwards a 32-bit value from the MMIO packet to the MMIO functions in
SDMA of the format `void method(uint32_t)`. Several changes are made to
enable this (see the sketch after the list):
- Allow the SDMA to have a variable MMIO base and size. These are
configured in python.
- An SDMA class method dispatch table which contains the MMIO offset
relative to the SDMA's MMIO base address.
- An updated writeMMIO method to iterate over the SDMA MMIO address
ranges and call the appropriate SDMA MMIO method which matches the
MMIO offset.
- Moved all SDMA related MMIO data bit twiddling, masking, etc. into
the MMIO methods themselves instead of in the writeMMIO method in
SDMAEngine.
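A condensed sketch of the dispatch-table pattern (register names and
offsets are placeholders, not the real SDMA register map):

```cpp
#include <cstdint>
#include <cstdio>
#include <map>

// MMIO offsets relative to the engine's base map to member functions
// of the form void method(uint32_t).
class SDMAEngine
{
  public:
    using MMIOMethod = void (SDMAEngine::*)(uint32_t);

    void setBase(uint32_t val)  { std::printf("base  <- %x\n", val); }
    void setRptr(uint32_t val)  { std::printf("rptr  <- %x\n", val); }
    void setWptr(uint32_t val)  { std::printf("wptr  <- %x\n", val); }

    // Offsets below are placeholders, not the real register map.
    void
    writeMMIO(uint32_t offset, uint32_t value)
    {
        static const std::map<uint32_t, MMIOMethod> table = {
            {0x00, &SDMAEngine::setBase},
            {0x04, &SDMAEngine::setRptr},
            {0x08, &SDMAEngine::setWptr},
        };
        auto it = table.find(offset);
        if (it != table.end())
            (this->*(it->second))(value);
    }
};

int main()
{
    SDMAEngine sdma;
    sdma.writeMMIO(0x04, 0x100); // dispatches to setRptr
    return 0;
}
```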
Change-Id: Ifce626f84d52f9e27e4438ba4e685e30dbf06dbc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70040
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
If an MMIO was previously written and the driver reads it, we should
return the value that was previously written. This overrides the MMIO
trace value, which is the last-resort fallback for finding an MMIO
value.
This is needed to initialize newer GPU devices in gem5.
Change-Id: Ida2435290b706288e88518b5d920691cdb6dcc09
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70039
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
During GPUFS checkpoint restore, doorbell callbacks are created based
on certain MQD attributes. These callbacks are required to create new
SDMA doorbells. If these attributes are not present in the checkpoint,
the restore hangs indefinitely waiting for ioctl calls that access these
doorbells to finish execution. This commit adds the attributes required
for checkpoint restore to proceed.
Change-Id: Id3d1b7a2627d4c50133d923096495957a233f675
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70077
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
For non-KVM CPUs the VBIOS memory falls into an I/O hole and therefore
gets routed to the PIO bus in gem5, which forwards it to the GPU in the
case of a ROM write. We write to the ROM as a way to "load" the VBIOS
without creating holes in the KVM VM.
This write method allows the same scripts as KVM to be used by writing
to the ROM area and overwriting what might already be there from the
--gpu-rom option.
Change-Id: I8c2d2aa05a823569a774dfdd3bf2d2e773f38683
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70037
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
The GPUFS checkpoint restoration mechanism expects to find a
PM4MapQueues packet in the checkpoint. Since this was not being
checkpointed, the restore phase retrieved a null packet which led to a
segmentation fault. This commit adds PM4MapQueues to the checkpoint and
restores it when deserializing the checkpoint.
Change-Id: Ib74a9f36fe89d740a74f94314ada41ecc363abe9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69298
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Currently when RLC queues (user mode queues) are mapped, the read/write
pointers of the ring buffer are set to zero. However, these queues could
be unmapped and then remapped later. In that situation the read/write
pointers should be the previous value before unmapping occurred. Since
the read pointer gets reset to zero, the queue begins reading from the
start of the ring, which usually contains older packets. Those packets
almost certainly contain addresses which are no longer in the page
tables, which will cause a page fault.
To fix this, we update the MQD with the current read/write pointer
values and then write the MQD back to memory when the queue is
unmapped. This requires adding a pointer to the MQD and the host
address where the MQD should be written back. The interface for
registering RLC queues is also simplified: since we need to pass the
MQD anyway, we can get values from it as well.
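A minimal sketch of the writeback (hypothetical types; the DMA to host
memory is stood in by a memcpy):

```cpp
#include <cstdint>
#include <cstring>

// On unmap, fold the live read/write pointers back into the
// descriptor and copy it to the host address it was read from.
struct SDMAQueueMQD
{
    uint64_t rptr;
    uint64_t wptr;
    // ... remaining descriptor fields ...
};

struct SDMAQueue
{
    SDMAQueueMQD mqd;     // device-side copy of the descriptor
    uint64_t mqdHostAddr; // where the MQD lives in host memory
    uint64_t rptr, wptr;  // live ring pointers
};

void
unmapQueue(SDMAQueue &q, uint8_t *hostMem)
{
    // Update the MQD with the current pointers so a later remap
    // resumes where the queue left off instead of rptr = 0.
    q.mqd.rptr = q.rptr;
    q.mqd.wptr = q.wptr;

    // Stand-in for the DMA writeback to host memory.
    std::memcpy(hostMem + q.mqdHostAddr, &q.mqd, sizeof(q.mqd));
}

int main()
{
    uint8_t hostMem[256] = {};
    SDMAQueue q{{0, 0}, 0x40, 0x20, 0x30};
    unmapQueue(q, hostMem);
    return 0;
}
```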
Fixes b+tree and streamcluster from rodinia (when using RLC queues).
Change-Id: Ie5dad4d7d90ea240c3e9f0cddf3e844a3cd34c4f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65791
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
An example case:
```python
mem_side_port = RequestPort(
"This port sends requests and " "receives responses"
)
```
This is residue from running the Python formatter. The cleanup is done
by finding all tokens matching the regex `"\s"(?![.;"])` and manually
replacing them with empty strings.
Change-Id: Icf223bbe889e5fa5749a81ef77aa6e721f38b549
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66111
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently the SDMA queue type is guessed in the trap method by looking
at which queue in the engine is processing packets. It is possible for
both queues to be processing (e.g., one queue sent a DMA and is waiting
while processing switches to another queue), triggering an assert.
Instead store the queue type in the queue itself and use that type in
trap to determine which ring ID to use for the interrupt packet.
Change-Id: If91c458e60a03f2013c0dc42bab0b1673e3dbd84
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65691
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The current SDMA wrap around handling only considers the ring buffer
location as seen by the GPU. Eventually when the end of the SDMA ring
buffer is reached, the driver waits until the rptr written back to the
host catches up to what the driver sees before wrapping around back to
the beginning of the buffer. This writeback currently does not happen at
all, causing hangs for applications with a lot of SDMA commands.
This changeset first fixes the sizes of the queues, especially RLC
queues, so that the wrap around occurs in the correct place. Second, we
now store the rptr writeback address and the absolute (unwrapped) rptr
value in each SDMA queue. The absolute rptr is what the driver sends to
the device and what it expects to be written back.
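A small sketch of the local/absolute rptr split (hypothetical names,
assuming a power-of-two ring size):

```cpp
#include <cstdint>

// The local ring offset wraps at the queue size, while the value
// written back to the host keeps counting up, matching what the
// driver expects to see.
struct SDMAQueueRptr
{
    uint64_t size;           // ring buffer size in bytes (power of two)
    uint64_t localRptr = 0;  // wrapped offset used to index the ring
    uint64_t globalRptr = 0; // absolute, unwrapped value for writeback

    void
    advance(uint64_t bytes)
    {
        globalRptr += bytes;                 // never wraps
        localRptr = globalRptr & (size - 1); // wrapped ring offset
    }

    // Value copied to the rptr writeback address in host memory so the
    // driver's view catches up and it can wrap the queue.
    uint64_t writebackValue() const { return globalRptr; }
};

int main()
{
    SDMAQueueRptr r{0x100}; // 256-byte ring
    r.advance(0x1f0);       // wraps once
    // Local offset wrapped, absolute value did not.
    return (r.localRptr == 0xf0 && r.writebackValue() == 0x1f0) ? 0 : 1;
}
```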
This was tested with an application which basically does a few hundred
thousand hipMemcpy() calls in a loop. It should also fix the issue with
Pannotia BC in full-system mode.
Change-Id: I53ebdcc6b02fb4eb4da435c9a509544066a97069
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65351
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>