
derek/gem5

Author	SHA1	Message	Date
Matthew Poremba	db0d5f19cf	dev-amdgpu: Add cleanup events for SDMA SDMA packets which use dmaVirtWrites call their completion event before the write takes place in the Ruby protocol. This causes a use-after-free issue corrupting random memory locations, leading to random errors. This commit adds a cleanup event for each packet that uses DMA and sets the cleanup latency as 10000 ticks. In atomic mode, the writes complete exactly 2000 ticks after the completion event is called and therefore a fixed latency can be used. This is not tested with timing mode, which does not work with GPUFS at the moment, so a warning is added to give an idea where to look in case the same issue occurs once timing mode is supported. Change-Id: I9ee2689f2becc46bb7794b18b31205f1606109d8	2024-08-07 14:37:49 -07:00
Matthew Poremba	0d0b68266c	dev-amdgpu: Fix bad free in SDMA The SDMA engine copies data in chunks. It currently uses the pointer returned from new[] and manipulates it using pointer arithmetic. This modified pointer is then passed to the completion function which deletes the pointer. Since it is not the original pointer allocated by new[] this triggers issues in ASAN. Change-Id: I03ccf026633285e75005509445c62fcbda8eb978	2024-08-07 12:54:45 -07:00
Alexander Richardson	abbb94af8b	dev-arm: Fix -Wdeprecated-copy warning (#1197 ) Clang warns as follows: `warning: definition of implicit copy constructor for 'TranslResult' is deprecated because it has a user-declared copy assignment operator` Change-Id: Ic701d8522aac75d569f4f513f54de91f76a17e48	2024-06-05 12:36:38 +01:00
Hoa Nguyen	40ef8f3afb	dev: Remove an extra file in virtio (#1191 ) `src/dev/virtio/VirtIORng 2.py` is identical to `src/dev/virtio/VirtIORng.py`, and the former does not appear in any build script. Change-Id: I9c5f1b1a3809d1c7028b630c32310e540613e232 Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2024-06-04 08:40:41 -07:00
Matthew Poremba	07f6b7c59c	dev-amdgpu: Fix pending PCI RLC doorbell (#1157 ) SDMA RLC queues do not currently remove their doorbell mapping. This can cause issues re-registering the queue and prevents the pending doorbells feature from working. In addition the data value of the doorbell (the ring buffer rptr) is not saved, leading to UB when this workaround is used. This commit removes the doorbell mapping from the gpu device when the SDMA engine unmaps an RLC queue and copies the next doorbell value to the pending packet as was originally intended. Change-Id: Ifd551450f439c065579afcf916f8ff192e7598ab	2024-05-29 07:15:46 -07:00
Harshil Patel	33cebe9376	dev: add reset wrap mode to mouse.cc (#1149 ) This change fixes #1148. I have only added an acknowledged return, as we don't have remote and wrap mode, so it can only be in stream mode. Change-Id: I1882042d873ff0e9465c9491238554c8fbb9aa76	2024-05-21 10:55:03 -07:00
Matthew Poremba	8be5ce6fc9	dev-amdgpu,configs,gpu-compute: Add gfx942 version This is the version for MI300. For the most part, it is the same as MI200 with the exception of architected flat scratch (not yet implemented in gem5) and therefore a new version enum is required. Change-Id: Id18cd7b57c4eebd467c010a3f61e3117beb8d58a	2024-05-15 12:08:41 -07:00
Matthew Poremba	29f63f630b	dev-amdgpu: Correct missing GART warning SDMA ptePde packets are generating a warning that a GART address is missing, causing a wrong address to be clobbered by the operation. This commit fixes this by converting the GART address when the queue is running in privileged mode, which is the only mode allowed to use GART addresses. This removes the warnings and writes to the correct memory region. Change-Id: I64acac308db2431c5996b876bf4cda704f51cf25	2024-05-03 14:31:17 -07:00
Matthew Poremba	2703fb5699	gpu-compute: Fix valgrind memleak complaints Fixes several memory leaks, mostly of small and medium severity. Fixes mismatched new/new[] and delete/delete[] calls. Change-Id: Iedafc409389bd94e45f330bc587d6d72d1971219	2024-05-03 14:29:31 -07:00
Bartek Gąsiorzewski	84cba2a8a8	dev: Fix interrupt logic in uart8250 (#1009 ) Hi, we've noticed some issues with the Uart8250 device when using it as the Linux console. Sometimes the Uart interrupt would remain constantly posted, so Linux would continue to try and handle it, effectively resulting in an infinite loop. With this patch, I'm no longer seeing any issues, but my testing has been limited to configurations and workloads we're interested in at Imagination, so please let me know if there are other tests I should run or if you notice any other issues. This patch fixes several issues with interrupt posting and clearing in the uart8250 device. The "status" member variable and the console interrupt should be kept in sync. However, in one code path in readIir, the interrupt bit was being cleared in the status variable but not in the platform controller. Additionally, in some code paths, the interrupts would be cleared in the status variable and in the interrupt controller, but a future interrupt would remain scheduled, causing a spurious interrupt and setting a bit in status to 1. These issues can confuse the kernel and result in an infinite interrupt handling loop. Another issue is related to the fact that there are two interrupt causes (TX and RX) and both of them can be valid at the same time. When one of them becomes no longer valid, we should check the status of the other one before clearing the interrupt. This patch addresses the issues listed above and refactors the interrupt clearing logic to reduce repetition.	2024-04-17 11:27:39 -07:00
Matthew Poremba	7e2d8dee42	mem,gpu-compute: Implement GPU TCC directed invalidate (#1011 ) The GPU device currently supports large BAR which means that the driver can write directly to GPU memory over the PCI bus without using SDMA or PM4 packets. The gem5 PCI interface only provides an atomic interface for BAR reads/writes, which means the values cannot go through timing mode Ruby caches. This causes bugs as the TCC cache is allowed to keep clean data between kernels for performance reasons. If there is a BAR write directly to memory bypassing the cache, the value in the cache is stale and must be invalidated. In this commit a TCC invalidate is generated for all writes over PCI that go directly to GPU memory. This will also invalidate TCP along the way if necessary. This currently relies on the driver synchronization, which only allows BAR writes in between kernels. Therefore, the cache should only be in I or V state. To handle a race condition between invalidates and launching the next kernel, the invalidates return a response and the GPU command processor will wait for all TCC invalidates to be complete before launching the next kernel. This fixes issues with stale data in nanoGPT and possibly PENNANT.	2024-04-15 13:18:01 -07:00
Giacomo Travaglini	bdcffdd0f0	dev-arm: Do not mark the MpamMSC as abstract (#1030 ) This prevents its instantiation Change-Id: I775a64904a01cf36e4cc1e0cd45765f03325c5ca Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-04-15 09:40:22 -07:00
Matthew Poremba	1d64669473	mem,gpu-compute: Implement GPU TCC directed invalidate The GPU device currently supports large BAR which means that the driver can write directly to GPU memory over the PCI bus without using SDMA or PM4 packets. The gem5 PCI interface only provides an atomic interface for BAR reads/writes, which means the values cannot go through timing mode Ruby caches. This causes bugs as the TCC cache is allowed to keep clean data between kernels for performance reasons. If there is a BAR write directly to memory bypassing the cache, the value in the cache is stale and must be invalidated. In this commit a TCC invalidate is generated for all writes over PCI that go directly to GPU memory. This will also invalidate TCP along the way if necessary. This currently relies on the driver synchronization, which only allows BAR writes in between kernels. Therefore, the cache should only be in I or V state. To handle a race condition between invalidates and launching the next kernel, the invalidates return a response and the GPU command processor will wait for all TCC invalidates to be complete before launching the next kernel. This fixes issues with stale data in nanoGPT and possibly PENNANT. Change-Id: I8e1290f842122682c271e5508a48037055bfbcdf	2024-04-10 11:35:25 -07:00
Bobby R. Bruce	3af15a535e	mem-cache, configs, arch-arm: Handle partitioning policies through a PartitionManager (#966 ) This PR is offloading some of the partitioning logic to the partitioning manager, effectively changing the partitioning interface. Rather than always relying on the PartitionFieldExtention data structure to convey partition IDs, we make it implementation defined by introducing the partitioning manager abstraction. We want users to be able to extract the partitionId more flexibly and this requires using a SimObject. Users can extend the PartitioningManager, overriding the readPacketPartitionId, therefore providing their own means of injecting/extracting partitioning data from a packet	2024-04-08 16:05:17 -07:00
Giacomo Travaglini	bdb08a5b6c	arch-arm, dev-arm: Fix typo in PartitionFieldExtention name Rename PartitionFieldExtention into PartitionFieldExtension Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Change-Id: I8072adf78d81b94c5b8bc61a317c0238cf0a9fd9	2024-04-07 11:45:57 +01:00
Giacomo Travaglini	dd45e1c319	misc: Make PartitionFieldExtention private to Arm The new ISA-agnostic interface is the PartitionManager. We therefore make the PartitionFieldExtention private to the Arm implementation of memory partitioning (FEAT_MPAM). Any other partitioning implementation should override the PartitionManager::readPacketPartitionID to provide a means of extracting partitioning data (partition_id) from the incoming Packet. With this commit we also define an MPAM MSC which is supposed to be the partitioning manager for the Memory System Component Change-Id: I6959ace0c0cbca549dcc1aacd53dff223b5fe328 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-04-07 11:45:57 +01:00
Giacomo Travaglini	63706f04b5	dev: Remove duplicate virtio files (#976 ) Remove the following files: * src/dev/virtio/rng 2.cc * src/dev/virtio/rng 2.hh Which were a copy of rng.hh and rng.cc. Probably added to the repository by accident. They were not compiled by scons Change-Id: I9d1da19cc243c513ab7af887b1b6260d8e361b57 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-28 14:32:11 +00:00
Matthew Poremba	823b5a6eb8	dev-amdgpu: Support multiple CPs and MMIO AddrRanges Currently gem5 assumes that there is only one command processor (CP) which contains the PM4 packet processor. Some GPU devices have multiple CPs which the driver tests individually during POST if they are used or not. Therefore, these additional CPs need to be supported. This commit allows for multiple PM4 packet processors which represent multiple CPs. Each of these processors will have its own independent MMIO address range. To more easily support ranges, the MMIO addresses now use AddrRange to index a PM4 packet processor instead of the hard-coded constexpr MMIO start and size pairs. By default only one PM4 packet processor is created, meaning the functionality of the simulation is unchanged for devices currently supported in gem5. Change-Id: I977f4fd3a169ef4a78671a4fb58c8ea0e19bf52c	2024-03-21 10:13:55 -05:00
Matthew Poremba	39153cd234	dev-amdgpu: Implement PCIe indirect read/write PCIe can read/write to any 32-bit address using the PCI index/index2 registers as an address and then reading/writing the corresponding data/data2 register. This commit adds this functionality and removes one magic value being written to support GPU POST. This feature is disabled for Vega10 which relies on an MMIO trace for too many values to implement in the MMIO interface. Change-Id: Iacfdd1294a7652fc3e60304b57df536d318c847b	2024-03-21 10:13:55 -05:00
Matthew Poremba	047c194780	dev-amdgpu: Implement SRBM write The SRBM write packets were previously not required. This commit implements SRBM writes to set a register by using the new setRegVal interface. SRBM writes seem to be used for SRIOV enabled devices. Change-Id: I202653d339e882e8de59d69a995f65332b2dfb8c	2024-03-21 10:10:01 -05:00
Matthew Poremba	6bbde8fbb8	dev-amdgpu: Rework handling of unknown registers The top level AMDGPUDevice currently reads/writes all unknown registers to/from a map containing the previously written value. This is intended as a way to handle registers that are not part of the model but the driver requires for functionality. Since this is at the top level, it can mask changes to register values which do not go through the same interface. For example, reading an MMIO, changing via PM4 queue, and reading again returns the stale cached value. This commit removes the usage of the regs map in AMDGPUDevice, implements some important MMIOs that were previously handled by it, and moves the unknown register handling to the NBIO aperture only. To reduce the number of additional MMIOs to implement, the display manager in vega10 is now disabled. Change-Id: Iff0a599dd82d663c7e710b79c6ef6d0ad1fc44a2	2024-03-21 10:10:01 -05:00
Matthew Poremba	009cec56e0	dev-amdgpu: Check for SDMA copies to GART range The SDMA engine can potentially be used to write to the GART address range. Since gem5 has a shadow copy of the GART table to avoid sending functional reads to device memory, the GART table must be updated when copying to the GART range. This changeset adds a check in the VM for GART range and implements the SDMA copy packet writing to the GART range. A fatal is added to write and ptePde, which are the only other two ways to write to memory, as using these packets to update the GART table has not been observed. Change-Id: I1e62dfd9179cc9e987659e68414209fd77bba2bd	2024-03-21 10:10:01 -05:00
Matthew Poremba	998709d4fc	dev-amdgpu: Improve PM4 write data packet The write data packet can write multiple dwords but currently always assumes there is one dword, which can cause some write data to be missed. This case is not common, but the number of dwords is implicitly defined in the PM4 header. This changeset passes the PM4 header to write data so that the correct number of dwords can be determined. For now we assume no page crossing when writing multiple dwords as the driver should be checking for that. Change-Id: I0e8c3cbc28873779f468c2a11fdcf177210a22b7	2024-03-21 10:10:01 -05:00
Matthew Poremba	c045c68540	dev-amdgpu: Add node_id to interrupt handler The ROCm 6.0 driver adds a node_id field to interrupts which must match before passing on the interrupt to be cleared by the cookie from gem5's interrupt handler implementation. Add this field and enable for gfx942. The usage of the field can be seen in event_interrupt_isr_v9_4_3 at https://github.com/ROCm/ROCK-Kernel-Driver/blob/roc-6.0.x/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c#L449 Change-Id: Iae8b8f0386a5ad2852b4a3c69f2c161d965c4922	2024-03-21 10:10:01 -05:00
Giacomo Travaglini	0ec8cf8d05	dev-arm: Fix SMMUv3 DTB autogen (#934 ) Replacing FdtPropertyWords (expecting an integer) with FdtPropertyStrings Change-Id: Icd1cf00704e253c88ac9b1d69c3cf946d2a8ca70 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-14 15:42:57 +00:00
Giacomo Travaglini	5161195db5	dev-arm: Remove the SMMUv3 irq_interface_enable parameter The SMMU_IRQ_CTRL had been made optionally writeable by a prior patch [1] even if interrupts were not supported in the SMMUv3 model. As we are partially enabling IRQ support, we remove this option and we make the SMMU_IRQ_CTRL always writeable [1]: https://gem5-review.googlesource.com/c/public/gem5/+/38555 Change-Id: Ie1f9458d583a5d8bcbe450c3e88bda6b3c53cf10 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-08 13:53:44 +00:00
Giacomo Travaglini	d63282a9da	dev-arm: Implement wired interrupt for SMMU event queue See https://github.com/orgs/gem5/discussions/898 The SMMUv3 Event Queue is basically unused at the moment. Whenever a transaction fails we actually abort simulation. The sendEvent method could be used to actually report the failure to the driver but it is lacking interrupt support to notify the PE there is an event to handle. The SMMUv3 spec allows both wired and MSI interrupts to be used. We add the eventq_irq SPI param to the SMMU object and we draft an initial sendInterrupt utility that makes use of it whenever it is needed. Change-Id: I6d103919ca8bf53794ae4bc922cbdc7156adf37a Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-08 13:53:21 +00:00
Giacomo Travaglini	63c815b5fc	dev-arm: Do not panic in the SMMUv3 for faulting transactions Rely on the architected solution instead of aborting simulation. This means handling writes to the Event queue to signal managing software there was a fault in the SMMU Change-Id: I7b69ca77021732c6059bd6b837ae722da71350ff Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-08 11:29:22 +00:00
Giacomo Travaglini	7d5d1cd9c8	dev-arm: Rewrite SMMUEvent The struct fields of the SMMUEvent were not matching the SMMUv3 specs. This was "not an issue" as events have been implicitly disabled until now (every translation error was aborting simulation) With generateEvent we automatically construct a SMMU event from a translation result. Change-Id: Iba6a08d551c0a99bb58c4118992f1d2b683f62cf Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-08 11:29:22 +00:00
Giacomo Travaglini	ef10db5a3e	dev-arm: Record additional information in the TranslResult A faulting translation should return additional information (other than the fault type). This will be used by future patches to properly populate the SMMU event record of the event queue. As we currently support two faults only: 1) F_TRANSLATION 2) F_PERMISSION We add to TranslResult the relevant fault information only: type, class, stage and ipa Change-Id: I0a81d5fc202e1b6135cecdcd6dfd2239c2f1ba7e Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-08 11:29:22 +00:00
Giacomo Travaglini	3d1f68f205	dev-arm: Return translation fault in doReadCD Reading the Context Descriptor (CD) might require a stage2 translation. At the moment doReadCD does not check for the return value of the translateStage2. This means that any stage2 fault will be silently discarded and an invalid address will be used/returned. By returning a translation result we make sure any error happening in the second stage of translation will be properly flagged Change-Id: I2ecd43f7e23080bf8222bc3addfabbd027ee8feb Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-08 11:29:22 +00:00
Giacomo Travaglini	4a4b775985	dev-arm: Provide encapsulation by adding TranslResult::isFaulting We don't check the fault type directly. This will improve readability once the TranslResult class will be augmented with extra fields Change-Id: I5acafaabf098d6ee79e1f0c384499cc043a75a9d Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-08 11:29:22 +00:00
Giacomo Travaglini	c0e5d58a96	dev: RegisterBank addRegistersAt for fragmented reg banks (#902 ) One of the limitations of the RegBank class is that it does not allow you to pass a non-contiguous set of registers. Its simplest form will just accept an initializer list of registers and it will store them in sequence. A more refined version [1] will optionally accept an offset value to be passed alongside the register reference. This is not meant to be used by the register bank to store the register at the provided offset. It is rather used by the bank to sanity check the register sits exactly at the provided range. The way to work around this for a fragmented register space is to explicitly allocate RAZ/RAO blocks as registers and to pass them to addRegisters together with the others. (See the SysSecCtrl [2] as an example) This makes it a bit tedious to model a register bank with gaps between its registers. First, the exact number and position of the gaps needs to be extracted from a spec. These sometimes report only implemented registers and their offset, and omit to document gaps/reserved space. So a developer needs to manually add register offset and size to check if all registers are contiguous. Second, these reserved register blocks need to be instantiated in the bank, adding boilerplate code and affecting readability. For these reasons we add a new registration method, called addRegistersAt. It reuses the RegisterAdder class but this time the offset field is really used to instruct the bank where the register should be mapped. The method is templated and the template parameter tells the bank which register type should be used to fill the remaining space. We make the RegBank the owner of this filler space (registers are generated internally within addRegistersAt). 
[1]: https://github.com/gem5/gem5/blob/stable/src/dev/reg_bank.hh#L106 [2]: https://github.com/gem5/gem5/blob/stable/src/dev/arm/ssc.cc#L48 Change-Id: I614ae6e9eeb40b365ac9b6dd8b75abbfdb9cb687 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-01 15:32:40 +00:00
Yu-Cheng Chang	bcf455755e	arch-riscv,dev: Update the PLIC implementation (#886 ) Update the PLIC based on the [riscv-plic-spec](https://github.com/riscv/riscv-plic-spec) in the PR: - Support customized PLIC hardID and privilege mode configuration - Backward compatible with the n_contexts parameter, will generate the config like {0,M}, {0,S}, {1,M} ... Change-Id: Ibff736827edb7c97921e01fa27f503574a27a562	2024-02-26 10:32:53 -08:00
wmin0	4e75e35a33	dev-arm: Remove the dependency of Platform for ArmSigInterruptPin (#878 ) ArmSigInterruptPin doesn't send the interrupt to GIC. Instead it sends the interrupt to the irq specified in Param. When using ArmSigInterruptPin, we shouldn't ask users to provide "Platform" since it doesn't need it. To reduce the confusion, this change removes the dependency of Platform for ArmSigInterruptPin. Change-Id: I0ee507ed1c08b4fa6d3e384e28732f3acb4f6892	2024-02-20 08:50:27 +00:00
kroarty-lanl	197be3a0dd	dev: Fix off-by-one in IDE controller PCI register allocation (#824 ) The PCI configuration space is 256 bytes, yet because the PCI_CONFIG_SIZE macro is 0xff, the final register allocation in the IDE controller only allocated up to byte 255. Change-Id: I1aef2cad9df366ee8425edb410037061eb29ae33	2024-02-01 10:14:28 -08:00
Matthew Poremba	7f71477f15	dev-amdgpu: Limit SDMA NOP count to wptr boundary (#806 ) If the NOP count of an SDMA NOP packet goes beyond the wptr address, the queue decode method will loop infinitely. If a packet comes in with a bad count this causes gem5 to hang. This change advances the rptr one dword at a time until either reaching the NOP count or when rptr == wptr to prevent this issue. Change-Id: Ib2c0f74a477bff27890c9c064bb4190e76e513bd	2024-01-25 15:35:35 -08:00
Ivana Mitrovic	24e0d71034	arch-gcn3: Remove gcn3 (#781 ) Related to issue #703 , this PR removes GCN3 related files and updates source code, documentation, and tests to switch over to Vega if that was not done already. Highlights are: - Remove all src/arch/amdgpu/gcn3 files and update Kconfigs. - Remove references to GCN3 and replace with Vega where applicable. - Update the build targets in the gcn-gpu Docker. This will need to be rebuilt but not urgently. - Remove the GCN3 tag in testlib. Most tests seem to be using Vega already, so that commit is small.	2024-01-25 10:14:46 -08:00
Matthew Poremba	0ac110ac95	dev-amdgpu: Check privilege bit for SDMA RLC queues (#792 ) By default all SDMA queues are privileged queues, meaning the addresses in SDMA packets use the privileged translation tables. RLC queues (sometimes called user queues) are not necessarily privileged and might use user translation tables. RLC queues are used more often in ROCm 6.0 exposing an issue with invalid translations with RLC queues. This changeset checks the priv bit in the SDMA MQD when an RLC queue is mapped. Each packet type which uses an address then checks the bit before performing translation. Tested with daily/weekly tests with a ROCm 6.0 disk image and tests are passing. Change-Id: I6122fbc194e8d6f5d38e81f1b0e11646d90e0ea0	2024-01-24 07:25:43 -08:00
Matthew Poremba	63caa780c2	misc: Remove all references to GCN3 Replace instances of "GCN3" with Vega. Remove gfx801 and gfx803. Rename FIJI to Vega and Carrizo to Raven. Using misc since there is not enough room to fit all the tags. Change-Id: Ibafc939d49a69be9068107a906e878408c7a5891	2024-01-17 11:11:06 -06:00
Bobby R. Bruce	d11c40dcac	misc: Run `pre-commit run --all-files` This ensures `isort` is applied to all files in the repo. Change-Id: Ib7ced1c924ef1639542bf0d1a01c5737f6ba43e9	2023-11-29 22:06:41 -08:00
Bobby R. Bruce	d94d6017b0	scons: Change to Kconfig build system (#69 ) The PR contains the following changes: - Move all of the config options(`env["CONF"]`) from SConsopt to Kconfig files - Update `build_opts` files to Kconfig option formats - The Ruby Protocol files are only built if `RUBY=y` - Remove the default-default build target - Kconfig commands are included in the PR: - defconfig - setconfig - menuconfig - guiconfig - listnewconfig - savedefconfig - oldconfig - olddefconfig - Add the `python3-tk` package dependencies Jira issue: https://gem5.atlassian.net/browse/GEM5-1211	2023-11-27 13:59:18 -08:00
Matthew Poremba	9e6a87e67a	dev-amdgpu: Writeback PM4 queue rptr when empty (#597 ) The GPU device keeps a local copy of each ring buffer's read pointer (rptr) to avoid constant DMAs to/from host memory. This means it needs to be periodically updated on the host side as the driver uses this to determine how much space is left in the queue and may hang if it believes the queue is full. For user-mode queues, this already happens when queues are unmapped. For kernel mode queues (e.g., HIQ, KIQ) the rptr is never updated, leading to a hang. In this patch the rptr for all queues is reported back to the kernel whenever the queue reaches an empty state (rptr == wptr). Additionally, to handle PM4 queue wrap-around, the queue processing function checks if the queue is not empty instead of rptr < wptr. This is safe because the driver fills PM4 queues with NOP packets on initialization and when wrap-around occurs. Change-Id: Ie13a4354f82999208a75bb1eaec70513039ff30f	2023-11-27 11:02:11 -08:00
Gabe Black	db3a6e8e84	scons: Use Kconfig to configure gem5. These are not yet consumed by anything, but convert all the settings from SCons variables to Kconfig variables. If you have existing SConsopts files which need to be converted, you should take a look at KCONFIG.md to learn about how kconfig is used in gem5. You should decide if any variables need to be available to C++ or kconfig itself, and whether those are options which should be detected automatically, or should be up to the user. Options which should be measured automatically should still be in SConsopts files, while user facing options should be added to new or existing Kconfig files. Generally, make sure you're storing c++/kconfig visible options in env['CONF'][...]. Also remove references to sticky_vars since persistent options should now be handled with kconfig, and export_vars since everything in env['CONF'] is now exported automatically. Switch SCons/gem5 to use Kconfig for configuration, except EXTRAS which is still a sticky SCons variable. This is necessary because EXTRAS also controls what config options exist. If it came from Kconfig itself, then there would be a circular dependency. This dependency could theoretically be handled by reparsing the Kconfig when EXTRAS directories were added or removed, but that would be complicated, and isn't supported by kconfiglib. It wouldn't be worth the significant effort it would take to add it, just to use Kconfig more purely. Change-Id: I29ab1940b2d7b0e6635a490452d05befe5b4a2c9	2023-11-23 08:26:10 +08:00
Bobby R. Bruce	23a22ed95c	dev-amdgpu: Add VMID map to checkpoint (#570 ) When restoring checkpoints for certain applications, gem5 tries to create new doorbells with a pre-existing queue ID and simulation crashes shortly after. This commit adds existing IDs to the GPU device's used VMID map so that new doorbells are aware of existing queue IDs and use a new ID. This ensures that queue IDs are unique after checkpoint restoration	2023-11-22 10:05:21 -08:00
Vishnu Ramadas	06161ded8c	dev-amdgpu: Add VMID map to checkpoint When restoring checkpoints for certain applications, gem5 tries to create new doorbells with a pre-existing queue ID and simulation crashes shortly after. This commit checkpoints the existing VMID map so that any new doorbells after restoration use a unique queue ID Change-Id: I9bf89a2769db26ceab4441634ff2da936eea6d6f	2023-11-20 21:19:17 -06:00
Bobby R. Bruce	08c0d1f27a	dev: Fix `std::min` type mismatch in reg_bank.hh https://github.com/gem5/gem5/pull/386 included two cases in "src/dev/reg_bank.hh" where `std::min` was used to compare an integer of type `size_t` and another of type `Addr`. This caused an error on my Apple Silicon Mac as this is a comparison between an "unsigned long" and an "unsigned long long" which (at least on my setup) was not permitted. To fix this issue the `reg_size` was changed from `size_t` to `Addr`, as well as the types of the values it was derived from and the variable used to hold the return from the `std::min` calls. Change-Id: I31e9c04a8e0327d4f6f5390bc5a743c629db4746	2023-11-20 17:33:44 -08:00
Vishnu Ramadas	d19d6fc31e	dev-amdgpu: Add PM4 queue ID to GPU used VMID map When restoring checkpoints for certain applications, gem5 tries to create new doorbells with a pre-existing queue ID and simulation crashes shortly after. This commit adds existing IDs to the GPU device's used VMID map so that new doorbells are aware of existing queue IDs and use a new ID. This ensures that queue IDs are unique after checkpoint restoration Change-Id: I9bf89a2769db26ceab4441634ff2da936eea6d6f	2023-11-16 17:30:00 -06:00
hungweihsuG	83f1fe3fec	dev: add debug flag in register bank. (#386 ) Print extra logs for the full/partial read/write access to the registers through the register bank. The debug flag is empty by default and would not print anything. Test: run unittest of dev/reg_bank.test.xml to check the behavior would not affect the original functionality. run gem5 with debug flags and use m5term to poke on registers.	2023-11-15 10:04:46 -08:00
Jason Lowe-Power	71973b386e	gpu-compute,dev-hsa: ROCm 5.5+ support (#498 ) ROCm 5.5 support including: - Vendor packet completion signals - Queue remapping race condition fix - Backwards compatible GPR allocation - Fix transient readBlob fatal reading kernel descriptor	2023-11-06 10:51:37 -08:00
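
The bounded NOP handling described in commit 7f71477f15 above (advance the rptr one dword at a time until the NOP count is reached or rptr == wptr) can be sketched roughly as follows. This is only an illustration under stated assumptions, not gem5's actual code: the function name `skipNopDwords`, the use of byte offsets for the pointers, and the omission of ring-buffer wrap-around are all hypothetical choices made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of gem5): size of one dword in bytes.
constexpr uint64_t kDwordBytes = 4;

// Advance a read pointer past an SDMA NOP packet without passing the
// write pointer. Assumptions for this sketch: rptr and wptr are
// dword-aligned byte offsets, rptr <= wptr, and wrap-around is ignored.
uint64_t
skipNopDwords(uint64_t rptr, uint64_t wptr, uint32_t nop_count)
{
    assert(rptr % kDwordBytes == 0 && wptr % kDwordBytes == 0);
    assert(rptr <= wptr);

    // Step one dword at a time and stop at the NOP count or at wptr,
    // whichever comes first, so a bad count cannot overrun the queue.
    for (uint32_t i = 0; i < nop_count && rptr != wptr; ++i) {
        rptr += kDwordBytes;
    }
    return rptr;
}
```

With a well-formed count the pointer moves by exactly `nop_count` dwords; with an oversized count it stops at `wptr`, which is the condition the commit message relies on to avoid the hang.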


1490 Commits