derek/gem5 - gem5 - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Giacomo Travaglini	bdcffdd0f0	dev-arm: Do not mark the MpamMSC as abstract (#1030 ) This prevents its instantiation Change-Id: I775a64904a01cf36e4cc1e0cd45765f03325c5ca Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-04-15 09:40:22 -07:00
Bobby R. Bruce	3af15a535e	mem-cache, configs, arch-arm: Handle partitioning policies through a PartitionManager (#966 ) This PR offloads some of the partitioning logic to the partitioning manager, effectively changing the partitioning interface. Rather than always relying on the PartitionFieldExtention data structure to convey partition IDs, we make it implementation defined by introducing the partitioning manager abstraction. We want users to be able to extract the partitionId more flexibly, and this requires using a SimObject. Users can extend the PartitionManager, overriding readPacketPartitionId, and thereby provide their own means of injecting/extracting partitioning data from a packet.	2024-04-08 16:05:17 -07:00
Giacomo Travaglini	bdb08a5b6c	arch-arm, dev-arm: Fix typo in PartitionFieldExtention name Rename PartitionFieldExtention to PartitionFieldExtension Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Change-Id: I8072adf78d81b94c5b8bc61a317c0238cf0a9fd9	2024-04-07 11:45:57 +01:00
Giacomo Travaglini	dd45e1c319	misc: Make PartitionFieldExtention private to Arm The new ISA-agnostic interface is the PartitionManager. We therefore make the PartitionFieldExtention private to the Arm implementation of memory partitioning (FEAT_MPAM). Any other partitioning implementation should override PartitionManager::readPacketPartitionID to provide a means of extracting partitioning data (partition_id) from the incoming Packet. With this commit we also define an MPAM MSC, which is intended to be the partitioning manager for the Memory System Component. Change-Id: I6959ace0c0cbca549dcc1aacd53dff223b5fe328 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-04-07 11:45:57 +01:00
Giacomo Travaglini	63706f04b5	dev: Remove duplicate virtio files (#976 ) Remove the following files: * src/dev/virtio/rng 2.cc * src/dev/virtio/rng 2.hh These were copies of rng.cc and rng.hh, probably added to the repository by accident. They were not compiled by scons. Change-Id: I9d1da19cc243c513ab7af887b1b6260d8e361b57 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-28 14:32:11 +00:00
Matthew Poremba	823b5a6eb8	dev-amdgpu: Support multiple CPs and MMIO AddrRanges Currently gem5 assumes that there is only one command processor (CP) which contains the PM4 packet processor. Some GPU devices have multiple CPs, which the driver tests individually during POST to determine whether they are used. Therefore, these additional CPs need to be supported. This commit allows for multiple PM4 packet processors which represent multiple CPs. Each of these processors will have its own independent MMIO address range. To more easily support ranges, the MMIO addresses now use AddrRange to index a PM4 packet processor instead of the hard-coded constexpr MMIO start and size pairs. By default only one PM4 packet processor is created, meaning the functionality of the simulation is unchanged for devices currently supported in gem5. Change-Id: I977f4fd3a169ef4a78671a4fb58c8ea0e19bf52c	2024-03-21 10:13:55 -05:00
Matthew Poremba	39153cd234	dev-amdgpu: Implement PCIe indirect read/write PCIe can read/write to any 32-bit address using the PCI index/index2 registers as an address and then reading/writing the corresponding data/data2 register. This commit adds this functionality and removes one magic value being written to support GPU POST. This feature is disabled for Vega10 which relies on an MMIO trace for too many values to implement in the MMIO interface. Change-Id: Iacfdd1294a7652fc3e60304b57df536d318c847b	2024-03-21 10:13:55 -05:00
Matthew Poremba	047c194780	dev-amdgpu: Implement SRBM write The SRBM write packets were previously not required. This commit implements SRBM writes to set a register by using the new setRegVal interface. SRBM writes seem to be used for SRIOV-enabled devices. Change-Id: I202653d339e882e8de59d69a995f65332b2dfb8c	2024-03-21 10:10:01 -05:00
Matthew Poremba	6bbde8fbb8	dev-amdgpu: Rework handling of unknown registers The top level AMDGPUDevice currently reads/writes all unknown registers to/from a map containing the previously written value. This is intended as a way to handle registers that are not part of the model but the driver requires for functionality. Since this is at the top level, it can mask changes to register values which do not go through the same interface. For example, reading an MMIO, changing via PM4 queue, and reading again returns the stale cached value. This commit removes the usage of the regs map in AMDGPUDevice, implements some important MMIOs that were previously handled by it, and moves the unknown register handling to the NBIO aperture only. To reduce the number of additional MMIOs to implement, the display manager in vega10 is now disabled. Change-Id: Iff0a599dd82d663c7e710b79c6ef6d0ad1fc44a2	2024-03-21 10:10:01 -05:00
Matthew Poremba	009cec56e0	dev-amdgpu: Check for SDMA copies to GART range The SDMA engine can potentially be used to write to the GART address range. Since gem5 has a shadow copy of the GART table to avoid sending functional reads to device memory, the GART table must be updated when copying to the GART range. This changeset adds a check in the VM for GART range and implements the SDMA copy packet writing to the GART range. A fatal is added to write and ptePde, which are the only other two ways to write to memory, as using these packets to update the GART table has not been observed. Change-Id: I1e62dfd9179cc9e987659e68414209fd77bba2bd	2024-03-21 10:10:01 -05:00
Matthew Poremba	998709d4fc	dev-amdgpu: Improve PM4 write data packet The write data packet can write multiple dwords but currently always assumes there is one dword, which can cause some write data to be missed. This case is not common, but the number of dwords is implicitly defined in the PM4 header. This changeset passes the PM4 header to write data so that the correct number of dwords can be determined. For now we assume no page crossing when writing multiple dwords as the driver should be checking for that. Change-Id: I0e8c3cbc28873779f468c2a11fdcf177210a22b7	2024-03-21 10:10:01 -05:00
Matthew Poremba	c045c68540	dev-amdgpu: Add node_id to interrupt handler The ROCm 6.0 driver adds a node_id field to interrupts which must match before passing on the interrupt to be cleared by the cookie from gem5's interrupt handler implementation. Add this field and enable for gfx942. The usage of the field can be seen in event_interrupt_isr_v9_4_3 at https://github.com/ROCm/ROCK-Kernel-Driver/blob/roc-6.0.x/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c#L449 Change-Id: Iae8b8f0386a5ad2852b4a3c69f2c161d965c4922	2024-03-21 10:10:01 -05:00
Giacomo Travaglini	0ec8cf8d05	dev-arm: Fix SMMUv3 DTB autogen (#934 ) Replace FdtPropertyWords (expecting an integer) with FdtPropertyStrings Change-Id: Icd1cf00704e253c88ac9b1d69c3cf946d2a8ca70 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-14 15:42:57 +00:00
Giacomo Travaglini	5161195db5	dev-arm: Remove the SMMUv3 irq_interface_enable parameter The SMMU_IRQ_CTRL had been made optionally writeable by a prior patch [1] even if interrupts were not supported in the SMMUv3 model. As we are partially enabling IRQ support, we remove this option and we make the SMMU_IRQ_CTRL always writeable [1]: https://gem5-review.googlesource.com/c/public/gem5/+/38555 Change-Id: Ie1f9458d583a5d8bcbe450c3e88bda6b3c53cf10 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-08 13:53:44 +00:00
Giacomo Travaglini	d63282a9da	dev-arm: Implement wired interrupt for SMMU event queue See https://github.com/orgs/gem5/discussions/898 The SMMUv3 Event Queue is basically unused at the moment. Whenever a transaction fails we actually abort simulation. The sendEvent method could be used to actually report the failure to the driver but it is lacking interrupt support to notify the PE there is an event to handle. The SMMUv3 spec allows both wired and MSI interrupts to be used. We add the eventq_irq SPI param to the SMMU object and we draft an initial sendInterrupt utility that makes use of it whenever it is needed. Change-Id: I6d103919ca8bf53794ae4bc922cbdc7156adf37a Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-08 13:53:21 +00:00
Giacomo Travaglini	63c815b5fc	dev-arm: Do not panic in the SMMUv3 for faulting transactions Rely on the architected solution instead of aborting simulation. This means handling writes to the Event queue to signal to managing software that there was a fault in the SMMU Change-Id: I7b69ca77021732c6059bd6b837ae722da71350ff Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-08 11:29:22 +00:00
Giacomo Travaglini	7d5d1cd9c8	dev-arm: Rewrite SMMUEvent The struct fields of the SMMUEvent did not match the SMMUv3 spec. This was "not an issue" as events have been implicitly disabled until now (every translation error was aborting simulation). With generateEvent we automatically construct an SMMU event from a translation result. Change-Id: Iba6a08d551c0a99bb58c4118992f1d2b683f62cf Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-08 11:29:22 +00:00
Giacomo Travaglini	ef10db5a3e	dev-arm: Record additional information in the TranslResult A faulting translation should return additional information (other than the fault type). This will be used by future patches to properly populate the SMMU event record of the event queue. As we currently support two faults only: 1) F_TRANSLATION 2) F_PERMISSION We add to TranslResult the relevant fault information only: type, class, stage and ipa Change-Id: I0a81d5fc202e1b6135cecdcd6dfd2239c2f1ba7e Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-08 11:29:22 +00:00
Giacomo Travaglini	3d1f68f205	dev-arm: Return translation fault in doReadCD Reading the Context Descriptor (CD) might require a stage2 translation. At the moment doReadCD does not check the return value of translateStage2. This means that any stage2 fault will be silently discarded and an invalid address will be used/returned. By returning a translation result we make sure any error happening in the second stage of translation will be properly flagged. Change-Id: I2ecd43f7e23080bf8222bc3addfabbd027ee8feb Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-08 11:29:22 +00:00
Giacomo Travaglini	4a4b775985	dev-arm: Provide encapsulation by adding TranslResult::isFaulting Rather than checking the fault type directly, callers use isFaulting. This will improve readability once the TranslResult class is augmented with extra fields. Change-Id: I5acafaabf098d6ee79e1f0c384499cc043a75a9d Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-08 11:29:22 +00:00
Giacomo Travaglini	c0e5d58a96	dev: RegisterBank addRegistersAt for fragmented reg banks (#902 ) One of the limitations of the RegBank class is that it does not allow you to pass a non-contiguous set of registers. Its simplest form will just accept an initializer list of registers and it will store them in sequence. A more refined version [1] will optionally accept an offset value to be passed alongside the register reference. This is not meant to be used by the register bank to store the register at the provided offset. It is instead used by the bank to sanity-check that the register sits exactly at the provided offset. The way to work around this for a fragmented register space is to explicitly allocate RAZ/RAO blocks as registers and to pass them to addRegisters together with the others. (See the SysSecCtrl [2] as an example.) This makes it a bit tedious to model a register bank with gaps between its registers. First, the exact number and position of the gaps need to be extracted from a spec. Specs sometimes list only the implemented registers and their offsets, and do not document gaps/reserved space. So a developer needs to manually add up register offsets and sizes to check whether all registers are contiguous. Second, these reserved register blocks need to be instantiated in the bank, adding boilerplate code and affecting readability. For these reasons we add a new registration method, called addRegistersAt. It reuses the RegisterAdder class, but this time the offset field is really used to instruct the bank where the register should be mapped. The method is templated, and the template parameter tells the bank which register type should be used to fill the remaining space. We make the RegBank the owner of this filler space (registers are generated internally within addRegistersAt). [1]: https://github.com/gem5/gem5/blob/stable/src/dev/reg_bank.hh#L106 [2]: https://github.com/gem5/gem5/blob/stable/src/dev/arm/ssc.cc#L48 Change-Id: I614ae6e9eeb40b365ac9b6dd8b75abbfdb9cb687 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2024-03-01 15:32:40 +00:00
Yu-Cheng Chang	bcf455755e	arch-riscv,dev: Update the PLIC implementation (#886 ) Update the PLIC based on the [riscv-plic-spec](https://github.com/riscv/riscv-plic-spec) in the PR: - Support customized PLIC hart ID and privilege mode configuration - Backward compatible with the n_contexts parameter, which generates a config like {0,M}, {0,S}, {1,M} ... Change-Id: Ibff736827edb7c97921e01fa27f503574a27a562	2024-02-26 10:32:53 -08:00
wmin0	4e75e35a33	dev-arm: Remove the dependency of Platform for ArmSigInterruptPin (#878 ) ArmSigInterruptPin doesn't send the interrupt to the GIC. Instead, it sends the interrupt to the irq specified in Param. When using ArmSigInterruptPin, we shouldn't ask users to provide "Platform" since it doesn't need it. To reduce the confusion, this change removes the dependency on Platform for ArmSigInterruptPin. Change-Id: I0ee507ed1c08b4fa6d3e384e28732f3acb4f6892	2024-02-20 08:50:27 +00:00
kroarty-lanl	197be3a0dd	dev: Fix off-by-one in IDE controller PCI register allocation (#824 ) The PCI configuration space is 256 bytes, yet because the PCI_CONFIG_SIZE macro is 0xff, the final register allocation in the IDE controller only allocated up to byte 255. Change-Id: I1aef2cad9df366ee8425edb410037061eb29ae33	2024-02-01 10:14:28 -08:00
Matthew Poremba	7f71477f15	dev-amdgpu: Limit SDMA NOP count to wptr boundary (#806 ) If the NOP count of an SDMA NOP packet goes beyond the wptr address, the queue decode method will loop infinitely. If a packet comes in with a bad count, this causes gem5 to hang. To prevent this, this change advances the rptr one dword at a time until it either reaches the NOP count or rptr == wptr. Change-Id: Ib2c0f74a477bff27890c9c064bb4190e76e513bd	2024-01-25 15:35:35 -08:00
Ivana Mitrovic	24e0d71034	arch-gcn3: Remove gcn3 (#781 ) Related to issue #703 , this PR removes GCN3 related files and updates source code, documentation, and tests to switch over to Vega if that was not done already. Highlights are: - Remove all src/arch/amdgpu/gcn3 files and update Kconfigs. - Remove references to GCN3 and replace with Vega where applicable. - Update the build targets in the gcn-gpu Docker. This will need to be rebuilt but not urgently. - Remove the GCN3 tag in testlib. Most tests seem to be using Vega already, so that commit is small.	2024-01-25 10:14:46 -08:00
Matthew Poremba	0ac110ac95	dev-amdgpu: Check privilege bit for SDMA RLC queues (#792 ) By default all SDMA queues are privileged queues, meaning the addresses in SDMA packets use the privileged translation tables. RLC queues (sometimes called user queues) are not necessarily privileged and might use user translation tables. RLC queues are used more often in ROCm 6.0, exposing an issue with invalid translations with RLC queues. This changeset checks the priv bit in the SDMA MQD when an RLC queue is mapped. Each packet type which uses an address then checks the bit before performing translation. Tested with daily/weekly tests with a ROCm 6.0 disk image and tests are passing. Change-Id: I6122fbc194e8d6f5d38e81f1b0e11646d90e0ea0	2024-01-24 07:25:43 -08:00
Matthew Poremba	63caa780c2	misc: Remove all references to GCN3 Replace instances of "GCN3" with Vega. Remove gfx801 and gfx803. Rename FIJI to Vega and Carrizo to Raven. Using misc since there is not enough room to fit all the tags. Change-Id: Ibafc939d49a69be9068107a906e878408c7a5891	2024-01-17 11:11:06 -06:00
Bobby R. Bruce	d11c40dcac	misc: Run `pre-commit run --all-files` This ensures `isort` is applied to all files in the repo. Change-Id: Ib7ced1c924ef1639542bf0d1a01c5737f6ba43e9	2023-11-29 22:06:41 -08:00
Bobby R. Bruce	d94d6017b0	scons: Change to Kconfig build system (#69 ) The PR contains the following changes: - Move all of the config options (`env["CONF"]`) from SConsopts to Kconfig files - Update `build_opts` files to Kconfig option formats - The Ruby Protocol files are only built if `RUBY=y` - Remove the default-default build target - Kconfig commands are included in the PR: - defconfig - setconfig - menuconfig - guiconfig - listnewconfig - savedefconfig - oldconfig - olddefconfig - Add the `python3-tk` package dependencies Jira issue: https://gem5.atlassian.net/browse/GEM5-1211	2023-11-27 13:59:18 -08:00
Matthew Poremba	9e6a87e67a	dev-amdgpu: Writeback PM4 queue rptr when empty (#597 ) The GPU device keeps a local copy of each ring buffer's read pointer (rptr) to avoid constant DMAs to/from host memory. This means it needs to be periodically updated on the host side, as the driver uses this to determine how much space is left in the queue and may hang if it believes the queue is full. For user-mode queues, this already happens when queues are unmapped. For kernel mode queues (e.g., HIQ, KIQ) the rptr is never updated, leading to a hang. In this patch the rptr for all queues is reported back to the kernel whenever the queue reaches an empty state (rptr == wptr). Additionally, to handle PM4 queue wrap-around, the queue processing function checks whether the queue is non-empty instead of checking rptr < wptr. This works because the driver fills PM4 queues with NOP packets on initialization and when wrap-around occurs. Change-Id: Ie13a4354f82999208a75bb1eaec70513039ff30f	2023-11-27 11:02:11 -08:00
Gabe Black	db3a6e8e84	scons: Use Kconfig to configure gem5. These are not yet consumed by anything, but convert all the settings from SCons variables to Kconfig variables. If you have existing SConsopts files which need to be converted, you should take a look at KCONFIG.md to learn about how kconfig is used in gem5. You should decide if any variables need to be available to C++ or kconfig itself, and whether those are options which should be detected automatically, or should be up to the user. Options which should be measured automatically should still be in SConsopts files, while user facing options should be added to new or existing Kconfig files. Generally, make sure you're storing c++/kconfig visible options in env['CONF'][...]. Also remove references to sticky_vars since persistent options should now be handled with kconfig, and export_vars since everything in env['CONF'] is now exported automatically. Switch SCons/gem5 to use Kconfig for configuration, except EXTRAS which is still a sticky SCons variable. This is necessary because EXTRAS also controls what config options exist. If it came from Kconfig itself, then there would be a circular dependency. This dependency could theoretically be handled by reparsing the Kconfig when EXTRAS directories were added or removed, but that would be complicated, and isn't supported by kconfiglib. It wouldn't be worth the significant effort it would take to add it, just to use Kconfig more purely. Change-Id: I29ab1940b2d7b0e6635a490452d05befe5b4a2c9	2023-11-23 08:26:10 +08:00
Bobby R. Bruce	23a22ed95c	dev-amdgpu: Add VMID map to checkpoint (#570 ) When restoring checkpoints for certain applications, gem5 tries to create new doorbells with a pre-existing queue ID and simulation crashes shortly after. This commit adds existing IDs to the GPU device's used VMID map so that new doorbells are aware of existing queue IDs and use a new ID. This ensures that queue IDs are unique after checkpoint restoration	2023-11-22 10:05:21 -08:00
Vishnu Ramadas	06161ded8c	dev-amdgpu: Add VMID map to checkpoint When restoring checkpoints for certain applications, gem5 tries to create new doorbells with a pre-existing queue ID and simulation crashes shortly after. This commit checkpoints the existing VMID map so that any new doorbells after restoration use a unique queue ID Change-Id: I9bf89a2769db26ceab4441634ff2da936eea6d6f	2023-11-20 21:19:17 -06:00
Bobby R. Bruce	08c0d1f27a	dev: Fix `std::min` type mismatch in reg_bank.hh https://github.com/gem5/gem5/pull/386 included two cases in "src/dev/reg_bank.hh" where `std::min` was used to compare an integer of type `size_t` and another of type `Addr`. This caused an error on my Apple Silicon Mac, as this is a comparison between an "unsigned long" and an "unsigned long long" which (at least on my setup) was not permitted. To fix this issue, `reg_size` was changed from `size_t` to `Addr`, as were the types of the values it was derived from and of the variable used to hold the return value of the `std::min` calls. Change-Id: I31e9c04a8e0327d4f6f5390bc5a743c629db4746	2023-11-20 17:33:44 -08:00
Vishnu Ramadas	d19d6fc31e	dev-amdgpu: Add PM4 queue ID to GPU used VMID map When restoring checkpoints for certain applications, gem5 tries to create new doorbells with a pre-existing queue ID and simulation crashes shortly after. This commit adds existing IDs to the GPU device's used VMID map so that new doorbells are aware of existing queue IDs and use a new ID. This ensures that queue IDs are unique after checkpoint restoration Change-Id: I9bf89a2769db26ceab4441634ff2da936eea6d6f	2023-11-16 17:30:00 -06:00
hungweihsuG	83f1fe3fec	dev: add debug flag in register bank. (#386 ) Print extra logs for the full/partial read/write access to the registers through the register bank. The debug flag is empty by default and would not print anything. Test: run unittest of dev/reg_bank.test.xml to check the behavior would not affect the original functionality. run gem5 with debug flags and use m5term to poke on registers.	2023-11-15 10:04:46 -08:00
Jason Lowe-Power	71973b386e	gpu-compute,dev-hsa: ROCm 5.5+ support (#498 ) ROCm 5.5 support including: - Vendor packet completion signals - Queue remapping race condition fix - Backwards compatible GPR allocation - Fix transient readBlob fatal reading kernel descriptor	2023-11-06 10:51:37 -08:00
Matthew Poremba	37da1c45f3	dev-amdgpu: Better handling for queue remapping The amdgpu driver can, at any time, tell the device to unmap a queue to force the queue descriptor to be written back to main memory in the form of a memory queue descriptor (MQD). It will then immediately remap the queue and continue writing the doorbell to the queue. It is possible that the doorbell write occurs after the queue is unmapped but before it is remapped. In this situation, we need to check the updated value of the doorbell for the queue and write that to the queue after it is mapped. To handle this, a pending doorbell packet map is created to hold a packet to replay when the queue is mapped. Because PCI in gem5 implements only the atomic protocol port, we cannot use the original packet as it must respond in the same Tick. This patch fixes the doorbell maps not being cleared on unmapping, ensuring that the doorbell is not found in writeDoorbell and is instead placed in the pending doorbell map. This includes fixing the doorbell offset value in the doorbell to VMID map, which is now multiplied by four as it is a dword address. This was tested using tensorflow 2.0's MNIST example, which was seeing this issue consistently. With this patch it now makes progress and does issue pending doorbell writes. Change-Id: Ic6b401d3fe7fc46b7bcbf19a769cdea6814e7d1e	2023-11-01 14:52:39 -05:00
Matthew Poremba	d05433b3f6	gpu-compute,dev-hsa: Send vendor packet completion signal gem5 does not currently implement any vendor-specific HSA packets. Starting in ROCm 5.5, vendor packets appear to end with a completion signal. Not sending this completion causes gem5 to hang. Since these packets are not documented anywhere and need to be reverse engineered we send the completion signal, if non-zero, and finish the packet as is the current behavior. Testing: HIP examples working on most recent ROCm release (5.7.1). Change-Id: Id0841407bec564c84f590c943f0609b17e01e14c	2023-11-01 14:52:39 -05:00
Hoa Nguyen	50196863a4	stdlib,dev: Fix several hardcoded RISC-V ISA strings The "s" and "u" letters are not recognized by the Linux kernel as RISC-V extensions [1]. [1] https://elixir.bootlin.com/linux/v6.5.7/source/arch/riscv/kernel/cpufeature.c#L170 Change-Id: I2a99557482cde6e6d6160626b3995275c41b1577 Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2023-10-25 20:12:57 +00:00
Bobby R. Bruce	334df18dce	arch-riscv: Add bootloader+kernel workload (#390 ) Aims to boot OpenSBI + Linux kernel.	2023-10-18 09:17:12 -07:00
Bobby R. Bruce	d42eeb6b68	cpu: Explicitly define cache_line_size -> 64-bit unsigned int (#329 ) While it's plausible to define the cache_line_size as a 32-bit unsigned int, the use of cache_line_size has gone well beyond its original scope. cache_line_size has been used to produce an address mask, which masks out the offset bits from an address; see, for example, [1], [2], [3], and [4]. However, since the cache_line_size is an "unsigned int", the type of the value is not guaranteed to be 64 bits long. Consequently, the bit twiddling hacks in [1], [2], [3], and [4] produce a 32-bit mask, i.e., 0x00000000FFFFFFC0. This behavior at least caused a problem in LLSC in RISC-V [5], where the load reservation (LR) relies on the mask to produce the cache block address. Two distinct 64-bit addresses can be mapped to the same cache block using the above mask. This patch explicitly defines cache_line_size as a 64-bit unsigned int so the cache block mask can be produced correctly for 64-bit addresses. [1] `3bdcfd6f7a/src/cpu/simple/atomic.hh (L147)` [2] `3bdcfd6f7a/src/cpu/simple/timing.hh (L224)` [3] `3bdcfd6f7a/src/cpu/o3/lsq_unit.cc (L241)` [4] `3bdcfd6f7a/src/cpu/minor/lsq.cc (L1425)` [5] `3bdcfd6f7a/src/arch/riscv/isa.cc (L787)`	2023-10-16 07:50:35 -07:00
Bobby R. Bruce	ddf6cb88e4	misc: Run `pre-commit run --all-files` This reflects the updates made to black when running `pre-commit autoupdate`. Change-Id: Ifb7fea117f354c7f02f26926a5afdf7d67bc5919	2023-10-10 14:01:58 -07:00
Matt Sinclair	ec633b3d68	dev-amdgpu,mem-ruby: Add support to checkpoint and restore between kernels in GPUFS (#377 ) Earlier, GPU checkpointing was working only if a checkpoint was created before the first kernel execution. This pull request adds support to checkpoint in-between any two kernel calls. It does so by doing the following. - Adds flush support in the GPU_VIPER protocol - Adds flush support in the GPUCoalescer - Updates cache recorder to use the GPUCoalescer during simulation cooldown and cache warmup times.	2023-10-10 09:41:21 -05:00
Matthew Poremba	75a7f30dfb	dev-amdgpu: Implement GPU clock MMIOs The ROCr runtime uses a combination of HSA signal timestamps and hardware MMIOs to calculate profiling times. At the beginning of an application a timestamp is read from the GPU using MMIOs. The clock MMIOs reside in the GFX MMIO region, so a new AMDGPUGfx class is added to handle these MMIOs. The timestamp value is expected to be in nanoseconds, so we simply use the gem5 tick converted to ns. Change-Id: I7d1cba40d5042a7f7a81fd4d132402dc11b71bd4	2023-10-06 13:21:40 -05:00
Matthew Poremba	6a4b2bb096	dev-hsa,gpu-compute: Add timestamps to AMD HSA signals The AMD-specific HSA signal contains start/end timestamps for dispatch packet completion signals. These are currently always zero. These timestamp values are used for profiling in the ROCr runtime. Unfortunately, the GpuAgent::TranslateTime method in ROCr does not check for zero values before dividing, causing applications that use profiling to crash with SIGFPE. Profiling is used via hipEvents in the HACC application, so these should be supported in gem5. In order to handle writing the timestamp values, we need to DMA the values to memory before writing the completion signal. This changes the flow of the async completion signal write to be (1) read mailbox pointer (2) if valid, write the mailbox data, otherwise skip to (4) (3) write mailbox data if pointer is valid (4) write timestamp values (5) write completion signal. The application will process the timestamp data as soon as the completion signal is received, so this ordering is needed to ensure the DMA for the timestamps has completed. HACC now runs to completion on GPUFS and has the same output as hardware. Change-Id: I09877cdff901d1402140f2c3bafea7605fa6554e	2023-10-06 13:21:40 -05:00
Hoa Nguyen	6f8b74ece8	dev,arch-riscv: Mark gem5's 8250 UART as 16550a compatible The 8250 UART is supposed to be compatible with the 16550a UART. This enables OpenSBI to print things to UART as OpenSBI only prints if the UART is 16550a compatible [1]. There is a similar change from gem5 gerrit [2] pointing out that this also enables bbl to print things to UART. This is confirmed :) [1] https://github.com/riscv-software-src/opensbi/blob/v1.3.1/lib/utils/serial/fdt_serial_uart8250.c#L29 [2] https://gem5-review.googlesource.com/c/public/gem5/+/68481 Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2023-10-06 00:48:12 -07:00
Vishnu Ramadas	f69191a31d	dev-amdgpu: Remove duplicate writes to PM4 queue pointers During checkpoint restoration, the unserialize() function writes rptr, wptr, and indirect buffer rptr, wptr to PM4 queue's rptr, wptr fields. This commit updates this to write only the relevant pointers to the queue structure. If indirect buffers are used, then it writes only the indirect buffer pointers to the queue. If they are not used, then it writes rptr, wptr values to the queue. Change-Id: Iedb25a726112e1af99cc1e7bc012de51c4ebfd45	2023-10-02 19:37:46 -05:00
Vishnu Ramadas	107e05266d	dev-amdgpu: Add aql, hsa queue information to checkpoint-restore GPUFS uses aql information from PM4 queues to initialize doorbells. This commit adds aql information to the checkpoint so that it can be used during restoration to correctly initialize all doorbells. Additionally, this commit also sets the hsa queue correctly during checkpoint-restoration Change-Id: Ief3ef6dc973f70f27255234872a12c396df05d89	2023-10-02 19:02:50 -05:00

1478 Commits
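Commits 3af15a535e and dd45e1c319 describe extending the PartitionManager by overriding readPacketPartitionID. The C++ sketch below illustrates only that override pattern. Packet, DEFAULT_PARTITION and AddressShiftPartitionManager are hypothetical stand-ins, and the signature is an assumption rather than gem5's actual API.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a memory packet; gem5's Packet is far richer.
struct Packet
{
    std::uint64_t addr;
};

// Illustrative base class modelled on the commit description. The method
// name follows the commit text; signature and default value are assumed.
class PartitionManager
{
  public:
    static constexpr std::uint64_t DEFAULT_PARTITION = 0;

    virtual ~PartitionManager() = default;

    virtual std::uint64_t readPacketPartitionID(const Packet &) const
    {
        return DEFAULT_PARTITION;
    }
};

// Hypothetical user extension: take the partition ID from the upper
// address bits instead of from a packet extension.
class AddressShiftPartitionManager : public PartitionManager
{
  public:
    explicit AddressShiftPartitionManager(unsigned shift_bits)
        : shiftBits(shift_bits)
    {
        assert(shift_bits < 64);  // shifting a 64-bit value by >= 64 is UB
    }

    std::uint64_t readPacketPartitionID(const Packet &pkt) const override
    {
        return pkt.addr >> shiftBits;
    }

  private:
    unsigned shiftBits;
};
```

Because callers hold a reference to the base class, the extraction policy can be swapped without touching code that consumes the partition ID.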
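The index/data mechanism from 39153cd234 can be modelled in a few lines. Writing an index register latches a 32-bit address, and the matching data register then reads or writes that address. IndirectRegs and its methods are hypothetical. Two further assumptions belong to this sketch, not to the device: unwritten registers read as zero, and both pairs share one register space.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical model of index/data indirect register access.
// Pair 0 stands for index/data and pair 1 for index2/data2; both pairs
// reach the same 32-bit register space.
class IndirectRegs
{
  public:
    // Writing an index register latches the target address for that pair.
    void writeIndex(std::size_t pair, std::uint32_t addr)
    {
        assert(pair < index.size());
        index[pair] = addr;
    }

    // Reading a data register returns the value at the latched address.
    // Registers never written read as zero (an assumption of this sketch).
    std::uint32_t readData(std::size_t pair) const
    {
        assert(pair < index.size());
        auto it = regs.find(index[pair]);
        return it == regs.end() ? 0 : it->second;
    }

    // Writing a data register stores the value at the latched address.
    void writeData(std::size_t pair, std::uint32_t value)
    {
        assert(pair < index.size());
        regs[index[pair]] = value;
    }

  private:
    std::array<std::uint32_t, 2> index{};
    std::unordered_map<std::uint32_t, std::uint32_t> regs;
};
```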
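Commits ef10db5a3e, 3d1f68f205 and 4a4b775985 move the SMMUv3 code toward returning a TranslResult that carries fault details and exposes isFaulting. The sketch below shows that pattern with simplified types. translateStage2 and cdAddress are hypothetical stand-ins for gem5's translateStage2 and doReadCD. The fixed stage-2 offset and the 64-byte Context Descriptor size are also assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-ins only; these are not gem5's definitions.
enum class FaultType { NONE, F_TRANSLATION, F_PERMISSION };

struct TranslResult
{
    FaultType type = FaultType::NONE;
    unsigned faultClass = 0;  // hypothetical encoding of the fault class
    unsigned stage = 0;       // translation stage that faulted (1 or 2)
    std::uint64_t ipa = 0;    // intermediate physical address involved
    std::uint64_t addr = 0;   // output address when not faulting

    // Callers ask this instead of inspecting `type` themselves, so new
    // fields can be added without touching every call site.
    bool isFaulting() const { return type != FaultType::NONE; }
};

// Hypothetical stage-2 walk: IPAs below `limit` map at a fixed offset,
// anything else raises a translation fault recording stage and IPA.
TranslResult translateStage2(std::uint64_t ipa, std::uint64_t limit)
{
    TranslResult tr;
    if (ipa >= limit) {
        tr.type = FaultType::F_TRANSLATION;
        tr.stage = 2;
        tr.ipa = ipa;
    } else {
        tr.addr = ipa + 0x1000;
    }
    return tr;
}

// In the spirit of the doReadCD change: hand the stage-2 result back to
// the caller instead of silently using an address from a failed walk.
TranslResult cdAddress(std::uint64_t table_ipa, unsigned ssid,
                       std::uint64_t limit)
{
    TranslResult tr = translateStage2(table_ipa, limit);
    if (tr.isFaulting())
        return tr;                         // propagate the fault
    tr.addr += std::uint64_t{ssid} * 64;   // assumed 64-byte CD entries
    return tr;
}
```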
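addRegistersAt (c0e5d58a96) takes over the gap-filling bookkeeping that developers previously did by hand. That bookkeeping can be approximated as follows. This sketch does not use gem5's RegisterBank: Reg and layoutWithFillers are hypothetical, and registers are assumed to arrive sorted by offset and non-overlapping.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical register descriptor: name, byte offset and byte size.
struct Reg
{
    std::string name;
    std::uint64_t offset;
    std::uint64_t size;
};

// Builds a contiguous layout starting at `base` by inserting a "reserved"
// filler entry wherever the next register does not start where the
// previous one ended.
std::vector<Reg> layoutWithFillers(std::uint64_t base,
                                   const std::vector<Reg> &regs)
{
    std::vector<Reg> out;
    std::uint64_t next = base;  // first byte not yet covered
    for (const Reg &r : regs) {
        assert(r.offset >= next && "registers must be sorted, no overlap");
        if (r.offset > next)
            out.push_back({"reserved", next, r.offset - next});
        out.push_back(r);
        next = r.offset + r.size;
    }
    return out;
}
```

In gem5 the filler type is a template parameter rather than a name string. The sketch only shows how the gaps are derived from offsets and sizes.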
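bcf455755e expands the legacy n_contexts parameter into a list of contexts. That expansion can be sketched as a mapping from context index to a (hart ID, privilege mode) pair. PrivMode, Context and contextsFromCount are hypothetical names. Continuing the M/S alternation past the three entries quoted in the commit is an assumption.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical types; the real PLIC parameters may differ.
enum class PrivMode { M, S };
using Context = std::pair<int, PrivMode>;  // (hart ID, privilege mode)

// Expands a legacy context count into per-context (hart, mode) pairs in
// the order the commit quotes: {0,M}, {0,S}, {1,M}, ...
std::vector<Context> contextsFromCount(int n_contexts)
{
    assert(n_contexts >= 0);
    std::vector<Context> contexts;
    contexts.reserve(n_contexts);
    for (int i = 0; i < n_contexts; ++i) {
        contexts.emplace_back(i / 2, i % 2 == 0 ? PrivMode::M : PrivMode::S);
    }
    return contexts;
}
```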
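The bounded NOP skip from 7f71477f15 reduces to a loop with two exit conditions. This is a simplified sketch under several assumptions: skipNopDwords is hypothetical, pointers are byte offsets advanced by 4 per dword, ring wrap-around is ignored, and the NOP count is taken to be the number of dwords to skip.

```cpp
#include <cassert>
#include <cstdint>

// Skips at most nop_count dwords but never moves rptr past wptr, so a
// corrupt count cannot make the decode loop spin forever.
std::uint64_t skipNopDwords(std::uint64_t rptr, std::uint64_t wptr,
                            std::uint32_t nop_count)
{
    assert(rptr <= wptr && (wptr - rptr) % 4 == 0);
    for (std::uint32_t i = 0; i < nop_count && rptr != wptr; ++i) {
        rptr += 4;  // advance one dword at a time
    }
    return rptr;
}
```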
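Commit 9e6a87e67a replaces an rptr < wptr test with a non-empty (rptr != wptr) test to cope with wrap-around. The sketch below shows why the old test fails. It uses a hypothetical ring whose pointers are byte offsets that wrap back to zero; gem5's actual pointer representation may differ, and all function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Old-style test: after the writer wraps, wptr can be numerically smaller
// than rptr while work is still pending, so this wrongly reports "empty".
bool pendingByLessThan(std::uint32_t rptr, std::uint32_t wptr)
{
    return rptr < wptr;
}

// Test used after the change: the queue is empty only when rptr == wptr.
bool pendingByNotEqual(std::uint32_t rptr, std::uint32_t wptr)
{
    return rptr != wptr;
}

// Drains a ring of `size` bytes with the != test and returns how many
// dwords were consumed.
std::uint32_t drainDwords(std::uint32_t rptr, std::uint32_t wptr,
                          std::uint32_t size)
{
    assert(size > 0 && size % 4 == 0 && rptr < size && wptr < size);
    assert(rptr % 4 == 0 && wptr % 4 == 0);
    std::uint32_t consumed = 0;
    while (pendingByNotEqual(rptr, wptr)) {
        rptr = (rptr + 4) % size;  // one dword, wrapping at the end
        ++consumed;
    }
    return consumed;
}
```

With rptr at 56 and a writer that has wrapped to 8 on a 64-byte ring, the < test sees nothing to do. The != test consumes the four pending dwords.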
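The std::min problem in 08c0d1f27a comes from template argument deduction. std::min takes both arguments as the same type T, so mixing size_t and Addr has no viable overload wherever the two are distinct types. A minimal sketch of the type-consistent form follows. bytesInRegister is hypothetical; the Addr alias mirrors gem5's 64-bit address type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// gem5's Addr is a 64-bit unsigned integer; mirrored here.
using Addr = std::uint64_t;

// With both operands typed as Addr (the approach the commit takes),
// std::min deduces a single T on every platform. A call such as
// std::min(some_size_t, some_addr) would fail where size_t is
// `unsigned long` but uint64_t is `unsigned long long`.
Addr bytesInRegister(Addr reg_size, Addr offset, Addr remaining)
{
    assert(offset <= reg_size);
    return std::min(reg_size - offset, remaining);
}
```

An explicit template argument, as in `std::min<Addr>(a, b)`, would also compile. Changing the variable types instead, as the commit did, keeps the arithmetic feeding the call in one type.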
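The pending-doorbell handling in 37da1c45f3 can be modelled as a map keyed by doorbell offset. This sketch is heavily simplified. It stores only the latest write pointer value rather than a replayable packet, and DoorbellRouter and its methods are hypothetical. The dword-to-byte conversion mirrors the commit's note that the offset is multiplied by four.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

class DoorbellRouter
{
  public:
    // The driver supplies a dword index; the map is keyed by byte offset.
    static std::uint64_t byteOffset(std::uint32_t dword_index)
    {
        return std::uint64_t{dword_index} * 4;
    }

    void mapQueue(std::uint64_t offset, int queue_id)
    {
        assert(mapped.count(offset) == 0 && "doorbell already mapped");
        mapped[offset] = queue_id;
        // Replay a doorbell write that arrived while the queue was unmapped.
        auto it = pending.find(offset);
        if (it != pending.end()) {
            wptrs[queue_id] = it->second;
            pending.erase(it);
        }
    }

    // Clearing the mapping on unmap ensures later writes become pending.
    void unmapQueue(std::uint64_t offset) { mapped.erase(offset); }

    void writeDoorbell(std::uint64_t offset, std::uint64_t wptr)
    {
        auto it = mapped.find(offset);
        if (it == mapped.end())
            pending[offset] = wptr;  // keep only the latest value
        else
            wptrs[it->second] = wptr;
    }

    std::optional<std::uint64_t> wptrOf(int queue_id) const
    {
        auto it = wptrs.find(queue_id);
        if (it == wptrs.end())
            return std::nullopt;
        return it->second;
    }

  private:
    std::unordered_map<std::uint64_t, int> mapped;
    std::unordered_map<std::uint64_t, std::uint64_t> pending;
    std::unordered_map<int, std::uint64_t> wptrs;
};
```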
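The 32-bit mask described in d42eeb6b68 can be reproduced directly. The helpers below are hypothetical and differ only in the width of the line-size parameter. The sketch assumes a platform where unsigned int is 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Both helpers build a block mask as ~(cache_line_size - 1).

// Line size held in a 32-bit unsigned int: the complement is computed in
// 32 bits and only then widened, so the upper 32 bits of the mask are zero.
std::uint64_t blockMaskNarrow(unsigned int cache_line_size)
{
    assert(cache_line_size != 0 &&
           (cache_line_size & (cache_line_size - 1)) == 0);
    return ~(cache_line_size - 1);
}

// Line size held in a 64-bit unsigned integer: every upper bit is set.
std::uint64_t blockMaskWide(std::uint64_t cache_line_size)
{
    assert(cache_line_size != 0 &&
           (cache_line_size & (cache_line_size - 1)) == 0);
    return ~(cache_line_size - 1);
}
```

With a 64-byte line, the narrow mask maps 0x100000040 and 0x40 to the same block address. This is the aliasing the commit reports for LR reservations.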
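Commit 75a7f30dfb says the GPU clock MMIOs return the timestamp in nanoseconds, using the gem5 tick converted to ns. A minimal sketch of that conversion follows, assuming gem5's default resolution of one tick per picosecond. gpuClockNs is hypothetical, as is the split of the value into low and high 32-bit halves.

```cpp
#include <cstdint>

using Tick = std::uint64_t;

// Assumes 10^12 ticks per second (1 tick = 1 ps), i.e. 1000 ticks per ns.
constexpr Tick ticksPerNs = 1000;

// Nanosecond timestamp derived from the current tick.
constexpr std::uint64_t gpuClockNs(Tick now)
{
    return now / ticksPerNs;
}

// Hypothetical split of the 64-bit value across two 32-bit MMIOs.
constexpr std::uint32_t clockLo(std::uint64_t ns)
{
    return static_cast<std::uint32_t>(ns & 0xffffffffu);
}

constexpr std::uint32_t clockHi(std::uint64_t ns)
{
    return static_cast<std::uint32_t>(ns >> 32);
}
```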