derek/gem5

Author	SHA1	Message	Date
Bobby R. Bruce	d11c40dcac	misc: Run `pre-commit run --all-files` This ensures `isort` is applied to all files in the repo. Change-Id: Ib7ced1c924ef1639542bf0d1a01c5737f6ba43e9	2023-11-29 22:06:41 -08:00
Bobby R. Bruce	d94d6017b0	scons: Change to Kconfig build system (#69) The PR contains the following changes: - Move all of the config options (`env["CONF"]`) from SConsopt to Kconfig files - Update `build_opts` files to Kconfig option formats - The Ruby Protocol files are only built if `RUBY=y` - Remove the default-default build target - Kconfig commands are included in the PR: - defconfig - setconfig - menuconfig - guiconfig - listnewconfig - savedefconfig - oldconfig - olddefconfig - Add the `python3-tk` package dependencies Jira issue: https://gem5.atlassian.net/browse/GEM5-1211	2023-11-27 13:59:18 -08:00
Matthew Poremba	9e6a87e67a	dev-amdgpu: Writeback PM4 queue rptr when empty (#597) The GPU device keeps a local copy of each ring buffer's read pointer (rptr) to avoid constant DMAs to/from host memory. This means it needs to be periodically updated on the host side as the driver uses this to determine how much space is left in the queue and may hang if it believes the queue is full. For user-mode queues, this already happens when queues are unmapped. For kernel mode queues (e.g., HIQ, KIQ) the rptr is never updated leading to a hang. In this patch the rptr for all queues is reported back to the kernel whenever the queue reaches an empty state (rptr == wptr). Additionally to handle PM4 queue wrap-around, the queue processing function checks if the queue is not empty instead of rptr < wptr. This is safe because the driver fills PM4 queues with NOP packets on initialization and when wrap around occurs. Change-Id: Ie13a4354f82999208a75bb1eaec70513039ff30f	2023-11-27 11:02:11 -08:00
Gabe Black	db3a6e8e84	scons: Use Kconfig to configure gem5. These are not yet consumed by anything, but convert all the settings from SCons variables to Kconfig variables. If you have existing SConsopts files which need to be converted, you should take a look at KCONFIG.md to learn about how kconfig is used in gem5. You should decide if any variables need to be available to C++ or kconfig itself, and whether those are options which should be detected automatically, or should be up to the user. Options which should be measured automatically should still be in SConsopts files, while user facing options should be added to new or existing Kconfig files. Generally, make sure you're storing c++/kconfig visible options in env['CONF'][...]. Also remove references to sticky_vars since persistent options should now be handled with kconfig, and export_vars since everything in env['CONF'] is now exported automatically. Switch SCons/gem5 to use Kconfig for configuration, except EXTRAS which is still a sticky SCons variable. This is necessary because EXTRAS also controls what config options exist. If it came from Kconfig itself, then there would be a circular dependency. This dependency could theoretically be handled by reparsing the Kconfig when EXTRAS directories were added or removed, but that would be complicated, and isn't supported by kconfiglib. It wouldn't be worth the significant effort it would take to add it, just to use Kconfig more purely. Change-Id: I29ab1940b2d7b0e6635a490452d05befe5b4a2c9	2023-11-23 08:26:10 +08:00
Bobby R. Bruce	23a22ed95c	dev-amdgpu: Add VMID map to checkpoint (#570) When restoring checkpoints for certain applications, gem5 tries to create new doorbells with a pre-existing queue ID and simulation crashes shortly after. This commit adds existing IDs to the GPU device's used VMID map so that new doorbells are aware of existing queue IDs and use a new ID. This ensures that queue IDs are unique after checkpoint restoration	2023-11-22 10:05:21 -08:00
Vishnu Ramadas	06161ded8c	dev-amdgpu: Add VMID map to checkpoint When restoring checkpoints for certain applications, gem5 tries to create new doorbells with a pre-existing queue ID and simulation crashes shortly after. This commit checkpoints the existing VMID map so that any new doorbells after restoration use a unique queue ID Change-Id: I9bf89a2769db26ceab4441634ff2da936eea6d6f	2023-11-20 21:19:17 -06:00
Bobby R. Bruce	08c0d1f27a	dev: Fix `std::min` type mismatch in reg_bank.hh https://github.com/gem5/gem5/pull/386 included two cases in "src/dev/reg_bank.hh" where `std::min` was used to compare an integer of type `size_t` and another of type `Addr`. This caused an error on my Apple Silicon Mac as this is a comparison between an "unsigned long" and an "unsigned long long" which (at least on my setup) was not permitted. To fix this issue the `reg_size` was changed from `size_t` to `Addr`, as well as the types of the values it was derived from and the variable used to hold the return from the `std::min` calls. Change-Id: I31e9c04a8e0327d4f6f5390bc5a743c629db4746	2023-11-20 17:33:44 -08:00
Vishnu Ramadas	d19d6fc31e	dev-amdgpu: Add PM4 queue ID to GPU used VMID map When restoring checkpoints for certain applications, gem5 tries to create new doorbells with a pre-existing queue ID and simulation crashes shortly after. This commit adds existing IDs to the GPU device's used VMID map so that new doorbells are aware of existing queue IDs and use a new ID. This ensures that queue IDs are unique after checkpoint restoration Change-Id: I9bf89a2769db26ceab4441634ff2da936eea6d6f	2023-11-16 17:30:00 -06:00
hungweihsuG	83f1fe3fec	dev: add debug flag in register bank. (#386) Print extra logs for the full/partial read/write access to the registers through the register bank. The debug flag is empty by default and would not print anything. Test: run unittest of dev/reg_bank.test.xml to check the behavior would not affect the original functionality. run gem5 with debug flags and use m5term to poke on registers.	2023-11-15 10:04:46 -08:00
Jason Lowe-Power	71973b386e	gpu-compute,dev-hsa: ROCm 5.5+ support (#498) ROCm 5.5 support including: - Vendor packet completion signals - Queue remapping race condition fix - Backwards compatible GPR allocation - Fix transient readBlob fatal reading kernel descriptor	2023-11-06 10:51:37 -08:00
Matthew Poremba	37da1c45f3	dev-amdgpu: Better handling for queue remapping The amdgpu driver can, at any time, tell the device to unmap a queue to force the queue descriptor to be written back to main memory in the form of a memory queue descriptor (MQD). It will then immediately remap the queue and continue writing the doorbell to the queue. It is possible that the doorbell write occurs after the queue is unmapped but before it is remapped. In this situation, we need to check the updated value of the doorbell for the queue and write that to the queue after it is mapped. To handle this, a pending doorbell packet map is created to hold a packet to replay when the queue is mapped. Because PCI in gem5 implements only the atomic protocol port, we cannot use the original packet as it must respond in the same Tick. This patch fixes issues with the doorbell maps not being cleared on unmapping to ensure the doorbell is not found in writeDoorbell and is placed in the pending doorbell map. This includes fixing the doorbell offset value in the doorbell to VMID map which is now multiplied by four as it is a dword address. This was tested using tensorflow 2.0's MNIST example which was seeing this issue consistently. With this patch it now makes progress and does issue pending doorbell writes. Change-Id: Ic6b401d3fe7fc46b7bcbf19a769cdea6814e7d1e	2023-11-01 14:52:39 -05:00
Matthew Poremba	d05433b3f6	gpu-compute,dev-hsa: Send vendor packet completion signal gem5 does not currently implement any vendor-specific HSA packets. Starting in ROCm 5.5, vendor packets appear to end with a completion signal. Not sending this completion causes gem5 to hang. Since these packets are not documented anywhere and need to be reverse engineered we send the completion signal, if non-zero, and finish the packet as is the current behavior. Testing: HIP examples working on most recent ROCm release (5.7.1). Change-Id: Id0841407bec564c84f590c943f0609b17e01e14c	2023-11-01 14:52:39 -05:00
Hoa Nguyen	50196863a4	stdlib,dev: Fix several hardcoded RISC-V ISA strings The "s" and "u" letters are not recognized by the Linux kernel as RISC-V extensions [1]. [1] https://elixir.bootlin.com/linux/v6.5.7/source/arch/riscv/kernel/cpufeature.c#L170 Change-Id: I2a99557482cde6e6d6160626b3995275c41b1577 Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2023-10-25 20:12:57 +00:00
Bobby R. Bruce	334df18dce	arch-riscv: Add bootloader+kernel workload (#390) Aims to boot OpenSBI + Linux kernel.	2023-10-18 09:17:12 -07:00
Bobby R. Bruce	d42eeb6b68	cpu: Explicitly define cache_line_size -> 64-bit unsigned int (#329) While it's plausible to define the cache_line_size as a 32-bit unsigned int, the use of cache_line_size is way out of its original scope. cache_line_size has been used to produce an address mask, which masks out the offset bits from an address. For example, [1], [2], [3], and [4]. However, since the cache_line_size is an "unsigned int", the type of the value is not guaranteed to be 64-bit long. Subsequently, the bit twiddling hacks in [1], [2], [3], and [4] produce a 32-bit mask, i.e., 0x00000000FFFFFFC0. This behavior at least caused a problem in LLSC in RISC-V [5], where the load reservation (LR) relies on the mask to produce the cache block address. Two distinct 64-bit addresses can be mapped to the same cache block using the above mask. This patch explicitly defines cache_line_size as a 64-bit unsigned int so the cache block mask can be produced correctly for 64-bit addresses. [1] `3bdcfd6f7a/src/cpu/simple/atomic.hh (L147)` [2] `3bdcfd6f7a/src/cpu/simple/timing.hh (L224)` [3] `3bdcfd6f7a/src/cpu/o3/lsq_unit.cc (L241)` [4] `3bdcfd6f7a/src/cpu/minor/lsq.cc (L1425)` [5] `3bdcfd6f7a/src/arch/riscv/isa.cc (L787)`	2023-10-16 07:50:35 -07:00
Bobby R. Bruce	ddf6cb88e4	misc: Run `pre-commit run --all-files` This is to reflect the updates made to black when running `pre-commit autoupdate`. Change-Id: Ifb7fea117f354c7f02f26926a5afdf7d67bc5919	2023-10-10 14:01:58 -07:00
Matt Sinclair	ec633b3d68	dev-amdgpu,mem-ruby: Add support to checkpoint and restore between kernels in GPUFS (#377) Earlier, GPU checkpointing was working only if a checkpoint was created before the first kernel execution. This pull request adds support to checkpoint in-between any two kernel calls. It does so by doing the following. - Adds flush support in the GPU_VIPER protocol - Adds flush support in the GPUCoalescer - Updates cache recorder to use the GPUCoalescer during simulation cooldown and cache warmup times.	2023-10-10 09:41:21 -05:00
Matthew Poremba	75a7f30dfb	dev-amdgpu: Implement GPU clock MMIOs The ROCr runtime uses a combination of HSA signal timestamps and hardware MMIOs to calculate profiling times. At the beginning of an application a timestamp is read from the GPU using MMIOs. The clock MMIOs reside in the GFX MMIO region, so a new AMDGPUGfx class is added to handle these MMIOs. The timestamp value is expected to be in nanoseconds, so we simply use the gem5 tick converted to ns. Change-Id: I7d1cba40d5042a7f7a81fd4d132402dc11b71bd4	2023-10-06 13:21:40 -05:00
Matthew Poremba	6a4b2bb096	dev-hsa,gpu-compute: Add timestamps to AMD HSA signals The AMD specific HSA signal contains start/end timestamps for dispatch packet completion signals. These are currently always zero. These timestamp values are used for profiling in the ROCr runtime. Unfortunately, the GpuAgent::TranslateTime method in ROCr does not check for zero values before dividing, causing applications that use profiling to crash with SIGFPE. Profiling is used via hipEvents in the HACC application, so these should be supported in gem5. In order to handle writing the timestamp values, we need to DMA the values to memory before writing the completion signal. This changes the flow of the async completion signal write to be (1) read mailbox pointer (2) if valid, write the mailbox data, otherwise skip to 4 (3) write mailbox data if pointer is valid (4) write timestamp values (5) write completion signal. The application will process the timestamp data as soon as the completion signal is received, so we need ordering to ensure the DMA for timestamps was completed. HACC now runs to completion on GPUFS and has the same output as hardware. Change-Id: I09877cdff901d1402140f2c3bafea7605fa6554e	2023-10-06 13:21:40 -05:00
Hoa Nguyen	6f8b74ece8	dev,arch-riscv: Mark gem5's 8250 UART as 16550a compatible 8250 UART is supposed to be compatible to 16550a UART. This enables OpenSBI to print things to UART as OpenSBI only prints if the UART is 16550a compatible [1]. There is a similar change from gem5 gerrit [2] pointing out that this also enables bbl to print things to UART. This is confirmed :) [1] https://github.com/riscv-software-src/opensbi/blob/v1.3.1/lib/utils/serial/fdt_serial_uart8250.c#L29 [2] https://gem5-review.googlesource.com/c/public/gem5/+/68481 Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2023-10-06 00:48:12 -07:00
Vishnu Ramadas	f69191a31d	dev-amdgpu: Remove duplicate writes to PM4 queue pointers During checkpoint restoration, the unserialize() function writes rptr, wptr, and indirect buffer rptr, wptr to PM4 queue's rptr, wptr fields. This commit updates this to write only the relevant pointers to the queue structure. If indirect buffers are used, then it writes only the indirect buffer pointers to the queue. If they are not used, then it writes rptr, wptr values to the queue. Change-Id: Iedb25a726112e1af99cc1e7bc012de51c4ebfd45	2023-10-02 19:37:46 -05:00
Vishnu Ramadas	107e05266d	dev-amdgpu: Add aql, hsa queue information to checkpoint-restore GPUFS uses aql information from PM4 queues to initialize doorbells. This commit adds aql information to the checkpoint so that it can be used during restoration to correctly initialize all doorbells. Additionally, this commit also sets the hsa queue correctly during checkpoint-restoration Change-Id: Ief3ef6dc973f70f27255234872a12c396df05d89	2023-10-02 19:02:50 -05:00
Hoa Nguyen	1fc89bc8ae	cpu,mem,dev: Use Addr for cacheLineSize Change-Id: I2f056571dbf35081d58afda09726c600141d5a05 Signed-off-by: Hoa Nguyen <hn@hnpl.org>	2023-09-20 14:16:46 -07:00
Matthew Poremba	63cabf2848	dev-amdgpu: Handle GPU atomics on host memory addresses It is possible to execute a GPU atomic instruction using a memory address that is in the host memory space (e.g., HMM, __managed__, hipHostMalloc'd address). Since these are in host memory they are passed to the SystemHub DmaDevice. However, this currently executes as a write packet without modifying data. This leads to hangs in applications that use atomics for forward progress (e.g., HeteroSync). It is not clear where these are handled on a real GPU, but they are certainly not handled by the software stack nor driver, so they must be handled in hardware and therefore implemented in gem5. Handling for atomics in the SystemHub makes the most sense. To make atomics work a few extra changes need to be made to the SystemHub. (1) The atomic is implemented as a host memory read, followed by calling the AtomicOpFunctor, followed by a write. This requires a second event to handle read response, performing atomic, and issuing a write. (2) Atomics must be serialized otherwise two atomics might return the same value which is incorrect. This patch adds serialization logic for all request types to the same address to handle this. (3) With the added complexity of the SystemHub, a new debug flag explicitly for SystemHub is added. Testing done: The heterosync application with input "sleepMutex 10 16 4" previously hung before this patch. It passes with the patch applied. This application tests both (1) and (2) above, as it allocates locks with hipHostMalloc and has multiple workgroups sending an atomic request in the same Tick, verifying the serialization mechanism. Change-Id: Ife84b30037d1447dd384340cfeb06fdfd472fff9	2023-09-20 13:52:25 -05:00
Matthew Poremba	57b3d2897c	gpu-compute: Use timing DMAs for GPUFS HSA signals The functional HSA signal read was a hack left in the gpu-compute code. In full system, this functional read is causing problems occasionally with the translation not yet being in the page table. The error message output by gem5 was a fatal message on the readBlob method in port proxy. Changing this to a timing DMA fixes this problem. This commit adds the various timing DMA functions to send and receive response and clean up. A helper method "sendCompletionSignal" is added to the GPUCommandProcessor because the indentation level was getting too deep. This change applies only to FS mode. Code for SE mode is equivalent to what it was before this commit. Change-Id: I1bfcaa0a52731cdf9532a7fd0eb06ab2f0e09d48	2023-08-25 13:10:51 -05:00
Matthew Poremba	addba01d29	configs,dev-amdgpu: Add PCI express capability info The ROCm stack requires PCI express atomics. Currently the first PCI CapabilityPtr does not point to anything, which signals to the OS (Linux) that this is an early generation PCI device. As PCI express atomics were introduced later, the CapabilityPtr needs to point to at least a PCI express capability structure. This capability is defined as 0x10 in Linux. We additionally set the PCI atomic based bits and implement device specific PCI configuration space reads and writes to the amdgpu device. With this commit, the output of simulation when loading the amdgpu driver no longer outputs "PCIE atomics not supported". Further, an application which uses PCIe atomics (PyTorch with a reduce_sum kernel) now makes further progress. Change-Id: I5e3866979659a2657f558941106ef65c2f4d9988	2023-08-24 09:10:35 -05:00
Matthew Poremba	8b4c38302f	dev: PCI: Fix PCI express capability union The capabilities for PCI express is a struct, instead of a union, like the other capability unions. A union is used here to provide access to the ordinal data values when reading/writing an offset while simultaneously providing human readable field values that can be set when writing the code. This commit changes it to a union, which is likely what it should be. Nothing appears to be using this union yet so it is likely an oversight. Change-Id: I85fe7cc62914525c70fd7a5946d725ed308f8775	2023-08-23 19:32:38 -05:00
Matthew Poremba	3b35e73eb8	dev-amdgpu: Implement SDMA constant fill This SDMA packet is much more common starting around ROCm 5.4. Previously this was mostly used to clear page tables after an application ended and was therefore left unimplemented. It is now used for basic operation like device memsets. This patch implements constant fill as it is now necessary. Change-Id: I9b2cf076ec17f5ed07c20bb820e7db0c082bbfbc	2023-07-30 13:17:05 -05:00
Ranganath (Bujji) Selagamsetty	ede4d89a83	arch-vega, dev-amdgpu: Fix for memory leaks When using the new operator, delete should be called on any allocated memory after its use is complete. Change-Id: Id5fcfb264b6ddc252c0a9dcafc2d3b020f7b5019	2023-07-28 19:14:46 -05:00
Bobby R. Bruce	753933d471	gpu-compute, tests: Fix GPU_X86 compilation, add compiler tests (#64) * gpu-compute: Remove use of 'std::random_shuffle' This was deprecated in C++14 and removed in C++17. This has been replaced with std::random. This has been implemented to ensure reproducible results despite (pseudo)random behavior. Change-Id: Idd52bc997547c7f8c1be88f6130adff8a37b4116 * dev-amdgpu: Add missing 'overrides' This causes warnings/errors in some compilers. Change-Id: I36a3548943c030d2578c2f581c8985c12eaeb0ae * dev: Fix Linux specific includes to be portable This allows for compilation in non-linux systems (e.g., Mac OS). Change-Id: Ib6c9406baf42db8caaad335ebc670c1905584ea2 * tests: Add 'VEGA_X86' build target to compiler-tests.sh Change-Id: Icbf1d60a096b1791a4718a7edf17466f854b6ae5 * tests: Add 'GCN3_X86' build target to compiler-tests.sh Change-Id: Ie7c9c20bb090f8688e48c8619667312196a7c123	2023-07-11 14:35:03 -07:00
Wei-Han Chen	b4687aa7d9	dev: Warn when resp packet is error in dma port This CL adds a warning when the response packet is error. Change-Id: I8e94dc2b85cd1753a4d6265cfda3cd5d6325f425 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71778 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Yu-hsin Wang <yuhsingw@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-07-07 07:12:54 +00:00
Matthew Poremba	079fc47dc2	dev-amdgpu: Perform frame writes atomically The PCI read/write functions are atomic functions in gem5, meaning they expect a response with a latency value on the same simulation Tick. For reads to a PCI device, the response must also include a data value read from the device. The AMDGPU device has a PCI BAR which mirrors the frame buffer memory. Currently reads are done atomically, but writes are sent to a DMA device without waiting for a write completion ACK. As a result, it is possible that writes can be queued in the DMA device long enough that another read for a queued address arrives. This happens very deterministically with the AtomicSimpleCPU and causes GPUFS to break with that CPU. This change makes writes to the frame BAR atomic the same as reads. This avoids that problem and as a result the AtomicSimpleCPU can now load the driver for GPUFS simulations. Change-Id: I9a8e8b172712c78b667ebcec81a0c5d0060234db Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71898 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matthew Poremba <matthew.poremba@amd.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>	2023-06-29 19:56:49 +00:00
Giacomo Travaglini	b355baac93	dev-arm: Treat GICv3 reserved addresses as RES0 According to the GIC specification (IHI0069) reserved addresses in the GIC memory map are treated as RES0. We allow to disable this behaviour and panic instead (reserved_res0 = False, which is what we have been doing so far) to catch development bugs (in gem5 and in the guest SW) Change-Id: I23f98519c2f256c092a52425735b8792bae7a2c7 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71138 Reviewed-by: Richard Cooper <richard.cooper@arm.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-06-05 15:01:59 +00:00
Matthew Poremba	6b4a1020be	configs,dev-amdgpu: GPUFS MI200/gfx90a support Add support for MI200-like device. This includes adding PCI IDs and new MMIOs for the device, a different MAP_PROCESS packet, and a different calculation for the number of VGPRs. Change-Id: I0fb7b3ad928826beaa5386d52a94ba504369cb0d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70317 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-05-25 19:14:32 +00:00
Matthew Poremba	4d18546bfb	dev-amdgpu: Update SDMA checkpointing Patch https://gem5-review.googlesource.com/c/public/gem5/+/70040 added support for a variable number of SDMA engines to support newer GPU models. As part of this an SDMA IDs map was added to map from SDMA ID number to the SDMA SimObject pointer. In order to get the correct pointer in unserialize now, we need to store the ID in the checkpoint and use that to index the new map. We can't simply assign using the loop variable as the SDMAs might not be in order in the checkpoint and additionally the checkpoint contains both the gfx and page offset for the SDMA engines, so each SDMA is inserted into the SDMA offset map (sdmaEngs) twice. Change-Id: I08e9a8d785f467b6eebff8ab0a9336851c87258d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70878 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-05-23 14:28:16 +00:00
Matthew Poremba	08644a7670	dev-amdgpu: Fix nbio psp ring assert The size of the packet changes between ROCm 4.x and ROCm 5.x. Change how the address is set based on the incoming packet size so that both versions continue to work for now. Change-Id: I91694e4760198fd9129e60140df4e863666be2e2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70677 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2023-05-22 15:08:11 +00:00
Bobby R. Bruce	fcb36458e2	misc: Fix 'unused variable' clang errors with gem5.fast Change-Id: I2bb8ac10e8db69fa82abe41577cd8e5db575e93d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70297 Maintainer: Bobby Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>	2023-05-08 22:54:06 +00:00
Melissa Jost	dd5b1a674e	dev-amdgpu: Remove unused psp_ring_retval integer This change addresses the compiler failures that have been causing any GCN3_X86 build to fail. https://jenkins.gem5.org/job/compiler-checks/589/ Change-Id: Ifd8e2ef89549752ca4aedf0bc9fa47e831a822d3 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70217 Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-05-02 16:46:01 +00:00
Matthew Poremba	316538bf8a	dev-amdgpu: Enable more GPUs with device specific registers Currently gem5 assumes the amdgpu device to be Vega10. In order to support more devices we need to handle situations where different registers and addresses have the same functionality but different offsets on different devices. This changeset adds an NBIO class to handle device discovery and driver initialization related tasks, pulling them out of the AMDGPUDevice class. The offsets used for MMIOs are reworked slightly to use offsets rather than absolute addresses. This is because we cannot determine the absolute address in the constructor since the BAR has not been assigned by the OS yet. Change-Id: I14b364374e086e185978334425a4e265cf2760d0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70041 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-04-28 00:48:35 +00:00
Matthew Poremba	8b91ac6f8d	dev-amdgpu: Refactor MMIO interface for SDMA engines Currently the amdgpu simulated device is assumed to be a Vega10. As a result there are a few things that are hardcoded. One of those is the number of SDMAs. In order to add a newer device, such as MI100+, we need to enable a flexible number of SDMAs. In order to support a variable number of SDMAs and with the MMIO offsets of each device being potentially different, the MMIO interface for SDMAs is changed to use an SDMA class method dispatch table which forwards a 32-bit value from the MMIO packet to the MMIO functions in SDMA of the format `void method(uint32_t)`. Several changes are made to enable this: - Allow the SDMA to have a variable MMIO base and size. These are configured in python. - An SDMA class method dispatch table which contains the MMIO offset relative to the SDMA's MMIO base address. - An updated writeMMIO method to iterate over the SDMA MMIO address ranges and call the appropriate SDMA MMIO method which matches the MMIO offset. - Moved all SDMA related MMIO data bit twiddling, masking, etc. into the MMIO methods themselves instead of in the writeMMIO method in SDMAEngine. Change-Id: Ifce626f84d52f9e27e4438ba4e685e30dbf06dbc Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70040 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-04-28 00:48:35 +00:00
Matthew Poremba	6c1b95ea41	dev-amdgpu: Default MMIO reads when previously written If an MMIO was previously written and the driver reads it, we should return the value that was previously written. This overwrites the MMIO trace value which is the last resort fallback for finding an MMIO value. This is needed to initialize newer GPU devices in gem5. Change-Id: Ida2435290b706288e88518b5d920691cdb6dcc09 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70039 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-04-28 00:48:35 +00:00
Matthew Poremba	9c3107c762	dev-amdgpu,configs: Add human readable names for different GPUs Add a human readable string for GPU device names rather than using the device ID in the code. This is intended to make code more readable. Change-Id: Id3ea74ca37422b1f4a0f09e5a9522d37b5998c1a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70038 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2023-04-28 00:48:35 +00:00
Vishnu Ramadas	f5af8b5876	dev-amdgpu: Add a few MQD attributes to GPUFS checkpoint During GPUFS checkpoint restore, doorbell callbacks are created based on certain MQD attributes. These callbacks are required to create new SDMA doorbells. If these attributes are not present in the checkpoint, the restore hangs indefinitely waiting for ioctl calls that access these doorbells to finish execution. This commit adds the attributes required for checkpoint restore to proceed. Change-Id: Id3d1b7a2627d4c50133d923096495957a233f675 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70077 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matthew Poremba <matthew.poremba@amd.com>	2023-04-27 21:15:46 +00:00
Matthew Poremba	c597361a6b	dev-amdgpu: Add writeROM method For non-KVM CPUs the VBIOS memory falls into an I/O hole and therefore gets routed to the PIO bus in gem5. This gets routed to the GPU in the case of a ROM write. We write to the ROM as a way to "load" the VBIOS without creating holes in the KVM VM. This write method allows the same scripts as KVM to be used by writing to the ROM area and overwriting what might already be there from the --gpu-rom option. Change-Id: I8c2d2aa05a823569a774dfdd3bf2d2e773f38683 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70037 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2023-04-22 19:57:26 +00:00
Richard Cooper	ed9effca73	dev-arm: Fix writes to Arm GICv2 GICD_IGROUPRn Writes to the GICD_IGROUPRn registers are currently applied using the `|=` operator, allowing bits to be set but not cleared. According to the specification [1] this register should allow direct writes. This patch changes the logic to write the new value directly to the register. [1] https://developer.arm.com/documentation/ihi0048/latest/ Change-Id: Ia5f17d05530263d7e918ff33576daaf8165c25c2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69682 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-04-13 21:09:36 +00:00
Richard Cooper	06637a29e5	arch-arm: Add more detailed debug messages to GICv2. Converted the generic DPRINTF messages for the GICv2 register reads and writes (showing only the memory mapped address) to finer grained DPRINTF messages showing the names of the mapped registers being accessed. This change is intended to make it easier to debug the GIC setup from the gem5 debug trace. Change-Id: Ic418b2ea8438fed6a5a810ebc0b686cd4c891cb0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69681 Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2023-04-13 21:09:36 +00:00
Gabe Black	716c154b51	arch,base,dev,sim: Convert objects to use the HostSocket param type. This will make it possible to connect any of these objects with a named socket, in addition to the usual port numbers. Change-Id: Id441c3628f62d60608a07c5cb697786e33199981 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69166 Reviewed-by: Jui-min Lee <fcrh@google.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Gabe Black <gabeblack@google.com> Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>	2023-04-12 02:18:22 +00:00
Gabe Black	2f5c87c7c6	dev: Add an "abortPending" method to the DMA port class. This will abort any pending transactions that have been given to the port. Change-Id: Ie5f2c702530656a0c4590461369d430abead14cd Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69437 Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Gabe Black <gabe.black@gmail.com>	2023-04-08 08:02:30 +00:00
Vishnu Ramadas	8b7e55339a	dev-amdgpu: Add GART translations to GPUFS checkpoint Earlier, the GART entries were not being checkpointed. Therefore, during checkpoint restore, certain SDMA instances were initialized with incorrect addresses that led to incorrect behavior. This commit checkpoints the GART entries and restores them. Change-Id: I5464a39ed431e482ff7519b89bd5b664fd992ccf Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69299 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-04-03 22:29:10 +00:00
Vishnu Ramadas	65e0bd6eb4	dev-amdgpu: Added PM4MapQueues to GPUFS checkpoint The GPUFS checkpoint restoration mechanism expects to find a PM4MapQueues packet in the checkpoint. Since this was not being checkpointed, the restore phase retrieved a null packet which led to a segmentation fault. This commit adds PM4MapQueues to the checkpoint and restores it when deserializing the checkpoint Change-Id: Ib74a9f36fe89d740a74f94314ada41ecc363abe9 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69298 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>	2023-04-03 22:28:57 +00:00
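
Note on commit d42eeb6b68 (cache_line_size as a 64-bit unsigned int): the mask truncation described in that message can be reproduced in isolation. The following is a minimal sketch, not code from the gem5 tree. `Addr` here is a hypothetical stand-in for gem5's address type, the two helper names are invented for illustration, and the sketch assumes a platform where `unsigned int` is 32 bits wide.

```cpp
#include <cstdint>

// Hypothetical stand-in for gem5's Addr type (assumed 64-bit unsigned).
using Addr = std::uint64_t;

// Line size held in an unsigned int: ~(size - 1) is evaluated in 32 bits
// and zero-extended when combined with the 64-bit address, so the
// effective mask is 0x00000000FFFFFFC0 for a 64-byte line.
constexpr Addr blockAddrNarrow(Addr addr, unsigned int cache_line_size)
{
    return addr & ~(cache_line_size - 1);
}

// Line size held in a 64-bit type: the complement covers all 64 bits,
// so the upper half of the address is preserved.
constexpr Addr blockAddrWide(Addr addr, Addr cache_line_size)
{
    return addr & ~(cache_line_size - 1);
}

// Two distinct 64-bit addresses collapse onto the same block address
// when the narrow mask is used.
static_assert(blockAddrNarrow(0x100000040ULL, 64) ==
              blockAddrNarrow(0x200000040ULL, 64),
              "narrow mask aliases distinct blocks");

// The wide mask keeps them apart.
static_assert(blockAddrWide(0x100000040ULL, 64) == 0x100000040ULL,
              "wide mask preserves the upper address bits");
static_assert(blockAddrWide(0x200000040ULL, 64) == 0x200000040ULL,
              "wide mask preserves the upper address bits");
```

The checks are `static_assert`s so that compiling the file is enough to confirm the behaviour. The aliasing in the first check is the kind of collision the commit message describes for RISC-V load reservations.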

1450 Commits