derek/gem5

Author	SHA1	Message	Date
Matthew Poremba	f07e0e7f5d	gpu-compute: Read dispatch packet with timing DMA This fixes occasional readBlob fatals caused by the functional read of system memory, seen often with the KVM CPU. Change-Id: Ifccee666f62faa5b2fcf0a64a9d77c8cf95b3add	2023-11-01 14:52:39 -05:00
Matthew Poremba	6a4b2bb096	dev-hsa,gpu-compute: Add timestamps to AMD HSA signals The AMD specific HSA signal contains start/end timestamps for dispatch packet completion signals. These are currently always zero. These timestamp values are used for profiling in the ROCr runtime. Unfortunately, the GpuAgent::TranslateTime method in ROCr does not check for zero values before dividing, causing applications that use profiling to crash with SIGFPE. Profiling is used via hipEvents in the HACC application, so these should be supported in gem5. In order to handle writing the timestamp values, we need to DMA the values to memory before writing the completion signal. This changes the flow of the async completion signal write to be (1) read mailbox pointer (2) if valid, write the mailbox data, otherwise skip to 4 (3) write mailbox data if pointer is valid (4) write timestamp values (5) write completion signal. The application will process the timestamp data as soon as the completion signal is received, so we need ordering to ensure the DMA for timestamps has completed. HACC now runs to completion on GPUFS and has the same output as hardware. Change-Id: I09877cdff901d1402140f2c3bafea7605fa6554e	2023-10-06 13:21:40 -05:00
Matthew Poremba	2b97f17fe1	gpu-compute: Fix dynamic scratch size test ROCm supports dynamically allocating scratch space, which resides in framebuffer memory, to reduce the amount of memory allocated for kernels that have not yet launched. The size of the scratch space allocated is located in task->amdQueue.compute_tmpring_size_wavesize. This size is in kilobytes. The AQL task contains the number of bytes requested per work item, however we currently check if there is enough tmpring space by comparing a single work item. This should instead check the size per wavefront. This causes problems in applications where multiple kernels use dynamic scratch allocation and a later kernel requires more space than the earlier kernel. The only application being tested that does this is LULESH. This was resulting in the scratch space being too small, resulting in workgroups clobbering each other's private memory leading to some nasty bugs. It is fixed by this patch as task->amdQueue will be re-read from the host and will contain the updated tmpring size. After this there is enough scratch space and LULESH makes forward progress. Change-Id: Ie9e0f92bb98fd3c3d6c2da3db9ee65352f9ae070	2023-10-04 09:38:31 -05:00
Matthew Poremba	57b3d2897c	gpu-compute: Use timing DMAs for GPUFS HSA signals The functional HSA signal read was a hack left in the gpu-compute code. In full system, this functional read is causing problems occasionally with the translation not yet being in the page table. The error message output by gem5 was a fatal message on the readBlob method in port proxy. Changing this to a timing DMA fixes this problem. This commit adds the various timing DMA functions to send and receive response and clean up. A helper method "sendCompletionSignal" is added to the GPUCommandProcessor because the indentation level was getting too deep. This change applies only to FS mode. Code for SE mode is equivalent to what it was before this commit. Change-Id: I1bfcaa0a52731cdf9532a7fd0eb06ab2f0e09d48	2023-08-25 13:10:51 -05:00
Matthew Poremba	ee75e19b8b	gpu-compute: Fix dynamic scratch allocation on GPUFS When the GPU needs more scratch it requests it from the runtime. In the method to wait for response, a dmaReadVirt is called with the same method as the callback with zero delay. This means that effectively there is an infinite loop in the event queue if the scratch setup is not successful on the first attempt. In the case of GPUFS, it is never successful instantly so a delay must be added. Without added delay, the host CPU is never scheduled to make progress setting up more scratch space. The value 1e9 is chosen to match the KVM quantum and hopefully give KVM a chance to schedule an event. For reference, the driver timeout is 200ms so this is still fairly aggressive checking of the signal response. This value is also balanced around the GPUCommandProc DPRINTF to prevent the print in this method from overwhelming debug output. Change-Id: I0e0e1d75cd66f7c47815b13a4bfc3c0188e16220 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61651 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2022-07-28 14:10:33 +00:00
Matthew Poremba	8fe975e57e	gpu-compute: Fatal on dynamic scratch allocation in GPUFS This is known not to work in GPUFS. As a result, the simulation will never end. Rather than simulate forever, add a fatal for now to exit simulation until support for this functionality is added. Change-Id: I8e45996a7eb781575e8643baea05daf87bc5f1c3 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58472 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-08 17:12:32 +00:00
Matthew Poremba	51648570ea	gpu-compute: Add methods to read GPU memory requestor ID These methods are called from various places to override the requestor ID of a request in order to determine which Ruby network a request should be routed on. Change-Id: Ic0270ddd7123f0457a13144e69ef9132204d4334 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57651 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-25 19:51:29 +00:00
Matthew Poremba	581e451723	gpu-compute,dev-hsa: Update CP and HSAPP for full-system Make the necessary changes to connect Vega pagetable walkers for full-system mode. Previously the CP and HSA packet processor could only read AQL packets from system/host memory using proxy port. This allows for AQL to be read from device memory which is used for non-blit kernels. Change-Id: If28eb8be68173da03e15084765e77e92eda178e9 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53077 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-25 19:51:29 +00:00
Matthew Poremba	9313294efe	misc: Remove AMD license addition Remove the line "For use for simulation and test purposes only" in files where AMD is the only copyright holder listed in the header. This happens to be the case for all files where this line exists, removing it completely from gem5. Change-Id: I623f266b002f564301b28774f49081099cfc60fd Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53943 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-12-11 04:00:56 +00:00
Gabe Black	07c613ff5e	dev,gpu-compute: Use a TranslationGen in DmaVirtDevice. Use a TranslationGen to iterate over the translations for a region, rather than using a ChunkGenerator with a fixed page size the device needs to know. Change-Id: I5da565232bd5282074ef279ca74e556daeffef70 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50763 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Matthew Poremba <matthew.poremba@amd.com>	2021-10-22 21:43:02 +00:00
Bobby R. Bruce	b2677990f6	gpu-compute: Add missing overrides These missing overrides were causing compilation errors with the Clang 11 compiler: https://www.mail-archive.com/gem5-dev@gem5.org/msg39683.html Change-Id: Ib5e7096ab9a7a8505bcc848ff3f08674f7f289ce Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47899 Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-13 00:16:51 +00:00
Matthew Poremba	897c0c11ed	dev,dev-hsa,gpu-compute: Refactor dmaVirt calls Remove the duplicate dmaVirt calls from HSA packet processor and GPU command processor and move them into their own class. This removes some duplicate code and allows a DmaVirtDevice to be created which will be useful for upcoming full system GPU commits. The DmaVirtDevice is an abstraction of the base DmaDevice but iterates using ChunkGenerator over virtual addresses. Classes which inherit from DmaVirtDevice must provide a translation function to translate from virtual address to physical address. Once translated, the physical address is passed to DmaDevice to do the work. Change-Id: Idd59ccb4d9ba21c0b1150ee328ededf5a88d824e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47179 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-09 22:40:18 +00:00
Daniel R. Carvalho	974a47dfb9	misc: Adopt the gem5 namespace Apply the gem5 namespace to the codebase. Some anonymous namespaces could theoretically be removed, but since this change's main goal was to keep conflicts at a minimum, it was decided not to modify the general shape of the files much. A few missing comments of the form "// namespace X" that occurred before the newly added "} // namespace gem5" have been added for consistency. std out should not be included in the gem5 namespace, so they weren't. ProtoMessage has not been included in the gem5 namespace, since I'm not familiar with how proto works. Regarding the SystemC files, although they belong to gem5, they actually perform integration between gem5 and SystemC; therefore, it deserved its own separate namespace. Files that are automatically generated have been included in the gem5 namespace. The .isa files currently are limited to a single namespace. This limitation should be later removed to make it easier to accommodate a better API. Regarding the files in util, gem5:: was prepended where suitable. Notice that this patch was tested as much as possible given that most of these were already not previously compiling. Change-Id: Ia53d404ec79c46edaa98f654e23bc3b0e179fe2d Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/46323 Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu> Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-01 19:08:24 +00:00
Kyle Roarty	ec6b325382	gpu-compute, dev-hsa: Remove HSADriver, HSADevice HSADriver/HSADevice were primarily used with GPUCommandProcessor/ GPUComputeDriver. This change merges the classes together to simplify the inheritance hierarchy, as well as removing any casting. Change-Id: I670eb9b49a16c8aba17e13fd1d1287d0621c9f48 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42219 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2021-04-24 15:54:15 +00:00
Michael LeBeane	ad43083bb3	gpu-compute: Implement per-request MTYPEs GPU MTYPE is currently set using a global config passed to the PACoalescer. This patch enables MTYPE to be set by the shader on a per-request basis. In real hardware, the MTYPE is extracted from a GPUVM PTE during address translation. However, our current simulator only models x86 page tables which do not have the appropriate bits for GPU MTYPES. Rather than hacking non-x86 bits into our x86 page table models, this patch instead keeps an interval tree of all pages that request custom MTYPES in the driver itself. This is currently only used to map host pages to the GPU as uncacheable, but is easily extensible to other MTYPES. Change-Id: I7daab0ffae42084b9131a67c85cd0aa4bbbfc8d6 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42216 Maintainer: Matthew Poremba <matthew.poremba@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-04-24 15:54:15 +00:00
Kyle Roarty	c734ab7602	dev-hsa,gpu-compute: Fix override for updateHsaSignal Change `965ad12` removed a parameter from the updateHsaSignal function. Change `25e8a14` added the parameter back, but only for the derived class, breaking the override. This patch adds that parameter back to the base class, fixing the override. Change-Id: Id1e96e29ca4be7f3ce244bac83a112e3250812d1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/44046 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Alex Dutu <alexandru.dutu@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2021-04-03 02:39:27 +00:00
Michael LeBeane	25e8a14a6b	gpu-compute: Support dynamic scratch allocations dGPUs in all versions of ROCm and APUs starting with ROCm 2.2 can under-allocate scratch resources. This patch adds support for the CP to trigger a recoverable error so that the host can attempt to re-allocate scratch to satisfy the currently stalled kernel. Note that this patch does not include a mechanism to handle dynamic scratch allocation for queues with in-flight kernels, as these queues would first need to be drained and descheduled, which would require some additional effort in the hsaPP and HW queue scheduler. If the CP encounters this scenario it will assert. I suspect this is not a particularly common occurrence in most of our applications so it is left as a TODO. This patch also fixes a few memory leaks and updates the old DMA callback object interface to use a much cleaner C++11 lambda interface. Change-Id: Ica8a5fc88888283415507544d6cc49fa748fe84d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42201 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2021-03-25 17:21:08 +00:00
Daniel R. Carvalho	7f1de4e686	misc: Fix coding style for enum's opening braces The systemc dir was not included in this fix. First it was identified that there were only occurrences at 0, 1, and 2 levels of indentation (and 2 of 2 spaces, 1 of 3 spaces and 2 of 12 spaces), using: grep -nrE --exclude-dir=systemc \ "^ *enum [A-Za-z].* {$" src/ Then the following commands were run to replace: <indent level>enum X ... { by: <indent level>enum X ... <indent level>{ Level 0: grep -nrl --exclude-dir=systemc \ "^enum [A-Za-z].* {$" src/ | \ xargs sed -Ei \ 's/^enum ([A-Za-z].*) \{$/enum \1\n\{/g' Level 1: grep -nrl --exclude-dir=systemc \ "^ enum [A-Za-z].* {$" src/ | \ xargs sed -Ei \ 's/^ enum ([A-Za-z].*) \{$/ enum \1\n \{/g' and so on. Change-Id: Ib186cf379049098ceaec20dfe4d1edcedd5f940d Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/43326 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-by: Gabe Black <gabe.black@gmail.com> Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-03-23 16:26:04 +00:00
Bobby R. Bruce	ae33daa8d7	gpu-compute,misc: Fix Clang missing override errors Clang fails to compile GCN3 due to missing overrides in `src/gpu-compute/gpu_command_processor.hh`. This commit fixes this error. Change-Id: I6da9fce7c3eb86a5418a931ee4f225cceda488a5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40396 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-02-03 19:08:02 +00:00
Gabe Black	fc4caa6ad0	misc: Re-remove Authors lines from source files. These were universally removed a while ago, but a bunch have crept back in. Remove them. Change-Id: I3cb5b9f40c9c19aafb5e39a51d1baeae60a591c0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40335 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Gabe Black <gabe.black@gmail.com>	2021-02-03 12:55:17 +00:00
Sooraj Puthoor	965ad12b9a	dev-hsa: enable interruptible hsa signal support Event creation and management support from emulated drivers is required to support interruptible signals in HSA and this support was not available. This changeset adds the event creation and management support in the emulated driver. With this patch, each interruptible signal created by the HSA runtime is associated with a signal event. The HSA runtime can then put a thread waiting on a signal condition to sleep, asking the driver to monitor the event associated with that signal. If the signal is modified by the GPU, the dispatcher notifies the driver about the signal value change. If the modifier is a CPU thread, the thread will have to make HSA API calls to modify the signal and these API calls will notify the driver about the signal value change. Once the driver is notified about a change in the signal value, the driver checks to see if any thread is sleeping on that signal and wakes up the sleeping thread associated with that event. The driver has also implemented the time_out wakeup that can wake up the thread after a certain time period has expired. This is also true for barrier packets. Each signal has an event address in a kernel managed and allocated event page that can be used as a mailbox pointer to notify an event. However, this feature used by non-CPU agents to communicate with the driver is not implemented by this changeset because the non-CPU HSA agents in our model can directly communicate with the driver in our implementation. Having said that, adding that feature should be trivial because the event address and event pages are correctly set up by this changeset and just adding the event page's virtual address to our PIO doorbell interface in the page tables and registering that pio address to the driver should be sufficient. 
Managing mailbox pointer for an event is based on event ID and using this event ID as an index into event page, this changeset already provides a unique mailbox pointer for each event. Change-Id: Ic62794076ddd47526b1f952fdb4c1bad632bdd2e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38335 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-01-31 03:25:05 +00:00
Daniel Gerzhoy	9a01d3e927	dev-hsa,gpu-compute: Agent Packet handler implemented. HSA packet processor will now accept and process agent packets. Type field in packet is command type. For now: AgentCmd::Nop = 0 AgentCmd::Steal = 1 Steal command steals the completion signal for a running kernel. This enables a benchmark to use hsa primitives to send an agent packet to steal the signal, then wait on that signal. Minimal working example to be added in gem5-resources. Change-Id: I37f8a4b7ea1780b471559aecbf4af1050353b0b1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37015 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-11-16 16:12:48 +00:00
Gabe Black	91d83cc8a1	misc: Standardize the way create() constructs SimObjects. The create() method on Params structs usually instantiates SimObjects using a constructor which takes the Params struct as a parameter somehow. There has been a lot of needless variation in how that was done, making it annoying to pass Params down to base classes. Some of the different forms were: const Params & Params & Params * const Params * Params const* This change goes through and fixes up every constructor and every create() method to use the const Params & form. We use a reference because the Params struct should never be null. We use const because neither the create method nor the consuming object should modify the record of the parameters as they came in from the config. That would make consuming them not idempotent, and make it impossible to tell what the actual simulation configuration was since it would change from any user visible form (config script, config.ini, dot pdf output). Change-Id: I77453cba52fdcfd5f4eec92dfb0bddb5a9945f31 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35938 Reviewed-by: Gabe Black <gabeblack@google.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-10-14 12:06:44 +00:00
Tony Gutierrez	b8da9abba7	gpu-compute, mem-ruby, configs: Add GCN3 ISA support to GPU model Change-Id: Ibe46970f3ba25d62ca2ade5cbc2054ad746b2254 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29912 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-15 22:45:17 +00:00
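
Illustration for commit 2b97f17fe1 (dynamic scratch size test). The message says the tmpring size is in kilobytes, the AQL task carries bytes per work item, and the check should be made per wavefront rather than per work item. The following is a minimal sketch of that comparison, not the gem5 code. The function names, the wavefront size of 64, and the reading of the tmpring value as the space available to one wavefront are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Assumed wavefront width; hypothetical constant, not taken from gem5.
constexpr std::uint64_t kWavefrontSize = 64;

// Per-work-item comparison, the form the commit message describes as wrong:
// it passes as long as a single work item fits.
bool scratchFitsPerWorkItem(std::uint64_t tmpringSizeKiB,
                            std::uint64_t privBytesPerItem)
{
    return tmpringSizeKiB * 1024 >= privBytesPerItem;
}

// Per-wavefront comparison: every work item in the wavefront needs its
// own private scratch, so the request is scaled by the wavefront size.
bool scratchFitsPerWavefront(std::uint64_t tmpringSizeKiB,
                             std::uint64_t privBytesPerItem)
{
    // Guard the multiplication below against overflow.
    assert(privBytesPerItem <= UINT64_MAX / kWavefrontSize);
    const std::uint64_t availableBytes = tmpringSizeKiB * 1024;
    const std::uint64_t neededBytes = privBytesPerItem * kWavefrontSize;
    return availableBytes >= neededBytes;
}
```

With 1 KiB of tmpring space and 32 bytes per work item, the per-work-item form accepts the allocation while the per-wavefront form rejects it (64 x 32 = 2048 bytes needed), which is the kind of under-allocation the commit message attributes to LULESH.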

24 Commits
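
Illustration for commit ee75e19b8b (dynamic scratch allocation on GPUFS). The message explains that a callback which reschedules itself with zero delay keeps the event queue busy at the same tick, so the host never gets a chance to respond. The toy model below sketches that effect under stated assumptions: ToyEventQueue, pollForScratch, and the modelling of the host reply as an event at a later tick are all hypothetical and are not gem5's EventQueue or GPUCommandProcessor.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Tick = std::uint64_t;

struct Event
{
    Tick when;
    std::uint64_t seq;            // tie-breaker: earlier scheduled runs first
    std::function<void()> fn;
};

struct RunsLater
{
    bool operator()(const Event &a, const Event &b) const
    {
        return a.when != b.when ? a.when > b.when : a.seq > b.seq;
    }
};

// Minimal discrete-event queue: time only advances when the next event
// is scheduled at a later tick.
class ToyEventQueue
{
  public:
    void schedule(Tick delay, std::function<void()> fn)
    {
        events.push(Event{now + delay, nextSeq++, std::move(fn)});
    }

    // Process at most `budget` events so a livelock cannot hang the demo.
    void run(std::uint64_t budget)
    {
        while (!events.empty() && budget-- > 0) {
            Event e = events.top();
            events.pop();
            assert(e.when >= now);  // simulated time never moves backward
            now = e.when;
            e.fn();
        }
    }

  private:
    Tick now = 0;
    std::uint64_t nextSeq = 0;
    std::priority_queue<Event, std::vector<Event>, RunsLater> events;
};

struct PollResult
{
    bool ready;
    std::uint64_t polls;
};

// Poll for a host reply that arrives at tick `hostReplyAt`, re-polling
// every `pollDelay` ticks.
PollResult pollForScratch(Tick pollDelay, Tick hostReplyAt,
                          std::uint64_t budget)
{
    ToyEventQueue eq;
    bool ready = false;
    std::uint64_t polls = 0;

    eq.schedule(hostReplyAt, [&] { ready = true; });

    std::function<void()> poll = [&] {
        ++polls;
        if (!ready)
            eq.schedule(pollDelay, poll);  // callback reschedules itself
    };
    eq.schedule(0, poll);

    eq.run(budget);
    return PollResult{ready, polls};
}
```

In this model, pollForScratch(0, 1000, 10000) spends its whole budget polling at tick 0 and never sees the reply, while pollForScratch(100, 1000, 10000) lets simulated time advance and observes the reply on its eleventh poll.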