derek/gem5 - gem5 - Gitea: Git with a cup of tea

derek/gem5

Author	SHA1	Message	Date
Matthew Poremba	1d64669473	mem,gpu-compute: Implement GPU TCC directed invalidate The GPU device currently supports large BAR which means that the driver can write directly to GPU memory over the PCI bus without using SDMA or PM4 packets. The gem5 PCI interface only provides an atomic interface for BAR reads/writes, which means the values cannot go through timing mode Ruby caches. This causes bugs as the TCC cache is allowed to keep clean data between kernels for performance reasons. If there is a BAR write directly to memory bypassing the cache, the value in the cache is stale and must be invalidated. In this commit a TCC invalidate is generated for all writes over PCI that go directly to GPU memory. This will also invalidate TCP along the way if necessary. This currently relies on the driver synchonization which only allows BAR writes in between kernels. Therefore, the cache should only be in I or V state. To handle a race condition between invalidates and launching the next kernel, the invalidates return a response and the GPU command processor will wait for all TCC invalidates to be complete before launching the next kernel. This fixes issues with stale data in nanoGPT and possibly PENNANT. Change-Id: I8e1290f842122682c271e5508a48037055bfbcdf	2024-04-10 11:35:25 -07:00
Michael Boyer	acd9d3ff94	gpu-compute: Add support for skipping GPU kernels (#940 ) gpu-compute: Add support for skipping GPU kernels This commit adds two new command-line options: --skip-until-gpu-kernel N Skips (non-blit) GPU kernels until the target kernel is reached. Execution continues normally from there. Blit kernels are not skipped because they are responsible for copying the kernel code and metadata for the non-blit kernels. Note that skipping kernels can impact correctness; this feature is only useful if the kernel of interest has no data-dependent behavior, or its data-dependent behavior is not based on data generated by the skipped kernels. --exit-after-gpu-kernel N Ends the simulation after completing (non-blit) GPU kernel N. This commit also renames two existing command-line options: --debug-at-gpu-kernel -> --debug-at-gpu-task --exit-at-gpu-kernel -> --exit-at-gpu-task These were renamed because they count GPU tasks, which include both kernels launched by the application as well as blit kernels. Change-Id: If250b3fd2db05c1222e369e9e3f779c4422074bc	2024-03-21 07:46:27 -07:00
Michael Boyer	ba2f5615ba	gpu-compute: Support cache line sizes >64B in GPUFS (#939 ) This change fixes two issues: 1) The --cacheline_size option was setting the system cache line size but not the Ruby cache line size, and the mismatch was causing assertion failures. 2) The submitDispatchPkt() function accesses the kernel object in chunks, with the chunk size equal to the cache line size. For cache line sizes >64B (e.g. 128B), the kernel object is not guaranteed to be aligned to a cache line and it was possible for a chunk to be partially contained in two separate device memories, causing the memory access to fail. Change-Id: I8e45146901943e9c2750d32162c0f35c851e09e1 Co-authored-by: Michael Boyer <Michael.Boyer@amd.com>	2024-03-20 11:09:25 -07:00
Matthew Poremba	8722aef2e2	gpu-compute: Store accum_offset from code object in WF The accumulation offset is needed for some instructions. In order to access this value we need to place it somewhere instruction definitions can access. The most logical place is in the wavefront. This commit simply copies the value from the HSA task to the wavefront object. Change-Id: I44ef62ef32d2421953f096c431dd758e882245b4	2024-02-26 12:54:37 -06:00
Matthew Poremba	cc75281802	gpu-compute: Update code object to latest LLVM The AMDKernelCode struct is very outdated. Most of the fields are no longer used and have been replaced with new fields that are used. Therefore in order to support the new fields the code object needs to be updated. The new structure is based on the table located at https://llvm.org/docs/AMDGPUUsage.html#code-object-v3-kernel-descriptor Most notably this adds the new compute_pgm_rsrc3 and kernarg preload fields which are new features in gfx90a (MI200). The accum_offset in compute_pgm_rsrc3 and kergarg preload values are necessary to run application which enable those features and therefore a way to check their values is needed. Also noteable is the removal of enable_sgpr_workgroup_id_{X,Y,Z}. These seem to be unused in all versions of ROCm that gem5 supports and therefore these fields can be removed. They are replaced with a reserved field in the new code object. Change-Id: I5542442e1e5961b05e17affad0adb5186d6d9d1a	2024-01-03 15:41:06 -06:00
Matthew Poremba	f07e0e7f5d	gpu-compute: Read dispatch packet with timing DMA This fixes occasional readBlob fatals caused by the functional read of system memory, seen often with the KVM CPU. Change-Id: Ifccee666f62faa5b2fcf0a64a9d77c8cf95b3add	2023-11-01 14:52:39 -05:00
Matthew Poremba	d05433b3f6	gpu-compute,dev-hsa: Send vendor packet completion signal gem5 does not currently implement any vendor-specific HSA packets. Starting in ROCm 5.5, vendor packets appear to end with a completion signal. Not sending this completion causes gem5 to hang. Since these packets are not documented anywhere and need to be reverse engineered we send the completion signal, if non-zero, and finish the packet as is the current behavior. Testing: HIP examples working on most recent ROCm release (5.7.1). Change-Id: Id0841407bec564c84f590c943f0609b17e01e14c	2023-11-01 14:52:39 -05:00
Matthew Poremba	6a4b2bb096	dev-hsa,gpu-compute: Add timestamps to AMD HSA signals The AMD specific HSA signal contains start/end timestamps for dispatch packet completion signals. These are current always zero. These timestamp values are used for profiling in the ROCr runtime. Unfortunately, the GpuAgent::TranslateTime method in ROCr does not check for zero values before dividing, causing applications that use profiling to crash with SIGFPE. Profiling is used via hipEvents in the HACC application, so these should be supported in gem5. In order to handle writing the timestamp values, we need to DMA the values to memory before writing the completion signal. This changes the flow of the async completion signal write to be (1) read mailbox pointer (2) if valid, write the mailbox data, other skip to 4 (3) write mailbox data if pointer is valid (4) write timestamp values (5) write completion signal. The application will process the timestamp data as soon as the completion signal is received, so we need to ordering to ensure the DMA for timestamps was completed. HACC now runs to completion on GPUFS and has the same output was hardware. Change-Id: I09877cdff901d1402140f2c3bafea7605fa6554e	2023-10-06 13:21:40 -05:00
Matthew Poremba	57b3d2897c	gpu-compute: Use timing DMAs for GPUFS HSA signals The functional HSA signal read was a hack left in the gpu-compute code. In full system, this functional read is causing problems occasionally with the translation not yet being in the page table. The error message output by gem5 was a fatal message on the readBlob method in port proxy. Changing this to a timing DMA fixes this problem. This commit adds the various timing DMA functions to send and receive response and clean up. A helper method "sendCompletionSignal" is added to the GPUCommandProcessor because the indentation level was getting too deep. This change applies only to FS mode. Code for SE mode is equivalent to what it was before this commit. Change-Id: I1bfcaa0a52731cdf9532a7fd0eb06ab2f0e09d48	2023-08-25 13:10:51 -05:00
Matthew Poremba	ebd5b3e4ae	gpu-compute: Gfx version check for FS and SE mode There is no GPU device in SE mode to get version from and no GPU driver in FS mode to get version from, so a conditional needs to be added depending on the mode to get the gfx version. Change-Id: I33fdafb60d351ebc5148e2248244537fb5bebd31 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71078 Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-06-01 00:15:02 +00:00
Matthew Poremba	6b4a1020be	configs,dev-amdgpu: GPUFS MI200/gfx90a support Add support for MI200-like device. This includes adding PCI IDs and new MMIOs for the device, a different MAP_PROCESS packet, and a different calculation for the number of VGPRs. Change-Id: I0fb7b3ad928826beaa5386d52a94ba504369cb0d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70317 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-05-25 19:14:32 +00:00
Matthew Poremba	f6dc5c6aa4	gpu-compute: Chunkify AMDKernelCode read from device The AMDKernelCode object can span potentially span two pages. Currently the copy loop from device memory only translates once at the base address. This changeset translates one cache line at a time before copying and has the ancillary benefit for cleaning up this code a bit. Change-Id: I602bc12d8f8c5d3a3e57ab3f42f7dd3df58dc144 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65251 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com>	2022-11-08 21:34:11 +00:00
Matthew Poremba	6feaa88e27	gpu-compute: Command processor read path from device In full system mode, the AMDKernelCode object can reside in either the system memory or in the dGPU device memory. Currently only reading from the host/system memory is supported. This adds the necessary code to read from the dGPU device memory. Change-Id: I887fc706b3f9834db14e40f36fd29dd3d4602925 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57710 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-07 20:11:01 +00:00
Matthew Poremba	347364ab0f	gpu-compute: Handle mailbox/wakeup signals for GPUFS The current mailbox/wakeup signal uses the SE mode proxy port to write the event value. This is not available in full system mode so instead we need to issue a DMA write to the address. The value of event_val clears the event. Change-Id: I424469076e87e690ab0bb722bac4c3e7414fb150 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57709 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-07 20:11:01 +00:00
Matthew Poremba	51648570ea	gpu-compute: Add methods to read GPU memory requestor ID These methods are called from various places to override the requestor ID of a request in order to determine which Ruby network a request should be routed on. Change-Id: Ic0270ddd7123f0457a13144e69ef9132204d4334 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57651 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-25 19:51:29 +00:00
Matthew Poremba	581e451723	gpu-compute,dev-hsa: Update CP and HSAPP for full-system Make the necessary changes to connect Vega pagetable walkers for full-system mode. Previously the CP and HSA packet processor could only read AQL packets from system/host memory using proxy port. This allows for AQL to be read from device memory which is used for non-blit kernels. Change-Id: If28eb8be68173da03e15084765e77e92eda178e9 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53077 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-25 19:51:29 +00:00
Matthew Poremba	9313294efe	misc: Remove AMD license addition Remove the line "For use for simulation and test purposes only" in files were AMD is the only copyright holder listed in the header. This happens to be the case for all files where this line exists, removing it completely from gem5. Change-Id: I623f266b002f564301b28774f49081099cfc60fd Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53943 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-12-11 04:00:56 +00:00
Gabe Black	07c613ff5e	dev,gpu-compute: Use a TranslationGen in DmaVirtDevice. Use a TranslationGen to iterate over the translations for a region, rather than using a ChunkGenerator with a fixed page size the device needs to know. Change-Id: I5da565232bd5282074ef279ca74e556daeffef70 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50763 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Matthew Poremba <matthew.poremba@amd.com>	2021-10-22 21:43:02 +00:00
Gabe Black	83b14e569b	misc: Stop using getVirtProxy. The proxies are not used on the critical path, and it's usually implicit whether they should be the FS or SE version. Ideally in the future we won't need to worry about which version we need to use, but the differences haven't quite been abstracted away, and occasionally we need to decide between the two. Change-Id: Idb363d6ddc681f7c1ad5e7aba69865f40aa30dc8 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45907 Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2021-07-23 03:42:17 +00:00
Matthew Poremba	897c0c11ed	dev,dev-hsa,gpu-compute: Refactor dmaVirt calls Remove the duplicate dmaVirt calls from HSA packet processor and GPU command processor and move them into their own class. This removes some duplicate code and allows a DmaVirtDevice to be created which will be useful for upcoming full system GPU commits. The DmaVirtDevice is an abstraction of the base DmaDevice but iterates using ChunkGenerator over virtual addresses. Classes which inherit from DmaVirtDevice must provide a translation function to translate from virtual address to physical address. Once translated, the physical address is passed to DmaDevice to do the work. Change-Id: Idd59ccb4d9ba21c0b1150ee328ededf5a88d824e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47179 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-09 22:40:18 +00:00
Daniel R. Carvalho	974a47dfb9	misc: Adopt the gem5 namespace Apply the gem5 namespace to the codebase. Some anonymous namespaces could theoretically be removed, but since this change's main goal was to keep conflicts at a minimum, it was decided not to modify much the general shape of the files. A few missing comments of the form "// namespace X" that occurred before the newly added "} // namespace gem5" have been added for consistency. std out should not be included in the gem5 namespace, so they weren't. ProtoMessage has not been included in the gem5 namespace, since I'm not familiar with how proto works. Regarding the SystemC files, although they belong to gem5, they actually perform integration between gem5 and SystemC; therefore, it deserved its own separate namespace. Files that are automatically generated have been included in the gem5 namespace. The .isa files currently are limited to a single namespace. This limitation should be later removed to make it easier to accomodate a better API. Regarding the files in util, gem5:: was prepended where suitable. Notice that this patch was tested as much as possible given that most of these were already not previously compiling. Change-Id: Ia53d404ec79c46edaa98f654e23bc3b0e179fe2d Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/46323 Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu> Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-01 19:08:24 +00:00
Kyle Roarty	f2a029058a	gpu-compute: Ignore GPU kernel names ROCm 4 seems to have updated the akc, and the only real issue that has occured is that we're no longer able to read kernel names in the same way as we were in ROCm 1.6. This patch removes the prior method of reading kernel names and gives all kernels a temporary name Change-Id: I0040e0cf4cd35d6f56ded6a8acfb10c600bcc77a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/46245 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2021-06-30 16:47:43 +00:00
Kyle Roarty	ec6b325382	gpu-compute, dev-hsa: Remove HSADriver, HSADevice HSADriver/HSADevice were primarily used with GPUCommandProcessor/ GPUComputeDriver. This change merges the classes together to simplify the inheritance hierarchy, as well as removing any casting. Change-Id: I670eb9b49a16c8aba17e13fd1d1287d0621c9f48 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42219 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2021-04-24 15:54:15 +00:00
Michael LeBeane	ad43083bb3	gpu-compute: Implement per-request MTYPEs GPU MTYPE is currently set using a global config passed to the PACoalescer. This patch enables MTYPE to be set by the shader on a per-request bases. In real hardware, the MTYPE is extracted from a GPUVM PTE during address translation. However, our current simulator only models x86 page tables which do not have the appropriate bits for GPU MTYPES. Rather than hacking non-x86 bits into our x86 page table models, this patch instead keeps an interval tree of all pages that request custom MTYPES in the driver itself. This is currently only used to map host pages to the GPU as uncacheable, but is easily extensible to other MTYPES. Change-Id: I7daab0ffae42084b9131a67c85cd0aa4bbbfc8d6 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42216 Maintainer: Matthew Poremba <matthew.poremba@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-04-24 15:54:15 +00:00
Michael LeBeane	25e8a14a6b	gpu-compute: Support dynamic scratch allocations dGPUs in all versions of ROCm and APUs starting with ROCM 2.2 can under-allocate scratch resources. This patch adds support for the CP to trigger a recoverable error so that the host can attempt to re-allocate scratch to satisfy the currently stalled kernel. Note that this patch does not include a mechanism to handle dynamic scratch allocation for queues with in-flight kernels, as these queues would first need to be drained and descheduled, which would require some additional effort in the hsaPP and HW queue scheduler. If the CP encounters this scenerio it will assert. I suspect this is not a particularly common occurence in most of our applications so it is left as a TODO. This patch also fixes a few memory leaks and updates the old DMA callback object interface to use a much cleaner c++11 lambda interface. Change-Id: Ica8a5fc88888283415507544d6cc49fa748fe84d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42201 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2021-03-25 17:21:08 +00:00
Kyle Roarty	a9e0a1ccf1	gpu-compute: Explicitly set driver to nullptr in constructor We have a fail_if in attachDriver to prevent driver from being overwritten. However, the fail_if only checks for if the driver is not nullptr. Previously, in some cases driver was set to garbage, which made the fail_if trip the first time we were assigning the driver. This patch explicitly sets driver to nullptr in the constructor, thus ensuring that it will be nullptr the first time we call attachDriver Change-Id: I325f6033e785025a912e3af3888c66cee0332f40 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/41973 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-03-01 18:10:11 +00:00
Gabe Black	fc4caa6ad0	misc: Re-remove Authors lines from source files. These were universally removed a while ago, but a bunch have crept back in. Remove them. Change-Id: I3cb5b9f40c9c19aafb5e39a51d1baeae60a591c0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40335 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Gabe Black <gabe.black@gmail.com>	2021-02-03 12:55:17 +00:00
Sooraj Puthoor	965ad12b9a	dev-hsa: enable interruptible hsa signal support Event creation and management support from emulated drivers is required to support interruptible signals in HSA and this support was not available. This changeset adds the event creation and management support in the emulated driver. With this patch, each interruptible signal created by the HSA runtime is associated with a signal event. The HSA runtime can then put a thread waiting on a signal condition to sleep asking the driver to monitor the event associated with that signal. If the signal is modified by the GPU, the dispatcher notifies the driver about signal value change. If the modifier is a CPU thread, the thread will have to make HSA API calls to modify the signal and these API calls will notify the driver about signal value change. Once the driver is notified about a change in the signal value, the driver checks to see if any thread is sleeping on that signal and wake up the sleeping thread associated with that event. The driver has also implemented the time_out wakeup that can wake up the thread after a certain time period has expired. This is also true for barrier packets. Each signal has an event address in a kernel managed and allocated event page that can be used as a mailbox pointer to notify an event. However, this feature used by non-CPU agents to communicate with the driver is not implemented by this changeset because the non-CPU HSA agents in our model can directly communicate with driver in our implementation. Having said that, adding that feature should be trivial because the event address and event pages are correctly setup by this changeset and just adding the event page's virtual address to our PIO doorbell interface in the page tables and registering that pio address to the driver should be sufficient. Managing mailbox pointer for an event is based on event ID and using this event ID as an index into event page, this changeset already provides a unique mailbox pointer for each event. Change-Id: Ic62794076ddd47526b1f952fdb4c1bad632bdd2e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38335 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-01-31 03:25:05 +00:00
Daniel Gerzhoy	9a01d3e927	dev-hsa,gpu-compute: Agent Packet handler implemented. HSA packet processor will now accept and process agent packets. Type field in packet is command type. For now: AgentCmd::Nop = 0 AgentCmd::Steal = 1 Steal command steals the completion signal for a running kernel. This enables a benchmark to use hsa primitives to send an agent packet to steal the signal, then wait on that signal. Minimal working example to be added in gem5-resources. Change-Id: I37f8a4b7ea1780b471559aecbf4af1050353b0b1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37015 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-11-16 16:12:48 +00:00
Gabe Black	d05a0a4ea1	misc: Delete the now unnecessary create methods. Most create() methods are no longer necessary. This change deletes them, and occasionally moves some code from them into the constructors they call. Change-Id: Icbab29ba280144b892f9b12fac9e29a0839477e5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36536 Reviewed-by: Gabe Black <gabe.black@gmail.com> Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-10-30 04:00:20 +00:00
Gabe Black	91d83cc8a1	misc: Standardize the way create() constructs SimObjects. The create() method on Params structs usually instantiate SimObjects using a constructor which takes the Params struct as a parameter somehow. There has been a lot of needless variation in how that was done, making it annoying to pass Params down to base classes. Some of the different forms were: const Params & Params & Params * const Params * Params const* This change goes through and fixes up every constructor and every create() method to use the const Params & form. We use a reference because the Params struct should never be null. We use const because neither the create method nor the consuming object should modify the record of the parameters as they came in from the config. That would make consuming them not idempotent, and make it impossible to tell what the actual simulation configuration was since it would change from any user visible form (config script, config.ini, dot pdf output). Change-Id: I77453cba52fdcfd5f4eec92dfb0bddb5a9945f31 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35938 Reviewed-by: Gabe Black <gabeblack@google.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-10-14 12:06:44 +00:00
Michael LeBeane	1d816250f8	gpu_compute: Support loading BLIT kernels The BLIT kernels used to implement DMA through the shaders don't fill out all of the standard fields in an amd_kernel_code_t object. This patch modifies the code object parsing logic to support these new kernels. BLIT kernels are used in APUs when using ROCm memcopies for certain size buffers, and are used for dGPUs when the SDMA engines are disabled. Change-Id: Id4e667474d05e311097dbec443def07dfad14a79 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29959 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2020-07-17 16:13:59 +00:00
Tony Gutierrez	b8da9abba7	gpu-compute, mem-ruby, configs: Add GCN3 ISA support to GPU model Change-Id: Ibe46970f3ba25d62ca2ade5cbc2054ad746b2254 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29912 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-15 22:45:17 +00:00

33 Commits