derek/gem5 - gem5 - Gitea: Git with a cup of tea

derek/gem5

Author	SHA1	Message	Date
Matthew Poremba	c644eae2dd	configs,gpu-compute: Kernel dispatch-based exit events Add two kernel dispatch-based exit events that are useful for limiting the simulation and enabling debug flags at specific GPU kernels. Since the KVM CPU typically used with GPUFS is not deterministic, this help with enabling debug flags when the Tick number may vary. The exit at GPU kernel option can also limit simulation by only simulating a few hundred kernels, for example, and exit at a determined point. Change-Id: I81bae92a80c25fc38c41e999aa662e1417b7a20d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71418 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-06-08 22:03:47 +00:00
Matthew Poremba	ebd5b3e4ae	gpu-compute: Gfx version check for FS and SE mode There is no GPU device in SE mode to get version from and no GPU driver in FS mode to get version from, so a conditional needs to be added depending on the mode to get the gfx version. Change-Id: I33fdafb60d351ebc5148e2248244537fb5bebd31 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71078 Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-06-01 00:15:02 +00:00
Matthew Poremba	6b4a1020be	configs,dev-amdgpu: GPUFS MI200/gfx90a support Add support for MI200-like device. This includes adding PCI IDs and new MMIOs for the device, a different MAP_PROCESS packet, and a different calculation for the number of VGPRs. Change-Id: I0fb7b3ad928826beaa5386d52a94ba504369cb0d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70317 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-05-25 19:14:32 +00:00
Giacomo Travaglini	7b39a7f14e	misc: Rename DEBUG macro into GEM5_DEBUG The DEBUG macro is not part of any compiler standards (differently from NDEBUG, which elides assertions). It is only meant to differentiate gem5.debug from .fast and .opt builds. gem5 developers have used it to insert helper code that is supposed to aid the debugging process in case anything goes wrong. This generic name is likely to clash with other libraries linked with gem5. This is the case of DRAMSim as an example. Rather than using undef tricks, we just inject a GEM5_DEBUG macro for gem5.debug builds. Change-Id: Ie913ca30da615bd0075277a260bbdbc397b7ec87 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69079 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>	2023-03-21 06:53:55 +00:00
Gabriel Busnot	7f4c92c910	mem,arch-arm,mem-ruby,cpu: Remove use of deprecated base port owner Change-Id: I29214278c3dd4829c89a6f7c93214b8123912e74 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67452 Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Bobby Bruce <bbruce@ucdavis.edu> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>	2023-02-03 06:11:45 +00:00
Vishnu Ramadas	d6bbccb60a	gpu-compute : Fix incorrect TLB stats when FunctionalTLB is used When FunctionalTLB is used in SE mode, the stats tlbLatency and tlbCycles report negative values. This patch fixes it by disabling the updates that result in negative values when FunctionalTLB is set to true Change-Id: I6962785fc1730b166b6d5b879e9c7618a8d6d4b3 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67202 Reviewed-by: Matt Sinclair <mattdsinclair.wisc@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matthew Poremba <matthew.poremba@amd.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-01-10 02:27:29 +00:00
Matthew Poremba	af2cecf59e	gpu-compute: Fix ABI init for DispatchId DispatchId should allocate two SGPRs instead of one. Allocating one was causing all subsequent SGPR index values to be off by one, leading to bad addresses for things like flat scratch and private segment. This field is not used very often so it was not impacting most applications. Change-Id: I17744e2d099fbc0447f400211ba7f8a42675ea06 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66711 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-12-16 18:16:18 +00:00
Hoa Nguyen	eac06ad681	python: Fix multiline quotes in a single line An example case, ```python mem_side_port = RequestPort( "This port sends requests and " "receives responses" ) ``` This is the residue of running the python formatter. This is done by finding all tokens matching the regex `"\s"(?![.;"])` and manually replacing them by empty strings. Change-Id: Icf223bbe889e5fa5749a81ef77aa6e721f38b549 Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66111 Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu> Maintainer: Bobby Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com>	2022-11-29 23:44:38 +00:00
vramadas95	dff879cf21	configs, gpu-compute: Add configurable L1 scalar latencies Previously the scalar cache path used the same latency parameter as the vector cache path for memory requests. This commit adds new parameters for the scalar cache path latencies. This commit also modifies the model to use the new latency parameter to set the memory request latency in the scalar cache. The new paramters are '--scalar-mem-req-latency' and '--scalar-mem-resp-latency' and are set to default values of 50 and 0 respectively Change-Id: I7483f780f2fc0cfbc320ed1fd0c2ee3e2dfc7af2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65511 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2022-11-12 02:23:02 +00:00
Matthew Poremba	8d63c9fc06	gpu-compute: Add granulated SGPR computation for gfx9 The granulated SGPR size is used when the number of SGPRs is unknown. The computation for this has changed since gfx8 and is commented as a TODO in a comment. This changeset implements the change and also checks for an invalid SGPR count. According to LLVM code this could happen "due to a compiler bug or when using inline asm.": https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/ AMDGPUAsmPrinter.cpp#L723 Change-Id: Ie487a53940b323a0002341075e0f81af4147a7d8 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65252 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-11-08 21:34:11 +00:00
Matthew Poremba	f6dc5c6aa4	gpu-compute: Chunkify AMDKernelCode read from device The AMDKernelCode object can span potentially span two pages. Currently the copy loop from device memory only translates once at the base address. This changeset translates one cache line at a time before copying and has the ancillary benefit for cleaning up this code a bit. Change-Id: I602bc12d8f8c5d3a3e57ab3f42f7dd3df58dc144 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65251 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com>	2022-11-08 21:34:11 +00:00
Matthew Poremba	ec696c00b2	gpu-compute: Add missing initial reg state in WF There are two initial scalar register fields that are not initialized in the wavefront when a task is dispatch. This changeset adds the missing DispatchId and PrivateSegSize fields. These fields are typically used when an application is compiled with debug support and are typically not used in the applications in gem5's test suite. Change-Id: I5b5fa75e4badfd9ba7588e4cd485ebf75fd5d627 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64191 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-10-06 23:14:40 +00:00
Matthew Poremba	9f5c0f2822	gpu-compute: dprint instruction requesting translation When debugging strange addresses, it is extremely useful to know what instruction calculated that address. This make it much easier to follow assembly code backwards to find the source of an incorrect address. This change adds a DPRINTF for GPUTLB that by default prints the disassembly when a virtual address translation is sent to the TLB. Change-Id: I5066c064a48c5c48696863eeccd8d011245ef7b2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63176 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-09-09 04:13:49 +00:00
Gabe Black	f4209bbdee	misc: Remove lingering uses of TheISA::. Change-Id: Ie55e0d79867fbc8f75a993fb456a58c84de5def4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/62196 Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>	2022-08-20 07:30:16 +00:00
Alexandru Dutu	c6b38909e1	gpu-compute: Adding support for LDS atomics This changeset is adding support for LDS atomics and implementing DS_OR_B32 instruction. Change-Id: I84c5cf6ce0e9494726dc7299f360551cd2a485f5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61791 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-08-19 16:44:31 +00:00
Bobby R. Bruce	787204c92d	python: Apply Black formatter to Python files The command executed was `black src configs tests util`. Change-Id: I8dfaa6ab04658fea37618127d6ac19270028d771 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47024 Maintainer: Bobby Bruce <bbruce@ucdavis.edu> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-08-03 09:10:41 +00:00
Matthew Poremba	68115460d8	gpu-compute: Set LDS and Scratch apertures in FS The LDS and scratch aperture base and limits are hardcoded to some values that are useful for SE mode. In reality, these are chosen by the driver so we need to honor whatever values the driver passes so that when addresses are calculated they fall into the correct aperture to route flat instructions to those apertures. This overwrites the default hardcoded values for LDS and scratch base and limit using the values providing by the driver in a MAP_PROCESS packet. Change-Id: I0e194a26631f697819d8aaecf1bf346a7b7c7026 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61656 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2022-07-28 14:10:33 +00:00
Matthew Poremba	f65f5a8981	gpu-compute,arch-vega: Overhaul HWRegs, setreg, getreg These instructions are supposed to be read/writing special shader hardware registers. Currently they are getting/setting to an SGPR. This results in getting incorrect registers at best and clobbering an SGPR being used by an application at worst. Furthermore, some registers need to be set in the shader and the application will never (can never) set them. This patch overhauls the getreg/setreg instructions to use different storage in the shader. The values will be updated either via setreg from an application (e.g., mode register) or set by a PM4 MAP_PROCESS. Change-Id: Ie5e5d552bd04dc47f5b35b5ee40a569ae345abac Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61655 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2022-07-28 14:10:33 +00:00
Matthew Poremba	ee75e19b8b	gpu-compute: Fix dynamic scratch allocation on GPUFS When GPU needs more scratch it requests from the runtime. In the method to wait for response, a dmaReadVirt is called with the same method as the callback with zero delay. This means that effectively there is an infinite loop in the event queue if the scratch setup is not successful on the first attempt. In the case of GPUFS, it is never successfully instantly so a delay must be added. Without added delay, the host CPU is never scheduled to make progress setting up more scratch space. The value 1e9 is choosen to match the KVM quantum and hopefully give KVM a chance to schedule an event. For reference, the driver timeout is 200ms so this is still fairly aggressive checking of the signal response. This value is also balanced around the GPUCommandProc DPRINTF to prevent the print in this method from overwhelming debug output. Change-Id: I0e0e1d75cd66f7c47815b13a4bfc3c0188e16220 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61651 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2022-07-28 14:10:33 +00:00
Gabe Black	397e66d8b6	gpu-compute: Stop passing in a default value to resize(). A default constructed element of the container is the default value to the second resize() parameter. Having that parameter explicitly causes a warning/error in newer versions of gcc, and is unnecessary regardless. Change-Id: I6aee2d23f0f4382b00caf552c8e38940614c5f9a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/60311 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Gabe Black <gabe.black@gmail.com> Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>	2022-06-03 10:48:31 +00:00
Matthew Poremba	8fe975e57e	gpu-compute: Fatal on dynamic scratch allocation in GPUFS This is known not working in GPUFS. As a result, the simulation will never end. Rather than simulate forever, add a fatal for now to exit simulation until support for this functionality is added. Change-Id: I8e45996a7eb781575e8643baea05daf87bc5f1c3 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58472 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-08 17:12:32 +00:00
Matthew Poremba	e36a8dbd8a	gpu-compute: Handle GPUFS system store responses Requests in GPUFS which go to system memory will not generate the WriteCompleteResp packets that the VIPER protocol would normally created for device requests which go through the caches. Therefore, we need to callback the GM pipe handleResponse to complete the access and make forward progress. Change-Id: Ic00c430ce420a591fe5743f758b780d93afd2a38 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57989 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-07 20:11:01 +00:00
Matthew Poremba	6feaa88e27	gpu-compute: Command processor read path from device In full system mode, the AMDKernelCode object can reside in either the system memory or in the dGPU device memory. Currently only reading from the host/system memory is supported. This adds the necessary code to read from the dGPU device memory. Change-Id: I887fc706b3f9834db14e40f36fd29dd3d4602925 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57710 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-07 20:11:01 +00:00
Matthew Poremba	fcbc9afcd6	gpu-compute: Don't use emulated driver in full system The emulated driver is currently called in a few locations unconditionally. This changeset adds checks that we are not in full system before calling any emulated driver function. In full system the amdgpu driver running on the disk image handles these functions. Change-Id: Iea3546b574e29c649351c0fce9154530be89e9b1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57712 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-07 20:11:01 +00:00
Matthew Poremba	f375e79bcf	gpu-compute: Support Scalar and Vector access to system pages The amdgpu driver supports reading and writing scalar and vector memory addresses that reside in system memory. This is commonly used for things like blit kernels that perform host-to-device or device-to-host copies using GPU load/store instructions. This is done by utilizing the system hub device added in a prior changeset. Memory packets translated by the Scalar or VMEM TLBs will have the correspoding system request field set from the PTE in the TLB which can be used in the compute unit to determine if a request is for system memory or not. Another important change is to return global memory tokens for system requests. Since these do not flow through the GPU coalescer where the token is returned, the token can be returned once the request is known to be a system request. Change-Id: I35030e0b3698f10c63a397f96b81267271e3130e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57711 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-07 20:11:01 +00:00
Matthew Poremba	347364ab0f	gpu-compute: Handle mailbox/wakeup signals for GPUFS The current mailbox/wakeup signal uses the SE mode proxy port to write the event value. This is not available in full system mode so instead we need to issue a DMA write to the address. The value of event_val clears the event. Change-Id: I424469076e87e690ab0bb722bac4c3e7414fb150 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57709 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-04-07 20:11:01 +00:00
Matthew Poremba	91e8bbe299	configs,gpu-compute: Support fetch from system pages The amdgpu driver supports fetching instructions from pages which reside in system memory rather than device memory. This changeset adds support to do this by adding the system hub object added in a prior changeset to the fetch unit and issues requests to the system hub if the system bit in the memory page's PTE is set. Otherwise, the requestor ID is set to be device memory and the request is routed through the Ruby network / GPU caches to fetch the instructions. Change-Id: Ib2fb47c589fdd5e544ab6493d7dbd8f2d9d7b0e8 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57652 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-28 23:24:53 +00:00
Gabe Black	e6c0ba97db	scons: Put all config variables in an env['CONF'] sub-dict. This makes what are configuration and what are internal SCons variables explicit and separate, and makes it unnecessary to call out what variables to export to C++. These variables will also be plumbed into and out of kconfiglib in later changes. Change-Id: Iaf5e098d7404af06285c421dbdf8ef4171b3f001 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56892 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-28 20:31:21 +00:00
Matthew Poremba	51648570ea	gpu-compute: Add methods to read GPU memory requestor ID These methods are called from various places to override the requestor ID of a request in order to determine which Ruby network a request should be routed on. Change-Id: Ic0270ddd7123f0457a13144e69ef9132204d4334 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57651 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-25 19:51:29 +00:00
Matthew Poremba	581e451723	gpu-compute,dev-hsa: Update CP and HSAPP for full-system Make the necessary changes to connect Vega pagetable walkers for full-system mode. Previously the CP and HSA packet processor could only read AQL packets from system/host memory using proxy port. This allows for AQL to be read from device memory which is used for non-blit kernels. Change-Id: If28eb8be68173da03e15084765e77e92eda178e9 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53077 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-25 19:51:29 +00:00
Matthew Poremba	539a2e2bcd	arch-vega: Add VEGA page tables and TLB Add the page table walker, page table format, TLB, TLB coalescer, and associated support in the AMDGPUDevice. This page table format used the hardware format for dGPU and is very different from APU/GCN3 which use the X86 page table format. In order to support either format for the GPU model, a common TranslationState called GpuTranslation state is created which holds the combined fields of both the APU and Vega translation state. Similarly the TlbEntry is cast at runtime by the corresponding arch files as they are the only files which touch the internals of the TlbEntry. The GPU model only checks if a TlbEntry is non-null and thus does not need to cast to peek inside the data structure. Change-Id: I4484c66239b48df5224d61caa6e968e56eea38a5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51848 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-17 00:11:14 +00:00
Gabe Black	06117275fa	scons: Make all sticky variables automatically exported. All sticky vars are exported, but not all exported vars are sticky. The vars which are exported but not sticky are (at least in general) found with Configure() style measurement. Change-Id: Idebf17e44c2eeca745cdfdd9f42eddcfdb0cf9ed Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56891 Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>	2022-03-15 00:45:30 +00:00
Matthew Poremba	45ad755511	gpu-compute: Fix default MTYPE initialization The default MTYPE initialization in the emulated GPU driver is currently doing a bitwise AND on an input integer param with other integers instead of using a bitmask. Change this to use bitset and test the bit positions corresponding to the values in the MTYPE enum that were previously being used as an operand for bitwise AND. This was causing invalid slicc transitions in some benchmarks for combinations of request type and mtype that are undefined. Change-Id: I93fee0eae1fff7141cd14c239c16d1d69925d08d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56367 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-03-13 21:50:21 +00:00
Kyle Roarty	78451f6685	gpu-compute: Fix register checking and allocation in dyn manager This patch updates the canAllocate function to account both for the number of regions of registers that need to be allocated, and for the fact that the registers aren't one continuous chunk. The patch also consolidates the registers as much as possible when a register chunk is freed. This prevents fragmentation from making it impossible to allocate enough registers Change-Id: Ic95cfe614d247add475f7139d3703991042f8149 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56909 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>	2022-02-18 18:46:33 +00:00
Kyle Roarty	4d2cbefd1e	gpu-compute: Set scratch_base, lds_base for gfx902 When updating how scratch_base and lds_base were set, gfx902 was left out. This adds in gfx902 to the case statement, allowing the apertures to be set and for simulations using gfx902 to not error out Change-Id: I0e1adbdf63f7c129186fb835e30adac9cd4b72d0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/54663 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-02-17 21:20:16 +00:00
Matthew Poremba	9313294efe	misc: Remove AMD license addition Remove the line "For use for simulation and test purposes only" in files were AMD is the only copyright holder listed in the header. This happens to be the case for all files where this line exists, removing it completely from gem5. Change-Id: I623f266b002f564301b28774f49081099cfc60fd Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53943 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-12-11 04:00:56 +00:00
Matthew Poremba	c028af111a	arch-gcn3,gpu-compute: Move TLB to common folder in amdgpu This TLB is more of an "APU" TLB than anything GCN3 specific. It can be used with either GCN3 or Vega. With this change, VEGA_X86 builds and one can run binaries with Vega ISA code using the same steps as GCN3 but building the Vega ISA instead. Change-Id: I0c92bcd0379a18628dc05cb5af070bdc7e692c7c Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53803 Maintainer: Bobby Bruce <bbruce@ucdavis.edu> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-12-09 17:26:15 +00:00
Gabe Black	1c233ee9d2	scons: Add sim_object and enums arguments to SimObject(). This will explicitly declare what SimObject and Enum types need to be set up in C++, which will make importing all the SimObject modules during the setup phase of SCons uneccessary. Change-Id: Id2d7603daf33b236ceaa0789e2f089f589d34e62 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49406 Reviewed-by: Gabe Black <gabe.black@gmail.com> Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-12-08 08:01:23 +00:00
Matthew Poremba	9cabfd7a9b	dev-hsa,gpu-compute: Properly assign DmaVirtDevices in py These SimObjects are DmaVirtDevices in C++ but DmaDevices in the sim object's python file. Make the sim object python files consistent. Change-Id: I728ae737c5901e448628fc5ac877f261ca4c4393 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53704 Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu> Maintainer: Bobby Bruce <bbruce@ucdavis.edu> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-12-07 20:26:17 +00:00
Gabe Black	6107dd11c6	misc: Remove include of arch/page_size.hh, and fix up includes. Remove the only remaining use of arch/page_size.hh, and fix up a couple files which were using one of the constants defined in a specific arch version of it without including the file they needed directly. Change-Id: I6da5638ca10c788bd42197f4f5180e6b66f7b87f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50765 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>	2021-10-22 21:43:02 +00:00
Gabe Black	07c613ff5e	dev,gpu-compute: Use a TranslationGen in DmaVirtDevice. Use a TranslationGen to iterate over the translations for a region, rather than using a ChunkGenerator with a fixed page size the device needs to know. Change-Id: I5da565232bd5282074ef279ca74e556daeffef70 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50763 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Matthew Poremba <matthew.poremba@amd.com>	2021-10-22 21:43:02 +00:00
Matt Sinclair	96a86780ee	dev-hsa,gpu-compute: fix bug with gfx8 VAs for HSA Queues GFX7 (not supported in gem5) and GFX8 have a bug with how virtual addresses are calculated for their HSA queues. The ROCr component of ROCm solves this problem by doubling the HSA queue size that is requested, then mapping all virtual addresses in the second half of the queue to the same virtual addresses as the first half of the queue. This commit fixes gem5's support to mimic this behavior. Note that this change does not affect Vega's HSA queue support, because according to the ROCm documentation, Vega does not have the same problem as GCN3. Change-Id: I133cf1acc3a00a0baded0c4c3c2a25f39effdb51 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51371 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>	2021-10-12 17:01:52 +00:00
Matthew Poremba	3112a7f0d0	arch-gcn3,gpu-compute: Move GCN3 specific TLB to arch Move GpuTLB and TLBCoalescer to GCN3 as the TLB format is specific to GCN3 and SE mode / APU simulation. Vega will have its own TLB, coalescer, and walker suitable for a dGPU. This also adds a using alias for the TLB translation state to reduce the number of references to TheISA and X86ISA. X86 specific includes are also removed. Change-Id: I34448bb4e5ddb9980b34a55bc717bbcea0e03db5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49847 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-10-04 23:47:03 +00:00
Matthew Poremba	c15e472199	arch-vega: Rework flat instructions to support global Global instructions are new in Vega and are essentially FLAT instructions from GCN3 but guaranteed to go to global memory where as flat can go to global or local memory. This reworks the flat instruction classes so that the initiateAcc / execute / completeAcc logic can be reused for flat, global, and later scratch subtypes of flat instructions. The decoder creates a flat instruction class which sets instruction flags based on the flat instruction's SEG field. There are new initOperandInfo and generateDissasmbly methods for flat and global. The number of operands and operand index getters are modified to check the flags and return the correct value for the subtype. Change-Id: I1db4a3742aeec62424189e54c38c59d6b1a8d3c1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47106 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Kyle Roarty <kyleroarty1716@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-10-04 22:51:37 +00:00
Matthew Poremba	16de253c15	arch-vega: Add missing functions referenced by insts Some instructions were referencing pc() and isExecMaskRegister() which were not defined. Change-Id: Ic5b3fa9057950ff85603fcb87447a81b6c7f274b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47103 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2021-09-27 22:30:30 +00:00
Gabe Black	bec16fbc31	misc: Move MemPool based calls to the SEWorkload. These currently proxy to the System object, but this is one step towards moving the MemPool-s out of the System and into the SEWorkload where they really should have been from the start. Change-Id: Id27e7b874c283abf07bd892c8467a9cc52e2fdff Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50342 Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-09-21 02:05:32 +00:00
Gabe Black	00187b7bc3	x86,mem: Replace the x86 StoreCheck flag with READ_MODIFY_WRITE. X86 had a private/arch specific request flag called StoreCheck which it used to signal to the TLB that it should fault on a load if it would have faulted had it been a store. That way, you can detect whether a read-modify-write type of operation is going to fail due to a translation problem during the read, and don't have to worry about not doing anything architecturally visible until the store had succeeded, while also making sure not to do the store part if the modify part could fail. It seems that Ruby had hijacked that flag and had an architecture specific check which was looking for a load which was going to be followed by a store. The x86 flag was never intended to communicate that beyond the TLB, and this nominally architecture agnostic component shouldn't be reaching into the ISA specific flags to try to get that information. Instead, this change introduces a new Request flag called READ_MODIFY_WRITE which is used for the same purpose in x86, but in general means that a load will be followed by a write in the near future. With this new globally applicable flag, the ruby Sequencer class no longer needs to check what the arch is, nor does it need to access ISA private data in the request flags. Always doing this check should be no less efficient than before, because checking the arch involved calling into the system object, while checking the flag only requires masking a bit on the flags which the compiler probably already has floating around for other logic in this function. Change-Id: Ied5b744d31e7aa8bf25e399b6b321f9d2020a92f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48710 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Maintainer: Gabe Black <gabe.black@gmail.com>	2021-09-05 05:29:27 +00:00
Gabe Black	92288331c2	gpu-compute: Delete code related to X86PagetableWalker in X86GPUTLB.py. This code will never be executed since FULL_SYSTEM is not part of the build environment (and hasn't been for many years), and on top of that, this declaration redundantly (and incompletely) tries to set up the X86PagetableWalker that the ISA already sets up. Change-Id: I40cffbd7f60c1f741b1a14d9009f80185c9ce28c Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49405 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com>	2021-08-20 08:19:57 +00:00
Jason Lowe-Power	eb24bca44e	Merge "misc: Merge branch 'release-staging-v21-1' into develop" into develop	2021-07-30 04:44:09 +00:00
Matt Sinclair	97760cb5a3	gpu-compute: fix typo in compute driver comments Change-Id: I550c6c81ffb2ee9143a2676f93385a8b90c4ddd5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48023 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-07-29 20:00:20 +00:00

1 2 3 4 5

250 Commits