derek/gem5

Author	SHA1	Message	Date
Kyle Roarty	ec6b325382	gpu-compute, dev-hsa: Remove HSADriver, HSADevice HSADriver/HSADevice were primarily used with GPUCommandProcessor/ GPUComputeDriver. This change merges the classes together to simplify the inheritance hierarchy, as well as removing any casting. Change-Id: I670eb9b49a16c8aba17e13fd1d1287d0621c9f48 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42219 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2021-04-24 15:54:15 +00:00
Kyle Roarty	eb09361eef	configs, gpu-compute: Add option to specify gfx version Currently uses gfx801, gfx803, gfx900 for Carrizo, Fiji, and Vega respectively Change-Id: I62758914b6a60f16dd4f2141a23c0a9141a4e1a0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42217 Maintainer: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-04-24 15:54:15 +00:00
Michael LeBeane	ad43083bb3	gpu-compute: Implement per-request MTYPEs GPU MTYPE is currently set using a global config passed to the PACoalescer. This patch enables MTYPE to be set by the shader on a per-request basis. In real hardware, the MTYPE is extracted from a GPUVM PTE during address translation. However, our current simulator only models x86 page tables which do not have the appropriate bits for GPU MTYPES. Rather than hacking non-x86 bits into our x86 page table models, this patch instead keeps an interval tree of all pages that request custom MTYPES in the driver itself. This is currently only used to map host pages to the GPU as uncacheable, but is easily extensible to other MTYPES. Change-Id: I7daab0ffae42084b9131a67c85cd0aa4bbbfc8d6 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42216 Maintainer: Matthew Poremba <matthew.poremba@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-04-24 15:54:15 +00:00
Michael LeBeane	a5f55e0be1	gpu-compute: Topology and driver changes for dGPU New topology ripped from Fiji to support dGPU. A dGPU flag is added to the config which is propagated to the driver. The emulated driver is now able to properly deal with dGPU ioctls and mmaps. For now, dGPU physical memory is allocated from the host, but this is easy to change once we get a GPU memory controller up and running. Change-Id: I594418482b12ec8fb2e4018d8d0371d56f4f51c8 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42214 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2021-04-15 16:41:11 +00:00
gauravjain14	c29523665e	gpu-compute: Support for dynamic register alloc SimplePoolManager doesn't allow mapping of two WGs simultaneously on the same Compute Unit (provided the previous WG has been mapped to all the SIMDs) even if there is sufficient VRF and SRF space available. DynPoolManager takes care of that by dynamically allocating and deallocating register file space to wavefronts Change-Id: I2255c68d4b421615d7b231edc05d3ebb27cbd66c Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32034 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com>	2021-01-14 17:04:27 +00:00
Shivani Parekh	392c1ced53	misc: Replaced master/slave terminology Change-Id: I4df2557c71e38cc4e3a485b0e590e85eb45de8b6 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33553 Maintainer: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com>	2020-09-10 23:02:28 +00:00
Emily Brickey	6333e914d3	gpu-compute: update port terminology Change-Id: I3121c4afb1e137aebe09c1d694e9484844d02b9b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32313 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Poremba <chesp3@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-08-26 16:48:13 +00:00
Kyle Roarty	b872f02ab1	configs,gpu-compute,mem-ruby: connect gmTokenPorts in apu_se This patch adds gmTokenPorts to the ComputeUnit and RubyGPUCoalescer python classes so the gmTokenPorts can be connected in apu_se. Change-Id: Icf3cb05c757754d6935b46f14e4b1b1d5072c4ca Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32677 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-08-18 23:47:16 +00:00
Tony Gutierrez	af621cd6e6	gpu-compute, arch-gcn3: refactor barriers Barriers were not modeled properly. Firstly, barriers were allocated to each WG that was launched, which is not correct, and the CU would provide an infinite number of barrier slots. There are a limited number of barrier slots per CU in reality. In addition, the CU will not allocate barrier slots to WGs with a single WF (nothing to sync if only one WF). Beyond modeling problems, there is also the issue of deadlock. The barrier could deadlock because not all WFs are freed from the barrier once it has been satisfied. Instead, we relied on the scoreboard stage to release them lazily, one-by-one. Under this implementation the scoreboard may not fully release all WFs participating in a barrier; this happens because the first WF to be freed from the barrier could reach an s_barrier instruction again, forever causing the barrier counts across WFs to be out-of-sync. This change refactors the barrier logic to: 1) Create a proper barrier slot implementation 2) Enforce (via a parameter) the number of barrier slots on the CU. 3) Simplify the logic and clean up the code (i.e., we no longer iterate through the entire WF list each time we check if a barrier is satisfied). 4) Fix deadlock issues. Change-Id: If53955b54931886baaae322640a7b9da7a1595e0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29943 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-07-16 20:37:22 +00:00
Xianwei Zhang	2c1e9c4e81	gpu-compute: enable flexible control of kernel boundary syncs Kernel end release was turned on for the VIPER protocol, which is in fact write-through based and thus does not need a release operation. This changeset splits the option 'impl_kern_boundary_sync' into 'impl_kern_launch_acq' and 'impl_kern_end_rel', and turns off release on VIPER. Change-Id: I5490019b6765a25bd801cc78fb7445b90eb02a3d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29917 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-19 20:40:05 +00:00
Tony Gutierrez	b8da9abba7	gpu-compute, mem-ruby, configs: Add GCN3 ISA support to GPU model Change-Id: Ibe46970f3ba25d62ca2ade5cbc2054ad746b2254 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29912 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-15 22:45:17 +00:00
Matthew Poremba	3d57eaf9f5	gpu-compute,mem-ruby: Refactor GPU coalescer Remove the read/write tables and coalescing table and introduce two levels of tables for uncoalesced and coalesced packets. Tokens are granted to GPU instructions to place in the uncoalesced table. If tokens are available, the operation always succeeds such that the 'Aliased' status is never returned. Coalesced accesses are placed in the coalesced table while requests are outstanding. Requests to the same address are added as targets to the table similar to how MSHRs operate. Change-Id: I44983610307b638a97472db3576d0a30df2de600 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27429 Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Bradford Beckmann <brad.beckmann@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-05-11 21:25:19 +00:00
Gabe Black	cdcc55a6a8	mem: Minimize the use of MemObject. MemObject doesn't provide anything beyond its base ClockedObject any more, so this change removes it from most inheritance hierarchies. Occasionally MemObject is replaced with SimObject when I was fairly confident that the extra functionality of ClockedObject wasn't needed. Change-Id: Ic014ab61e56402e62548e8c831eb16e26523fdce Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18289 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Gabe Black <gabeblack@google.com>	2019-04-28 01:19:40 +00:00
Andreas Sandberg	ef71a987c1	python: Don't assume SimObjects live in the global namespace The importer in Python 3 doesn't like the way we import SimObjects from the global namespace. Convert the existing SimObject declarations to import from m5.objects. As a side-effect, this makes these files consistent with configuration files. Change-Id: I11153502b430822130722839e1fa767b82a027aa Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/15981 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>	2019-02-12 09:43:00 +00:00
Tony Gutierrez	de72e36619	gpu-compute: support in-order data delivery in GM pipe this patch adds an ordered response buffer to the GM pipeline to ensure in-order data delivery. the buffer is implemented as an STL ordered map, which sorts the requests in program order by using their sequence ID. when requests return to the GM pipeline they are marked as done. only the oldest request may be serviced from the ordered buffer, and only if it is marked as done. the FIFO response buffers are kept and used in OoO delivery mode	2016-10-26 22:48:28 -04:00
Tony Gutierrez	7ac38849ab	gpu-compute: remove inst enums and use bit flag for attributes this patch removes the GPUStaticInst enums that were defined in GPU.py. instead, a simple set of attribute flags that can be set in the base instruction class are used. this will help unify the attributes of HSAIL and machine ISA instructions within the model itself. because the static instruction now carries the attributes, a GPUDynInst must carry a pointer to a valid GPUStaticInst so a new static kernel launch instruction is added, which carries the attributes needed to perform the kernel launch.	2016-10-26 22:47:11 -04:00
jkalamat	3724fb15fa	gpu-compute: parametrize Wavefront size Eliminate the VSZ constant that defined the Wavefront size (in numbers of work items); replaced it with a parameter in the GPU.py configuration script. Changed all data structures dependent on the Wavefront size to be dynamically sized. Legal values of Wavefront size are 16, 32, 64 for now and checked at initialization time.	2016-06-09 11:24:55 -04:00
Tony Gutierrez	1a7d3f9fcb	gpu-compute: AMD's baseline GPU model	2016-01-19 14:28:22 -05:00

18 Commits
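
Illustrative note on commit de72e36619 (in-order data delivery in the GM pipe). The commit message describes an ordered response buffer keyed by sequence ID, where returning requests are marked done and only the oldest entry may be serviced, and only once it is done. The following is a minimal Python sketch of that idea, not the gem5 implementation (which is C++ and uses an STL ordered map). The class name `OrderedResponseBuffer` and its methods `issue`, `mark_done`, and `pop_ready` are hypothetical names chosen for illustration.

```python
class OrderedResponseBuffer:
    """Toy model of an in-order response buffer (hypothetical, not gem5 code)."""

    def __init__(self):
        # Maps sequence ID -> [payload, done flag]. A plain dict is used here;
        # the oldest entry is found by taking the minimum key.
        self._entries = {}

    def issue(self, seq_id, payload):
        """Record an outstanding request in program order by its sequence ID."""
        self._entries[seq_id] = [payload, False]

    def mark_done(self, seq_id):
        """Mark a request as returned from memory; it is not serviced yet."""
        self._entries[seq_id][1] = True

    def pop_ready(self):
        """Return the oldest request's payload if it is done, else None."""
        if not self._entries:
            return None
        oldest = min(self._entries)
        payload, done = self._entries[oldest]
        if not done:
            # A younger request may already be done, but it must wait.
            return None
        del self._entries[oldest]
        return payload


buf = OrderedResponseBuffer()
buf.issue(1, "load A")
buf.issue(2, "load B")
buf.mark_done(2)            # the younger request returns first
first_try = buf.pop_ready()  # oldest (1) is not done, so nothing is serviced
buf.mark_done(1)
serviced = [buf.pop_ready(), buf.pop_ready()]
```

In this sketch the younger request is held back until the older one completes, which is the in-order behaviour the commit message describes. The out-of-order mode mentioned there keeps the existing FIFO response buffers and is not modeled here.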