derek/gem5

Author	SHA1	Message	Date
Kyle Roarty	0bb385941b	gpu-compute: Add exp_cnt tracking for buffer store instructions exp_cnt (expInstsIssued in the code) is used in the waitcnt instruction to track that data has been read out of VGPRs in previous global memory instructions, making it safe to overwrite the VGPRs used in said global memory instructions. Previously, exp_cnt wasn't being tracked at all, which led to the waitcnt finishing immediately, leading to the memory instruction's VGPRs getting overwritten by subsequent instructions, causing errors. This patch makes it so waitcnts waiting on exp_cnt will wait for MUBUF buffer store instructions to read their VGPRs before completing. Change-Id: Idd2b59511bc086cf316217da27b7a228272b0b0f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37555 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-11-30 20:59:31 +00:00
Daniel Gerzhoy	9a01d3e927	dev-hsa,gpu-compute: Agent Packet handler implemented. HSA packet processor will now accept and process agent packets. Type field in packet is command type. For now: AgentCmd::Nop = 0 AgentCmd::Steal = 1 Steal command steals the completion signal for a running kernel. This enables a benchmark to use hsa primitives to send an agent packet to steal the signal, then wait on that signal. Minimal working example to be added in gem5-resources. Change-Id: I37f8a4b7ea1780b471559aecbf4af1050353b0b1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/37015 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-11-16 16:12:48 +00:00
Tuan Ta	173c1c6eb0	gpu-compute,mem-ruby: Replace ACQUIRE and RELEASE request flags This patch replaces ACQUIRE and RELEASE flags which are HSA-specific. ACQUIRE flag becomes INV_L1 in VIPER protocol. RELEASE flag is removed. Future protocols may support extra cache coherence flags like INV_L2 and WB_L2. Change-Id: I3d60c9d3625c898f4110a12d81742b6822728533 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32859 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-11-04 21:09:26 +00:00
Gabe Black	d05a0a4ea1	misc: Delete the now unnecessary create methods. Most create() methods are no longer necessary. This change deletes them, and occasionally moves some code from them into the constructors they call. Change-Id: Icbab29ba280144b892f9b12fac9e29a0839477e5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/36536 Reviewed-by: Gabe Black <gabe.black@gmail.com> Maintainer: Gabe Black <gabe.black@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-10-30 04:00:20 +00:00
Gabe Black	3a49ed0156	gpu: Use X86ISA instead of TheISA in src/gpu-compute. These files are nominally not tied to the X86ISA, but in reality they are because they reach into the GPU TLB, which is defined unchangeably in the X86ISA namespaces, and uses data structures within it. Rather than try to pretend that these structures are generic, we'll instead just use X86ISA instead of TheISA. If this really does become generic in the future, a base class with the ISA agnostic essentials defined in it can be used instead, and the ISA specific TLBs can define their own derived class which has whatever else they need. Really the compute unit shouldn't be communicating with the TLB using sender state since those are supposed to be little notes for the sender to keep with a transaction, not for communicating between entities across a port. Change-Id: Ie6573396f6c77a9a02194f5f4595eefa45d6d66b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34174 Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com>	2020-10-26 20:32:43 +00:00
Gabe Black	463cb28ca5	misc: Use compiler.hh macros when available. Some places were hand coding __attribute__s when macros in compiler.hh were available to do that job. Using the macros helps abstract away compiler specific details and should be used when possible. Change-Id: I94befebcfde2d673e874e9959588f69781bd9021 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35975 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-10-19 05:52:40 +00:00
Kyle Roarty	b20cc7e6d8	gpu-compute,mem-ruby: Properly create/handle WriteCompletePkts There is a flow of packets as so: WriteResp -> WriteReq -> WriteCompleteResp These packets share some variables, in particular senderState and a status vector. One issue was the WriteResp packet decremented the status vector, which was used by the WriteCompleteResp packets to determine when to handle the global memory response. This could lead to multiple WriteCompleteResp packets attempting to handle the global memory response. Because of that, the WriteCompleteResp packets needed to handle the status vector. This patch moves WriteCompleteResp packet handling back into ComputeUnit::DataPort::processMemRespEvent from ComputeUnit::DataPort::recvTimingResp. This helps remove some redundant code. This patch has the WriteResp packet return without doing any status vector handling, and without deleting the senderState, which had previously caused a segfault. Another issue was WriteCompleteResp packets weren't being issued for each active lane, as the coalesced request was being issued too early. In order to fix that, we have to ensure every active lane puts their request into their applicable coalesced request before issuing the coalesced request. Because of that change, we change the issuing of CoalescedRequests from GPUCoalescer::coalescePacket to GPUCoalescer::completeIssue. That change involves adding a new variable to store the CoalescedRequests that are created in the calls to coalescePacket. This variable is a map from instruction sequence number to coalesced requests. Additionally, the WriteCompleteResp packet was attempting to access physical memory in hitCallback while not having any data, which caused a crash. This can be resolved either by not allowing WriteCompleteResp packets to access memory, or by copying the data from the WriteReq packet. This patch denies WriteCompleteResp packets memory access in hitCallback. Finally, in VIPERCoalescer::writeCompleteCallback there was a map that held the WriteComplete packets, but no packets were ever being removed. This patch removes packets that match the address that was passed in to the function. Change-Id: I9a064a0def2bf6c513f5295596c56b1b652b0ca4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33656 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-10-15 17:52:51 +00:00
Gabe Black	91d83cc8a1	misc: Standardize the way create() constructs SimObjects. The create() method on Params structs usually instantiates SimObjects using a constructor which takes the Params struct as a parameter somehow. There has been a lot of needless variation in how that was done, making it annoying to pass Params down to base classes. Some of the different forms were: const Params &, Params &, Params *, const Params *, Params const*. This change goes through and fixes up every constructor and every create() method to use the const Params & form. We use a reference because the Params struct should never be null. We use const because neither the create method nor the consuming object should modify the record of the parameters as they came in from the config. That would make consuming them not idempotent, and make it impossible to tell what the actual simulation configuration was since it would change from any user visible form (config script, config.ini, dot pdf output). Change-Id: I77453cba52fdcfd5f4eec92dfb0bddb5a9945f31 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35938 Reviewed-by: Gabe Black <gabeblack@google.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-10-14 12:06:44 +00:00
Matthew Poremba	53807c8276	configs,gpu-compute: Fixes to connect gmTokenPort When the TokenPort was moved from the GCN3 staging branch to develop the TokenPort was changed from being the port connecting the ComputeUnit to Ruby's vector memory port to a sideband port which inhibits requests to Ruby's vector memory port. As such, it needs to be explicitly connected as a new port. This changes the getPort method in ComputeUnit to be aware of the port as well as modifying the example config to connect to TCPs. The iteration to connect in the config file was modified since it was not properly connecting to TCPs each time and Ruby.py does not explicitly return a list of each MachineType. Change-Id: Ia70a6756b2af54d95e94d19bec5d8aadd3c2d5c0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35096 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-09-30 20:19:21 +00:00
Gabe Black	b877efa6d4	misc: Update attribute syntax, and reorganize compiler.hh. This change replaces the __attribute__ syntax with the now standard [[]] syntax. It also reorganizes compiler.hh so that all special macros have some explanatory text saying what they do, and each attribute which has a standard version can use that if available and what version of c++ it's standard in is put in a comment. Also, the requirements as far as where you put [[]] style attributes are a little more strict than the old school __attribute__ style. The use of the attribute macros was updated to fit these new, more strict requirements. Change-Id: Iace44306a534111f1c38b9856dc9e88cd9b49d2a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35219 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-09-28 21:52:59 +00:00
Gabe Black	50a0b85367	arm,base,gpu: Use std::make_unique instead of m5::make_unique. Now that we're using c++14, we can just assume that std::make_unique exists. We no longer have to conditionally inject our own version. Change-Id: I5d851afb02dd05c7af93864ffec3b3184f3d4ec8 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35215 Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-09-28 05:41:08 +00:00
Kyle Roarty	347d7644eb	gpu-compute: replace uint32_t* casts with bits API calls The uint32_t* casts made it hard to understand at a glance what was being done. They are replaced with calls to various bits functions, which are functionally equivalent and much clearer. This also fixes a segfault in GPUInitAbi DPRINTFs from a mis-typed uint32_t* cast. Change-Id: Id5d1863942848dd7a9e5e17e8180c33adbc72f15 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34677 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-09-24 14:53:16 +00:00
Gabe Black	24e87cb1c5	gpu: Stop using TheISA in the GPU TLB. This class is defined inside the X86ISA namespace, so there's no point in pretending it's generic. Remove TheISA and let the code access what it needs from X86ISA naturally since it's there already. Change-Id: I21b5d2d2b9af6aa0c10ddbb5b3ddca1692188dcc Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34173 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2020-09-18 13:48:45 +00:00
Kyle Roarty	be3bcd1629	gpu-compute: Fix deadlock in fetch_unit after branch instruction The following deadlock was occurring in fetch_unit w/timingSim: 1. exec() is called, a wave is ready to fetch, so it sets pendingFetch 2. A packet is sent to ITLB to fetch for that wave 3. The wave executes a branch, causing the fetch buffer to be cleared 4. The packet is handled, and fetch() is called. However, because the fetch buffer was cleared, it returns doing nothing. 5. exec() gets called again, but the wave will never be scheduled to fetch, as pendingFetch is still set to true. This patch clears pendingFetch (and dropFetch) before returning in fetch() when the fetch buffer has been cleared. dropFetch needed to be cleared otherwise gem5 would crash. Change-Id: Iccbac7defc4849c19e8b17aa2492da641defb772 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34555 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-09-17 21:24:19 +00:00
Gabe Black	49a41da964	gpu: Fix a syntax error in X86GPUTLB.py. The recent changes which removed master/slave terminology also accidentally deleted an "=", making the syntax in that file illegal. Change-Id: I50aa945f0f66765db36775380b98a88caff23c13 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34576 Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com>	2020-09-16 06:08:14 +00:00
Shivani Parekh	392c1ced53	misc: Replaced master/slave terminology Change-Id: I4df2557c71e38cc4e3a485b0e590e85eb45de8b6 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33553 Maintainer: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com>	2020-09-10 23:02:28 +00:00
Kyle Roarty	b00b986353	misc: Use VPtr in hsa_driver.cc This change updates HSADriver::allocateQueue to take in a ThreadContext pointer as opposed to a PortProxy ref. This allows the TypedBufferArg to be replaced with VPtr. This also fixes building GCN3_X86 Change-Id: I1fea26b10c7344daf54a0cb05337e961f834a5fd Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33655 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Gabe Black <gabeblack@google.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-08-31 17:44:11 +00:00
Gabe Black	1d755b4ba1	misc: Clean up usage of arch/isa_traits.hh. isa_traits.hh used to have much more in it, but now it only has PageShift, PageBytes, and (for now) the guest endianness. These values should only be retrieved from the System class generally speaking, so only the system class should include arch/isa_traits.hh. Some gpu compute related files need PageBytes or PageShift. Even though those files don't advertise their ISA dependence, they are tied to x86. In those files, they can include arch/x86/isa_traits.hh. The only other file which legitimately needs arch/isa_traits.hh is the decoder cache since it uses PageBytes to size an array. Change-Id: I12686368715623e3140a68a7027c136bd52567b1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33203 Reviewed-by: Gabe Black <gabeblack@google.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-08-28 07:20:58 +00:00
Tony Gutierrez	94000aefe6	gpu-compute: Create CU's ports in the standard way The CU would initialize its ports in getMasterPort(), which is not desirable as getMasterPort() may be called several times for the same port. This can lead to a fatal if the CU expects to only create a single port of a given type, and may lead to other issues where stat names are duplicated. This change instantiates and initializes the CU's ports in the CU constructor using the CU params. The index field is also removed from the CU's ports because the base class already has an ID field, which will be set to the default value in the base class's constructor for scalar ports. It doesn't make sense for scalar ports to take an index because they are scalar, so we let the base class initialize the ID to the invalid port ID. Change-Id: Id18386f5f53800a6447d968380676d8fd9bac9df Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32836 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-08-27 16:31:46 +00:00
Emily Brickey	6333e914d3	gpu-compute: update port terminology Change-Id: I3121c4afb1e137aebe09c1d694e9484844d02b9b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32313 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Poremba <chesp3@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-08-26 16:48:13 +00:00
Kyle Roarty	b872f02ab1	configs,gpu-compute,mem-ruby: connect gmTokenPorts in apu_se This patch adds gmTokenPorts to the ComputeUnit and RubyGPUCoalescer python classes so the gmTokenPorts can be connected in apu_se. Change-Id: Icf3cb05c757754d6935b46f14e4b1b1d5072c4ca Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32677 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-08-18 23:47:16 +00:00
Gabe Black	40e8cac306	misc: Make registerExitCallback use CallbackQueue2. Issue-on: https://gem5.atlassian.net/browse/GEM5-698 Change-Id: I526d4a19ca4e54a6469a4ee26693c1c0400fcc70 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32644 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Gabe Black <gabeblack@google.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-08-18 11:49:06 +00:00
Matthew Poremba	9b95f32b12	arch-gcn3,gpu-compute: Fix GCN3 related compiler errors Fix all errors that were revealed using the util/compiler-test.sh script. Change-Id: Ie0d35568624e5e1405143593f0677bbd0b066b61 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31154 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-07-20 14:53:13 +00:00
Tony Gutierrez	4d737462c2	gpu-compute, arch-gcn3: Change how waitcnts are implemented Use single counters per memory operation type and increment them upon issue, not execute. Change-Id: I6afc0b66b21882538ef90a14a57a3ab3cc7bd6f3 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29973 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-07-17 16:36:23 +00:00
Tony Gutierrez	63c76448eb	gpu-compute: Add pipeline stage interface classes This change separates the pipeline stage interfaces for the GPU's compute unit into their own classes with a well-defined interface. This helps to create a cleaner interface for users to extend the CU pipeline's capabilities and also helps consolidate all the pipeline communication code in one place in the source. Change-Id: I569d52bce84dc1b9fbf8f0f96d53a81a2b6773c6 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29972 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-07-17 16:36:09 +00:00
Alexandru Dutu	7d50d5d972	gpu-compute: No RF scheduling in case of SKIP or EMPTY In case of flat memory instructions the status for the LM pipe execution unit is set to SKIP or EMPTY, as the bus between the VRF and the GM and LM pipe is shared. The destination operands should not be scheduled for the LM pipe, even if the wave is in the dispatch list. This can lead to deadlock in the destination cache as DCEs are reused and the slotsAvailableForBank count gets artificially incremented. Change-Id: I2230c53e3bc1032d2cccbe00fab62c99ab8de6cd Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29970 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2020-07-17 16:34:59 +00:00
Tony Gutierrez	5f0378b8d0	gpu-compute: Use refs to CU in pipe stages/mem pipes The pipe stages and memory pipes are changed to store a reference to their parent CU as opposed to a pointer. These objects will never change which CU they belong to, and they are constructed by their parent CU. Change-Id: Ie5476e1e2e124a024c2efebceb28cb3a9baa78c1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29969 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-07-17 16:34:36 +00:00
Michael LeBeane	83fe4754e7	gpu-compute: Fix Y-dimension ABI decode We currently have a bug in decoding workitem ID from the kernel descriptor with multiple dimensions. The enable_vgpr_workitem_id bits are currently separated into x and y components, when they should be treated as a single 2 bit value, where y is enabled when it is > 0, and z is enabled when it is > 1. The current setup allows a kernel launch with vgprs reserved for the z dimension and not the y dimension, which is incorrect. Change-Id: Iee64b207feb95bcf064898d5db33b8f201e25323 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29965 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2020-07-17 16:32:56 +00:00
Tony Gutierrez	f64ff89212	gpu-compute: Don't track vector store insts in CU's headTailMap This change fixes a memory leak due to live GPUDynInstPtr references to vector store insts being stored in the CU's headTailMap and never released. This happened because store insts are not supposed to have their head-tail latencies tracked by the headTailMap; instead they use timing information from the GPUCoalescer. When updating the headTailLatency stat via the headTailMap, only loads were considered and removed from the headTailMap, however when inserting into the headTailMap loads and stores were considered, thus leading to the memory leak. This change fixes the issue by only adding loads to the headTailMap. Change-Id: I8a8f5b79f55e00481ae5e82519a9ed627a7ecbd1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29963 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2020-07-17 16:32:06 +00:00
Michael LeBeane	1d816250f8	gpu_compute: Support loading BLIT kernels The BLIT kernels used to implement DMA through the shaders don't fill out all of the standard fields in an amd_kernel_code_t object. This patch modifies the code object parsing logic to support these new kernels. BLIT kernels are used in APUs when using ROCm memcopies for certain size buffers, and are used for dGPUs when the SDMA engines are disabled. Change-Id: Id4e667474d05e311097dbec443def07dfad14a79 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29959 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2020-07-17 16:13:59 +00:00
Matt Sinclair	13079629a1	arch-gcn3: convert vALU instruction counters from 32 to 64-bit The vALU instruction counters were previously 32 bits, but for some workloads this value wraps around and triggers an assert failure because the max vALU operations are reached. To resolve this, this commit increases the counter size to 64 bits. Change-Id: I90ed4514669485cfea7ccc37ba9d69665277bccb Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29950 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2020-07-16 20:37:22 +00:00
Tony Gutierrez	0c5d671ea1	gpu-compute: Init CU object for pipe stages in their ctors This change updates the constructors of the CU's pipe stages/memory pipelines to accept a pointer to their parent CU. Because the CU creates these objects, and can pass a pointer to itself to these objects via their constructors, this is the safer way to initialize these classes. Change-Id: I0b3732ce7c03781ee15332dac7a21c097ad387a4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29945 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-07-16 20:37:22 +00:00
Tony Gutierrez	af621cd6e6	gpu-compute, arch-gcn3: refactor barriers Barriers were not modeled properly. Firstly, barriers were allocated to each WG that was launched, which is not correct, and the CU would provide an infinite number of barrier slots. There are a limited number of barrier slots per CU in reality. In addition, the CU will not allocate barrier slots to WGs with a single WF (nothing to sync if only one WF). Beyond modeling problems, there is also the issue of deadlock. The barrier could deadlock because not all WFs are freed from the barrier once it has been satisfied. Instead, we relied on the scoreboard stage to release them lazily, one-by-one. Under this implementation the scoreboard may not fully release all WFs participating in a barrier; this happens because the first WF to be freed from the barrier could reach an s_barrier instruction again, forever causing the barrier counts across WFs to be out-of-sync. This change refactors the barrier logic to: 1) Create a proper barrier slot implementation 2) Enforce (via a parameter) the number of barrier slots on the CU. 3) Simplify the logic and clean up the code (i.e., we no longer iterate through the entire WF list each time we check if a barrier is satisfied). 4) Fix deadlock issues. Change-Id: If53955b54931886baaae322640a7b9da7a1595e0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29943 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-07-16 20:37:22 +00:00
Tony Gutierrez	701f026ba5	gpu-compute: Fix LDS out-of-bounds behavior The LDS is capable of handling out-of-bounds accesses, that is, accesses that are outside the bounds of the chunk allocated to a WG. Currently, the simulator asserts on these accesses. This patch changes the behavior of the LDS to return 0 for reads and dropping writes that are out-of-bounds. Change-Id: I5f467d0f52113e8565e1a3029e82fb89cc6f07ea Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29940 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2020-07-16 20:37:22 +00:00
Xianwei Zhang	024f978cff	gpu-compute: enable kernel-end WB functionality Change-Id: Ib17e1d700586d1aa04d408e7b924270f0de82efe Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29938 Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com>	2020-07-13 23:32:37 +00:00
Michael LeBeane	ed7daa10aa	arch-gcn3, gpu-compute: Implement out-of-range accesses Certain buffer out-of-range memory accesses should be special cased and not generate memory accesses. This patch implements those special cases and suppresses lanes from accessing memory when the calculated address falls in an ISA-specified out-of-range condition. Change-Id: I8298f861c6b59587789853a01e503ba7d98cb13d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29935 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>	2020-07-13 19:48:00 +00:00
Onur Kayiran	bff8df2288	gpu-compute: Dropping fetches when no entry is reserved in the buffer This changeset drops fetches if there is no entry reserved in the fetch buffer for that instruction. This can happen when a fetch attempts to issue in the same cycle in which a branch instruction flushed the fetch buffer, while an ITLB or I-cache request is still pending. Change-Id: I3b80dbd71af27ccf790b543bd5c034bb9b02624a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29932 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Onur Kayıran <onur.kayiran@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>	2020-07-13 19:47:26 +00:00
Tony Gutierrez	bbab876c32	gpu-compute: Make headTailMap a std::unordered_map There is no reason that the headTailMap needs to be sorted, so let's use a std::unordered_map. Change-Id: I18641b893352c18ec86e3775c8947a05a6c6547d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29930 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-22 16:14:35 +00:00
Tony Gutierrez	5c95e6b678	gpu-compute: Remove unused function hostWakeUp from shader Change-Id: Ib4415a7c5918da03bbd16fe9adb4dd593dcaa95c Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29929 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-22 16:14:35 +00:00
Tony Gutierrez	ccee639904	arch-gcn3, gpu-compute: Fix issue when reading const operands Currently, when an instruction has an operand that reads a const value, it goes thru the same readMiscReg() api call as other misc registers (real HW registers, not constant values). There is an issue, however, when casting from the const values (which are 32b) to higher precision values, like 64b. This change creates a separate, templated function call to the GPU's ISA state that will return the correct type. Change-Id: I41965ebeeed20bb70e919fce5ad94d957b3af802 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29927 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-22 16:14:35 +00:00
Matt Sinclair	8177fc4392	arch-gcn3: add support for unaligned accesses Previously, with HSAIL, we were guaranteed by the HSA specification that the GPU will never issue unaligned accesses. However, now that we are directly running GCN this is no longer true. Accordingly, this commit adds support for unaligned accesses. Moreover, to reduce the replication of nearly identical code for the different request types, I also added new helper functions that are called by all the different memory request producing instruction types in op_encodings.hh. Adding support for unaligned instructions requires changing the statusBitVector used to track the status of the memory requests for each lane from a bit per lane to an int per lane. This is necessary because an unaligned access may span multiple cache lines. In the worst case, each lane may span multiple cache lines. There are corresponding changes in the files that use the statusBitVector. Change-Id: I319bf2f0f644083e98ca546d2bfe68cf87a5f967 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29920 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-19 20:41:18 +00:00
Xianwei Zhang	2c1e9c4e81	gpu-compute: enable flexible control of kernel boundary syncs Kernel end release was turned on for VIPER protocol, which is in fact write-through based and thus has no need for a release operation. This changeset splits the option 'impl_kern_boundary_sync' into 'impl_kern_launch_acq' and 'impl_kern_end_rel', and turns off release on VIPER. Change-Id: I5490019b6765a25bd801cc78fb7445b90eb02a3d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29917 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-19 20:40:05 +00:00
Matthew Poremba	eb9efdaa44	gpu-compute: remove recvToken from GM pipe exec Tokens were previously acquired in GM pipe exec, but acquisition has been moved to acqCoalescerToken. This removes the extraneous code which was acquiring tokens twice, causing them to be depleted and triggering an assertion. Change-Id: Ic92de8f06cc85828b29c69790bdadde057ef1777 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29916 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-19 20:35:11 +00:00
Tony Gutierrez	9d51dec937	arch, gpu-compute: Remove HSAIL related files Change-Id: Iefba0a38d62da7598bbfe3fe6ff46454d35144b1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28410 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-17 02:53:47 +00:00
Tony Gutierrez	b8da9abba7	gpu-compute, mem-ruby, configs: Add GCN3 ISA support to GPU model Change-Id: Ibe46970f3ba25d62ca2ade5cbc2054ad746b2254 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29912 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-06-15 22:45:17 +00:00
Bobby R. Bruce	e53de444f6	misc: Merge branch 'release-staging-v20.0.0.0' into develop	2020-05-28 01:04:16 -07:00
Bobby R. Bruce	a8fb7a0c1d	gpu-compute,misc: Removed unused 'vaddr' capture Clang compilers return an `error: lambda capture 'vaddr' is not used` error when compiling HSAIL_X86/gem5.opt. This unused lambda capture has therefore been removed. Change-Id: I2a7c58174a9ef83435099ab4daf84c762f017dd4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29533 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>	2020-05-28 04:48:54 +00:00
Matthew Poremba	3d57eaf9f5	gpu-compute,mem-ruby: Refactor GPU coalescer Remove the read/write tables and coalescing table and introduce two levels of tables for uncoalesced and coalesced packets. Tokens are granted to GPU instructions to place in the uncoalesced table. If tokens are available, the operation always succeeds such that the 'Aliased' status is never returned. Coalesced accesses are placed in the coalesced table while requests are outstanding. Requests to the same address are added as targets to the table similar to how MSHRs operate. Change-Id: I44983610307b638a97472db3576d0a30df2de600 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27429 Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Bradford Beckmann <brad.beckmann@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-05-11 21:25:19 +00:00
Matthew Poremba	64134b6e66	base,arch-hsail: Fix GPU build The GPU build is currently broken due to recent changes. This fixes the build after changes to local access, removal of getSyscallArg, and creating of AMO header in base. Change-Id: I43506f6fb0a92a61a50ecb9efa7ee279ecb21d98 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27136 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Bradford Beckmann <brad.beckmann@amd.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com> Tested-by: Gem5 Cloud Project GCB service account <345032938727@cloudbuild.gserviceaccount.com>	2020-04-03 21:51:57 +00:00
Matthew Poremba	5c2fb0c652	sim-se: Switch to new MemState API Switch over to the new MemState API by specifying memory regions for stack in each ISA, changing brkFunc to use MemState for heap memory, and calling the MemState fixup in fixupStackFault (renamed to just fixupFault). Change-Id: Ie3559a68ce476daedf1a3f28b168a8fbc7face5e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/25366 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2020-03-25 19:18:15 +00:00


127 Commits