derek/gem5 - gem5 - Gitea: Git with a cup of tea

derek/gem5

Author	SHA1	Message	Date
Matthew Poremba	a03319bef7	arch-vega: Fix output warnings, gem5.fast (#1023 ) Fix gem5.fast build not building when using gpu model. Removes very spammy stat distribution bucket size prints when running gpu model.	2024-04-15 13:18:27 -07:00
Matthew Poremba	01f2df4b8a	gpu-compute: Fix stat bucket sizes Change-Id: If30505515867a866c631cb117d3d22e19814a2f2	2024-04-13 15:51:41 -07:00
Matthew Poremba	1d64669473	mem,gpu-compute: Implement GPU TCC directed invalidate The GPU device currently supports large BAR which means that the driver can write directly to GPU memory over the PCI bus without using SDMA or PM4 packets. The gem5 PCI interface only provides an atomic interface for BAR reads/writes, which means the values cannot go through timing mode Ruby caches. This causes bugs as the TCC cache is allowed to keep clean data between kernels for performance reasons. If there is a BAR write directly to memory bypassing the cache, the value in the cache is stale and must be invalidated. In this commit a TCC invalidate is generated for all writes over PCI that go directly to GPU memory. This will also invalidate TCP along the way if necessary. This currently relies on the driver synchonization which only allows BAR writes in between kernels. Therefore, the cache should only be in I or V state. To handle a race condition between invalidates and launching the next kernel, the invalidates return a response and the GPU command processor will wait for all TCC invalidates to be complete before launching the next kernel. This fixes issues with stale data in nanoGPT and possibly PENNANT. Change-Id: I8e1290f842122682c271e5508a48037055bfbcdf	2024-04-10 11:35:25 -07:00
Matthew Poremba	833392e7b2	mem-ruby,gpu-compute: Allow memory reqs without inst The GPUDynInst for sending memory requests through the CUs data port is required but only used for DPRINTFs. Relax this constraint so that the methods can be reused for requests such as probes generated by the GPU device. Change-Id: I16094e400968225596370b684d6471580888d98a	2024-04-10 11:35:24 -07:00
Michael Boyer	acd9d3ff94	gpu-compute: Add support for skipping GPU kernels (#940 ) gpu-compute: Add support for skipping GPU kernels This commit adds two new command-line options: --skip-until-gpu-kernel N Skips (non-blit) GPU kernels until the target kernel is reached. Execution continues normally from there. Blit kernels are not skipped because they are responsible for copying the kernel code and metadata for the non-blit kernels. Note that skipping kernels can impact correctness; this feature is only useful if the kernel of interest has no data-dependent behavior, or its data-dependent behavior is not based on data generated by the skipped kernels. --exit-after-gpu-kernel N Ends the simulation after completing (non-blit) GPU kernel N. This commit also renames two existing command-line options: --debug-at-gpu-kernel -> --debug-at-gpu-task --exit-at-gpu-kernel -> --exit-at-gpu-task These were renamed because they count GPU tasks, which include both kernels launched by the application as well as blit kernels. Change-Id: If250b3fd2db05c1222e369e9e3f779c4422074bc	2024-03-21 07:46:27 -07:00
Michael Boyer	ba2f5615ba	gpu-compute: Support cache line sizes >64B in GPUFS (#939 ) This change fixes two issues: 1) The --cacheline_size option was setting the system cache line size but not the Ruby cache line size, and the mismatch was causing assertion failures. 2) The submitDispatchPkt() function accesses the kernel object in chunks, with the chunk size equal to the cache line size. For cache line sizes >64B (e.g. 128B), the kernel object is not guaranteed to be aligned to a cache line and it was possible for a chunk to be partially contained in two separate device memories, causing the memory access to fail. Change-Id: I8e45146901943e9c2750d32162c0f35c851e09e1 Co-authored-by: Michael Boyer <Michael.Boyer@amd.com>	2024-03-20 11:09:25 -07:00
Matthew Poremba	8722aef2e2	gpu-compute: Store accum_offset from code object in WF The accumulation offset is needed for some instructions. In order to access this value we need to place it somewhere instruction definitions can access. The most logical place is in the wavefront. This commit simply copies the value from the HSA task to the wavefront object. Change-Id: I44ef62ef32d2421953f096c431dd758e882245b4	2024-02-26 12:54:37 -06:00
Vishnu Ramadas	85680ea58e	gpu-compute: Remove unused and redundant functions In ComputeUnit, a previous commit added a SystemHubEvent event class to the SQCPort. This was found to be unnecessary during the review process and is removed in this commit. Similarly, invBuf() which was added in FetchUnit as part of an earlier commit was found to be redundant. This commit removes it Change-Id: I6ee8d344d29e7bfade49fb9549654b71e3c4b96f	2024-02-09 12:17:24 -06:00
Vishnu Ramadas	690b2b9462	gpu-compute, mem-ruby: Add comments and reformat code Change-Id: Id2b3886dce347fdcfcad22009a42b92febc00a6c	2024-02-09 12:17:24 -06:00
Vishnu Ramadas	7dae25e881	configs, gpu-compute: Add parameter in shader for CUs per SQC Change-Id: If0ae0db1b6ccc08a92f169a271b137f69f410f7b	2024-02-09 12:17:24 -06:00
Vishnu Ramadas	0e93e6142a	arch-vega, gpu-compute, mem-ruby: Remove extra empty lines Change-Id: I18770ec7e38c4a992a0ae6de95b0be49ab4426c2	2024-02-09 12:17:24 -06:00
Vishnu Ramadas	440409d807	gpu-compute: Add Icache invalidation at kernel start Previously, the data caches were invalidated at the start of each kernel. This commit adds support for invalidating instruction cache at kernel launch time Change-Id: I32e50f63fa1442c2514d4dd8f9d7689759f503d3	2024-02-09 12:16:41 -06:00
Vishnu Ramadas	03838afce0	gpu-compute: Add support for injecting scalar memory barrier This commit adds support for injecting a scalar memory barrier in the GPU. The barrier will primarily be used to invalidate the entire SQC cache. The commit also invalidates all buffers and decrements related counters upon completion of the invalidation request Change-Id: Ib8e270bbeb8229a4470d606c96876ba5c87335bf	2024-02-09 12:14:57 -06:00
Matthew Poremba	63caa780c2	misc: Remove all references to GCN3 Replace instances of "GCN3" with Vega. Remove gfx801 and gfx803. Rename FIJI to Vega and Carrizo to Raven. Using misc since there is not enough room to fit all the tags. Change-Id: Ibafc939d49a69be9068107a906e878408c7a5891	2024-01-17 11:11:06 -06:00
Matthew Poremba	6a9e80c54c	gpu-compute: Support for MI200 GPU model (#733 )	2024-01-15 08:18:34 -08:00
Matt Sinclair	ab9e61ea03	gpu-compute: WAX dependency detection (#731 ) WAX Dependencies would be missed if a RAW Dependency also existed.	2024-01-05 12:57:24 -06:00
Matt Sinclair	dc85d1492c	gpu-compute: Added register file cache support (#730 ) The RFC is defaulted to a size of 0 which removes it completely. To use the RFC set the --register-file-cache-size to a non-zero multiple of two. In addition, rfc_pipe_length may be altered to increase or decrease RFC latency benefit.	2024-01-05 12:57:06 -06:00
KaiBatley	359ac63280	gpu-compute: Added register file cache support The RFC is defaulted to a size of 0 which removes it completely. To use the RFC set the --register-file-cache-size to a non-zero multiple of two. In addition, rfc_pipe_length may be altrered to increase or decrease RFC latency benefit. Change-Id: I6f5bf5b750eb64155fbc8c8343e9feadce5c9f79	2024-01-04 22:43:05 -06:00
KaiBatley	55fce58c19	gpu-compute: WAX dependency detection WAX Dependencies would be missed if a RAW Dependency also existed. Change-Id: I2a9e50b9d0540a30de9c1bf6bb544c7b9654cb29	2024-01-03 22:02:02 -06:00
Matthew Poremba	cc75281802	gpu-compute: Update code object to latest LLVM The AMDKernelCode struct is very outdated. Most of the fields are no longer used and have been replaced with new fields that are used. Therefore in order to support the new fields the code object needs to be updated. The new structure is based on the table located at https://llvm.org/docs/AMDGPUUsage.html#code-object-v3-kernel-descriptor Most notably this adds the new compute_pgm_rsrc3 and kernarg preload fields which are new features in gfx90a (MI200). The accum_offset in compute_pgm_rsrc3 and kergarg preload values are necessary to run application which enable those features and therefore a way to check their values is needed. Also noteable is the removal of enable_sgpr_workgroup_id_{X,Y,Z}. These seem to be unused in all versions of ROCm that gem5 supports and therefore these fields can be removed. They are replaced with a reserved field in the new code object. Change-Id: I5542442e1e5961b05e17affad0adb5186d6d9d1a	2024-01-03 15:41:06 -06:00
Matthew Poremba	8c016ebbbc	gpu-compute: Implement packed workitem ABI init This initialization method is used in gfx90a (MI200). Rather than using three VGPRs for X,Y,Z dimensions of the kernel, pack them into one register with 10-bits for each dimensions. Change-Id: I8e5b681c8287779ff9f80451d6028e862322294a	2024-01-03 10:40:34 -06:00
Matthew Poremba	5e45233484	gpu-compute: Add gfx version to HSA task entry The version is necessary for determining the correct ABI init process. Add it to the task queue so it is accessible when doing ABI init. Change-Id: If77434b0f93614057b5c40fcf612d59b54e05dbb	2024-01-03 10:40:34 -06:00
Bobby R. Bruce	d11c40dcac	misc: Run `pre-commit run --all-files` This ensures `isort` is applied to all files in the repo. Change-Id: Ib7ced1c924ef1639542bf0d1a01c5737f6ba43e9	2023-11-29 22:06:41 -08:00
Gabe Black	db3a6e8e84	scons: Use Kconfig to configure gem5. These are not yet consumed by anything, but convert all the settings from SCons variables to Kconfig variables. If you have existing SConsopts files which need to be converted, you should take a look at KCONFIG.md to learn about how kconfig is used in gem5. You should decide if any variables need to be available to C++ or kconfig itself, and whether those are options which should be detected automatically, or should be up to the user. Options which should be measured automatically should still be in SConsopts files, while user facing options should be added to new or existing Kconfig files. Generally, make sure you're storing c++/kconfig visible options in env['CONF'][...]. Also remove references to sticky_vars since persistent options should now be handled with kconfig, and export_vars since everything in env['CONF'] is now exported automatically. Switch SCons/gem5 to use Kconfig for configuration, except EXTRAS which is still a sticky SCons variable. This is necessary because EXTRAS also controls what config options exist. If it came from Kconfig itself, then there would be a circular dependency. This dependency could theoretically be handled by reparsing the Kconfig when EXTRAS directories were added or removed, but that would be complicated, and isn't supported by kconfiglib. It wouldn't be worth the significant effort it would take to add it, just to use Kconfig more purely. Change-Id: I29ab1940b2d7b0e6635a490452d05befe5b4a2c9	2023-11-23 08:26:10 +08:00
Matthew Poremba	e362310f3d	gpu-compute: Update GPR allocation counts GPR allocation is using fields in the AMD kernel code structure which are not backwards compatible and are not populated in more recent compiler versions. Use the granulated fields instead which is enfored to be backwards compatible. Change-Id: I718716226f5dbeb08369d5365d5e85b029027932	2023-11-01 14:52:39 -05:00
Matthew Poremba	f07e0e7f5d	gpu-compute: Read dispatch packet with timing DMA This fixes occasional readBlob fatals caused by the functional read of system memory, seen often with the KVM CPU. Change-Id: Ifccee666f62faa5b2fcf0a64a9d77c8cf95b3add	2023-11-01 14:52:39 -05:00
Matthew Poremba	d05433b3f6	gpu-compute,dev-hsa: Send vendor packet completion signal gem5 does not currently implement any vendor-specific HSA packets. Starting in ROCm 5.5, vendor packets appear to end with a completion signal. Not sending this completion causes gem5 to hang. Since these packets are not documented anywhere and need to be reverse engineered we send the completion signal, if non-zero, and finish the packet as is the current behavior. Testing: HIP examples working on most recent ROCm release (5.7.1). Change-Id: Id0841407bec564c84f590c943f0609b17e01e14c	2023-11-01 14:52:39 -05:00
Matthew Poremba	da11427ba6	gpu-compute: Update tokens for flat global/scratch (#408 ) Memory instructions acquire coalescer tokens in the schedule stage. Currently this is only done for buffer and flat instructions, but not flat global or flat scratch. This change now acquires tokens for flat global and flat scratch instructions. This provides back-pressure to the CUs and helps to avoid deadlocks in Ruby. The change also handles returning tokens for buffer, flat global, and flat scratch instructions. This was previously only being done for normal flat instructions leading to deadlocks in some applications when the tokens were exhausted. To simplify the logic, added a needsToken() method to GPUDynInst which return if the instruction is buffer or any flat segment. The waitcnts were also incorrect for flat global and flat scratch. We should always decrement vmem and exp count for stores and only normal flat instructions should decrement lgkm. Currently vmem/exp are not decremented for flat global and flat scratch which can lead to deadlock. This change set fixes this by always decrementing vmem/exp and lgkm only for normal flat instructions. Change-Id: I673f4ac6121e4b5a5e8491bc9130c6d825d95fc5	2023-10-11 09:00:10 -07:00
Matthew Poremba	9f4d334644	gpu-compute: Update tokens for flat global/scratch Memory instructions acquire coalescer tokens in the schedule stage. Currently this is only done for buffer and flat instructions, but not flat global or flat scratch. This change now acquires tokens for flat global and flat scratch instructions. This provides back-pressure to the CUs and helps to avoid deadlocks in Ruby. The change also handles returning tokens for buffer, flat global, and flat scratch instructions. This was previously only being done for normal flat instructions leading to deadlocks in some applications when the tokens were exhausted. To simplify the logic, added a needsToken() method to GPUDynInst which return if the instruction is buffer or any flat segment. The waitcnts were also incorrect for flat global and flat scratch. We should always decrement vmem and exp count for stores and only normal flat instructions should decrement lgkm. Currently vmem/exp are not decremented for flat global and flat scratch which can lead to deadlock. This change set fixes this by always decrementing vmem/exp and lgkm only for normal flat instructions. Change-Id: I673f4ac6121e4b5a5e8491bc9130c6d825d95fc5	2023-10-10 09:48:16 -05:00
Matthew Poremba	6a4b2bb096	dev-hsa,gpu-compute: Add timestamps to AMD HSA signals The AMD specific HSA signal contains start/end timestamps for dispatch packet completion signals. These are current always zero. These timestamp values are used for profiling in the ROCr runtime. Unfortunately, the GpuAgent::TranslateTime method in ROCr does not check for zero values before dividing, causing applications that use profiling to crash with SIGFPE. Profiling is used via hipEvents in the HACC application, so these should be supported in gem5. In order to handle writing the timestamp values, we need to DMA the values to memory before writing the completion signal. This changes the flow of the async completion signal write to be (1) read mailbox pointer (2) if valid, write the mailbox data, other skip to 4 (3) write mailbox data if pointer is valid (4) write timestamp values (5) write completion signal. The application will process the timestamp data as soon as the completion signal is received, so we need to ordering to ensure the DMA for timestamps was completed. HACC now runs to completion on GPUFS and has the same output was hardware. Change-Id: I09877cdff901d1402140f2c3bafea7605fa6554e	2023-10-06 13:21:40 -05:00
Matthew Poremba	2b97f17fe1	gpu-compute: Fix dynamic scratch size test ROCm supports dynamically allocating scratch space, which resides in framebuffer memory, to reduce the amount of memory allocated for kernels that have not yet launched. The size of the scratch space allocated is located in task->amdQueue.compute_tmpring_size_wavesize. This size is in kilobytes. The AQL task contains the number of bytes requested per work item, however we currently check if there is enough tmpring space by comparing a single work item. This should instead check the size per wavefront. This causes problems in applications where multiple kernels use dynamic scratch allocation and a later kernel requires more space than the earlier kernel. The only application being tested that does this is LULESH. This was resulting in the scratch space being too small, resulting in workgroups clobbering each other's private memory leading to some nasty bugs. It is fixed by this patch as task->amdQueue will be re-read from the host and will contain the updated tmpring size. After this there is enough scratch space and LULESH makes forward progress. Change-Id: Ie9e0f92bb98fd3c3d6c2da3db9ee65352f9ae070	2023-10-04 09:38:31 -05:00
Matthew Poremba	cfa833a97d	gpu-compute: Set LDS/scratch aperture base register Starting with gfx900 (Vega) the LDS and scratch apertures can be queried using a new s_getreg_b32 instruction. If the instruction is called with the SH_MEM_BASES argument it returns the upper 16 bits of a 64 bit address for the LDS and scratch apertures. The current addresses cannot be encoded in this register, so that addresses are changed to have the lower 48 bits be all zeros in addition to writing the bases register. Change-Id: If20f262b2685d248afe31aa3ebb274e4f0fc0772	2023-08-31 11:01:32 -05:00
Matthew Poremba	60f071d09a	gpu-compute,arch-vega: Implement flat scratch insts Flat scratch instructions (aka private) are the 3rd and final segment of flat instructions in gfx9 (Vega) and beyond. These are used for things like spills/fills and thread local storage. This commit enables two forms of flat scratch instructions: (1) flat_load/flat_store instructions where the memory address resolves to private memory and (2) the new scratch_load/scratch_store instructions in Vega. The first are similar to older generation ISAs where the aperture is unknown until address translation. The second are instructions guaranteed to go to private memory. Since these are very similar to flat global instructions there are minimal changes needed: - Ensure a flat instruction is either regular flat, global, XOR scratch - Rename the global op_encoding methods to GlobalScratch to indicate they are for both and are intentionally used. - Flat instructions in segment 1 output scratch_ in the disassembly - Flat instruction executed as private use similar mem helpers as global - Flat scratch cannot be an atomic This was tested using a modified version of the 'square' application: template <typename T> __global__ void scratch_square(T C_d, T A_d, size_t N) { size_t offset = (blockIdx.x * blockDim.x + threadIdx.x); size_t stride = blockDim.x * gridDim.x ; volatile int foo; // Volatile ensures scratch / unoptimized code for (size_t i=offset; i<N; i+=stride) { foo = A_d[i]; C_d[i] = foo * foo; } } Change-Id: Icc91a7f67836fa3e759fefe7c1c3f6851528ae7d	2023-08-26 13:40:12 -05:00
Matthew Poremba	4506188e00	gpu-compute: Fix private offset/size register indexes According to the ABI documentation from LLVM, the low register of flat scratch (maxSGPR - 4) is the offset and the high register (maxSGPR - 3) is size. These are currently backwards, resulting in some gnarly addresses being generated leading to page fault and/or incorrect data. This commit fixes this by setting the order correctly. Change-Id: I0b1d077c49c0ee2a4e59b0f6d85cdb8f17f9be61	2023-08-26 13:40:12 -05:00
Matthew Poremba	e0379f4526	gpu-compute: Fix flat scratch resource counters Flat instructions may access memory locations in LDS (scratchpad) and global (VRAM/framebuffer) and therefore increment both counters when dispatched. Once the aperture is known, we decrement the counters of the aperture that was not used. This is done incorrectly for scratch / private flat instruction. Private memory is global and therefore local memory counters should be decremented. This commit fixes the counters by changing the global decrements to local decrements. Change-Id: I25890446908df72e5469e9dbaba6c984955196cf	2023-08-26 13:40:12 -05:00
Matthew Poremba	57b3d2897c	gpu-compute: Use timing DMAs for GPUFS HSA signals The functional HSA signal read was a hack left in the gpu-compute code. In full system, this functional read is causing problems occasionally with the translation not yet being in the page table. The error message output by gem5 was a fatal message on the readBlob method in port proxy. Changing this to a timing DMA fixes this problem. This commit adds the various timing DMA functions to send and receive response and clean up. A helper method "sendCompletionSignal" is added to the GPUCommandProcessor because the indentation level was getting too deep. This change applies only to FS mode. Code for SE mode is equivalent to what it was before this commit. Change-Id: I1bfcaa0a52731cdf9532a7fd0eb06ab2f0e09d48	2023-08-25 13:10:51 -05:00
Matthew Poremba	90a518e885	gpu-compute,arch-vega: Fix ALU-only LDS counters There are a few LDS instructions that perform local ALU operations and writeback which are marked as loads. These are marked as loads because they fit in the pipeline logic better, according to a several year old comment. In the VEGA ISA these instructions (swizzle, permute, bpermute) are not decrementing the LDS load counter. As a result, the counter will gradually increase over time. Since wavefront slots are persistent, this can cause applications with a few thousand kernels to eventually hang thinking there are not enough resources. This changeset fixes this by decrementing the LDS load counter for these instructions. This fix was already integrated in the GCN3 ISA in the exact same way. This changeset moves it near a similar comment about scheduling register file writes. Change-Id: Ife5237a2cae7213948c32ef266f4f8f22917351c	2023-08-23 19:30:24 -05:00
Matthew Poremba	df4739929d	gpu-compute: Change kernel-based exit location The previous exit event occurs when the dispatcher sends a completion signal for a kernel, but gem5 does some kernel-based stats updates after the signal is sent. Therefore, if these exit events are used as a way to dump per-kernel stats, some of the stats for the kernel that just ended will be in the next kernel's stat dump which is misleading. This patch moves the exit event to where the stats are updated and only exits if the dispatcher has requested a stat dump to prevent situations where stats are updated mid-kernel. Change-Id: I74dc1cad5fc90382a2a80564764b3e7c9fb65521	2023-08-15 11:06:26 -05:00
KaiBatley	efa1d87add	configs: fix GPU's default number of HW barrier/CU (#92 ) AMD GCN3 and Vega GPUs assume a max of 16 WG/CU. Any GPU WG with more than 1 WF requires a hardware barrier to allow WFs in the WG to synchronize locally. However, currently the default gem5 GPU configuration assumes only 4 barriers per CU, which artificially prevents applications with > 4 WG/CU that could run simultaneously from running simultaneously. This fix resolves this by updating the default number of hardware barriers per CU to 16, which mimics the support described in slide 39 here: https://www.olcf.ornl.gov/wp-content/uploads/2019/10/ ORNL_Application_Readiness_Workshop-AMD_GPU_Basics.pdf Change-Id: Ib7636a13359d998e676c1790f436a83ce88cbfc0	2023-07-17 10:42:40 -07:00
Jason Lowe-Power	442923c414	Add feature to output citations automatically based on configuration (#90 ) This change adds a new file to m5out which is citations.bib. This file will contain the citations to the papers which describe the aspects of the gem5 simulator that the simulation uses. In other words, each simulation configuration could generate a different bib file referencing different works. Each SimObject can now have a set of citations associated with it. After the system is built (in `instantiate`), the citations.bib file is created by parsing all SimObjects that have been instantiated and taking the union of their associated citations. This commit is not meant to add all citations, but to act as an example for others to add more citations to gem5. Change-Id: Icd5c46fd9ee44adbeec1fea162657f5716f7e5ef Signed-off-by: Jason Lowe-Power <jason@lowepower.com>	2023-07-17 10:41:51 -07:00
Matthew Poremba	3756af8ed9	gpu-compute,configs: Make sim exits conditional The unconditional exit event when a kernel completes that was added in `c644eae2dd` is causing scripts that do not ignore unknown exit events to end simulation prematurely. One such script is the apu_se.py script used in SE mode GPU simulation. Make this exit conditional to the parameter being set to a valid value to avoid this problem. Change-Id: I1d2c082291fdbcf27390913ffdffb963ec8080dd Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/72098 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-07-07 14:12:54 +00:00
Matthew Poremba	c644eae2dd	configs,gpu-compute: Kernel dispatch-based exit events Add two kernel dispatch-based exit events that are useful for limiting the simulation and enabling debug flags at specific GPU kernels. Since the KVM CPU typically used with GPUFS is not deterministic, this help with enabling debug flags when the Tick number may vary. The exit at GPU kernel option can also limit simulation by only simulating a few hundred kernels, for example, and exit at a determined point. Change-Id: I81bae92a80c25fc38c41e999aa662e1417b7a20d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71418 Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-06-08 22:03:47 +00:00
Matthew Poremba	ebd5b3e4ae	gpu-compute: Gfx version check for FS and SE mode There is no GPU device in SE mode to get version from and no GPU driver in FS mode to get version from, so a conditional needs to be added depending on the mode to get the gfx version. Change-Id: I33fdafb60d351ebc5148e2248244537fb5bebd31 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71078 Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>	2023-06-01 00:15:02 +00:00
Matthew Poremba	6b4a1020be	configs,dev-amdgpu: GPUFS MI200/gfx90a support Add support for MI200-like device. This includes adding PCI IDs and new MMIOs for the device, a different MAP_PROCESS packet, and a different calculation for the number of VGPRs. Change-Id: I0fb7b3ad928826beaa5386d52a94ba504369cb0d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70317 Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-05-25 19:14:32 +00:00
Giacomo Travaglini	7b39a7f14e	misc: Rename DEBUG macro into GEM5_DEBUG The DEBUG macro is not part of any compiler standards (differently from NDEBUG, which elides assertions). It is only meant to differentiate gem5.debug from .fast and .opt builds. gem5 developers have used it to insert helper code that is supposed to aid the debugging process in case anything goes wrong. This generic name is likely to clash with other libraries linked with gem5. This is the case of DRAMSim as an example. Rather than using undef tricks, we just inject a GEM5_DEBUG macro for gem5.debug builds. Change-Id: Ie913ca30da615bd0075277a260bbdbc397b7ec87 Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69079 Tested-by: kokoro <noreply+kokoro@google.com> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>	2023-03-21 06:53:55 +00:00
Gabriel Busnot	7f4c92c910	mem,arch-arm,mem-ruby,cpu: Remove use of deprecated base port owner Change-Id: I29214278c3dd4829c89a6f7c93214b8123912e74 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67452 Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Daniel Carvalho <odanrc@yahoo.com.br> Maintainer: Bobby Bruce <bbruce@ucdavis.edu> Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>	2023-02-03 06:11:45 +00:00
Vishnu Ramadas	d6bbccb60a	gpu-compute : Fix incorrect TLB stats when FunctionalTLB is used When FunctionalTLB is used in SE mode, the stats tlbLatency and tlbCycles report negative values. This patch fixes it by disabling the updates that result in negative values when FunctionalTLB is set to true Change-Id: I6962785fc1730b166b6d5b879e9c7618a8d6d4b3 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67202 Reviewed-by: Matt Sinclair <mattdsinclair.wisc@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matthew Poremba <matthew.poremba@amd.com> Reviewed-by: Matthew Poremba <matthew.poremba@amd.com> Tested-by: kokoro <noreply+kokoro@google.com>	2023-01-10 02:27:29 +00:00
Matthew Poremba	af2cecf59e	gpu-compute: Fix ABI init for DispatchId DispatchId should allocate two SGPRs instead of one. Allocating one was causing all subsequent SGPR index values to be off by one, leading to bad addresses for things like flat scratch and private segment. This field is not used very often so it was not impacting most applications. Change-Id: I17744e2d099fbc0447f400211ba7f8a42675ea06 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66711 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com>	2022-12-16 18:16:18 +00:00
Hoa Nguyen	eac06ad681	python: Fix multiline quotes in a single line An example case, ```python mem_side_port = RequestPort( "This port sends requests and " "receives responses" ) ``` This is the residue of running the python formatter. This is done by finding all tokens matching the regex `"\s"(?![.;"])` and manually replacing them by empty strings. Change-Id: Icf223bbe889e5fa5749a81ef77aa6e721f38b549 Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu> Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66111 Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu> Maintainer: Bobby Bruce <bbruce@ucdavis.edu> Tested-by: kokoro <noreply+kokoro@google.com>	2022-11-29 23:44:38 +00:00
vramadas95	dff879cf21	configs, gpu-compute: Add configurable L1 scalar latencies Previously the scalar cache path used the same latency parameter as the vector cache path for memory requests. This commit adds new parameters for the scalar cache path latencies. This commit also modifies the model to use the new latency parameter to set the memory request latency in the scalar cache. The new paramters are '--scalar-mem-req-latency' and '--scalar-mem-resp-latency' and are set to default values of 50 and 0 respectively Change-Id: I7483f780f2fc0cfbc320ed1fd0c2ee3e2dfc7af2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65511 Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com> Maintainer: Jason Lowe-Power <power.jg@gmail.com> Reviewed-by: Jason Lowe-Power <power.jg@gmail.com> Tested-by: kokoro <noreply+kokoro@google.com> Maintainer: Matt Sinclair <mattdsinclair@gmail.com>	2022-11-12 02:23:02 +00:00

1 2 3 4 5 ...

291 Commits