The Vega ISA's s_memtime instruction is used to obtain a cycle value
from the GPU. Previously, this was implemented to obtain the cycle count
when the memtime instruction reached the execute stage of the GPU
pipeline. However, from microbenchmarking we have found that this under
reports the latency for memtime instructions relative to real hardware.
Thus, we changed its behavior to go through the scalar memory pipeline
and obtain a latency value from the the SQC (L1 I$). This mirrors the
suggestion of the AMD Vega ISA manual that s_memtime should be treated
like a s_load_dwordx2.
The default latency was set based on microbenchmarking.
Change-Id: I5e251dde28c06fe1c492aea4abf9f34f05784420
A new host tag `gcn_gpu` has been added. This allows for selection of
those GPU tests which depend upon the gcn-gpu docker image to run.
In addition to this, the square GPU tests has been moved to the CI
tests. This ensures some GPU code is compiled and run on every PR.
Adding RP_choose functions to change replacement policies among
TreePLRU, LRU, FIFO, LFU, LIP, MRU, NRU, RRIP, SecondChance AND ShiPMem replacement
policies for TCC, TCP and SQC caches for GPU
Replace instances of "GCN3" with Vega. Remove gfx801 and gfx803. Rename
FIJI to Vega and Carrizo to Raven.
Using misc since there is not enough room to fit all the tags.
Change-Id: Ibafc939d49a69be9068107a906e878408c7a5891
The RFC is defaulted to a size of 0 which removes it completely. To use
the RFC set the --register-file-cache-size to a non-zero multiple of
two. In addition, rfc_pipe_length may be altered to increase or decrease
RFC latency benefit.
The RFC is defaulted to a size of 0 which removes it completely. To use
the RFC set the --register-file-cache-size to a non-zero multiple of
two. In addition, rfc_pipe_length may be altrered to increase or
decrease RFC latency benefit.
Change-Id: I6f5bf5b750eb64155fbc8c8343e9feadce5c9f79
`get_runtime_isa()` has been deprecated for some time. It is a leftover
piece of code from when gem5 was compiled to a single ISA and that ISA
used to configure the simulated system to use that ISA. Since multi-ISA
compilations are possible, `get_runtime_isa()` should not be used.
Unless the gem5 binary is compiled to a single ISA, a failure will
occur.
The new proceedure for specify which ISA to use is by the setting of the
correct `BaseCPU` implementation. E.g., `X86SimpleTimingCPU` of
`ArmO3CPU`.
This patch removes the remaining `get_runtime_isa()` instances and
removes the function itself. The `SimpleCore` class has been updated to
allow for it's CPU factory to return a class, needed by scripts in
"configs/common".
The deprecated functionality in the standard library, which allowed for
the specifying of an ISA when setting up a processor and/or core has
also been removed. Setting an ISA is now manditory.
Fixes#216.
An example case,
```python
mem_side_port = RequestPort(
"This port sends requests and " "receives responses"
)
```
This is the residue of running the python formatter.
This is done by finding all tokens matching the regex `"\s"(?![.;"])`
and manually replacing them by empty strings.
Change-Id: Icf223bbe889e5fa5749a81ef77aa6e721f38b549
Signed-off-by: Hoa Nguyen <hoanguyen@ucdavis.edu>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66111
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Previously the scalar cache path used the same latency parameter as the
vector cache path for memory requests. This commit adds new parameters
for the scalar cache path latencies. This commit also modifies the model
to use the new latency parameter to set the memory request latency in
the scalar cache. The new paramters are '--scalar-mem-req-latency' and
'--scalar-mem-resp-latency' and are set to default values of 50 and 0
respectively
Change-Id: I7483f780f2fc0cfbc320ed1fd0c2ee3e2dfc7af2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65511
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Previously the L1 request and response latencies were not configurable
in the GPU config scripts. As a result, the simulations used the default
values from GPU.py. This commits adds support to change this value as an
input parameter. The parameters to use are "--mem-req-latency" followed
by the value and "--mem-resp-latency" followed by the value. The default
values are the same as those in GPU.py (which is 50). These new
parameters should be set instead of changing the mandatory queue latency when configuring the L1 cache.
Change-Id: I812d77758ea12530899953f308c91f4c8b05866d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63971
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Previously the LDS bus latency was not configurable in the GPU config
scripts. As a result, the simulations would use the default value from
GPU.py. This commit adds support to change this value as an input
option. The option to use is "--vrf_lm_bus_latency".
Change-Id: I8d8852e6d7b9d03ebec1fe8b392968f396dd3526
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63652
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair.wisc@gmail.com>
The TARGET_ISA variable would let you select one ISA from a list of
possible ISAs. That has now been replaced with USE_ARM_ISA, USE_X86_ISA,
etc, variables which are boolean on or off. That will allow any number
of ISAs to be enabled or disabled individually. Enabling something other
than exactly one of these will probably prevent you from getting a
working gem5 binary, but those problems are being addressed in other,
parallel change series.
I decided to use the USE_ prefix since it was consistent with most other
on/off variables we have in gem5. One noteable exception is the
BUILD_GPU setting which, you could convincingly argue, is a better
prefix than USE_. Another option would be to use CONFIG_, in
anticipation of using a kconfig style config mechanism in gem5.
It seemed premature to start using a CONFIG_ prefix here, and if we
decide to switch to some other prefix like BUILD_, it should be a
purposeful choice and not something somebody just starts using.
Change-Id: I90fef2835aa4712782e6c1313fbf564d0ed45538
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/52491
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The current default GPU register allocator is the "simple" policy,
which only allows 1 wavefront to run at a time on each CU. This is
not very realistic and also means the tester (when not specifically
choosing the dynamic policy) is less rigorous in terms of validating
correctness.
To resolve this, this commit changes the default to the "dynamic"
register allocator, which runs as many waves per CU as there are
space in terms of registers and other resources -- thus it is more
realistic and does a better job of ensuring test coverage.
Change-Id: Ifca915130bb4f44da6a9ef896336138542b4e93e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57537
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Remove the line "For use for simulation and test purposes only" in files
were AMD is the only copyright holder listed in the header. This happens
to be the case for all files where this line exists, removing it
completely from gem5.
Change-Id: I623f266b002f564301b28774f49081099cfc60fd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53943
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch adds support for a gfx902 Vega APU, ripping the
appropriate values for device_id from the ROCm Thunk
(src/topology.c).
Note: gfx902 isn't officially supported by ROCm. This
means that it may not work for all programs. In particular,
rocBLAS is incompatible with gfx902, so anything that uses
rocBLAS won't be able to run with gfx902.
Change-Id: I48893e7cc9c7e52275fdfd22314f371a9db8e90a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47530
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
GPU MTYPE is currently set using a global config passed to the
PACoalescer. This patch enables MTYPE to be set by the shader on a
per-request bases. In real hardware, the MTYPE is extracted from a
GPUVM PTE during address translation. However, our current simulator
only models x86 page tables which do not have the appropriate bits for
GPU MTYPES. Rather than hacking non-x86 bits into our x86 page table
models, this patch instead keeps an interval tree of all pages that
request custom MTYPES in the driver itself. This is currently
only used to map host pages to the GPU as uncacheable, but is easily
extensible to other MTYPES.
Change-Id: I7daab0ffae42084b9131a67c85cd0aa4bbbfc8d6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42216
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
There was a merge error caused by new options being added to this script
while all scripts were being converted from optparse. This fixes the
error.
This also removes the mostly unused setOption / getOption as you can
directly assign a value to an argument after parsing
Change-Id: Ic8aaa0728a43936cd4c6e1ed590e01ba5f0fbf5b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/44785
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
New topology ripped from Fiji to support dGPU. A dGPU flag is added to
the config which is propogated to the driver. The emulated driver is
now able to properly deal with dGPU ioctls and mmaps. For now, dGPU
physical memory is allocated from the host, but this is easy to change
once we get a GPU memory controller up and running.
Change-Id: I594418482b12ec8fb2e4018d8d0371d56f4f51c8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42214
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Event creation and management support from emulated drivers is required
to support interruptible signals in HSA and this support was not
available. This changeset adds the event creation and management support
in the emulated driver. With this patch, each interruptible signal
created by the HSA runtime is associated with a signal event. The HSA
runtime can then put a thread waiting on a signal condition to sleep
asking the driver to monitor the event associated with that signal. If
the signal is modified by the GPU, the dispatcher notifies the driver
about signal value change. If the modifier is a CPU thread, the thread
will have to make HSA API calls to modify the signal and these API calls
will notify the driver about signal value change. Once the driver is
notified about a change in the signal value, the driver checks to see if
any thread is sleeping on that signal and wake up the sleeping thread
associated with that event. The driver has also implemented the time_out
wakeup that can wake up the thread after a certain time period has
expired. This is also true for barrier packets.
Each signal has an event address in a kernel managed and allocated
event page that can be used as a mailbox pointer to notify an event.
However, this feature used by non-CPU agents to communicate with the
driver is not implemented by this changeset because the non-CPU HSA
agents in our model can directly communicate with driver in our
implementation. Having said that, adding that feature should be trivial
because the event address and event pages are correctly setup by this
changeset and just adding the event page's virtual address to our PIO
doorbell interface in the page tables and registering that pio address
to the driver should be sufficient. Managing mailbox pointer for an
event is based on event ID and using this event ID as an index into
event page, this changeset already provides a unique mailbox pointer for
each event.
Change-Id: Ic62794076ddd47526b1f952fdb4c1bad632bdd2e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38335
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
SimplePoolManager doesn't allow mapping of two WGs
simultaneously on the same Compute Unit (provided
the previous WG has been mapped to all the SIMDs)
even if there is sufficient VRF and SRF space
available.
DynPoolManager takes care of that by dynamically
allocating and deallocating register file space
to wavefronts
Change-Id: I2255c68d4b421615d7b231edc05d3ebb27cbd66c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32034
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com>