The ROCr runtime uses a combination of HSA signal timestamps and
hardware MMIOs to calculate profiling times. At the beginning of an
application a timestamp is read from the GPU using MMIOs. The clock
MMIOs reside in the GFX MMIO region, so a new AMDGPUGfx class is added
to handle these MMIOs.
The timestamp value is expected to be in nanoseconds, so we simply use
the gem5 tick converted to ns.
Change-Id: I7d1cba40d5042a7f7a81fd4d132402dc11b71bd4
The ROCm stack requires PCI express atomics. Currently the first PCI
CapabilityPtr does not point to anything, which signals to the OS
(Linux) that this is an early generation PCI device. As PCI express
atomics were introduced later, the CapabilityPtr needs to point to at
least a PCI express capability structure. This capability is defined as
0x10 in Linux. We additionally set the PCI atomic based bits and
implement device specific PCI configuration space reads and writes to
the amdgpu device.
With this commit, the output of simulation when loading the amdgpu
driver no longer outputs "PCIE atomics not supported". Further, an
application which uses PCIe atomics (PyTorch with a reduce_sum kernel)
now makes further progress.
Change-Id: I5e3866979659a2657f558941106ef65c2f4d9988
When using the new operator, delete should be called
on any allocated memory after it's use is complete.
Change-Id: Id5fcfb264b6ddc252c0a9dcafc2d3b020f7b5019
The PCI read/write functions are atomic functions in gem5, meaning they
expect a response with a latency value on the same simulation Tick. For
reads to a PCI device, the response must also include a data value read
from the device.
The AMDGPU device has a PCI BAR which mirrors the frame buffer memory.
Currently reads are done atomically, but writes are sent to a DMA device
without waiting for a write completion ACK. As a result, it is possible
that writes can be queued in the DMA device long enough that another
read for a queued address arrives. This happens very deterministically
with the AtomicSimpleCPU and causes GPUFS to break with that CPU.
This change makes writes to the frame BAR atomic the same as reads. This
avoids that problem and as a result the AtomicSimpleCPU can now load the
driver for GPUFS simulations.
Change-Id: I9a8e8b172712c78b667ebcec81a0c5d0060234db
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71898
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Patch https://gem5-review.googlesource.com/c/public/gem5/+/70040 added
support for a variable number of SDMA engines to support newer GPU
models. As part of this an SDMA IDs map was added to map from SDMA ID
number to the SDMA SimObject pointer. In order to get the correct
pointer in unserialize now, we need to store the ID in the checkpoint
and use that to index the new map. We can't simply assign using the loop
variable as the SDMAs might not be in order in the checkpoint and
additionally the checkpoint contains both the gfx and page offset for
the SDMA engines, so each SDMA is inserted into the SDMA offset map
(sdmaEngs) twice.
Change-Id: I08e9a8d785f467b6eebff8ab0a9336851c87258d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70878
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Currently gem5 assumes the amdgpu device to be Vega10. In order to
support more devices we need to handle situations where different
registers and addresses have the same functionality but different
offsets on different devices.
This changeset adds an NBIO class to handle device discovery and driver
initialization related tasks, pulling them out of the AMDGPUDevice
class. The offsets used for MMIOs are reworked slightly to use offsets
rather than absolute addresses. This is because we cannot determine the
absolute address in the constructor since the BAR has not been assigned
by the OS yet.
Change-Id: I14b364374e086e185978334425a4e265cf2760d0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70041
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Currently the amdgpu simulated device is assumed to be a Vega10. As a
result there are a few things that are hardcoded. One of those is the
number of SDMAs. In order to add a newer device, such as MI100+, we need
to enable a flexible number of SDMAs.
In order to support a variable number of SDMAs and with the MMIO offsets
of each device being potentially different, the MMIO interface for SDMAs
is changed to use an SDMA class method dispatch table with forwards a
32-bit value from the MMIO packet to the MMIO functions in SDMA of the
format `void method(uint32_t)`. Several changes are made to enable this:
- Allow the SDMA to have a variable MMIO base and size. These are
configured in python.
- An SDMA class method dispatch table which contains the MMIO offset
relative to the SDMA's MMIO base address.
- An updated writeMMIO method to iterate over the SDMA MMIO address
ranges and call the appropriate SDMA MMIO method which matches the
MMIO offset.
- Moved all SDMA related MMIO data bit twiddling, masking, etc. into
the MMIO methods themselves instead of in the writeMMIO method in
SDMAEngine.
Change-Id: Ifce626f84d52f9e27e4438ba4e685e30dbf06dbc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70040
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
If an MMIO was previously written and the driver reads it, we should
return the value that was previously read. This overwrites the MMIO
trace value which is the last resort fallback for finding an MMIO value.
This is needed to initialize newer GPU devices in gem5.
Change-Id: Ida2435290b706288e88518b5d920691cdb6dcc09
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70039
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
For non-KVM CPUs the VBIOS memory falls into an I/O hole and therefore
gets routed to the PIO bus in gem5. This gets routed to the GPU in the
case of a ROM write. We write to the ROM as a way to "load" the VBIOS
without creating holes in the KVM VM.
This write method allows the same scripts as KVM to be used by writing
to the ROM area and overwriting what might already be there from the
--gpu-rom option.
Change-Id: I8c2d2aa05a823569a774dfdd3bf2d2e773f38683
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70037
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
The GART table is a legacy 1-level page table primarily used for
supervisor mode accesses to GPUs. The PTE size is 64-bits, not 32-bit.
This causes memory sizes >3GB (in X86) to fail loading amdgpu driver.
This changeset fixes the issue by setting the GART mappings to the
correct data type.
Change-Id: Ibfba2443675fe28316d26afa5f1a14885fdce40c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65091
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
This map was originally used for fast access to the GART table. It is no
longer needed as the table has been moved to the AMDGPUVM class. Along
with commit 12ec5f9172 which reads
functionally from device memory, this table is no longer needed and is
essentially a duplicate copy of device memory for anything written over
the PCI BAR.
This changeset removes the map entirely which will reduce the memory
footprint of simulations and potentially avoid stale copies of data
when reading over the PCI BAR.
Change-Id: I312ae38f869c6a65e50577b1c33dd055078aaf32
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63951
Reviewed-by: Matt Sinclair <mattdsinclair.wisc@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Previously framebuffer reads would try reading from MMIO trace, special
addresses, and then anything previously written to a special address
range. This does not handle direct large BAR reads, causing incorrect
results in some applications that were doing this.
Rework the readFramebuffer method to do the following. Remove the MMIO
trace read altogether, as there were not any framebuffer reads from the
trace to begin with. Read special addresses first to avoid overwriting
by previous writes. Next read previous writes to special ranges. The
special range is the GART table. These are required for functional
translations. Lastly read from the device memory directly. This does a
functional read required by the PCIDevice read method which is
non-timing. Reading from device memory is preferred over the map type
used for GART to avoid duplication of a potentially huge amount of data.
With this changeset all but one of the HIP samples and HIP examples
applications now run and pass verification of results.
Change-Id: Id3b788bfc5eaf17cfa1897f25d26f3725d4db321
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63171
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
In almost all cases reading/writing using the GPU memory manager will
want to wait until that read or write is complete. Therefore, change the
API to not default to no callback so that the user must explicitly
specify nullptr indicating they do not want to wait for completion.
Updates a write call which cannot use a callback due to being atomic in
the base gpu device code.
Change-Id: Id19145d49c7cafc97e2e178819682cb97270a16a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/62716
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reads to the frame buffer are currently handled by either the MMIO trace
or from the GART table if the address is in the GART aperture. In some
cases the MMIO trace will not contain the address or the data may have
been written previously and be different from the MMIO trace. To handle
this, return the data that was written previously by the driver. The
priority order from lowest to highest is: MMIO trace, device cache,
special framebuffer registers.
Change-Id: Ia45ae19555508fcd780926fedbd7a65c3d294727
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57589
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The driver will check this bit is set after initializing IPs. Currently
the MMIO trace will cause this bit to be set at the correct time,
however this is not portable access different ROCm versions. Therefore
we modify the value to always set the bit indicating interrupts are
enabled.
Change-Id: Iae0baf1936720fbe9835ae4acadbf1b3bdc52896
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57530
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Make the necessary changes to connect Vega pagetable walkers for
full-system mode. Previously the CP and HSA packet processor could only
read AQL packets from system/host memory using proxy port. This allows
for AQL to be read from device memory which is used for non-blit
kernels.
Change-Id: If28eb8be68173da03e15084765e77e92eda178e9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53077
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The PM4 packet processor is handling all non-HSA GPU packets such
as packets for (un)mapping HSA queues. This commit pulls many
Linux structs and defines out into their own files for clarity.
Finally, it implements the VMID related functions in AMDGPU device.
Change-Id: I5f0057209305404df58aff2c4cd07762d1a31690
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53068
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Add the devices that have been added in previous changesets to the
config file. Forward MMIO writes to the appropriate device based
on the MMIO address. Connect doorbells and forward rings to the
appropriate device based on queue type.
Change-Id: I44110c9a24559936102a246c9658abb84a8ce07e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53065
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
SDMAEngine handles copies to device memory. This commit
updates sdma_packets.hh style as well. Added several methods needed by
SDMAEngine to GPU device including GART table, various getters, and
aperture range checkers. Move the MMIO interface from GPUController to
SDMAEngine. Create an SDMA MMIO and commands header with only the macros
we use so that we don't need to check in multi-thousand line header
files from the linux kernel. Keep SOC15 IH client ID macros as that file
is small.
Change-Id: I986fede90cc1bc16ee56d4e8598cf9283bde034e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53064
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Create a VM class to reduce clutter in the amdgpu_device.* files. This
new file is in charge of reading/writting MMIOs related to VM contexts
and apertures. It also provides ranges checks for various apertures and
breaks out the MMIO interface so that there are not overloaded macro
definitions in the device MMIO methods.
The new translation generator classes for the various apertures are also
added to this class.
Change-Id: Ic224c1aa485685685b1136a46eed50bcf99d2350
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53066
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Remove the line "For use for simulation and test purposes only" in files
were AMD is the only copyright holder listed in the header. This happens
to be the case for all files where this line exists, removing it
completely from gem5.
Change-Id: I623f266b002f564301b28774f49081099cfc60fd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53943
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Apply the gem5 namespace to the codebase.
Some anonymous namespaces could theoretically be removed,
but since this change's main goal was to keep conflicts
at a minimum, it was decided not to modify much the
general shape of the files.
A few missing comments of the form "// namespace X" that
occurred before the newly added "} // namespace gem5"
have been added for consistency.
std out should not be included in the gem5 namespace, so
they weren't.
ProtoMessage has not been included in the gem5 namespace,
since I'm not familiar with how proto works.
Regarding the SystemC files, although they belong to gem5,
they actually perform integration between gem5 and SystemC;
therefore, it deserved its own separate namespace.
Files that are automatically generated have been included
in the gem5 namespace.
The .isa files currently are limited to a single namespace.
This limitation should be later removed to make it easier
to accomodate a better API.
Regarding the files in util, gem5:: was prepended where
suitable. Notice that this patch was tested as much as
possible given that most of these were already not
previously compiling.
Change-Id: Ia53d404ec79c46edaa98f654e23bc3b0e179fe2d
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/46323
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
There are special counters in the framebuffer that are tested during
driver initialization. The expected behavior of the counters is to
return the previously read value + 1. There is one (known) counter used
in driver initialization at a fixed BAR address offset.
Change-Id: Id2dbb5fa9365b0a0453b15013c45aa67a2eec190
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/46163
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
The flow for Full System amdgpu is the use KVM to boot linux and begin
loading the driver module. However, the amdgpu module requires reading
the VGA ROM located at 0xc0000 in X86. KVM does not support having a
small 128KiB hole at this location, therefore we take a checkpoint and
switch to a timing CPU to continue loading the drivers before the VGA
ROM is read.
This creates a checkpoint just before the first MMIOs. This is indicated
by three interrupts being sent to the PCI device. After three interrupts
in a row are counted a checkpoint exit event occurs. The interrupt
counter is reset if a non-interrupt PCI read is seen.
Change-Id: I23b320abe81ff6e766cb3f604eca2979339938e5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/46161
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The initial device contains enough code for the gpufs configuration
scripts to register an amdgpu device that identifies as a Vega 10
(Frontier Edition) device when PCI devices are listed by Linux. It also
contains stubs necessary for adding the MMIO interface to handle driver
initialization.
Using the configuration Linux boots and the device is successfully seen
in lspci. The driver can also begin loading an successfully sends
initial MMIOs and attempts to read the ROM.
Change-Id: I7ad87026876f31f44668e700d5adb639c2c053c1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/44909
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>