The LDS and scratch aperture base and limits are hardcoded to some
values that are useful for SE mode. In reality, these are chosen by the
driver so we need to honor whatever values the driver passes so that
when addresses are calculated they fall into the correct aperture to
route flat instructions to those apertures.
This overwrites the default hardcoded values for LDS and scratch base
and limit using the values providing by the driver in a MAP_PROCESS
packet.
Change-Id: I0e194a26631f697819d8aaecf1bf346a7b7c7026
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61656
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
These instructions are supposed to be read/writing special shader
hardware registers. Currently they are getting/setting to an SGPR. This
results in getting incorrect registers at best and clobbering an SGPR
being used by an application at worst. Furthermore, some registers need
to be set in the shader and the application will never (can never) set
them.
This patch overhauls the getreg/setreg instructions to use different
storage in the shader. The values will be updated either via setreg from
an application (e.g., mode register) or set by a PM4 MAP_PROCESS.
Change-Id: Ie5e5d552bd04dc47f5b35b5ee40a569ae345abac
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61655
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
The driver uses the pasid to look up events that need to be set in
kfd_signal_event_interrupt (amdkfd/kfd_events.c). Currently this is
uninitialized which causes the function in the driver to return without
doing anything useful.
This changeset initializes the cookie PASID to 0x8000. 0x8000 is always
the first PASID assigned by the driver. This works since gem5 only
supports one GPU process in FS mode. This would have to be changed for
multi-process support, so a comment is added as a reminder.
Change-Id: I7074b581f2f2f346bd910eef15d5f9253ce17e2c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61653
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This code is unnecessary as the read index is already correct.
Furthermore, it can cause hangs in some situations where the packet
SHOULD be marked as not complete. This causes a bug where the read index
is incremented by 1 multiple times, causing the packet processor to read
an invalid packet, followed by a hang after it does nothing.
Change-Id: Iceda3c9606e018f60f8902770a2d9762c1c14304
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61650
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
The specs for the LupIO-IPI device were recently updated. Instead of
providing a single IPI value for each processor, the device now provides
32 individual IPI bits that can be masked and set.
Update device accordingly in gem5.
Change-Id: Ia47cd1c70e073686bc2009d546c80edb0ad58711
Signed-off-by: Joël Porquet-Lupine <joel@porquet.org>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61530
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
By design SimObject should initialize its state at init() stage.
However, the original intpin design will try to reset the sink side when
binding. This could cause unexpected issue as the other side does not
init() yet.
To align with the design, the call to upper()/lower() should be left to
the initiator in the init() function instead of constructor.
Change-Id: Iec8b228715d093381a33e747849119562bd634e1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/60751
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Yu-hsin Wang <yuhsingw@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Add GenericRiscvPciHost to RISCV Board. In addition, we connect the IGbE_e1000
ethernet card to PCI in order to verify the correct functionality.
To be noticed that we build a new Linux kernel v5.10 (with Bootloader) according to these steps (
https://github.com/gem5/gem5-resources/tree/stable/src/riscv-fs) adding the the PCI and e1000 drivers:
CONFIG_PCI_SYSCALL=y
CONFIG_PCI_STUB=y
CONFIG_PCI_HOST_GENERIC=y
CONFIG_NET_VENDOR_INTEL=y
CONFIG_E1000=y
CONFIG_E1000E=y
CONFIG_IGB=y
CONFIG_NET_VENDOR_I825XX=y
Here you can find the kernel.config and our prebuild kernel to verify the correct behaviour:
https://www.dropbox.com/scl/fo/sz9s37vybpfecbfilxqzz/h?dl=0&rlkey=klkxh33anjqnzwj3sopucqqzx
You can verify it with the following command:
build/RISCV/gem5.fast configs/example/gem5_library/riscv-fs.py
Dear Jason Lowe-Power,
Thank you for your comments! We have addressed all of them.
Best regards,
Nikolaos Tampouratzis
Dear Jason,
I think that it is ok now! :)
Thanks!
Best regards,
Nikolaos Tampouratzis
Change-Id: Id27d84a5588648b82cbfd5c88471927157ae6759
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/59969
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The AQL queue size is currently hardcoded to 64kB. For longer running
applications this causes the circular queue to wrap before reaching the
real end of the queue. Add the computation for queue size instead.
Previously longer applications (e.g., bc in pannotia) were hanging
around 4k kernels. With change the application launches 10k+ kernels.
Change-Id: I6c31677c1799a3c9ce28cf4e7e79efcb987e3b7f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/59449
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
The data allocated for the DMA request used to send an interrupt cookie
was too large. This was causing the memcpy to occasionally seg fault due
to reading past the bounds of the source parameter (the interrupt cookie
struct). Correct the size and add a compile time check to ensure it is
the correct number of bytes expected by the driver.
Change-Id: Ie9757cb52ce8f72354582c36cfd3a7e8a1525484
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58969
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
How to reset a model correctly is very different between models. Take
cpu models for instance, they have different reset pins for different
parts(typically one for each core, one for shared component, one for
debug interface). To make users more easily to reset the model, here we
want to introduce a special reset port. By implementing the port, users
can simply request a whole reset to the model. If users want to do
partial resets, users still can access the raw pins to achieve what they
want.
Change-Id: I746121d16441e021dc3392aeae1a6d9fa33d637a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58810
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Add logic to collect pointers to all GPU TLBs in full system. Implement
the invalid TLBs PM4 packet. The invalidate is done functionally since
there is really no benefit to simulate it with timing and there is no
support in the TLB to do so. This allow application with much larger
data sets which may reuse device memory pages to work in gem5 without
possibly crashing due to a stale translation being leftover in the TLB.
Change-Id: Ia30cce02154d482d8f75b2280409abb8f8375c24
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58470
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The QCntxt is reused when a queue is unmapped and mapped again. This is
fairly common in GPU full system. If this is not done the readIndex on
the queue context is reset to 1, causing getCommandsFromHost to read
from the wrong slot which is typically an old dispatch packet or an
invalid packet. This causes simulation to stall as the incorrect
completion signal is eventually written.
Change-Id: I65541e559fe04f5eb44b936ca37e3f802262fe6a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57670
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This makes what are configuration and what are internal SCons variables
explicit and separate, and makes it unnecessary to call out what
variables to export to C++.
These variables will also be plumbed into and out of kconfiglib in later
changes.
Change-Id: Iaf5e098d7404af06285c421dbdf8ef4171b3f001
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56892
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reads to the frame buffer are currently handled by either the MMIO trace
or from the GART table if the address is in the GART aperture. In some
cases the MMIO trace will not contain the address or the data may have
been written previously and be different from the MMIO trace. To handle
this, return the data that was written previously by the driver. The
priority order from lowest to highest is: MMIO trace, device cache,
special framebuffer registers.
Change-Id: Ia45ae19555508fcd780926fedbd7a65c3d294727
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57589
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The driver will check this bit is set after initializing IPs. Currently
the MMIO trace will cause this bit to be set at the correct time,
however this is not portable access different ROCm versions. Therefore
we modify the value to always set the bit indicating interrupts are
enabled.
Change-Id: Iae0baf1936720fbe9835ae4acadbf1b3bdc52896
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57530
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Make the necessary changes to connect Vega pagetable walkers for
full-system mode. Previously the CP and HSA packet processor could only
read AQL packets from system/host memory using proxy port. This allows
for AQL to be read from device memory which is used for non-blit
kernels.
Change-Id: If28eb8be68173da03e15084765e77e92eda178e9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53077
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The Arm Architecture Reference Manual has moved from
"Armv7-oriented" names for generic timer interrupts to
names more consistent with Armv8 (Exception Levels based).
We are therefore renaming those interrupts as follows:
int_phys_s -> int_el3_phys
int_phys_ns -> int_el1_phys
int_virt -> int_el1_virt
int_hyp -> int_el2_ns_phys
Change-Id: Id6e34a0e4311953938b25bca168a34357e3c8643
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/58109
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The PM4 packet processor is handling all non-HSA GPU packets such
as packets for (un)mapping HSA queues. This commit pulls many
Linux structs and defines out into their own files for clarity.
Finally, it implements the VMID related functions in AMDGPU device.
Change-Id: I5f0057209305404df58aff2c4cd07762d1a31690
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53068
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Add the devices that have been added in previous changesets to the
config file. Forward MMIO writes to the appropriate device based
on the MMIO address. Connect doorbells and forward rings to the
appropriate device based on queue type.
Change-Id: I44110c9a24559936102a246c9658abb84a8ce07e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53065
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
SDMAEngine handles copies to device memory. This commit
updates sdma_packets.hh style as well. Added several methods needed by
SDMAEngine to GPU device including GART table, various getters, and
aperture range checkers. Move the MMIO interface from GPUController to
SDMAEngine. Create an SDMA MMIO and commands header with only the macros
we use so that we don't need to check in multi-thousand line header
files from the linux kernel. Keep SOC15 IH client ID macros as that file
is small.
Change-Id: I986fede90cc1bc16ee56d4e8598cf9283bde034e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53064
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Create a VM class to reduce clutter in the amdgpu_device.* files. This
new file is in charge of reading/writting MMIOs related to VM contexts
and apertures. It also provides ranges checks for various apertures and
breaks out the MMIO interface so that there are not overloaded macro
definitions in the device MMIO methods.
The new translation generator classes for the various apertures are also
added to this class.
Change-Id: Ic224c1aa485685685b1136a46eed50bcf99d2350
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/53066
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
In a dGPU configuration, vector and scalar loads/stores can either be
requests to device memory or host memory depending on if the system bit
is set in the PTE when the request's virtual address is translated. This
object is used to send/receive those requests to the host via DMA.
This object will be used in a later changeset by the compute unit and
fetch units to issue data and instruction loads from the GPU which
translate to physical addresses on the host/cpu memory.
Change-Id: I4537059f90ebc03f3b2e6b8b631b4c452841f83f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51851
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>