These instructions are supposed to be read/writing special shader
hardware registers. Currently they are getting/setting to an SGPR. This
results in getting incorrect registers at best and clobbering an SGPR
being used by an application at worst. Furthermore, some registers need
to be set in the shader and the application will never (can never) set
them.
This patch overhauls the getreg/setreg instructions to use different
storage in the shader. The values will be updated either via setreg from
an application (e.g., mode register) or set by a PM4 MAP_PROCESS.
Change-Id: Ie5e5d552bd04dc47f5b35b5ee40a569ae345abac
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61655
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Here the mask should not be inverted. We also need to shift by the
offset to remove the padding as the consumer of the value expects the
offset to be removed.
This can be easily tested by running a GPU kernel with __shared__
variables. This will generate the following assembly:
s_getreg_b32 s6, hwreg(HW_REG_SH_MEM_BASES, 16, 16)
The current implementation returns the lower 16 bits (private memory
aperture) while the correct behavior is the uppter 16 bits (shared/LDS
memory aperture).
Change-Id: Iea8f0adceeadb24cdcf46ef4183fcaa8262ab9e7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61654
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
The driver uses the pasid to look up events that need to be set in
kfd_signal_event_interrupt (amdkfd/kfd_events.c). Currently this is
uninitialized which causes the function in the driver to return without
doing anything useful.
This changeset initializes the cookie PASID to 0x8000. 0x8000 is always
the first PASID assigned by the driver. This works since gem5 only
supports one GPU process in FS mode. This would have to be changed for
multi-process support, so a comment is added as a reminder.
Change-Id: I7074b581f2f2f346bd910eef15d5f9253ce17e2c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61653
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The environment variable HSA_ENABLE_INTERRUPT controls if Interrupt or
busy wait signals are used in the ROCm runtime. Interrupts are not being
sent in gem5 causing simulations to hang indefinitely in certain
situations. To fix this, always disable interrupts to fall back to busy
wait signals. Using interrupts is an old and simple optimization to not
waste CPU cycles, but from the perspective of simulation this is not
important. Disabling interrupt-based HSA signals therefore increases the
number of applications working within gem5.
Change-Id: I1ae21d7ee01548a4d00a8972642079b90278f9a2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61652
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
When GPU needs more scratch it requests from the runtime. In the
method to wait for response, a dmaReadVirt is called with the same
method as the callback with zero delay. This means that effectively
there is an infinite loop in the event queue if the scratch setup is not
successful on the first attempt. In the case of GPUFS, it is never
successfully instantly so a delay must be added. Without added delay,
the host CPU is never scheduled to make progress setting up more scratch
space.
The value 1e9 is choosen to match the KVM quantum and hopefully give KVM
a chance to schedule an event. For reference, the driver timeout is
200ms so this is still fairly aggressive checking of the signal response.
This value is also balanced around the GPUCommandProc DPRINTF to
prevent the print in this method from overwhelming debug output.
Change-Id: I0e0e1d75cd66f7c47815b13a4bfc3c0188e16220
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61651
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
This code is unnecessary as the read index is already correct.
Furthermore, it can cause hangs in some situations where the packet
SHOULD be marked as not complete. This causes a bug where the read index
is incremented by 1 multiple times, causing the packet processor to read
an invalid packet, followed by a hang after it does nothing.
Change-Id: Iceda3c9606e018f60f8902770a2d9762c1c14304
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61650
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
This instruction appears to be the only VOP1 instruction that has a
scalar destination using VDST as the destination register number.
However, since VDST is only 8 bits it cannot encode all possible
registers. Therefore, use the opcode to determine if the destination is
a scalar or vector destination.
This issue manifests as a VGPR dest being out of range for a kernel
where the number of SGPRs is more than the number of VGPRs and the
intended SGPR dest is larger than the count of VGPRs
Change-Id: I95a7de1ddb97f7171f48331fed36aef776fa0cb4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61649
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
These VHE flags are not needed anymore.
They were used to trap EL2 access to VHE only registers (like CPACR_EL12)
when VHE was disabled (hcr.e2h = 0)
With the new faulting logic, we can just introduce VHE specific
callbacks checking for the hcr.e2h bitfield and returning an undefined
instruction if VHE is disabled.
In this way we don't have to add VHE only bits to every system register
Change-Id: I07bf9a9adc7a089bd45e718fb06d88488a2b7ed5
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61678
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch is adding per-EL read/write callbacks to the MiscRegLUTEntry
class. The goal is to merge access permission and trapping logic into
these unified callbacks
As of now the default callbacks are simply reimplementing the access
permission code, checking for MiscRegLUTEntry flags. This is the default
behaviour for all registers.
Trapping code (from MiscRegOp64::trap) will be moved with a later patch
Change-Id: Ib4bb1b5d95319548de5e77e00258fd65c11d88d7
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61675
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
The iss field is only used when the MSR/MRS instruction
gets trapped. Rather than generating it at decode time,
we generate the value within the trap method instead
This avoids the confusion of having a MSR/MRS register
instruction storing an immediate field
Later patches will change this even further by generating the
iss field on the fly ONLY if the instruction gets trapped
Change-Id: I97fdcf54d9643ea79a1f9d052073320ee68109fd
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61670
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Now that we have a pointer to the actual RegClass the RegId is
associated with, we can use it's regName method to pretty print the
RegId for us. This gets rid of the redundant print method for RegId.
Also, replace the default register printing method with the
implementation in the << operator, which is more descriptive.
Change-Id: I00e93032ddea77e167ca13e54b370de7210f1a2b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49808
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This commit added two paramaters in the set_se_binary_workload to pass
input parameters for the binary.
The "arguments" object allows users to pass in arugments in a list.
The "stdin_file" object allows users to pass in input file as a
Resource.
This commit also created a local variable "binary_path" to save the
return object of "binary.get_local_path()".
Note:
These new parameters were tested and passed in 4 cases:
1. only passing in (Resource/CustomResource) binary
2. passing in (CustomResource) binary and input_file
3. passing in (CustomResource) binary and argument(no input file
directory included)
4. passing in (CustomResource) binary and argument(with input file
directory included)
Jira Issue: https://gem5.atlassian.net/browse/GEM5-1242
Change-Id: I6433a349f7ecb5d630c7cdbe7268ff18915bf23f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61609
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
This option doesn't work and triggers a warning by Linux when booting.
To make it work, we need a chosen node containing an `stdout-path`
property in the FDT which currently doesn't exist.
I tried to create via a couple of approaches it but encountered multiple
issues:
1. One can set `stdout-path` to the complete path of the tty device, but
such path is impossible to get programmatically (unless it's
hardcoded).
2. One can set `stdout-path` as a reference to a label. While labels are
possible to generate easily, reference to labels cannot be generated
with the current FDT library.
So just remove this option for the time being.
Change-Id: I58ad879c0fdf567a812069ae91ebc7d4f8accf13
Signed-off-by: Joël Porquet-Lupine <joel@porquet.org>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61534
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
The specs for the LupIO-IPI device were recently updated. Instead of
providing a single IPI value for each processor, the device now provides
32 individual IPI bits that can be masked and set.
Update device accordingly in gem5.
Change-Id: Ia47cd1c70e073686bc2009d546c80edb0ad58711
Signed-off-by: Joël Porquet-Lupine <joel@porquet.org>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61530
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
First, this CL modifies the implementation of WFI so it actually put the
calling CPU into sleep.
This CL also adds an id in the IRQ table to represent NMI. This is
because the wakeup path is only implemented on cpu's postInterrupt
function, and it expects an int_num.
We still keep the MISCREG for nmie and nmip instead of merging them into
other ip/ie as that will give the user ability to get/set the nmi
status, which is pretty dangerous.
Change-Id: Idf8a5748990efa20aa9372efa97d3bed2aac82d9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61511
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Update the TCP_latency input arg to reflect what it does -- in
combination with the number of banks, it determines the number of
accesses that can happen in the L1 (TCP) in a given cycle. It does
not directly affect the L1 latency as the name implies. Instead,
the mandatory_queue_latency does this.
Change-Id: Ib6cbc8367ce2b1f30005d137384f53650a403b49
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61309
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Vega adds three new VOP2 instructions that may use VOP3 encoding that
are not part of the GCN3 ISA: v_add_u32, v_sub_u32, v_subrev_u32. This
changeset implements those three new instructions to fix errors related
to "invalid encoding" when those instructions are seen.
Tested using srad from Rodinia 3.0 HIP port which compiles a v_add_u32
instruction with VOP3 encoding.
Change-Id: I409a9f72f5c37895c3a0ab7ceb14a4dd121874a4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61330
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Add a pre-precommit configuration that can be used to run a series of
automated tests in CI. This configuration currently enforces Black
style on Python files and checks for other common errors (incorrect
json, mixed line endings etc.).
The CI system runs pre-commit using the following command:
pre-commit run --from-ref HEAD~ --to-ref HEAD
Users can install pre-commit hooks in their git repositories using the
following command:
pre-commit install
Note that our scons scripts automatically install a simple style
checker as a pre-commit hooks already. If the pre-commit tool is
installed after the existing hook, the existing hook will be kept and
called automatically.
The current pre-commit configuration doesn't currently doesn't run the
C++ style checker automatically. This feature could easily be added in
the future by adding the following to the pre-commit file:
- repo: local
hooks:
- id: style-checker
name: Check C++ coding style
language: python
entry: util/style.py -f
The main reason this isn't already in the configuration is that it is
expected that the style checker will have some false positives since
style checks will be enforced on the entire file rather than the lines
changed in a commit.
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Change-Id: Ibbb1dc641e2979509f3259cd951c7272094d69f5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61111
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>