The InstResult class is only ever used to store a register value, and
in practice only a RegVal, never a more complex type like a
VecRegContainer. This is partially because the methods that *would*
store a complex result only have a pointer to work with, and have no
type to cast to when storing the result in the InstResult.
This change reworks the InstResult class to hold the RegClass the
register goes with, and also either a standard RegVal, or a pointer to a
blob of memory holding the actual value if RegVal isn't appropriate. If
the InstResult has no RegClass, it is considered invalid.
To make working with InstResult easier, it also now has an "asString"
method which will just call into the RegClass's valString method with
the appropriate pointer.
By removing the ultimately unnecessary generality of the original class,
this change also simplifies InstResult significantly.
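As a rough illustration of the layout described above, the following
Python sketch models a result that carries its register class plus
either a plain integer or a blob of bytes. gem5 implements this in C++;
RegClassSketch, val_string and InstResultSketch are hypothetical
stand-ins and not the real API.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class RegClassSketch:
    """Hypothetical stand-in for a register class."""
    name: str
    reg_bytes: int  # size of one register in bytes

    def val_string(self, value: Union[int, bytes]) -> str:
        # Simple values print as hex, blobs print as a hex dump.
        if isinstance(value, int):
            return f"{value:#x}"
        return "[" + value.hex() + "]"


class InstResultSketch:
    def __init__(self, reg_class: Optional[RegClassSketch] = None,
                 value: Union[int, bytes, None] = None):
        self.reg_class = reg_class
        self.value = value

    def is_valid(self) -> bool:
        # A result without a register class is considered invalid.
        return self.reg_class is not None

    def as_string(self) -> str:
        if not self.is_valid():
            raise ValueError("invalid result: no register class")
        # Delegate formatting to the register class.
        return self.reg_class.val_string(self.value)


scalar = InstResultSketch(RegClassSketch("int", 8), 0x2A)
vector = InstResultSketch(RegClassSketch("vec", 16), bytes(range(16)))
empty = InstResultSketch()
```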
Change-Id: I71ace4da6c99b5dd82757e5365c493d795496fe5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50253
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
test_hello_se.py:
Added "take_params_progs" to store the available ISAs and binary names.
Added two new parameters in function "verify_config" to take verifier
and input arguments for the binary.
simple_binary_run.py:
Added a new optional argument called "arguments" that takes input
arguments for the binary. Its default value is [ ] so that
"arguments = args.arguments" in "set_se_workload" still works when
no arguments are passed.
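A minimal sketch of an optional list argument with an empty default,
using argparse. The flag name follows the description above; using
nargs="*" is an assumption made for this sketch and may differ from
what the script actually does.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--arguments",
    type=str,
    nargs="*",       # assumption: zero or more values after the flag
    default=[],      # empty list when the flag is omitted
    required=False,
    help="Input arguments for the binary.",
)

no_args = parser.parse_args([])
two_args = parser.parse_args(["--arguments", "a", "b"])
```

Because the default is an empty list, code that forwards
args.arguments does not need a special case for the flag being absent.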
Change-Id: Ib99dc92aa97060de5e1d34d9cac5800b82dab9e6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61771
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Most of the invalidation methods in the TLB class are
doing the same thing: looping over all entries, checking if
the entry matches a certain criterion, and invalidating it
in case it does.
The only specific bit is the matching function, therefore
we add a virtual TLBIOp::match method which allows us
to specialize different TLBIs and to provide a single
flush method in the TLB class.
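The pattern can be sketched in Python as follows. This is a simplified
model, not the C++ code: the entry fields and the two example
operations are reduced to what is needed to show a single flush loop
driven by an overridable match method.

```python
from dataclasses import dataclass


@dataclass
class TlbEntry:
    vpn: int
    asid: int
    valid: bool = True


class TLBIOp:
    """Base class: each invalidation type only overrides match()."""

    def match(self, entry: TlbEntry) -> bool:
        raise NotImplementedError


class InvalidateAll(TLBIOp):
    def match(self, entry: TlbEntry) -> bool:
        return True


class InvalidateByAsid(TLBIOp):
    def __init__(self, asid: int):
        self.asid = asid

    def match(self, entry: TlbEntry) -> bool:
        return entry.asid == self.asid


class TLB:
    def __init__(self, entries):
        self.entries = list(entries)

    def flush(self, op: TLBIOp) -> int:
        """Invalidate every valid entry accepted by op.match()."""
        count = 0
        for entry in self.entries:
            if entry.valid and op.match(entry):
                entry.valid = False
                count += 1
        return count


tlb = TLB([TlbEntry(1, 1), TlbEntry(2, 2), TlbEntry(3, 1)])
by_asid = tlb.flush(InvalidateByAsid(1))
remaining = tlb.flush(InvalidateAll())
```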
Change-Id: I0672ff958742ac7ebff8d30218f75127343f1a58
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61753
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
The flushMva method no longer calls the lookup method, which had been
complicated by the introduction of partial translations and is now
called during address translation only.
The lookup method iterates over all TLB entries until a non-partial
translation is found, so using lookup in flushMva makes it O(n^2).
With this patch we iterate over the TLB entries only once, making
flushMva O(n).
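The cost argument can be illustrated with a small Python model that
counts how many entries are visited. This is a simplified stand-in for
the two shapes of the loop, not the removed code; the entry layout is
hypothetical.

```python
def make_entries(n, va):
    # n valid entries that all map the same address (worst case).
    return [{"va": va, "valid": True} for _ in range(n)]


def flush_with_lookup(entries, va):
    """Old shape: repeat a linear lookup until nothing matches."""
    visits = 0
    while True:
        hit = None
        for entry in entries:
            visits += 1
            if entry["valid"] and entry["va"] == va:
                hit = entry
                break
        if hit is None:
            return visits
        hit["valid"] = False


def flush_single_pass(entries, va):
    """New shape: visit each entry exactly once."""
    visits = 0
    for entry in entries:
        visits += 1
        if entry["valid"] and entry["va"] == va:
            entry["valid"] = False
    return visits


old_visits = flush_with_lookup(make_entries(4, 0x1000), 0x1000)
new_visits = flush_single_pass(make_entries(4, 0x1000), 0x1000)
```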
Change-Id: I8f2ae56192812cee231baf6943068abea4d7ef91
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61752
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Fix the invalidation behaviour of the following stage 2 TLB maintenance
instructions:
MISCREG_TLBI_IPAS2E1_Xt
MISCREG_TLBI_IPAS2LE1_X
MISCREG_TLBI_IPAS2E1_Xt
MISCREG_TLBI_IPAS2LE1_Xt
1) Do nothing if EL2 is not enabled in the current security state.
2) If we are in secure state, bit 63 of the Xt register selects
the security domain (s/ns) of the invalidated entries.
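The two rules above can be sketched as a small decision function. The
function name and return values are hypothetical. The sketch assumes
that a set bit 63 selects the non-secure domain and that a non-secure
caller always targets non-secure entries.

```python
from typing import Optional

NS_BIT = 63  # bit of Xt that selects the security domain


def ipas2_target(el2_enabled: bool, in_secure_state: bool,
                 xt: int) -> Optional[str]:
    """Return the domain whose entries are invalidated, or None."""
    if not el2_enabled:
        return None  # rule 1: nothing to do without EL2
    if not in_secure_state:
        return "non-secure"  # assumption for the non-secure case
    # Rule 2: in secure state, bit 63 picks the domain.
    # Assumption: set means non-secure, clear means secure.
    return "non-secure" if (xt >> NS_BIT) & 1 else "secure"
```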
Change-Id: I4573ed60ce619bcefd9cb05f00c5d3fcfa8d3199
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61751
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Replace the two constructors with one that takes the truly mandatory
parameters, and then a function to derive a new RegClass with some sort
of adjustment, currently by adding custom ops, or setting a non-standard
register size.
Because the constructor and the modifier function are constexpr, they
should fold away and not actually create extra temporary copies of the
RegClass in the modifier functions.
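A loose Python analogue of this shape: one constructor with only the
mandatory fields, plus modifier methods that return an adjusted copy.
The field and method names are hypothetical, and Python has no
equivalent of constexpr folding, so this only shows the API style.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RegClassSketch:
    # Mandatory parameters (hypothetical names).
    kind: str
    name: str
    num_regs: int
    # Defaults that the modifier methods below can override.
    ops: str = "default"
    reg_bytes: int = 8

    def with_ops(self, ops: str) -> "RegClassSketch":
        # Derive a new class description with custom ops.
        return replace(self, ops=ops)

    def with_reg_bytes(self, reg_bytes: int) -> "RegClassSketch":
        # Derive a new class description with a non-standard size.
        return replace(self, reg_bytes=reg_bytes)


int_regs = RegClassSketch("int", "integer", 32)
vec_regs = (RegClassSketch("vec", "vector", 32)
            .with_ops("vec_ops")
            .with_reg_bytes(64))
```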
Change-Id: I8acb755eb28fc8474ec453c51ad205a52eed9a8e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50249
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
The following AArch64 CMOs were flagged as warnNotFail even
if they are actually implemented and there is no reason
for them to fail:
MISCREG_DC_IVAC_Xt
MISCREG_DC_ZVA_Xt
MISCREG_DC_CVAC_Xt
MISCREG_DC_CVAU_Xt
MISCREG_DC_CIVAC_Xt
This is likely coming from AArch32 (those CMOs are unimplemented in
AArch32).
Please note: this patch is not changing anything behaviorally; the
warnNotFail flag is not considered in AArch64 unless the unimplemented
flag is also set (and this was not the case for those CMOs).
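The interaction between the two flags described in the note can be
modelled as below. The function and its return strings are hypothetical
and only capture the ordering of the checks.

```python
def access_outcome(unimplemented: bool, warn_not_fail: bool) -> str:
    if not unimplemented:
        # warn_not_fail is never consulted for implemented registers.
        return "execute"
    return "warn" if warn_not_fail else "fail"
```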
Change-Id: I40396016703b9eb48f69b0eb710d077f8c2b146b
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61685
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
The LDS and scratch aperture base and limits are hardcoded to some
values that are useful for SE mode. In reality, these are chosen by the
driver so we need to honor whatever values the driver passes so that
when addresses are calculated they fall into the correct aperture to
route flat instructions to those apertures.
This overwrites the default hardcoded values for LDS and scratch base
and limit using the values provided by the driver in a MAP_PROCESS
packet.
Change-Id: I0e194a26631f697819d8aaecf1bf346a7b7c7026
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61656
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
These instructions are supposed to read/write special shader
hardware registers. Currently they get/set an SGPR instead. This
results in reading incorrect values at best and clobbering an SGPR
being used by an application at worst. Furthermore, some registers need
to be set in the shader and the application will never (can never) set
them.
This patch overhauls the getreg/setreg instructions to use different
storage in the shader. The values will be updated either via setreg from
an application (e.g., mode register) or set by a PM4 MAP_PROCESS.
Change-Id: Ie5e5d552bd04dc47f5b35b5ee40a569ae345abac
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61655
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Here the mask should not be inverted. We also need to shift by the
offset to remove the padding as the consumer of the value expects the
offset to be removed.
This can be easily tested by running a GPU kernel with __shared__
variables. This will generate the following assembly:
s_getreg_b32 s6, hwreg(HW_REG_SH_MEM_BASES, 16, 16)
The current implementation returns the lower 16 bits (private memory
aperture) while the correct behavior is the upper 16 bits (shared/LDS
memory aperture).
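The intended field extraction for hwreg(reg, offset, size) can be
sketched as follows. The helper name and the register contents are made
up for illustration.

```python
def hwreg_field(value: int, offset: int, size: int) -> int:
    """Extract `size` bits starting at bit `offset`, shifted down."""
    mask = (1 << size) - 1          # not inverted
    return (value >> offset) & mask  # shift removes the padding


sh_mem_bases = 0xABCD1234  # made-up register contents
shared_base = hwreg_field(sh_mem_bases, 16, 16)   # upper half
private_base = hwreg_field(sh_mem_bases, 0, 16)   # lower half
```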
Change-Id: Iea8f0adceeadb24cdcf46ef4183fcaa8262ab9e7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61654
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
The driver uses the pasid to look up events that need to be set in
kfd_signal_event_interrupt (amdkfd/kfd_events.c). Currently this is
uninitialized which causes the function in the driver to return without
doing anything useful.
This changeset initializes the cookie PASID to 0x8000. 0x8000 is always
the first PASID assigned by the driver. This works since gem5 only
supports one GPU process in FS mode. This would have to be changed for
multi-process support, so a comment is added as a reminder.
Change-Id: I7074b581f2f2f346bd910eef15d5f9253ce17e2c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61653
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The environment variable HSA_ENABLE_INTERRUPT controls whether interrupt
or busy-wait signals are used in the ROCm runtime. Interrupts are not
being sent in gem5, causing simulations to hang indefinitely in certain
situations. To fix this, always disable interrupts to fall back to
busy-wait signals. Using interrupts is an old and simple optimization to not
waste CPU cycles, but from the perspective of simulation this is not
important. Disabling interrupt-based HSA signals therefore increases the
number of applications working within gem5.
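A minimal sketch of forcing the variable in the environment handed to
the simulated application. The helper name is hypothetical, and the
sketch assumes that the value "0" selects busy-wait signals.

```python
import os


def rocm_env(base_env):
    """Return a copy of base_env that forces busy-wait HSA signals."""
    env = dict(base_env)
    env["HSA_ENABLE_INTERRUPT"] = "0"  # assumption: "0" disables interrupts
    return env


env = rocm_env(os.environ)
```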
Change-Id: I1ae21d7ee01548a4d00a8972642079b90278f9a2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61652
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
When the GPU needs more scratch space, it requests it from the runtime.
In the method that waits for the response, a dmaReadVirt is called with
the same method as the callback and zero delay. This means that there is
effectively an infinite loop in the event queue if the scratch setup is
not successful on the first attempt. In the case of GPUFS, it never
succeeds instantly, so a delay must be added. Without the added delay,
the host CPU is never scheduled to make progress setting up more scratch
space.
The value 1e9 is chosen to match the KVM quantum and hopefully give KVM
a chance to schedule an event. For reference, the driver timeout is
200ms so this is still fairly aggressive checking of the signal response.
This value is also balanced around the GPUCommandProc DPRINTF to
prevent the print in this method from overwhelming debug output.
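The starvation effect can be reproduced with a toy event queue. This is
a hypothetical model, not gem5's event queue: a "poll" event
reschedules itself until a "host" event has run, and max_events bounds
the loop so the zero-delay case terminates.

```python
import heapq


def run(poll_delay, host_tick, max_events=1000):
    """Return the tick at which the poll sees the response, or None."""
    queue = []  # entries are (tick, sequence number, event name)
    seq = 0
    ready = False

    def schedule(tick, name):
        nonlocal seq
        heapq.heappush(queue, (tick, seq, name))
        seq += 1

    schedule(0, "poll")
    schedule(host_tick, "host")
    for _step in range(max_events):
        tick, _order, name = heapq.heappop(queue)
        if name == "host":
            ready = True  # host finished setting up scratch
        elif ready:
            return tick   # poll observed the response
        else:
            schedule(tick + poll_delay, "poll")  # try again later
    return None  # gave up: the host event never got to run


stuck = run(poll_delay=0, host_tick=25)
done = run(poll_delay=10, host_tick=25)
```

With zero delay the poll keeps re-running at tick 0 and the host event
at tick 25 is never reached; with a non-zero delay simulated time
advances and the poll eventually succeeds.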
Change-Id: I0e0e1d75cd66f7c47815b13a4bfc3c0188e16220
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61651
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
This code is unnecessary as the read index is already correct.
Furthermore, it can cause hangs in some situations where the packet
SHOULD be marked as not complete. This causes a bug where the read index
is incremented by 1 multiple times, causing the packet processor to read
an invalid packet, followed by a hang after it does nothing.
Change-Id: Iceda3c9606e018f60f8902770a2d9762c1c14304
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61650
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
This instruction appears to be the only VOP1 instruction that has a
scalar destination using VDST as the destination register number.
However, since VDST is only 8 bits it cannot encode all possible
registers. Therefore, use the opcode to determine if the destination is
a scalar or vector destination.
This issue manifests as a VGPR dest being out of range for a kernel
where the number of SGPRs is more than the number of VGPRs and the
intended SGPR dest is larger than the count of VGPRs.
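A sketch of choosing the destination kind from the opcode instead of
from the VDST value. The opcode number in SCALAR_DEST_OPCODES is a
placeholder for illustration, and the helper names are hypothetical.

```python
SCALAR_DEST_OPCODES = {0x02}  # placeholder opcode number


def dest_operand(opcode: int, vdst: int):
    """Return (register kind, index) for a VOP1-style destination."""
    kind = "sgpr" if opcode in SCALAR_DEST_OPCODES else "vgpr"
    return kind, vdst & 0xFF  # VDST is an 8-bit field


def dest_in_range(opcode, vdst, num_sgprs, num_vgprs):
    kind, index = dest_operand(opcode, vdst)
    limit = num_sgprs if kind == "sgpr" else num_vgprs
    return index < limit
```

For a kernel with 48 SGPRs and 32 VGPRs, destination index 40 is in
range only when it is classified as a scalar register.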
Change-Id: I95a7de1ddb97f7171f48331fed36aef776fa0cb4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61649
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
These VHE flags are not needed anymore.
They were used to trap EL2 access to VHE-only registers (like CPACR_EL12)
when VHE was disabled (hcr.e2h = 0).
With the new faulting logic, we can just introduce VHE-specific
callbacks checking for the hcr.e2h bitfield and returning an undefined
instruction if VHE is disabled.
In this way we don't have to add VHE-only bits to every system register.
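The idea can be sketched as a callback attached only to the registers
that need it. The table and helper names are hypothetical; only
CPACR_EL12 and hcr.e2h come from the description above.

```python
from types import SimpleNamespace

UNDEFINED = "undefined-instruction"


def fault_if_vhe_disabled(hcr):
    """Callback sketch: allow the access only when hcr.e2h is set."""
    return None if hcr.e2h else UNDEFINED


# Only VHE-only registers get the extra callback.
access_checks = {"CPACR_EL12": fault_if_vhe_disabled}


def check_access(reg_name, hcr):
    callback = access_checks.get(reg_name)
    return callback(hcr) if callback else None


vhe_off = SimpleNamespace(e2h=0)
vhe_on = SimpleNamespace(e2h=1)
```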
Change-Id: I07bf9a9adc7a089bd45e718fb06d88488a2b7ed5
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61678
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch adds per-EL read/write callbacks to the MiscRegLUTEntry
class. The goal is to merge access permission and trapping logic into
these unified callbacks.
As of now the default callbacks simply reimplement the access
permission code, checking for MiscRegLUTEntry flags. This is the default
behaviour for all registers.
Trapping code (from MiscRegOp64::trap) will be moved with a later patch.
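A rough Python model of per-EL callbacks whose defaults only check
permission flags. The class, its fields and the override shown at the
end are hypothetical and much simpler than the real lookup table entry.

```python
UNDEFINED = "undefined-instruction"


class MiscRegEntrySketch:
    def __init__(self, readable_els, writable_els):
        # Stand-ins for the per-register permission flags.
        self.readable_els = frozenset(readable_els)
        self.writable_els = frozenset(writable_els)
        # One read and one write callback per exception level.
        self.read_cb = {el: self.default_read for el in range(4)}
        self.write_cb = {el: self.default_write for el in range(4)}

    def default_read(self, el):
        return None if el in self.readable_els else UNDEFINED

    def default_write(self, el):
        return None if el in self.writable_els else UNDEFINED

    def check(self, el, is_read):
        callbacks = self.read_cb if is_read else self.write_cb
        return callbacks[el](el)


entry = MiscRegEntrySketch(readable_els={1, 2, 3}, writable_els={2, 3})
# Hypothetical register-specific override for a single EL.
entry.write_cb[2] = lambda el: "custom-fault"
```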
Change-Id: Ib4bb1b5d95319548de5e77e00258fd65c11d88d7
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61675
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
The iss field is only used when the MSR/MRS instruction
gets trapped. Rather than generating it at decode time,
we generate the value within the trap method instead.
This avoids the confusion of having an MSR/MRS register
instruction storing an immediate field.
Later patches will change this even further by generating the
iss field on the fly ONLY if the instruction gets trapped.
Change-Id: I97fdcf54d9643ea79a1f9d052073320ee68109fd
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61670
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Now that we have a pointer to the actual RegClass the RegId is
associated with, we can use its regName method to pretty print the
RegId for us. This gets rid of the redundant print method for RegId.
Also, replace the default register printing method with the
implementation in the << operator, which is more descriptive.
Change-Id: I00e93032ddea77e167ca13e54b370de7210f1a2b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/49808
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This commit adds two parameters to set_se_binary_workload for passing
inputs to the binary.
The "arguments" parameter allows users to pass in arguments as a list.
The "stdin_file" parameter allows users to pass in an input file as a
Resource.
This commit also created a local variable "binary_path" to save the
return object of "binary.get_local_path()".
Note:
These new parameters were tested and passed in 4 cases:
1. only passing in (Resource/CustomResource) binary
2. passing in (CustomResource) binary and input_file
3. passing in (CustomResource) binary and argument (no input file
directory included)
4. passing in (CustomResource) binary and argument (with input file
directory included)
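A standalone sketch of how the two optional inputs could be combined
with the binary path. The helper and the dictionary keys are
hypothetical and only model the data flow; they are not the gem5
standard library API.

```python
from typing import List, Optional


def build_se_process(binary_path: str,
                     arguments: Optional[List[str]] = None,
                     stdin_file: Optional[str] = None) -> dict:
    """Assemble a process description from the binary and its inputs."""
    # The command line is the binary followed by any arguments.
    cmd = [binary_path] + list(arguments or [])
    process = {"executable": binary_path, "cmd": cmd}
    if stdin_file is not None:
        process["input"] = stdin_file  # file fed to the binary's stdin
    return process


plain = build_se_process("/path/to/bin")
with_inputs = build_se_process("/path/to/bin", ["-n", "4"], "/path/to/in")
```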
Jira Issue: https://gem5.atlassian.net/browse/GEM5-1242
Change-Id: I6433a349f7ecb5d630c7cdbe7268ff18915bf23f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/61609
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>