Before this commit:
* SEV events were waking neither WFE (wrong) nor futex WAIT (correct)
* locked memory events (LLSC) due to LDXR and STXR were waking up both
WFE (correct) and futex WAIT (wrong)
This commit fixes all wrong behaviours mentioned above.
The fact that LLSC events were waking up futexes leads to deadlocks,
as shown in the test case described at:
https://gem5.atlassian.net/browse/GEM5-537
because threads woken up by SEV are not removed from the waiter list
for the futex address they are sleeping on.
A previous fix attempt was made at:
1531b56d605d47252dc0620bb3e755b7cf84df97
in which only sleeping threads are woken up. But that is not sufficient,
because the futex sleeping thread that was being wrongly woken up on SEV
can start to sleep on a second futex.
As an example, consider the case where 4 threads are fighting over two
critical sections protected by futex1 and futex2 addresses. In this case,
one thread wakes up the other thread after it is done with the section.
Suppose the following sequence of events:
* thread1 is awake and all others are suspended on futex1
* thread1 SEV wakes thread2 from the futex1 while in the critical region 1.
This is the wrong behaviour that this patch prevents, because
now thread2 is still in the sleeper list for futex1
* thread1 then futex wakes thread3, then proceeds to critical region 2.
* thread3 wakes up, but because thread2 holds critical region 1, it sleeps
again.
* thread2 finishes its work, futex wakes thread3, and then proceeds to
futex2
When it reaches futex2, thread1 is still working there, so it sleeps on
futex2.
* thread3 futex wakes thread2, because it is still wrongly on the sleeper
list of futex1. But thread2 is in futex2 now.
If it weren't for this mistake, it would have awakened the final thread4
instead.
Outcome: thread4 sleeps forever, no other thread ever wakes it, because all
other threads have woken from futex1 and awoken another thread.
The problem is fixed by adding the waitingTcs unordered_set to FutexMap,
which is basically an inverse of the FutexMap mapping, which tracks (addr,
tgid) -> ThreadContext. This allows us to quickly check
if a given ThreadContext is waiting on a futex at any address.
Then the SEV wakeup code path
now checks if the thread is k
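A minimal sketch of the inverse-set idea described above. All type and
function names here (ThreadContextSketch, FutexMapSketch, sevShouldWake) are
hypothetical and do not reproduce the gem5 classes; the SEV check is an
assumption based on the stated goal that SEV must not wake futex waiters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for a thread context; only its identity matters here.
struct ThreadContextSketch { int id; };

// Key of the forward map: (addr, tgid).
struct FutexKeySketch {
    uint64_t addr;
    uint64_t tgid;
    bool operator==(const FutexKeySketch &o) const {
        return addr == o.addr && tgid == o.tgid;
    }
};

struct FutexKeyHashSketch {
    std::size_t operator()(const FutexKeySketch &k) const {
        return std::hash<uint64_t>()(k.addr) ^
               (std::hash<uint64_t>()(k.tgid) << 1);
    }
};

class FutexMapSketch {
  public:
    // Put tc to sleep on (addr, tgid) and record it in the inverse set.
    void suspend(uint64_t addr, uint64_t tgid, ThreadContextSketch *tc) {
        FutexKeySketch key{addr, tgid};
        waiters[key].push_back(tc);
        waitingTcs.insert(tc);
    }

    // Wake up to `count` waiters on (addr, tgid); returns how many were woken.
    // Woken threads leave both the waiter list and the inverse set.
    int wakeup(uint64_t addr, uint64_t tgid, int count) {
        FutexKeySketch key{addr, tgid};
        auto it = waiters.find(key);
        if (it == waiters.end())
            return 0;
        int woken = 0;
        while (woken < count && !it->second.empty()) {
            waitingTcs.erase(it->second.front());
            it->second.pop_front();
            ++woken;
        }
        if (it->second.empty())
            waiters.erase(it);
        return woken;
    }

    // Constant-time check: is tc sleeping on any futex address?
    bool isWaiting(ThreadContextSketch *tc) const {
        return waitingTcs.count(tc) != 0;
    }

  private:
    std::unordered_map<FutexKeySketch, std::list<ThreadContextSketch *>,
                       FutexKeyHashSketch> waiters;
    std::unordered_set<ThreadContextSketch *> waitingTcs;
};

// Assumed SEV-side policy: never wake a thread that is sleeping on a futex.
bool sevShouldWake(const FutexMapSketch &m, ThreadContextSketch *tc) {
    return !m.isWaiting(tc);
}
```

With this bookkeeping a futex sleeper can only leave the waiter list through
a futex wake, so the stale-entry situation from the four-thread example
cannot arise in the sketch.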
Change-Id: Icec5e30b041f53e5aa3b6e0d291e77bc0e865984
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29777
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Brandon Potter <Brandon.Potter@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Brandon Potter <Brandon.Potter@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This parameter is associated with a periodic event which would take a
sample for a kernel profile in FS mode. Unfortunately the only ISA which
had working versions of the necessary classes was alpha, and that has
been deleted. That means that without additional work for any given ISA,
the profile parameter has no chance of working.
Ideally, this parameter should be moved to the Workload classes. There
it can intrinsically be tied to a particular kernel, rather than having
to assume a particular kernel and gate everything on whether you're in
FS mode.
Because this isn't (IMHO) where this parameter should live in the long
term, and because it's currently unusable without additional development
for each of the ISAs, I think it makes the most sense to remove the
front end for this mechanism from the CPU.
Since the sampling/profiling mechanism itself could be useful and could
be re-plumbed somewhere else, the back end and its classes are left alone.
Change-Id: I2a3319c1d5ad0ef8c99f5d35953b93c51b2a8a0b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32214
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Each instance of simgen uses a license. If there are only so many to
go around, running many instances at once could exhaust the pool of
licenses and break the build.
The number of licenses may be less than the number of regular build
steps we want to do in parallel, but may be greater than zero. To
limit them to at most n in parallel where n might be less than j
and/or more than 1, we create a group of license slots, assign simgen
invocations to a slot, and then use scons's side effect mechanism to
ensure no two invocations in the same slot run at the same time.
This may be a suboptimal packing if the commands take significantly
different amounts of time to run since the slots are preallocated and
not demand allocated, but the difference shouldn't normally matter in
practice, and scons doesn't provide a better mechanism for partially
serializing certain build steps.
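To illustrate why preallocated slots can pack suboptimally, here is a small
sketch. It assumes round-robin assignment (job i goes to slot i % n), which
the text above does not specify; the function names are hypothetical and
this is not the scons code itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Preallocate: job i is pinned to slot i % numSlots before anything runs.
// numSlots must be greater than zero.
std::vector<std::size_t> assignSlotsSketch(std::size_t numJobs,
                                           std::size_t numSlots) {
    std::vector<std::size_t> slotOf(numJobs);
    for (std::size_t i = 0; i < numJobs; ++i)
        slotOf[i] = i % numSlots;
    return slotOf;
}

// Jobs sharing a slot run one after another; different slots run in
// parallel. The overall finish time is the largest per-slot total.
double makespanSketch(const std::vector<double> &durations,
                      std::size_t numSlots) {
    std::vector<double> total(numSlots, 0.0);
    const auto slotOf = assignSlotsSketch(durations.size(), numSlots);
    for (std::size_t i = 0; i < durations.size(); ++i)
        total[slotOf[i]] += durations[i];
    return *std::max_element(total.begin(), total.end());
}
```

With durations {10, 1, 10, 1} and two slots, both long jobs land in slot 0
and the finish time is 20, whereas allocating on demand could finish in 11.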
Change-Id: Ifae58b48ae1b989c1915444bf7564f352f042305
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32124
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
When disassembling float register instructions, gem5 always prints 2
source registers, rs1 and rs2. However, this is not correct for Mul-Add
instructions, which have three source registers (rs1, rs2, and rs3), or
for Move and Convert instructions, which have only rs1.
For example: (gem5 output vs Expected)
- fmadd.d fa0,fa0,fa4 vs fmadd.d fa0,fa0,fa4,fa5
- fcvt.d.l fa4,a6,zero vs fcvt.d.l fa4,a6
This patch fixes the problem.
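The intended behaviour can be sketched as follows: print exactly as many
source registers as the instruction has. disassembleSketch is a
hypothetical helper, not the function changed by this patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build "mnemonic dest,src1[,src2[,src3]]" from however many source
// registers the instruction actually has.
std::string disassembleSketch(const std::string &mnemonic,
                              const std::string &dest,
                              const std::vector<std::string> &srcs) {
    std::string out = mnemonic + " " + dest;
    for (const auto &s : srcs)
        out += "," + s;
    return out;
}
```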
Change-Id: I02d840eab602ac4a9782911b3cdff2935dfe5e68
Signed-off-by: Ian Jiang <ianjiang.ict@gmail.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32054
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch adds the Secure EL2 feature. This allows stage1
EL2/EL2&0 and stage2 secure translation.
The changes are organized as follows:
+ insts/static_inst.cc: Modify checks for illegalInstruction on eret
+ isa.cc/hh: Enabling control bits
+ isa/insts/misc.hh/64.hh: Smc fault trigger.
+ miscregs.cc/hh: Declaration and initialization of new registers
+ self_debug.cc/hh: Add secureEL2 types for breakpoints
+ stage2_lookup.cc/hh: Allow stage2 in secure state.
+ tlb.cc/table_walker.cc: Allow secure state for stage2 and stage 1 EL2&0
translation regime
+ utility.cc/hh: New function InSecure and refactor of other helpers
to enable secure state
JIRA: https://gem5.atlassian.net/browse/GEM5-686
Change-Id: Ie59438b1828508e944334420da1d8f4745649056
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31394
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This very simple and mostly useless operation has no side effects, and
can be used to verify that arguments are making it into gem5, being
operated on, and then that a result can be returned into the simulation.
Change-Id: I29bce824078526ff77513c80365f8fad88fef128
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/27557
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
`bitfield::replaceBits` has two parameters, `first` and `last`, which
relate to the position of the MSB and the LSB of the bits to be replaced
respectively. Therefore `first` >= `last`. In some areas of the
codebase, this assumption has been flipped with `first` <= `last`. This
caused at least one known error, recorded here:
https://gem5.atlassian.net/browse/GEM5-695. These inconsistencies have
therefore been rectified.
A note has been added to the `bitfield::replaceBits` Doxygen to make
the usage of this function clearer.
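A standalone sketch of the `first` >= `last` convention. This is not the
gem5 implementation (replaceBitsSketch is a hypothetical name and returns
the result instead of modifying its argument); it only shows how the two
positions bound the field.

```cpp
#include <cassert>
#include <cstdint>

// Replace bits [first:last] of val with the low bits of bitVal.
// first is the MSB position and last the LSB position, so first >= last.
uint64_t replaceBitsSketch(uint64_t val, unsigned first, unsigned last,
                           uint64_t bitVal) {
    assert(first >= last && first < 64);
    const unsigned nbits = first - last + 1;
    // Avoid shifting by 64, which is undefined for a 64-bit type.
    const uint64_t fieldMask =
        (nbits == 64) ? ~uint64_t{0} : ((uint64_t{1} << nbits) - 1);
    const uint64_t mask = fieldMask << last;
    return (val & ~mask) | ((bitVal << last) & mask);
}
```

For instance, replacing bits 7..4 of 0xFF00 with 0xA gives 0xFFA0; passing
the positions flipped (4, 7) trips the assertion instead of silently
producing a wrong mask.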
Change-Id: Ie75856161d9a5684066430ecbdcc52e04e1e77bf
Issue-on: https://gem5.atlassian.net/browse/GEM5-696
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31674
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Affected instructions: V_DIV_SCALE_F64, V_CMP_CLASS_F64,
V_CMPX_CLASS_F64 and their VOPC, VOP3, F32 variants.
These instances of std::isnormal were being used to check for
subnormal (denorms) values. std::isnormal is not specific enough.
It returns true for normal values but false for NaN, Inf, 0.0, and
subnormals. std::fpclassify returns macros for each category of
floating point numbers. Now we only catch subnormals.
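The difference between the two checks can be sketched as follows. The
helper names are hypothetical, and the sketch assumes default
floating-point settings (no flush-to-zero or fast-math).

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Old-style check: true for NaN, Inf, 0.0 and subnormals alike.
bool oldCheckSketch(double x) {
    return !std::isnormal(x);
}

// New-style check: true only for subnormal values.
bool isSubnormalSketch(double x) {
    return std::fpclassify(x) == FP_SUBNORMAL;
}
```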
Change-Id: I8d8f4452ff58de71e7c8e0b2b5e73467b532e196
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29967
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Replaced !std::isnormal with std::fpclassify because std::isnormal
is not specific enough. !std::isnormal was incorrectly catching
NaN, Inf, 0.0, and subnormals (aka denormals), whereas it was only
supposed to catch subnormals.
The return value and error handling spec of std::ldexp listed on
cppreference.com appears to match up in nearly all cases after
making these changes. If std::ldexp handled subnormals as described
in the GCN3 2016 guide, we could have used vdst[lane] = std::ldexp
and would not have needed to check for any corner cases.
Change-Id: I4c77af77c3b7798f86d40442610cef1296a28441
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29966
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
roundNearestEven is an inst_util function that RNDNE_F64 and F32
call, including both VOP1 and VOP3 formats. IEEE 754 spec says this
function should round inputs to the nearest integer but round ties
to the nearest even integer. Prior to this patch it was rounding all
inputs to nearest even, not just the ties. It was probably implemented
this way originally because the language in the ISA manual is ambiguous
although it provided the correct logic.
Fixed roundNearestEven to use the semantics originally described in
the GCN3 ISA manual.
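A sketch of round-to-nearest with ties to even, as described above. The
GCN3 manual's own logic is not quoted here, so this is an assumed
formulation of the IEEE 754 semantics under a hypothetical name, not the
code from this patch.

```cpp
#include <cassert>
#include <cmath>

// Round to the nearest integer; only exact ties (fraction == 0.5) go to
// the even neighbour.
double roundNearestEvenSketch(double x) {
    const double fl = std::floor(x);
    const double diff = x - fl;
    if (diff < 0.5)
        return fl;
    if (diff > 0.5)
        return fl + 1.0;
    // Exact tie: keep the floor if it is even, otherwise go up by one.
    return (std::fmod(fl, 2.0) == 0.0) ? fl : fl + 1.0;
}
```

Under the default rounding mode, std::nearbyint applies the same
ties-to-even rule.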
Change-Id: I83ecb1d516fcf5bdf17e54ddf409b447a129a9a7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29964
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
s_getpc was previously reporting only a single operand,
and was only considering the SSRC operand. However,
this instruction's source is implicitly the PC.
Because its destination register was never tracked for
dependence checking purposes, dependence violations
are possible.
Change-Id: Ia80b8b3e24d5885f646a9ee41212a2cb35b9ffe6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29954
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Instruction s_setreg_b32 was unimplemented, but is used by hipified
rodinia 'srad'. The instruction sets values of hardware internal
registers. If the instruction is writing into MODE to control
single-precision FP round and denorm modes, a simple warn will be
printed; for all other cases (non-MODE hw register or other
precisions), panic will happen.
Change-Id: Idb1cd5f60548a146bc980f1a27faff30259e74ce
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29949
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Xianwei Zhang <xianwei.zhang@amd.com>
Instructions that use the DPP field need to use the extra SRC0
register associated with the DPP instruction instead of the
"default" SRC0 register, since the default SRC0 register contains
the DPP information when DPP is being used. This commit fixes
2735c3bb88 to take this into account. Additionally, this commit
removes write of the src register from the DPP helper functions,
to avoid overwriting any changes made to the destination register.
Finally, this change modifies the instructions that use DPP to
simplify the flow through the execute() functions.
Change-Id: I80fd0af1f131f287f18ff73b3c1c9122d8c60823
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29947
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Barriers were not modeled properly. Firstly, barriers were
allocated to each WG that was launched, which is not
correct, and the CU would provide an infinite number
of barrier slots. There are a limited number of barrier slots
per CU in reality. In addition, the CU will not allocate
barrier slots to WGs with a single WF (nothing to sync if
only one WF).
Beyond modeling problems, there is also the issue of deadlock.
The barrier could deadlock because not all WFs are freed
from the barrier once it has been satisfied. Instead, we
relied on the scoreboard stage to release them lazily,
one-by-one.
Under this implementation the scoreboard may not fully release
all WFs participating in a barrier; this happens because the
first WF to be freed from the barrier could reach an s_barrier
instruction again, forever causing the barrier counts across
WFs to be out-of-sync.
This change refactors the barrier logic to:
1) Create a proper barrier slot implementation
2) Enforce (via a parameter) the number of barrier
slots on the CU.
3) Simplify the logic and clean up the code (i.e., we
no longer iterate through the entire WF list each
time we check if a barrier is satisfied).
4) Fix deadlock issues.
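The refactored behaviour can be sketched roughly as below. Class names,
method names and the sentinel constants are all hypothetical; the sketch
only shows a fixed pool of slots, no slot for single-WF WGs, and releasing
every WF in one step once the barrier is satisfied.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int NoBarrierNeeded = -1;  // WG has a single WF, nothing to sync
constexpr int NoFreeSlot = -2;       // all barrier slots on the CU are busy

class BarrierSlotSketch {
  public:
    void init(int numWfs) { maxCount = numWfs; arrived = 0; inUse = true; }
    void arrive() { ++arrived; }
    // Satisfied when every participating WF has reached the barrier.
    bool allAtBarrier() const { return inUse && arrived == maxCount; }
    // Release all WFs at once by resetting the count in a single step.
    void release() { arrived = 0; }
    void free() { inUse = false; maxCount = 0; arrived = 0; }
    bool busy() const { return inUse; }

  private:
    int maxCount = 0;
    int arrived = 0;
    bool inUse = false;
};

class ComputeUnitSketch {
  public:
    explicit ComputeUnitSketch(std::size_t numSlots) : slots(numSlots) {}

    // Returns a slot index, NoBarrierNeeded, or NoFreeSlot.
    int allocateBarrier(int numWfsInWg) {
        if (numWfsInWg <= 1)
            return NoBarrierNeeded;
        for (std::size_t i = 0; i < slots.size(); ++i) {
            if (!slots[i].busy()) {
                slots[i].init(numWfsInWg);
                return static_cast<int>(i);
            }
        }
        return NoFreeSlot;
    }

    BarrierSlotSketch &slot(int id) { return slots[id]; }
    void freeBarrier(int id) { slots[id].free(); }

  private:
    std::vector<BarrierSlotSketch> slots;
};
```

Because the count is reset in one step, a WF that reaches the next
s_barrier early starts a fresh round instead of leaving the counts
out of sync.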
Change-Id: If53955b54931886baaae322640a7b9da7a1595e0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29943
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Motivation:
An AddressSizeFault on AArch32 can only happen during a table walk
since the register used as a base by LD/ST is always 32 bits wide.
On AArch64, on the other hand, addresses can be 64 bits wide;
when the MMU is off (no virtual memory) an invalid physical address
can be specified.
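As a sketch of the kind of range check this motivates (the function name
and the paBits parameter are hypothetical, and the actual change is not
described above): with the MMU off, the address is used as a physical
address, so it can be compared against the implemented physical address
size.

```cpp
#include <cassert>
#include <cstdint>

// True if addr has bits set above the implemented physical address range.
// paBits is the assumed number of implemented physical address bits.
bool exceedsPhysRangeSketch(uint64_t addr, unsigned paBits) {
    return paBits < 64 && (addr >> paBits) != 0;
}
```

A 32-bit base register can never trip this check when at least 32 physical
address bits are implemented, which matches the AArch32 observation above.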
Change-Id: Id3ef170e99202c6b0b511fa7205c754956861720
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31274
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>