During squash of branch predictor history, RAS recovery mess up the
stack because of function "restore" in RAS (src/cpu/pred/ras.cc). In
restore function, it does not update "usedEntries" variable resulting in
restore failure.
To be specific, in order to remove mispredicted call, it uses pop() and
it updates tos. However in order to restore mispredicted ret
instruction, it uses restore() but it does not update tos. This pair of
function call mess up the RAS resulting in many misspeculation.
The solution is to update usedEntries variable as “push” function does.
This is possible because restoration is done with reverse order of push
and pop.
Jira Issue: https://gem5.atlassian.net/browse/GEM5-732
Change-Id: Ia14e71c26d20b2795fd55a6a0dd3284c03570614
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33214
Reviewed-by: Trivikram Reddy <tvreddy@ucdavis.edu>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
There isn't anything special about using the page size, and it creates
an artificial dependence on the ISA. Instead of being based on pages,
the cache is now based on "chunks" who's size is a template parameter.
It defaults to 4K which is a common page size, but it can be tuned
arbitrarily if necessary.
Some unnecessary includes have been trimmed out as well.
Change-Id: I9fe59a5668d702433a800884fbfbdce20e0ade97
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33204
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
isa_traits.hh used to have much more in it, but now it only has
PageShift, PageBytes, and (for now) the guest endianness. These values
should only be retrieved from the System class generally speaking, so
only the system class should include arch/isa_traits.hh.
Some gpu compute related files need PageBytes or PageShift. Even though
those files don't advertise their ISA dependence, they are tied to x86.
In those files, they can include arch/x86/isa_traits.hh.
The only other file which legitimately needs arch/isa_traits.hh is the
decoder cache since it uses PageBytes to size an array.
Change-Id: I12686368715623e3140a68a7027c136bd52567b1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33203
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
In most cases, the microcode ROM doesn't actually do anything. The
structural existence of a microcode ROM doesn't make sense in the
general case, and in architectures that know they have one and need to
interact with it, they can cast their decoder into an arch specific type
and access the ROM that way.
Change-Id: I25b67bfe65df1fdb84eb5bc894cfcb83da1ce64b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32898
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
In sim/vma.hh, the include was indirectly getting the definition of
DPRINTF. It was replaced with an include of base/trace.hh which actually
provides that definition.
In the indirect branch predictor, it was being used to get the
definition of TheISA::PCState. This should come from arch/types.hh
instead.
Change-Id: I6de08f196499c85b54edde09d654902cc766c2eb
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33195
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Original Creator: Adria Armejach.
Branch instructions needed to be annotated in x86 as direct/indirect and conditional/unconditional. These annotations where not present causing the branch predictor to misbehave, not using the BTB. In addition, logic to determine the real branch target at decode needed to be added as it was also missing.
Change-Id: I91e707452c1825b9bb4ae75c3f599da489ae5b9a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29154
Reviewed-by: Alexandru Duțu <alexandru.dutu@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This code was at least a little Alpha specific, and now that Alpha is
gone it can no longer be compiled. We could either fix it up to work
with other/all ISAs or delete it, and the consensus was to delete it. It
could potentially be revived in the future by retrieving it from version
control.
Change-Id: Ied073f2b9b166951ecba3442cd762eb19bc690b3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32954
Reviewed-by: Steve Reinhardt <stever@gmail.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Before this commit:
* SEV events were not waking neither WFE (wrong) nor futex WAIT (correct)
* locked memory events (LLSC) due to LDXR and STXR were waking up both
WFE (correct) and futex WAIT (wrong)
This commit fixes all wrong behaviours mentioned above.
The fact that LLSC events were waking up futexes leads to deadlocks,
as shown in the test case described at:
https://gem5.atlassian.net/browse/GEM5-537
because threads woken up by SVE are not removed from the waiter list
for the futex address they are sleeping on.
A previous fix atttempt was done at:
1531b56d605d47252dc0620bb3e755b7cf84df97
in which only sleeping threads are woken up. But that is not sufficient,
because the futex sleeping thread that was being wrongly woken up on SEV
can start to sleep on a second futex.
As an example, consider the case where 4 threads are fighting over two
critical sections protected by futex1 and futex2 addresses. In this case,
one thread wakes up the other thread after it is done with the section.
Suppose the following sequence of events:
* thread1 is awake and all others are suspended on futex1
* thread1 SEV wakes thread2 from the futex1 while in the critical region 1.
This is the wrong behaviour that this patch prevents, because
now thread2 is still in the sleeper list for futex1
* thread1 then futex wakes tread3, then proceeds to critical region 2.
* thread3 wakes up, but because thread2 has critical region, it sleeps
again.
* thread2 finishes its work, futex wakes thread3, and then proceeds to
futex2
When it reaches futex2, thread1 is still working there, so it sleeps on
futex2.
* thread3 futex wakes thread2, because it is still wrongly on the sleeper
list of futex1. But thread2 is in futex2 now.
If it weren't for this mistake, it should have awaken the final thread4
instead.
Outcome: thread4 sleeps forever, no other thread ever wakes it, because all
other threads have woken from futex1 and awoken another thread.
The problem is fixed by adding the waitingTcs unordered_set FutexMap,
which is basically an inverse map to FutexMap, which tracks (addr,
tgid) -> ThreadContext. This allows us allow to quickly check
if a given ThreadContext is waiting on a futex in any address.
Then the SEV wakeup code path
now checks if the thread is k
Change-Id: Icec5e30b041f53e5aa3b6e0d291e77bc0e865984
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29777
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Brandon Potter <Brandon.Potter@amd.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Brandon Potter <Brandon.Potter@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This parameter is associated with a periodic event which would take a
sample for a kernel profile in FS mode. Unfortunately the only ISA which
had working versions of the necessary classes was alpha, and that has
been deleted. That means that without additional work for any given ISA,
the profile parameter has no chance of working.
Ideally, this parameter should be moved to the Workload classes. There
it can intrinsically be tied to a particular kernel, rather than having
to assume a particular kernel and gate everything on whether you're in
FS mode.
Because this isn't (IMHO) where this parameter should live in the long
term, and because it's currently unusable without additional development
for each of the ISAs, I think it makes the most sense to remove the
front end for this mechanism from the CPU.
Since the sampling/profiling mechanism itself could be useful and could
be re-plumbed somewhere else, the back end and its classes are left alone.
Change-Id: I2a3319c1d5ad0ef8c99f5d35953b93c51b2a8a0b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32214
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
These classes are all basically empty now that Alpha has been deleted,
except in cases where the arch versions had copied versions of the Alpha
code.
This change pulls all the generic logic out of the arch versions, making
the arch versions much simpler and making it clearer what the core
functionality of the class is, and what parts are architecture specific
details.
In the future, the way the StackTrace class is instantiated should be
delegated to the Workload class so that ISA agnostic code doesn't need
to know about a particular ISA's StackTrace class, and so that
StackTrace logic can, at least theoretically, be specialized for a
particular workload. The way a stack trace is collected could vary from
OS to OS, for example.
Change-Id: Id8108f94e9fe8baf9b4056f2b6404571e9fa52f1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30961
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The memory-mapped timer emulated by gem5 is driven by the underlying
gem5 tick, which means that we must align the tick with the host time
to make the timer interrupt fire at a nearly native rate.
In each KVM execution round, the number of ticks incremented is
directly calculated from the number of instructions executed. However,
when a guest CPU switches to idle state, KVM seems to stay in kernel-
space until the POSIX timer set up in user-space raises an expiration
signal, instead of trapping to user-space immediately; and somehow the
instruction count is just too low to match the elapsed host time. This
makes the gem5 tick increment very slowly when the guest is idle and
drastically slow down workloads being sensitive to the guest time which
is driven by timer interrupt.
Before switching to KVM to execute the guest code, gem5 programs the
POSIX timer to expire according to the remaining ticks before the next
event in the event queue. Based on this, we can come up with the
following solution: If KVM returns to user-space due to POSIX timer
expiration, it must be time to process the next gem5 event, so we just
fast-forward the tick (by scheduling the next CPU tick event) to that
event directly without calculating from the instruction count.
There is one more related issue needed to be solved. The KVM exit
reason, KVM_EXIT_INTR, was treated as the case where the KVM execution
was disturbed by POSIX timer expiration. However, there exists a case
where the exit reason is KVM_EXIT_INTR but the POSIX timer has not
expired. Its cause is still unknown, but it can be observed via the
"old_value" argument returned by timer_settime() when disarming the
POSIX timer. In addition, it seems to happen often when a guest CPU is
not in idle state. When this happens, the above tick event scheduling
incorrectly treats KVM_EXIT_INTR as POSIX timer expiration and fast-
forwards the tick to process the next event too early. This makes the
guest feel external events come too fast, and will sometimes cause
trouble. One example is the VSYNC interrupt from HDLCD. The guest seems
to get stuck in VSYNC handling if the KVM CPU is not given enough time
between each VSYNC interrupt to complete a service. (Honestly I did not
dig in to see how the guest handled the VSYNC interrupt and how the
above situation became trouble. I just observed from the debug trace of
GIC & HDLCD & timer, and made this conclusion.) This change also uses
a workaround to detect POSIX timer expiration correctly to make the
guest work with HDLCD.
JIRA: https://gem5.atlassian.net/browse/GEM5-663
Change-Id: I6159238a36fc18c0c881d177a742d8a7745a23ca
Signed-off-by: Hsuan Hsu <hsuan.hsu@mediatek.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30919
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The O3 model uses ReExec faults to flush the pipeline and restart
after a memory ordering violation, e.g. due to an incoming snoop.
These, just like branch mispredict flushes, are not architectural
faults but micro-architectural events, and should therefore not
show up on the instruction tracing interface.
This adds a check on faulting instructions in commit, to verify
if the instruction faulted due to ReExec, to avoid tracing it.
Change-Id: I1d3eaffb0ff22411e0e16a69ef07961924c88c10
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30554
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
This patch sets the faulting flag in atomic, timing, minor and o3 CPU
models.
It also fixes the minor/timing CPU models which were not respecting the
ExecFaulting flag. This is now checked before calling dump() on the
tracing object, to bring it in line with the other CPU models.
Change-Id: I9c7b64cc5605596eb7fcf25fdecaeac5c4b5e3d7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30135
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
These defaults are never used. There was an assert in the predictors
until recently which was asserting that one of the arguments didn't
have the default value, I think to verify that the default wasn't used
by accident(?), but it could be used purposefully. That would cause
gem5 to crash and has been removed.
Beyond that, there's no reason to have default values for those
arguments in the first place, so this change removes them. That makes
the code slightly simpler, and avoids them being used by accident.
Additionally, the defalt values of the arguments made the function
signatures inconsistent, even though they were supposed to override
each other.
JIRA: https://gem5.atlassian.net/browse/GEM5-483
Change-Id: I28f8d2048985c12ec9cac018a868a32bfa20dc6c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30375
Reviewed-by: Hsuan Hsu <hsuan.hsu@mediatek.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
These are the stats in the base class, not in any derived classes. Only
Alpha has an additional stats. These were not really "kernel"
statistics, they were just applicable primarily in FS. They are
potentially applicable to any simulation, but will probably not be
incremented in SE simulations.
Also this merges these stats from being per thread to being per
workload, ie operating system instance. This is probably more relevant
since exactly what thread within a workload runs which particular
instruction is not very important/predictable, but the aggregate
behavior is. If necessary, this could be adjusted in the future to
split things back out again into stats per thread while keeping them
inside the single workload object.
Change-Id: I130e11a9022bdfcadcfb02c7995871503114cd53
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/25147
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The ThreadContext can be used to access the cpu if needed, and is a
more representative interface to various pieces of state than the CPU
itself. Also convert some of the methods in Interupts to use the
locally stored ThreadContext pointer instead of taking one as an
argument. This makes calling those methods simpler and less error
prone.
Change-Id: I740bd99f92e54e052a618a4ae2927ea1c4ece193
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28988
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The System class has a few different arrays of values which each
correspond to a thread of execution based on their position. This
change collects them together into a single class to make managing them
easier and less error prone. It also collects methods for manipulating
those threads as an API for that class.
This class acts as a collection point for thread based state which the
System class can look into to get at all its state. It also acts as an
interface for interacting with threads for other classes. This forces
external consumers to use the API instead of accessing the individual
arrays which improves consistency.
Change-Id: Idc4575c5a0b56fe75f5c497809ad91c22bfe26cc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/25144
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Instead of calling into object files after the fact and asking them to
put symbols into a target symbol table, this change makes object files
fill in a symbol table themselves at construction. Then, that table can
be retrieved and used to fill in aggregate tables, masked, moved,
and/or filtered to have only one type of symbol binding.
This simplifies the symbol management API of the object file types
significantly, and makes it easier to deal with symbol tables alongside
binaries in the FS workload classes.
Change-Id: Ic9006ca432033d72589867c93d9c5f8a1d87f73c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/24787
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
After commit e2a5063e5f some
memory references now tracked as barriers were not having
their completion properly notified to the MemDepUnit.
This patch fixes InstructionQueue and changes MemDepUnit's
completeBarrier to completeInst, which now should be called
for both memory references and barrier instructions.
Change-Id: I28b5f112b45778f6272e71bb3766b364c3d2e7db
Signed-off-by: Tiago Mück <tiago.muck@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/29654
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>