In this context, the decoder width is the number of bytes that are fed
into the decoder at once. This is frequently the same as the size of an
instruction, but in instructions with occasionally variable instruction
sizes (ARM, RISCV), or extremely variable instruction sizes (x86) there
may be no relation.
Rather than determining the amount of data to feed to the decoder based
on a MachInst type defined by each ISA, this new interface adds some new
properties to the base InstDecoder class each arch specific decoder
inherits from. These are the size of the incoming buffer, a pointer to
wherever that data should end up, and a mask for masking a PC value so
it aligns with the instruction size.
These values are filled in by a templated InstDecoder constructor which
is templated based on what would have historically been the MachInst
type.
Because the "moreBytes" method would historically accept a parameter of
type MachInst, this parameter has also been eliminated. Now, the
decoder's parent object should use the pointer and size values to fill
in the buffer moreBytes reads. Then when moreBytes is called, it just
uses the buffer without having to show what its type is externally.
Change-Id: I0642cdb6a61e152441ca4ce47d748639175cda90
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40175
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
There is a design which has been put forward which eliminates the idea
of a zero register entirely, but in the mean time, to get rid of one
more ISA specific constant, this change moves the ZeroReg constant into
the RegClassInfo class, specifically the IntRegClass instance which is
published by each ISA.
When the idea of zero registers has been eliminated entirely from
non ISA specific code, this and the existing machinery can be
eliminated.
Change-Id: I4302a53220dd5ff6b9b47ecc765bddc6698310ca
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42685
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The nullStaticInstPtr was low overhead, but the nopStaticInstPtr needed
an actual StaticInst implementation it could point to, and that brought
with it some (minor) additional dependencies. Specifically, the
implementation of advancePC needs the definition of TheISA::PCState,
while all other signatures/impementations in StaticInst are already
passing around that type by reference or could be made to, reducing
dependencies further.
Change-Id: I9ac6a6e5a3106858ea1fc727648f61dc39738a59
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42968
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
In most ISAs except MIPS and Power, this was implemented as
inst->advancePC(). It works just fine to call this function all the
time, but the idea had originally been that for ISAs which could simply
advance the PC using the PC itself, they could save the virtual function
call. Since the only ISAs which could skip the call were MIPS and Power,
and neither is at the point where that level of performance tuning
matters, this function can be collapsed with little downside.
If this turns out to be a performance bottleneck in the future, the way
the PC is managed could be revisited to see if we can factor out this
trip to the instruction object in the first place.
Change-Id: I533d1ad316e5c936466c529b7f1238a9ab87bd1c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39335
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Alex Dutu <alexandru.dutu@amd.com>
The create() method on Params structs usually instantiate SimObjects
using a constructor which takes the Params struct as a parameter
somehow. There has been a lot of needless variation in how that was
done, making it annoying to pass Params down to base classes. Some of
the different forms were:
const Params &
Params &
Params *
const Params *
Params const*
This change goes through and fixes up every constructor and every
create() method to use the const Params & form. We use a reference
because the Params struct should never be null. We use const because
neither the create method nor the consuming object should modify the
record of the parameters as they came in from the config. That would
make consuming them not idempotent, and make it impossible to tell what
the actual simulation configuration was since it would change from any
user visible form (config script, config.ini, dot pdf output).
Change-Id: I77453cba52fdcfd5f4eec92dfb0bddb5a9945f31
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35938
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This patch enables all 4 CPU models (AtomicSimpleCPU, TimingSimpleCPU,
MinorCPU and DerivO3CPU) to issue atomic memory (AMO) requests to memory
system.
Atomic memory instruction is treated as a special store instruction in
all CPU models.
In simple CPUs, an AMO request with an associated AtomicOpFunctor is
simply sent to L1 dcache.
In MinorCPU, an AMO request bypasses store buffer and waits for any
conflicting store request(s) currently in the store buffer to retire
before the AMO request is sent to the cache. AMO requests are not buffered
in the store buffer, so their effects appear immediately in the cache.
In DerivO3CPU, an AMO request is inserted in the store buffer so that it
is delivered to the cache only after all previous stores are issued to
the cache. Data forwarding between between an outstanding AMO in the
store buffer and a subsequent load is not allowed since the AMO request
does not hold valid data until it's executed in the cache.
This implementation assumes that a target ISA implementation must insert
enough memory fences as micro-ops around an atomic instruction to
enforce a correct order of memory instructions with respect to its
memory consistency model. Without extra memory fences, this implementation
can allow AMOs and other memory instructions that do not conflict
(i.e., not target the same address) to reorder.
This implementation also assumes that atomic instructions execute within
a cache line boundary since the cache for now is not able to execute an
operation on two different cache lines in one single step. Therefore,
ISAs like x86 that require multi-cache-line atomic instructions need to
either use a pair of locking load and unlocking store or change the
cache implementation to guarantee the atomicity of an atomic
instruction.
Change-Id: Ib8a7c81868ac05b98d73afc7d16eb88486f8cf9a
Reviewed-on: https://gem5-review.googlesource.com/c/8188
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
When a thread is activated by another thread calling a clone system
call, the child thread's context is initialized in the middle of the
clone system call and before the context is fully initialized.
Therefore, the child thread starts fetching an unitialized PC, which
could lead to a page fault.
This patch adds a pipeline wakeup event that is scheduled later in the
cycle when the thread is activated. This event ensures that the first
fetch only happens after the thread context is fully initialized
(e.g., in case of clone syscall, it is when the parent thread copies
its context over to the child thread).
When a thread first starts or wakes up, input queue to the Fetch2 stage
needs to be drained since the execution flow is likely to change and
previously fetched instructions in the queue may no longer be in the
correct flow. This patch dumps/drains all inputs in the input queue
of a thread context in the Fetch2 stage when the associated thread wakes
up.
Change-Id: Iad970638e435858b7289cd471158cc0afdbbb0e5
Reviewed-on: https://gem5-review.googlesource.com/c/8182
Reviewed-by: Brandon Potter <Brandon.Potter@amd.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Brandon Potter <Brandon.Potter@amd.com>
The Minor and o3 cpu models share the branch prediction
code. Minor relies on the BPredUnit::squash() function
to update the branch predictor tables on a branch mispre-
diction. This is fine because Minor executes in-order, so
the update is on the correct path. However, this causes the
branch predictor to be updated on out-of-order branch
mispredictions when using the o3 model, which should not
be the case.
This patch guards against speculative update of the branch
prediction tables. On a branch misprediction, BPredUnit::squash()
calls BpredUnit::update(..., squashed = true). The underlying
branch predictor tests against the value of squashed. If it is
true, it restores any speculatively updated internal state
it might have (e.g., global/local branch history), then returns.
If false, it updates its prediction tables. Previously, exist-
ing predictors did not test against the "squashed" parameter.
To accomodate for this change, the Minor model must now call
BPredUnit::squash() then BPredUnit::update(..., squashed = false)
on branch mispredictions. Before, calling BpredUnit::squash()
performed the prediction tables update.
The effect is a slight MPKI improvement when using the o3
model. A further patch should perform the same modifications
for the indirect target predictor and BTB (less critical).
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
The behavior of WFI is to cause minor to cease evaluating
pipeline logic until an interrupt is observed, however
a user may wish to drain the system while a core is sleeping
due to a WFI. This patch makes WFI drain. If an actual
drain occurs during a WFI, the CPU is already drained and will
immediately be ready for swapping, checkpointing, etc. This
should not negatively impact performance as WFI instructions
are 'stream-changing' (treated like unpredicted branches), so
all remaining instructions are wrong-path and will be squashed
rapidly.
Change-Id: I63833d5acb53d8dde78f9f0c9611de0ece385e45
This patch adds SMT support to the MinorCPU. Currently
RoundRobin or Random thread scheduling are supported.
Change-Id: I91faf39ff881af5918cca05051829fc6261f20e3
Another churn to clean up undefined behaviour, mostly ARM, but some
parts also touching the generic part of the code base.
Most of the fixes are simply ensuring that proper intialisation. One
of the more subtle changes is the return type of the sign-extension,
which is changed to uint64_t. This is to avoid shifting negative
values (undefined behaviour) in the ISA code.
This patch contains a new CPU model named `Minor'. Minor models a four
stage in-order execution pipeline (fetch lines, decompose into
macroops, decompose macroops into microops, execute).
The model was developed to support the ARM ISA but should be fixable
to support all the remaining gem5 ISAs. It currently also works for
Alpha, and regressions are included for ARM and Alpha (including Linux
boot).
Documentation for the model can be found in src/doc/inside-minor.doxygen and
its internal operations can be visualised using the Minorview tool
utils/minorview.py.
Minor was designed to be fairly simple and not to engage in a lot of
instruction annotation. As such, it currently has very few gathered
stats and may lack other gem5 features.
Minor is faster than the o3 model. Sample results:
Benchmark | Stat host_seconds (s)
---------------+--------v--------v--------
(on ARM, opt) | simple | o3 | minor
| timing | timing | timing
---------------+--------+--------+--------
10.linux-boot | 169 | 1883 | 1075
10.mcf | 117 | 967 | 491
20.parser | 668 | 6315 | 3146
30.eon | 542 | 3413 | 2414
40.perlbmk | 2339 | 20905 | 11532
50.vortex | 122 | 1094 | 588
60.bzip2 | 2045 | 18061 | 9662
70.twolf | 207 | 2736 | 1036