There is a design which has been put forward which eliminates the idea
of a zero register entirely, but in the mean time, to get rid of one
more ISA specific constant, this change moves the ZeroReg constant into
the RegClassInfo class, specifically the IntRegClass instance which is
published by each ISA.
When the idea of zero registers has been eliminated entirely from
non ISA specific code, this and the existing machinery can be
eliminated.
Change-Id: I4302a53220dd5ff6b9b47ecc765bddc6698310ca
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42685
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
In most ISAs except MIPS and Power, this was implemented as
inst->advancePC(). It works just fine to call this function all the
time, but the idea had originally been that for ISAs which could simply
advance the PC using the PC itself, they could save the virtual function
call. Since the only ISAs which could skip the call were MIPS and Power,
and neither is at the point where that level of performance tuning
matters, this function can be collapsed with little downside.
If this turns out to be a performance bottleneck in the future, the way
the PC is managed could be revisited to see if we can factor out this
trip to the instruction object in the first place.
Change-Id: I533d1ad316e5c936466c529b7f1238a9ab87bd1c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39335
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Alex Dutu <alexandru.dutu@amd.com>
These currently only hold the number of registers in a particular class,
but can be extended in the future to hold other information about each
class. The ISA class holds a vector of descriptors which other parts of
gem5 can retrieve to set up storage for each class, etc.
Currently, the RegClass enum is used to explicitly index into the vector
of descriptors to get information about a particular class. Once enough
information is stored in the descriptors, the other parts of gem5 should
be able to set up for each register class generically, and the ISAs will
be able to leave out or create new register classes without having to
set up global plumbing for it.
The more immediate benefit is that this should (mostly) parameterize
away the ISA register constants to break another TheISA style
dependency. Currently a global set of descriptors are set up in the
BaseISA class using the old TheISA constants, but it should be easy to
break those out and make the ISAs set up their own descriptors. That
will bring arch/registers.hh significantly closer to being eliminated.
Change-Id: I6d6d1256288f880391246b71045482a4a03c4198
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/41733
Reviewed-by: Gabe Black <gabe.black@gmail.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The create() method on Params structs usually instantiate SimObjects
using a constructor which takes the Params struct as a parameter
somehow. There has been a lot of needless variation in how that was
done, making it annoying to pass Params down to base classes. Some of
the different forms were:
const Params &
Params &
Params *
const Params *
Params const*
This change goes through and fixes up every constructor and every
create() method to use the const Params & form. We use a reference
because the Params struct should never be null. We use const because
neither the create method nor the consuming object should modify the
record of the parameters as they came in from the config. That would
make consuming them not idempotent, and make it impossible to tell what
the actual simulation configuration was since it would change from any
user visible form (config script, config.ini, dot pdf output).
Change-Id: I77453cba52fdcfd5f4eec92dfb0bddb5a9945f31
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35938
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
There were three different StaticInst flags for memory barriers,
IsMemBarrier, IsReadBarrier, and IsWriteBarrier. IsReadBarrier was never
used, and IsMemBarrier was for both loads and stores, so a composite of
IsReadBarrier and IsWriteBarrier.
This change gets rid of IsMemBarrier and replaces by setting
IsReadBarrier and IsWriteBarrier at the same time. An isMemBarrier
accessor is left, but is now implemented by checking if both of the
other flags are set, and renamed to isFullMemBarrier to make it clear
that it's checking both for both types of barrier, not one or the other.
Change-Id: I702633a047f4777be4b180b42d62438ca69f52ea
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33743
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This was set by MIPS in two places, I think largely just because it was
available. This flag refers to IPRs which are an Alpha concept. In the
O3 CPU, IsIprAccess was used as a possible indicator to determine if an
instruction IsSerializeBefore, but we've already got a flag for that. In
the minor CPU, which hasn't been made to work with MIPS as far as I
know, it was used in a condition but not mentioned in the comment
alongside the condition. I think there it was added for the sake of
Alpha.
This change eliminates that flag and removes it from the O3 and minor
CPUs. In the MIPS ISA description, the instructions that were marked as
IsIprAccess have now been marked as IsSerializeBefore since, if there
was a real reason for them to be marked as IsIprAccess, it would have
been to get it them to work in O3, and there IsSerializeBefore gets
equivalent behavior.
Change-Id: Ia874cde12fa70b998d3e638458f13d69798d40b7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33739
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
This patch sets the faulting flag in atomic, timing, minor and o3 CPU
models.
It also fixes the minor/timing CPU models which were not respecting the
ExecFaulting flag. This is now checked before calling dump() on the
tracing object, to bring it in line with the other CPU models.
Change-Id: I9c7b64cc5605596eb7fcf25fdecaeac5c4b5e3d7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30135
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The ThreadContext can be used to access the cpu if needed, and is a
more representative interface to various pieces of state than the CPU
itself. Also convert some of the methods in Interupts to use the
locally stored ThreadContext pointer instead of taking one as an
argument. This makes calling those methods simpler and less error
prone.
Change-Id: I740bd99f92e54e052a618a4ae2927ea1c4ece193
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/28988
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The code block was relying on passed_predicate only (conditional
execution). This was not covering the case where the instruction
gets executed, but the predicate register is false. Using the inLSQ
variable is covering both cases and it makes more sense in terms of
readibility.
Change-Id: Ie1954f37968379a5bda9d0dc9f824a68304cc229
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/23280
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This was useful when transitioning away from the CPU based
comInstEventQueue, but now that objects backing the ThreadContexts have
access to the underlying comInstEventQueue and can manipulate it
directly, they don't need to do so through a generic interface.
Getting rid of this function narrows and simplifies the interface.
Change-Id: I202d466d266551675ef6792d38c658d8a8f1cb8b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/22113
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This queue was set up to allow triggering events based on the total
number of instructions executed at the system level, and was added in
a change which added a number of things to support McPAT. No code
checked into gem5 actually schedules an event on that queue, and no
code in McPAT (which seems to have gone dormant) either downloadable
from github or found in ext modify gem5 in a way that makes it use
the instEventQueue.
Also, the KVM CPU does not interact with the instEventQueue correctly.
While it does check the per-thread instruction event queue when
deciding how long to run, it does not check the instEventQueue. It will
poke it to run events when it stops for other reasons, but it may (and
likely will) have run beyond the point where it was supposed to stop.
Since this queue doesn't seem to actually be used for anything, isn't
being used properly in all cases anyway, and adds overhead to all the
CPU models, this change eliminates it.
Change-Id: I0e126df14788c37a6d58ca9e1bb2686b70e60d88
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/21783
Maintainer: Gabe Black <gabeblack@google.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Tiago Mück <tiago.muck@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This changeset adds support for partial (or masked) loads/stores, i.e.
loads/stores that can disable accesses to individual bytes within the
target address range. In addition, this changeset extends the code to
crack memory accesses across most CPU models (TimingSimpleCPU still
TBD), so that arbitrarily wide memory accesses are supported. These
changes are required for supporting ISAs with wide vectors.
Additional authors:
- Gabor Dozsa <gabor.dozsa@arm.com>
- Tiago Muck <tiago.muck@arm.com>
Change-Id: Ibad33541c258ad72925c0b1d5abc3e5e8bf92d92
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/13518
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
This patch enables all 4 CPU models (AtomicSimpleCPU, TimingSimpleCPU,
MinorCPU and DerivO3CPU) to issue atomic memory (AMO) requests to memory
system.
Atomic memory instruction is treated as a special store instruction in
all CPU models.
In simple CPUs, an AMO request with an associated AtomicOpFunctor is
simply sent to L1 dcache.
In MinorCPU, an AMO request bypasses store buffer and waits for any
conflicting store request(s) currently in the store buffer to retire
before the AMO request is sent to the cache. AMO requests are not buffered
in the store buffer, so their effects appear immediately in the cache.
In DerivO3CPU, an AMO request is inserted in the store buffer so that it
is delivered to the cache only after all previous stores are issued to
the cache. Data forwarding between between an outstanding AMO in the
store buffer and a subsequent load is not allowed since the AMO request
does not hold valid data until it's executed in the cache.
This implementation assumes that a target ISA implementation must insert
enough memory fences as micro-ops around an atomic instruction to
enforce a correct order of memory instructions with respect to its
memory consistency model. Without extra memory fences, this implementation
can allow AMOs and other memory instructions that do not conflict
(i.e., not target the same address) to reorder.
This implementation also assumes that atomic instructions execute within
a cache line boundary since the cache for now is not able to execute an
operation on two different cache lines in one single step. Therefore,
ISAs like x86 that require multi-cache-line atomic instructions need to
either use a pair of locking load and unlocking store or change the
cache implementation to guarantee the atomicity of an atomic
instruction.
Change-Id: Ib8a7c81868ac05b98d73afc7d16eb88486f8cf9a
Reviewed-on: https://gem5-review.googlesource.com/c/8188
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
When a thread is suspended, all instructions after the suspension need
to be discarded since the thread will take a different execution stream
when it wakes up.
To do that, in MinorCPU, whenever a thread gets suspended, we change the
current execution stream by updating the current branch with
BranchData::SuspendThread reason.
Change-Id: I7cdcda22c1cf6e8ac8db8800b7d9ec052433fdf3
Reviewed-on: https://gem5-review.googlesource.com/c/9626
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@gmail.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
When a thread is activated by another thread calling a clone system
call, the child thread's context is initialized in the middle of the
clone system call and before the context is fully initialized.
Therefore, the child thread starts fetching an unitialized PC, which
could lead to a page fault.
This patch adds a pipeline wakeup event that is scheduled later in the
cycle when the thread is activated. This event ensures that the first
fetch only happens after the thread context is fully initialized
(e.g., in case of clone syscall, it is when the parent thread copies
its context over to the child thread).
When a thread first starts or wakes up, input queue to the Fetch2 stage
needs to be drained since the execution flow is likely to change and
previously fetched instructions in the queue may no longer be in the
correct flow. This patch dumps/drains all inputs in the input queue
of a thread context in the Fetch2 stage when the associated thread wakes
up.
Change-Id: Iad970638e435858b7289cd471158cc0afdbbb0e5
Reviewed-on: https://gem5-review.googlesource.com/c/8182
Reviewed-by: Brandon Potter <Brandon.Potter@amd.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Brandon Potter <Brandon.Potter@amd.com>
MinorCPU was not handling IsSquashAfter flagged instructions. The
behaviour was to force a branch (hence enforcing refetching) for
SerializeAfter instructions only. This has now been extended to
SquashAfter in order to correctly support ISB barrier instruction
behaviour.
Change-Id: Ie525b23350b0de121372d3b98b433e36b097d5c4
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/5702
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
The behavior of WFI is to cause minor to cease evaluating
pipeline logic until an interrupt is observed, however
a user may wish to drain the system while a core is sleeping
due to a WFI. This patch makes WFI drain. If an actual
drain occurs during a WFI, the CPU is already drained and will
immediately be ready for swapping, checkpointing, etc. This
should not negatively impact performance as WFI instructions
are 'stream-changing' (treated like unpredicted branches), so
all remaining instructions are wrong-path and will be squashed
rapidly.
Change-Id: I63833d5acb53d8dde78f9f0c9611de0ece385e45
This patch adds SMT support to the MinorCPU. Currently
RoundRobin or Random thread scheduling are supported.
Change-Id: I91faf39ff881af5918cca05051829fc6261f20e3
The MinorCPU would count bubbles in Execute::issue as part of
the num_insts_issued and so sometimes reach the instruction
issue limit incorrectly.
Fixed by checking for a bubble in one new place.
This patch fixes a recent issue with gcc 4.9 (and possibly more) being
convinced that indices outside the array bounds are used when
initialising the FUPool members.
The totalInstructions counter is only incremented when the whole instruction is
commited and not on every microop. It was incorrectly reset in atomic and
timing cpus.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>"
This patch takes a first step in tightening up how we use the data
pointer in write packets. A const getter is added for the pointer
itself (getConstPtr), and a number of member functions are also made
const accordingly. In a range of places throughout the memory system
the new member is used.
The patch also removes the unused isReadWrite function.
Fixes a bug where Minor drains in the midst of committing a
conditional store.
While committing a conditional store, lastCommitWasEndOfMacroop is true
(from the previous instruction) as we still haven't finished the conditional
store. If a drain occurs before the cache response, Minor would check just
lastCommitWasEndOfMacroop, which was true, and set drainState=DrainHaltFetch,
which increases the streamSeqNum. This caused the conditional store to be
squashed when the memory responded and it completed. However, to the memory
the store succeeded, while to the instruction sequence it never occurred.
In the case of an LLSC, the instruction sequence will replay the squashed
STREX, which will fail as the cache is no longer in LLSC. Then the
instruction sequence will loop back to a LDREX, which receives the updated
(incorrect) value.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
This changeset adds probe points that can be used to implement PMU
counters for CPU stats. The following probes are supported:
* BaseCPU::ppCycles / Cycles
* BaseCPU::ppRetiredInsts / RetiredInsts
* BaseCPU::ppRetiredLoads / RetiredLoads
* BaseCPU::ppRetiredStores / RetiredStores
* BaseCPU::ppRetiredBranches RetiredBranches
This patch contains a new CPU model named `Minor'. Minor models a four
stage in-order execution pipeline (fetch lines, decompose into
macroops, decompose macroops into microops, execute).
The model was developed to support the ARM ISA but should be fixable
to support all the remaining gem5 ISAs. It currently also works for
Alpha, and regressions are included for ARM and Alpha (including Linux
boot).
Documentation for the model can be found in src/doc/inside-minor.doxygen and
its internal operations can be visualised using the Minorview tool
utils/minorview.py.
Minor was designed to be fairly simple and not to engage in a lot of
instruction annotation. As such, it currently has very few gathered
stats and may lack other gem5 features.
Minor is faster than the o3 model. Sample results:
Benchmark | Stat host_seconds (s)
---------------+--------v--------v--------
(on ARM, opt) | simple | o3 | minor
| timing | timing | timing
---------------+--------+--------+--------
10.linux-boot | 169 | 1883 | 1075
10.mcf | 117 | 967 | 491
20.parser | 668 | 6315 | 3146
30.eon | 542 | 3413 | 2414
40.perlbmk | 2339 | 20905 | 11532
50.vortex | 122 | 1094 | 588
60.bzip2 | 2045 | 18061 | 9662
70.twolf | 207 | 2736 | 1036