Commit Graph

91 Commits

Author SHA1 Message Date
Gabe Black
08913caec2 arch,cpu,kern,sim: Eliminate the utility.hh switching header.
This header is no longer used. Remove the places where it's included,
and stop generating it. Also eliminate the now empty SPARC and Power
versions of the header.

Change-Id: I6ee66d39bc0218d1d9b9b7db3b350134ef03251d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39337
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
2021-03-22 21:01:58 +00:00
Daniel R. Carvalho
2922f763e1 misc: Fix coding style for struct's opening braces
The systemc dir was not included in this fix.

First it was identified that there were only occurrences
at 0, 1, 2 and 3 levels of indentation (and a single
occurrence of 2 and 3 spaces), using:

    grep -nrE --exclude-dir=systemc \
        "^ *struct [A-Za-z].* {$" src/

Then the following commands were run to replace:

<indent level>struct X ... {

by:

<indent level>struct X ...
<indent level>{

Level 0:
    grep -nrl --exclude-dir=systemc
        "^struct [A-Za-z].* {$" src/ | \
        xargs sed -Ei \
        's/^struct ([A-Za-z].*) \{$/struct \1\n\{/g'

Level 1:
    grep -nrl --exclude-dir=systemc \
        "^    struct [A-Za-z].* {$" src/ | \
        xargs sed -Ei \
        's/^    struct ([A-Za-z].*) \{$/    struct \1\n    \{/g'

and so on.

Change-Id: I362ef58c86912dabdd272c7debb8d25d587cd455
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39017
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Giacomo Travaglini <giacomo.travaglini@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-03-19 20:57:24 +00:00
Gabe Black
ce20b07351 arch-x86,cpu: Don't use aliases to hide TheISA::.
We need to gradually eliminate TheISA, and so it's helpful to know where
it's actually being used. This change stops hiding it behind using-s
and, in one case, a placeholder constant.

Change-Id: I391a3129256a9f7bd3b4002d0a46fb06b3068468
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39656
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
2021-01-27 00:40:30 +00:00
Gabe Black
91d83cc8a1 misc: Standardize the way create() constructs SimObjects.
The create() method on Params structs usually instantiate SimObjects
using a constructor which takes the Params struct as a parameter
somehow. There has been a lot of needless variation in how that was
done, making it annoying to pass Params down to base classes. Some of
the different forms were:

const Params &
Params &
Params *
const Params *
Params const*

This change goes through and fixes up every constructor and every
create() method to use the const Params & form. We use a reference
because the Params struct should never be null. We use const because
neither the create method nor the consuming object should modify the
record of the parameters as they came in from the config. That would
make consuming them not idempotent, and make it impossible to tell what
the actual simulation configuration was since it would change from any
user visible form (config script, config.ini, dot pdf output).

Change-Id: I77453cba52fdcfd5f4eec92dfb0bddb5a9945f31
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35938
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-10-14 12:06:44 +00:00
Gabe Black
539247a4c7 cpu: Remove the "SingleThreaded" fetch policy from the O3 CPU.
The fetch policy is only meaningful for SMT simulations. The
"SingleThreaded" value is a placeholder which is the default, and is
only supposed to be used in non-SMT simulations.

Rather than have this enum value and have special checks for it in
various places in O3, we can just eliminate it and set the default,
which is still only meaningful in SMT simulations, be an SMT fetch
policy.

The DerivO3CPUParams::create() function would forcefully change the
the fetch policy from "SingleThreaded" to "RoundRobin" anyway if there
were more than one thread, so that can be the actual default instead of
the shadow effective default.

Change-Id: I458fda00b5bcc246b0957e6c937eab0c5b4563c3
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35935
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-10-13 20:09:21 +00:00
eavivi
30dbd90783 cpu-o3: convert fetch to new style stats
Change-Id: Ib50a303570ac1dd45ff11a32a823f47a6c4c02cd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/33815
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-09-02 21:53:53 +00:00
Emily Brickey
1447017039 cpu: update port terminology
Change-Id: I891e7a74683c1775c75a62454fcfdecb7511b7e9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32312
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
2020-08-26 16:48:13 +00:00
Gabe Black
6687265fe2 cpu: Delete authors lists from the cpu directory.
Change-Id: Icfba8e23b5f6820a6ddefe1a50abbe5f8825b7b5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/25444
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
2020-02-17 21:51:23 +00:00
Gabe Black
d1fd4311b4 cpu: Remove alpha specialized code.
Change-Id: I770132af2f11ed232a100ab8bef942f17789ef36
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/24648
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
2020-02-13 23:25:03 +00:00
Gabe Black
b16e525e40 cpu: Move the instruction port into o3's fetch stage.
That's where it's used, and that avoids having to pass it around using
the top level getInstPort accessor.

Change-Id: I489a3f3239b3116292f3dcd78a3945fb468c6311
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20239
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>
2019-08-28 02:14:53 +00:00
Tuan Ta
e5415671bd cpu: fixed how O3 CPU executes an exit system call
When a thread executed an exit syscall in SE mode, the thread context
was removed immediately in the same cycle, which left inflight squash
operations and trap event incomplete. The problem happened when a new
thread was assigned to the CPU later. The new thread started with some
incomplete transactions of the previous thread (e.g., squashing). This
problem could cause incorrect execution flow for the new thread (i.e.,
pc was not reset properly at the exit point), deadlock (i.e., some
stage-to-stage signals were not reset) and incorrect rename map between
logical and physical registers.

This patch adds a new state called 'Halting' to the thread context and
defers removing thread context from a CPU until a trap event initiated
by an exit syscall execution is processed. This patch also makes sure
that the removal of a thread context happens after all inflight
transactions of the to-be-removed thread in the pipeline complete.

Change-Id: If7ef1462fb8864e22b45371ee7ae67e2a5ad38b8
Reviewed-on: https://gem5-review.googlesource.com/c/8184
Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
2019-02-08 15:18:22 +00:00
Nikos Nikoleris
38339e0a7f cpu-o3: Make the smtFetchPolicy a Param.ScopedEnum
The smtFetchPolicy is a parameter in the o3 cpu that can have 5
different values. Previously this setting was done through a string
and a parser function would turn it into a c++ enum value. This
changeset turns the string into a python Param.ScopedEnum.

Change-Id: Iafb4b4b27587541185ea912e5ed581bce09695f5
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/15396
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
2019-01-17 11:09:08 +00:00
Rekai Gonzalez-Alberquilla
3bb49cb2b0 cpu,arch-arm: Initialise data members
The value that is not initialized has a bogus value that manifests when
using some debug-flags what makes the usage of tracediff a bit more
challenging.

In addition, while debugging with other techniques, it introduces the
problem of understanding if the value of a field is 'intended' or just
an effect of the lack of initialisation.

Change-Id: Ied88caa77479c6f1d5166d80d1a1a057503cb106
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/13125
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
2018-11-28 14:12:35 +00:00
Rekai Gonzalez-Alberquilla
0c50a0b4fe cpu: Fix the usage of const DynInstPtr
Summary: Usage of const DynInstPtr& when possible and introduction of
move operators to RefCountingPtr.

In many places, scoped references to dynamic instructions do a copy of
the DynInstPtr when a reference would do. This is detrimental to
performance. On top of that, in case there is a need for reference
tracking for debugging, the redundant copies make the process much more
painful than it already is.

Also, from the theoretical point of view, a function/method that
defines a convenience name to access an instruction should not be
considered an owner of the data, i.e., doing a copy and not a reference
is not justified.

On a related topic, C++11 introduces move semantics, and those are
useful when, for example, there is a class modelling a HW structure that
contains a list, and has a getHeadOfList function, to prevent doing a
copy to an internal variable -> update pointer, remove from the list ->
update pointer, return value making a copy to the assined variable ->
update pointer, destroy the returned value -> update pointer.

Change-Id: I3bb46c20ef23b6873b469fd22befb251ac44d2f6
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/13105
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
2018-11-16 10:39:03 +00:00
Giacomo Travaglini
f54020eb81 misc: Using smart pointers for memory Requests
This patch is changing the underlying type for RequestPtr from Request*
to shared_ptr<Request>. Having memory requests being managed by smart
pointers will simplify the code; it will also prevent memory leakage and
dangling pointers.

Change-Id: I7749af38a11ac8eb4d53d8df1252951e0890fde3
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/10996
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
2018-06-11 16:55:30 +00:00
Gabe Black
b18f7078c3 cpu: Remove ExtMachInst typedefs from the O3 CPU model.
These typedefs aren't used, and they expose ISA specific types outside
the ISA implementations.

Change-Id: I64b9cec18d6f92765eebbdf8c8f1de15c0deba34
Reviewed-on: https://gem5-review.googlesource.com/9404
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Gabe Black <gabeblack@google.com>
2018-03-27 10:58:01 +00:00
Radhika Jagtap
eb19fc2976 probe: Add probe in Fetch, IEW, Rename and Commit
This patch adds probe points in Fetch, IEW, Rename and Commit stages as follows.

A probe point is added in the Fetch stage for probing when a fetch request is
sent. Notify is fired on the probe point when a request is sent succesfully in
the first attempt as well as on a retry attempt.

Probe points are added in the IEW stage when an instruction begins to execute
and when execution is complete. This points can be used for monitoring the
execution time of an instruction.

Probe points are added in the Rename stage to probe renaming of source and
destination registers and when there is squashing. These probe points can be
used to track register dependencies and remove when there is squashing.

A probe point for squashing is added in Commit to probe squashed instructions.
2015-12-07 16:42:15 -06:00
Andreas Hansson
f26a289295 mem: Split port retry for all different packet classes
This patch fixes a long-standing isue with the port flow
control. Before this patch the retry mechanism was shared between all
different packet classes. As a result, a snoop response could get
stuck behind a request waiting for a retry, even if the send/recv
functions were split. This caused message-dependent deadlocks in
stress-test scenarios.

The patch splits the retry into one per packet (message) class. Thus,
sendTimingReq has a corresponding recvReqRetry, sendTimingResp has
recvRespRetry etc. Most of the changes to the code involve simply
clarifying what type of request a specific object was accepting.

The biggest change in functionality is in the cache downstream packet
queue, facing the memory. This queue was shared by requests and snoop
responses, and it is now split into two queues, each with their own
flow control, but the same physical MasterPort. These changes fixes
the previously seen deadlocks.
2015-03-02 04:00:35 -05:00
Andreas Hansson
41fc8a573e arch: Pass faults by const reference where possible
This patch changes how faults are passed between methods in an attempt
to copy as few reference-counting pointer instances as possible. This
should avoid unecessary copies being created, contributing to the
increment/decrement of the reference counters.
2014-09-19 10:35:18 -04:00
Mitch Hayenga
4f26bedc18 cpu: Fix SMT scheduling issue with the O3 cpu
The o3 cpu could attempt to schedule inactive threads under round-robin SMT
mode.

This is because it maintained an independent priority list of threads from the
active thread list.  This priority list could be come stale once threads were
inactive, leading to the cpu trying to fetch/commit from inactive threads.


Additionally the fetch queue is now forcibly flushed of instrctuctions
from the de-scheduled thread.

Relevant output:

24557000: system.cpu: [tid:1]: Calling deactivate thread.
24557000: system.cpu: [tid:1]: Removing from active threads list

24557500: system.cpu:
FullO3CPU: Ticking main, FullO3CPU.
24557500: system.cpu.fetch: Running stage.
24557500: system.cpu.fetch: Attempting to fetch from [tid:1]
2014-09-03 07:42:37 -04:00
Mitch Hayenga
ecd5300971 cpu: Add a fetch queue to the o3 cpu
This patch adds a fetch queue that sits between fetch and decode to the
o3 cpu.  This effectively decouples fetch from decode stalls allowing it
to be more aggressive, running futher ahead in the instruction stream.
2014-09-03 07:42:35 -04:00
Mitch Hayenga
1716749c8c cpu: Fix o3 front-end pipeline interlock behavior
The o3 pipeline interlock/stall logic is incorrect.  o3 unnecessicarily stalled
fetch and decode due to later stages in the pipeline.  In general, a stage
should usually only consider if it is stalled by the adjacent, downstream stage.
Forcing stalls due to later stages creates and results in bubbles in the
pipeline.  Additionally, o3 stalled the entire frontend (fetch, decode, rename)
on a branch mispredict while the ROB is being serially walked to update the
RAT (robSquashing). Only should have stalled at rename.
2014-09-03 07:42:34 -04:00
Matt Horsnell
739c6df94e base: add support for probe points and common probes
The probe patch is motivated by the desire to move analytical and trace code
away from functional code. This is achieved by the probe interface which is
essentially a glorified observer model.

What this means to users:
* add a probe point and a "notify" call at the source of an "event"
* add an isolated module, that is being used to carry out *your* analysis (e.g. generate a trace)
* register that module as a probe listener
Note: an example is given for reference in src/cpu/o3/simple_trace.[hh|cc] and src/cpu/SimpleTrace.py

What is happening under the hood:
* every SimObject maintains has a ProbeManager.
* during initialization (src/python/m5/simulate.py) first regProbePoints and
  the regProbeListeners is called on each SimObject.  this hooks up the probe
  point notify calls with the listeners.

FAQs:
Why did you develop probe points:
* to remove trace, stats gathering, analytical code out of the functional code.
* the belief that probes could be generically useful.

What is a probe point:
* a probe point is used to notify upon a given event (e.g. cpu commits an instruction)

What is a probe listener:
* a class that handles whatever the user wishes to do when they are notified
  about an event.

What can be passed on notify:
* probe points are templates, and so the user can generate probes that pass any
  type of argument (by const reference) to a listener.

What relationships can be generated (1:1, 1:N, N:M etc):
* there isn't a restriction. You can hook probe points and listeners up in a
  1:1, 1:N, N:M relationship. They become useful when a number of modules
  listen to the same probe points. The idea being that you can add a small
  number of probes into the source code and develop a larger number of useful
  analysis modules that use information passed by the probes.

Can you give examples:
* adding a probe point to the cpu's commit method allows you to build a trace
  module (outputting assembler), you could re-use this to gather instruction
  distribution (arithmetic, load/store, conditional, control flow) stats.

Why is the probe interface currently restricted to passing a const reference:
* the desire, initially at least, is to allow an interface to observe
  functionality, but not to change functionality.
* of course this can be subverted by const-casting.

What is the performance impact of adding probes:
* when nothing is actively listening to the probes they should have a
  relatively minor impact. Profiling has suggested even with a large number of
  probes (60) the impact of them (when not active) is very minimal (<1%).
2014-01-24 15:29:30 -06:00
Anthony Gutierrez
8a53da22c2 cpu: allow the fetch buffer to be smaller than a cache line
the current implementation of the fetch buffer in the o3 cpu
is only allowed to be the size of a cache line. some
architectures, e.g., ARM, have fetch buffers smaller than a cache
line, see slide 22 at:
http://www.arm.com/files/pdf/at-exploring_the_design_of_the_cortex-a15.pdf

this patch allows the fetch buffer to be set to values smaller
than a cache line.
2013-11-15 13:21:15 -05:00
Andreas Hansson
d4273cc9a6 mem: Set the cache line size on a system level
This patch removes the notion of a peer block size and instead sets
the cache line size on the system level.

Previously the size was set per cache, and communicated through the
interconnect. There were plenty checks to ensure that everyone had the
same size specified, and these checks are now removed. Another benefit
that is not yet harnessed is that the cache line size is now known at
construction time, rather than after the port binding. Hence, the
block size can be locally stored and does not have to be queried every
time it is used.

A follow-on patch updates the configuration scripts accordingly.
2013-07-18 08:31:16 -04:00
Nilay Vaish ext:(%2C%20Timothy%20Jones%20%3Ctimothy.jones%40cl.cam.ac.uk%3E)
dbeabedaf0 branch predictor: move out of o3 and inorder cpus
This patch moves the branch predictor files in the o3 and inorder directories
to src/cpu/pred. This allows sharing the branch predictor across different
cpu models.

This patch was originally posted by Timothy Jones in July 2010
but never made it to the repository.

--HG--
rename : src/cpu/o3/bpred_unit.cc => src/cpu/pred/bpred_unit.cc
rename : src/cpu/o3/bpred_unit.hh => src/cpu/pred/bpred_unit.hh
rename : src/cpu/o3/bpred_unit_impl.hh => src/cpu/pred/bpred_unit_impl.hh
rename : src/cpu/o3/sat_counter.hh => src/cpu/pred/sat_counter.hh
2013-01-24 12:28:51 -06:00
Andreas Sandberg
1814a85a05 cpu: Rewrite O3 draining to avoid stopping in microcode
Previously, the O3 CPU could stop in the middle of a microcode
sequence. This patch makes sure that the pipeline stops when it has
committed a normal instruction or exited from a microcode
sequence. Additionally, it makes sure that the pipeline has no
instructions in flight when it is drained, which should make draining
more robust.

Draining is controlled in the commit stage, which checks if the next
PC after a committed instruction is in microcode. If this isn't the
case, it requests a squash of all instructions after that the
instruction that just committed and immediately signals a drain stall
to the fetch stage. The CPU then continues to execute until the
pipeline and all associated buffers are empty.
2013-01-07 13:05:46 -05:00
Andreas Sandberg
6daada2701 cpu: Initialize the O3 pipeline from startup()
The entire O3 pipeline used to be initialized from init(), which is
called before initState() or unserialize(). This causes the pipeline
to be initialized from an incorrect thread context. This doesn't
currently lead to correctness problems as instructions fetched from
the incorrect start PC will be squashed a few cycles after
initialization.

This patch will affect the regressions since the O3 CPU now issues its
first instruction fetch to the correct PC instead of 0x0.
2013-01-07 13:05:44 -05:00
Andreas Hansson
287ea1a081 Param: Transition to Cycles for relevant parameters
This patch is a first step to using Cycles as a parameter type. The
main affected modules are the CPUs and the Ruby caches. There are
definitely plenty more places that are affected, but this patch serves
as a starting point to making the transition.

An important part of this patch is to actually enable parameters to be
specified as Param.Cycles which involves some changes to params.py.
2012-09-07 12:34:38 -04:00
Gabe Black
0cba96ba6a CPU: Merge the predecoder and decoder.
These classes are always used together, and merging them will give the ISAs
more flexibility in how they cache things and manage the process.

--HG--
rename : src/arch/x86/predecoder_tables.cc => src/arch/x86/decoder_tables.cc
2012-05-26 13:44:46 -07:00
Gabe Black
82a228bd43 Decode: Make the Decoder class defined per ISA.
--HG--
rename : src/cpu/decode.cc => src/arch/generic/decoder.cc
rename : src/cpu/decode.hh => src/arch/generic/decoder.hh
2012-05-25 00:53:37 -07:00
Koan-Sin Tan
7d4f187700 clang: Enable compiling gem5 using clang 2.9 and 3.0
This patch adds the necessary flags to the SConstruct and SConscript
files for compiling using clang 2.9 and later (on Ubuntu et al and OSX
XCode 4.2), and also cleans up a bunch of compiler warnings found by
clang. Most of the warnings are related to hidden virtual functions,
comparisons with unsigneds >= 0, and if-statements with empty
bodies. A number of mismatches between struct and class are also
fixed. clang 2.8 is not working as it has problems with class names
that occur in multiple namespaces (e.g. Statistics in
kernel_stats.hh).

clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which
causes confusion between the container std::set and the function
Packet::set, and this is currently addressed by not including the
entire namespace std, but rather selecting e.g. "using std::vector" in
the appropriate places.
2012-01-31 12:05:52 -05:00
Andreas Hansson
b3f930c884 CPU: Moving towards a more general port across CPU models
This patch performs minimal changes to move the instruction and data
ports from specialised subclasses to the base CPU (to the largest
degree possible). Ultimately it servers to make the CPU(s) have a
well-defined interface to the memory sub-system.
2012-01-17 12:55:08 -06:00
Gabe Black
b7b545bc38 Decode: Pull instruction decoding out of the StaticInst class into its own.
This change pulls the instruction decoding machinery (including caches) out of
the StaticInst class and puts it into its own class. This has a few intrinsic
benefits. First, the StaticInst code, which has gotten to be quite large, gets
simpler. Second, the code that handles decode caching is now separated out
into its own component and can be looked at in isolation, making it easier to
understand. I took the opportunity to restructure the code a bit which will
hopefully also help.

Beyond that, this change also lays some ground work for each ISA to have its
own, potentially stateful decode object. We'd be able to include less
contextualizing information in the ExtMachInst objects since that context
would be applied at the decoder. Also, the decoder could "know" ahead of time
that all the instructions it's going to see are going to be, for instance, 64
bit mode, and it will have one less thing to check when it decodes them.
Because the decode caching mechanism has been separated out, it's now possible
to have multiple caches which correspond to different types of decoding
context. Having one cache for each element of the cross product of different
configurations may become prohibitive, so it may be desirable to clear out the
cache when relatively static state changes and not to have one for each
setting.

Because the decode function is no longer universally accessible as a static
member of the StaticInst class, a new function was added to the ThreadContexts
that returns the applicable decode object.
2011-09-09 02:30:01 -07:00
Gabe Black
0e6dc00497 O3: When squashing, restore the macroop that should be used for fetching. 2011-08-14 17:41:34 -07:00
Geoffrey Blake
c7e7b89058 O3: Fix up pipelining icache accesses in fetch stage to function properly
Fixed up the patch from Yasuko Watanabe that enabled pipelining of fetch accessess to
icache to work with recent changes to main repository.
Also added in ability for fetch stage to delay issuing the fault carrying
nop when a pipeline fetch causes a fault and no fetch bandwidth is available
until the next cycle.
2011-07-10 12:56:08 -05:00
Ali Saidi
60579e8d74 O3: Make sure fetch doesn't go off into the weeds during speculation. 2011-07-10 12:56:08 -05:00
Geoffrey Blake
6dd996aabb O3: Fix issue with interrupts/faults occuring in the middle of a macro-op
This patch fixes two problems with the O3 cpu model. The first is an issue
with an instruction fetch causing a fault on the next address while the
current macro-op is being issued. This happens when the micro-ops exceed
the fetch bandwdith and then on the next cycle the fetch stage attempts
to issue a request to the next line while it still has micro-ops to issue
if the next line faults a fault is attached to a micro-op in the currently
executing macro-op rather than a "nop" from the next instruction block.
This leads to an instruction incorrectly faulting when on fetch when
it had no reason to fault.

A similar problem occurs with interrupts. When an interrupt occurs the
fetch stage nominally stops issuing instructions immediately. This is incorrect
in the case of a macro-op as the current location might not be interruptable.
2011-05-23 10:40:18 -05:00
Nathan Binkert
39a055645f includes: sort all includes 2011-04-15 10:44:06 -07:00
Ali Saidi
799c3da8d0 O3: Send instruction back to fetch on squash to seed predecoder correctly. 2011-03-17 19:20:19 -05:00
Ali Saidi
68bd80794c O3: Fix bug when a squash occurs right before TLB miss returns.
In this case we need to throw away the TLB miss, not assume it was the
one we were waiting for.
2011-02-23 15:10:49 -06:00
Giacomo Gabrielli
e2507407b1 O3: Enhance data address translation by supporting hardware page table walkers.
Some ISAs (like ARM) relies on hardware page table walkers.  For those ISAs,
when a TLB miss occurs, initiateTranslation() can return with NoFault but with
the translation unfinished.

Instructions experiencing a delayed translation due to a hardware page table
walk are deferred until the translation completes and kept into the IQ.  In
order to keep track of them, the IQ has been augmented with a queue of the
outstanding delayed memory instructions.  When their translation completes,
instructions are re-executed (only their initiateAccess() was already
executed; their DTB translation is now skipped).  The IEW stage has been
modified to support such a 2-pass execution.
2011-02-11 18:29:35 -06:00
Ali Saidi
ee9a331fe5 O3: Support timing translations for O3 CPU fetch. 2011-01-18 16:30:02 -06:00
Min Kyu Jeong
96375409ea O3: Fixes fetch deadlock when the interrupt clears before CPU handles it.
When this condition occurs the cpu should restart the fetch stage to fetch from
the original execution path. Fault handling in the commit stage is cleaned up a
little bit so the control flow is simplier. Finally, if an instruction is being
used to carry a fault it isn't executed, so the fault propagates appropriately.
2011-01-18 16:30:01 -06:00
Steve Reinhardt
89cf3f6e85 Move sched_list.hh and timebuf.hh from src/base to src/cpu.
These files really aren't general enough to belong in src/base.
This patch doesn't reorder include lines, leaving them unsorted
in many cases, but Nate's magic script will fix that up shortly.

--HG--
rename : src/base/sched_list.hh => src/cpu/sched_list.hh
rename : src/base/timebuf.hh => src/cpu/timebuf.hh
2011-01-03 14:35:47 -08:00
Gabe Black
8b9b85e92c O3: Make O3 support variably lengthed instructions. 2010-11-15 19:37:03 -08:00
Gabe Black
6f4bd2c1da ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.
This change is a low level and pervasive reorganization of how PCs are managed
in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about,
the PC and the NPC, and the lsb of the PC signaled whether or not you were in
PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next
micropc, x86 and ARM introduced variable length instruction sets, and ARM
started to keep track of mode bits in the PC. Each CPU model handled PCs in
its own custom way that needed to be updated individually to handle the new
dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack,
the complexity could be hidden in the ISA at the ISA implementation's expense.
Areas like the branch predictor hadn't been updated to handle branch delay
slots or micropcs, and it turns out that had introduced a significant (10s of
percent) performance bug in SPARC and to a lesser extend MIPS. Rather than
perpetuate the problem by reworking O3 again to handle the PC features needed
by x86, this change was introduced to rework PC handling in a more modular,
transparent, and hopefully efficient way.


PC type:

Rather than having the superset of all possible elements of PC state declared
in each of the CPU models, each ISA defines its own PCState type which has
exactly the elements it needs. A cross product of canned PCState classes are
defined in the new "generic" ISA directory for ISAs with/without delay slots
and microcode. These are either typedef-ed or subclassed by each ISA. To read
or write this structure through a *Context, you use the new pcState() accessor
which reads or writes depending on whether it has an argument. If you just
want the address of the current or next instruction or the current micro PC,
you can get those through read-only accessors on either the PCState type or
the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the
move away from readPC. That name is ambiguous since it's not clear whether or
not it should be the actual address to fetch from, or if it should have extra
bits in it like the PAL mode bit. Each class is free to define its own
functions to get at whatever values it needs however it needs to to be used in
ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the
PC and into a separate field like ARM.

These types can be reset to a particular pc (where npc = pc +
sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as
appropriate), printed, serialized, and compared. There is a branching()
function which encapsulates code in the CPU models that checked if an
instruction branched or not. Exactly what that means in the context of branch
delay slots which can skip an instruction when not taken is ambiguous, and
ideally this function and its uses can be eliminated. PCStates also generally
know how to advance themselves in various ways depending on if they point at
an instruction, a microop, or the last microop of a macroop. More on that
later.

Ideally, accessing all the PCs at once when setting them will improve
performance of M5 even though more data needs to be moved around. This is
because often all the PCs need to be manipulated together, and by getting them
all at once you avoid multiple function calls. Also, the PCs of a particular
thread will have spatial locality in the cache. Previously they were grouped
by element in arrays which spread out accesses.


Advancing the PC:

The PCs were previously managed entirely by the CPU which had to know about PC
semantics, try to figure out which dimension to increment the PC in, what to
set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction
with the PC type itself. Because most of the information about how to
increment the PC (mainly what type of instruction it refers to) is contained
in the instruction object, a new advancePC virtual function was added to the
StaticInst class. Subclasses provide an implementation that moves around the
right element of the PC with a minimal amount of decision making. In ISAs like
Alpha, the instructions always simply assign NPC to PC without having to worry
about micropcs, nnpcs, etc. The added cost of a virtual function call should
be outweighed by not having to figure out as much about what to do with the
PCs and mucking around with the extra elements.

One drawback of making the StaticInsts advance the PC is that you have to
actually have one to advance the PC. This would, superficially, seem to
require decoding an instruction before fetch could advance. This is, as far as
I can tell, realistic. fetch would advance through memory addresses, not PCs,
perhaps predicting new memory addresses using existing ones. More
sophisticated decisions about control flow would be made later on, after the
instruction was decoded, and handed back to fetch. If branching needs to
happen, some amount of decoding needs to happen to see that it's a branch,
what the target is, etc. This could get a little more complicated if that gets
done by the predecoder, but I'm choosing to ignore that for now.


Variable length instructions:

To handle variable length instructions in x86 and ARM, the predecoder now
takes in the current PC by reference to the getExtMachInst function. It can
modify the PC however it needs to (by setting NPC to be the PC + instruction
length, for instance). This could be improved since the CPU doesn't know if
the PC was modified and always has to write it back.


ISA parser:

To support the new API, all PC related operand types were removed from the
parser and replaced with a PCState type. There are two warts on this
implementation. First, as with all the other operand types, the PCState still
has to have a valid operand type even though it doesn't use it. Second, using
syntax like PCS.npc(target) doesn't work for two reasons, this looks like the
syntax for operand type overriding, and the parser can't figure out if you're
reading or writing. Instructions that use the PCS operand (which I've
consistently called it) need to first read it into a local variable,
manipulate it, and then write it back out.


Return address stack:

The return address stack needed a little extra help because, in the presence
of branch delay slots, it has to merge together elements of the return PC and
the call PC. To handle that, a buildRetPC utility function was added. There
are basically only two versions in all the ISAs, but it didn't seem short
enough to put into the generic ISA directory. Also, the branch predictor code
in O3 and InOrder were adjusted so that they always store the PC of the actual
call instruction in the RAS, not the next PC. If the call instruction is a
microop, the next PC refers to the next microop in the same macroop which is
probably not desirable. The buildRetPC function advances the PC intelligently
to the next macroop (in an ISA specific way) so that that case works.


Change in stats:

There were no change in stats except in MIPS and SPARC in the O3 model. MIPS
runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could
likely be improved further by setting call/return instruction flags and taking
advantage of the RAS.


TODO:

Add != operators to the PCState classes, defined trivially to be !(a==b).
Smooth out places where PCs are split apart, passed around, and put back
together later. I think this might happen in SPARC's fault code. Add ISA
specific constructors that allow setting PC elements without calling a bunch
of accessors. Try to eliminate the need for the branching() function. Factor
out Alpha's PAL mode pc bit into a separate flag field, and eliminate places
where it's blindly masked out or tested in the PC.
2010-10-31 00:07:20 -07:00
Gabe Black
943c171480 ISA: Get rid of old, unused utility functions cluttering up the ISAs. 2010-08-23 16:14:20 -07:00
Nathan Binkert
d9f39c8ce7 arch: nuke arch/isa_specific.hh and move stuff to generated config/the_isa.hh 2009-09-23 08:34:21 -07:00
Nathan Binkert
47877cf2db types: add a type for thread IDs and try to use it everywhere 2009-05-26 09:23:13 -07:00