The BaseCPU type had been specializing itself based on the value of
TARGET_ISA, which is not compatible with building more than one ISA at a
time.
This change refactors the CPU models so that the BaseCPU is more
general, and the ISA specific components are added to the CPU when the
CPU types are fully specialized. For instance, The AtomicSimpleCPU has a
version called X86AtomicSimpleCPU which installs the X86 specific
aspects of the CPU.
This specialization is done in three ways.
1. The mmu parameter is assigned an instance of the architecture
specific MMU type. This provides a reasonable default, but also avoids
having having to use the ISA specific type when the parameter is
created.
2. The ISA specific types are made available as class attributes, and
the utility functions (including __init__!) in the BaseCPU class can
refer to them to get the types they need to set up the CPU at run time.
Because SimObjects have strange, unhelpful semantics as far as assigning
to their attributes, these types need to be set up in a non-SimObject
class, which is then brought in as a base of the actual SimObject type.
Because the metaclass of this other type is just "type", things work
like you would expect. The SimObject doesn't do any special processing
of base classes if they aren't also SimObjects, so these attributes
survive and are accessible using normal lookup in the BaseCPU class.
3. There are some methods like addCheckerCPU and properties like
needsTSO which have ISA specific values or behaviors. These are set in
the ISA specific subclass, where they are inherently specific to an ISA
and don't need to check TARGET_ISA.
Also, the DummyChecker which was set up for the BaseSimpleCPU which
doesn't actually do anything in either C++ or python was not carried
forward. The CPU type still exists, but it isn't installed in the
simple CPUs.
To provide backward compatibility, each ISA implements a .py file which
matches the original .py for a CPU, and the original is renamed with a
Base prefix. The ISA specific version creates an alias with the old CPU
name which maps to the ISA specific type. This way, old scripts which
refer to, for example, AtomicSimpleCPU, will get the X86AtomicSimpleCPU
if the x86 version was compiled in, the ArmAtomicSimpleCPU on arm, etc.
Unfortunately, because of how tags on PySource and by extension SimObjects
are implemented right now, if you set the tags on two SimObjects or
PySources which have the same module path, the later will overwrite the
former whether or not they both would be included. There are some
changes in review which would revamp this and make it work like you
would expect, without this central bookkeeping which has the conflict.
Since I can't use that here, I fell back to checking TARGET_ISA to
decide whether to tell SCons about those files at all.
In the long term, this mechanism should be revamped so that these
compatibility types are only available if there is exactly one ISA
compiled into gem5. After the configs have been updated and no longer
assume they can use AtomicSimpleCPU in all cases, then these types can
be deleted.
Also, because ISAs can now either provide subclasses for a CPU or not,
the CPU_MODELS variable has been removed, meaning the non-ISA
specialized versions of those CPU models will always be included in
gem5, except when building the NULL ISA.
In the future, a more granular config mechanism will hopefully be
implemented for *all* of gem5 and not just the CPUs, and these can be
conditional again in case you only need certain models, and want to
reduce build time or binary size by excluding the others.
Change-Id: I02fc3f645c551678ede46268bbea9f66c3f6c74b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/52490
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The LSQSenderState that was attached to Request was not useful.
All the fields were either a duplicate of information in the
LSQRequest or totally unused.
The LSQRequest class now inherits from Packet::SenderState and is
attached to the Packet that are sent to memory. We do not need
anymore the indirection Packet->SenderState->LSQRequest.
This helps making the code clearer as it was sometimes hard to
follow the difference between what the LSQRequest and
LSQSenserState was doing
(ex: number of outstanding requests in the memory).
Change-Id: I5b21e007e6d183c6aa79c27c1787ca56dcbc3fb0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50733
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
Commit c417b76 changed the behaviour of addRequest(),
but did not update documentation or the HTM-related logic that used it.
Updates documentation for addRequest() in light of c417b76,
refactors request class to be idiomatic and use assigned byteEnable,
made HTM cmds pass in a correct byteEnable.
Change-Id: I7aa8c127df896e81caf915fbfea93e7b4bcc53b7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50147
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Apply the gem5 namespace to the codebase.
Some anonymous namespaces could theoretically be removed,
but since this change's main goal was to keep conflicts
at a minimum, it was decided not to modify much the
general shape of the files.
A few missing comments of the form "// namespace X" that
occurred before the newly added "} // namespace gem5"
have been added for consistency.
std out should not be included in the gem5 namespace, so
they weren't.
ProtoMessage has not been included in the gem5 namespace,
since I'm not familiar with how proto works.
Regarding the SystemC files, although they belong to gem5,
they actually perform integration between gem5 and SystemC;
therefore, it deserved its own separate namespace.
Files that are automatically generated have been included
in the gem5 namespace.
The .isa files currently are limited to a single namespace.
This limitation should be later removed to make it easier
to accomodate a better API.
Regarding the files in util, gem5:: was prepended where
suitable. Notice that this patch was tested as much as
possible given that most of these were already not
previously compiling.
Change-Id: Ia53d404ec79c46edaa98f654e23bc3b0e179fe2d
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/46323
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Aside from basic code editting, this also moves some methods from the
.hh files to the _impl.hh files. It also changes the Checker CPU
template to take the DynInstPtr type directly instead of through Impl
since that was the only type it used anyway. Finally it sets up a header
file which predeclares the O3DynInstPtr and O3DynInstConstPtr types so
they can be used without having to also include the BaseO3DynInst class
definition to break circular dependencies.
Change-Id: I5ca6af38ec13e6e820abcdb3748412e4f7fc1c78
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42101
Reviewed-by: Nathanael Premillieu <nathanael.premillieu@huawei.com>
Maintainer: Gabe Black <gabe.black@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The create() method on Params structs usually instantiate SimObjects
using a constructor which takes the Params struct as a parameter
somehow. There has been a lot of needless variation in how that was
done, making it annoying to pass Params down to base classes. Some of
the different forms were:
const Params &
Params &
Params *
const Params *
Params const*
This change goes through and fixes up every constructor and every
create() method to use the const Params & form. We use a reference
because the Params struct should never be null. We use const because
neither the create method nor the consuming object should modify the
record of the parameters as they came in from the config. That would
make consuming them not idempotent, and make it impossible to tell what
the actual simulation configuration was since it would change from any
user visible form (config script, config.ini, dot pdf output).
Change-Id: I77453cba52fdcfd5f4eec92dfb0bddb5a9945f31
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/35938
Reviewed-by: Gabe Black <gabeblack@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The byteEnable variable is used for masking bytes in a memory request.
The default behaviour is to provide from the ExecContext to the CPU
(and then to the LSQ) an empty vector, which is the same as providing
a vector where every element is true.
Such vectors basically mean: do not mask any byte in the memory request.
This behaviour adds more complexity to the downstream LSQs, which now
have to distinguish between an empty and non-empty byteEnable.
This patch is simplifying things by transforming an empty vector into
a all true one, making sure the CPUs are always receiving a non empty
byteEnable.
JIRA: https://gem5.atlassian.net/browse/GEM5-196
Change-Id: I1d1cecd86ed64c53a314ed700f28810d76c195c3
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/23285
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Tested-by: kokoro <noreply+kokoro@google.com>
The new local access mechanism installs a callback in the request which
implements what the mmapped IPR was doing. That avoids having to have
stubs in ISAs that don't have mmapped IPRs, avoids having to encode
what to do to communicate from the TLB and the mmapped IPR functions,
and gets rid of another global ISA interface function and header files.
Jira Issue: https://gem5.atlassian.net/browse/GEM5-187
Change-Id: I772c2ae2ca3830a4486919ce9804560c0f2d596a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/23188
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
Maintainer: Gabe Black <gabeblack@google.com>
Tested-by: kokoro <noreply+kokoro@google.com>
This change is based on modify the way we move the AtomicOpFunctor*
through gem5 in order to mantain proper ownership of the object and
ensuring its destruction when it is no longer used.
Doing that we fix at the same time a memory leak in Request.hh
where we were assigning a new AtomicOpFunctor* without destroying the
previous one.
This change creates a new type AtomicOpFunctor_ptr as a
std::unique_ptr<AtomicOpFunctor> and move its ownership as needed. Except
for its only usage when AtomicOpFunc() is called.
Change-Id: Ic516f9d8217cb1ae1f0a19500e5da0336da9fd4f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20919
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
The assert() in the LSQ writeback() only allowed ReExec faults.
However, a SplitRequest which completed the translation in
PartialFault state (i.e. any but the very first cacheline
translation failed) may end up here. The assert() condition is
extended accordingly.
The patch also removes the superfluous/unused Complete/Squashed
states from the LSQ request. (The completion of the request is
recorded in the flags still.)
Change-Id: Ie575f4d3b4d5295585828ad8c7d3f4c7c1fe15d0
Signed-off-by: Gabor Dozsa <gabor.dozsa@arm.com>
Reviewed-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19174
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
This changeset adds support for partial (or masked) loads/stores, i.e.
loads/stores that can disable accesses to individual bytes within the
target address range. In addition, this changeset extends the code to
crack memory accesses across most CPU models (TimingSimpleCPU still
TBD), so that arbitrarily wide memory accesses are supported. These
changes are required for supporting ISAs with wide vectors.
Additional authors:
- Gabor Dozsa <gabor.dozsa@arm.com>
- Tiago Muck <tiago.muck@arm.com>
Change-Id: Ibad33541c258ad72925c0b1d5abc3e5e8bf92d92
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/13518
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
This patch enables all 4 CPU models (AtomicSimpleCPU, TimingSimpleCPU,
MinorCPU and DerivO3CPU) to issue atomic memory (AMO) requests to memory
system.
Atomic memory instruction is treated as a special store instruction in
all CPU models.
In simple CPUs, an AMO request with an associated AtomicOpFunctor is
simply sent to L1 dcache.
In MinorCPU, an AMO request bypasses store buffer and waits for any
conflicting store request(s) currently in the store buffer to retire
before the AMO request is sent to the cache. AMO requests are not buffered
in the store buffer, so their effects appear immediately in the cache.
In DerivO3CPU, an AMO request is inserted in the store buffer so that it
is delivered to the cache only after all previous stores are issued to
the cache. Data forwarding between between an outstanding AMO in the
store buffer and a subsequent load is not allowed since the AMO request
does not hold valid data until it's executed in the cache.
This implementation assumes that a target ISA implementation must insert
enough memory fences as micro-ops around an atomic instruction to
enforce a correct order of memory instructions with respect to its
memory consistency model. Without extra memory fences, this implementation
can allow AMOs and other memory instructions that do not conflict
(i.e., not target the same address) to reorder.
This implementation also assumes that atomic instructions execute within
a cache line boundary since the cache for now is not able to execute an
operation on two different cache lines in one single step. Therefore,
ISAs like x86 that require multi-cache-line atomic instructions need to
either use a pair of locking load and unlocking store or change the
cache implementation to guarantee the atomicity of an atomic
instruction.
Change-Id: Ib8a7c81868ac05b98d73afc7d16eb88486f8cf9a
Reviewed-on: https://gem5-review.googlesource.com/c/8188
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
This patch does a large modification of the LSQ in the O3 model. The
main goal of the patch is to remove the 'an operation can be served with
one or two memory requests' assumption that is present in the LSQ
and the instruction with the req, reqLow, reqHigh triplet, and
generalising it to operations that can be addressed with one request,
and operations that require many requests, embodied in the
SingleDataRequest and the SplitDataRequest.
This modification has been done mimicking the minor model to an extent,
shifting the responsibilities of dealing with VtoP translation and
tracking the status and resources from the DynInst to the LSQ via the
LSQRequest. The LSQRequest models the information concerning the
operation, handles the creation of fragments for translation and request
as well as assembling/splitting the data accordingly.
With this modifications, the implementation of vector ISAs, particularly
on the memory side, become more rich, as the new model permits a
dissociation of the ISA characteristics as vector length, from the
microarchitectural characteristics that govern how contiguous loads are
executing, allowing exploration of different LSQ to DL1 bus widths to
understand the tradeoffs in complexity and performance.
Part of the complexities introduced stem from the fact that gem5 keeps a
large amount of metadata regarding, in particular, memory operations,
thus, when an instruction is squashed while some operation as TLB lookup
or cache access is ongoing, when the relevant structure communicates to
the LSQ that the operation is over, it tries to access some pieces of
data that should have died when the instruction is squashed, leading to
asserts, panics, or memory corruption. To ensure the correct behaviour,
the LSQRequest rely on assesing who is their owner, and self-destroying
if they detect their owner is done with the request, and there will be
no subsequent action. For example, in the case of an instruction
squashed whal the TLB is doing a walk to serve the translation, when the
translation is served by the TLB, the LSQRequest detects that the
instruction was squashed, and as the translation is done, no one else
expect to access its information, and therefore, it self-destructs.
Having destroyed the LSQRequest earlier, would lead to wrong behaviour
as the TLB walk may access some fields of it.
Additional authors:
- Gabor Dozsa <gabor.dozsa@arm.com>
Change-Id: I9578a1a3f6b899c390cdd886856a24db68ff7d0c
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/13516
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
The smtLSQPolicy is a parameter in the o3 cpu that can have 3
different values. Previously this setting was done through a string
and a parser function would turn it into a c++ enum value. This
changeset turns the string into a python Param.ScopedEnum.
Change-Id: I82041b88bd914c5dc660058d9e3998e3114e7c35
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/15397
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
This patch changes two members from being raw pointers to being STL
containers. The reason behind, other than cleanlyness and arguable OO
best practices is that containers have more intronspections capabilities
than naked pointers do, as the size is known.
Using STL containers adds little overhead and eases the automation of
process during debugging (gdb).
Change-Id: I4d9d3eedafa8b5e50ac512ea93b458a4200229f2
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/13126
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Summary: Usage of const DynInstPtr& when possible and introduction of
move operators to RefCountingPtr.
In many places, scoped references to dynamic instructions do a copy of
the DynInstPtr when a reference would do. This is detrimental to
performance. On top of that, in case there is a need for reference
tracking for debugging, the redundant copies make the process much more
painful than it already is.
Also, from the theoretical point of view, a function/method that
defines a convenience name to access an instruction should not be
considered an owner of the data, i.e., doing a copy and not a reference
is not justified.
On a related topic, C++11 introduces move semantics, and those are
useful when, for example, there is a class modelling a HW structure that
contains a list, and has a getHeadOfList function, to prevent doing a
copy to an internal variable -> update pointer, remove from the list ->
update pointer, return value making a copy to the assined variable ->
update pointer, destroy the returned value -> update pointer.
Change-Id: I3bb46c20ef23b6873b469fd22befb251ac44d2f6
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/13105
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
This patch is changing the underlying type for RequestPtr from Request*
to shared_ptr<Request>. Having memory requests being managed by smart
pointers will simplify the code; it will also prevent memory leakage and
dangling pointers.
Change-Id: I7749af38a11ac8eb4d53d8df1252951e0890fde3
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/10996
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
In general, the ThreadID parameter is unnecessary in the memory system
as the ContextID is what is used for the purposes of locks/wakeups.
Since we allocate sequential ContextIDs for each thread on MT-enabled
CPUs, ThreadID is unnecessary as the CPUs can identify the requesting
thread through sideband info (SenderState / LSQ entries) or ContextID
offset from the base ContextID for a cpu.
This is a re-spin of 20264eb after the revert (bd1c6789) and includes
some fixes of that commit.
The following patches had unexpected interactions with the current
upstream code and have been reverted for now:
e07fd01651f3: power: Add support for power models
831c7f2f9e39: power: Low-power idle power state for idle CPUs
4f749e00b667: power: Add power states to ClockedObject
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
--HG--
extra : amend_source : 0b6fb073c6bbc24be533ec431eb51fbf1b269508
In general, the ThreadID parameter is unnecessary in the memory system
as the ContextID is what is used for the purposes of locks/wakeups.
Since we allocate sequential ContextIDs for each thread on MT-enabled
CPUs, ThreadID is unnecessary as the CPUs can identify the requesting
thread through sideband info (SenderState / LSQ entries) or ContextID
offset from the base ContextID for a cpu.
The read() function merely initiates a memory read operation; the
data doesn't arrive until the access completes and a response packet
is received from the memory system. Thus there's no need to provide
a data pointer; its existence is historical.
Getting this pointer out of this internal o3 interface sets the
stage for similar cleanup in the ExecContext interface. Also
found that we were pointlessly setting the contents at this pointer
on a store forward (the useful memcpy happens just a few lines
below the deleted one).
This patch fixes a long-standing isue with the port flow
control. Before this patch the retry mechanism was shared between all
different packet classes. As a result, a snoop response could get
stuck behind a request waiting for a retry, even if the send/recv
functions were split. This caused message-dependent deadlocks in
stress-test scenarios.
The patch splits the retry into one per packet (message) class. Thus,
sendTimingReq has a corresponding recvReqRetry, sendTimingResp has
recvRespRetry etc. Most of the changes to the code involve simply
clarifying what type of request a specific object was accepting.
The biggest change in functionality is in the cache downstream packet
queue, facing the memory. This queue was shared by requests and snoop
responses, and it is now split into two queues, each with their own
flow control, but the same physical MasterPort. These changes fixes
the previously seen deadlocks.